# data_importer
This is a Rails/PostgreSQL application that serves JSON data from the following data sources:
- CVEs
- CPEs
- CNA security advisories
- GHSA (GitHub security advisories)
- GitHub repositories that track public exploits for CVEs
- GitHub API data for a list of GitHub usernames

See the HTTP API section below for the specific endpoints that can be queried over HTTP.
## Supported data models:
- `Cve` data from the [cve_list](https://github.com/CVEProject/cvelist) GitHub repo.
- `Cpe` data from [nvd](https://nvd.nist.gov/products/cpe) in 2.2 format.
- `Cna` data from [mitre](https://raw.githubusercontent.com/CVEProject/cve-website/dev/src/assets/data/CNAsList.json).
- `GithubPoc` data from the [nomi-sec](https://github.com/nomi-sec/PoC-in-GitHub) GitHub repo.
- `GithubAdvisory` data from the [github_advisories_database](https://github.com/github/advisory-database/) GitHub repo.
- `GithubUser` data from the [github_graphql_api](https://docs.github.com/en/graphql).
- `InthewildCveExploit` data from the [inthewild.io](https://inthewild.io/api/exploited) exploited feed.
- `TrickestPocCve` data from the [trickest](https://github.com/trickest/cve) GitHub repo.
- `CvemonCve` data from the [ARPSyndicate](https://raw.githubusercontent.com/ARPSyndicate/cvemon/main/data.json) GitHub repo.
## Initial Setup
### Environment files
Create the following file to hold the environment variables needed to authenticate with the external APIs:
`credentials.env`
```
# Twitter integration doesn't work right now.
# twitter_bearer_token=
# twitter_api_key=
# twitter_access_token_secret=
# twitter_access_token=
# twitter_api_key_secret=
github_api_token=
```
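As a quick sanity check, the token can be validated before running the seed tasks. This is only a sketch: it assumes docker-compose passes the variables from `credentials.env` into the container environment under the same names, and the helper name `github_api_token` is hypothetical, not part of this app.

```ruby
# Assumption: variables from credentials.env are exposed in ENV
# under the same names (e.g. via docker-compose's env_file option).
# The helper name is hypothetical.
def github_api_token(env = ENV)
  token = env["github_api_token"].to_s.strip
  # Fail early with a clear message instead of a confusing API error later.
  raise KeyError, "github_api_token is not set in credentials.env" if token.empty?
  token
end
```

Taking the environment as an argument keeps the check easy to exercise with a plain hash instead of the real `ENV`.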
### Build container
`docker-compose build`
### Database creation and seeding initial data
```
docker-compose run web rake db:create
docker-compose run web rake db:migrate
docker-compose run web rake db:seed
```
### Launch Pry console
`docker-compose run web rails console`
### HTTP API
For now the API is unauthenticated and served over localhost:3000, until I put in some basic token auth. All responses are rendered as JSON.
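A minimal sketch of querying the API from Ruby, assuming the containers are running and the app is reachable at `http://localhost:3000`. The helper names `api_uri` and `fetch_json` are hypothetical and exist only for this example.

```ruby
require "json"
require "net/http"
require "uri"

# Assumption: the app listens on localhost:3000, as noted above.
BASE_URL = "http://localhost:3000"

# Build the full URI for an API path such as "/cves/years/2021".
def api_uri(path, base: BASE_URL)
  URI.join(base, path)
end

# Send a GET request and parse the JSON response body.
# Raises if the server answers with a non-2xx status.
def fetch_json(path, base: BASE_URL)
  response = Net::HTTP.get_response(api_uri(path, base: base))
  raise "Request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end

# Usage (requires the app to be running):
#   fetch_json("/cves/years/2021")
```

Any of the routes listed in the sections below can be passed as `path`.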
#### Cves
```
get "/cves", to: "cves#index"
get "/cves/:cve_id", to: "cves#show"
get "/cves/years/:year", to: "cves#show_year"
```
#### Cpes
```
get "/cpes", to: "cpes#index"
get "/cpes/:id", to: "cpes#show"
```
#### Cnas
```
get "/cnas", to: "cnas#index"
get "/cnas/:id", to: "cnas#show"
get "/cnas/cna/:cna_id", to: "cnas#show_for_cna"
```
#### GithubAdvisories
```
get "/github_advisories", to: "github_advisories#index"
get "/github_advisories/:ghsa_id", to: "github_advisories#show"
```
#### GithubUsers
Create a text file named `./data/github_usernames.txt` with one username per line.
A seed task reads this file, calls the GitHub API for each username, and stores the returned data in the database.
```
get "/github_users", to: "github_users#index"
get "/github_users/:username", to: "github_users#show"
```
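The one-username-per-line format of `./data/github_usernames.txt` could be parsed as sketched below. This illustrates the file format only; the actual seed task may read the file differently, and `parse_usernames` is a hypothetical helper name.

```ruby
# Parse the contents of a usernames file: one username per line.
# Surrounding whitespace and blank lines are ignored, duplicates are dropped.
def parse_usernames(text)
  text.each_line.map(&:strip).reject(&:empty?).uniq
end

# Hypothetical usage with the file named above:
#   usernames = parse_usernames(File.read("./data/github_usernames.txt"))
```

Dropping duplicates up front avoids repeating the same GitHub API call for a username that is listed twice.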
#### GithubPocs
```
get "/github_pocs", to: "github_pocs#index"
get "/github_pocs/:id", to: "github_pocs#show"
get "/github_pocs/cve/:cve_id", to: "github_pocs#show_for_cve"
get "/github_pocs/years/:year", to: "github_pocs#show_year"
```
#### InthewildCveExploits
```
get "/inthewild_cve_exploits", to: "inthewild_cve_exploits#index"
get "/inthewild_cve_exploits/:cve_id", to: "inthewild_cve_exploits#show"
```
#### TrickestPocCves
```
get "/trickest_poc_cves", to: "trickest_poc_cves#index"
get "/trickest_poc_cves/:id", to: "trickest_poc_cves#show"
get "/trickest_poc_cves/cve/:cve_id", to: "trickest_poc_cves#show_for_cve"
get "/trickest_poc_cves/years/:year", to: "trickest_poc_cves#show_year"
```
#### CvemonCves
```
get "/cvemon_cves", to: "cvemon_cves#index"
get "/cvemon_cves/:id", to: "cvemon_cves#show"
get "/cvemon_cves/cve/:cve_id", to: "cvemon_cves#show_for_cve"
get "/cvemon_cves/years/:year", to: "cvemon_cves#show_year"
```