# data_importer
This is a Rails/PostgreSQL application that serves JSON data from the following data sources:
- CVEs
- CPEs
- CNA security advisories
- GitHub Security Advisories (GHSA)
- GitHub repositories that track public exploits for CVEs
- GitHub API data for a list of GitHub usernames

See the HTTP API section below for the endpoints that can be queried over HTTP.

## Supported data models
- `Cve` data from the [cvelist](https://github.com/CVEProject/cvelist) GitHub repo.
- `Cpe` data from [NVD](https://nvd.nist.gov/products/cpe) in CPE 2.2 format.
- `Cna` data from [MITRE](https://raw.githubusercontent.com/CVEProject/cve-website/dev/src/assets/data/CNAsList.json).
- `GithubPoc` data from the [nomi-sec](https://github.com/nomi-sec/PoC-in-GitHub) GitHub repo.
- `GithubAdvisory` data from the [GitHub Advisory Database](https://github.com/github/advisory-database/) GitHub repo.
- `GithubUser` data from the [GitHub GraphQL API](https://docs.github.com/en/graphql).
- `InthewildCveExploit` data from the [inthewild.io](https://inthewild.io/api/exploited) exploited feed.
- `TrickestPocCve` data from the [trickest](https://github.com/trickest/cve) GitHub repo.
- `CvemonCve` data from the [ARPSyndicate](https://raw.githubusercontent.com/ARPSyndicate/cvemon/main/data.json) GitHub repo.

## Initial Setup
### Environment files
Create a file named `credentials.env` that holds the environment variables needed to log in to external APIs:
```
# Twitter integration doesn't work yet.
# twitter_bearer_token=
# twitter_api_key=
# twitter_access_token_secret=
# twitter_access_token=
# twitter_api_key_secret=
github_api_token=
```
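Before starting the containers it can help to confirm the GitHub token is actually filled in. The Ruby sketch below is not part of the repository; it assumes `credentials.env` uses plain `KEY=value` lines as shown above and treats lines starting with `#` as comments. The helper names `parse_env_file` and `missing_keys` are hypothetical.

```ruby
# Hypothetical helper (not part of data_importer): parse a dotenv-style file
# and report which required keys are missing or empty.
REQUIRED_KEYS = %w[github_api_token].freeze

# Returns a Hash of KEY => value for every non-comment "KEY=value" line.
def parse_env_file(text)
  text.each_line.with_object({}) do |line, env|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    # Split on the first "=" only, so values may themselves contain "=".
    key, value = line.split("=", 2)
    env[key.strip] = value.to_s.strip
  end
end

# Lists required keys that are absent or have an empty value.
def missing_keys(env, required = REQUIRED_KEYS)
  required.select { |key| env.fetch(key, "").empty? }
end
```

For example, `missing_keys(parse_env_file(File.read("credentials.env")))` returns `["github_api_token"]` while the token line is still blank.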
### Build container
`docker-compose build`
### Database creation and seeding initial data
```
docker-compose run web rake db:create
docker-compose run web rake db:migrate
docker-compose run web rake db:seed
```
### Running Faktory
```
# Launch containers
docker-compose up -d
```
Visit http://localhost:7420 in a web browser to open the Faktory web UI.
### Scheduling import jobs
A default `crontab.yaml` with a reasonable schedule is provided. It uses [faktory_cron](https://github.com/cdrx/faktory_cron) to schedule importer worker jobs and push them to Faktory.
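To make the scheduling flow concrete, the sketch below builds the kind of JSON job payload a scheduler pushes to Faktory. `jid`, `jobtype` and `args` are the core fields of a Faktory job, and `queue` falls back to `default` when omitted. The `faktory_job` helper and the `CveImportWorker` job type are hypothetical illustrations, not names taken from this repository or from faktory_cron's configuration.

```ruby
require "json"
require "securerandom"

# Hypothetical sketch: build a Faktory job payload like the ones a cron-style
# scheduler ships to the Faktory server.
def faktory_job(jobtype, args: [], queue: "default")
  {
    "jid" => SecureRandom.hex(12), # unique job id (24 hex characters)
    "jobtype" => jobtype,          # name the worker process uses to dispatch the job
    "args" => args,                # positional arguments passed to the worker
    "queue" => queue               # Faktory queue the job is pushed onto
  }
end

# "CveImportWorker" is an illustrative job type, not a class from this repo.
job = faktory_job("CveImportWorker")
puts JSON.generate(job)
```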
### Launch Pry console
`docker-compose run web rails console`
### HTTP API
For now, the API is unauthenticated and served over localhost:3000 until I add basic token auth. All responses are rendered as JSON.
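Because every route is a plain unauthenticated GET returning JSON, a client needs only the Ruby standard library. This is a minimal sketch, assuming the default docker-compose setup listening on `http://localhost:3000`; `api_uri` and `fetch_json` are hypothetical helper names.

```ruby
require "json"
require "net/http"
require "uri"

# Assumes the web container from docker-compose is listening on port 3000.
BASE_URL = "http://localhost:3000".freeze

# Joins an API path such as "/cves/years/2021" onto the base URL.
def api_uri(path, base: BASE_URL)
  URI.join(base, path)
end

# Performs a GET request and parses the JSON body; raises on non-2xx responses.
def fetch_json(path, base: BASE_URL)
  response = Net::HTTP.get_response(api_uri(path, base: base))
  unless response.is_a?(Net::HTTPSuccess)
    raise "GET #{path} failed with HTTP #{response.code}"
  end

  JSON.parse(response.body)
end
```

With the containers running, `fetch_json("/cves/years/2021")` returns the parsed response for that route.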
#### Cves
```
get "/cves", to: "cves#index"
get "/cves/:cve_id", to: "cves#show"
get "/cves/years/:year", to: "cves#show_year"
```
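CVE IDs follow the `CVE-YYYY-NNNN` pattern, where the sequence part has at least four digits. A sketch of path helpers for the routes above, assuming `:cve_id` takes the full identifier such as `CVE-2021-44228`; `cve_path` and `cve_year_path` are hypothetical names.

```ruby
# CVE-YYYY-NNNN..., with a sequence number of four or more digits.
CVE_ID = /\ACVE-\d{4}-\d{4,}\z/

# Path for get "/cves/:cve_id"; rejects strings that are not CVE IDs.
def cve_path(cve_id)
  raise ArgumentError, "not a CVE ID: #{cve_id.inspect}" unless CVE_ID.match?(cve_id)

  "/cves/#{cve_id}"
end

# Path for get "/cves/years/:year"; Integer() rejects non-numeric input.
def cve_year_path(year)
  "/cves/years/#{Integer(year)}"
end
```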
#### Cpes
```
get "/cpes", to: "cpes#index"
get "/cpes/:id", to: "cpes#show"
```
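`Cpe` records come from NVD in CPE 2.2 format, where a URI such as `cpe:/a:apache:http_server:2.4.49` lists part, vendor, product, version, update, edition and language separated by colons. The response field that carries this string is not documented here, so the sketch below works on a plain string; `parse_cpe22` is a hypothetical helper and deliberately leaves percent-encoded characters untouched.

```ruby
# Component order of a CPE 2.2 URI after the "cpe:/" prefix.
CPE22_FIELDS = %i[part vendor product version update edition language].freeze
# Part codes: a = application, h = hardware, o = operating system.
CPE22_PARTS = { "a" => :application, "h" => :hardware, "o" => :operating_system }.freeze

# Splits a CPE 2.2 URI into named components; missing trailing ones are nil.
def parse_cpe22(uri)
  raise ArgumentError, "not a CPE 2.2 URI: #{uri.inspect}" unless uri.start_with?("cpe:/")

  # -1 keeps empty components, e.g. "cpe:/a:vendor::1.0" has an empty product.
  values = uri.delete_prefix("cpe:/").split(":", -1)
  fields = CPE22_FIELDS.zip(values).to_h
  fields[:part] = CPE22_PARTS.fetch(fields[:part], fields[:part])
  fields
end
```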
#### Cnas
```
get "/cnas", to: "cnas#index"
get "/cnas/:id", to: "cnas#show"
get "/cnas/cna/:cna_id", to: "cnas#show_for_cna"
```
#### GithubAdvisories
```
get "/github_advisories", to: "github_advisories#index"
get "/github_advisories/:ghsa_id", to: "github_advisories#show"
```
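The `show` route is keyed by GHSA ID, which has the shape `GHSA-xxxx-xxxx-xxxx`. A sketch of a path helper that rejects malformed IDs before making a request; the check is intentionally loose (any lowercase letter or digit in each group) rather than GitHub's exact character set, and `github_advisory_path` is a hypothetical name.

```ruby
# "GHSA" followed by three dash-separated groups of four lowercase letters/digits.
GHSA_ID = /\AGHSA(-[0-9a-z]{4}){3}\z/

# Path for get "/github_advisories/:ghsa_id".
def github_advisory_path(ghsa_id)
  raise ArgumentError, "not a GHSA ID: #{ghsa_id.inspect}" unless GHSA_ID.match?(ghsa_id)

  "/github_advisories/#{ghsa_id}"
end
```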
#### GithubUsers
Create a text file named `./data/github_usernames.txt` with one username per line.
There is a seed task that reads this file, queries the GitHub API for each user, and stores the data in the database. The API calls use the following GraphQL types:
- [User](https://docs.github.com/en/graphql/reference/objects#user): the following keys are returned: `github_id`, `login`, `name`, `avatar_url`, `bio`, `bio_html`, `location`.
- [RepositoryInfo](https://docs.github.com/en/graphql/reference/interfaces#repositoryinfo): an array of the user's public repositories is returned.
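The import described above can be sketched with the standard library. This is a hypothetical illustration, not the repository's seed task: the query uses fields from GitHub's GraphQL `User` object (with `databaseId` standing in for `github_id` as an assumption), the page size of 100 repositories is an assumption, and `read_usernames`, `user_query_body` and `fetch_github_user` are made-up names.

```ruby
require "json"
require "net/http"
require "uri"

GITHUB_GRAPHQL_URL = URI("https://api.github.com/graphql")

# databaseId is assumed to correspond to the stored github_id key.
USER_QUERY = <<~'GRAPHQL'
  query($login: String!) {
    user(login: $login) {
      databaseId
      login
      name
      avatarUrl
      bio
      bioHTML
      location
      repositories(first: 100, privacy: PUBLIC) {
        nodes { name url description }
      }
    }
  }
GRAPHQL

# Reads one username per line, ignoring blank lines and surrounding whitespace.
def read_usernames(text)
  text.each_line.map(&:strip).reject(&:empty?)
end

# Builds the JSON body for a single user lookup.
def user_query_body(login)
  JSON.generate({ "query" => USER_QUERY, "variables" => { "login" => login } })
end

# POSTs the query to GitHub's GraphQL endpoint using a personal access token.
def fetch_github_user(login, token)
  request = Net::HTTP::Post.new(GITHUB_GRAPHQL_URL)
  request["Authorization"] = "bearer #{token}"
  request["Content-Type"] = "application/json"
  request.body = user_query_body(login)
  response = Net::HTTP.start(GITHUB_GRAPHQL_URL.host, GITHUB_GRAPHQL_URL.port, use_ssl: true) do |http|
    http.request(request)
  end
  JSON.parse(response.body)
end
```

Assuming `github_api_token` from `credentials.env` is exported into the environment, usage would look like `read_usernames(File.read("data/github_usernames.txt")).map { |login| fetch_github_user(login, ENV.fetch("github_api_token")) }`.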
```
get "/github_users", to: "github_users#index"
get "/github_users/:username", to: "github_users#show"
```
#### GithubPocs
```
get "/github_pocs", to: "github_pocs#index"
get "/github_pocs/:id", to: "github_pocs#show"
get "/github_pocs/cve/:cve_id", to: "github_pocs#show_for_cve"
get "/github_pocs/years/:year", to: "github_pocs#show_year"
```
#### InthewildCveExploits
```
get "/inthewild_cve_exploits", to: "inthewild_cve_exploits#index"
get "/inthewild_cve_exploits/:cve_id", to: "inthewild_cve_exploits#show"
```
#### TrickestPocCves
```
get "/trickest_poc_cves", to: "trickest_poc_cves#index"
get "/trickest_poc_cves/:id", to: "trickest_poc_cves#show"
get "/trickest_poc_cves/cve/:cve_id", to: "trickest_poc_cves#show_for_cve"
get "/trickest_poc_cves/years/:year", to: "trickest_poc_cves#show_year"
```
#### CvemonCves
```
get "/cvemon_cves", to: "cvemon_cves#index"
get "/cvemon_cves/:id", to: "cvemon_cves#show"
get "/cvemon_cves/cve/:cve_id", to: "cvemon_cves#show_for_cve"
get "/cvemon_cves/years/:year", to: "cvemon_cves#show_year"
```
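Four of the route groups above expose exploit or PoC records keyed by CVE ID, so gathering everything known about one CVE means querying each of them. A sketch that builds those paths from the routes listed in this README; `exploit_paths` is a hypothetical helper and assumes each `:cve_id` takes the full identifier.

```ruby
# Per-CVE exploit/PoC routes, copied from the route listings above.
EXPLOIT_ROUTES = {
  github_pocs: "/github_pocs/cve/%s",
  trickest_poc_cves: "/trickest_poc_cves/cve/%s",
  cvemon_cves: "/cvemon_cves/cve/%s",
  inthewild_cve_exploits: "/inthewild_cve_exploits/%s"
}.freeze

# Returns { source => path } for every per-CVE exploit route.
def exploit_paths(cve_id)
  EXPLOIT_ROUTES.transform_values { |template| format(template, cve_id) }
end
```

Each path can then be requested against the local server, for example with `Net::HTTP.get(URI("http://localhost:3000" + path))` after `require "net/http"`.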