data_importer

This is a Rails/PostgreSQL application that serves JSON data from the following data sources:

  • CVEs
  • CPEs
  • CNA security advisories
  • GitHub security advisories (GHSA)
  • GitHub repositories that track public exploits for CVEs
  • GitHub API data for a list of GitHub usernames

See the HTTP API section below for the specific endpoints that can be queried over HTTP.

Supported data models:

Initial Setup

Environment files

Create a file named credentials.env containing the environment variables needed to log in to the APIs:

# Twitter stuff doesn't work right now.
# twitter_bearer_token=
# twitter_api_key=
# twitter_access_token_secret=
# twitter_access_token=
# twitter_api_key_secret=

github_api_token=
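
As a rough sketch of how the token might be consumed, the helper below reads it from the process environment and fails early when it is missing. It assumes docker-compose passes the variables in credentials.env into the container; the method name `github_api_token` is hypothetical and not taken from the application code.

```ruby
# Hypothetical helper (not part of the app): returns the GitHub API token.
# Assumes the variables in credentials.env are exported into the environment.
# Accepts any object responding to #fetch (ENV by default) to ease testing.
def github_api_token(env = ENV)
  token = env.fetch("github_api_token", "").strip
  raise ArgumentError, "github_api_token is not set" if token.empty?

  token
end
```

Failing early here gives a clearer error than an authentication failure later in an import job.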

Build container

docker-compose build

Database creation and seeding initial data

docker-compose run web rake db:create
docker-compose run web rake db:migrate
docker-compose run web rake db:seed

Running faktory

# Launch containers
docker-compose up -d

Visit http://localhost:7420 in a web browser for the Faktory web UI.

Scheduling import jobs

A default crontab.yaml is provided with a reasonable schedule. It uses faktory_cron to schedule importer worker jobs and ship them to Faktory.

Launch Pry console

docker-compose run web rails console

HTTP API

For now the API is unauthenticated and served on localhost:3000, until basic token auth is added. All responses are rendered as JSON.

Cves

  get "/cves", to: "cves#index"
  get "/cves/:cve_id", to: "cves#show"
  get "/cves/years/:year", to: "cves#show_year"
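
A minimal client sketch for routes like these, using only the Ruby standard library. It assumes the app is reachable at http://localhost:3000 as stated above. The helper names `api_uri` and `fetch_json` are hypothetical, and the CVE ID in the usage comment is only an illustrative value; the shape of the returned JSON is not documented here, so the sketch does not assume it.

```ruby
require "json"
require "net/http"
require "uri"

# Base URL taken from the README; adjust if the app is served elsewhere.
BASE_URL = "http://localhost:3000"

# Hypothetical helper: builds the URI for a route from its path segments,
# e.g. api_uri("cves", "years", 2021).
def api_uri(*segments, base: BASE_URL)
  path = segments.map { |s| URI.encode_www_form_component(s.to_s) }.join("/")
  URI.join(base, "/#{path}")
end

# Hypothetical helper: performs a GET and parses the JSON body.
# Raises if the server does not answer with a 2xx status.
def fetch_json(*segments)
  uri = api_uri(*segments)
  response = Net::HTTP.get_response(uri)
  raise "HTTP #{response.code} for #{uri}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
end

# Usage (requires the containers to be running):
#   fetch_json("cves", "CVE-2021-44228")  # illustrative CVE ID
#   fetch_json("cves", "years", 2021)
```

The same helpers should work for the other resources listed below, since they follow the same path layout.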

Cpes

  get "/cpes", to: "cpes#index"
  get "/cpes/:id", to: "cpes#show"

Cnas

  get "/cnas", to: "cnas#index"
  get "/cnas/:id", to: "cnas#show"
  get "/cnas/cna/:cna_id", to: "cnas#show_for_cna"

GithubAdvisories

  get "/github_advisories", to: "github_advisories#index"
  get "/github_advisories/:ghsa_id", to: "github_advisories#show"

GithubUsers

Create a text file named ./data/github_usernames.txt with one username per line. A seed task reads this file, calls the GitHub API for each user, and stores the returned data in the database. The API calls use the following GraphQL endpoints:

  • User. Note: the following keys are returned: github_id, login, name, avatar_url, bio, bio_html, location.
  • RepositoryInfo. Note: an array containing each public repository of the user is returned.

  get "/github_users", to: "github_users#index"
  get "/github_users/:username", to: "github_users#show"
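
The username file described above could be read along these lines. This is a sketch, not the actual seed task: the method names are hypothetical, and skipping blank lines and duplicate usernames is an assumption rather than documented behavior.

```ruby
# Hypothetical helper: extracts usernames from text with one username per line.
# Assumption: surrounding whitespace, blank lines and duplicates are ignored.
def parse_github_usernames(text)
  text.each_line.map(&:strip).reject(&:empty?).uniq
end

# Hypothetical helper: reads the file named in the README and parses it.
def load_github_usernames(path = "./data/github_usernames.txt")
  parse_github_usernames(File.read(path))
end
```

Keeping the parsing separate from the file read makes it possible to check the list without touching the file system or the GitHub API.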

GithubPocs

  get "/github_pocs", to: "github_pocs#index"
  get "/github_pocs/:id", to: "github_pocs#show"
  get "/github_pocs/cve/:cve_id", to: "github_pocs#show_for_cve"
  get "/github_pocs/years/:year", to: "github_pocs#show_year"

InthewildCveExploits

  get "/inthewild_cve_exploits", to: "inthewild_cve_exploits#index"
  get "/inthewild_cve_exploits/:cve_id", to: "inthewild_cve_exploits#show"

TrickestPocCves

  get "/trickest_poc_cves", to: "trickest_poc_cves#index"
  get "/trickest_poc_cves/:id", to: "trickest_poc_cves#show"
  get "/trickest_poc_cves/cve/:cve_id", to: "trickest_poc_cves#show_for_cve"
  get "/trickest_poc_cves/years/:year", to: "trickest_poc_cves#show_year"

CvemonCves

  get "/cvemon_cves", to: "cvemon_cves#index"
  get "/cvemon_cves/:id", to: "cvemon_cves#show"
  get "/cvemon_cves/cve/:cve_id", to: "cvemon_cves#show_for_cve"
  get "/cvemon_cves/years/:year", to: "cvemon_cves#show_year"