## Step 1:

I used this Python script, https://github.com/x4nth055/pythoncode-tutorials/tree/master/web-scraping/html-table-extractor, to extract all of the tables from a Red Hat documentation URL.

```
# mk some datadirs
mkdir data
mkdir -p data/redhat8/security_api_results
mkdir -p data/redhat7/security_api_results
mkdir -p data/redhat6/security_api_results

# run the program to scrape and convert the data to csv
python html_table_extractor.py "https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/package_manifest/index"
[+] Found a total of 9 tables.
[+] Saving table-1
[+] Saving table-2
[+] Saving table-3
[+] Saving table-4
[+] Saving table-5
[+] Saving table-6
[+] Saving table-7
[+] Saving table-8
[+] Saving table-9
```

This creates one CSV file per table found in the html-single documentation page for a given distro.

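For readers who want to see roughly what table extraction involves, here is a minimal sketch using only the Python standard library. It is not the linked script's implementation: `TableCollector` and `tables_to_csv` are hypothetical names, nested tables are not handled, and the column layout may differ from the CSV files the real script writes.

```python
import csv
import io
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables, each a list of rows
        self._rows = None   # rows of the table being read, or None
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def _close_cell(self):
        if self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self._rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> or </th>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()
        elif tag == "table" and self._rows is not None:
            self._close_row()
            self.tables.append(self._rows)
            self._rows = None


def tables_to_csv(html):
    """Return one CSV string per table found in the HTML text."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    outputs = []
    for rows in parser.tables:
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerows(rows)
        outputs.append(buf.getvalue())
    return outputs
```

Each string returned by `tables_to_csv` could then be written to its own file, which mirrors the one-CSV-per-table result described above.
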
## Step 2:

To process and de-duplicate the packages further, I created one master CSV file per distro, in that distro's directory, by running the following filter on the command line against the per-table CSV files.

```
# take column 2 of every table csv, then sort and de-duplicate
cut -f 2 -d , table-* | sort -u > all_redhat7_rpm_package_manifest.csv
```

This step was repeated for Red Hat 8, 7, and 6.

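The same filter can be sketched in Python with the `csv` module, which also copes with quoted fields that contain commas (something `cut -d ,` splits incorrectly). `unique_second_column` and `write_manifest` are hypothetical names, and the sketch assumes the package name sits in the second column, as in the shell command.

```python
import csv


def unique_second_column(paths):
    """Return the sorted, de-duplicated values of column 2 across CSV files."""
    names = set()
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.reader(handle):
                # Skip rows that have no second column or an empty one.
                if len(row) > 1 and row[1].strip():
                    names.add(row[1].strip())
    return sorted(names)


def write_manifest(paths, out_path):
    """Write one package name per line and return how many were written."""
    packages = unique_second_column(paths)
    with open(out_path, "w", encoding="utf-8") as out:
        out.writelines(name + "\n" for name in packages)
    return len(packages)
```

Calling `write_manifest` with the list of per-table CSV paths and `all_redhat7_rpm_package_manifest.csv` as the output path would approximate the shell pipeline. Two differences are worth knowing: rows without a second column are skipped rather than passed through whole, and Python's `sorted` orders by code point, which can differ from the locale-aware order of `sort`.
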
## Step 3:

After creating the list of base package names for a distro, we can feed each package name into a query against the Red Hat security API using the following example loop:

```
cd data/redhat8

# query the security api once per package and save each json response
for pkg in $(cat all_redhat8_rpm_package_manifest.csv); do
  curl "https://access.redhat.com/hydra/rest/securitydata/cve.json?package=$pkg" > "./security_api_results/${pkg}_security_api_results.json"
done
```

This sends one API call per package name to the security API, requesting that package's CVEs in JSON format, and saves each response in `security_api_results/`.
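
A rough Python equivalent of the loop is sketched below, in case percent-encoding of package names matters: many servers decode a literal `+` in a query string as a space, so a name such as `gcc-c++` may not match when sent unencoded. Whether this API behaves that way is not something I have verified. `cve_query_url`, `result_path`, and `fetch_all` are hypothetical names, and the one-second `delay` is an assumed courtesy pause, not a documented rate limit.

```python
import time
import urllib.parse
import urllib.request
from pathlib import Path

API_URL = "https://access.redhat.com/hydra/rest/securitydata/cve.json"


def cve_query_url(pkg):
    """Build the query URL for one package, percent-encoding the name."""
    return API_URL + "?" + urllib.parse.urlencode({"package": pkg})


def result_path(pkg, out_dir="security_api_results"):
    """Mirror the output file naming used by the shell loop."""
    return Path(out_dir) / f"{pkg}_security_api_results.json"


def fetch_all(manifest, out_dir="security_api_results", delay=1.0):
    """Query the API once per non-empty manifest line and save each response."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for line in Path(manifest).read_text(encoding="utf-8").splitlines():
        pkg = line.strip()
        if not pkg:
            continue
        with urllib.request.urlopen(cve_query_url(pkg)) as response:
            result_path(pkg, out_dir).write_bytes(response.read())
        time.sleep(delay)  # assumed pause between requests
```

Run from `data/redhat8`, `fetch_all("all_redhat8_rpm_package_manifest.csv")` would write the same file names as the shell loop.
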