bpmcdevitt/auto_sigma_rule_generator


This project is a proof of concept to see whether a program can create SIGMA rules from the information in newly published CVEs. It:

- Extracts CVE records from the National Vulnerability Database
- Extracts exploit data from GitHub repositories, ExploitDB, and the CISA Known Exploited Vulnerabilities catalog
- Extracts text data from reference links found on both exploit records and CVE records
- Sends the exploit data and reference data to an LLM, which creates SIGMA rules based on the content

The generated data is not meant for production use and is considered experimental. Inspired by: https://blogs.night-wolf.io/sigmagen-ai-powered-attck-mapped-threat-detection-with-sigma-rules
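The last pipeline step hands everything collected for a CVE to an LLM. As a rough sketch of how those pieces might be combined into one prompt, the following uses hypothetical names (`CveContext`, `build_sigma_prompt`) and made-up prompt wording; it is not the project's actual prompt.

```python
from dataclasses import dataclass, field


@dataclass
class CveContext:
    """Hypothetical container for the data gathered for one CVE."""
    cve_id: str
    description: str
    exploit_snippets: list[str] = field(default_factory=list)
    reference_texts: list[str] = field(default_factory=list)


def build_sigma_prompt(ctx: CveContext, max_chars: int = 4000) -> str:
    """Combine CVE, exploit, and reference text into a single LLM prompt.

    Each section is truncated to max_chars so one large exploit or
    reference page cannot crowd out the rest of the context.
    """
    def section(title: str, items: list[str]) -> str:
        body = "\n---\n".join(items) if items else "(none)"
        return f"## {title}\n{body[:max_chars]}"

    return "\n\n".join([
        f"Write a SIGMA detection rule for {ctx.cve_id}.",
        section("CVE description", [ctx.description]),
        section("Exploit code", ctx.exploit_snippets),
        section("Reference material", ctx.reference_texts),
        "Return only the rule as YAML.",
    ])
```

Truncating per section rather than truncating the whole prompt keeps every source represented even when one of them is very long.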


bpmcdevitt a6fb367ed4 refactor: modularize backend architecture for improved maintainability - Extract database models from monolithic main.py (2,373 lines) into organized modules - Implement service layer pattern with dedicated business logic classes - Split API endpoints into modular FastAPI routers by functionality - Add centralized configuration management with environment variable handling - Create proper separation of concerns across data, service, and presentation layers Architecture Changes: - models/: SQLAlchemy database models (CVE, SigmaRule, RuleTemplate, BulkProcessingJob) - config/: Centralized settings and database configuration - services/: Business logic (CVEService, SigmaRuleService, GitHubExploitAnalyzer) - routers/: Modular API endpoints (cves, sigma_rules, bulk_operations, llm_operations) - schemas/: Pydantic request/response models Key Improvements: - 95% reduction in main.py size (2,373 → 120 lines) - Updated 15+ backend files with proper import structure - Eliminated circular dependencies and tight coupling - Enhanced testability with isolated service components - Better code organization for team collaboration Backward Compatibility: - All API endpoints maintain same URLs and behavior - Zero breaking changes to existing functionality - Database schema unchanged - Environment variables preserved 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>		2025-07-14 17:51:23 -05:00
backend	refactor: modularize backend architecture for improved maintainability	2025-07-14 17:51:23 -05:00
exploit-db-mirror@99e10e9ba8	add kev support, exploitDB mirror support	2025-07-10 16:19:43 -05:00
frontend	add job scheduler	2025-07-11 09:16:57 -05:00
github_poc_collector@5c171fb9a9	added git submodule for more exploits. added template dir for base yaml templates for sigma rules	2025-07-09 11:58:29 -05:00
.env.example	add claude client + generic llm client using langchain	2025-07-09 18:02:45 -05:00
.gitignore	fix build errors	2025-07-08 09:10:25 -05:00
.gitmodules	add kev support, exploitDB mirror support	2025-07-10 16:19:43 -05:00
docker-compose.yml	add ollama to docker-compose for local model testing	2025-07-10 21:32:15 -05:00
init.sql	add reference data gathering	2025-07-10 17:30:12 -05:00
Makefile	fix build errors	2025-07-08 09:10:25 -05:00
README.md	add cve2capec client to map mitre attack data to cves	2025-07-14 15:48:10 -05:00
REFACTOR_NOTES.md	refactor: modularize backend architecture for improved maintainability	2025-07-14 17:51:23 -05:00
start.sh	more updates for bulk	2025-07-08 17:50:01 -05:00

README.md

CVE-SIGMA Auto Generator

Automated platform that generates SIGMA detection rules from CVE data using AI-enhanced exploit analysis.

✨ Key Features

Bulk CVE Processing: Complete NVD datasets (2002-2025) with nomi-sec PoC integration
AI-Powered Rule Generation: Multi-provider LLM support (OpenAI, Anthropic, local Ollama)
Quality-Based PoC Analysis: 5-tier quality scoring system for exploit reliability
Real-time Monitoring: Live job tracking and progress dashboard
Advanced Indicators: Extracts process, file, and network indicators from actual exploit code

🚀 Quick Start

Prerequisites

Docker and Docker Compose
(Optional) API keys for enhanced features

Installation

# Clone and start
git clone <repository-url>
cd auto_sigma_rule_generator
chmod +x start.sh
./start.sh

Access Points:

Frontend: http://localhost:3000
Backend API: http://localhost:8000

First Run

The application automatically:

Initializes database with rule templates
Fetches recent CVEs from NVD
Generates SIGMA rules with AI enhancement
Polls for new CVEs hourly
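The README does not say how the hourly poll is scheduled. A minimal fetch-then-sleep loop might look like the following, assuming a hypothetical `fetch_new_cves` callable that stands in for the NVD fetch and returns the number of new CVEs; the project's real scheduler may work differently.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("cve_poller")


def poll(fetch_new_cves: Callable[[], int],
         interval_seconds: float = 3600.0,
         max_runs: Optional[int] = None,
         sleep: Callable[[float], None] = time.sleep) -> int:
    """Run fetch_new_cves every interval_seconds and return how many runs happened.

    A failed fetch is logged and retried on the next cycle instead of
    stopping the loop. max_runs=None means run forever.
    """
    runs = 0
    while True:
        try:
            count = fetch_new_cves()
            log.info("fetched %d new CVEs", count)
        except Exception:
            log.exception("CVE fetch failed; will retry next cycle")
        runs += 1
        if max_runs is not None and runs >= max_runs:
            return runs
        sleep(interval_seconds)
```

Injecting `sleep` keeps the loop testable without waiting an hour between runs.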

🎯 Usage

Web Interface

Dashboard: Statistics and system overview
CVEs: Complete CVE listing with PoC data
SIGMA Rules: Generated detection rules
Bulk Jobs: Processing status and controls

API Endpoints

Core Operations

# Fetch CVEs
curl -X POST http://localhost:8000/api/fetch-cves

# Bulk processing
curl -X POST http://localhost:8000/api/bulk-seed
curl -X POST http://localhost:8000/api/incremental-update

# LLM-enhanced rules
curl -X POST http://localhost:8000/api/llm-enhanced-rules

Data Access

GET /api/cves - List CVEs
GET /api/sigma-rules - List rules
GET /api/stats - Statistics
GET /api/llm-status - LLM provider status
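The same endpoints can be called from a script. This is a small standard-library sketch, assuming the backend runs on the default port 8000 and returns JSON bodies; `api_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8000"  # default backend port used in this README


def api_request(path: str, method: str = "GET",
                payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a request for one of the backend endpoints listed above."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def send(req: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(api_request("/api/stats"))` fetches statistics, and `send(api_request("/api/fetch-cves", method="POST"))` mirrors the first curl command above.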

⚙️ Configuration

Environment Variables

Core Settings

DATABASE_URL=postgresql://user:pass@db:5432/dbname
NVD_API_KEY=your_nvd_key          # Optional: 5→50 req/30s
GITHUB_TOKEN=your_github_token     # Optional: Enhanced PoC analysis

LLM Configuration

LLM_PROVIDER=ollama               # Default: ollama (local)
LLM_MODEL=llama3.2               # Provider-specific model
OLLAMA_BASE_URL=http://ollama:11434

# External providers (optional)
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
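The LLM settings above can be loaded and checked at startup roughly as follows. This is a sketch, not the project's config module. The defaults mirror the values shown above; treating `llama3.2` as the default model and `openai`/`anthropic` as the accepted `LLM_PROVIDER` values are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    model: str
    ollama_base_url: str
    openai_api_key: Optional[str]
    anthropic_api_key: Optional[str]

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "LLMSettings":
        settings = cls(
            provider=env.get("LLM_PROVIDER", "ollama").lower(),
            model=env.get("LLM_MODEL", "llama3.2"),  # assumed default
            ollama_base_url=env.get("OLLAMA_BASE_URL", "http://ollama:11434"),
            # Treat empty strings the same as unset keys.
            openai_api_key=env.get("OPENAI_API_KEY") or None,
            anthropic_api_key=env.get("ANTHROPIC_API_KEY") or None,
        )
        settings.validate()
        return settings

    def validate(self) -> None:
        """Fail fast if the selected provider cannot work with these settings."""
        keys = {"openai": self.openai_api_key, "anthropic": self.anthropic_api_key}
        if self.provider != "ollama" and self.provider not in keys:
            raise ValueError(f"unknown LLM_PROVIDER: {self.provider}")
        if self.provider in keys and not keys[self.provider]:
            raise ValueError(f"LLM_PROVIDER={self.provider} requires an API key")
```

Validating once at startup surfaces a missing key immediately instead of at the first rule-generation request.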

API Keys Setup

NVD API (Recommended)

Get key: https://nvd.nist.gov/developers/request-an-api-key
Add to .env: NVD_API_KEY=your_key
Benefit: 10x rate limit increase

GitHub Token (Optional)

Create: https://github.com/settings/tokens (public_repo scope)
Add to .env: GITHUB_TOKEN=your_token
Benefit: Enhanced exploit-based rules
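For reference, this is how the two keys are attached to requests. The NVD CVE API 2.0 reads the key from an `apiKey` request header, and the GitHub REST API accepts a token in the `Authorization` header. The sketch below uses the standard library, not the project's own clients, and only builds the requests without sending them.

```python
import urllib.parse
import urllib.request
from typing import Optional

NVD_CVE_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"
GITHUB_API = "https://api.github.com"


def nvd_request(cve_id: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build an NVD CVE API 2.0 lookup; the key, if any, goes in the apiKey header."""
    url = f"{NVD_CVE_API}?{urllib.parse.urlencode({'cveId': cve_id})}"
    headers = {"apiKey": api_key} if api_key else {}
    return urllib.request.Request(url, headers=headers)


def github_request(path: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GitHub REST API request; a token raises the hourly rate limit."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(GITHUB_API + path, headers=headers)
```

Omitting the key or token still works, just under the lower unauthenticated rate limits listed in the Troubleshooting section.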

LLM APIs (Optional)

Local Ollama: No setup required (default)
OpenAI: Get key from https://platform.openai.com/api-keys
Anthropic: Get key from https://console.anthropic.com/

🧠 Rule Generation

AI-Enhanced Generation

PoC Analysis: LLM analyzes actual exploit code
Intelligent Detection: Creates sophisticated SIGMA rules
Context Awareness: Maps CVE descriptions to detection patterns
Validation: Automatic SIGMA syntax verification
Fallback: Template-based generation if LLM unavailable
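The README does not describe how the automatic SIGMA verification works. As a hedged illustration only, a structural check on a rule that has already been parsed from YAML (for example with PyYAML's `yaml.safe_load`) could look like this. It covers just a few requirements of the SIGMA specification and is not the project's validator.

```python
from collections.abc import Mapping
from typing import Any, List

REQUIRED_KEYS = ("title", "logsource", "detection")
LEVELS = {"informational", "low", "medium", "high", "critical"}


def sigma_problems(rule: Mapping[str, Any]) -> List[str]:
    """Return structural problems in a parsed SIGMA rule; an empty list means none found."""
    problems = [f"missing '{key}'" for key in REQUIRED_KEYS if key not in rule]

    detection = rule.get("detection")
    if isinstance(detection, Mapping):
        if "condition" not in detection:
            problems.append("detection has no 'condition'")
        for name, value in detection.items():
            # 'condition' (and the legacy 'timeframe') hold plain values; every
            # other entry is a search identifier that must be a map or a list.
            if name in ("condition", "timeframe"):
                continue
            if not isinstance(value, (Mapping, list)):
                problems.append(f"detection item '{name}' must be a map or list")
    elif detection is not None:
        problems.append("'detection' must be a map")

    level = rule.get("level")
    if level is not None and level not in LEVELS:
        problems.append(f"invalid level '{level}'")
    return problems
```

A full validator would also parse the condition expression and confirm that every identifier it references exists.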

Quality Tiers

Excellent (80+ pts): High-quality PoCs with recent updates
Good (60-79 pts): Moderate quality indicators
Fair (40-59 pts): Basic PoCs with some validation
Poor (20-39 pts): Minimal quality indicators
Very Poor (<20 pts): Low-quality PoCs
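The tier boundaries above map to a simple threshold lookup. This sketch covers only the mapping from score to tier, not how the score itself is computed. Treating each band as a half-open interval (so 79.5 counts as Good) is an assumption for non-integer scores.

```python
def quality_tier(score: float) -> str:
    """Map a PoC quality score to the tier names listed above."""
    # Check thresholds from highest to lowest; the first match wins.
    for threshold, tier in ((80, "Excellent"), (60, "Good"), (40, "Fair"), (20, "Poor")):
        if score >= threshold:
            return tier
    return "Very Poor"
```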

Rule Types

🤖 AI-Enhanced: LLM-generated with PoC analysis
🔍 Exploit-Based: Template + GitHub exploit indicators
⚡ Basic: CVE description only
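The fallback described under AI-Enhanced Generation suggests an ordering among these rule types. The decision logic below is an assumption inferred from the descriptions, not taken from the code: use the LLM when it is reachable and PoC code exists, fall back to templates with exploit indicators, and otherwise use the CVE description alone.

```python
from enum import Enum


class RuleType(Enum):
    AI_ENHANCED = "AI-Enhanced"
    EXPLOIT_BASED = "Exploit-Based"
    BASIC = "Basic"


def choose_rule_type(llm_available: bool, has_poc: bool,
                     has_exploit_indicators: bool) -> RuleType:
    """Pick the richest generation path the available inputs allow (assumed ordering)."""
    if llm_available and has_poc:
        return RuleType.AI_ENHANCED
    if has_exploit_indicators:
        return RuleType.EXPLOIT_BASED
    return RuleType.BASIC
```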

Example Output

title: CVE-2025-1234 AI-Enhanced Detection
status: experimental
description: Detection for CVE-2025-1234 RCE [AI-Enhanced with PoC analysis]
tags:
    - attack.t1059.001
    - cve.2025-1234
    - ai.enhanced
logsource:
    category: process_creation
    product: windows
detection:
    selection_process:
        Image|endswith: '\powershell.exe'
        CommandLine|contains:
            - '-EncodedCommand'
            - 'bypass'
    condition: selection_process
level: high

🛠️ Development

Local Development

# Start dependencies
docker-compose up -d db redis ollama

# Backend
cd backend && pip install -r requirements.txt
uvicorn main:app --reload

# Frontend (in a second terminal, from the repository root)
cd frontend && npm install && npm start

Testing LLM Integration

# Check Ollama
curl http://localhost:11434/api/tags

# Test LLM status
curl http://localhost:8000/api/llm-status

# Switch providers
curl -X POST http://localhost:8000/api/llm-switch \
  -H "Content-Type: application/json" \
  -d '{"provider": "ollama", "model": "llama3.2"}'
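Before switching to an Ollama model, it helps to confirm that the model has been pulled. Ollama's `/api/tags` response lists local models under `models`, each with a `name` such as `llama3.2:latest`. This helper (a hypothetical name) checks that JSON. It follows Ollama's convention of resolving an untagged name to `:latest`.

```python
import json


def ollama_has_model(tags_json: str, model: str) -> bool:
    """Return True if an Ollama /api/tags response lists the given model."""
    # Untagged names such as "llama3.2" resolve to "llama3.2:latest".
    wanted = model if ":" in model else f"{model}:latest"
    models = json.loads(tags_json).get("models", [])
    return any(m.get("name") == wanted for m in models)
```

Pass it the body returned by `curl http://localhost:11434/api/tags`.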

📊 Architecture

Backend: FastAPI + SQLAlchemy ORM
Frontend: React + Tailwind CSS
Database: PostgreSQL with enhanced schema
Cache: Redis (optional)
LLM: Ollama container + multi-provider support
Deployment: Docker Compose

Enhanced Database Schema

CVEs: PoC metadata, bulk processing fields
SIGMA Rules: Quality scoring, nomi-sec data
Rule Templates: Pattern templates for generation
Bulk Jobs: Job tracking and status

🔧 Troubleshooting

Common Issues

CVE Fetch Issues

Verify NVD API key in .env
Check API connectivity: Use "Test NVD API" button
Review logs: docker-compose logs -f backend

No Rules Generated

Ensure LLM provider is accessible
Check /api/llm-status for provider health
Verify PoC data quality in CVE details

Performance Issues

Start with recent years (2020+) for faster initial setup
Use smaller batch sizes for bulk operations
Monitor system resources during processing
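Smaller batches limit how much work a single step holds in memory and how much is lost if that step fails. The bulk endpoints' own batch-size setting is not documented here, so this is only a generic sketch of splitting a list of CVE IDs into fixed-size chunks. Python 3.12+ ships `itertools.batched`, which does the same but yields tuples.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    # islice pulls the next `size` items; an empty list means the input is exhausted.
    while batch := list(islice(it, size)):
        yield batch
```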

Port Conflicts

Default ports: 3000 (frontend), 8000 (backend), 5432 (db)
Modify docker-compose.yml if ports are in use

Rate Limits

NVD API: 5/30s (no key) → 50/30s (with key)
nomi-sec API: 1/second (built-in limiting)
GitHub API: 60/hour (no token) → 5000/hour (with token)
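The NVD limits are counted over a rolling 30-second window. A client can stay under them with a sliding-window limiter like the sketch below. This is a generic technique, not necessarily what the backend uses. The clock and sleep functions are injectable so the limiter can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling window_seconds period."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()

    def acquire(self) -> None:
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the rolling window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest recorded call expires.
            self.sleep(self.window - (now - self.calls[0]))


def nvd_limiter(has_api_key: bool) -> SlidingWindowLimiter:
    """NVD limits listed above: 5 requests per 30 s without a key, 50 with one."""
    return SlidingWindowLimiter(50 if has_api_key else 5, 30.0)
```

Call `limiter.acquire()` immediately before each NVD request.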

🛡️ Security

Store API keys in environment variables
Validate generated rules before production deployment
Generated rules are marked "experimental" and require analyst review
Use strong database passwords in production

📈 Monitoring

# View logs
docker-compose logs -f backend
docker-compose logs -f frontend

# Check service health
docker-compose ps

# Monitor bulk jobs
curl http://localhost:8000/api/bulk-status

🗺️ Roadmap

Custom rule template editor
Advanced MITRE ATT&CK mapping
SIEM platform export
ML-based rule optimization
Threat intelligence integration

📝 License

MIT License - see LICENSE file for details.

🤝 Contributing

Fork repository
Create feature branch
Add tests and documentation
Submit pull request

📞 Support

Check troubleshooting section
Review application logs
Open GitHub issue for bugs/questions