This project is a proof of concept that tests whether a program can create SIGMA rules from the information in newly published CVEs. It:
- Extracts CVE records from the National Vulnerability Database
- Extracts exploit data from GitHub repositories, ExploitDB, and the CISA Known Exploited Vulnerabilities catalog
- Extracts text from reference links found in both exploit records and CVE records
- Sends the exploit data and reference data to an LLM, which creates SIGMA rules based on that content
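The first extraction step can be pictured with a small parser over an NVD CVE API 2.0 response, whose records sit under `vulnerabilities[].cve` with `id`, `published`, `descriptions` and `references` fields. This is a minimal sketch, not the project's CVEService: `CveRecord` and `parse_nvd_response` are hypothetical names, and the sample payload is illustrative. The collected reference URLs are what a later step would fetch for text.

```python
from dataclasses import dataclass, field


@dataclass
class CveRecord:
    """Minimal view of one CVE from an NVD API 2.0 response (hypothetical type)."""
    cve_id: str
    description: str
    published: str
    reference_urls: list[str] = field(default_factory=list)


def parse_nvd_response(payload: dict) -> list[CveRecord]:
    """Pull the ID, English description, publish date and reference URLs
    out of each entry in the NVD API 2.0 "vulnerabilities" array."""
    records = []
    for item in payload.get("vulnerabilities", []):
        cve = item.get("cve", {})
        # NVD can return descriptions in several languages; prefer English.
        description = next(
            (d["value"] for d in cve.get("descriptions", []) if d.get("lang") == "en"),
            "",
        )
        # Reference links are the inputs for the text-extraction step.
        urls = [ref["url"] for ref in cve.get("references", []) if "url" in ref]
        records.append(
            CveRecord(
                cve_id=cve.get("id", ""),
                description=description,
                published=cve.get("published", ""),
                reference_urls=urls,
            )
        )
    return records


# Illustrative payload shaped like an NVD API 2.0 response.
sample = {
    "vulnerabilities": [
        {
            "cve": {
                "id": "CVE-2025-1234",
                "published": "2025-01-01T00:00:00.000",
                "descriptions": [{"lang": "en", "value": "Example RCE."}],
                "references": [{"url": "https://example.com/advisory"}],
            }
        }
    ]
}
print(parse_nvd_response(sample)[0].reference_urls)
```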
This data is not meant for production use and is considered experimental. Inspired by: https://blogs.night-wolf.io/sigmagen-ai-powered-attck-mapped-threat-detection-with-sigma-rules
CVE-SIGMA Auto Generator
Automated platform that generates SIGMA detection rules from CVE data using AI-enhanced exploit analysis.
✨ Key Features
- Bulk CVE Processing: Complete NVD datasets (2002-2025) with nomi-sec PoC integration
- AI-Powered Rule Generation: Multi-provider LLM support (OpenAI, Anthropic, local Ollama)
- Quality-Based PoC Analysis: 5-tier quality scoring system for exploit reliability
- Real-time Monitoring: Live job tracking and progress dashboard
- Advanced Indicators: Extracts process, file, and network patterns from actual exploit code
🚀 Quick Start
Prerequisites
- Docker and Docker Compose
- (Optional) API keys for enhanced features
Installation
# Clone and start
git clone <repository-url>
cd auto_sigma_rule_generator
chmod +x start.sh
./start.sh
Access Points:
- Frontend: http://localhost:3000
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
First Run
The application automatically:
- Initializes the database with rule templates
- Fetches recent CVEs from NVD
- Generates SIGMA rules with AI enhancement
- Polls for new CVEs hourly
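One way to picture the hourly poll is a query against the NVD CVE API 2.0 for the last hour, using its `lastModStartDate`/`lastModEndDate` parameters (which NVD expects as a pair). This is a sketch under assumptions: whether the backend filters on modified or published dates is not documented here, and `hourly_poll_url` is a hypothetical helper. If NVD_API_KEY is set, NVD expects it in an `apiKey` request header rather than in the URL.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

NVD_CVE_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def hourly_poll_url(now: datetime, window: timedelta = timedelta(hours=1)) -> str:
    """Build an NVD query for CVEs modified in the `window` before `now`.

    `now` should be timezone-aware; it is converted to UTC before formatting.
    """
    end = now.astimezone(timezone.utc)
    start = end - window
    fmt = "%Y-%m-%dT%H:%M:%S.000"  # extended ISO-8601, UTC, no offset suffix
    params = {
        "lastModStartDate": start.strftime(fmt),
        "lastModEndDate": end.strftime(fmt),
    }
    # urlencode percent-encodes the colons in the timestamps.
    return f"{NVD_CVE_API}?{urlencode(params)}"


print(hourly_poll_url(datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)))
```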
🎯 Usage
Web Interface
- Dashboard: Statistics and system overview
- CVEs: Complete CVE listing with PoC data
- SIGMA Rules: Generated detection rules
- Bulk Jobs: Processing status and controls
API Endpoints
Core Operations
# Fetch CVEs
curl -X POST http://localhost:8000/api/fetch-cves
# Bulk processing
curl -X POST http://localhost:8000/api/bulk-seed
curl -X POST http://localhost:8000/api/incremental-update
# LLM-enhanced rules
curl -X POST http://localhost:8000/api/llm-enhanced-rules
Data Access
- GET /api/cves: List CVEs
- GET /api/sigma-rules: List rules
- GET /api/stats: Statistics
- GET /api/llm-status: LLM provider status
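The same endpoints can be called from Python with only the standard library. A minimal sketch: `api_request` and `call_api` are hypothetical helpers, and it assumes the endpoints return JSON bodies (FastAPI's default), since the response shapes are not documented here. A POST with no body mirrors the `curl -X POST` calls above.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # API access point from Quick Start


def api_request(path: str, method: str = "GET", base_url: str = BASE_URL) -> Request:
    """Build a request for one of the documented endpoints."""
    return Request(f"{base_url.rstrip('/')}{path}", method=method)


def call_api(path: str, method: str = "GET", timeout: float = 30.0):
    """Send the request and decode the body as JSON (assumed response format)."""
    with urlopen(api_request(path, method), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage against a running stack:
#   stats = call_api("/api/stats")
#   call_api("/api/fetch-cves", method="POST")
```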
⚙️ Configuration
Environment Variables
Core Settings
DATABASE_URL=postgresql://user:pass@db:5432/dbname
NVD_API_KEY=your_nvd_key # Optional: 5→50 req/30s
GITHUB_TOKEN=your_github_token # Optional: Enhanced PoC analysis
LLM Configuration
LLM_PROVIDER=ollama # Default: ollama (local)
LLM_MODEL=llama3.2 # Provider-specific model
OLLAMA_BASE_URL=http://ollama:11434
# External providers (optional)
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
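A typed settings object is one way to read these variables in one place. This is a hypothetical sketch, not the project's config module: the `ollama` provider default is documented above, while the `llama3.2` model and Ollama URL defaults simply mirror the example values. The NVD request budget uses the limits quoted in this README (5 per 30 s without a key, 50 with one).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Typed view of the environment variables listed above (hypothetical)."""
    database_url: Optional[str]
    nvd_api_key: Optional[str]
    github_token: Optional[str]
    llm_provider: str
    llm_model: str
    ollama_base_url: str
    openai_api_key: Optional[str]
    anthropic_api_key: Optional[str]

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        return cls(
            database_url=env.get("DATABASE_URL"),
            # Optional keys: treat empty strings as "not set".
            nvd_api_key=env.get("NVD_API_KEY") or None,
            github_token=env.get("GITHUB_TOKEN") or None,
            llm_provider=env.get("LLM_PROVIDER", "ollama"),  # documented default
            llm_model=env.get("LLM_MODEL", "llama3.2"),  # assumed default
            ollama_base_url=env.get("OLLAMA_BASE_URL", "http://ollama:11434"),
            openai_api_key=env.get("OPENAI_API_KEY") or None,
            anthropic_api_key=env.get("ANTHROPIC_API_KEY") or None,
        )

    @property
    def nvd_requests_per_30s(self) -> int:
        # NVD limits quoted in this README: 5 without a key, 50 with one.
        return 50 if self.nvd_api_key else 5
```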
API Keys Setup
NVD API (Recommended)
- Get key: https://nvd.nist.gov/developers/request-an-api-key
- Add to .env: NVD_API_KEY=your_key
- Benefit: 10x rate limit increase
GitHub Token (Optional)
- Create: https://github.com/settings/tokens (public_repo scope)
- Add to .env: GITHUB_TOKEN=your_token
- Benefit: Enhanced exploit-based rules
LLM APIs (Optional)
- Local Ollama: No setup required (default)
- OpenAI: Get key from https://platform.openai.com/api-keys
- Anthropic: Get key from https://console.anthropic.com/
🧠 Rule Generation
AI-Enhanced Generation
- PoC Analysis: LLM analyzes actual exploit code
- Intelligent Detection: Creates sophisticated SIGMA rules
- Context Awareness: Maps CVE descriptions to detection patterns
- Validation: Automatic SIGMA syntax verification
- Fallback: Template-based generation if LLM unavailable
Quality Tiers
- Excellent (80+ pts): High-quality PoCs with recent updates
- Good (60-79 pts): Moderate quality indicators
- Fair (40-59 pts): Basic PoCs with some validation
- Poor (20-39 pts): Minimal quality indicators
- Very Poor (<20 pts): Low-quality PoCs
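The tier boundaries above translate directly into a lookup. How the points themselves are computed is not documented here, so this sketch only covers the mapping from a score to a tier; `quality_tier` is a hypothetical name.

```python
def quality_tier(score: float) -> str:
    """Map a PoC quality score (points) to the tier names above."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    if score >= 20:
        return "Poor"
    return "Very Poor"
```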
Rule Types
- 🤖 AI-Enhanced: LLM-generated with PoC analysis
- 🔍 Exploit-Based: Template + GitHub exploit indicators
- ⚡ Basic: CVE description only
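The fallback order and the three rule types can be sketched as a single function: try the LLM first, fall back to a template with exploit indicators when the LLM is unavailable, and use the CVE description alone when there are no indicators. This is an illustration under assumptions: `generate_rule`, `render_template` and the `llm_generate` callable are hypothetical, and the template output is a placeholder rather than a complete SIGMA rule.

```python
from typing import Callable, Optional


def render_template(cve_id: str, description: str, indicators: list[str]) -> str:
    """Hypothetical stand-in for the project's template-based generator."""
    lines = [f"title: {cve_id} Detection", f"description: {description}"]
    if indicators:
        # Exploit-derived strings become a simple keyword search.
        lines.append("detection:")
        lines.append("  keywords:")
        lines.extend(f"    - '{i}'" for i in indicators)
        lines.append("  condition: keywords")
    return "\n".join(lines)


def generate_rule(
    cve_id: str,
    description: str,
    llm_generate: Optional[Callable[[str], str]],
    exploit_indicators: list[str],
) -> tuple[str, str]:
    """Return (rule_type, rule_text), preferring the LLM and falling back to templates."""
    if llm_generate is not None:
        try:
            return "AI-Enhanced", llm_generate(description)
        except Exception:
            pass  # provider unreachable or failed: fall back to templates
    if exploit_indicators:
        return "Exploit-Based", render_template(cve_id, description, exploit_indicators)
    return "Basic", render_template(cve_id, description, [])
```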
Example Output
title: CVE-2025-1234 AI-Enhanced Detection
description: Detection for CVE-2025-1234 RCE [AI-Enhanced with PoC analysis]
tags:
  - attack.t1059.001
  - cve-2025-1234
  - ai.enhanced
detection:
  selection_process:
    Image|endswith: '\powershell.exe'
    CommandLine|contains:
      - '-EncodedCommand'
      - 'bypass'
  selection_network:
    DestinationPort: [443, 80]
  condition: selection_process and selection_network
level: high
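The "Validation" step can be approximated with a structural check on an already-parsed rule (loading the YAML itself would need a parser such as PyYAML). The Sigma specification lists `title`, `logsource` and `detection` as required and allows the levels informational, low, medium, high and critical; the sketch below also checks that every name in the condition refers to a search defined under `detection`. `validate_sigma` is a hypothetical helper, not the project's validator. Run on the example output above, it reports the missing `logsource`.

```python
import re
from fnmatch import fnmatchcase

VALID_LEVELS = {"informational", "low", "medium", "high", "critical"}
CONDITION_KEYWORDS = {"and", "or", "not", "of", "all", "them"}


def validate_sigma(rule: dict) -> list[str]:
    """Return structural problems found in a parsed rule (empty list = none found)."""
    problems = []
    for key in ("title", "logsource", "detection"):
        if key not in rule:
            problems.append(f"missing required field: {key}")
    detection = rule.get("detection") or {}
    if not isinstance(detection, dict):
        return problems + ["detection must be a mapping"]
    condition = detection.get("condition")
    if not condition:
        problems.append("detection has no condition")
    else:
        searches = [k for k in detection if k != "condition"]
        # Names may use wildcards such as "1 of selection_*".
        for name in re.findall(r"[A-Za-z_*][\w*]*", condition):
            if name in CONDITION_KEYWORDS:
                continue
            if not any(fnmatchcase(s, name) for s in searches):
                problems.append(f"condition references undefined search: {name}")
    level = rule.get("level")
    if level is not None and level not in VALID_LEVELS:
        problems.append(f"invalid level: {level}")
    return problems


example = {
    "title": "CVE-2025-1234 AI-Enhanced Detection",
    "tags": ["attack.t1059.001", "cve-2025-1234", "ai.enhanced"],
    "detection": {
        "selection_process": {
            "Image|endswith": "\\powershell.exe",
            "CommandLine|contains": ["-EncodedCommand", "bypass"],
        },
        "selection_network": {"DestinationPort": [443, 80]},
        "condition": "selection_process and selection_network",
    },
    "level": "high",
}
print(validate_sigma(example))  # ['missing required field: logsource']
```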
🛠️ Development
Local Development
# Start dependencies
docker-compose up -d db redis ollama
# Backend
cd backend && pip install -r requirements.txt
uvicorn main:app --reload
# Frontend
cd frontend && npm install && npm start
Testing LLM Integration
# Check Ollama
curl http://localhost:11434/api/tags
# Test LLM status
curl http://localhost:8000/api/llm-status
# Switch providers
curl -X POST http://localhost:8000/api/llm-switch \
-H "Content-Type: application/json" \
-d '{"provider": "ollama", "model": "llama3.2"}'
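Ollama's `GET /api/tags` returns the locally installed models as `{"models": [{"name": ...}, ...]}`, so a quick Python check can confirm the configured LLM_MODEL is present before switching providers. A minimal sketch: the helper names are hypothetical, and treating an untagged model name as its `:latest` tag is an assumption about how LLM_MODEL values are written.

```python
import json
from urllib.request import urlopen


def installed_models(tags_payload: dict) -> set[str]:
    """Model names from an Ollama GET /api/tags response."""
    return {m["name"] for m in tags_payload.get("models", []) if "name" in m}


def has_model(tags_payload: dict, model: str) -> bool:
    # Ollama lists tagged names such as "llama3.2:latest"; an untagged
    # name is treated here as the ":latest" tag (assumption).
    wanted = model if ":" in model else f"{model}:latest"
    return wanted in installed_models(tags_payload)


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Query a running Ollama instance for its installed models."""
    with urlopen(f"{base_url.rstrip('/')}/api/tags", timeout=10) as resp:
        return json.load(resp)


# Usage against a running container:
#   print(has_model(fetch_tags(), "llama3.2"))
```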
📊 Architecture
- Backend: FastAPI + SQLAlchemy ORM
- Frontend: React + Tailwind CSS
- Database: PostgreSQL with enhanced schema
- Cache: Redis (optional)
- LLM: Ollama container + multi-provider support
- Deployment: Docker Compose
Enhanced Database Schema
- CVEs: PoC metadata, bulk processing fields
- SIGMA Rules: Quality scoring, nomi-sec data
- Rule Templates: Pattern templates for generation
- Bulk Jobs: Job tracking and status
🔧 Troubleshooting
Common Issues
CVE Fetch Issues
- Verify the NVD API key in .env
- Check API connectivity with the "Test NVD API" button
- Review logs: docker-compose logs -f backend
No Rules Generated
- Ensure the LLM provider is accessible
- Check /api/llm-status for provider health
- Verify PoC data quality in CVE details
Performance Issues
- Start with recent years (2020+) for faster initial setup
- Use smaller batch sizes for bulk operations
- Monitor system resources during processing
Port Conflicts
- Default ports: 3000 (frontend), 8000 (backend), 5432 (db)
- Modify docker-compose.yml if ports are in use
Rate Limits
- NVD API: 5/30s (no key) → 50/30s (with key)
- nomi-sec API: 1/second (built-in limiting)
- GitHub API: 60/hour (no token) → 5000/hour (with token)
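Limits like these are rolling windows, so a client has to track recent request times rather than just pausing between calls. A minimal sliding-window sketch follows; `SlidingWindowLimiter` is hypothetical, not the backend's built-in limiter. The clock and sleep functions are injectable so the behavior can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.calls: deque = deque()  # timestamps of recent calls, oldest first

    def acquire(self) -> None:
        """Block until another request may be sent, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the rolling window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.limit:
                self.calls.append(now)
                return
            # Wait until the oldest recorded call ages out of the window.
            self.sleep(self.window - (now - self.calls[0]))


# Budgets from the list above:
nvd_no_key = SlidingWindowLimiter(limit=5, window=30.0)
nvd_with_key = SlidingWindowLimiter(limit=50, window=30.0)
nomi_sec = SlidingWindowLimiter(limit=1, window=1.0)
```

Calling `nvd_with_key.acquire()` before each NVD request keeps the client within 50 requests per rolling 30 seconds.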
🛡️ Security
- Store API keys in environment variables
- Validate generated rules before production deployment
- Generated rules are marked "experimental" and require analyst review
- Use strong database passwords in production
📈 Monitoring
# View logs
docker-compose logs -f backend
docker-compose logs -f frontend
# Check service health
docker-compose ps
# Monitor bulk jobs
curl http://localhost:8000/api/bulk-status
🗺️ Roadmap
- Custom rule template editor
- Advanced MITRE ATT&CK mapping
- SIEM platform export
- ML-based rule optimization
- Threat intelligence integration
📝 License
MIT License - see LICENSE file for details.
🤝 Contributing
- Fork repository
- Create feature branch
- Add tests and documentation
- Submit pull request
📞 Support
- Check troubleshooting section
- Review application logs
- Open GitHub issue for bugs/questions