This project is a proof of concept to see whether a program can create SIGMA rules from the information in newly published CVEs. It:
- Extracts CVE records from the National Vulnerability Database
- Extracts exploit data from GitHub repositories, ExploitDB, and the CISA Known Exploited Vulnerabilities catalog
- Extracts text data from the reference links found on both exploit records and CVE records
- Sends the exploit data and reference data to an LLM, which creates SIGMA rules based on that content

The generated data is not meant for production use and is considered experimental. Inspired by: https://blogs.night-wolf.io/sigmagen-ai-powered-attck-mapped-threat-detection-with-sigma-rules
bpmcdevitt a6fb367ed4 refactor: modularize backend architecture for improved maintainability
- Extract database models from monolithic main.py (2,373 lines) into organized modules
- Implement service layer pattern with dedicated business logic classes
- Split API endpoints into modular FastAPI routers by functionality
- Add centralized configuration management with environment variable handling
- Create proper separation of concerns across data, service, and presentation layers

**Architecture Changes:**
- models/: SQLAlchemy database models (CVE, SigmaRule, RuleTemplate, BulkProcessingJob)
- config/: Centralized settings and database configuration
- services/: Business logic (CVEService, SigmaRuleService, GitHubExploitAnalyzer)
- routers/: Modular API endpoints (cves, sigma_rules, bulk_operations, llm_operations)
- schemas/: Pydantic request/response models

**Key Improvements:**
- 95% reduction in main.py size (2,373 → 120 lines)
- Updated 15+ backend files with proper import structure
- Eliminated circular dependencies and tight coupling
- Enhanced testability with isolated service components
- Better code organization for team collaboration

**Backward Compatibility:**
- All API endpoints maintain same URLs and behavior
- Zero breaking changes to existing functionality
- Database schema unchanged
- Environment variables preserved

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-14 17:51:23 -05:00
backend refactor: modularize backend architecture for improved maintainability 2025-07-14 17:51:23 -05:00
exploit-db-mirror@99e10e9ba8 add kev support, exploitDB mirror support 2025-07-10 16:19:43 -05:00
frontend add job scheduler 2025-07-11 09:16:57 -05:00
github_poc_collector@5c171fb9a9 added git submodule for more exploits. added template dir for base yaml templates for sigma rules 2025-07-09 11:58:29 -05:00
.env.example add claude client + generic llm client using langchain 2025-07-09 18:02:45 -05:00
.gitignore fix build errors 2025-07-08 09:10:25 -05:00
.gitmodules add kev support, exploitDB mirror support 2025-07-10 16:19:43 -05:00
docker-compose.yml add ollama to docker-compose for local model testing 2025-07-10 21:32:15 -05:00
init.sql add reference data gathering 2025-07-10 17:30:12 -05:00
Makefile fix build errors 2025-07-08 09:10:25 -05:00
README.md add cve2capec client to map mitre attack data to cves 2025-07-14 15:48:10 -05:00
REFACTOR_NOTES.md refactor: modularize backend architecture for improved maintainability 2025-07-14 17:51:23 -05:00
start.sh more updates for bulk 2025-07-08 17:50:01 -05:00

CVE-SIGMA Auto Generator

Automated platform that generates SIGMA detection rules from CVE data using AI-enhanced exploit analysis.

Key Features

  • Bulk CVE Processing: Complete NVD datasets (2002-2025) with nomi-sec PoC integration
  • AI-Powered Rule Generation: Multi-provider LLM support (OpenAI, Anthropic, local Ollama)
  • Quality-Based PoC Analysis: 5-tier quality scoring system for exploit reliability
  • Real-time Monitoring: Live job tracking and progress dashboard
  • Advanced Indicators: Extracts process, file, and network indicators from actual exploit code

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose
  • (Optional) API keys for enhanced features

Installation

# Clone and start
git clone <repository-url>
cd auto_sigma_rule_generator
chmod +x start.sh
./start.sh

Access Points:

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8000

First Run

The application automatically:

  1. Initializes database with rule templates
  2. Fetches recent CVEs from NVD
  3. Generates SIGMA rules with AI enhancement
  4. Polls for new CVEs hourly

🎯 Usage

Web Interface

  • Dashboard: Statistics and system overview
  • CVEs: Complete CVE listing with PoC data
  • SIGMA Rules: Generated detection rules
  • Bulk Jobs: Processing status and controls

API Endpoints

Core Operations

# Fetch CVEs
curl -X POST http://localhost:8000/api/fetch-cves

# Bulk processing
curl -X POST http://localhost:8000/api/bulk-seed
curl -X POST http://localhost:8000/api/incremental-update

# LLM-enhanced rules
curl -X POST http://localhost:8000/api/llm-enhanced-rules

Data Access

  • GET /api/cves - List CVEs
  • GET /api/sigma-rules - List rules
  • GET /api/stats - Statistics
  • GET /api/llm-status - LLM provider status
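The calls above all share one base URL, so a pair of shell wrappers can save retyping it. A minimal sketch, assuming a POSIX-style shell with curl installed; `API_BASE`, `api_url`, `api_get` and `api_post` are hypothetical names, not part of the project:

```shell
# Base URL of the FastAPI backend. API_BASE is a hypothetical override that
# defaults to the backend port used throughout this README (8000).
API_BASE=${API_BASE:-http://localhost:8000}

# Build a full endpoint URL; a leading slash on the path is optional.
api_url() {
  local path=${1#/}
  printf '%s/api/%s\n' "$API_BASE" "$path"
}

# curl flags: -f fails on HTTP errors, -s is quiet, -S still prints errors.
api_get() {
  curl -fsS "$(api_url "$1")"
}

api_post() {
  curl -fsS -X POST "$(api_url "$1")"
}
```

With the backend running, `api_post fetch-cves` triggers a fetch and `api_get stats` returns the statistics.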

⚙️ Configuration

Environment Variables

Core Settings

DATABASE_URL=postgresql://user:pass@db:5432/dbname
NVD_API_KEY=your_nvd_key          # Optional: 5→50 req/30s
GITHUB_TOKEN=your_github_token     # Optional: Enhanced PoC analysis

LLM Configuration

LLM_PROVIDER=ollama               # Default: ollama (local)
LLM_MODEL=llama3.2               # Provider-specific model
OLLAMA_BASE_URL=http://ollama:11434

# External providers (optional)
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
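Before starting the stack, it can help to confirm that the selected provider has the setting it needs. A minimal sketch with a hypothetical `check_llm_config` function; it assumes `openai` and `anthropic` are the `LLM_PROVIDER` values for the external providers (only `ollama` appears above):

```shell
# Verify that the variable required by the selected LLM provider is set.
# The "openai" and "anthropic" provider identifiers are assumptions.
check_llm_config() {
  local provider=${LLM_PROVIDER:-ollama}   # README default: ollama
  local required value
  case "$provider" in
    ollama)    required=OLLAMA_BASE_URL ;;
    openai)    required=OPENAI_API_KEY ;;
    anthropic) required=ANTHROPIC_API_KEY ;;
    *)
      echo "unknown LLM_PROVIDER: $provider" >&2
      return 1
      ;;
  esac
  # Indirect lookup: read the value of the variable whose name is in $required.
  eval "value=\${$required:-}"
  if [ -z "$value" ]; then
    echo "$required must be set when LLM_PROVIDER=$provider" >&2
    return 1
  fi
}
```

If `.env` contains only simple `KEY=value` lines, `set -a; . ./.env; set +a; check_llm_config` loads and checks it in one step.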

API Keys Setup

NVD API (Recommended)

  1. Get key: https://nvd.nist.gov/developers/request-an-api-key
  2. Add to .env: NVD_API_KEY=your_key
  3. Benefit: 10x higher rate limit (5 → 50 requests per 30 seconds)

GitHub Token (Optional)

  1. Create: https://github.com/settings/tokens (public_repo scope)
  2. Add to .env: GITHUB_TOKEN=your_token
  3. Benefit: Enhanced exploit-based rules and a higher GitHub API rate limit (60 → 5,000 requests/hour)

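LLM APIs (Optional)

  • Add OPENAI_API_KEY or ANTHROPIC_API_KEY to .env and set LLM_PROVIDER accordingly to use an external provider instead of the local Ollama default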

🧠 Rule Generation

AI-Enhanced Generation

  1. PoC Analysis: LLM analyzes actual exploit code
  2. Intelligent Detection: Creates sophisticated SIGMA rules
  3. Context Awareness: Maps CVE descriptions to detection patterns
  4. Validation: Automatic SIGMA syntax verification
  5. Fallback: Template-based generation if LLM unavailable

Quality Tiers

  • Excellent (80+ pts): High-quality PoCs with recent updates
  • Good (60-79 pts): Moderate quality indicators
  • Fair (40-59 pts): Basic PoCs with some validation
  • Poor (20-39 pts): Minimal quality indicators
  • Very Poor (<20 pts): Low-quality PoCs
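The tier boundaries above reduce to a simple threshold lookup. A minimal sketch, assuming integer scores; `quality_tier` is a hypothetical name, not the backend's scoring code:

```shell
# Map an integer PoC quality score to the five tiers listed above.
quality_tier() {
  local score=$1
  if [ "$score" -ge 80 ]; then
    echo "Excellent"
  elif [ "$score" -ge 60 ]; then
    echo "Good"
  elif [ "$score" -ge 40 ]; then
    echo "Fair"
  elif [ "$score" -ge 20 ]; then
    echo "Poor"
  else
    echo "Very Poor"
  fi
}
```

For example, `quality_tier 72` prints `Good`.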

Rule Types

  • 🤖 AI-Enhanced: LLM-generated with PoC analysis
  • 🔍 Exploit-Based: Template + GitHub exploit indicators
  • Basic: CVE description only
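One plausible reading of the fallback order described above, written as a decision helper. This is a hypothetical sketch (`select_rule_type` and its yes/no arguments are not the backend's API); it assumes AI-Enhanced generation only needs a reachable LLM, while Exploit-Based generation needs extracted PoC indicators:

```shell
# Pick a rule type from two yes/no facts about the CVE being processed.
# $1: is an LLM provider reachable? (for example, per /api/llm-status)
# $2: were exploit indicators extracted from a PoC?
select_rule_type() {
  local llm_available=$1 has_indicators=$2
  if [ "$llm_available" = yes ]; then
    echo "AI-Enhanced"
  elif [ "$has_indicators" = yes ]; then
    echo "Exploit-Based"   # template + GitHub exploit indicators
  else
    echo "Basic"           # CVE description only
  fi
}
```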

Example Output

title: CVE-2025-1234 AI-Enhanced Detection
status: experimental
description: Detection for CVE-2025-1234 RCE [AI-Enhanced with PoC analysis]
tags:
    - attack.t1059.001
    - cve.2025-1234
    - ai.enhanced
logsource:
    product: windows
    category: process_creation
detection:
    selection_process:
        Image|endswith: '\powershell.exe'
        CommandLine|contains:
            - '-EncodedCommand'
            - 'bypass'
    condition: selection_process
level: high
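Before deploying a generated rule, a quick grep can catch files missing the top-level keys the SIGMA specification requires (`title`, `logsource`, `detection`, plus a `condition` inside `detection`). This is a rough heuristic sketch, not a YAML parser; a full validator such as the pySigma-based sigma-cli is the better check. `check_sigma_fields` is a hypothetical name:

```shell
# Rough pre-check for the keys every SIGMA rule needs.
# Greps for top-level "title:", "logsource:", "detection:" and an indented
# "condition:"; it does not parse YAML or validate field values.
check_sigma_fields() {
  local rule_file=$1 missing="" key
  for key in title logsource detection; do
    grep -q "^${key}:" "$rule_file" || missing="$missing $key"
  done
  grep -q '^[[:space:]]\{1,\}condition:' "$rule_file" || missing="$missing condition"
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
}
```

Usage: `check_sigma_fields rule.yml && echo ok`.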

🛠️ Development

Local Development

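# Start dependencies
docker-compose up -d db redis ollama

# Backend (from the repository root). When running outside Docker, point
# DATABASE_URL and OLLAMA_BASE_URL at localhost instead of db/ollama.
cd backend && pip install -r requirements.txt
uvicorn main:app --reload

# Frontend (in a second terminal, from the repository root)
cd frontend && npm install && npm start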

Testing LLM Integration

# Check Ollama
curl http://localhost:11434/api/tags

# Test LLM status
curl http://localhost:8000/api/llm-status

# Switch providers
curl -X POST http://localhost:8000/api/llm-switch \
  -H "Content-Type: application/json" \
  -d '{"provider": "ollama", "model": "llama3.2"}'
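Hand-quoting JSON inside the `-d` argument is easy to get wrong. A minimal sketch of a payload builder for `/api/llm-switch`; the `llm_switch_payload` helper is hypothetical, and the `openai`/`anthropic` identifiers are assumptions (only `ollama` appears above). It does not escape quotes, so it only suits plain model names:

```shell
# Print the JSON body for POST /api/llm-switch, rejecting unknown providers.
llm_switch_payload() {
  local provider=$1 model=$2
  case "$provider" in
    ollama|openai|anthropic) ;;
    *)
      echo "unsupported provider: $provider" >&2
      return 1
      ;;
  esac
  printf '{"provider": "%s", "model": "%s"}\n' "$provider" "$model"
}

# Usage (backend running on localhost:8000):
# curl -X POST http://localhost:8000/api/llm-switch \
#   -H "Content-Type: application/json" \
#   -d "$(llm_switch_payload ollama llama3.2)"
```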

📊 Architecture

  • Backend: FastAPI + SQLAlchemy ORM
  • Frontend: React + Tailwind CSS
  • Database: PostgreSQL with enhanced schema
  • Cache: Redis (optional)
  • LLM: Ollama container + multi-provider support
  • Deployment: Docker Compose

Enhanced Database Schema

  • CVEs: PoC metadata, bulk processing fields
  • SIGMA Rules: Quality scoring, nomi-sec data
  • Rule Templates: Pattern templates for generation
  • Bulk Jobs: Job tracking and status

🔧 Troubleshooting

Common Issues

CVE Fetch Issues

  • Verify NVD API key in .env
  • Check API connectivity: Use "Test NVD API" button
  • Review logs: docker-compose logs -f backend

No Rules Generated

  • Ensure LLM provider is accessible
  • Check /api/llm-status for provider health
  • Verify PoC data quality in CVE details

Performance Issues

  • Start with recent years (2020+) for faster initial setup
  • Use smaller batch sizes for bulk operations
  • Monitor system resources during processing

Port Conflicts

  • Default ports: 3000 (frontend), 8000 (backend), 5432 (db)
  • Modify docker-compose.yml if ports are in use

Rate Limits

  • NVD API: 5/30s (no key) → 50/30s (with key)
  • nomi-sec API: 1/second (built-in limiting)
  • GitHub API: 60/hour (no token) → 5000/hour (with token)
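Each limit above translates into a minimum gap between consecutive requests. A small sketch of that arithmetic; `min_delay_ms` is a hypothetical helper, not part of the project, and it rounds up so integer division never undershoots the gap:

```shell
# Minimum spacing in milliseconds between requests for a limit of
# "$1 requests per $2 seconds", rounded up to a whole millisecond.
min_delay_ms() {
  local requests=$1 window_s=$2
  echo $(( (window_s * 1000 + requests - 1) / requests ))
}

# Limits from the list above.
echo "NVD, no key:      $(min_delay_ms 5 30) ms"
echo "NVD, with key:    $(min_delay_ms 50 30) ms"
echo "nomi-sec:         $(min_delay_ms 1 1) ms"
echo "GitHub, no token: $(min_delay_ms 60 3600) ms"
echo "GitHub, token:    $(min_delay_ms 5000 3600) ms"
```

With an NVD key, for instance, spacing requests about 600 ms apart keeps a client near 50 requests per 30 seconds; adding a little margin is prudent, since the server may count its window differently.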

🛡️ Security

  • Store API keys in environment variables
  • Validate generated rules before production deployment
  • Rules marked as "experimental" - require analyst review
  • Use strong database passwords in production

📈 Monitoring

# View logs
docker-compose logs -f backend
docker-compose logs -f frontend

# Check service health
docker-compose ps

# Monitor bulk jobs
curl http://localhost:8000/api/bulk-status
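For long bulk runs, a small retry loop can wait on a status check instead of re-running curl by hand. A minimal sketch; `poll_until` is a hypothetical helper, and because the shape of the `/api/bulk-status` response is not documented here, the `grep` pattern in the usage line is a placeholder to adapt:

```shell
# Evaluate a shell command every $2 seconds, at most $3 times.
# Returns 0 as soon as the command succeeds, 1 if it never does.
poll_until() {
  local cmd=$1 interval=$2 attempts=$3 i=1
  while [ "$i" -le "$attempts" ]; do
    if eval "$cmd"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (placeholder pattern; adjust it to the real bulk-status fields):
# poll_until 'curl -fsS http://localhost:8000/api/bulk-status | grep -q completed' 30 20
```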

🗺️ Roadmap

  • Custom rule template editor
  • Advanced MITRE ATT&CK mapping
  • SIEM platform export
  • ML-based rule optimization
  • Threat intelligence integration

📝 License

MIT License - see LICENSE file for details.

🤝 Contributing

  1. Fork repository
  2. Create feature branch
  3. Add tests and documentation
  4. Submit pull request

📞 Support

  • Check troubleshooting section
  • Review application logs
  • Open GitHub issue for bugs/questions