This project is a proof of concept to see whether a program can create SIGMA rules from the information in newly published CVEs.

- Extracts CVE records from the National Vulnerability Database
- Extracts exploit data from GitHub repositories, ExploitDB, and the CISA Known Exploited Vulnerabilities catalog
- Extracts text data from reference links found on both exploit records and CVE records
- Sends exploit data and reference data to an LLM to create SIGMA rules based on the content

This data is not meant for production use and is considered experimental.

Inspired by: https://blogs.night-wolf.io/sigmagen-ai-powered-attck-mapped-threat-detection-with-sigma-rules
bpmcdevitt e579c91b5e MAJOR: Transform web application to professional CLI-based SIGMA rule generator
🎉 **Architecture Transformation (v2.0)**
- Complete migration from web app to professional CLI tool
- File-based SIGMA rule management system
- Git-friendly directory structure organized by year/CVE-ID
- Multiple rule variants per CVE (template, LLM, hybrid)

**New CLI System**
- Professional command-line interface with Click framework
- 6 command groups: process, generate, search, stats, export, migrate
- Modular command architecture for maintainability
- Comprehensive help system and configuration management

📁 **File-Based Storage Architecture**
- Individual CVE directories: cves/YEAR/CVE-ID/
- Multiple SIGMA rule variants per CVE
- JSON metadata with processing history and PoC data
- Native YAML files perfect for version control

🚀 **Core CLI Commands**
- process: CVE processing and bulk operations
- generate: SIGMA rule generation with multiple methods
- search: Advanced CVE and rule searching with filters
- stats: Comprehensive statistics and analytics
- export: Multiple output formats for different workflows
- migrate: Database-to-file migration tools

🔧 **Migration Support**
- Complete migration utilities from web database
- Data validation and integrity checking
- Backward compatibility with existing processors
- Legacy web interface maintained for transition

📊 **Enhanced Features**
- Advanced search with complex filtering (severity, PoC presence, etc.)
- Multi-format exports (YAML, JSON, CSV)
- Comprehensive statistics and coverage reports
- File-based rule versioning and management

🎯 **Production Benefits**
- No database dependency - runs anywhere
- Perfect for cybersecurity teams using git workflows
- Direct integration with SIGMA ecosystems
- Portable architecture for CI/CD pipelines
- Multiple rule variants for different detection scenarios

📝 **Documentation Updates**
- Complete README rewrite for CLI-first approach
- Updated CLAUDE.md with new architecture details
- Detailed CLI documentation with examples
- Migration guides and troubleshooting

**Perfect for security teams wanting production-ready SIGMA rules with version control! 🛡️**

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-21 13:11:03 -05:00
Name | Last commit | Date
backend | Migrate task tracking from BulkProcessingJob to Celery-based monitoring | 2025-07-21 09:23:26 -05:00
cli | MAJOR: Transform web application to professional CLI-based SIGMA rule generator | 2025-07-21 13:11:03 -05:00
exploit-db-mirror@99e10e9ba8 | add kev support, exploitDB mirror support | 2025-07-10 16:19:43 -05:00
frontend | Migrate task tracking from BulkProcessingJob to Celery-based monitoring | 2025-07-21 09:23:26 -05:00
github_poc_collector@5c171fb9a9 | added git submodule for more exploits. added template dir for base yaml templates for sigma rules | 2025-07-09 11:58:29 -05:00
.env.example | add claude client + generic llm client using langchain | 2025-07-09 18:02:45 -05:00
.gitignore | fix build errors | 2025-07-08 09:10:25 -05:00
.gitmodules | add kev support, exploitDB mirror support | 2025-07-10 16:19:43 -05:00
CLAUDE.md | MAJOR: Transform web application to professional CLI-based SIGMA rule generator | 2025-07-21 13:11:03 -05:00
docker-compose.yml | Migrate task tracking from BulkProcessingJob to Celery-based monitoring | 2025-07-21 09:23:26 -05:00
init.sql | add reference data gathering | 2025-07-10 17:30:12 -05:00
Makefile | fix build errors | 2025-07-08 09:10:25 -05:00
README.md | MAJOR: Transform web application to professional CLI-based SIGMA rule generator | 2025-07-21 13:11:03 -05:00
start.sh | more updates for bulk | 2025-07-08 17:50:01 -05:00

CVE-SIGMA Auto Generator - CLI Edition

Professional file-based SIGMA rule generation system for cybersecurity workflows

Automated CLI tool that generates SIGMA detection rules from CVE data using AI-enhanced exploit analysis. Now optimized for git workflows and production SIGMA rule management with a file-based architecture.

🌟 Major Architecture Update

🎉 New in v2.0: Transformed from web application to professional CLI tool with file-based SIGMA rule management!

  • Git-Friendly: Native YAML files perfect for version control
  • Industry Standard: Direct integration with SIGMA ecosystems
  • Portable: No database dependency, works anywhere
  • Scalable: Process specific years/CVEs as needed
  • Multiple Variants: Different generation methods per CVE

Key Features

  • Bulk CVE Processing: Complete NVD datasets (2002-2025) with nomi-sec PoC integration
  • AI-Powered Rule Generation: Multi-provider LLM support (OpenAI, Anthropic, local Ollama)
  • File-Based Storage: Organized directory structure for each CVE and rule variant
  • Quality-Based PoC Analysis: 5-tier quality scoring system for exploit reliability
  • Advanced Search & Filtering: Find CVEs and rules with complex criteria
  • Comprehensive Statistics: Coverage reports and generation analytics
  • Export Tools: Multiple output formats for different workflows

🚀 Quick Start

Prerequisites

  • Python 3.8+ with pip
  • (Optional) Docker for legacy web interface
  • (Optional) API keys for enhanced features

Installation

# Clone repository
git clone <repository-url>
cd auto_sigma_rule_generator

# Install CLI dependencies
pip install -r cli/requirements.txt

# Make CLI executable
chmod +x cli/sigma_cli.py

# Initialize configuration
./cli/sigma_cli.py config-init

First Run - Migration from Web App (If Applicable)

# If migrating from previous web version
./cli/sigma_cli.py migrate from-database --database-url "postgresql://user:pass@localhost:5432/db"

# Validate migration
./cli/sigma_cli.py migrate validate

# Or start fresh with new CVE processing
./cli/sigma_cli.py process year 2024

🎯 CLI Usage

Core Commands

# Process CVEs and generate rules
./cli/sigma_cli.py process year 2024                    # Process specific year
./cli/sigma_cli.py process cve CVE-2024-0001            # Process specific CVE
./cli/sigma_cli.py process bulk --start-year 2020       # Bulk process multiple years
./cli/sigma_cli.py process incremental --days 7         # Process recent changes

# Generate rules for existing CVEs
./cli/sigma_cli.py generate cve CVE-2024-0001 --method all        # All generation methods
./cli/sigma_cli.py generate regenerate --year 2024 --method llm   # Regenerate with LLM

# Search CVEs and rules
./cli/sigma_cli.py search cve "buffer overflow" --severity critical --has-poc
./cli/sigma_cli.py search rules "powershell" --method llm

# View statistics and reports
./cli/sigma_cli.py stats overview --year 2024 --output ./reports/2024-stats.json
./cli/sigma_cli.py stats poc --year 2024               # PoC coverage statistics
./cli/sigma_cli.py stats rules --method template       # Rule generation statistics

# Export data
./cli/sigma_cli.py export sigma ./output-rules --format yaml --year 2024
./cli/sigma_cli.py export metadata ./reports/cve-data.csv --format csv

Available Generation Methods

  • template - Template-based rule generation
  • llm - AI/LLM-enhanced generation (OpenAI, Anthropic, Ollama)
  • hybrid - Combined template + LLM approach
  • all - Generate all variants

📁 File Structure

The CLI organizes everything in a clean, git-friendly structure:

auto_sigma_rule_generator/
├── cves/                           # CVE data organized by year
│   ├── 2024/
│   │   ├── CVE-2024-0001/
│   │   │   ├── metadata.json           # CVE info & generation metadata
│   │   │   ├── rule_template.sigma     # Template-based rule
│   │   │   ├── rule_llm_openai.sigma   # OpenAI-generated rule
│   │   │   ├── rule_llm_anthropic.sigma # Anthropic-generated rule
│   │   │   ├── rule_hybrid.sigma       # Hybrid-generated rule
│   │   │   └── poc_analysis.json       # PoC analysis data
│   │   └── CVE-2024-0002/...
│   └── 2023/...
├── cli/                            # CLI tool and commands
│   ├── sigma_cli.py               # Main CLI executable
│   ├── commands/                  # Command modules
│   └── README.md                  # Detailed CLI documentation
└── reports/                       # Generated reports and exports
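
Because every CVE lives in its own directory, the rule variants can be enumerated with ordinary filesystem calls. The following is a minimal sketch, assuming only the cves/YEAR/CVE-ID/rule_*.sigma layout shown above; the function name list_rule_variants is hypothetical and is not part of the CLI.

```python
from pathlib import Path
from typing import Dict, List


def list_rule_variants(cves_root: Path) -> Dict[str, List[str]]:
    """Map each CVE ID to the sorted rule file names found in its directory."""
    variants: Dict[str, List[str]] = {}
    # Assumed layout (from the tree above): <cves_root>/<YEAR>/<CVE-ID>/rule_*.sigma
    for cve_dir in sorted(cves_root.glob("*/CVE-*")):
        if cve_dir.is_dir():
            variants[cve_dir.name] = sorted(
                rule.name for rule in cve_dir.glob("rule_*.sigma")
            )
    return variants
```

Calling list_rule_variants(Path("cves")) from the project root would return a dictionary keyed by CVE ID, which is enough to spot CVEs that have a template rule but no LLM variant yet.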

File Formats

metadata.json - CVE information and processing history

{
  "cve_info": {
    "cve_id": "CVE-2024-0001",
    "description": "Remote code execution vulnerability...",
    "cvss_score": 9.8,
    "severity": "critical",
    "published_date": "2024-01-01T00:00:00Z"
  },
  "poc_data": {
    "poc_count": 3,
    "poc_data": {"nomi_sec": [...], "github": [...]}
  },
  "rule_generation": {
    "template": {"generated_at": "2024-01-01T12:00:00Z"},
    "llm_openai": {"generated_at": "2024-01-01T12:30:00Z"}
  }
}
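
A metadata.json file of this shape can be read with the standard library alone. The sketch below assumes the key names shown in the example above (cve_info, poc_data, rule_generation) and tolerates missing sections; summarize_metadata is a hypothetical helper, not a function the CLI is known to expose.

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarize_metadata(path: Path) -> Dict[str, Any]:
    """Return a flat summary of one CVE's metadata.json file."""
    data = json.loads(path.read_text(encoding="utf-8"))
    cve_info = data.get("cve_info", {})
    return {
        "cve_id": cve_info.get("cve_id"),
        "severity": cve_info.get("severity"),
        "cvss_score": cve_info.get("cvss_score"),
        # poc_count sits one level below the top-level "poc_data" key
        "poc_count": data.get("poc_data", {}).get("poc_count", 0),
        # Keys of "rule_generation" name the methods that have produced a rule
        "methods": sorted(data.get("rule_generation", {})),
    }
```

For the example file above, the summary would list the methods template and llm_openai.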

SIGMA Rule Files - Ready-to-use detection rules

# rule_llm_openai.sigma
title: CVE-2024-0001 Remote Code Execution Detection
id: 12345678-1234-5678-9abc-123456789012
status: experimental
description: Detects exploitation attempts for CVE-2024-0001
author: CVE-SIGMA Auto Generator (OpenAI Enhanced)
date: 2024/01/01
references:
    - https://nvd.nist.gov/vuln/detail/CVE-2024-0001
tags:
    - attack.t1059.001
    - cve.2024.0001
    - ai.enhanced
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        Image|endswith: '\powershell.exe'
        CommandLine|contains:
            - '-EncodedCommand'
            - 'bypass'
    condition: selection
falsepositives:
    - Legitimate administrative scripts
level: high
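
To make the detection block concrete: in SIGMA, the fields inside one selection are combined with AND, the values in a list are combined with OR, and string matching is case-insensitive by default. The sketch below hand-translates the example rule's selection into Python under those semantics. It is an illustration only, not how a SIGMA backend converts rules, and matches_selection is a hypothetical name.

```python
from typing import Mapping


def matches_selection(event: Mapping[str, str]) -> bool:
    """Evaluate the example rule's `selection` against one process-creation event."""
    image = event.get("Image", "").lower()
    command_line = event.get("CommandLine", "").lower()
    # Image|endswith: '\powershell.exe'
    image_ok = image.endswith("\\powershell.exe")
    # CommandLine|contains with a list: any one of the values is enough (OR)
    command_ok = any(
        needle in command_line for needle in ("-encodedcommand", "bypass")
    )
    # Both fields belong to the same selection, so both must match (AND)
    return image_ok and command_ok
```

An event whose Image ends in \powershell.exe and whose CommandLine contains -EncodedCommand matches; the same command line launched from cmd.exe does not.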

⚙️ Configuration

CLI Configuration (~/.sigma-cli/config.yaml)

# API Keys for enhanced functionality
api_keys:
  nvd_api_key: "your_nvd_key"           # Optional: 5→50 req/30s rate limit
  github_token: "your_github_token"     # Optional: Enhanced PoC analysis
  openai_api_key: "your_openai_key"     # Optional: AI rule generation
  anthropic_api_key: "your_anthropic_key" # Optional: AI rule generation

# LLM Settings  
llm_settings:
  default_provider: "ollama"           # Default: ollama (local)
  default_model: "llama3.2"           # Provider-specific model
  ollama_base_url: "http://localhost:11434"

# Processing Settings
processing:
  default_batch_size: 50               # CVEs per batch
  default_methods: ["template"]       # Default generation methods
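
The nvd_api_key comment above refers to a limit of 5 requests per 30 seconds without a key and 50 with one. A client can respect such a limit with a sliding-window counter. This is a minimal sketch under that assumption; SlidingWindowLimiter and limiter_for are hypothetical names, and the CLI may throttle requests differently.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Drop timestamps that have left the window
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self._sent.append(self._clock())


def limiter_for(nvd_api_key: Optional[str]) -> SlidingWindowLimiter:
    # 50 requests per 30 s with a key, 5 without (values from the config comment)
    return SlidingWindowLimiter(50 if nvd_api_key else 5)
```

Before each request, a caller would sleep for wait_time() seconds and then call record(). The injectable clock exists only to make the class easy to test.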

API Keys Setup

NVD API Key (Recommended)

GitHub Token (Optional)

LLM APIs (Optional)

🧠 AI-Enhanced Rule Generation

How It Works

  1. CVE Analysis: Extract vulnerability details from NVD data
  2. PoC Collection: Gather exploit code from nomi-sec, GitHub, ExploitDB
  3. Quality Assessment: Score PoCs based on stars, recency, completeness
  4. AI Enhancement: LLM analyzes actual exploit code to create detection logic
  5. SIGMA Generation: Produce valid, tested SIGMA rules with proper syntax
  6. Multi-Variant Output: Generate template, LLM, and hybrid versions

Quality Tiers

  • Excellent (80+ pts): High-star PoCs with recent updates, detailed analysis
  • Good (60-79 pts): Moderate quality with some validation
  • Fair (40-59 pts): Basic PoCs with minimal indicators
  • Poor (20-39 pts): Low-quality or outdated PoCs
  • Very Poor (<20 pts): Minimal or unreliable PoCs
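
The tier boundaries above translate directly into a lookup. The sketch below encodes only the point ranges listed; how the score itself is computed from stars, recency, and completeness is not specified here, so that part is deliberately left out. The function name quality_tier is hypothetical.

```python
def quality_tier(score: float) -> str:
    """Map a PoC quality score to the tier names listed above."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    if score >= 20:
        return "Poor"
    return "Very Poor"
```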

Rule Variants Generated

  • 🤖 AI-Enhanced (rule_llm_*.sigma): LLM analysis of actual exploit code
  • 🔧 Template-Based (rule_template.sigma): Pattern-based generation
  • Hybrid (rule_hybrid.sigma): Best of both approaches
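
The file names of the three variants follow a simple pattern, which the sketch below reproduces. It assumes the names shown in the file structure (rule_template.sigma, rule_llm_<provider>.sigma, rule_hybrid.sigma); rule_filename is a hypothetical helper and not necessarily how the CLI derives its paths.

```python
from typing import Optional


def rule_filename(method: str, provider: Optional[str] = None) -> str:
    """Return the rule file name for a generation method."""
    if method == "template":
        return "rule_template.sigma"
    if method == "hybrid":
        return "rule_hybrid.sigma"
    if method == "llm":
        if not provider:
            raise ValueError("the llm method needs a provider, e.g. 'openai'")
        # One file per provider, e.g. rule_llm_openai.sigma
        return f"rule_llm_{provider}.sigma"
    raise ValueError(f"unknown generation method: {method!r}")
```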

📊 Advanced Features

Search & Analytics

# Complex CVE searches
./cli/sigma_cli.py search cve "remote code execution" \
  --year 2024 --severity critical --has-poc --has-rules --limit 50

# Rule analysis  
./cli/sigma_cli.py search rules "powershell" \
  --rule-type process --method llm --limit 20

# Comprehensive statistics
./cli/sigma_cli.py stats overview                    # Overall system stats
./cli/sigma_cli.py stats poc --year 2024            # PoC coverage analysis
./cli/sigma_cli.py stats rules --method llm         # AI generation statistics

Export & Integration

# Export for SIEM integration
./cli/sigma_cli.py export sigma ./siem-rules \
  --format yaml --year 2024 --method llm

# Metadata for analysis
./cli/sigma_cli.py export metadata ./analysis/cve-data.csv \
  --format csv --year 2024

# Consolidated ruleset  
./cli/sigma_cli.py export ruleset ./complete-rules.json \
  --year 2024 --include-metadata

🛠️ Development & Legacy Support

CLI Development

The new CLI system is built with:

  • Click: Professional CLI framework
  • Modular Commands: Separate modules for each command group
  • Async Processing: Efficient handling of bulk operations
  • File-Based Storage: Git-friendly YAML and JSON formats

Legacy Web Interface (Optional)

The original web interface is still available for migration purposes:

# Start legacy web interface (if needed for migration)
docker-compose up -d db redis backend frontend

# Access points:
# - Frontend: http://localhost:3000  
# - API: http://localhost:8000
# - Flower (Celery): http://localhost:5555

Migration Path

  1. Export Data: Use CLI migration tools to export from database
  2. Validate: Verify all data transferred correctly
  3. Switch: Use CLI for all new operations
  4. Cleanup: Optionally remove web components

🔧 Troubleshooting

Common Issues

CLI Import Errors

  • Ensure you're running from project root directory
  • Install dependencies: pip install -r cli/requirements.txt
  • Check Python version (3.8+ required)

CVE Processing Failures

  • Verify NVD API key in configuration
  • Check network connectivity and rate limits
  • Use --verbose flag for detailed logging

No Rules Generated

  • Ensure LLM provider is accessible (test with ./cli/sigma_cli.py stats overview)
  • Check PoC data availability with --has-poc filter
  • Verify API keys for external LLM providers

File Permission Issues

  • Ensure write permissions to cves/ directory
  • Check CLI executable permissions: chmod +x cli/sigma_cli.py

Performance Optimization

  • Use --batch-size parameter for large datasets
  • Process recent years first (2020+) for faster initial results
  • Use incremental processing for regular updates
  • Monitor system resources during bulk operations
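
Batching keeps memory use and API bursts bounded during bulk runs. As a rough illustration of what a batch size means, the sketch below splits a list of CVE IDs into chunks of 50, the default_batch_size from the configuration example. The helper name batched is hypothetical, and the CLI's own batching logic may differ.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

With 120 CVE IDs and the default size, this yields batches of 50, 50, and 20 items.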

🛡️ Security Best Practices

  • Store API keys in configuration file (~/.sigma-cli/config.yaml)
  • Validate generated rules before production deployment
  • Rules marked as "experimental" require analyst review
  • Use version control to track rule changes and improvements
  • Regularly update PoC data sources for current threat landscape

📈 Monitoring & Maintenance

# System health checks
./cli/sigma_cli.py stats overview                   # Overall system status
./cli/sigma_cli.py migrate validate                 # Data integrity check

# Regular maintenance
./cli/sigma_cli.py process incremental --days 7     # Weekly updates  
./cli/sigma_cli.py generate regenerate --filter-quality excellent  # Refresh high-quality rules

# Performance monitoring
./cli/sigma_cli.py stats rules --year 2024         # Generation statistics
./cli/sigma_cli.py stats poc --year 2024           # Coverage analysis

🗺️ Roadmap

CLI Enhancements

  • Rule quality scoring and validation
  • Custom template editor
  • Integration with popular SIEM platforms
  • Advanced MITRE ATT&CK mapping
  • Threat intelligence feed integration

Export Features

  • Splunk app export format
  • Elastic Stack integration
  • QRadar rule format
  • YARA rule generation
  • IOC extraction

📝 License

MIT License - see LICENSE file for details.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Test with both CLI and legacy systems
  4. Add tests and documentation
  5. Submit a pull request

📞 Support

CLI Issues

  • Check cli/README.md for detailed CLI documentation
  • Use --verbose flag for debugging
  • Ensure proper configuration in ~/.sigma-cli/config.yaml

General Support

  • Review troubleshooting section above
  • Check application logs with --verbose
  • Open GitHub issue with specific error details

🎉 What's New in v2.0

  • Complete CLI System - Professional command-line interface
  • File-Based Storage - Git-friendly YAML and JSON files
  • Multiple Rule Variants - Template, AI, and hybrid generation
  • Advanced Search - Complex filtering and analytics
  • Export Tools - Multiple output formats for different workflows
  • Migration Tools - Seamless transition from web application
  • Portable Architecture - No database dependency, runs anywhere

Perfect for cybersecurity teams who want production-ready SIGMA rules with version control integration! 🚀