This project is a proof of concept to see whether a program can create SIGMA rules from the information in newly published CVEs. It:

  • Extracts CVE records from the National Vulnerability Database
  • Extracts exploit data from GitHub repositories, ExploitDB, and the CISA Known Exploited Vulnerabilities catalog
  • Extracts text data from reference links found on both exploit records and CVE records
  • Sends exploit data and reference data to an LLM to create SIGMA rules based on the content

This data is not meant for production use and is considered experimental.

Inspired by: https://blogs.night-wolf.io/sigmagen-ai-powered-attck-mapped-threat-detection-with-sigma-rules
bpmcdevitt eca51167af FEATURE: Add Docker Compose support for CLI application with comprehensive usage documentation
This commit adds complete Docker Compose support to the CLI application, making it easy to run
the SIGMA rule generator in a containerized environment:

DOCKER INFRASTRUCTURE:
- docker-compose.yml: Complete service orchestration (CLI app, PostgreSQL, Redis, optional Ollama)
- Dockerfile: Optimized CLI application container with all dependencies
- init.sql: Database initialization for PostgreSQL
- .env.example: Updated environment configuration for both Docker and native setups
- Makefile: Convenient commands for Docker operations (setup, up, down, shell, cli execution)

DOCUMENTATION UPDATES:
- README.md: Comprehensive Docker vs Native comparison with detailed usage examples
- CLAUDE.md: Updated project guidance with Docker Compose as recommended approach
- Added step-by-step setup instructions for both deployment methods
- Included command examples for both Docker Compose and native execution

DOCKER SERVICES:
- sigma-cli: Main CLI application container with volume mounts for data persistence
- db: PostgreSQL database for legacy migrations and data processing
- redis: Redis cache for performance optimization
- ollama: Optional local LLM service (profile-based)

DATA PERSISTENCE:
- Host-mounted directories: ./cves/, ./reports/, ./logs/, ./backend/templates/
- Named volumes: postgres_data, redis_data, ollama_data
- Complete data preservation between container restarts

This provides users with multiple deployment options:
1. Quick Docker Compose setup (recommended for testing/evaluation)
2. Native installation (recommended for production/development)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-21 13:52:28 -05:00
backend FEATURE: Add Docker Compose support for CLI application with comprehensive usage documentation 2025-07-21 13:52:28 -05:00
cli CLEANUP: Remove legacy web application components and streamline for CLI-first architecture 2025-07-21 13:24:38 -05:00
exploit-db-mirror@99e10e9ba8 add kev support, exploitDB mirror support 2025-07-10 16:19:43 -05:00
github_poc_collector@5c171fb9a9 added git submodule for more exploits. added template dir for base yaml templates for sigma rules 2025-07-09 11:58:29 -05:00
models/sigma_llama_finetuned FEATURE: Add Docker Compose support for CLI application with comprehensive usage documentation 2025-07-21 13:52:28 -05:00
.env.example FEATURE: Add Docker Compose support for CLI application with comprehensive usage documentation 2025-07-21 13:52:28 -05:00
.gitignore FEATURE: Add Docker Compose support for CLI application with comprehensive usage documentation 2025-07-21 13:52:28 -05:00
.gitmodules add kev support, exploitDB mirror support 2025-07-10 16:19:43 -05:00
CLAUDE.md FEATURE: Add Docker Compose support for CLI application with comprehensive usage documentation 2025-07-21 13:52:28 -05:00
docker-compose.yml FEATURE: Add Docker Compose support for CLI application with comprehensive usage documentation 2025-07-21 13:52:28 -05:00
Dockerfile FEATURE: Add Docker Compose support for CLI application with comprehensive usage documentation 2025-07-21 13:52:28 -05:00
init.sql FEATURE: Add Docker Compose support for CLI application with comprehensive usage documentation 2025-07-21 13:52:28 -05:00
Makefile FEATURE: Add Docker Compose support for CLI application with comprehensive usage documentation 2025-07-21 13:52:28 -05:00
README.md FEATURE: Add Docker Compose support for CLI application with comprehensive usage documentation 2025-07-21 13:52:28 -05:00

CVE-SIGMA Auto Generator - CLI Edition

Professional file-based SIGMA rule generation system for cybersecurity workflows

Automated CLI tool that generates SIGMA detection rules from CVE data using AI-enhanced exploit analysis. Now optimized for git workflows and production SIGMA rule management with a file-based architecture.

🌟 Major Architecture Update

🎉 New in v2.0: Transformed from web application to professional CLI tool with file-based SIGMA rule management!

  • Git-Friendly: Native YAML files perfect for version control
  • Industry Standard: Direct integration with SIGMA ecosystems
  • Portable: No database dependency, works anywhere
  • Scalable: Process specific years/CVEs as needed
  • Multiple Variants: Different generation methods per CVE

Key Features

  • Bulk CVE Processing: Complete NVD datasets (2002-2025) with nomi-sec PoC integration
  • AI-Powered Rule Generation: Multi-provider LLM support (OpenAI, Anthropic, local Ollama)
  • File-Based Storage: Organized directory structure for each CVE and rule variant
  • Quality-Based PoC Analysis: 5-tier quality scoring system for exploit reliability
  • Advanced Search & Filtering: Find CVEs and rules with complex criteria
  • Comprehensive Statistics: Coverage reports and generation analytics
  • Export Tools: Multiple output formats for different workflows

🚀 Quick Start

Prerequisites

  • Python 3.8+ with pip OR Docker & Docker Compose
  • (Optional) API keys for enhanced features

The easiest way to get started is with Docker Compose, which provides an isolated environment with all dependencies:

# Clone repository
git clone <repository-url>
cd auto_sigma_rule_generator

# Copy environment configuration
cp .env.example .env
# Edit .env with your API keys (optional but recommended)

# Start services (database, redis, CLI container)
docker-compose up -d

# Access the CLI in the container
docker-compose exec sigma-cli bash

# Inside container, run CLI commands
python cli/sigma_cli.py --help
python cli/sigma_cli.py process year 2024
python cli/sigma_cli.py stats overview

Docker Compose Services

  • sigma-cli: Main CLI application container
  • db: PostgreSQL database for migration/legacy data
  • redis: Redis cache for performance
  • ollama: (Optional) Local LLM server with --profile ollama

Docker Commands

# Start all services
docker-compose up -d

# Start with local Ollama LLM
docker-compose --profile ollama up -d

# View logs
docker-compose logs -f sigma-cli

# Access CLI interactively  
docker-compose exec sigma-cli bash

# Run specific CLI commands
docker-compose exec sigma-cli python cli/sigma_cli.py process year 2024

# Stop services
docker-compose down

# Clean up (removes data volumes)
docker-compose down -v

Data Persistence

Docker Compose persists data in:

  • ./cves/ - CVE data and SIGMA rules (mounted from host)
  • ./reports/ - Generated reports (mounted from host)
  • postgres_data - Database volume
  • redis_data - Redis cache volume
  • ollama_data - Ollama models volume (if using --profile ollama)

🔧 Native Installation

For direct installation without Docker:

# Clone repository
git clone <repository-url>
cd auto_sigma_rule_generator

# Install CLI dependencies
pip install -r backend/requirements.txt
pip install click rich tabulate pyyaml

# Make CLI executable
chmod +x cli/sigma_cli.py

# Initialize configuration
./cli/sigma_cli.py config-init

# Set up database (optional, for migrations)
sudo -u postgres createdb cve_sigma_db
sudo -u postgres createuser cve_user

📋 Docker vs Native Comparison

Feature         Docker Compose           Native Installation
Setup Time      ~2 minutes               ~10 minutes
Dependencies    Automatic                Manual setup required
Database        Included PostgreSQL      Manual PostgreSQL setup
Local LLM       Optional Ollama service  Manual Ollama installation
Isolation       Complete environment     Uses system Python
Resource Usage  Higher (containers)      Lower (direct)
Best For        Quick start, testing     Production, development

Command Examples: Docker vs Native

# Docker Compose Usage
docker-compose up -d                                    # Start services
docker-compose exec sigma-cli python cli/sigma_cli.py process year 2024
docker-compose exec sigma-cli python cli/sigma_cli.py stats overview

# Native Usage  
./cli/sigma_cli.py process year 2024                   # Direct execution
./cli/sigma_cli.py stats overview

🚀 First Run Examples

First Run - Migration from Web App (If Applicable)

# If migrating from previous web version
./cli/sigma_cli.py migrate from-database --database-url "postgresql://user:pass@localhost:5432/db"

# Validate migration
./cli/sigma_cli.py migrate validate

# Or start fresh with new CVE processing
./cli/sigma_cli.py process year 2024

🎯 CLI Usage

Core Commands

Native Commands:

# Process CVEs and generate rules
./cli/sigma_cli.py process year 2024                    # Process specific year
./cli/sigma_cli.py process cve CVE-2024-0001            # Process specific CVE
./cli/sigma_cli.py process bulk --start-year 2020       # Bulk process multiple years
./cli/sigma_cli.py process incremental --days 7         # Process recent changes

# Generate rules for existing CVEs
./cli/sigma_cli.py generate cve CVE-2024-0001 --method all        # All generation methods
./cli/sigma_cli.py generate regenerate --year 2024 --method llm   # Regenerate with LLM

# Search CVEs and rules
./cli/sigma_cli.py search cve "buffer overflow" --severity critical --has-poc
./cli/sigma_cli.py search rules "powershell" --method llm

# View statistics and reports
./cli/sigma_cli.py stats overview --year 2024 --output ./reports/2024-stats.json
./cli/sigma_cli.py stats poc --year 2024               # PoC coverage statistics
./cli/sigma_cli.py stats rules --method template       # Rule generation statistics

# Export data
./cli/sigma_cli.py export sigma ./output-rules --format yaml --year 2024
./cli/sigma_cli.py export metadata ./reports/cve-data.csv --format csv

Docker Compose Commands:

# Process CVEs and generate rules
docker-compose exec sigma-cli python cli/sigma_cli.py process year 2024
docker-compose exec sigma-cli python cli/sigma_cli.py process cve CVE-2024-0001
docker-compose exec sigma-cli python cli/sigma_cli.py process bulk --start-year 2020
docker-compose exec sigma-cli python cli/sigma_cli.py process incremental --days 7

# Generate rules for existing CVEs
docker-compose exec sigma-cli python cli/sigma_cli.py generate cve CVE-2024-0001 --method all
docker-compose exec sigma-cli python cli/sigma_cli.py generate regenerate --year 2024 --method llm

# Search CVEs and rules
docker-compose exec sigma-cli python cli/sigma_cli.py search cve "buffer overflow" --severity critical --has-poc
docker-compose exec sigma-cli python cli/sigma_cli.py search rules "powershell" --method llm

# View statistics and reports
docker-compose exec sigma-cli python cli/sigma_cli.py stats overview --year 2024 --output ./reports/2024-stats.json
docker-compose exec sigma-cli python cli/sigma_cli.py stats poc --year 2024
docker-compose exec sigma-cli python cli/sigma_cli.py stats rules --method template

# Export data
docker-compose exec sigma-cli python cli/sigma_cli.py export sigma ./output-rules --format yaml --year 2024
docker-compose exec sigma-cli python cli/sigma_cli.py export metadata ./reports/cve-data.csv --format csv

# Interactive shell access
docker-compose exec sigma-cli bash                     # Access container shell

Available Generation Methods

  • template - Template-based rule generation
  • llm - AI/LLM-enhanced generation (OpenAI, Anthropic, Ollama)
  • hybrid - Combined template + LLM approach
  • all - Generate all variants

📁 File Structure

The CLI organizes everything in a clean, git-friendly structure:

auto_sigma_rule_generator/
├── cves/                           # CVE data organized by year
│   ├── 2024/
│   │   ├── CVE-2024-0001/
│   │   │   ├── metadata.json           # CVE info & generation metadata
│   │   │   ├── rule_template.sigma     # Template-based rule
│   │   │   ├── rule_llm_openai.sigma   # OpenAI-generated rule
│   │   │   ├── rule_llm_anthropic.sigma # Anthropic-generated rule
│   │   │   ├── rule_hybrid.sigma       # Hybrid-generated rule
│   │   │   └── poc_analysis.json       # PoC analysis data
│   │   └── CVE-2024-0002/...
│   └── 2023/...
├── cli/                            # CLI tool and commands
│   ├── sigma_cli.py               # Main CLI executable
│   ├── commands/                  # Command modules
│   └── README.md                  # Detailed CLI documentation
└── reports/                       # Generated reports and exports
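
The layout above can be navigated with plain path handling. The following is a minimal sketch that assumes only the directory conventions shown in the tree; the helper names `cve_dir` and `list_rule_variants` are hypothetical and are not part of the CLI.

```python
from pathlib import Path
from typing import List


def cve_dir(base: Path, cve_id: str) -> Path:
    """Return the directory for a CVE, e.g. <base>/2024/CVE-2024-0001."""
    # CVE IDs have the form CVE-<year>-<sequence>; the year selects the subfolder.
    parts = cve_id.split("-")
    if len(parts) != 3 or parts[0] != "CVE" or not parts[1].isdigit():
        raise ValueError(f"not a CVE ID: {cve_id!r}")
    return base / parts[1] / cve_id


def list_rule_variants(base: Path, cve_id: str) -> List[str]:
    """List variant names: each rule file's stem without the 'rule_' prefix."""
    directory = cve_dir(base, cve_id)
    return sorted(p.stem[len("rule_"):] for p in directory.glob("rule_*.sigma"))
```

For example, `list_rule_variants(Path("cves"), "CVE-2024-0001")` would return the variant names for whichever `rule_*.sigma` files exist in that CVE's directory.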

File Formats

metadata.json - CVE information and processing history

{
  "cve_info": {
    "cve_id": "CVE-2024-0001",
    "description": "Remote code execution vulnerability...",
    "cvss_score": 9.8,
    "severity": "critical",
    "published_date": "2024-01-01T00:00:00Z"
  },
  "poc_data": {
    "poc_count": 3,
    "poc_data": {"nomi_sec": [...], "github": [...]}
  },
  "rule_generation": {
    "template": {"generated_at": "2024-01-01T12:00:00Z"},
    "llm_openai": {"generated_at": "2024-01-01T12:30:00Z"}
  }
}
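
Because `metadata.json` is ordinary JSON, it can be inspected without the CLI. This sketch assumes the field names shown in the example above; the `summarize` helper and the inline `SAMPLE` document are illustrative only.

```python
import json

# A trimmed copy of the example above; real files may carry more fields.
SAMPLE = """
{
  "cve_info": {"cve_id": "CVE-2024-0001", "cvss_score": 9.8, "severity": "critical"},
  "poc_data": {"poc_count": 3},
  "rule_generation": {
    "template": {"generated_at": "2024-01-01T12:00:00Z"},
    "llm_openai": {"generated_at": "2024-01-01T12:30:00Z"}
  }
}
"""


def summarize(metadata: dict) -> dict:
    """Collect the CVE ID, severity, PoC count, and generated rule variants."""
    info = metadata.get("cve_info", {})
    generation = metadata.get("rule_generation", {})
    return {
        "cve_id": info.get("cve_id"),
        "severity": info.get("severity"),
        "poc_count": metadata.get("poc_data", {}).get("poc_count", 0),
        # Keys of rule_generation name the variants produced so far.
        "variants": sorted(generation),
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```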

SIGMA Rule Files - Ready-to-use detection rules

# rule_llm_openai.sigma
title: CVE-2024-0001 Remote Code Execution Detection
id: 12345678-1234-5678-9abc-123456789012
status: experimental
description: Detects exploitation attempts for CVE-2024-0001
author: CVE-SIGMA Auto Generator (OpenAI Enhanced)
date: 2024/01/01
references:
    - https://nvd.nist.gov/vuln/detail/CVE-2024-0001
tags:
    - attack.t1059.001
    - cve.2024.0001
    - ai.enhanced
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        Image|endswith: '\powershell.exe'
        CommandLine|contains:
            - '-EncodedCommand'
            - 'bypass'
    condition: selection
falsepositives:
    - Legitimate administrative scripts
level: high
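
To see what the example rule's `selection` would match, its logic can be mirrored in a few lines. This is a sketch of standard Sigma semantics applied to this one rule, not the project's own evaluator: fields inside a selection are ANDed, a list of values under one field is ORed, and string matching is case-insensitive by default. The function name `matches_selection` is hypothetical.

```python
def matches_selection(event: dict) -> bool:
    """Mirror the example rule's `selection` for one process-creation event."""
    # Sigma string matching is case-insensitive by default, so compare in lowercase.
    image = event.get("Image", "").lower()
    command_line = event.get("CommandLine", "").lower()
    # Image|endswith: '\powershell.exe'
    image_ok = image.endswith("\\powershell.exe")
    # CommandLine|contains: any one of the listed values is enough (OR).
    cmdline_ok = any(s in command_line for s in ("-encodedcommand", "bypass"))
    # Both fields belong to the same selection, so both must match (AND).
    return image_ok and cmdline_ok
```

Note that any PowerShell invocation whose command line contains `bypass` satisfies this selection, which illustrates why generated rules need analyst review before deployment.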

⚙️ Configuration

CLI Configuration (~/.sigma-cli/config.yaml)

# API Keys for enhanced functionality
api_keys:
  nvd_api_key: "your_nvd_key"           # Optional: 5→50 req/30s rate limit
  github_token: "your_github_token"     # Optional: Enhanced PoC analysis
  openai_api_key: "your_openai_key"     # Optional: AI rule generation
  anthropic_api_key: "your_anthropic_key" # Optional: AI rule generation

# LLM Settings  
llm_settings:
  default_provider: "ollama"           # Default: ollama (local)
  default_model: "llama3.2"           # Provider-specific model
  ollama_base_url: "http://localhost:11434"

# Processing Settings
processing:
  default_batch_size: 50               # CVEs per batch
  default_methods: ["template"]       # Default generation methods
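
The rate-limit comment on `nvd_api_key` translates directly into a pacing interval for NVD requests. A minimal sketch, assuming the limits quoted in the comment (5 requests per 30 seconds without a key, 50 with one); `min_request_interval` is a hypothetical helper, not a CLI function.

```python
def min_request_interval(has_api_key: bool, window_seconds: float = 30.0) -> float:
    """Smallest average delay between requests that stays within the limit."""
    # Limits taken from the config comment: 5 per window without a key, 50 with one.
    requests_per_window = 50 if has_api_key else 5
    return window_seconds / requests_per_window
```

Under these assumptions, a client without a key should wait about 6 seconds between requests, and one with a key about 0.6 seconds.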

API Keys Setup

NVD API Key (Recommended)

GitHub Token (Optional)

LLM APIs (Optional)

🧠 AI-Enhanced Rule Generation

How It Works

  1. CVE Analysis: Extract vulnerability details from NVD data
  2. PoC Collection: Gather exploit code from nomi-sec, GitHub, ExploitDB
  3. Quality Assessment: Score PoCs based on stars, recency, completeness
  4. AI Enhancement: LLM analyzes actual exploit code to create detection logic
  5. SIGMA Generation: Produce valid, tested SIGMA rules with proper syntax
  6. Multi-Variant Output: Generate template, LLM, and hybrid versions

Quality Tiers

  • Excellent (80+ pts): High-star PoCs with recent updates, detailed analysis
  • Good (60-79 pts): Moderate quality with some validation
  • Fair (40-59 pts): Basic PoCs with minimal indicators
  • Poor (20-39 pts): Low-quality or outdated PoCs
  • Very Poor (<20 pts): Minimal or unreliable PoCs
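
The tier boundaries above amount to a threshold lookup. In this sketch the function name and the returned labels are assumptions, except that `excellent` matches the value passed to `--filter-quality` elsewhere in this README; how the score itself is computed is not shown here.

```python
def quality_tier(score: float) -> str:
    """Map a PoC quality score to one of the five tiers listed above."""
    if score >= 80:
        return "excellent"
    if score >= 60:
        return "good"
    if score >= 40:
        return "fair"
    if score >= 20:
        return "poor"
    return "very_poor"
```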

Rule Variants Generated

  • 🤖 AI-Enhanced (rule_llm_*.sigma): LLM analysis of actual exploit code
  • 🔧 Template-Based (rule_template.sigma): Pattern-based generation
  • Hybrid (rule_hybrid.sigma): Best of both approaches

📊 Advanced Features

Search & Analytics

# Complex CVE searches
./cli/sigma_cli.py search cve "remote code execution" \
  --year 2024 --severity critical --has-poc --has-rules --limit 50

# Rule analysis  
./cli/sigma_cli.py search rules "powershell" \
  --rule-type process --method llm --limit 20

# Comprehensive statistics
./cli/sigma_cli.py stats overview                    # Overall system stats
./cli/sigma_cli.py stats poc --year 2024            # PoC coverage analysis
./cli/sigma_cli.py stats rules --method llm         # AI generation statistics

Export & Integration

# Export for SIEM integration
./cli/sigma_cli.py export sigma ./siem-rules \
  --format yaml --year 2024 --method llm

# Metadata for analysis
./cli/sigma_cli.py export metadata ./analysis/cve-data.csv \
  --format csv --year 2024

# Consolidated ruleset  
./cli/sigma_cli.py export ruleset ./complete-rules.json \
  --year 2024 --include-metadata

🛠️ Development & Legacy Support

CLI Development

The new CLI system is built with:

  • Click: Professional CLI framework
  • Modular Commands: Separate modules for each command group
  • Async Processing: Efficient handling of bulk operations
  • File-Based Storage: Git-friendly YAML and JSON formats

Legacy Web Interface (Optional)

The original web interface is still available for migration purposes:

# Start legacy web interface (if needed for migration)
docker-compose up -d db redis backend frontend

# Access points:
# - Frontend: http://localhost:3000  
# - API: http://localhost:8000
# - Flower (Celery): http://localhost:5555

Migration Path

  1. Export Data: Use CLI migration tools to export from database
  2. Validate: Verify all data transferred correctly
  3. Switch: Use CLI for all new operations
  4. Cleanup: Optionally remove web components

🔧 Troubleshooting

Common Issues

CLI Import Errors

  • Ensure you're running from project root directory
  • Install dependencies: pip install -r cli/requirements.txt
  • Check Python version (3.8+ required)

CVE Processing Failures

  • Verify NVD API key in configuration
  • Check network connectivity and rate limits
  • Use --verbose flag for detailed logging

No Rules Generated

  • Ensure LLM provider is accessible (test with ./cli/sigma_cli.py stats overview)
  • Check PoC data availability with --has-poc filter
  • Verify API keys for external LLM providers

File Permission Issues

  • Ensure write permissions to cves/ directory
  • Check CLI executable permissions: chmod +x cli/sigma_cli.py

Performance Optimization

  • Use --batch-size parameter for large datasets
  • Process recent years first (2020+) for faster initial results
  • Use incremental processing for regular updates
  • Monitor system resources during bulk operations

🛡️ Security Best Practices

  • Store API keys in configuration file (~/.sigma-cli/config.yaml)
  • Validate generated rules before production deployment
  • Rules marked as "experimental" require analyst review
  • Use version control to track rule changes and improvements
  • Regularly update PoC data sources for current threat landscape

📈 Monitoring & Maintenance

# System health checks
./cli/sigma_cli.py stats overview                   # Overall system status
./cli/sigma_cli.py migrate validate                 # Data integrity check

# Regular maintenance
./cli/sigma_cli.py process incremental --days 7     # Weekly updates  
./cli/sigma_cli.py generate regenerate --filter-quality excellent  # Refresh high-quality rules

# Performance monitoring
./cli/sigma_cli.py stats rules --year 2024         # Generation statistics
./cli/sigma_cli.py stats poc --year 2024           # Coverage analysis

🗺️ Roadmap

CLI Enhancements

  • Rule quality scoring and validation
  • Custom template editor
  • Integration with popular SIEM platforms
  • Advanced MITRE ATT&CK mapping
  • Threat intelligence feed integration

Export Features

  • Splunk app export format
  • Elastic Stack integration
  • QRadar rule format
  • YARA rule generation
  • IOC extraction

📝 License

MIT License - see LICENSE file for details.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Test with both CLI and legacy systems
  4. Add tests and documentation
  5. Submit a pull request

📞 Support

CLI Issues

  • Check cli/README.md for detailed CLI documentation
  • Use --verbose flag for debugging
  • Ensure proper configuration in ~/.sigma-cli/config.yaml

General Support

  • Review troubleshooting section above
  • Check application logs with --verbose
  • Open GitHub issue with specific error details

🎉 What's New in v2.0

  • Complete CLI System - Professional command-line interface
  • File-Based Storage - Git-friendly YAML and JSON files
  • Multiple Rule Variants - Template, AI, and hybrid generation
  • Advanced Search - Complex filtering and analytics
  • Export Tools - Multiple output formats for different workflows
  • Migration Tools - Seamless transition from web application
  • Portable Architecture - No database dependency, runs anywhere

Perfect for cybersecurity teams who want production-ready SIGMA rules with version control integration! 🚀