CVE-SIGMA Auto Generator - CLI Edition
Professional file-based SIGMA rule generation system for cybersecurity workflows
Automated CLI tool that generates SIGMA detection rules from CVE data using AI-enhanced exploit analysis. Now optimized for git workflows and production SIGMA rule management with a file-based architecture.
🌟 Major Architecture Update
🎉 New in v2.0: Transformed from web application to professional CLI tool with file-based SIGMA rule management!
- Git-Friendly: Native YAML files perfect for version control
- Industry Standard: Direct integration with SIGMA ecosystems
- Portable: No database dependency, works anywhere
- Scalable: Process specific years/CVEs as needed
- Multiple Variants: Different generation methods per CVE
✨ Key Features
- Bulk CVE Processing: Complete NVD datasets (2002-2025) with nomi-sec PoC integration
- AI-Powered Rule Generation: Multi-provider LLM support (OpenAI, Anthropic, local Ollama)
- File-Based Storage: Organized directory structure for each CVE and rule variant
- Quality-Based PoC Analysis: 5-tier quality scoring system for exploit reliability
- Advanced Search & Filtering: Find CVEs and rules with complex criteria
- Comprehensive Statistics: Coverage reports and generation analytics
- Export Tools: Multiple output formats for different workflows
🚀 Quick Start
Prerequisites
- Python 3.8+ with pip OR Docker & Docker Compose
- (Optional) API keys for enhanced features
🐳 Docker Compose Setup (Recommended)
The easiest way to get started is with Docker Compose, which provides an isolated environment with all dependencies:
# Clone repository
git clone <repository-url>
cd auto_sigma_rule_generator
# Copy environment configuration
cp .env.example .env
# Edit .env with your API keys (optional but recommended)
# Start services (database, redis, CLI container)
docker-compose up -d
# Access the CLI in the container
docker-compose exec sigma-cli bash
# Inside container, run CLI commands
python cli/sigma_cli.py --help
python cli/sigma_cli.py process year 2024
python cli/sigma_cli.py stats overview
Docker Compose Services
- sigma-cli: Main CLI application container
- db: PostgreSQL database for migration/legacy data
- redis: Redis cache for performance
- ollama: (Optional) Local LLM server, enabled with --profile ollama
Docker Commands
# Start all services
docker-compose up -d
# Start with local Ollama LLM
docker-compose --profile ollama up -d
# View logs
docker-compose logs -f sigma-cli
# Access CLI interactively
docker-compose exec sigma-cli bash
# Run specific CLI commands
docker-compose exec sigma-cli python cli/sigma_cli.py process year 2024
# Stop services
docker-compose down
# Clean up (removes data volumes)
docker-compose down -v
Data Persistence
Docker Compose persists data in:
- ./cves/ - CVE data and SIGMA rules (mounted from host)
- ./reports/ - Generated reports (mounted from host)
- postgres_data - Database volume
- redis_data - Redis cache volume
- ollama_data - Ollama models volume (if using --profile ollama)
🔧 Native Installation
For direct installation without Docker:
# Clone repository
git clone <repository-url>
cd auto_sigma_rule_generator
# Install CLI dependencies
pip install -r backend/requirements.txt
pip install click rich tabulate pyyaml
# Make CLI executable
chmod +x cli/sigma_cli.py
# Initialize configuration
./cli/sigma_cli.py config-init
# Set up database (optional, for migrations)
sudo -u postgres createdb cve_sigma_db
sudo -u postgres createuser cve_user
📋 Docker vs Native Comparison
Feature | Docker Compose | Native Installation |
---|---|---|
Setup Time | ~2 minutes | ~10 minutes |
Dependencies | Automatic | Manual setup required |
Database | Included PostgreSQL | Manual PostgreSQL setup |
Local LLM | Optional Ollama service | Manual Ollama installation |
Isolation | Complete environment | Uses system Python |
Resource Usage | Higher (containers) | Lower (direct) |
Best For | Quick start, testing | Production, development |
Command Examples: Docker vs Native
# Docker Compose Usage
docker-compose up -d # Start services
docker-compose exec sigma-cli python cli/sigma_cli.py process year 2024
docker-compose exec sigma-cli python cli/sigma_cli.py stats overview
# Native Usage
./cli/sigma_cli.py process year 2024 # Direct execution
./cli/sigma_cli.py stats overview
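The two invocation styles differ only in their prefix, so a wrapper script can treat them uniformly. The following is a minimal sketch, not part of the project: `build_command` and `CLI_SCRIPT` are hypothetical names, and the command prefixes are copied from the examples above.

```python
from typing import List

# Script path as it appears in the README examples.
CLI_SCRIPT = "cli/sigma_cli.py"


def build_command(cli_args: List[str], use_docker: bool) -> List[str]:
    """Return the argv list for running the CLI in the container or natively."""
    if use_docker:
        # Mirrors: docker-compose exec sigma-cli python cli/sigma_cli.py <args>
        return ["docker-compose", "exec", "sigma-cli", "python", CLI_SCRIPT, *cli_args]
    # Mirrors: ./cli/sigma_cli.py <args>
    return [f"./{CLI_SCRIPT}", *cli_args]


if __name__ == "__main__":
    for docker in (True, False):
        print(" ".join(build_command(["process", "year", "2024"], use_docker=docker)))
```

The returned list can be passed directly to `subprocess.run` without shell quoting concerns.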
🚀 First Run Examples
First Run - Migration from Web App (If Applicable)
# If migrating from previous web version
./cli/sigma_cli.py migrate from-database --database-url "postgresql://user:pass@localhost:5432/db"
# Validate migration
./cli/sigma_cli.py migrate validate
# Or start fresh with new CVE processing
./cli/sigma_cli.py process year 2024
🎯 CLI Usage
Core Commands
Native Commands:
# Process CVEs and generate rules
./cli/sigma_cli.py process year 2024 # Process specific year
./cli/sigma_cli.py process cve CVE-2024-0001 # Process specific CVE
./cli/sigma_cli.py process bulk --start-year 2020 # Bulk process multiple years
./cli/sigma_cli.py process incremental --days 7 # Process recent changes
# Generate rules for existing CVEs
./cli/sigma_cli.py generate cve CVE-2024-0001 --method all # All generation methods
./cli/sigma_cli.py generate regenerate --year 2024 --method llm # Regenerate with LLM
# Search CVEs and rules
./cli/sigma_cli.py search cve "buffer overflow" --severity critical --has-poc
./cli/sigma_cli.py search rules "powershell" --method llm
# View statistics and reports
./cli/sigma_cli.py stats overview --year 2024 --output ./reports/2024-stats.json
./cli/sigma_cli.py stats poc --year 2024 # PoC coverage statistics
./cli/sigma_cli.py stats rules --method template # Rule generation statistics
# Export data
./cli/sigma_cli.py export sigma ./output-rules --format yaml --year 2024
./cli/sigma_cli.py export metadata ./reports/cve-data.csv --format csv
Docker Compose Commands:
# Process CVEs and generate rules
docker-compose exec sigma-cli python cli/sigma_cli.py process year 2024
docker-compose exec sigma-cli python cli/sigma_cli.py process cve CVE-2024-0001
docker-compose exec sigma-cli python cli/sigma_cli.py process bulk --start-year 2020
docker-compose exec sigma-cli python cli/sigma_cli.py process incremental --days 7
# Generate rules for existing CVEs
docker-compose exec sigma-cli python cli/sigma_cli.py generate cve CVE-2024-0001 --method all
docker-compose exec sigma-cli python cli/sigma_cli.py generate regenerate --year 2024 --method llm
# Search CVEs and rules
docker-compose exec sigma-cli python cli/sigma_cli.py search cve "buffer overflow" --severity critical --has-poc
docker-compose exec sigma-cli python cli/sigma_cli.py search rules "powershell" --method llm
# View statistics and reports
docker-compose exec sigma-cli python cli/sigma_cli.py stats overview --year 2024 --output ./reports/2024-stats.json
docker-compose exec sigma-cli python cli/sigma_cli.py stats poc --year 2024
docker-compose exec sigma-cli python cli/sigma_cli.py stats rules --method template
# Export data
docker-compose exec sigma-cli python cli/sigma_cli.py export sigma ./output-rules --format yaml --year 2024
docker-compose exec sigma-cli python cli/sigma_cli.py export metadata ./reports/cve-data.csv --format csv
# Interactive shell access
docker-compose exec sigma-cli bash # Access container shell
Available Generation Methods
- template - Template-based rule generation
- llm - AI/LLM-enhanced generation (OpenAI, Anthropic, Ollama)
- hybrid - Combined template + LLM approach
- all - Generate all variants
📁 File Structure
The CLI organizes everything in a clean, git-friendly structure:
auto_sigma_rule_generator/
├── cves/ # CVE data organized by year
│ ├── 2024/
│ │ ├── CVE-2024-0001/
│ │ │ ├── metadata.json # CVE info & generation metadata
│ │ │ ├── rule_template.sigma # Template-based rule
│ │ │ ├── rule_llm_openai.sigma # OpenAI-generated rule
│ │ │ ├── rule_llm_anthropic.sigma# Anthropic-generated rule
│ │ │ ├── rule_hybrid.sigma # Hybrid-generated rule
│ │ │ └── poc_analysis.json # PoC analysis data
│ │ └── CVE-2024-0002/...
│ └── 2023/...
├── cli/ # CLI tool and commands
│ ├── sigma_cli.py # Main CLI executable
│ ├── commands/ # Command modules
│ └── README.md # Detailed CLI documentation
└── reports/ # Generated reports and exports
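Because the layout is plain directories and files, it can be inspected without the CLI. The sketch below assumes only the `<year>/<CVE-ID>/rule_*.sigma` layout shown in the tree; `list_rule_variants` is a hypothetical helper, not a function provided by the project.

```python
from pathlib import Path
from typing import Dict, List


def list_rule_variants(cves_root: Path) -> Dict[str, List[str]]:
    """Map each CVE ID to the sorted names of the rule files in its directory."""
    variants: Dict[str, List[str]] = {}
    # Layout from the tree above: <cves_root>/<year>/<CVE-ID>/rule_*.sigma
    for rule_file in cves_root.glob("*/CVE-*/rule_*.sigma"):
        variants.setdefault(rule_file.parent.name, []).append(rule_file.name)
    # Sort both the CVE IDs and the file names for stable output.
    return {cve: sorted(names) for cve, names in sorted(variants.items())}
```

Calling `list_rule_variants(Path("cves"))` from the project root would return one entry per CVE directory that contains at least one rule file.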
File Formats
metadata.json - CVE information and processing history
{
  "cve_info": {
    "cve_id": "CVE-2024-0001",
    "description": "Remote code execution vulnerability...",
    "cvss_score": 9.8,
    "severity": "critical",
    "published_date": "2024-01-01T00:00:00Z"
  },
  "poc_data": {
    "poc_count": 3,
    "poc_data": {"nomi_sec": [...], "github": [...]}
  },
  "rule_generation": {
    "template": {"generated_at": "2024-01-01T12:00:00Z"},
    "llm_openai": {"generated_at": "2024-01-01T12:30:00Z"}
  }
}
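A metadata file of this shape can be read with the standard library alone. The sketch below is illustrative: `generated_methods` is a hypothetical helper, and `SAMPLE` is a trimmed copy of the example above in which the elided PoC lists are replaced by empty lists so that the JSON parses.

```python
import json
from typing import Any, Dict, List


def generated_methods(metadata: Dict[str, Any]) -> List[str]:
    """Return rule-generation method names ordered by their generated_at timestamp."""
    runs = metadata.get("rule_generation", {})
    # ISO 8601 timestamps in the same format and timezone sort correctly as strings.
    return sorted(runs, key=lambda name: runs[name].get("generated_at", ""))


# Trimmed sample; the empty lists stand in for the elided PoC entries.
SAMPLE = json.loads("""
{
  "cve_info": {"cve_id": "CVE-2024-0001", "severity": "critical", "cvss_score": 9.8},
  "poc_data": {"poc_count": 3, "poc_data": {"nomi_sec": [], "github": []}},
  "rule_generation": {
    "llm_openai": {"generated_at": "2024-01-01T12:30:00Z"},
    "template": {"generated_at": "2024-01-01T12:00:00Z"}
  }
}
""")

if __name__ == "__main__":
    print(SAMPLE["cve_info"]["cve_id"], generated_methods(SAMPLE))
```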
SIGMA Rule Files - Ready-to-use detection rules
# rule_llm_openai.sigma
title: CVE-2024-0001 Remote Code Execution Detection
id: 12345678-1234-5678-9abc-123456789012
status: experimental
description: Detects exploitation attempts for CVE-2024-0001
author: CVE-SIGMA Auto Generator (OpenAI Enhanced)
date: 2024/01/01
references:
  - https://nvd.nist.gov/vuln/detail/CVE-2024-0001
tags:
  - attack.t1059.001
  - cve.2024.0001
  - ai.enhanced
logsource:
  category: process_creation
  product: windows
detection:
  selection:
    Image|endswith: '\powershell.exe'
    CommandLine|contains:
      - '-EncodedCommand'
      - 'bypass'
  condition: selection
falsepositives:
  - Legitimate administrative scripts
level: high
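To make the detection block concrete: within one Sigma selection, the fields are combined with AND, a list of values is combined with OR, and string matching is case-insensitive by default. The sketch below hand-evaluates the example selection against an event dictionary. It is not how the project or a SIEM backend evaluates rules; `matches_selection` is a hypothetical helper that supports only the two modifiers used above.

```python
from typing import Dict, List, Union

Selection = Dict[str, Union[str, List[str]]]


def matches_selection(event: Dict[str, str], selection: Selection) -> bool:
    """Evaluate one selection: every field must match, any listed value may match."""
    for key, expected in selection.items():
        field, _, modifier = key.partition("|")
        actual = event.get(field, "").lower()
        values = [expected] if isinstance(expected, str) else expected
        values = [v.lower() for v in values]  # case-insensitive comparison
        if modifier == "endswith":
            hit = any(actual.endswith(v) for v in values)
        elif modifier == "contains":
            hit = any(v in actual for v in values)
        elif modifier == "":
            hit = actual in values
        else:
            raise ValueError(f"unsupported modifier: {modifier}")
        if not hit:
            return False
    return True


# The selection from the example rule, written as a Python dict.
SELECTION: Selection = {
    "Image|endswith": "\\powershell.exe",
    "CommandLine|contains": ["-EncodedCommand", "bypass"],
}

if __name__ == "__main__":
    event = {
        "Image": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
        "CommandLine": "powershell.exe -ExecutionPolicy Bypass -File run.ps1",
    }
    print(matches_selection(event, SELECTION))
```

A check like this is only useful for reasoning about a rule by hand; real deployments convert the rule to the target platform's query language.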
⚙️ Configuration
CLI Configuration (~/.sigma-cli/config.yaml)
# API Keys for enhanced functionality
api_keys:
  nvd_api_key: "your_nvd_key"               # Optional: 5→50 req/30s rate limit
  github_token: "your_github_token"         # Optional: Enhanced PoC analysis
  openai_api_key: "your_openai_key"         # Optional: AI rule generation
  anthropic_api_key: "your_anthropic_key"   # Optional: AI rule generation
# LLM Settings
llm_settings:
  default_provider: "ollama"                # Default: ollama (local)
  default_model: "llama3.2"                 # Provider-specific model
  ollama_base_url: "http://localhost:11434"
# Processing Settings
processing:
  default_batch_size: 50                    # CVEs per batch
  default_methods: ["template"]             # Default generation methods
API Keys Setup
NVD API Key (Recommended)
- Get key: https://nvd.nist.gov/developers/request-an-api-key
- Benefit: 10x rate limit increase (5 → 50 requests/30s)
GitHub Token (Optional)
- Create: https://github.com/settings/tokens (public_repo scope)
- Benefit: Enhanced PoC analysis and exploit indicators
LLM APIs (Optional)
- Local Ollama: No setup required (default) - runs locally
- OpenAI: Get key from https://platform.openai.com/api-keys
- Anthropic: Get key from https://console.anthropic.com/
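The NVD limits quoted above (5 requests per 30 seconds without a key, 50 with one) can be respected on the client side with a rolling-window limiter. This is a minimal sketch of that idea and makes no claim about how the CLI throttles its requests; `SlidingWindowLimiter` and `limiter_for` are hypothetical names.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any rolling window of window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Drop timestamps that have left the rolling window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self) -> None:
        """Register that a request was just sent."""
        self._sent.append(self._clock())


def limiter_for(api_key: Optional[str]) -> SlidingWindowLimiter:
    """Pick the limit from the README: 50 requests per 30 s with a key, 5 without."""
    return SlidingWindowLimiter(50 if api_key else 5)
```

A caller would sleep for `wait_time()` seconds, send the request, then call `record()`. The injectable `clock` parameter exists only to make the class testable without real delays.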
🧠 AI-Enhanced Rule Generation
How It Works
- CVE Analysis: Extract vulnerability details from NVD data
- PoC Collection: Gather exploit code from nomi-sec, GitHub, ExploitDB
- Quality Assessment: Score PoCs based on stars, recency, completeness
- AI Enhancement: LLM analyzes actual exploit code to create detection logic
- SIGMA Generation: Produce valid, tested SIGMA rules with proper syntax
- Multi-Variant Output: Generate template, LLM, and hybrid versions
Quality Tiers
- Excellent (80+ pts): High-star PoCs with recent updates, detailed analysis
- Good (60-79 pts): Moderate quality with some validation
- Fair (40-59 pts): Basic PoCs with minimal indicators
- Poor (20-39 pts): Low-quality or outdated PoCs
- Very Poor (<20 pts): Minimal or unreliable PoCs
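The tier boundaries above translate directly into a lookup. The sketch below encodes only the thresholds listed; how the point score itself is computed is not specified here, so `quality_tier` (a hypothetical helper) takes the score as given.

```python
def quality_tier(score: float) -> str:
    """Map a PoC quality score to the tier names listed above."""
    # Lower bounds are checked from highest to lowest; the first match wins.
    thresholds = [
        (80, "Excellent"),
        (60, "Good"),
        (40, "Fair"),
        (20, "Poor"),
    ]
    for lower_bound, tier in thresholds:
        if score >= lower_bound:
            return tier
    return "Very Poor"


if __name__ == "__main__":
    for points in (95, 60, 45, 20, 5):
        print(points, quality_tier(points))
```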
Rule Variants Generated
- 🤖 AI-Enhanced (rule_llm_*.sigma): LLM analysis of actual exploit code
- 🔧 Template-Based (rule_template.sigma): Pattern-based generation
- ⚡ Hybrid (rule_hybrid.sigma): Best of both approaches
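Since each variant is identified by its file name, a script can classify rule files with simple pattern matching. A minimal sketch, assuming only the three naming patterns listed above; `classify_rule_file` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Optional

# File-name patterns and variant labels taken from the list above.
VARIANT_PATTERNS = [
    ("rule_llm_*.sigma", "AI-Enhanced"),
    ("rule_template.sigma", "Template-Based"),
    ("rule_hybrid.sigma", "Hybrid"),
]


def classify_rule_file(filename: str) -> Optional[str]:
    """Return the variant label for a rule file name, or None if it matches no pattern."""
    for pattern, label in VARIANT_PATTERNS:
        # fnmatchcase keeps matching case-sensitive on every platform.
        if fnmatchcase(filename, pattern):
            return label
    return None
```

For example, `classify_rule_file("rule_llm_anthropic.sigma")` falls under the first pattern, while `metadata.json` matches none and yields `None`.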
📊 Advanced Features
Search & Analytics
# Complex CVE searches
./cli/sigma_cli.py search cve "remote code execution" \
--year 2024 --severity critical --has-poc --has-rules --limit 50
# Rule analysis
./cli/sigma_cli.py search rules "powershell" \
--rule-type process --method llm --limit 20
# Comprehensive statistics
./cli/sigma_cli.py stats overview # Overall system stats
./cli/sigma_cli.py stats poc --year 2024 # PoC coverage analysis
./cli/sigma_cli.py stats rules --method llm # AI generation statistics
Export & Integration
# Export for SIEM integration
./cli/sigma_cli.py export sigma ./siem-rules \
--format yaml --year 2024 --method llm
# Metadata for analysis
./cli/sigma_cli.py export metadata ./analysis/cve-data.csv \
--format csv --year 2024
# Consolidated ruleset
./cli/sigma_cli.py export ruleset ./complete-rules.json \
--year 2024 --include-metadata
🛠️ Development & Legacy Support
CLI Development
The new CLI system is built with:
- Click: Professional CLI framework
- Modular Commands: Separate modules for each command group
- Async Processing: Efficient handling of bulk operations
- File-Based Storage: Git-friendly YAML and JSON formats
Legacy Web Interface (Optional)
The original web interface is still available for migration purposes:
# Start legacy web interface (if needed for migration)
docker-compose up -d db redis backend frontend
# Access points:
# - Frontend: http://localhost:3000
# - API: http://localhost:8000
# - Flower (Celery): http://localhost:5555
Migration Path
- Export Data: Use CLI migration tools to export from database
- Validate: Verify all data transferred correctly
- Switch: Use CLI for all new operations
- Cleanup: Optionally remove web components
🔧 Troubleshooting
Common Issues
CLI Import Errors
- Ensure you're running from project root directory
- Install dependencies: pip install -r cli/requirements.txt
- Check Python version (3.8+ required)
CVE Processing Failures
- Verify NVD API key in configuration
- Check network connectivity and rate limits
- Use the --verbose flag for detailed logging
No Rules Generated
- Ensure LLM provider is accessible (test with ./cli/sigma_cli.py stats overview)
- Check PoC data availability with the --has-poc filter
- Verify API keys for external LLM providers
File Permission Issues
- Ensure write permissions to the cves/ directory
- Check CLI executable permissions: chmod +x cli/sigma_cli.py
Performance Optimization
- Use the --batch-size parameter for large datasets
- Process recent years first (2020+) for faster initial results
- Use incremental processing for regular updates
- Monitor system resources during bulk operations
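Batching bounds how much work is in flight at once, which is why a batch size helps on large datasets. The sketch below illustrates the idea with the default batch size of 50 from the configuration example; `batched` is a hypothetical helper and does not reflect the CLI's internal implementation.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


if __name__ == "__main__":
    # Illustrative IDs only; real CVE IDs come from the NVD data.
    cve_ids = [f"CVE-2024-{n:04d}" for n in range(1, 121)]
    for batch in batched(cve_ids):
        print(len(batch), batch[0], batch[-1])
```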
🛡️ Security Best Practices
- Store API keys in configuration file (~/.sigma-cli/config.yaml)
- Validate generated rules before production deployment
- Rules marked as "experimental" require analyst review
- Use version control to track rule changes and improvements
- Regularly update PoC data sources for current threat landscape
📈 Monitoring & Maintenance
# System health checks
./cli/sigma_cli.py stats overview # Overall system status
./cli/sigma_cli.py migrate validate # Data integrity check
# Regular maintenance
./cli/sigma_cli.py process incremental --days 7 # Weekly updates
./cli/sigma_cli.py generate regenerate --filter-quality excellent # Refresh high-quality rules
# Performance monitoring
./cli/sigma_cli.py stats rules --year 2024 # Generation statistics
./cli/sigma_cli.py stats poc --year 2024 # Coverage analysis
🗺️ Roadmap
CLI Enhancements
- Rule quality scoring and validation
- Custom template editor
- Integration with popular SIEM platforms
- Advanced MITRE ATT&CK mapping
- Threat intelligence feed integration
Export Features
- Splunk app export format
- Elastic Stack integration
- QRadar rule format
- YARA rule generation
- IOC extraction
📝 License
MIT License - see LICENSE file for details.
🤝 Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Test with both CLI and legacy systems
- Add tests and documentation
- Submit a pull request
📞 Support
CLI Issues
- Check cli/README.md for detailed CLI documentation
- Use the --verbose flag for debugging
- Ensure proper configuration in ~/.sigma-cli/config.yaml
General Support
- Review troubleshooting section above
- Check application logs with --verbose
- Open GitHub issue with specific error details
🎉 What's New in v2.0
✅ Complete CLI System - Professional command-line interface
✅ File-Based Storage - Git-friendly YAML and JSON files
✅ Multiple Rule Variants - Template, AI, and hybrid generation
✅ Advanced Search - Complex filtering and analytics
✅ Export Tools - Multiple output formats for different workflows
✅ Migration Tools - Seamless transition from web application
✅ Portable Architecture - No database dependency, runs anywhere
Perfect for cybersecurity teams who want production-ready SIGMA rules with version control integration! 🚀