This commit adds complete Docker Compose support to the CLI application, making it easy to run the SIGMA rule generator in a containerized environment:

DOCKER INFRASTRUCTURE:
- docker-compose.yml: Complete service orchestration (CLI app, PostgreSQL, Redis, optional Ollama)
- Dockerfile: Optimized CLI application container with all dependencies
- init.sql: Database initialization for PostgreSQL
- .env.example: Updated environment configuration for both Docker and native setups
- Makefile: Convenient commands for Docker operations (setup, up, down, shell, cli execution)

DOCUMENTATION UPDATES:
- README.md: Comprehensive Docker vs Native comparison with detailed usage examples
- CLAUDE.md: Updated project guidance with Docker Compose as recommended approach
- Added step-by-step setup instructions for both deployment methods
- Included command examples for both Docker Compose and native execution

DOCKER SERVICES:
- sigma-cli: Main CLI application container with volume mounts for data persistence
- db: PostgreSQL database for legacy migrations and data processing
- redis: Redis cache for performance optimization
- ollama: Optional local LLM service (profile-based)

DATA PERSISTENCE:
- Host-mounted directories: ./cves/, ./reports/, ./logs/, ./backend/templates/
- Named volumes: postgres_data, redis_data, ollama_data
- Complete data preservation between container restarts

This provides users with multiple deployment options:
1. Quick Docker Compose setup (recommended for testing/evaluation)
2. Native installation (recommended for production/development)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview - CLI-Based Architecture (v2.0)
This is an enhanced CVE-SIGMA Auto Generator that has been transformed from a web application to a professional CLI tool with file-based SIGMA rule management. The system now supports:
- Bulk NVD Data Processing: Downloads and processes complete NVD JSON datasets (2002-2025)
- nomi-sec PoC Integration: Uses curated PoC data from github.com/nomi-sec/PoC-in-GitHub
- Enhanced SIGMA Rule Generation: Creates intelligent rules based on real exploit indicators
- Comprehensive Database Seeding: Supports both bulk and incremental data updates
Architecture - CLI-Based System
Current Primary Architecture (v2.0)
- CLI Interface: Professional command-line tool (cli/sigma_cli.py) with modular commands
- File-Based Storage: Git-friendly YAML and JSON files organized by year/CVE-ID
- Directory Structure:
  - cves/YEAR/CVE-ID/: Individual CVE directories with metadata and multiple rule variants
  - cli/commands/: Modular command system (process, generate, search, stats, export, migrate)
  - reports/: Generated statistics and export outputs
- Data Processing:
  - Reuses existing backend processors for CVE fetching and analysis
  - File-based rule generation with multiple variants per CVE
  - CLI-driven bulk operations and incremental updates
- Storage Format:
  - metadata.json: CVE information, PoC data, processing history
  - rule_*.sigma: Multiple SIGMA rule variants (template, LLM, hybrid)
  - poc_analysis.json: Extracted exploit indicators and analysis
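The cves/YEAR/CVE-ID/ layout above can be navigated with a couple of small helpers. The following is a minimal sketch, not code from this repository: the function names cve_directory and rule_variants are hypothetical, and it assumes the year directory is always the four-digit year embedded in the CVE ID.

```python
import re
from pathlib import Path

# Hypothetical helpers for illustration; they are not part of the project's CLI.
CVE_ID_PATTERN = re.compile(r"^CVE-(\d{4})-\d{4,}$")


def cve_directory(root: Path, cve_id: str) -> Path:
    """Return root/YEAR/CVE-ID, taking YEAR from the CVE identifier."""
    normalized = cve_id.upper()
    match = CVE_ID_PATTERN.match(normalized)
    if match is None:
        raise ValueError(f"not a valid CVE ID: {cve_id!r}")
    return root / match.group(1) / normalized


def rule_variants(cve_dir: Path) -> list[str]:
    """List the rule_*.sigma files in one CVE directory, sorted by name."""
    return sorted(path.name for path in cve_dir.glob("rule_*.sigma"))
```

For example, cve_directory(Path("cves"), "CVE-2024-0001") resolves to cves/2024/CVE-2024-0001, the same path used by the ls commands in this file.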
Database Components (For Migration Only)
- Database Models: backend/database_models.py - SQLAlchemy models for data migration
- Legacy Support: Core data processors maintained for CLI integration
- Migration Tools: Complete CLI-based migration utilities from legacy database
Common Development Commands
Docker Compose Setup (Recommended)
# Quick start with Docker Compose
cp .env.example .env # Edit with your API keys (optional)
docker-compose up -d # Start all services (db, redis, CLI container)
# Access CLI in container
docker-compose exec sigma-cli bash
# Run CLI commands in container
docker-compose exec sigma-cli python cli/sigma_cli.py --help
docker-compose exec sigma-cli python cli/sigma_cli.py process year 2024
# Use Makefile shortcuts
make setup # Initial setup
make up # Start services
make shell # Access CLI shell
make cli CMD="stats overview" # Run specific CLI commands
Native CLI Installation (Alternative)
# Install CLI dependencies
pip install -r backend/requirements.txt
pip install click rich tabulate pyyaml
# Make CLI executable
chmod +x cli/sigma_cli.py
# Initialize configuration
./cli/sigma_cli.py config-init
# Test CLI installation
./cli/sigma_cli.py --help
CLI Primary Operations
# Process CVEs and generate SIGMA rules
./cli/sigma_cli.py process year 2024 # Process specific year
./cli/sigma_cli.py process cve CVE-2024-0001 # Process specific CVE
./cli/sigma_cli.py process bulk --start-year 2020 # Bulk process years
./cli/sigma_cli.py process incremental --days 7 # Process recent changes
# Generate rules for existing CVEs
./cli/sigma_cli.py generate cve CVE-2024-0001 --method all
./cli/sigma_cli.py generate regenerate --year 2024 --method llm
# Search and analyze
./cli/sigma_cli.py search cve "buffer overflow" --severity critical --has-poc
./cli/sigma_cli.py search rules "powershell" --method llm
# Statistics and reports
./cli/sigma_cli.py stats overview --year 2024
./cli/sigma_cli.py stats poc --year 2024
./cli/sigma_cli.py stats rules --method template
# Export data
./cli/sigma_cli.py export sigma ./output-rules --format yaml --year 2024
./cli/sigma_cli.py export metadata ./reports/cve-data.csv --format csv
Migration from Web Application
# Migrate existing database to file structure
./cli/sigma_cli.py migrate from-database --database-url "postgresql://user:pass@localhost:5432/db"
# Validate migrated data
./cli/sigma_cli.py migrate validate --year 2024
# Check migration statistics
./cli/sigma_cli.py stats overview
Database Migration Support
# If you have an existing PostgreSQL database with CVE data
export DATABASE_URL="postgresql://user:pass@localhost:5432/cve_sigma_db"
# Migrate database to CLI file structure
./cli/sigma_cli.py migrate from-database --database-url "$DATABASE_URL"
Development and Testing
# CLI with verbose logging
./cli/sigma_cli.py --verbose process year 2024
# Test individual commands
./cli/sigma_cli.py version
./cli/sigma_cli.py config-init
./cli/sigma_cli.py stats overview
# Check file structure
ls -la cves/2024/ # View processed CVEs
ls -la cves/2024/CVE-2024-0001/ # View individual CVE files
Key Configuration
Environment Variables (.env)
- NVD_API_KEY: Optional NVD API key for higher rate limits (5→50 requests/30s)
- GITHUB_TOKEN: Optional GitHub token for exploit analysis (enhances rule generation)
- OPENAI_API_KEY: Optional OpenAI API key for AI-enhanced SIGMA rule generation
- ANTHROPIC_API_KEY: Optional Anthropic API key for AI-enhanced SIGMA rule generation
- OLLAMA_BASE_URL: Optional Ollama base URL for local model AI-enhanced SIGMA rule generation
- LLM_PROVIDER: Optional LLM provider selection (openai, anthropic, ollama)
- LLM_MODEL: Optional LLM model selection (provider-specific)
- DATABASE_URL: PostgreSQL connection string
- REACT_APP_API_URL: Backend API URL for frontend
CLI Configuration
- Configuration File: ~/.sigma-cli/config.yaml (auto-created with config-init)
- Directory Structure:
  - cves/YEAR/CVE-ID/: Individual CVE data and rules
  - reports/: Generated statistics and exports
  - cli/: Command-line tool and modules
Database Connection (For Migration Only)
- PostgreSQL: localhost:5432 (if migrating from legacy database)
- Connection String: Set via DATABASE_URL environment variable
Enhanced API Endpoints
Bulk Processing
- POST /api/bulk-seed - Start complete bulk seeding (NVD + nomi-sec)
- POST /api/incremental-update - Update with NVD modified/recent feeds
- POST /api/sync-nomi-sec - Synchronize nomi-sec PoC data
- POST /api/regenerate-rules - Regenerate SIGMA rules with enhanced data
- GET /api/bulk-jobs - Get bulk processing job status
- GET /api/bulk-status - Get comprehensive system status
- GET /api/poc-stats - Get PoC-related statistics
Enhanced Data Access
- GET /api/stats - Enhanced statistics with PoC coverage
- GET /api/claude-status - Get Claude API availability status
- All existing CVE and SIGMA rule endpoints now include enhanced data fields
LLM-Enhanced Rule Generation
- POST /api/llm-enhanced-rules - Generate SIGMA rules using LLM AI analysis (supports multiple providers)
- GET /api/llm-status - Check LLM API availability and configuration for all providers
- POST /api/llm-switch - Switch between LLM providers and models
Code Architecture Details
CLI Structure (Primary)
- cli/sigma_cli.py: Main executable CLI with Click framework
- cli/commands/: Modular command system
  - base_command.py: Common functionality and file operations
  - process_commands.py: CVE processing and bulk operations
  - generate_commands.py: SIGMA rule generation
  - search_commands.py: Search and filtering
  - stats_commands.py: Statistics and reporting
  - export_commands.py: Data export in multiple formats
  - migrate_commands.py: Database migration tools
- cli/config/: Configuration management
- cli/README.md: Detailed CLI documentation
File-Based Storage Structure
- CVE Directories: cves/YEAR/CVE-ID/ with individual metadata and rule files
- Rule Variants: Multiple SIGMA files per CVE (template, LLM, hybrid)
- Metadata Format: JSON files with processing history and PoC data
- Reports: Generated statistics and export outputs
Backend Data Processors (Reused by CLI)
- database_models.py: SQLAlchemy models for data migration
- Data Processors: Core processing logic reused by CLI
  - nvd_bulk_processor.py: NVD JSON dataset processing
  - nomi_sec_client.py: nomi-sec PoC integration
  - enhanced_sigma_generator.py: SIGMA rule generation
  - llm_client.py: Multi-provider LLM integration
  - poc_analyzer.py: PoC content analysis
CLI-Based Data Processing Flow
- CVE Processing: NVD data fetch → File storage → PoC analysis → Metadata generation
- Rule Generation: Template/LLM/Hybrid generation → Multiple rule variants → File storage
- Search & Analysis: File-based searching → Statistics generation → Export capabilities
- Migration Support: Database export → File conversion → Validation → Cleanup
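The "file-based searching" step in the flow above amounts to walking the cves/ tree and reading each metadata.json. Here is a rough sketch of that idea. The function search_cves is hypothetical, and the "description" field is an assumption, since the metadata schema is not spelled out in this file.

```python
import json
from pathlib import Path
from typing import Iterator


def search_cves(root: Path, keyword: str) -> Iterator[str]:
    """Yield CVE IDs whose metadata.json description contains keyword."""
    needle = keyword.lower()
    # Layout described in this document: root/YEAR/CVE-ID/metadata.json
    for metadata_path in sorted(root.glob("*/CVE-*/metadata.json")):
        try:
            metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed files instead of aborting
        if not isinstance(metadata, dict):
            continue
        # Assumption: the CVE text lives under a "description" key.
        description = str(metadata.get("description", ""))
        if needle in description.lower():
            yield metadata_path.parent.name
```

Because the data lives in plain files, a search like this needs no database connection, which is the main point of the file-based design.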
Legacy Web Processing Flow (For Reference)
- Bulk Seeding: NVD JSON downloads → Database storage → nomi-sec PoC sync → Enhanced rule generation
- Incremental Updates: NVD modified feeds → Update existing data → Sync new PoCs
- Rule Enhancement: PoC analysis → Indicator extraction → Template selection → Enhanced SIGMA rule
- LLM-Enhanced Generation: PoC content analysis → Multi-provider LLM processing → Advanced SIGMA rule creation
Development Notes
Enhanced Rule Generation Logic
The application now uses an advanced rule generation process:
- CVE Analysis: Extract metadata from NVD bulk data
- PoC Quality Assessment: nomi-sec PoC analysis with star count, recency, quality tiers
- Advanced Indicator Extraction: Processes, files, network, registry, commands from PoC repositories
- Template Selection: Smart template matching based on PoC indicators and CVE characteristics
- Enhanced Rule Population: Incorporate real exploit indicators with quality scoring
- MITRE ATT&CK Mapping: Automatic technique identification based on indicators
- LLM AI Enhancement: Optional multi-provider LLM integration for intelligent rule generation from PoC code analysis
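To make the indicator extraction step more concrete, here is a deliberately simple sketch that pulls process names, URLs, and registry keys out of PoC source text with regular expressions. It is an illustration only: the patterns and the extract_indicators name are assumptions, and the project's poc_analyzer.py may work quite differently.

```python
import re

# Illustrative patterns only; real PoC analysis would need far more coverage.
INDICATOR_PATTERNS = {
    "processes": re.compile(r"\b[\w.-]+\.exe\b", re.IGNORECASE),
    "network": re.compile(r"https?://[^\s'\"<>]+", re.IGNORECASE),
    "registry": re.compile(r"\bHK(?:LM|CU|CR|U|CC)\\[^\s'\"]+", re.IGNORECASE),
}


def extract_indicators(poc_source: str) -> dict[str, list[str]]:
    """Return the unique matches for each indicator category, sorted."""
    return {
        category: sorted(set(pattern.findall(poc_source)))
        for category, pattern in INDICATOR_PATTERNS.items()
    }
```

Indicators collected this way could then feed template selection and rule population, as described in the list above.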
Quality Tiers
- Excellent (80+ points): High star count, recent updates, detailed descriptions
- Good (60-79 points): Moderate quality indicators
- Fair (40-59 points): Basic PoC with some quality indicators
- Poor (20-39 points): Minimal quality indicators
- Very Poor (<20 points): Low-quality PoCs
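The tier boundaries above map directly onto a small lookup function. This sketch uses the thresholds listed here; the function name and the lowercase tier labels are assumptions, not identifiers from the codebase.

```python
def quality_tier(score: float) -> str:
    """Map a PoC quality score to the tier names used in this document."""
    if score >= 80:
        return "excellent"
    if score >= 60:
        return "good"
    if score >= 40:
        return "fair"
    if score >= 20:
        return "poor"
    return "very_poor"
```

Checking from the highest threshold down keeps each boundary in one place, so a score of exactly 60 lands in "good" and 59 in "fair".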
Multi-Provider LLM Integration Features
- Multiple LLM Providers: Support for OpenAI, Anthropic, and Ollama (local models)
- Dynamic Provider Switching: Switch between providers and models through UI or API
- Intelligent Code Analysis: LLMs analyze actual exploit code from PoC repositories
- Advanced Rule Generation: Creates sophisticated SIGMA rules with proper syntax and logic
- Contextual Understanding: Interprets CVE descriptions and maps them to appropriate detection patterns
- Automatic Validation: Generated rules are validated for SIGMA syntax compliance
- Fallback Mechanism: Automatically falls back to template-based generation if LLM is unavailable
- Enhanced Metadata: Rules include generation method tracking for quality assessment
- LangChain Integration: Uses LangChain for robust LLM integration and prompt management
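The validation and fallback behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions: the generator callables are hypothetical stand-ins for the real LLM and template generators, the "generation_method" key is an assumed name for the method tracking field, and the validity check covers only the basic structure of a SIGMA rule (title, logsource, and a detection section with a condition).

```python
from typing import Callable, Optional

REQUIRED_FIELDS = ("title", "logsource", "detection")


def is_valid_rule(rule: object) -> bool:
    """Structural check only: required keys plus a detection condition."""
    if not isinstance(rule, dict):
        return False
    if not all(field in rule for field in REQUIRED_FIELDS):
        return False
    detection = rule["detection"]
    return isinstance(detection, dict) and "condition" in detection


def generate_rule(
    cve_id: str,
    llm_generate: Optional[Callable[[str], dict]],
    template_generate: Callable[[str], dict],
) -> dict:
    """Prefer the LLM result; fall back to the template generator."""
    if llm_generate is not None:
        try:
            rule = llm_generate(cve_id)
            if is_valid_rule(rule):
                return {**rule, "generation_method": "llm"}
        except Exception:
            pass  # provider unavailable or errored: use the template path
    return {**template_generate(cve_id), "generation_method": "template"}
```

An invalid LLM result is treated the same as an unavailable provider, so callers always receive a rule tagged with the method that actually produced it.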
Supported LLM Providers and Models
OpenAI
- API Key: Set OPENAI_API_KEY environment variable
- Supported Models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
- Default Model: gpt-4o-mini
- Rate Limits: Based on OpenAI API limits
Anthropic
- API Key: Set ANTHROPIC_API_KEY environment variable
- Supported Models: claude-3-5-sonnet-20241022, claude-3-haiku-20240307, claude-3-opus-20240229
- Default Model: claude-3-5-sonnet-20241022
- Rate Limits: Based on Anthropic API limits
Ollama (Local Models)
- Setup: Install Ollama locally and set OLLAMA_BASE_URL (default: http://localhost:11434)
- Supported Models: llama3.2, codellama, mistral, llama2 (any Ollama-compatible model)
- Default Model: llama3.2
- Rate Limits: No external API limits (local processing)
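Provider and model selection from LLM_PROVIDER and LLM_MODEL could be resolved along these lines. The default models are the ones listed above; the resolve_llm function itself is a hypothetical sketch, and the actual llm_client.py may handle a missing provider differently.

```python
import os
from typing import Mapping, Optional

# Default models as documented in this file.
DEFAULT_MODELS = {
    "openai": "gpt-4o-mini",
    "anthropic": "claude-3-5-sonnet-20241022",
    "ollama": "llama3.2",
}


def resolve_llm(env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Return (provider, model), using LLM_MODEL or the provider's default."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in DEFAULT_MODELS:
        supported = ", ".join(sorted(DEFAULT_MODELS))
        raise ValueError(f"LLM_PROVIDER must be one of: {supported}")
    model = env.get("LLM_MODEL", "").strip() or DEFAULT_MODELS[provider]
    return provider, model
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.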
Testing and Validation
- Frontend tests: npm test (in frontend directory)
- Backend testing: Use standalone scripts for bulk operations
- API testing: Use /docs endpoint for Swagger UI
- Task Monitoring: Monitor via Flower dashboard at http://localhost:5555
- Celery Tasks: Use celery -A celery_config worker --loglevel=info for debugging
Security Considerations
- API Keys: Store NVD and GitHub tokens in environment variables
- PoC Analysis: Automated analysis of curated PoC repositories (safer than raw GitHub search)
- Rate Limiting: Built-in rate limiting for external APIs
- Data Validation: Enhanced validation for bulk data processing
- Audit Trail: Job tracking for all bulk operations
Troubleshooting
Common Issues
- Bulk Processing Failures: Check /api/bulk-jobs for detailed error messages
- NVD Data Download Issues: Verify NVD API key and network connectivity
- nomi-sec API Timeouts: Built-in retry logic, check network connectivity
- Frontend build errors: Run npm install in frontend directory
- Database schema changes: Restart backend to auto-create new tables
- Memory issues during bulk processing: Monitor system resources, consider smaller batch sizes
Enhanced Rate Limits
- NVD API: 5 requests/30s (no key) → 50 requests/30s (with key)
- nomi-sec API: 1 request/second (built-in rate limiting)
- GitHub API (fallback): 60 requests/hour (no token) → 5000 requests/hour (with token)
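A client can stay within the NVD limits above with a sliding-window limiter. The sketch below is one way to do it, not the project's built-in rate limiting: the class and function names are hypothetical, and the clock is injectable purely so the behavior can be checked without sleeping.

```python
import time
from collections import deque
from typing import Callable, Optional


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._timestamps = deque()  # times of requests still inside the window

    def wait_time(self) -> float:
        """Seconds until the next request is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Drop requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._timestamps[0])

    def record(self) -> None:
        """Register that a request was just sent."""
        self._timestamps.append(self._clock())


def nvd_limiter(api_key: Optional[str],
                clock: Callable[[], float] = time.monotonic) -> SlidingWindowLimiter:
    """5 requests/30s without an API key, 50 requests/30s with one."""
    return SlidingWindowLimiter(50 if api_key else 5, 30.0, clock)
```

A caller would sleep for wait_time() seconds when it is non-zero, send the request, and then call record().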
Performance Optimization
- Bulk Processing: Start with recent years (2020+) for faster initial setup
- PoC Sync: Use smaller batch sizes (50) for better stability
- Rule Generation: Monitor quality scores to prioritize high-value PoCs
- Database: Ensure proper indexing on CVE ID and PoC fields
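The batch-size advice above can be applied with a small chunking helper. This is a generic sketch: in_batches is a hypothetical name, and the default of 50 simply mirrors the PoC sync recommendation in the list.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], batch_size: int = 50) -> Iterator[list[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Processing one batch at a time keeps memory use bounded and limits how much work is lost if a single batch fails.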
Monitoring
- Frontend: Use Bulk Jobs tab for real-time progress monitoring
- Backend logs: docker-compose logs -f backend
- Job status: Check /api/bulk-status for comprehensive system health
- Database: Monitor PoC coverage percentage and rule enhancement progress