This commit adds complete Docker Compose support to the CLI application, making it easy to run the SIGMA rule generator in a containerized environment:

DOCKER INFRASTRUCTURE:
- docker-compose.yml: Complete service orchestration (CLI app, PostgreSQL, Redis, optional Ollama)
- Dockerfile: Optimized CLI application container with all dependencies
- init.sql: Database initialization for PostgreSQL
- .env.example: Updated environment configuration for both Docker and native setups
- Makefile: Convenient commands for Docker operations (setup, up, down, shell, cli execution)

DOCUMENTATION UPDATES:
- README.md: Comprehensive Docker vs Native comparison with detailed usage examples
- CLAUDE.md: Updated project guidance with Docker Compose as recommended approach
- Added step-by-step setup instructions for both deployment methods
- Included command examples for both Docker Compose and native execution

DOCKER SERVICES:
- sigma-cli: Main CLI application container with volume mounts for data persistence
- db: PostgreSQL database for legacy migrations and data processing
- redis: Redis cache for performance optimization
- ollama: Optional local LLM service (profile-based)

DATA PERSISTENCE:
- Host-mounted directories: ./cves/, ./reports/, ./logs/, ./backend/templates/
- Named volumes: postgres_data, redis_data, ollama_data
- Complete data preservation between container restarts

This provides users with multiple deployment options:
1. Quick Docker Compose setup (recommended for testing/evaluation)
2. Native installation (recommended for production/development)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview - CLI-Based Architecture (v2.0)

This is an enhanced CVE-SIGMA Auto Generator that has been **transformed from a web application to a professional CLI tool** with file-based SIGMA rule management. The system now supports:

1. **Bulk NVD Data Processing**: Downloads and processes complete NVD JSON datasets (2002-2025)
2. **nomi-sec PoC Integration**: Uses curated PoC data from github.com/nomi-sec/PoC-in-GitHub
3. **Enhanced SIGMA Rule Generation**: Creates intelligent rules based on real exploit indicators
4. **Comprehensive Database Seeding**: Supports both bulk and incremental data updates

## Architecture - CLI-Based System

### **Current Primary Architecture (v2.0)**
- **CLI Interface**: Professional command-line tool (`cli/sigma_cli.py`) with modular commands
- **File-Based Storage**: Git-friendly YAML and JSON files organized by year/CVE-ID
- **Directory Structure**:
  - `cves/YEAR/CVE-ID/`: Individual CVE directories with metadata and multiple rule variants
  - `cli/commands/`: Modular command system (process, generate, search, stats, export, migrate)
  - `reports/`: Generated statistics and export outputs
- **Data Processing**:
  - Reuses existing backend processors for CVE fetching and analysis
  - File-based rule generation with multiple variants per CVE
  - CLI-driven bulk operations and incremental updates
- **Storage Format**:
  - `metadata.json`: CVE information, PoC data, processing history
  - `rule_*.sigma`: Multiple SIGMA rule variants (template, LLM, hybrid)
  - `poc_analysis.json`: Extracted exploit indicators and analysis

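
The `cves/YEAR/CVE-ID/` layout described above can be derived from the CVE ID alone. The following is a minimal sketch of that mapping, not the project's actual code: the helper names `cve_directory` and `write_metadata` are hypothetical, and the metadata fields shown are placeholders rather than the real `metadata.json` schema.

```python
import json
import tempfile
from pathlib import Path


def cve_directory(base: Path, cve_id: str) -> Path:
    """Return base/YEAR/CVE-ID for an ID shaped like CVE-2024-0001."""
    parts = cve_id.upper().split("-")
    if len(parts) != 3 or parts[0] != "CVE" or not parts[1].isdigit():
        raise ValueError(f"not a CVE ID: {cve_id!r}")
    return base / parts[1] / "-".join(parts)


def write_metadata(base: Path, cve_id: str, metadata: dict) -> Path:
    """Create the CVE directory if needed and write metadata.json into it."""
    target = cve_directory(base, cve_id)
    target.mkdir(parents=True, exist_ok=True)
    path = target / "metadata.json"
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Write into a throwaway directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        written = write_metadata(Path(tmp), "CVE-2024-0001", {"cve_id": "CVE-2024-0001"})
        print(written.relative_to(tmp))
```

Keeping the year as its own directory level means a whole year can be listed, diffed, or exported with ordinary file tools.
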
### **Database Components (For Migration Only)**
- **Database Models**: `backend/database_models.py` - SQLAlchemy models for data migration
- **Legacy Support**: Core data processors maintained for CLI integration
- **Migration Tools**: Complete CLI-based migration utilities from legacy database

## Common Development Commands

### **Docker Compose Setup (Recommended)**
```bash
# Quick start with Docker Compose
cp .env.example .env  # Edit with your API keys (optional)
docker-compose up -d  # Start all services (db, redis, CLI container)

# Access CLI in container
docker-compose exec sigma-cli bash

# Run CLI commands in container
docker-compose exec sigma-cli python cli/sigma_cli.py --help
docker-compose exec sigma-cli python cli/sigma_cli.py process year 2024

# Use Makefile shortcuts
make setup                     # Initial setup
make up                        # Start services
make shell                     # Access CLI shell
make cli CMD="stats overview"  # Run specific CLI commands
```

### **Native CLI Installation (Alternative)**
```bash
# Install CLI dependencies
pip install -r backend/requirements.txt
pip install click rich tabulate pyyaml

# Make CLI executable
chmod +x cli/sigma_cli.py

# Initialize configuration
./cli/sigma_cli.py config-init

# Test CLI installation
./cli/sigma_cli.py --help
```

### **CLI Primary Operations**
```bash
# Process CVEs and generate SIGMA rules
./cli/sigma_cli.py process year 2024               # Process specific year
./cli/sigma_cli.py process cve CVE-2024-0001       # Process specific CVE
./cli/sigma_cli.py process bulk --start-year 2020  # Bulk process years
./cli/sigma_cli.py process incremental --days 7    # Process recent changes

# Generate rules for existing CVEs
./cli/sigma_cli.py generate cve CVE-2024-0001 --method all
./cli/sigma_cli.py generate regenerate --year 2024 --method llm

# Search and analyze
./cli/sigma_cli.py search cve "buffer overflow" --severity critical --has-poc
./cli/sigma_cli.py search rules "powershell" --method llm

# Statistics and reports
./cli/sigma_cli.py stats overview --year 2024
./cli/sigma_cli.py stats poc --year 2024
./cli/sigma_cli.py stats rules --method template

# Export data
./cli/sigma_cli.py export sigma ./output-rules --format yaml --year 2024
./cli/sigma_cli.py export metadata ./reports/cve-data.csv --format csv
```

### **Migration from Web Application**
```bash
# Migrate existing database to file structure
./cli/sigma_cli.py migrate from-database --database-url "postgresql://user:pass@localhost:5432/db"

# Validate migrated data
./cli/sigma_cli.py migrate validate --year 2024

# Check migration statistics
./cli/sigma_cli.py stats overview
```

### **Database Migration Support**
```bash
# If you have an existing PostgreSQL database with CVE data
export DATABASE_URL="postgresql://user:pass@localhost:5432/cve_sigma_db"

# Migrate database to CLI file structure
# (quote the variable so special characters in the password survive shell expansion)
./cli/sigma_cli.py migrate from-database --database-url "$DATABASE_URL"
```

### **Development and Testing**
```bash
# CLI with verbose logging
./cli/sigma_cli.py --verbose process year 2024

# Test individual commands
./cli/sigma_cli.py version
./cli/sigma_cli.py config-init
./cli/sigma_cli.py stats overview

# Check file structure
ls -la cves/2024/                # View processed CVEs
ls -la cves/2024/CVE-2024-0001/  # View individual CVE files
```

## Key Configuration

### Environment Variables (.env)
- `NVD_API_KEY`: Optional NVD API key for higher rate limits (5→50 requests/30s)
- `GITHUB_TOKEN`: Optional GitHub token for exploit analysis (enhances rule generation)
- `OPENAI_API_KEY`: Optional OpenAI API key for AI-enhanced SIGMA rule generation
- `ANTHROPIC_API_KEY`: Optional Anthropic API key for AI-enhanced SIGMA rule generation
- `OLLAMA_BASE_URL`: Optional Ollama base URL for local model AI-enhanced SIGMA rule generation
- `LLM_PROVIDER`: Optional LLM provider selection (openai, anthropic, ollama)
- `LLM_MODEL`: Optional LLM model selection (provider-specific)
- `DATABASE_URL`: PostgreSQL connection string
- `REACT_APP_API_URL`: Backend API URL for frontend

### CLI Configuration
- **Configuration File**: `~/.sigma-cli/config.yaml` (auto-created with `config-init`)
- **Directory Structure**:
  - `cves/YEAR/CVE-ID/`: Individual CVE data and rules
  - `reports/`: Generated statistics and exports
  - `cli/`: Command-line tool and modules

### Database Connection (For Migration Only)
- **PostgreSQL**: localhost:5432 (if migrating from legacy database)
- **Connection String**: Set via the `DATABASE_URL` environment variable

### Enhanced API Endpoints

#### Bulk Processing
- `POST /api/bulk-seed` - Start complete bulk seeding (NVD + nomi-sec)
- `POST /api/incremental-update` - Update with NVD modified/recent feeds
- `POST /api/sync-nomi-sec` - Synchronize nomi-sec PoC data
- `POST /api/regenerate-rules` - Regenerate SIGMA rules with enhanced data
- `GET /api/bulk-jobs` - Get bulk processing job status
- `GET /api/bulk-status` - Get comprehensive system status
- `GET /api/poc-stats` - Get PoC-related statistics

#### Enhanced Data Access
- `GET /api/stats` - Enhanced statistics with PoC coverage
- `GET /api/claude-status` - Get Claude API availability status
- All existing CVE and SIGMA rule endpoints now include enhanced data fields

#### LLM-Enhanced Rule Generation
- `POST /api/llm-enhanced-rules` - Generate SIGMA rules using LLM AI analysis (supports multiple providers)
- `GET /api/llm-status` - Check LLM API availability and configuration for all providers
- `POST /api/llm-switch` - Switch between LLM providers and models

## Code Architecture Details

### **CLI Structure (Primary)**
- **cli/sigma_cli.py**: Main executable CLI with Click framework
- **cli/commands/**: Modular command system
  - `base_command.py`: Common functionality and file operations
  - `process_commands.py`: CVE processing and bulk operations
  - `generate_commands.py`: SIGMA rule generation
  - `search_commands.py`: Search and filtering
  - `stats_commands.py`: Statistics and reporting
  - `export_commands.py`: Data export in multiple formats
  - `migrate_commands.py`: Database migration tools
- **cli/config/**: Configuration management
- **cli/README.md**: Detailed CLI documentation

### **File-Based Storage Structure**
- **CVE Directories**: `cves/YEAR/CVE-ID/` with individual metadata and rule files
- **Rule Variants**: Multiple SIGMA files per CVE (template, LLM, hybrid)
- **Metadata Format**: JSON files with processing history and PoC data
- **Reports**: Generated statistics and export outputs

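
Because each CVE directory can hold several rule variants, a consumer usually needs to discover which ones exist. Here is a small sketch of one way to do that; the function `rule_variants` is hypothetical, and the assumption that variant files are named `rule_<variant>.sigma` (for example `rule_template.sigma`) is inferred from the `rule_*.sigma` pattern rather than confirmed by the code.

```python
import tempfile
from pathlib import Path


def rule_variants(cve_dir: Path) -> dict[str, Path]:
    """Map variant name -> file for every rule_*.sigma in one CVE directory."""
    variants: dict[str, Path] = {}
    for path in sorted(cve_dir.glob("rule_*.sigma")):
        # "rule_template.sigma" -> "template" (naming scheme is an assumption)
        variants[path.stem.removeprefix("rule_")] = path
    return variants


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cve_dir = Path(tmp)
        for name in ("rule_template.sigma", "rule_llm.sigma", "metadata.json"):
            (cve_dir / name).write_text("", encoding="utf-8")
        print(list(rule_variants(cve_dir)))
```
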
### **Backend Data Processors (Reused by CLI)**
- **database_models.py**: SQLAlchemy models for data migration
- **Data Processors**: Core processing logic reused by CLI
  - `nvd_bulk_processor.py`: NVD JSON dataset processing
  - `nomi_sec_client.py`: nomi-sec PoC integration
  - `enhanced_sigma_generator.py`: SIGMA rule generation
  - `llm_client.py`: Multi-provider LLM integration
  - `poc_analyzer.py`: PoC content analysis

### **CLI-Based Data Processing Flow**
1. **CVE Processing**: NVD data fetch → File storage → PoC analysis → Metadata generation
2. **Rule Generation**: Template/LLM/Hybrid generation → Multiple rule variants → File storage
3. **Search & Analysis**: File-based searching → Statistics generation → Export capabilities
4. **Migration Support**: Database export → File conversion → Validation → Cleanup

### **Legacy Web Processing Flow (For Reference)**
1. **Bulk Seeding**: NVD JSON downloads → Database storage → nomi-sec PoC sync → Enhanced rule generation
2. **Incremental Updates**: NVD modified feeds → Update existing data → Sync new PoCs
3. **Rule Enhancement**: PoC analysis → Indicator extraction → Template selection → Enhanced SIGMA rule
4. **LLM-Enhanced Generation**: PoC content analysis → Multi-provider LLM processing → Advanced SIGMA rule creation

## Development Notes

### Enhanced Rule Generation Logic
The application now uses an advanced rule generation process:
1. **CVE Analysis**: Extract metadata from NVD bulk data
2. **PoC Quality Assessment**: nomi-sec PoC analysis with star count, recency, quality tiers
3. **Advanced Indicator Extraction**: Processes, files, network, registry, commands from PoC repositories
4. **Template Selection**: Smart template matching based on PoC indicators and CVE characteristics
5. **Enhanced Rule Population**: Incorporate real exploit indicators with quality scoring
6. **MITRE ATT&CK Mapping**: Automatic technique identification based on indicators
7. **LLM AI Enhancement**: Optional multi-provider LLM integration for intelligent rule generation from PoC code analysis

### Quality Tiers
- **Excellent** (80+ points): High star count, recent updates, detailed descriptions
- **Good** (60-79 points): Moderate quality indicators
- **Fair** (40-59 points): Basic PoC with some quality indicators
- **Poor** (20-39 points): Minimal quality indicators
- **Very Poor** (<20 points): Low-quality PoCs

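
The tier boundaries above translate directly into a threshold lookup. This is an illustrative sketch only: the function name `quality_tier` is hypothetical, and how the score itself is computed (star count, recency, description detail) is not shown here.

```python
def quality_tier(score: float) -> str:
    """Map a PoC quality score to the tier names listed in this document."""
    thresholds = [
        (80, "Excellent"),
        (60, "Good"),
        (40, "Fair"),
        (20, "Poor"),
    ]
    for minimum, tier in thresholds:
        if score >= minimum:
            return tier
    return "Very Poor"


if __name__ == "__main__":
    for sample in (95, 79, 40, 20, 5):
        print(sample, quality_tier(sample))
```
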
### Multi-Provider LLM Integration Features
- **Multiple LLM Providers**: Support for OpenAI, Anthropic, and Ollama (local models)
- **Dynamic Provider Switching**: Switch between providers and models through UI or API
- **Intelligent Code Analysis**: LLMs analyze actual exploit code from PoC repositories
- **Advanced Rule Generation**: Creates sophisticated SIGMA rules with proper syntax and logic
- **Contextual Understanding**: Interprets CVE descriptions and maps them to appropriate detection patterns
- **Automatic Validation**: Generated rules are validated for SIGMA syntax compliance
- **Fallback Mechanism**: Automatically falls back to template-based generation if LLM is unavailable
- **Enhanced Metadata**: Rules include generation method tracking for quality assessment
- **LangChain Integration**: Uses LangChain for robust LLM integration and prompt management

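
The fallback mechanism and generation-method tracking listed above can be pictured as a thin wrapper around two generators. The sketch below assumes both generators are plain callables taking a CVE ID; `generate_with_fallback` and the stub generators are hypothetical names, not functions from `llm_client.py` or `enhanced_sigma_generator.py`.

```python
from typing import Callable, Optional, Tuple

RuleGenerator = Callable[[str], str]


def generate_with_fallback(
    cve_id: str,
    llm_generate: Optional[RuleGenerator],
    template_generate: RuleGenerator,
) -> Tuple[str, str]:
    """Return (rule_text, method), preferring the LLM and falling back to templates."""
    if llm_generate is not None:
        try:
            return llm_generate(cve_id), "llm"
        except Exception:
            # Any LLM failure (missing key, timeout, bad response) triggers the fallback.
            pass
    return template_generate(cve_id), "template"


if __name__ == "__main__":
    def broken_llm(cve_id: str) -> str:
        raise RuntimeError("provider unavailable")

    def template(cve_id: str) -> str:
        return f"title: Detection for {cve_id}"

    print(generate_with_fallback("CVE-2024-0001", broken_llm, template))
```

Returning the method alongside the rule is one simple way to record which path produced it, so the value can be stored with the rule's metadata.
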
### Supported LLM Providers and Models

#### OpenAI
- **API Key**: Set `OPENAI_API_KEY` environment variable
- **Supported Models**: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
- **Default Model**: gpt-4o-mini
- **Rate Limits**: Based on OpenAI API limits

#### Anthropic
- **API Key**: Set `ANTHROPIC_API_KEY` environment variable
- **Supported Models**: claude-3-5-sonnet-20241022, claude-3-haiku-20240307, claude-3-opus-20240229
- **Default Model**: claude-3-5-sonnet-20241022
- **Rate Limits**: Based on Anthropic API limits

#### Ollama (Local Models)
- **Setup**: Install Ollama locally and set `OLLAMA_BASE_URL` (default: http://localhost:11434)
- **Supported Models**: llama3.2, codellama, mistral, llama2 (any Ollama-compatible model)
- **Default Model**: llama3.2
- **Rate Limits**: No external API limits (local processing)

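
To show how `LLM_PROVIDER`, `LLM_MODEL`, and `OLLAMA_BASE_URL` could combine with the defaults listed above, here is a hedged sketch of a settings resolver. The function `resolve_llm_settings` is hypothetical, and treating an unset `LLM_PROVIDER` as "no LLM configured" is an assumption; the project may choose a default provider instead.

```python
import os
from typing import Mapping, Optional

# Defaults copied from this document, not read from the project's code.
DEFAULT_MODELS = {
    "openai": "gpt-4o-mini",
    "anthropic": "claude-3-5-sonnet-20241022",
    "ollama": "llama3.2",
}
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def resolve_llm_settings(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Resolve provider, model, and (for Ollama) base URL from environment variables."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if not provider:
        return None  # assumption: no provider selected means template-only generation
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    settings = {
        "provider": provider,
        "model": env.get("LLM_MODEL") or DEFAULT_MODELS[provider],
    }
    if provider == "ollama":
        settings["base_url"] = env.get("OLLAMA_BASE_URL") or DEFAULT_OLLAMA_URL
    return settings


if __name__ == "__main__":
    print(resolve_llm_settings({"LLM_PROVIDER": "ollama"}))
```
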
### Testing and Validation
- **Frontend tests**: `npm test` (in frontend directory)
- **Backend testing**: Use standalone scripts for bulk operations
- **API testing**: Use `/docs` endpoint for Swagger UI
- **Task Monitoring**: Monitor via Flower dashboard at http://localhost:5555
- **Celery Tasks**: Use `celery -A celery_config worker --loglevel=info` for debugging

### Security Considerations
- **API Keys**: Store NVD and GitHub tokens in environment variables
- **PoC Analysis**: Automated analysis of curated PoC repositories (safer than raw GitHub search)
- **Rate Limiting**: Built-in rate limiting for external APIs
- **Data Validation**: Enhanced validation for bulk data processing
- **Audit Trail**: Job tracking for all bulk operations

## Troubleshooting

### Common Issues
- **Bulk Processing Failures**: Check `/api/bulk-jobs` for detailed error messages
- **NVD Data Download Issues**: Verify NVD API key and network connectivity
- **nomi-sec API Timeouts**: Built-in retry logic, check network connectivity
- **Frontend build errors**: Run `npm install` in frontend directory
- **Database schema changes**: Restart backend to auto-create new tables
- **Memory issues during bulk processing**: Monitor system resources, consider smaller batch sizes

### Enhanced Rate Limits
- **NVD API**: 5 requests/30s (no key) → 50 requests/30s (with key)
- **nomi-sec API**: 1 request/second (built-in rate limiting)
- **GitHub API** (fallback): 60 requests/hour (no token) → 5000 requests/hour (with token)

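
One conservative way to stay under limits like these is to space requests evenly, so 5 requests/30s becomes one request every 6 seconds. The sketch below illustrates that idea and is not the project's built-in limiter; the class `MinIntervalLimiter` is hypothetical, and even spacing is stricter than a true rolling window because it never allows bursts.

```python
import time
from typing import Callable


class MinIntervalLimiter:
    """Enforce a minimum gap between calls, derived from 'N requests per window'."""

    def __init__(
        self,
        max_requests: int,
        per_seconds: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = per_seconds / max_requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds waited."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        self._next_allowed = now + delay + self.interval
        return delay


if __name__ == "__main__":
    nvd_without_key = MinIntervalLimiter(5, 30)   # one request every 6 seconds
    nvd_with_key = MinIntervalLimiter(50, 30)     # one request every 0.6 seconds
    print(nvd_without_key.interval, nvd_with_key.interval)
```

Call `wait()` immediately before each outbound request. The injectable `clock` and `sleep` parameters exist only so the limiter can be exercised in tests without real delays.
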
### Performance Optimization
- **Bulk Processing**: Start with recent years (2020+) for faster initial setup
- **PoC Sync**: Use smaller batch sizes (50) for better stability
- **Rule Generation**: Monitor quality scores to prioritize high-value PoCs
- **Database**: Ensure proper indexing on CVE ID and PoC fields

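
The batch-size advice above amounts to splitting a long list of CVE IDs into fixed-size chunks before syncing. A minimal sketch follows, with the hypothetical helper `batches` and a default size of 50 taken from the recommendation; how the project actually batches PoC sync work is not shown in this document.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int = 50) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


if __name__ == "__main__":
    cve_ids = [f"CVE-2024-{n:04d}" for n in range(1, 121)]
    print([len(chunk) for chunk in batches(cve_ids)])
```
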
### Monitoring
- **Frontend**: Use Bulk Jobs tab for real-time progress monitoring
- **Backend logs**: `docker-compose logs -f backend`
- **Job status**: Check `/api/bulk-status` for comprehensive system health
- **Database**: Monitor PoC coverage percentage and rule enhancement progress