Commit graph

5 commits

Author SHA1 Message Date
eca51167af FEATURE: Add Docker Compose support for CLI application with comprehensive usage documentation
This commit adds complete Docker Compose support to the CLI application, making it easy to run
the SIGMA rule generator in a containerized environment:

DOCKER INFRASTRUCTURE:
- docker-compose.yml: Complete service orchestration (CLI app, PostgreSQL, Redis, optional Ollama)
- Dockerfile: Optimized CLI application container with all dependencies
- init.sql: Database initialization for PostgreSQL
- .env.example: Updated environment configuration for both Docker and native setups
- Makefile: Convenient commands for Docker operations (setup, up, down, shell, cli execution)

DOCUMENTATION UPDATES:
- README.md: Comprehensive Docker vs Native comparison with detailed usage examples
- CLAUDE.md: Updated project guidance with Docker Compose as recommended approach
- Added step-by-step setup instructions for both deployment methods
- Included command examples for both Docker Compose and native execution

DOCKER SERVICES:
- sigma-cli: Main CLI application container with volume mounts for data persistence
- db: PostgreSQL database for legacy migrations and data processing
- redis: Redis cache for performance optimization
- ollama: Optional local LLM service (profile-based)

DATA PERSISTENCE:
- Host-mounted directories: ./cves/, ./reports/, ./logs/, ./backend/templates/
- Named volumes: postgres_data, redis_data, ollama_data
- Complete data preservation between container restarts

This provides users with multiple deployment options:
1. Quick Docker Compose setup (recommended for testing/evaluation)
2. Native installation (recommended for production/development)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-21 13:52:28 -05:00
de30d4ce99 CLEANUP: Remove legacy web application components and streamline for CLI-first architecture
This commit completes the transformation to a CLI-first SIGMA rule generator by removing all legacy web application components:

REMOVED COMPONENTS:
- Frontend React application (frontend/ directory)
- Docker Compose web orchestration (docker-compose.yml, Dockerfiles)
- FastAPI web backend (main.py, celery_config.py, bulk_seeder.py)
- Web-specific task schedulers and executors
- Initialization scripts for web deployment (start.sh, init.sql, Makefile)

SIMPLIFIED ARCHITECTURE:
- Created backend/database_models.py for migration-only database access
- Updated CLI commands to use simplified database models
- Retained core processing modules (sigma generator, PoC clients, NVD processor)
- Fixed import paths in CLI migration and process commands

The application now operates as a streamlined CLI tool with file-based SIGMA rule storage,
eliminating web application complexity while maintaining all core CVE processing capabilities.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-21 13:24:38 -05:00
e579c91b5e MAJOR: Transform web application to professional CLI-based SIGMA rule generator
🎉 **Architecture Transformation (v2.0)**
- Complete migration from web app to professional CLI tool
- File-based SIGMA rule management system
- Git-friendly directory structure organized by year/CVE-ID
- Multiple rule variants per CVE (template, LLM, hybrid)

**New CLI System**
- Professional command-line interface with Click framework
- 6 command groups: process, generate, search, stats, export, migrate
- Modular command architecture for maintainability
- Comprehensive help system and configuration management

📁 **File-Based Storage Architecture**
- Individual CVE directories: cves/YEAR/CVE-ID/
- Multiple SIGMA rule variants per CVE
- JSON metadata with processing history and PoC data
- Native YAML files perfect for version control
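
As an illustration of the cves/YEAR/CVE-ID/ layout, the sketch below derives a CVE's directory from its identifier. The helper name `cve_directory`, the `root` default, and the ID pattern are assumptions made for this example and are not taken from the repository.

```python
import re
from pathlib import Path

# Matches IDs such as CVE-2024-12345; the first group is the year.
CVE_ID_PATTERN = re.compile(r"^CVE-(\d{4})-(\d{4,})$")


def cve_directory(cve_id: str, root: Path = Path("cves")) -> Path:
    """Return root/YEAR/CVE-ID for a CVE identifier (hypothetical helper)."""
    normalized = cve_id.strip().upper()
    match = CVE_ID_PATTERN.match(normalized)
    if match is None:
        raise ValueError(f"not a CVE identifier: {cve_id!r}")
    return root / match.group(1) / normalized
```

Deriving the year from the identifier rather than from a publication date keeps the path stable even if metadata is updated later.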

🚀 **Core CLI Commands**
- process: CVE processing and bulk operations
- generate: SIGMA rule generation with multiple methods
- search: Advanced CVE and rule searching with filters
- stats: Comprehensive statistics and analytics
- export: Multiple output formats for different workflows
- migrate: Database-to-file migration tools

🔧 **Migration Support**
- Complete migration utilities from web database
- Data validation and integrity checking
- Backward compatibility with existing processors
- Legacy web interface maintained for transition

📊 **Enhanced Features**
- Advanced search with complex filtering (severity, PoC presence, etc.)
- Multi-format exports (YAML, JSON, CSV)
- Comprehensive statistics and coverage reports
- File-based rule versioning and management

🎯 **Production Benefits**
- No database dependency - runs anywhere
- Perfect for cybersecurity teams using git workflows
- Direct integration with SIGMA ecosystems
- Portable architecture for CI/CD pipelines
- Multiple rule variants for different detection scenarios

📝 **Documentation Updates**
- Complete README rewrite for CLI-first approach
- Updated CLAUDE.md with new architecture details
- Detailed CLI documentation with examples
- Migration guides and troubleshooting

**Perfect for security teams wanting production-ready SIGMA rules with version control! 🛡️**

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-21 13:11:03 -05:00
d51f3ea402 Migrate task tracking from BulkProcessingJob to Celery-based monitoring
- Remove BulkProcessingJob model and related endpoints from main.py
- Update CLAUDE.md to reference Flower dashboard for task monitoring
- Simplify enhanced_sigma_generator.py to use unified LLM client
- Remove job tracking logic from mcdevitt_poc_client.py
- Enhance CVE API with search and pagination support
- Update setup_ollama_with_sigma.py with improved checkpoint handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-21 09:23:26 -05:00
9bde1395bf Optimize performance and migrate to Celery-based scheduling
This commit introduces major performance improvements and migrates from custom job scheduling to Celery Beat for better reliability and scalability.

### 🚀 Performance Optimizations

**CVE2CAPEC Client Performance (Fixed startup blocking)**
- Implement lazy loading with 24-hour cache for CVE2CAPEC mappings
- Add background task for CVE2CAPEC sync (data_sync_tasks.sync_cve2capec)
- Remove blocking data fetch during client initialization
- API endpoint: POST /api/sync-cve2capec
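
A minimal sketch of the lazy-loading pattern described above, assuming the mappings can be fetched by a single loader callable. `LazyMappingCache`, its parameters, and the injected clock are hypothetical names chosen for this example; the actual client may be structured differently.

```python
import time
from typing import Callable, Dict, Optional

CACHE_TTL_SECONDS = 24 * 60 * 60  # the 24-hour expiry mentioned above


class LazyMappingCache:
    """Loads data on first access and reloads it once the TTL has passed."""

    def __init__(
        self,
        loader: Callable[[], Dict[str, list]],
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Nothing is fetched here, so constructing the client never blocks.
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._data: Optional[Dict[str, list]] = None
        self._loaded_at = 0.0

    def get(self) -> Dict[str, list]:
        now = self._clock()
        expired = self._data is None or now - self._loaded_at >= self._ttl
        if expired:
            # First access, or the cached copy is older than the TTL.
            self._data = self._loader()
            self._loaded_at = now
        return self._data
```

The first call to `get()` still pays the fetch cost, which is presumably why the commit pairs lazy loading with a background sync task.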

**ExploitDB Client Performance (Fixed webapp request blocking)**
- Implement global file index cache to prevent rebuilding on every request
- Add lazy loading with 24-hour cache expiry for 46K+ exploit index
- Background task for index building (data_sync_tasks.build_exploitdb_index)
- API endpoint: POST /api/build-exploitdb-index

### 🔄 Celery Migration & Scheduling

**Celery Beat Integration**
- Migrate from custom job scheduler to Celery Beat for reliability
- Remove 'finetuned' LLM provider (logic moved to Ollama container)
- Optimized daily workflow with proper timing and dependencies

**New Celery Tasks Structure**
- tasks/bulk_tasks.py - NVD bulk processing and SIGMA generation
- tasks/data_sync_tasks.py - All data synchronization tasks
- tasks/maintenance_tasks.py - System maintenance and cleanup
- tasks/sigma_tasks.py - SIGMA rule generation tasks

**Daily Schedule (Optimized)**
```
1:00 AM  → Weekly cleanup (Sundays)
1:30 AM  → Daily result cleanup
2:00 AM  → NVD incremental update
3:00 AM  → CISA KEV sync
3:15 AM  → Nomi-sec PoC sync
3:30 AM  → GitHub PoC sync
3:45 AM  → ExploitDB sync
4:00 AM  → CVE2CAPEC MITRE ATT&CK sync
4:15 AM  → ExploitDB index rebuild
5:00 AM  → Reference content sync
8:00 AM  → SIGMA rule generation
9:00 AM  → LLM-enhanced SIGMA generation
Every 15min → Health checks
```
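
The schedule above could be expressed as data along these lines. This is a sketch, not the project's celery_config.py: only `data_sync_tasks.sync_cve2capec` and `data_sync_tasks.build_exploitdb_index` are names from this commit, every other task name is a placeholder, and the cron fields are kept as plain dicts so the example runs without Celery installed. With Celery, each dict would be passed as keyword arguments to `celery.schedules.crontab`.

```python
from typing import Any, Dict, List, Tuple

# (task name, crontab keyword arguments). day_of_week=0 means Sunday.
SCHEDULE: List[Tuple[str, Dict[str, Any]]] = [
    ("maintenance_tasks.weekly_cleanup", {"hour": 1, "minute": 0, "day_of_week": 0}),
    ("maintenance_tasks.daily_result_cleanup", {"hour": 1, "minute": 30}),
    ("bulk_tasks.nvd_incremental_update", {"hour": 2, "minute": 0}),
    ("data_sync_tasks.sync_cisa_kev", {"hour": 3, "minute": 0}),
    ("data_sync_tasks.sync_nomi_sec", {"hour": 3, "minute": 15}),
    ("data_sync_tasks.sync_github_poc", {"hour": 3, "minute": 30}),
    ("data_sync_tasks.sync_exploitdb", {"hour": 3, "minute": 45}),
    ("data_sync_tasks.sync_cve2capec", {"hour": 4, "minute": 0}),
    ("data_sync_tasks.build_exploitdb_index", {"hour": 4, "minute": 15}),
    ("data_sync_tasks.sync_references", {"hour": 5, "minute": 0}),
    ("sigma_tasks.generate_rules", {"hour": 8, "minute": 0}),
    ("sigma_tasks.generate_llm_rules", {"hour": 9, "minute": 0}),
    ("maintenance_tasks.health_check", {"minute": "*/15"}),
]


def build_beat_schedule(
    entries: List[Tuple[str, Dict[str, Any]]],
) -> Dict[str, Dict[str, Any]]:
    """Return a dict shaped like Celery's beat_schedule setting."""
    schedule: Dict[str, Dict[str, Any]] = {}
    for task, cron in entries:
        # In a real config, "schedule" would be crontab(**cron).
        schedule[task] = {"task": task, "schedule": dict(cron)}
    return schedule
```

Spacing the sync jobs 15 minutes apart, as the schedule does, orders them by start time only; it does not guarantee that one job has finished before the next begins.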

### 🐳 Docker & Infrastructure

**Enhanced Docker Setup**
- Ollama setup with integrated SIGMA model creation (setup_ollama_with_sigma.py)
- Initial database population check and trigger (initial_setup.py)
- Proper service dependencies and health checks
- Remove manual post-rebuild script requirements

**Service Architecture**
- Celery worker with 4-queue system (default, bulk_processing, sigma_generation, data_sync)
- Flower monitoring dashboard (localhost:5555)
- Redis as message broker and result backend
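
One way the four queues might map onto the task modules is sketched below. The pairing of module to queue is an assumption inferred from the names in this commit, and `queue_for` is a hypothetical helper; Celery's own `task_routes` setting accepts similar glob patterns, but this example uses only the standard library so that it runs on its own.

```python
from fnmatch import fnmatchcase

# Assumed pairing of task modules to the queues named above.
QUEUE_ROUTES = {
    "bulk_tasks.*": "bulk_processing",
    "sigma_tasks.*": "sigma_generation",
    "data_sync_tasks.*": "data_sync",
}
DEFAULT_QUEUE = "default"


def queue_for(task_name: str) -> str:
    """Return the queue for a task name, falling back to the default queue."""
    for pattern, queue in QUEUE_ROUTES.items():
        if fnmatchcase(task_name, pattern):
            return queue
    return DEFAULT_QUEUE
```

Under this assumption, maintenance tasks have no dedicated queue and land on `default`.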

### 🎯 API Improvements

**Background Task Endpoints**
- GitHub PoC sync now uses Celery (was blocking backend)
- All sync operations return task IDs and monitoring URLs
- Consistent error handling and progress tracking

**New Endpoints**
- POST /api/sync-cve2capec - CVE2CAPEC mapping sync
- POST /api/build-exploitdb-index - ExploitDB index rebuild

### 📁 Cleanup

**Removed Files**
- fix_sigma_model.sh (replaced by setup_ollama_with_sigma.py)
- Various test_* and debug_* files no longer needed
- Old training scripts related to removed 'finetuned' provider
- Utility scripts replaced by Docker services

### 🔧 Configuration

**Key Files Added/Modified**
- backend/celery_config.py - Complete Celery configuration
- backend/initial_setup.py - First-boot database population
- backend/setup_ollama_with_sigma.py - Integrated Ollama setup
- CLAUDE.md - Project documentation and development guide

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-17 18:58:47 -05:00