Commit graph

23 commits

Author SHA1 Message Date
de30d4ce99 CLEANUP: Remove legacy web application components and streamline for CLI-first architecture
This commit completes the transformation to a CLI-first SIGMA rule generator by removing all legacy web application components:

REMOVED COMPONENTS:
- Frontend React application (frontend/ directory)
- Docker Compose web orchestration (docker-compose.yml, Dockerfiles)
- FastAPI web backend (main.py, celery_config.py, bulk_seeder.py)
- Web-specific task schedulers and executors
- Initialization scripts for web deployment (start.sh, init.sql, Makefile)

SIMPLIFIED ARCHITECTURE:
- Created backend/database_models.py for migration-only database access
- Updated CLI commands to use simplified database models
- Retained core processing modules (sigma generator, PoC clients, NVD processor)
- Fixed import paths in CLI migration and process commands

The application now operates as a streamlined CLI tool with file-based SIGMA rule storage,
eliminating web application complexity while maintaining all core CVE processing capabilities.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-21 13:24:38 -05:00
e579c91b5e MAJOR: Transform web application to professional CLI-based SIGMA rule generator
🎉 **Architecture Transformation (v2.0)**
- Complete migration from web app to professional CLI tool
- File-based SIGMA rule management system
- Git-friendly directory structure organized by year/CVE-ID
- Multiple rule variants per CVE (template, LLM, hybrid)

**New CLI System**
- Professional command-line interface with Click framework
- 8 command groups, including process, generate, search, stats, export, and migrate
- Modular command architecture for maintainability
- Comprehensive help system and configuration management

📁 **File-Based Storage Architecture**
- Individual CVE directories: cves/YEAR/CVE-ID/
- Multiple SIGMA rule variants per CVE
- JSON metadata with processing history and PoC data
- Native YAML files perfect for version control
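
A minimal sketch of how the `cves/YEAR/CVE-ID/` layout above could be computed from a CVE ID. The function names and the `rule_<variant>.yaml` file naming are assumptions for illustration; only the directory layout and the variant names (template, LLM, hybrid) come from this commit.

```python
import re
from pathlib import Path

# CVE IDs look like CVE-2024-12345: a four-digit year, then 4+ digits.
CVE_ID_PATTERN = re.compile(r"CVE-(\d{4})-\d{4,}")


def cve_directory(cve_id: str, root: Path = Path("cves")) -> Path:
    """Return the directory for a CVE, laid out as cves/YEAR/CVE-ID/."""
    normalized = cve_id.strip().upper()
    match = CVE_ID_PATTERN.fullmatch(normalized)
    if match is None:
        raise ValueError(f"not a valid CVE ID: {cve_id!r}")
    return root / match.group(1) / normalized


def rule_path(cve_id: str, variant: str, root: Path = Path("cves")) -> Path:
    """Path of one rule variant (hypothetical file naming scheme)."""
    return cve_directory(cve_id, root) / f"rule_{variant}.yaml"
```

Deriving the year from the ID itself keeps the tree deterministic, so the same CVE always lands in the same directory regardless of when it was processed.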

🚀 **Core CLI Commands**
- process: CVE processing and bulk operations
- generate: SIGMA rule generation with multiple methods
- search: Advanced CVE and rule searching with filters
- stats: Comprehensive statistics and analytics
- export: Multiple output formats for different workflows
- migrate: Database-to-file migration tools
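
The command groups listed above could be wired up roughly as follows. This commit states that the real CLI uses the Click framework; the sketch below deliberately swaps in the standard library's `argparse` to stay dependency-free, and the program name `sigma-cli` is hypothetical.

```python
import argparse

# Group names and summaries are taken from the list above.
COMMAND_GROUPS = {
    "process": "CVE processing and bulk operations",
    "generate": "SIGMA rule generation with multiple methods",
    "search": "Advanced CVE and rule searching with filters",
    "stats": "Comprehensive statistics and analytics",
    "export": "Multiple output formats for different workflows",
    "migrate": "Database-to-file migration tools",
}


def build_parser() -> argparse.ArgumentParser:
    """Create one sub-command per command group."""
    parser = argparse.ArgumentParser(prog="sigma-cli")  # hypothetical name
    subparsers = parser.add_subparsers(dest="group", required=True)
    for name, summary in COMMAND_GROUPS.items():
        subparsers.add_parser(name, help=summary)
    return parser


def main(argv=None) -> str:
    """Parse the arguments and return the selected command group."""
    args = build_parser().parse_args(argv)
    return args.group
```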

🔧 **Migration Support**
- Complete migration utilities from web database
- Data validation and integrity checking
- Backward compatibility with existing processors
- Legacy web interface maintained for transition

📊 **Enhanced Features**
- Advanced search with complex filtering (severity, PoC presence, etc.)
- Multi-format exports (YAML, JSON, CSV)
- Comprehensive statistics and coverage reports
- File-based rule versioning and management

🎯 **Production Benefits**
- No database dependency - runs anywhere
- Perfect for cybersecurity teams using git workflows
- Direct integration with SIGMA ecosystems
- Portable architecture for CI/CD pipelines
- Multiple rule variants for different detection scenarios

📝 **Documentation Updates**
- Complete README rewrite for CLI-first approach
- Updated CLAUDE.md with new architecture details
- Detailed CLI documentation with examples
- Migration guides and troubleshooting

**Perfect for security teams wanting production-ready SIGMA rules with version control! 🛡️**

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-21 13:11:03 -05:00
d51f3ea402 Migrate task tracking from BulkProcessingJob to Celery-based monitoring
- Remove BulkProcessingJob model and related endpoints from main.py
- Update CLAUDE.md to reference Flower dashboard for task monitoring
- Simplify enhanced_sigma_generator.py to use unified LLM client
- Remove job tracking logic from mcdevitt_poc_client.py
- Enhance CVE API with search and pagination support
- Update setup_ollama_with_sigma.py with improved checkpoint handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-21 09:23:26 -05:00
49963338d3 Add Celery dependencies and enhance bulk seeder
- Add Celery, Flower, and related dependencies to requirements.txt
- Update bulk_seeder.py with progress callback support for Celery integration
- Clean up finetuned model dependencies (now served through Ollama)
- Update setup_ollama.py for enhanced configuration

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-17 18:59:07 -05:00
9bde1395bf Optimize performance and migrate to Celery-based scheduling
This commit introduces major performance improvements and migrates from custom job scheduling to Celery Beat for better reliability and scalability.

### 🚀 Performance Optimizations

**CVE2CAPEC Client Performance (Fixed startup blocking)**
- Implement lazy loading with 24-hour cache for CVE2CAPEC mappings
- Add background task for CVE2CAPEC sync (data_sync_tasks.sync_cve2capec)
- Remove blocking data fetch during client initialization
- API endpoint: POST /api/sync-cve2capec

**ExploitDB Client Performance (Fixed webapp request blocking)**
- Implement global file index cache to prevent rebuilding on every request
- Add lazy loading with 24-hour cache expiry for 46K+ exploit index
- Background task for index building (data_sync_tasks.build_exploitdb_index)
- API endpoint: POST /api/build-exploitdb-index
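
The lazy loading with 24-hour expiry described for both clients can be sketched as below. This is an illustration under assumptions, not the code in this commit: the class name `LazyCache` and its parameters are hypothetical, and the sketch is not thread-safe.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyCache(Generic[T]):
    """Load a value on first access and reuse it until it expires."""

    def __init__(
        self,
        loader: Callable[[], T],
        ttl_seconds: float = 24 * 60 * 60,  # 24-hour expiry, as in the commit
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        """Return the cached value, reloading only if missing or expired."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            # Nothing is fetched at construction time, so startup never blocks.
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

Holding one module-level instance of such a cache is what keeps an index from being rebuilt on every request; the injectable `clock` exists only to make expiry testable.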

### 🔄 Celery Migration & Scheduling

**Celery Beat Integration**
- Migrate from custom job scheduler to Celery Beat for reliability
- Remove 'finetuned' LLM provider (logic moved to ollama container)
- Optimized daily workflow with proper timing and dependencies

**New Celery Tasks Structure**
- tasks/bulk_tasks.py - NVD bulk processing and SIGMA generation
- tasks/data_sync_tasks.py - All data synchronization tasks
- tasks/maintenance_tasks.py - System maintenance and cleanup
- tasks/sigma_tasks.py - SIGMA rule generation tasks

**Daily Schedule (Optimized)**
```
1:00 AM  → Weekly cleanup (Sundays)
1:30 AM  → Daily result cleanup
2:00 AM  → NVD incremental update
3:00 AM  → CISA KEV sync
3:15 AM  → Nomi-sec PoC sync
3:30 AM  → GitHub PoC sync
3:45 AM  → ExploitDB sync
4:00 AM  → CVE2CAPEC MITRE ATT&CK sync
4:15 AM  → ExploitDB index rebuild
5:00 AM  → Reference content sync
8:00 AM  → SIGMA rule generation
9:00 AM  → LLM-enhanced SIGMA generation
Every 15min → Health checks
```
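
For reference, the schedule above can be written as standard five-field cron expressions (minute, hour, day of month, month, day of week). This is only an illustrative restatement; the actual Celery Beat configuration lives in backend/celery_config.py and is not shown in this log, and the helper `start_minute` is hypothetical.

```python
# (cron expression, task description) pairs mirroring the schedule above.
DAILY_SCHEDULE = [
    ("0 1 * * 0", "Weekly cleanup (Sundays)"),
    ("30 1 * * *", "Daily result cleanup"),
    ("0 2 * * *", "NVD incremental update"),
    ("0 3 * * *", "CISA KEV sync"),
    ("15 3 * * *", "Nomi-sec PoC sync"),
    ("30 3 * * *", "GitHub PoC sync"),
    ("45 3 * * *", "ExploitDB sync"),
    ("0 4 * * *", "CVE2CAPEC MITRE ATT&CK sync"),
    ("15 4 * * *", "ExploitDB index rebuild"),
    ("0 5 * * *", "Reference content sync"),
    ("0 8 * * *", "SIGMA rule generation"),
    ("0 9 * * *", "LLM-enhanced SIGMA generation"),
    ("*/15 * * * *", "Health checks"),
]


def start_minute(expression: str):
    """Minutes after midnight for fixed-time entries, None for repeating ones."""
    minute, hour, *_ = expression.split()
    if not (minute.isdigit() and hour.isdigit()):
        return None
    return int(hour) * 60 + int(minute)
```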

### 🐳 Docker & Infrastructure

**Enhanced Docker Setup**
- Ollama setup with integrated SIGMA model creation (setup_ollama_with_sigma.py)
- Initial database population check and trigger (initial_setup.py)
- Proper service dependencies and health checks
- Remove manual post-rebuild script requirements

**Service Architecture**
- Celery worker with 4-queue system (default, bulk_processing, sigma_generation, data_sync)
- Flower monitoring dashboard (localhost:5555)
- Redis as message broker and result backend
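
One way the four queues could be matched to the task modules listed earlier is sketched below. The pairing of modules to queues is an assumption: this commit names the queues and the modules but does not say how they map, and `queue_for` is a hypothetical helper rather than part of Celery.

```python
QUEUES = ("default", "bulk_processing", "sigma_generation", "data_sync")

# Assumed module-to-queue pairing (not confirmed by the commit message).
MODULE_TO_QUEUE = {
    "bulk_tasks": "bulk_processing",
    "sigma_tasks": "sigma_generation",
    "data_sync_tasks": "data_sync",
}


def queue_for(task_name: str) -> str:
    """Pick a queue from the module part of a dotted task name."""
    parts = task_name.split(".")
    module = parts[-2] if len(parts) >= 2 else ""
    # Anything without an explicit mapping falls back to the default queue.
    return MODULE_TO_QUEUE.get(module, "default")
```

For example, `queue_for("data_sync_tasks.sync_cve2capec")` selects `data_sync`, while a maintenance task would fall through to `default`.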

### 🎯 API Improvements

**Background Task Endpoints**
- GitHub PoC sync now uses Celery (was blocking backend)
- All sync operations return task IDs and monitoring URLs
- Consistent error handling and progress tracking

**New Endpoints**
- POST /api/sync-cve2capec - CVE2CAPEC mapping sync
- POST /api/build-exploitdb-index - ExploitDB index rebuild

### 📁 Cleanup

**Removed Files**
- fix_sigma_model.sh (replaced by setup_ollama_with_sigma.py)
- Various test_* and debug_* files no longer needed
- Old training scripts related to removed 'finetuned' provider
- Utility scripts replaced by Docker services

### 🔧 Configuration

**Key Files Added/Modified**
- backend/celery_config.py - Complete Celery configuration
- backend/initial_setup.py - First-boot database population
- backend/setup_ollama_with_sigma.py - Integrated Ollama setup
- CLAUDE.md - Project documentation and development guide

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-17 18:58:47 -05:00
54db665711 only use our LLM to help generate the detection: portion of the SIGMA rule; enhance the PoC analyzer's Python indicators 2025-07-16 13:02:11 -05:00
cf57944c7f add poc analyzer code 2025-07-16 10:15:55 -05:00
06c4ed74b8 add cve2capec client to map mitre attack data to cves 2025-07-14 15:48:10 -05:00
d38edff1cd add script to clear old SIGMA rules; start tweaking the system prompt sent to the LLM for rule generation 2025-07-11 19:20:03 -05:00
d17f961b9d add job scheduler 2025-07-11 09:16:57 -05:00
08d6e33bbc add ollama to docker-compose for local model testing 2025-07-10 21:32:15 -05:00
3c120462ac add reference data gathering 2025-07-10 17:30:12 -05:00
c1bbea09fe update README 2025-07-10 16:23:36 -05:00
696a1a3462 add kev support, exploitDB mirror support 2025-07-10 16:19:43 -05:00
20b3a63c78 add claude client + generic llm client using langchain 2025-07-09 18:02:45 -05:00
e4a3cc6cb9 make nvd sync all cves, fix interpolation for templates 2025-07-09 12:42:18 -05:00
455a46c88f added git submodule for more exploits. added template dir for base yaml templates for sigma rules 2025-07-09 11:58:29 -05:00
cfaad8b359 add templates to enhanced sigma generator 2025-07-09 07:22:51 -05:00
790e4bd91f more updates for bulk 2025-07-08 17:50:01 -05:00
5a9ae34996 Adding in rule generation from github exploits 2025-07-08 10:20:54 -05:00
cc825fdb86 updated backend code; fixed bad UUID error 2025-07-08 09:45:53 -05:00
e331f1763d fix build errors 2025-07-08 09:10:25 -05:00
967886ef49 init commit. main app + frontend/backend 2025-07-08 08:34:28 -05:00