# CVE-SIGMA Auto Generator
Automated platform that generates SIGMA detection rules from CVE data using AI-enhanced exploit analysis.
## ✨ Key Features
- **Bulk CVE Processing**: Complete NVD datasets (2002-2025) with nomi-sec PoC integration
- **AI-Powered Rule Generation**: Multi-provider LLM support (OpenAI, Anthropic, local Ollama)
- **Quality-Based PoC Analysis**: 5-tier quality scoring system for exploit reliability
- **Real-time Monitoring**: Live job tracking and progress dashboard
- **Advanced Indicators**: Extracts process, file, and network indicators from real exploit code
## 🚀 Quick Start
### Prerequisites
- Docker and Docker Compose
- (Optional) API keys for NVD, GitHub, OpenAI, or Anthropic to enable enhanced features
### Installation
```bash
# Clone and start
git clone <repository-url>
cd auto_sigma_rule_generator
chmod +x start.sh
./start.sh
```
**Access Points:**
- Frontend: http://localhost:3000
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
### First Run
The application automatically:
1. Initializes database with rule templates
2. Fetches recent CVEs from NVD
3. Generates SIGMA rules with AI enhancement
4. Polls for new CVEs hourly
## 🎯 Usage
### Web Interface
- **Dashboard**: Statistics and system overview
- **CVEs**: Complete CVE listing with PoC data
- **SIGMA Rules**: Generated detection rules
- **Bulk Jobs**: Processing status and controls
### API Endpoints
#### Core Operations
```bash
# Fetch CVEs
curl -X POST http://localhost:8000/api/fetch-cves
# Bulk processing
curl -X POST http://localhost:8000/api/bulk-seed
curl -X POST http://localhost:8000/api/incremental-update
# LLM-enhanced rules
curl -X POST http://localhost:8000/api/llm-enhanced-rules
```
#### Data Access
- `GET /api/cves` - List CVEs
- `GET /api/sigma-rules` - List rules
- `GET /api/stats` - Statistics
- `GET /api/llm-status` - LLM provider status
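For scripting against these endpoints, a small wrapper keeps the base URL in one place. This is a sketch under assumptions: `API_URL`, `api_url`, and `api_get` are hypothetical names (not part of this project), and responses are assumed to be JSON, pretty-printed with Python's built-in `json.tool`.

```shell
#!/bin/sh
# Hypothetical helpers; API_URL defaults to the API address from "Access Points".
API_URL="${API_URL:-http://localhost:8000}"

# Join the base URL and an endpoint path without doubling the slash.
api_url() {
  printf '%s/%s\n' "${API_URL%/}" "${1#/}"
}

# GET an endpoint (-f: fail on HTTP errors) and pretty-print the JSON body.
api_get() {
  curl -sf "$(api_url "$1")" | python3 -m json.tool
}

# Usage (requires the stack to be running):
#   api_get /api/stats
#   api_get /api/llm-status
```

Overriding `API_URL` in the environment lets the same calls target another host without editing the script.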
## ⚙️ Configuration
### Environment Variables
**Core Settings**
```bash
DATABASE_URL=postgresql://user:pass@db:5432/dbname
NVD_API_KEY=your_nvd_key # Optional: 5→50 req/30s
GITHUB_TOKEN=your_github_token # Optional: Enhanced PoC analysis
```
**LLM Configuration**
```bash
LLM_PROVIDER=ollama # Default: ollama (local)
LLM_MODEL=llama3.2 # Provider-specific model
OLLAMA_BASE_URL=http://ollama:11434
# External providers (optional)
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
```
### API Keys Setup
**NVD API** (Recommended)
1. Get key: https://nvd.nist.gov/developers/request-an-api-key
2. Add to `.env`: `NVD_API_KEY=your_key`
3. Benefit: 10x rate limit increase
**GitHub Token** (Optional)
1. Create: https://github.com/settings/tokens (public_repo scope)
2. Add to `.env`: `GITHUB_TOKEN=your_token`
3. Benefit: Enhanced exploit-based rules
**LLM APIs** (Optional)
- **Local Ollama**: No setup required (default)
- **OpenAI**: Get key from https://platform.openai.com/api-keys
- **Anthropic**: Get key from https://console.anthropic.com/
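Adding keys by hand is easy to get wrong when a key already exists in `.env`. The following is a minimal sketch of an idempotent setter, assuming a plain `KEY=value` file; `set_env_var` is a hypothetical helper, not something the repository ships.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in an env file (default .env), replacing an
# existing KEY= line or appending a new one, so re-running it is idempotent.
set_env_var() {
  key="$1"; value="$2"; file="${3:-.env}"
  touch "$file"
  if grep -q "^${key}=" "$file"; then
    # Pass key/value via the environment so awk does not process backslash escapes
    K="$key" V="$value" awk '
      index($0, ENVIRON["K"] "=") == 1 { print ENVIRON["K"] "=" ENVIRON["V"]; next }
      { print }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  else
    # Add a newline first if the file does not already end with one
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then echo >> "$file"; fi
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example:
#   set_env_var NVD_API_KEY your_key
#   set_env_var GITHUB_TOKEN your_token
```

Restart the containers afterwards so the services pick up the new values.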
## 🧠 Rule Generation
### AI-Enhanced Generation
1. **PoC Analysis**: The LLM analyzes the actual exploit code
2. **Intelligent Detection**: Builds SIGMA detection logic from the indicators found in the exploit
3. **Context Awareness**: Maps CVE descriptions to detection patterns
4. **Validation**: Automatic SIGMA syntax verification
5. **Fallback**: Template-based generation when the LLM is unavailable
### Quality Tiers
- **Excellent** (80+ pts): High-quality PoCs with recent updates
- **Good** (60-79 pts): Moderate quality indicators
- **Fair** (40-59 pts): Basic PoCs with some validation
- **Poor** (20-39 pts): Minimal quality indicators
- **Very Poor** (<20 pts): Low-quality PoCs
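The tier boundaries above amount to a simple threshold lookup. A minimal sketch, assuming integer scores; `quality_tier` is a hypothetical helper that only maps an existing score to its tier name and does not reproduce how the backend computes the score.

```shell
#!/bin/sh
# Hypothetical helper: map an integer PoC quality score to its tier name
# using the thresholds listed under "Quality Tiers".
quality_tier() {
  score="$1"
  if [ "$score" -ge 80 ]; then echo "Excellent"
  elif [ "$score" -ge 60 ]; then echo "Good"
  elif [ "$score" -ge 40 ]; then echo "Fair"
  elif [ "$score" -ge 20 ]; then echo "Poor"
  else echo "Very Poor"
  fi
}

# Example: quality_tier 72 prints "Good"
```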
### Rule Types
- 🤖 **AI-Enhanced**: LLM-generated with PoC analysis
- 🔍 **Exploit-Based**: Template + GitHub exploit indicators
- **Basic**: CVE description only
### Example Output
```yaml
title: CVE-2025-1234 AI-Enhanced Detection
description: Detection for CVE-2025-1234 RCE [AI-Enhanced with PoC analysis]
tags:
  - attack.t1059.001
  - cve-2025-1234
  - ai.enhanced
detection:
  selection_process:
    Image|endswith: '\powershell.exe'
    CommandLine|contains:
      - '-EncodedCommand'
      - 'bypass'
  selection_network:
    DestinationPort: [443, 80]
  condition: selection_process and selection_network
level: high
```
## 🛠️ Development
### Local Development
```bash
# Start dependencies
docker-compose up -d db redis ollama
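# Backend (from the repository root; --reload keeps this terminal busy)
cd backend && pip install -r requirements.txt
uvicorn main:app --reload
# Frontend (in a second terminal, from the repository root)
cd frontend && npm install && npm start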
```
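When the backend runs on the host instead of inside Compose, the `db` hostname in `DATABASE_URL` does not resolve, because it is a Compose service name. A minimal sketch, assuming the database container publishes port 5432 on localhost (the default listed under Port Conflicts) and that the backend reads `DATABASE_URL` from its environment; `host_db_url` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the Compose-internal "db" host to "localhost"
# so a backend started on the host can reach the published Postgres port.
host_db_url() {
  printf '%s\n' "$1" | sed 's/@db:/@localhost:/'
}

# Example: export a host-side URL before running uvicorn
export DATABASE_URL="$(host_db_url 'postgresql://user:pass@db:5432/dbname')"
```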
### Testing LLM Integration
```bash
# Check Ollama
curl http://localhost:11434/api/tags
# Test LLM status
curl http://localhost:8000/api/llm-status
# Switch providers
curl -X POST http://localhost:8000/api/llm-switch \
  -H "Content-Type: application/json" \
  -d '{"provider": "ollama", "model": "llama3.2"}'
```
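Right after `docker-compose up`, Ollama and the backend may need some time before these calls succeed. A minimal polling sketch, assuming `curl` is installed; `wait_for_url` and its retry defaults are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: poll URL until curl gets a non-error (<400) HTTP response
# or the tries run out. Usage: wait_for_url URL [TRIES] [DELAY_SECONDS]
wait_for_url() {
  url="$1"; tries="${2:-30}"; delay="${3:-2}"
  i=1
  while :; do
    # -s silences progress; -f makes HTTP status >= 400 return non-zero
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    [ "$i" -ge "$tries" ] && break
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $url" >&2
  return 1
}

# Example: wait for Ollama, then the API, before checking LLM status
#   wait_for_url http://localhost:11434/api/tags && wait_for_url http://localhost:8000/api/llm-status
```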
## 📊 Architecture
- **Backend**: FastAPI + SQLAlchemy ORM
- **Frontend**: React + Tailwind CSS
- **Database**: PostgreSQL with enhanced schema
- **Cache**: Redis (optional)
- **LLM**: Ollama container + multi-provider support
- **Deployment**: Docker Compose
### Enhanced Database Schema
- **CVEs**: PoC metadata, bulk processing fields
- **SIGMA Rules**: Quality scoring, nomi-sec data
- **Rule Templates**: Pattern templates for generation
- **Bulk Jobs**: Job tracking and status
## 🔧 Troubleshooting
### Common Issues
**CVE Fetch Issues**
- Verify NVD API key in `.env`
- Check API connectivity with the "Test NVD API" button in the web interface
- Review logs: `docker-compose logs -f backend`
**No Rules Generated**
- Ensure LLM provider is accessible
- Check `/api/llm-status` for provider health
- Verify PoC data quality in CVE details
**Performance Issues**
- Start with recent years (2020+) for faster initial setup
- Use smaller batch sizes for bulk operations
- Monitor system resources during processing
**Port Conflicts**
- Default ports: 3000 (frontend), 8000 (backend), 5432 (db)
- Modify `docker-compose.yml` if ports are in use
### Rate Limits
- **NVD API**: 5 requests/30 s without a key, 50 requests/30 s with a key
- **nomi-sec API**: 1 request/second (rate limiting built in)
- **GitHub API**: 60 requests/hour without a token, 5,000 requests/hour with a token
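To stay under a window-based limit, space requests at least window / limit seconds apart: NVD without a key allows 5 requests per 30 s, so at most one every 6 s; with a key, one every 0.6 s. A small sketch of that arithmetic, assuming `awk` is available; `min_interval` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: minimum seconds between requests for LIMIT requests per WINDOW seconds.
# Usage: min_interval LIMIT WINDOW_SECONDS
min_interval() {
  awk -v limit="$1" -v window="$2" 'BEGIN { printf "%.2f\n", window / limit }'
}

min_interval 5 30       # NVD, no key: 6.00
min_interval 50 30      # NVD, with key: 0.60
min_interval 5000 3600  # GitHub, with token: 0.72
```

The result can be passed to `sleep` between scripted calls; note that fractional values work with GNU coreutils and BusyBox `sleep`, while POSIX only guarantees whole seconds.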
## 🛡️ Security
- Store API keys in environment variables
- Validate generated rules before production deployment
- Generated rules are marked `experimental` and require analyst review
- Use strong database passwords in production
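Before deploying, a quick pre-check can catch exported rules that lack fields the Sigma specification requires (`title`, `logsource`, and `detection` with a `condition`). This is a coarse grep-based sketch, not a YAML parser or full validator; `check_sigma_rule` and the file glob in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: grep-based check for the top-level fields a Sigma rule requires.
# It does not parse YAML, so it only catches missing keys, not invalid structure.
check_sigma_rule() {
  file="$1"
  missing=""
  for key in title logsource detection; do
    grep -q "^${key}:" "$file" || missing="$missing $key"
  done
  # condition sits under detection, so it is expected to be indented
  grep -Eq '^[[:space:]]+condition:' "$file" || missing="$missing condition"
  if [ -n "$missing" ]; then
    echo "$file: missing$missing" >&2
    return 1
  fi
  return 0
}

# Example (hypothetical directory of exported rules):
#   for f in ./exported-rules/*.yml; do check_sigma_rule "$f"; done
```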
## 📈 Monitoring
```bash
# View logs
docker-compose logs -f backend
docker-compose logs -f frontend
# Check service health
docker-compose ps
# Monitor bulk jobs
curl http://localhost:8000/api/bulk-status
```
## 🗺️ Roadmap
- [ ] Custom rule template editor
- [ ] Advanced MITRE ATT&CK mapping
- [ ] SIEM platform export
- [ ] ML-based rule optimization
- [ ] Threat intelligence integration
## 📝 License
MIT License - see LICENSE file for details.
## 🤝 Contributing
1. Fork repository
2. Create feature branch
3. Add tests and documentation
4. Submit pull request
## 📞 Support
- Check troubleshooting section
- Review application logs
- Open GitHub issue for bugs/questions