# CVE-SIGMA Auto Generator

Automated platform that generates SIGMA detection rules from CVE data using AI-enhanced exploit analysis.

## ✨ Key Features

- **Bulk CVE Processing**: Complete NVD datasets (2002-2025) with nomi-sec PoC integration
- **AI-Powered Rule Generation**: Multi-provider LLM support (OpenAI, Anthropic, local Ollama)
- **Quality-Based PoC Analysis**: 5-tier quality scoring system for exploit reliability
- **Real-time Monitoring**: Live job tracking and progress dashboard
- **Advanced Indicators**: Extract processes, files, network patterns from actual exploits

## 🚀 Quick Start

### Prerequisites
- Docker and Docker Compose
- (Optional) API keys for NVD, GitHub, OpenAI, or Anthropic (see Configuration)

### Installation

```bash
# Clone and start
git clone <repository-url>
cd auto_sigma_rule_generator
chmod +x start.sh
./start.sh
```

**Access Points:**
- Frontend: http://localhost:3000
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs

### First Run
The application automatically:
1. Initializes database with rule templates
2. Fetches recent CVEs from NVD
3. Generates SIGMA rules with AI enhancement
4. Polls for new CVEs hourly

## 🎯 Usage

### Web Interface
- **Dashboard**: Statistics and system overview
- **CVEs**: Complete CVE listing with PoC data
- **SIGMA Rules**: Generated detection rules
- **Bulk Jobs**: Processing status and controls

### API Endpoints

#### Core Operations
```bash
# Fetch CVEs
curl -X POST http://localhost:8000/api/fetch-cves

# Bulk processing
curl -X POST http://localhost:8000/api/bulk-seed
curl -X POST http://localhost:8000/api/incremental-update

# LLM-enhanced rules
curl -X POST http://localhost:8000/api/llm-enhanced-rules
```

#### Data Access
- `GET /api/cves` - List CVEs
- `GET /api/sigma-rules` - List rules
- `GET /api/stats` - Statistics
- `GET /api/llm-status` - LLM provider status
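
The read-only endpoints above can also be called from a script. The following is a minimal sketch, assuming the API is reachable at the default address and returns JSON; the helper names `endpoint_url` and `fetch_json` are illustrative and not part of the project.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default API address from "Access Points"

# Read-only endpoints listed in "Data Access".
ENDPOINTS = {
    "cves": "/api/cves",
    "sigma_rules": "/api/sigma-rules",
    "stats": "/api/stats",
    "llm_status": "/api/llm-status",
}


def endpoint_url(name, base_url=BASE_URL):
    """Return the full URL for one of the read-only endpoints."""
    return urljoin(base_url, ENDPOINTS[name])


def fetch_json(name, base_url=BASE_URL, timeout=10):
    """GET an endpoint and decode the body (assumed to be JSON)."""
    with urlopen(endpoint_url(name, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the stack running (./start.sh):
#   stats = fetch_json("stats")
```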

## ⚙️ Configuration

### Environment Variables

**Core Settings**
```bash
DATABASE_URL=postgresql://user:pass@db:5432/dbname
NVD_API_KEY=your_nvd_key          # Optional: raises the limit from 5 to 50 requests per 30 s
GITHUB_TOKEN=your_github_token     # Optional: Enhanced PoC analysis
```

**LLM Configuration**
```bash
LLM_PROVIDER=ollama               # Default: ollama (local)
LLM_MODEL=llama3.2               # Provider-specific model
OLLAMA_BASE_URL=http://ollama:11434

# External providers (optional)
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
```
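
To illustrate how these variables fit together, here is a hedged sketch of a settings loader. It is not the project's actual code: the `LLMSettings` class, the `load_llm_settings` function, the provider names `openai` and `anthropic`, and the use of `llama3.2` as the fallback model are all assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed mapping from provider name to the variable that holds its API key.
API_KEY_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    model: str
    ollama_base_url: str
    api_key: Optional[str]  # None for the local Ollama provider


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> LLMSettings:
    """Build settings from environment variables, defaulting to local Ollama."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "ollama").lower()
    key_var = API_KEY_VARS.get(provider)
    api_key = env.get(key_var) if key_var else None
    if key_var and not api_key:
        # External providers cannot work without their key.
        raise ValueError(f"{key_var} must be set when LLM_PROVIDER={provider}")
    return LLMSettings(
        provider=provider,
        model=env.get("LLM_MODEL", "llama3.2"),  # assumed default model
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://ollama:11434"),
        api_key=api_key,
    )
```

Failing early on a missing key keeps a misconfigured external provider from surfacing later as a silent lack of rules.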

### API Keys Setup

**NVD API** (Recommended)
1. Get key: https://nvd.nist.gov/developers/request-an-api-key
2. Add to `.env`: `NVD_API_KEY=your_key`
3. Benefit: 10x rate limit increase

**GitHub Token** (Optional)
1. Create: https://github.com/settings/tokens (public_repo scope)
2. Add to `.env`: `GITHUB_TOKEN=your_token`
3. Benefit: Enhanced exploit-based rules

**LLM APIs** (Optional)
- **Local Ollama**: No setup required (default)
- **OpenAI**: Get key from https://platform.openai.com/api-keys
- **Anthropic**: Get key from https://console.anthropic.com/

## 🧠 Rule Generation

### AI-Enhanced Generation
1. **PoC Analysis**: LLM analyzes actual exploit code
2. **Intelligent Detection**: Builds SIGMA detection logic from the indicators found in the PoC (processes, files, network patterns)
3. **Context Awareness**: Maps CVE descriptions to detection patterns
4. **Validation**: Automatic SIGMA syntax verification
5. **Fallback**: Template-based generation if LLM unavailable
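
The validation and fallback steps can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `generate_rule` and its three callables are hypothetical, and treating an invalid LLM result the same as an unavailable LLM is a design choice made for the example.

```python
from typing import Callable, Optional, Tuple


def generate_rule(
    cve_id: str,
    llm_generate: Callable[[str], Optional[str]],
    template_generate: Callable[[str], str],
    is_valid: Callable[[str], bool],
) -> Tuple[str, str]:
    """Return (rule_text, rule_type), preferring the LLM path."""
    try:
        rule = llm_generate(cve_id)
    except Exception:
        # Provider unreachable or errored: treat as "LLM unavailable".
        rule = None
    if rule and is_valid(rule):
        return rule, "ai_enhanced"
    # Fallback: template-based generation.
    return template_generate(cve_id), "template"
```

Passing the generators and the validator in as callables keeps the fallback logic testable without a running LLM.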

### Quality Tiers
- **Excellent** (80+ pts): High-quality PoCs with recent updates
- **Good** (60-79 pts): Moderate quality indicators
- **Fair** (40-59 pts): Basic PoCs with some validation
- **Poor** (20-39 pts): Minimal quality indicators
- **Very Poor** (<20 pts): Low-quality PoCs
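
The tier boundaries above translate directly into a lookup. A minimal sketch, assuming scores are plain numbers; the function name `quality_tier` is illustrative:

```python
# Lower bound of each tier, highest first, as listed in "Quality Tiers".
TIERS = [(80, "Excellent"), (60, "Good"), (40, "Fair"), (20, "Poor")]


def quality_tier(score):
    """Map a PoC quality score to its tier name."""
    for threshold, name in TIERS:
        if score >= threshold:
            return name
    return "Very Poor"


print(quality_tier(72))  # Good
```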

### Rule Types
- 🤖 **AI-Enhanced**: LLM-generated with PoC analysis
- 🔍 **Exploit-Based**: Template + GitHub exploit indicators
- ⚡ **Basic**: CVE description only

### Example Output
```yaml
title: CVE-2025-1234 AI-Enhanced Detection
status: experimental
description: Detection for CVE-2025-1234 RCE [AI-Enhanced with PoC analysis]
tags:
    - attack.t1059.001
    - cve.2025-1234
    - ai.enhanced
logsource:
    category: process_creation
    product: windows
detection:
    selection_process:
        Image|endswith: '\powershell.exe'
        CommandLine|contains:
            - '-EncodedCommand'
            - 'bypass'
    selection_network:
        DestinationPort: [443, 80]
    condition: selection_process and selection_network
level: high
```
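
Before a rule like this is deployed, a quick structural check can catch obvious gaps. The sketch below is a simplified illustration, not the project's validator: it assumes the rule has already been parsed into a dictionary (for example with a YAML library), that the condition is a single string, and that `title`, `logsource`, and `detection` are the fields to require. The name `check_rule` is hypothetical.

```python
import re

REQUIRED_FIELDS = ("title", "logsource", "detection")
# Words that may appear in a condition without naming a selection.
CONDITION_KEYWORDS = {"and", "or", "not", "of", "them", "all"}


def check_rule(rule: dict) -> list:
    """Return a list of structural problems; an empty list means none found."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in rule]
    detection = rule.get("detection") or {}
    condition = detection.get("condition")
    if not isinstance(condition, str):
        problems.append("detection.condition is missing")
        return problems
    defined = set(detection) - {"condition"}
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*\*?", condition):
        if token.lower() in CONDITION_KEYWORDS or token.endswith("*"):
            continue  # keyword or wildcard pattern such as selection_*
        if token not in defined:
            problems.append(f"undefined selection in condition: {token}")
    return problems
```

A check like this complements analyst review; it does not confirm that the detection logic matches real events.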

## 🛠️ Development

### Local Development
```bash
# Start dependencies
docker-compose up -d db redis ollama

# Backend (from the repository root)
cd backend && pip install -r requirements.txt
uvicorn main:app --reload

# Frontend (in a second terminal, from the repository root)
cd frontend && npm install && npm start
```

### Testing LLM Integration
```bash
# Check Ollama
curl http://localhost:11434/api/tags

# Test LLM status
curl http://localhost:8000/api/llm-status

# Switch providers
curl -X POST http://localhost:8000/api/llm-switch \
  -H "Content-Type: application/json" \
  -d '{"provider": "ollama", "model": "llama3.2"}'
```
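
The provider switch can also be issued from Python. This sketch mirrors the `curl` call above using only the standard library; the function names are illustrative, and the shape of the response is not documented here, so the body is returned as raw text.

```python
import json
from urllib.request import Request, urlopen


def build_switch_request(provider, model, base_url="http://localhost:8000"):
    """Build the POST request that the curl example sends."""
    body = json.dumps({"provider": provider, "model": model}).encode("utf-8")
    return Request(
        f"{base_url}/api/llm-switch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def switch_provider(provider, model, timeout=10):
    """Send the request and return (HTTP status, response text)."""
    with urlopen(build_switch_request(provider, model), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


# Usage, with the stack running:
#   status, text = switch_provider("ollama", "llama3.2")
```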

## 📊 Architecture

- **Backend**: FastAPI + SQLAlchemy ORM
- **Frontend**: React + Tailwind CSS
- **Database**: PostgreSQL with enhanced schema
- **Cache**: Redis (optional)
- **LLM**: Ollama container + multi-provider support
- **Deployment**: Docker Compose

### Enhanced Database Schema
- **CVEs**: PoC metadata, bulk processing fields
- **SIGMA Rules**: Quality scoring, nomi-sec data
- **Rule Templates**: Pattern templates for generation
- **Bulk Jobs**: Job tracking and status

## 🔧 Troubleshooting

### Common Issues

**CVE Fetch Issues**
- Verify NVD API key in `.env`
- Check API connectivity: Use "Test NVD API" button
- Review logs: `docker-compose logs -f backend`

**No Rules Generated**
- Ensure LLM provider is accessible
- Check `/api/llm-status` for provider health
- Verify PoC data quality in CVE details

**Performance Issues**
- Start with recent years (2020+) for faster initial setup
- Use smaller batch sizes for bulk operations
- Monitor system resources during processing

**Port Conflicts**
- Default ports: 3000 (frontend), 8000 (backend), 5432 (db), 11434 (Ollama)
- Modify `docker-compose.yml` if ports are in use

### Rate Limits
- **NVD API**: 5 requests per 30 s without a key, 50 requests per 30 s with a key
- **nomi-sec API**: 1 request per second (built-in limiting)
- **GitHub API**: 60 requests per hour without a token, 5000 requests per hour with a token
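
One simple way to stay within these limits in a custom client is to space requests evenly. The sketch below is an assumption-laden illustration, not the project's built-in limiter: the `LIMITS` table restates the figures above, and `min_interval` and `Throttle` are hypothetical names.

```python
import time

# (requests, window in seconds), keyed by (api, authenticated).
LIMITS = {
    ("nvd", False): (5, 30),
    ("nvd", True): (50, 30),
    ("nomi-sec", False): (1, 1),
    ("github", False): (60, 3600),
    ("github", True): (5000, 3600),
}


def min_interval(api, authenticated=False):
    """Seconds to wait between requests when spacing them evenly."""
    requests, window = LIMITS[(api, authenticated)]
    return window / requests


class Throttle:
    """Blocks in wait() so calls are at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


print(min_interval("nvd"))  # 6.0
```

Calling `Throttle(min_interval("nvd")).wait()` before each NVD request would space unauthenticated calls six seconds apart.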

## 🛡️ Security

- Store API keys in environment variables
- Validate generated rules before production deployment
- Generated rules are marked "experimental" and require analyst review
- Use strong database passwords in production

## 📈 Monitoring

```bash
# View logs
docker-compose logs -f backend
docker-compose logs -f frontend

# Check service health
docker-compose ps

# Monitor bulk jobs
curl http://localhost:8000/api/bulk-status
```

## 🗺️ Roadmap

- [ ] Custom rule template editor
- [ ] Advanced MITRE ATT&CK mapping
- [ ] SIEM platform export
- [ ] ML-based rule optimization
- [ ] Threat intelligence integration

## 📝 License

MIT License - see LICENSE file for details.

## 🤝 Contributing

1. Fork repository
2. Create feature branch
3. Add tests and documentation
4. Submit pull request

## 📞 Support

- Check troubleshooting section
- Review application logs
- Open GitHub issue for bugs/questions