sciagent/docs/TECH_STACK_COMPARISON.md

# Tech Stack Comparison: Original vs Simplified

## Quick Reference

### Original Suggestions → Simplified Alternatives

| Requirement | Original Tech | Simplified Tech | Complexity Reduction |
|------------|--------------|----------------|---------------------|
| **Workflow Engine** | Camunda / Temporal | Custom Python state machine | 90% simpler |
| **Document Storage** | AWS S3 / MinIO | Local filesystem + PostgreSQL metadata | 80% simpler |
| **Search & Duplicate Detection** | Elasticsearch + ML (Sentence-BERT) | PostgreSQL full-text + pg_trgm | 85% simpler |
| **Event Bus** | Apache Kafka / RabbitMQ | PostgreSQL NOTIFY/LISTEN or in-memory | 90% simpler |
| **Business Rules** | Drools | Custom Python classes/functions | 70% simpler |
| **Audit Log** | Separate WORM storage | PostgreSQL append-only table | 60% simpler |

---

## Detailed Simplifications

### 1. Workflow Engine

**Original:** Camunda or Temporal
- Separate service to run
- Complex BPMN diagrams
- Additional database
- Learning curve

**Simplified:** Custom Python State Machine
```python
# ~100 lines of code
class InitiativeWorkflow:
    STATES = {
        'DRAFT': ['SUBMITTED'],
        'SUBMITTED': ['UNIT_REVIEW', 'REJECTED'],
        'UNIT_REVIEW': ['COUNCIL_REVIEW', 'REJECTED'],
        'COUNCIL_REVIEW': ['APPROVED', 'REJECTED'],
        'APPROVED': ['FINALIZED', 'APPEAL'],
        'REJECTED': ['APPEAL'],
        'APPEAL': ['APPROVED', 'REJECTED', 'FINALIZED'],
        'FINALIZED': []
    }

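    def can_transition(self, from_state, to_state, user_role):
        # Checks only the state graph; user_role is accepted but not
        # enforced here, so role rules would need to be added separately.
        return to_state in self.STATES.get(from_state, [])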
```

**Savings:**
- No separate service
- No BPMN learning
- Easier to debug
- Version controlled
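
The `user_role` argument suggests that transitions are meant to depend on who is acting. A minimal sketch of one way to add that check follows. The role names and the `ALLOWED_ROLES` mapping are hypothetical and do not come from the original design.

```python
# State graph copied from the workflow above.
STATES = {
    'DRAFT': ['SUBMITTED'],
    'SUBMITTED': ['UNIT_REVIEW', 'REJECTED'],
    'UNIT_REVIEW': ['COUNCIL_REVIEW', 'REJECTED'],
    'COUNCIL_REVIEW': ['APPROVED', 'REJECTED'],
    'APPROVED': ['FINALIZED', 'APPEAL'],
    'REJECTED': ['APPEAL'],
    'APPEAL': ['APPROVED', 'REJECTED', 'FINALIZED'],
    'FINALIZED': [],
}

# Hypothetical role rules (assumption): which roles may perform a transition.
ALLOWED_ROLES = {
    ('DRAFT', 'SUBMITTED'): {'author'},
    ('UNIT_REVIEW', 'COUNCIL_REVIEW'): {'unit_reviewer'},
    ('COUNCIL_REVIEW', 'APPROVED'): {'council_member'},
}


def can_transition(from_state, to_state, user_role):
    """Return True if the transition exists and the role may perform it."""
    if to_state not in STATES.get(from_state, []):
        return False
    roles = ALLOWED_ROLES.get((from_state, to_state))
    # In this sketch, transitions without an entry are open to any role.
    return roles is None or user_role in roles
```

Keeping the role rules in a separate dictionary leaves the state graph readable and lets both be reviewed in version control.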

---

### 2. Document Storage

**Original:** AWS S3 / MinIO
- Separate service
- API calls for every operation
- Network latency
- Additional configuration

**Simplified:** Local Filesystem
```
/initiatives/
  /{initiative_id}/
    /forms/
    /reviews/
    /attachments/
```

**Savings:**
- Direct file access
- No API calls
- Simpler backup (copy folder)
- No network dependency
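
The folder layout above can be created and used with the standard library alone. The sketch below assumes a configurable storage root. The helper names (`create_initiative_folders`, `save_attachment`) are illustrative, not part of the original design.

```python
from pathlib import Path

# Subfolders from the layout above.
SUBFOLDERS = ("forms", "reviews", "attachments")


def create_initiative_folders(root, initiative_id):
    """Create <root>/initiatives/<initiative_id>/{forms,reviews,attachments}."""
    base = Path(root) / "initiatives" / str(initiative_id)
    for name in SUBFOLDERS:
        (base / name).mkdir(parents=True, exist_ok=True)
    return base


def save_attachment(root, initiative_id, filename, data):
    """Write raw bytes into the initiative's attachments folder."""
    base = create_initiative_folders(root, initiative_id)
    # Keep only the final path component so a name such as "../x"
    # cannot place the file outside the attachments folder.
    target = base / "attachments" / Path(filename).name
    target.write_bytes(data)
    return target
```

The returned path can be stored in the PostgreSQL metadata row, so the database remains the index and the filesystem holds only the bytes.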

---

### 3. Search & Duplicate Detection

**Original:** Elasticsearch + ML Model (Sentence-BERT)
- Separate service
- Model training required
- Complex deployment
- Resource intensive

**Simplified:** PostgreSQL Full-Text + Trigram Similarity
```sql
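-- Enable the trigram extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Full-text index for keyword search (used by @@ queries)
CREATE INDEX idx_initiative_description_gin
ON initiatives USING gin(to_tsvector('english', description));

-- Trigram index; the full-text index above does not speed up similarity()
CREATE INDEX idx_initiative_description_trgm
ON initiatives USING gin(description gin_trgm_ops);

-- Similarity search: the % operator applies pg_trgm.similarity_threshold
-- (0.3 by default) and can use the trigram index
SELECT id, title,
       similarity(description, 'search text') AS score
FROM initiatives
WHERE description % 'search text'
ORDER BY score DESC;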
```

**Savings:**
- Ships with PostgreSQL (full-text search is built in; `pg_trgm` is a bundled extension)
- No model training
- No separate service
- Good enough for local scale
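
To make the `similarity()` score in the query above less opaque, here is a pure-Python approximation of what `pg_trgm` computes: the number of shared trigrams divided by the number of distinct trigrams across both strings. This is a sketch for intuition only. It handles ASCII letters and digits and is not guaranteed to match PostgreSQL's output for every input.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction.

    Each lowercase alphanumeric word is padded with two leading spaces
    and one trailing space before three-character windows are taken.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

For example, "cat" and "car" share two of six distinct trigrams, so their score is about 0.33, just above the 0.3 threshold used in the query.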

---

### 4. Event Bus

**Original:** Apache Kafka / RabbitMQ
- Separate service
- Complex configuration
- Message persistence
- Consumer groups

**Simplified:** PostgreSQL NOTIFY/LISTEN
```python
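import json

import asyncpg

# Publisher (db is assumed to be an asyncpg connection or pool;
# asyncpg uses $1-style placeholders, not %s)
async def notify_event(event_type, data):
    await db.execute(
        "SELECT pg_notify('initiative_events', $1)",
        json.dumps({'type': event_type, 'data': data})
    )

# Callback: asyncpg passes (connection, pid, channel, payload)
def handle_event(connection, pid, channel, payload):
    event = json.loads(payload)
    print(event['type'], event['data'])

# Listener
async def listen_events():
    conn = await asyncpg.connect(...)
    await conn.add_listener('initiative_events', handle_event)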
```

**Savings:**
- No separate service
- Built into database
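- Not persistent on its own: notifications reach only sessions that are listening at the time, so durable events need an events table alongside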
- Simple pub/sub
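
The quick-reference table also lists an in-memory option. A minimal sketch of such a bus is shown below. The class and method names are hypothetical. It works only inside a single process, and events are lost on restart.

```python
from collections import defaultdict


class InMemoryEventBus:
    """Synchronous publish/subscribe within one process."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a callable that receives the event data."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type, data):
        """Call every handler for event_type; return how many ran."""
        # Copy the list so a handler may subscribe others while we iterate.
        handlers = list(self._handlers.get(event_type, ()))
        for handler in handlers:
            handler(data)
        return len(handlers)


bus = InMemoryEventBus()
received = []
bus.subscribe('initiative_submitted', received.append)
bus.publish('initiative_submitted', {'id': 1})
```

Because handlers run synchronously, a slow handler blocks the publisher. That is acceptable for a small local application but is one reason to move to a real broker later.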

---

### 5. Business Rules Engine

**Original:** Drools
- Java-based
- Separate rule files
- Complex syntax
- Additional dependency

**Simplified:** Python Functions/Classes
```python
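from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationResult:
    valid: bool
    reason: Optional[str] = None


class NoveltyChecker:
    def check(self, initiative):
        # Check similarity with existing initiatives
        # (find_similar is not shown; e.g. the pg_trgm query from section 3)
        similar = self.find_similar(initiative)
        if similar:
            return ValidationResult(valid=False, reason="Duplicate found")
        return ValidationResult(valid=True)


class ScoringEngine:
    def calculate(self, reviews):
        # Average the scores of reviews that have been scored
        scores = [r.score for r in reviews if r.score is not None]
        if not scores:
            return None
        return sum(scores) / len(scores)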
```

**Savings:**
- Native Python
- Easy to test
- Version controlled
- No external engine
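
To illustrate the "easy to test" point, the sketch below exercises the averaging rule with plain assertions that pytest can run. `ScoringEngine` is repeated so the example stands alone. The review objects are stand-ins built with `SimpleNamespace`, since the real review model is not shown here.

```python
from types import SimpleNamespace


class ScoringEngine:
    def calculate(self, reviews):
        # Average the scores of reviews that have been scored.
        scores = [r.score for r in reviews if r.score is not None]
        if not scores:
            return None
        return sum(scores) / len(scores)


def test_ignores_unscored_reviews():
    reviews = [
        SimpleNamespace(score=4),
        SimpleNamespace(score=None),  # not yet scored, must be skipped
        SimpleNamespace(score=2),
    ]
    assert ScoringEngine().calculate(reviews) == 3.0


def test_returns_none_without_scores():
    assert ScoringEngine().calculate([]) is None
    assert ScoringEngine().calculate([SimpleNamespace(score=None)]) is None
```

No rule server or rule-file loader is involved, so the tests run in milliseconds alongside the rest of the suite.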

---

## Resource Usage Comparison

### Original Stack
- PostgreSQL: ~200MB RAM
- FastAPI: ~100MB RAM
- Elasticsearch: ~1GB RAM
- Kafka: ~500MB RAM
- MinIO: ~200MB RAM
- **Total: ~2GB RAM minimum**

### Simplified Stack
- PostgreSQL: ~200MB RAM
- FastAPI: ~100MB RAM
- **Total: ~300MB RAM**

**Savings: 85% less memory**
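
The 85% figure follows directly from the estimates listed above, treating 1GB as 1000MB:

```python
# Approximate RAM figures (MB) taken from the two lists above.
original_mb = {
    "PostgreSQL": 200,
    "FastAPI": 100,
    "Elasticsearch": 1000,
    "Kafka": 500,
    "MinIO": 200,
}
simplified_mb = {"PostgreSQL": 200, "FastAPI": 100}

original_total = sum(original_mb.values())      # 2000 MB
simplified_total = sum(simplified_mb.values())  # 300 MB

# Fraction of memory saved by the simplified stack.
savings = 1 - simplified_total / original_total
print(f"{savings:.0%}")
```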

---

## Deployment Complexity

### Original Stack
```
docker-compose.yml:
  - postgres
  - fastapi
  - elasticsearch
  - kafka
  - minio
  - zookeeper (for Kafka)

Total: 6+ containers
```

### Simplified Stack
```
docker-compose.yml:
  - postgres
  - fastapi

Total: 2 containers
```

**Savings: 67% fewer services**

---

## Maintenance Effort

| Task | Original | Simplified | Time Saved |
|------|----------|------------|------------|
| Setup | 2-3 days | 2-3 hours | 90% |
| Debugging | Complex (multiple services) | Simple (2 services) | 70% |
| Updates | Multiple services | 2 services | 80% |
| Monitoring | Multiple dashboards | Single dashboard | 75% |

---

## When to Upgrade

Upgrade to the original stack only if at least one of the following applies:

1. **Scale:** >10,000 initiatives/year
2. **Users:** >100 concurrent users
3. **Performance:** Response time >2s
4. **Distribution:** Multi-server deployment
5. **Advanced ML:** Need sophisticated NLP

For a local application with a typical load (<5,000 initiatives/year), the simplified stack is optimal.
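
The criteria above can be written as a small check that runs against observed metrics. The sketch below uses the thresholds from the list. The `LoadProfile` fields and the `upgrade_reasons` function are hypothetical names, and how the metrics are collected is left open.

```python
from dataclasses import dataclass


@dataclass
class LoadProfile:
    initiatives_per_year: int
    concurrent_users: int
    response_time_s: float
    multi_server: bool = False
    needs_advanced_nlp: bool = False


def upgrade_reasons(profile):
    """Return the names of the upgrade criteria that the profile meets."""
    reasons = []
    if profile.initiatives_per_year > 10_000:
        reasons.append("Scale")
    if profile.concurrent_users > 100:
        reasons.append("Users")
    if profile.response_time_s > 2:
        reasons.append("Performance")
    if profile.multi_server:
        reasons.append("Distribution")
    if profile.needs_advanced_nlp:
        reasons.append("Advanced ML")
    return reasons


typical = LoadProfile(initiatives_per_year=4_000, concurrent_users=20,
                      response_time_s=0.4)
print(upgrade_reasons(typical))
```

An empty list means the simplified stack still fits. Any non-empty result names the criteria that justify the upgrade.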

---

## Migration Path

If you need to scale later:

1. **Add Redis** for caching (if queries are slow)
2. **Add Elasticsearch** for advanced search (if PostgreSQL search is insufficient)
3. **Add RabbitMQ** for async processing (if you need background jobs)
4. **Move to S3** for documents (if you need cloud storage)

But start simple, scale when needed.