
# Simplified Tech Stack for Local Governance Layer

## Analysis & Simplification Strategy

### Key Observations
1. **Local Application Context**: Single-server deployment, not distributed
2. **Existing Stack**: Already using FastAPI + PostgreSQL
3. **Complexity Overkill**: Enterprise tools (Kafka, Camunda, Elasticsearch) are unnecessary for local deployment
4. **Core Needs**: State machine, rules engine, document storage, audit logging

---

## Simplified Tech Stack Recommendation

### ✅ **Core Stack (Keep These)**

| Component | Technology | Rationale |
|-----------|-----------|-----------|
| **Database** | PostgreSQL 15+ | ✅ Already in use, supports JSONB, excellent for local deployment |
| **API Framework** | FastAPI (Python) | ✅ Already in use, fast, async, great for this use case |
| **Document Storage** | Local filesystem + PostgreSQL (metadata) | ✅ Simple, no external service needed |
| **Business Rules** | Custom Python classes/functions | ✅ Lightweight, maintainable, no external engine needed |

### 🔄 **Replace Complex Components**

| Original Suggestion | Simplified Alternative | Why |
|-------------------|----------------------|-----|
| **Camunda/Temporal** | Custom state machine (Python) | Simple workflow states, no need for enterprise orchestration |
| **Elasticsearch + ML** | PostgreSQL full-text search + `pg_trgm` (trigram similarity) | Built-in, sufficient for duplicate detection |
| **Apache Kafka/RabbitMQ** | PostgreSQL NOTIFY/LISTEN or in-memory event queue | Simple pub/sub, no separate service |
| **AWS S3/MinIO** | Local filesystem with organized folders | Direct file storage, simpler for local |
| **Drools** | Python rule functions/classes | More maintainable, easier to debug |

---

## Recommended Simplified Architecture

### 1. **Database Layer**
```
Single PostgreSQL database with:
- Core tables (initiatives, authors, reviews, etc.)
- JSONB columns for flexible metadata
- Full-text search indexes (GIN indexes on text fields)
- pg_trgm extension for similarity matching
```

**Benefits:**
- No additional services
- ACID compliance
- Built-in full-text search
- Trigram similarity for duplicate detection

### 2. **Business Rules Engine**
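```python
# Custom Python classes (interface sketch). Initiative, Review,
# ValidationResult, Score, and State are domain types defined elsewhere.
from __future__ import annotations  # annotations are not evaluated at import time

from typing import List


class NoveltyChecker:
    def check(self, initiative: Initiative) -> ValidationResult: ...


class ScoringEngine:
    def calculate_score(self, reviews: List[Review]) -> Score: ...


class WorkflowStateMachine:
    def transition(self, initiative: Initiative, action: str) -> State: ...
```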

**Benefits:**
- Easy to test and debug
- No external dependencies
- Version control friendly
- Can be extended incrementally
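
As a rough illustration of the "rule functions" idea, the sketch below runs a list of plain functions over an initiative and collects every failure. The `ValidationResult` shape, the dict-based initiative, and both rules (`title_present`, `contributions_sum_to_100`) are assumptions made for the example, not rules defined by this document.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ValidationResult:
    """Outcome of running a set of rules (hypothetical shape)."""
    passed: bool
    errors: List[str] = field(default_factory=list)


# A rule receives an initiative (a plain dict here) and returns an
# error message, or None when the rule is satisfied.
Rule = Callable[[dict], Optional[str]]


def title_present(initiative: dict) -> Optional[str]:
    # Hypothetical rule: every initiative needs a non-empty title.
    if not initiative.get("title", "").strip():
        return "title is required"
    return None


def contributions_sum_to_100(initiative: dict) -> Optional[str]:
    # Hypothetical rule: author contribution percentages must total 100.
    total = sum(a["contribution"] for a in initiative.get("authors", []))
    if total != 100:
        return f"author contributions total {total}, expected 100"
    return None


def run_rules(initiative: dict, rules: List[Rule]) -> ValidationResult:
    """Apply every rule and collect all failures instead of stopping at the first."""
    errors = []
    for rule in rules:
        message = rule(initiative)
        if message is not None:
            errors.append(message)
    return ValidationResult(passed=not errors, errors=errors)
```

Because each rule is an ordinary function, it can be unit-tested in isolation and added to or removed from the list without touching the runner.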

### 3. **Workflow Engine**
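```python
# Simple state machine
class InitiativeWorkflow:
    STATES = ['DRAFT', 'SUBMITTED', 'UNIT_REVIEW', ...]
    TRANSITIONS = {
        'DRAFT': ['SUBMITTED'],
        'SUBMITTED': ['UNIT_REVIEW', 'REJECTED'],
        # ... remaining states
    }

    def can_transition(self, from_state, to_state, user_role):
        # Structural check only: is to_state reachable from from_state?
        # Permission checks for user_role and business rules still need
        # to be layered on top of this.
        return to_state in self.TRANSITIONS.get(from_state, [])
```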

**Benefits:**
- No external workflow engine
- Easy to understand and modify
- Can store state in database
- Lightweight
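
One way the permission check could be layered onto the transition table is sketched below. It covers only the states named above (the full list is elided in this document), and the role names in `ALLOWED_ROLES` as well as the `TransitionError` class are placeholders invented for the example.

```python
class TransitionError(Exception):
    """Raised when a requested state change is not allowed."""


class InitiativeWorkflow:
    # Only the transitions spelled out in this document.
    TRANSITIONS = {
        "DRAFT": ["SUBMITTED"],
        "SUBMITTED": ["UNIT_REVIEW", "REJECTED"],
    }
    # Hypothetical role table: which roles may move an initiative INTO a state.
    ALLOWED_ROLES = {
        "SUBMITTED": {"author"},
        "UNIT_REVIEW": {"unit_secretary"},
        "REJECTED": {"unit_secretary"},
    }

    def can_transition(self, from_state: str, to_state: str, user_role: str) -> bool:
        # First the structural check, then the role check.
        if to_state not in self.TRANSITIONS.get(from_state, []):
            return False
        return user_role in self.ALLOWED_ROLES.get(to_state, set())

    def transition(self, from_state: str, to_state: str, user_role: str) -> str:
        """Return the new state, or raise TransitionError if the move is not allowed."""
        if not self.can_transition(from_state, to_state, user_role):
            raise TransitionError(
                f"{user_role} cannot move {from_state} -> {to_state}"
            )
        return to_state
```

The caller would persist the returned state on the initiative row, ideally in the same database transaction that writes the audit entry.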

### 4. **Document Storage**
```
# Local filesystem structure
/initiatives/
  /{initiative_id}/
    /forms/
      form_01_v1.pdf
      form_03_v1.pdf
    /reviews/
      review_001.pdf
    /attachments/
      evidence_001.pdf
```

```sql
-- Metadata in PostgreSQL
CREATE TABLE document_metadata (
    id UUID PRIMARY KEY,
    initiative_id UUID REFERENCES initiatives(id),
    file_path TEXT,
    form_type VARCHAR(50),
    version INT,
    uploaded_by UUID,
    uploaded_at TIMESTAMP,
    checksum VARCHAR(64)
);
```

**Benefits:**
- No object storage service needed
- Easy backup (copy the folder, together with a database dump for the metadata)
- Direct file access
- Simple versioning
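
A minimal sketch of a storage helper that follows the folder layout above and produces values for the `file_path` and `checksum` columns. The function name `store_document`, its parameters, and the choice of SHA-256 (whose hex digest happens to fit `VARCHAR(64)`) are assumptions for illustration.

```python
import hashlib
import uuid
from pathlib import Path

CATEGORIES = {"forms", "reviews", "attachments"}


def store_document(root: Path, initiative_id: str, category: str,
                   filename: str, content: bytes) -> dict:
    """Write content under root/initiatives/<id>/<category>/ and return metadata."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    # Reject anything that is not a UUID so the id cannot contain path separators.
    initiative_id = str(uuid.UUID(initiative_id))
    # Keep only the final path component so a crafted name cannot escape the folder.
    safe_name = Path(filename).name
    if safe_name in {"", ".", ".."}:
        raise ValueError(f"invalid filename: {filename!r}")

    target_dir = root / "initiatives" / initiative_id / category
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / safe_name
    target.write_bytes(content)

    return {
        # Stored relative to root so the storage folder can be moved or restored elsewhere.
        "file_path": target.relative_to(root).as_posix(),
        # SHA-256 hex digest: 64 characters.
        "checksum": hashlib.sha256(content).hexdigest(),
    }
```

Recomputing the checksum on read gives a cheap integrity check after a restore from backup.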

### 5. **Duplicate Detection**
```sql
-- Use PostgreSQL trigram similarity
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Similarity query: i1.id < i2.id compares each pair once,
-- so (A, B) is not reported again as (B, A)
SELECT
    i1.id AS initiative_id,
    i2.id AS similar_id,
    i1.title,
    similarity(i1.description, i2.description) AS sim_score
FROM initiatives i1
JOIN initiatives i2 ON i1.id < i2.id
WHERE similarity(i1.description, i2.description) > 0.7
ORDER BY sim_score DESC;
```

**Benefits:**
- Built into PostgreSQL
- No ML model training needed
- Fast enough for local scale
- Can be enhanced with custom logic
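
To give a feel for what the similarity score measures, here is a pure-Python approximation of trigram similarity: each word is lowercased and padded with two leading spaces and one trailing space, split into three-character windows, and the score is the share of trigrams the two texts have in common. This is a simplified sketch that handles ASCII letters and digits only; it is not the extension's implementation, and the helper names are invented for the example.

```python
import re
from itertools import combinations


def trigrams(text: str) -> set:
    """Return the set of padded, lowercased trigrams over all words in text."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_similar_pairs(descriptions: dict, threshold: float = 0.7) -> list:
    """Compare every pair once and return (id_a, id_b, score), best match first."""
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(descriptions.items(), 2):
        score = trigram_similarity(text_a, text_b)
        if score > threshold:
            pairs.append((id_a, id_b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

For example, `"word"` and `"words"` share 4 of 7 distinct trigrams, giving a score of about 0.57, which would fall below the 0.7 threshold used in the query above.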

### 6. **Event System**
```python
# Simple in-memory event dispatcher
class EventDispatcher:
    def __init__(self):
        self.listeners = {}

    def subscribe(self, event_type, callback):
        if event_type not in self.listeners:
            self.listeners[event_type] = []
        self.listeners[event_type].append(callback)

    def emit(self, event_type, data):
        for callback in self.listeners.get(event_type, []):
            callback(data)

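# For delivery across processes, PostgreSQL LISTEN/NOTIFY can be used instead.
# Notifications are not stored, so write events to a table if they must be persisted.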
```

**Benefits:**
- No message broker needed
- Simple pub/sub pattern
- Can persist events to database if needed
- Easy to add email notifications
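
One possible refinement of the dispatcher is sketched here: a failing listener is logged and skipped so that, say, a broken email hook does not prevent the event from being recorded. The event name `initiative.submitted` and the return value of `emit` are assumptions for the example.

```python
import logging
from collections import defaultdict
from typing import Any, Callable

logger = logging.getLogger(__name__)


class EventDispatcher:
    def __init__(self) -> None:
        # Maps an event type to the callbacks registered for it.
        self.listeners = defaultdict(list)

    def subscribe(self, event_type: str, callback: Callable[[Any], None]) -> None:
        self.listeners[event_type].append(callback)

    def emit(self, event_type: str, data: Any) -> int:
        """Call every listener; return how many completed without raising."""
        delivered = 0
        # Iterate over a copy so listeners may subscribe while an event is emitted.
        for callback in list(self.listeners.get(event_type, [])):
            try:
                callback(data)
                delivered += 1
            except Exception:
                # Log the failure and continue with the remaining listeners.
                logger.exception("listener failed for %s", event_type)
        return delivered


# Usage: one listener records the event (a list stands in for an events
# table), another simulates a notification hook that fails.
event_log = []


def broken_mailer(data: Any) -> None:
    raise RuntimeError("SMTP server unreachable")


dispatcher = EventDispatcher()
dispatcher.subscribe("initiative.submitted", broken_mailer)
dispatcher.subscribe("initiative.submitted", event_log.append)
delivered = dispatcher.emit("initiative.submitted", {"id": 1})
```

Listeners run synchronously in the caller's thread, so slow work such as sending email is better handed off to a background task.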

### 7. **Audit Logging**
```sql
-- Simple append-only table
CREATE TABLE audit_log (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    initiative_id UUID,
    actor_id UUID,
    action VARCHAR(100),
    timestamp TIMESTAMP DEFAULT NOW(),
    previous_state JSONB,
    new_state JSONB,
    metadata JSONB
);

CREATE INDEX idx_audit_initiative ON audit_log(initiative_id);
CREATE INDEX idx_audit_timestamp ON audit_log(timestamp);
```

**Benefits:**
- No separate audit system
- Queryable with SQL
- Can export for compliance
- Simple to implement
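
Writing an entry could look roughly like the sketch below, which only builds the parameterized statement and its values. It assumes a DB-API driver that uses `%s` placeholders (psycopg is one such driver); `audit_params` and `INSERT_AUDIT_SQL` are names invented for the example, and `id` and `timestamp` are left to the table defaults.

```python
import json
from typing import Optional

# Column order matches the tuple returned by audit_params().
INSERT_AUDIT_SQL = """
INSERT INTO audit_log
    (initiative_id, actor_id, action, previous_state, new_state, metadata)
VALUES (%s, %s, %s, %s::jsonb, %s::jsonb, %s::jsonb)
"""


def audit_params(initiative_id: str, actor_id: str, action: str,
                 previous_state: dict, new_state: dict,
                 metadata: Optional[dict] = None) -> tuple:
    """Build the parameter tuple for INSERT_AUDIT_SQL."""
    if len(action) > 100:  # mirrors action VARCHAR(100)
        raise ValueError("action exceeds 100 characters")
    return (
        initiative_id,
        actor_id,
        action,
        # JSON is passed as text and cast to jsonb by the statement.
        json.dumps(previous_state, sort_keys=True),
        json.dumps(new_state, sort_keys=True),
        json.dumps(metadata or {}, sort_keys=True),
    )
```

With a driver connection in hand, the call would be along the lines of `cursor.execute(INSERT_AUDIT_SQL, audit_params(...))`. Keeping the table append-only is then a matter of never issuing `UPDATE` or `DELETE` against it, which can be enforced by withholding those privileges from the application role.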

---

## Complete Simplified Stack

### **Backend**
```
FastAPI (Python)
├── Database: PostgreSQL 15+
│   ├── Core tables (initiatives, authors, reviews, etc.)
│   ├── JSONB for flexible data
│   ├── Full-text search (GIN indexes)
│   ├── Trigram similarity (pg_trgm)
│   └── Audit log table
├── Business Logic: Custom Python classes
│   ├── NoveltyChecker
│   ├── ScoringEngine
│   ├── WorkflowStateMachine
│   └── DuplicateDetector
├── Document Storage: Local filesystem
│   └── Organized folder structure
├── Event System: In-memory dispatcher + PostgreSQL NOTIFY
└── API: FastAPI REST endpoints
```

### **Frontend** (Already in place)
```
React + TypeScript
├── Feature-based architecture
├── React Query for data fetching
└── Existing UI components
```

---

## Implementation Priority

### **Phase 1: Core Foundation** (Weeks 1-2)
1. ✅ Database schema (PostgreSQL)
2. ✅ Basic CRUD APIs (FastAPI)
3. ✅ Document upload/storage (local filesystem)
4. ✅ Basic state machine (Python class)

### **Phase 2: Business Rules** (Weeks 3-4)
1. ✅ Novelty checking (PostgreSQL similarity)
2. ✅ Author contribution validation
3. ✅ Scoring algorithm (Group 01)
4. ✅ Auto-classification (Group 02)

### **Phase 3: Workflow & Notifications** (Weeks 5-6)
1. ✅ Complete state machine transitions
2. ✅ Deadline tracking & alerts
3. ✅ Email notifications (SMTP)
4. ✅ Duplicate detection & mediation

### **Phase 4: Advanced Features** (Weeks 7-8)
1. ✅ Reporting & analytics
2. ✅ Audit trail queries
3. ✅ Role-based permissions
4. ✅ Appeal workflow

---

## Technology Comparison

### **Original Stack Complexity**
- 8+ services to manage
- External dependencies (Kafka, Elasticsearch, S3)
- Complex deployment
- Higher resource usage
- Steeper learning curve

### **Simplified Stack**
- 2 services (FastAPI + PostgreSQL)
- Minimal external dependencies
- Simple deployment
- Lower resource usage
- Easier to maintain

---

## When to Scale Up

Consider adding complexity only if:
- **>10,000 initiatives/year**: Add Elasticsearch for search
- **>100 concurrent users**: Add Redis for caching
- **Multi-server deployment**: Add message queue (RabbitMQ)
- **Advanced ML needed**: Add dedicated ML service
- **Cloud deployment**: Use S3 for documents

For a local application with fewer than 5,000 initiatives per year, the simplified stack is sufficient.

---

## Code Structure Example

```
be0/
├── src/
│   ├── domain/
│   │   ├── entities/
│   │   │   ├── initiative.py
│   │   │   ├── author.py
│   │   │   └── review.py
│   │   └── rules/
│   │       ├── novelty_checker.py
│   │       ├── scoring_engine.py
│   │       └── duplicate_detector.py
│   ├── application/
│   │   ├── services/
│   │   │   ├── workflow_service.py
│   │   │   └── notification_service.py
│   │   └── state_machine.py
│   ├── infrastructure/
│   │   ├── database/
│   │   │   └── models.py
│   │   ├── storage/
│   │   │   └── file_storage.py
│   │   └── events/
│   │       └── dispatcher.py
│   └── api/
│       └── routes/
│           └── initiatives.py
└── storage/
    └── documents/
        └── initiatives/
```

---

## Summary

**Simplified Stack:**
- ✅ PostgreSQL (database + search + similarity)
- ✅ FastAPI (API framework)
- ✅ Python (business rules + workflow)
- ✅ Local filesystem (document storage)
- ✅ In-memory events (or PostgreSQL NOTIFY)

**Removed:**
- ❌ Camunda/Temporal (use custom state machine)
- ❌ Elasticsearch (use PostgreSQL full-text search)
- ❌ Kafka/RabbitMQ (use simple event dispatcher)
- ❌ S3/MinIO (use local filesystem)
- ❌ Drools (use Python functions)

**Result:** A simpler stack that is easier to maintain, sufficient for local deployment, and able to scale up later if needed.