sciagent/docs/SIMPLIFIED_TECH_STACK.md
Thinh Lam 688fac73e9
sciagent code + Gitea Actions CI/CD
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 09:38:30 +07:00


# Simplified Tech Stack for Local Governance Layer
## Analysis & Simplification Strategy
### Key Observations
1. **Local Application Context**: Single-server deployment, not distributed
2. **Existing Stack**: Already using FastAPI + PostgreSQL
3. **Complexity Overkill**: Enterprise tools (Kafka, Camunda, Elasticsearch) are unnecessary for local deployment
4. **Core Needs**: State machine, rules engine, document storage, audit logging
---
## Simplified Tech Stack Recommendation
### ✅ **Core Stack (Keep These)**
| Component | Technology | Rationale |
|-----------|-----------|-----------|
| **Database** | PostgreSQL 15+ | ✅ Already in use, supports JSONB, excellent for local deployment |
| **API Framework** | FastAPI (Python) | ✅ Already in use, fast, async, great for this use case |
| **Document Storage** | Local filesystem + PostgreSQL (metadata) | ✅ Simple, no external service needed |
| **Business Rules** | Custom Python classes/functions | ✅ Lightweight, maintainable, no external engine needed |
### 🔄 **Replace Complex Components**
| Original Suggestion | Simplified Alternative | Why |
|-------------------|----------------------|-----|
| **Camunda/Temporal** | Custom state machine (Python) | Simple workflow states, no need for enterprise orchestration |
| **Elasticsearch + ML** | PostgreSQL full-text search + `pg_trgm` (trigram similarity) | Built-in, sufficient for duplicate detection |
| **Apache Kafka/RabbitMQ** | PostgreSQL NOTIFY/LISTEN or in-memory event queue | Simple pub/sub, no separate service |
| **AWS S3/MinIO** | Local filesystem with organized folders | Direct file storage, simpler for local |
| **Drools** | Python rule functions/classes | More maintainable, easier to debug |
---
## Recommended Simplified Architecture
### 1. **Database Layer**
```text
Single PostgreSQL database with:
- Core tables (initiatives, authors, reviews, etc.)
- JSONB columns for flexible metadata
- Full-text search indexes (GIN indexes on text fields)
- pg_trgm extension for similarity matching
```
**Benefits:**
- No additional services
- ACID compliance
- Built-in full-text search
- Trigram similarity for duplicate detection
### 2. **Business Rules Engine**
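```python
# Custom Python classes (interface sketch; method bodies are left to the implementation)
from __future__ import annotations  # domain types below are referenced by name only


class NoveltyChecker:
    def check(self, initiative: Initiative) -> ValidationResult: ...


class ScoringEngine:
    def calculate_score(self, reviews: list[Review]) -> Score: ...


class WorkflowStateMachine:
    def transition(self, initiative: Initiative, action: str) -> State: ...
```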
**Benefits:**
- Easy to test and debug
- No external dependencies
- Version control friendly
- Can be extended incrementally
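
As an illustration of how small such a rule class can stay, here is a minimal sketch of a scoring rule. The `Review` and `Score` fields, the 0-100 scale, and the mean-based formula are all assumptions made for this example; the actual scoring algorithm is not specified in this document.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Review:
    """Hypothetical review record; the real entity may carry more fields."""
    reviewer_id: str
    score: float  # assumed 0-100 scale


@dataclass(frozen=True)
class Score:
    value: float
    review_count: int


class ScoringEngine:
    """Assumed rule: the final score is the arithmetic mean of the review scores."""

    def calculate_score(self, reviews: list[Review]) -> Score:
        if not reviews:
            raise ValueError("at least one review is required to compute a score")
        return Score(value=mean(r.score for r in reviews), review_count=len(reviews))
```

Because the rule is a plain class with no I/O, it can be unit-tested without a database or a running API.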
### 3. **Workflow Engine**
```python
# Simple state machine
class InitiativeWorkflow:
    STATES = ['DRAFT', 'SUBMITTED', 'UNIT_REVIEW', ...]
    TRANSITIONS = {
        'DRAFT': ['SUBMITTED'],
        'SUBMITTED': ['UNIT_REVIEW', 'REJECTED'],
        # ...
    }

    def can_transition(self, from_state, to_state, user_role):
        # Check permissions and business rules
        pass
```
**Benefits:**
- No external workflow engine
- Easy to understand and modify
- Can store state in database
- Lightweight
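
The permission check in `can_transition` could be filled in along the following lines. This sketch covers only the states and transitions listed above, since the full list is elided in this document. The role names, the `ALLOWED_ROLES` table, and the `InvalidTransition` exception are hypothetical.

```python
class InvalidTransition(Exception):
    """Raised when a requested state change is not allowed."""


class InitiativeWorkflow:
    # Only the transitions shown in this document; the remaining ones are elided there.
    TRANSITIONS = {
        "DRAFT": ["SUBMITTED"],
        "SUBMITTED": ["UNIT_REVIEW", "REJECTED"],
    }
    # Hypothetical rule table: which role may move an initiative INTO each state.
    ALLOWED_ROLES = {
        "SUBMITTED": {"author"},
        "UNIT_REVIEW": {"unit_reviewer"},
        "REJECTED": {"unit_reviewer"},
    }

    def can_transition(self, from_state: str, to_state: str, user_role: str) -> bool:
        # The target must be reachable from the current state...
        if to_state not in self.TRANSITIONS.get(from_state, []):
            return False
        # ...and the acting role must be allowed to enter it.
        return user_role in self.ALLOWED_ROLES.get(to_state, set())

    def transition(self, from_state: str, to_state: str, user_role: str) -> str:
        if not self.can_transition(from_state, to_state, user_role):
            raise InvalidTransition(f"{user_role} cannot move {from_state} -> {to_state}")
        return to_state
```

Keeping the transition table and the role table as plain dictionaries means a new state or role is a data change rather than new control flow.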
### 4. **Document Storage**
```text
# Local filesystem structure
/initiatives/
  /{initiative_id}/
    /forms/
      form_01_v1.pdf
      form_03_v1.pdf
    /reviews/
      review_001.pdf
    /attachments/
      evidence_001.pdf

# Metadata in PostgreSQL
CREATE TABLE document_metadata (
    id UUID PRIMARY KEY,
    initiative_id UUID REFERENCES initiatives(id),
    file_path TEXT,
    form_type VARCHAR(50),
    version INT,
    uploaded_by UUID,
    uploaded_at TIMESTAMP,
    checksum VARCHAR(64)
);
```
**Benefits:**
- No object storage service needed
- Easy backup (copy the folder together with a database dump so files and metadata stay in sync)
- Direct file access
- Simple versioning
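
A storage helper for this layout might look like the sketch below. The function name `store_document`, its parameters, and the returned dictionary are hypothetical. SHA-256 is an assumption too; it is chosen here because its hex digest is 64 characters long, which fits the `checksum VARCHAR(64)` column.

```python
import hashlib
import uuid
from pathlib import Path

CATEGORIES = {"forms", "reviews", "attachments"}


def store_document(root: Path, initiative_id: str, category: str,
                   filename: str, content: bytes) -> dict:
    """Write a file under <root>/initiatives/<id>/<category>/ and return its metadata."""
    uuid.UUID(initiative_id)  # raises ValueError if the id is not a well-formed UUID
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    # Keep only the final path component so a crafted name cannot escape the folder.
    safe_name = Path(filename).name
    if safe_name in {"", ".", ".."}:
        raise ValueError(f"invalid filename: {filename!r}")
    target_dir = root / "initiatives" / initiative_id / category
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / safe_name
    target.write_bytes(content)
    return {
        "file_path": str(target.relative_to(root)),  # stored relative to the storage root
        "checksum": hashlib.sha256(content).hexdigest(),  # 64 hex characters
    }
```

Storing the path relative to the storage root keeps the metadata valid if the root folder is moved or restored elsewhere.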
### 5. **Duplicate Detection**
```sql
-- Use PostgreSQL trigram similarity
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Similarity query: compare each pair of initiatives once
SELECT
    i1.id,
    i1.title,
    i2.id AS similar_id,
    similarity(i1.description, i2.description) AS sim_score
FROM initiatives i1
JOIN initiatives i2 ON i1.id < i2.id  -- skips self-matches and mirrored pairs
WHERE similarity(i1.description, i2.description) > 0.7
ORDER BY sim_score DESC;
```
**Benefits:**
- Built into PostgreSQL
- No ML model training needed
- Fast enough for local scale
- Can be enhanced with custom logic
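
To make the 0.7 threshold easier to reason about, the sketch below approximates in pure Python what `similarity()` computes: the number of shared trigrams divided by the number of distinct trigrams across both strings. It is an illustration only, not the `pg_trgm` implementation, and locale-dependent character handling in PostgreSQL may give different results for some inputs.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm: lowercase, split into alphanumeric words, pad each word."""
    result: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "  # two leading spaces and one trailing space
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

For example, `similarity("word", "two words")` shares 4 of 11 distinct trigrams, about 0.36, so a 0.7 threshold only flags descriptions that overlap heavily.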
### 6. **Event System**
```python
# Simple in-memory event dispatcher
class EventDispatcher:
    def __init__(self):
        self.listeners = {}

    def subscribe(self, event_type, callback):
        if event_type not in self.listeners:
            self.listeners[event_type] = []
        self.listeners[event_type].append(callback)

    def emit(self, event_type, data):
        for callback in self.listeners.get(event_type, []):
            callback(data)

# Or use PostgreSQL LISTEN/NOTIFY to reach other processes. Notifications are
# not stored, so write events to a table if they must survive a restart.
```
**Benefits:**
- No message broker needed
- Simple pub/sub pattern
- Can persist events to database if needed
- Easy to add email notifications
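
One design question for an in-memory dispatcher is what happens when a listener raises. The variant below is a sketch under the assumption that one failing listener, such as an email sender, should not block the others; the return value and the logging behaviour are choices made for this example.

```python
import logging
from collections import defaultdict
from typing import Any, Callable

logger = logging.getLogger(__name__)


class EventDispatcher:
    def __init__(self) -> None:
        self.listeners: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, callback: Callable[[Any], None]) -> None:
        self.listeners[event_type].append(callback)

    def emit(self, event_type: str, data: Any) -> int:
        """Call every listener and return how many completed without raising."""
        delivered = 0
        # Iterate over a copy so a listener may subscribe others while we loop.
        for callback in list(self.listeners.get(event_type, [])):
            try:
                callback(data)
                delivered += 1
            except Exception:
                logger.exception("listener failed for event %s", event_type)
        return delivered
```

A caller would subscribe handlers once at startup, for instance `dispatcher.subscribe("initiative.submitted", send_email)`, where both the event name and `send_email` are hypothetical.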
### 7. **Audit Logging**
```sql
-- Simple append-only table
CREATE TABLE audit_log (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    initiative_id UUID,
    actor_id UUID,
    action VARCHAR(100),
    timestamp TIMESTAMP DEFAULT NOW(),
    previous_state JSONB,
    new_state JSONB,
    metadata JSONB
);

CREATE INDEX idx_audit_initiative ON audit_log(initiative_id);
CREATE INDEX idx_audit_timestamp ON audit_log(timestamp);
```
**Benefits:**
- No separate audit system
- Queryable with SQL
- Can export for compliance
- Simple to implement
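
Writing to this table from Python can stay equally small. The sketch below only builds the statement and its parameters; `build_audit_params` is a hypothetical helper, and the `%s` placeholder style assumes a psycopg-style driver, which this document does not specify.

```python
import json
from typing import Any, Optional

# Column names follow the audit_log table; id and timestamp use their defaults.
AUDIT_INSERT_SQL = """
INSERT INTO audit_log (initiative_id, actor_id, action, previous_state, new_state, metadata)
VALUES (%s, %s, %s, %s::jsonb, %s::jsonb, %s::jsonb)
"""


def build_audit_params(initiative_id: str, actor_id: str, action: str,
                       previous_state: Optional[dict[str, Any]],
                       new_state: Optional[dict[str, Any]],
                       metadata: Optional[dict[str, Any]] = None) -> tuple:
    """Serialize the JSONB columns and return parameters in column order."""
    def to_json(value: Optional[dict[str, Any]]) -> Optional[str]:
        # None stays None so the column is stored as SQL NULL.
        return None if value is None else json.dumps(value, sort_keys=True)

    return (initiative_id, actor_id, action,
            to_json(previous_state), to_json(new_state), to_json(metadata))


# Assumed usage with a psycopg-style cursor:
#   cursor.execute(AUDIT_INSERT_SQL, build_audit_params(...))
```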
---
## Complete Simplified Stack
### **Backend**
```
FastAPI (Python)
├── Database: PostgreSQL 15+
│   ├── Core tables (initiatives, authors, reviews, etc.)
│   ├── JSONB for flexible data
│   ├── Full-text search (GIN indexes)
│   ├── Trigram similarity (pg_trgm)
│   └── Audit log table
├── Business Logic: Custom Python classes
│   ├── NoveltyChecker
│   ├── ScoringEngine
│   ├── WorkflowStateMachine
│   └── DuplicateDetector
├── Document Storage: Local filesystem
│   └── Organized folder structure
├── Event System: In-memory dispatcher + PostgreSQL NOTIFY
└── API: FastAPI REST endpoints
```
### **Frontend** (Already in place)
```
React + TypeScript
├── Feature-based architecture
├── React Query for data fetching
└── Existing UI components
```
---
## Implementation Priority
### **Phase 1: Core Foundation** (Week 1-2)
1. ✅ Database schema (PostgreSQL)
2. ✅ Basic CRUD APIs (FastAPI)
3. ✅ Document upload/storage (local filesystem)
4. ✅ Basic state machine (Python class)
### **Phase 2: Business Rules** (Week 3-4)
1. ✅ Novelty checking (PostgreSQL similarity)
2. ✅ Author contribution validation
3. ✅ Scoring algorithm (Group 01)
4. ✅ Auto-classification (Group 02)
### **Phase 3: Workflow & Notifications** (Week 5-6)
1. ✅ Complete state machine transitions
2. ✅ Deadline tracking & alerts
3. ✅ Email notifications (SMTP)
4. ✅ Duplicate detection & mediation
### **Phase 4: Advanced Features** (Week 7-8)
1. ✅ Reporting & analytics
2. ✅ Audit trail queries
3. ✅ Role-based permissions
4. ✅ Appeal workflow
---
## Technology Comparison
### **Original Stack Complexity**
- 8+ services to manage
- External dependencies (Kafka, Elasticsearch, S3)
- Complex deployment
- Higher resource usage
- Steeper learning curve
### **Simplified Stack**
- 2 services (FastAPI + PostgreSQL)
- Minimal external dependencies
- Simple deployment
- Lower resource usage
- Easier to maintain
---
## When to Scale Up
Consider adding complexity only if:
- **>10,000 initiatives/year**: Add Elasticsearch for search
- **>100 concurrent users**: Add Redis for caching
- **Multi-server deployment**: Add message queue (RabbitMQ)
- **Advanced ML needed**: Add dedicated ML service
- **Cloud deployment**: Use S3 for documents
For a local application with <5,000 initiatives/year, the simplified stack is sufficient.
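
The thresholds above can also be written down as a small check, which makes the decision repeatable when usage figures change. This is a sketch only: the `DeploymentProfile` fields and the `recommended_additions` function are hypothetical names, and the numbers are copied from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentProfile:
    initiatives_per_year: int
    concurrent_users: int
    multi_server: bool = False
    needs_advanced_ml: bool = False
    cloud_hosted: bool = False


def recommended_additions(profile: DeploymentProfile) -> list[str]:
    """Return the components the scale-up list suggests adding for this profile."""
    additions = []
    if profile.initiatives_per_year > 10_000:
        additions.append("Elasticsearch")
    if profile.concurrent_users > 100:
        additions.append("Redis")
    if profile.multi_server:
        additions.append("RabbitMQ")
    if profile.needs_advanced_ml:
        additions.append("dedicated ML service")
    if profile.cloud_hosted:
        additions.append("S3")
    return additions
```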
---
## Code Structure Example
```
be0/
├── src/
│   ├── domain/
│   │   ├── entities/
│   │   │   ├── initiative.py
│   │   │   ├── author.py
│   │   │   └── review.py
│   │   └── rules/
│   │       ├── novelty_checker.py
│   │       ├── scoring_engine.py
│   │       └── duplicate_detector.py
│   ├── application/
│   │   ├── services/
│   │   │   ├── workflow_service.py
│   │   │   └── notification_service.py
│   │   └── state_machine.py
│   ├── infrastructure/
│   │   ├── database/
│   │   │   └── models.py
│   │   ├── storage/
│   │   │   └── file_storage.py
│   │   └── events/
│   │       └── dispatcher.py
│   └── api/
│       └── routes/
│           └── initiatives.py
└── storage/
    └── documents/
        └── initiatives/
```
---
## Summary
**Simplified Stack:**
- ✅ PostgreSQL (database + search + similarity)
- ✅ FastAPI (API framework)
- ✅ Python (business rules + workflow)
- ✅ Local filesystem (document storage)
- ✅ In-memory events (or PostgreSQL NOTIFY)
**Removed:**
- ❌ Camunda/Temporal (use custom state machine)
- ❌ Elasticsearch (use PostgreSQL full-text search)
- ❌ Kafka/RabbitMQ (use simple event dispatcher)
- ❌ S3/MinIO (use local filesystem)
- ❌ Drools (use Python functions)
**Result:** Simpler, easier to maintain, sufficient for local deployment, can scale up later if needed.