sciagent/docs/SIMPLIFIED_TECH_STACK.md
Thinh Lam 688fac73e9
2026-06-30 09:38:30 +07:00


Simplified Tech Stack for Local Governance Layer

Analysis & Simplification Strategy

Key Observations

  1. Local Application Context: Single-server deployment, not distributed
  2. Existing Stack: Already using FastAPI + PostgreSQL
  3. Complexity Overkill: Enterprise tools (Kafka, Camunda, Elasticsearch) are unnecessary for local deployment
  4. Core Needs: State machine, rules engine, document storage, audit logging

Simplified Tech Stack Recommendation

Core Stack (Keep These)

| Component | Technology | Rationale |
| --- | --- | --- |
| Database | PostgreSQL 15+ | Already in use, supports JSONB, excellent for local deployment |
| API Framework | FastAPI (Python) | Already in use, fast, async, great for this use case |
| Document Storage | Local filesystem + PostgreSQL (metadata) | Simple, no external service needed |
| Business Rules | Custom Python classes/functions | Lightweight, maintainable, no external engine needed |

🔄 Replace Complex Components

| Original Suggestion | Simplified Alternative | Why |
| --- | --- | --- |
| Camunda/Temporal | Custom state machine (Python) | Simple workflow states, no need for enterprise orchestration |
| Elasticsearch + ML | PostgreSQL full-text search + pg_trgm (trigram similarity) | Built-in, sufficient for duplicate detection |
| Apache Kafka/RabbitMQ | PostgreSQL NOTIFY/LISTEN or in-memory event queue | Simple pub/sub, no separate service |
| AWS S3/MinIO | Local filesystem with organized folders | Direct file storage, simpler for local |
| Drools | Python rule functions/classes | More maintainable, easier to debug |

1. Database Layer

# Single PostgreSQL database with:
- Core tables (initiatives, authors, reviews, etc.)
- JSONB columns for flexible metadata
- Full-text search indexes (GIN indexes over tsvector columns or expressions)
- pg_trgm extension for similarity matching

Benefits:

  • No additional services
  • ACID compliance
  • Built-in full-text search
  • Trigram similarity for duplicate detection

2. Business Rules Engine

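# Custom Python classes (interfaces only; Initiative, Review, ValidationResult,
# Score and State are defined elsewhere, so they are referenced as strings)
from typing import List

class NoveltyChecker:
    def check(self, initiative: "Initiative") -> "ValidationResult": ...

class ScoringEngine:
    def calculate_score(self, reviews: List["Review"]) -> "Score": ...

class WorkflowStateMachine:
    def transition(self, initiative: "Initiative", action: str) -> "State": ...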

Benefits:

  • Easy to test and debug
  • No external dependencies
  • Version control friendly
  • Can be extended incrementally
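The same idea also works with plain functions composed by a small runner. The following is a minimal sketch, not the project's implementation: the `ValidationResult` shape, the dict-based initiative, and the two rules (`title_present`, `has_authors`) are hypothetical and exist only to show the pattern.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ValidationResult:
    """Hypothetical result type; the real one may carry more detail."""
    ok: bool
    errors: List[str] = field(default_factory=list)


# A rule is any callable that inspects an initiative (a plain dict here).
Rule = Callable[[Dict], ValidationResult]


def title_present(initiative: Dict) -> ValidationResult:
    # Illustrative rule, not a project requirement.
    if (initiative.get("title") or "").strip():
        return ValidationResult(ok=True)
    return ValidationResult(ok=False, errors=["title is required"])


def has_authors(initiative: Dict) -> ValidationResult:
    # Illustrative rule, not a project requirement.
    if initiative.get("authors"):
        return ValidationResult(ok=True)
    return ValidationResult(ok=False, errors=["at least one author is required"])


def run_rules(initiative: Dict, rules: List[Rule]) -> ValidationResult:
    """Run every rule and collect all errors instead of stopping at the first."""
    errors: List[str] = []
    for rule in rules:
        errors.extend(rule(initiative).errors)
    return ValidationResult(ok=not errors, errors=errors)
```

Because each rule is an ordinary function, it can be unit-tested in isolation, and adding a rule only means appending it to the list passed to `run_rules`.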

3. Workflow Engine

# Simple state machine
class InitiativeWorkflow:
    STATES = ['DRAFT', 'SUBMITTED', 'UNIT_REVIEW', ...]
    TRANSITIONS = {
        'DRAFT': ['SUBMITTED'],
        'SUBMITTED': ['UNIT_REVIEW', 'REJECTED'],
        # ... (remaining states elided)
    }

    def can_transition(self, from_state, to_state, user_role):
        # Structural check only; permission checks on user_role and
        # business rules still need to be added here
        return to_state in self.TRANSITIONS.get(from_state, [])

Benefits:

  • No external workflow engine
  • Easy to understand and modify
  • Can store state in database
  • Lightweight
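One way to fold the role check into the transition table is to key it by state and action. This is a sketch under stated assumptions: the four states come from the workflow above, while the action names (`submit`, `accept_for_review`, `reject`) and role names (`author`, `unit_reviewer`) are hypothetical.

```python
from typing import Dict, Set, Tuple

# (current state, action) -> (next state, roles allowed to perform the action).
# Action and role names are hypothetical placeholders.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Set[str]]] = {
    ("DRAFT", "submit"): ("SUBMITTED", {"author"}),
    ("SUBMITTED", "accept_for_review"): ("UNIT_REVIEW", {"unit_reviewer"}),
    ("SUBMITTED", "reject"): ("REJECTED", {"unit_reviewer"}),
}


class TransitionError(Exception):
    """Raised when an action is invalid for the current state or the role."""


def transition(state: str, action: str, user_role: str) -> str:
    """Return the next state, or raise TransitionError if the move is not permitted."""
    try:
        next_state, allowed_roles = TRANSITIONS[(state, action)]
    except KeyError:
        raise TransitionError(
            f"action {action!r} is not valid from state {state!r}"
        ) from None
    if user_role not in allowed_roles:
        raise TransitionError(f"role {user_role!r} may not perform {action!r}")
    return next_state
```

Keeping the table as data means the allowed moves can be reviewed at a glance, and the returned state can be written to the initiatives row in the same database transaction as the audit entry.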

4. Document Storage

# Local filesystem structure
/initiatives/
  /{initiative_id}/
    /forms/
      form_01_v1.pdf
      form_03_v1.pdf
    /reviews/
      review_001.pdf
    /attachments/
      evidence_001.pdf

# Metadata in PostgreSQL
CREATE TABLE document_metadata (
    id UUID PRIMARY KEY,
    initiative_id UUID REFERENCES initiatives(id),
    file_path TEXT,
    form_type VARCHAR(50),
    version INT,
    uploaded_by UUID,
    uploaded_at TIMESTAMP,
    checksum VARCHAR(64)
);

Benefits:

  • No object storage service needed
  • Easy backup (copy the folder together with a database dump, since the metadata lives in PostgreSQL)
  • Direct file access
  • Simple versioning
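A minimal sketch of the write path for this layout follows. The function name `store_document`, its parameters, and the use of SHA-256 for the checksum are assumptions; SHA-256 is chosen only because its 64 hex characters fit the `VARCHAR(64)` column. The file naming follows the `form_01_v1.pdf` pattern in the listing.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict


def store_document(root: Path, initiative_id: str, category: str,
                   source: Path, form_type: str, version: int) -> Dict[str, object]:
    """Copy `source` into <root>/initiatives/<initiative_id>/<category>/.

    Returns a dict whose keys mirror a subset of the document_metadata columns,
    ready to be inserted by whatever database layer the project uses.
    """
    target_dir = root / "initiatives" / initiative_id / category
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{form_type}_v{version}{source.suffix}"
    if target.exists():
        # Never overwrite an existing version; callers should bump the version.
        raise FileExistsError(f"{target} already exists")
    shutil.copyfile(source, target)

    # Reads the whole file into memory, which is acceptable for a sketch.
    checksum = hashlib.sha256(target.read_bytes()).hexdigest()
    return {
        "file_path": str(target.relative_to(root)),
        "form_type": form_type,
        "version": version,
        "checksum": checksum,
    }
```

Storing the path relative to the storage root keeps the metadata valid if the folder is moved or restored to a different location.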

5. Duplicate Detection

-- Use PostgreSQL trigram similarity
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Similarity query: each unordered pair is reported once, with both ids
SELECT
    i1.id AS id_a,
    i2.id AS id_b,
    i1.title AS title_a,
    i2.title AS title_b,
    similarity(i1.description, i2.description) AS sim_score
FROM initiatives i1
JOIN initiatives i2 ON i1.id < i2.id
WHERE similarity(i1.description, i2.description) > 0.7
ORDER BY sim_score DESC;

Benefits:

  • Built into PostgreSQL
  • No ML model training needed
  • Fast enough for local scale
  • Can be enhanced with custom logic
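To make the 0.7 threshold easier to reason about, the score can be approximated in plain Python. This sketch follows the documented pg_trgm behavior for simple alphanumeric text (lowercase, pad each word with two leading spaces and one trailing space, then take the ratio of shared to distinct trigrams). It is an approximation for experimentation, not a replacement for the extension.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Approximate pg_trgm's trigram extraction for plain alphanumeric text."""
    result: Set[str] = set()
    # Split on non-alphanumeric characters, then pad each word as pg_trgm does.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

For example, `similarity("word", "two words")` shares 4 of 11 distinct trigrams and evaluates to 4/11, so two descriptions need substantial overlap before they cross a 0.7 threshold.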

6. Event System

# Simple in-memory event dispatcher
class EventDispatcher:
    def __init__(self):
        self.listeners = {}
    
    def subscribe(self, event_type, callback):
        if event_type not in self.listeners:
            self.listeners[event_type] = []
        self.listeners[event_type].append(callback)
    
    def emit(self, event_type, data):
        for callback in self.listeners.get(event_type, []):
            callback(data)

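# Or use PostgreSQL NOTIFY/LISTEN to reach other processes. Notifications are
# not stored, so write events to a table if they must survive a restart.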

Benefits:

  • No message broker needed
  • Simple pub/sub pattern
  • Can persist events to database if needed
  • Easy to add email notifications
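With a synchronous dispatcher, one failing listener (for example an SMTP error) would stop the remaining listeners from running. The variant below is a sketch of one way to isolate failures; the class name `SafeEventDispatcher` and the event name used in the usage note are hypothetical.

```python
import logging
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

logger = logging.getLogger(__name__)


class SafeEventDispatcher:
    """Dispatcher variant that keeps going when a listener raises."""

    def __init__(self) -> None:
        self.listeners: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, callback: Callable[[Any], None]) -> None:
        self.listeners[event_type].append(callback)

    def emit(self, event_type: str, data: Any) -> int:
        """Call every listener; return how many completed without raising."""
        delivered = 0
        # .get avoids creating empty entries for event types with no subscribers.
        for callback in list(self.listeners.get(event_type, [])):
            try:
                callback(data)
                delivered += 1
            except Exception:
                # Log the traceback and continue with the next listener.
                logger.exception("listener failed for event %s", event_type)
        return delivered
```

A caller might subscribe a notification function to a name such as `"initiative.submitted"` and compare the return value of `emit` with the number of subscribers to detect failed deliveries.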

7. Audit Logging

-- Simple append-only table
CREATE TABLE audit_log (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    initiative_id UUID,
    actor_id UUID,
    action VARCHAR(100),
    timestamp TIMESTAMP DEFAULT NOW(),
    previous_state JSONB,
    new_state JSONB,
    metadata JSONB
);

CREATE INDEX idx_audit_initiative ON audit_log(initiative_id);
CREATE INDEX idx_audit_timestamp ON audit_log(timestamp);

Benefits:

  • No separate audit system
  • Queryable with SQL
  • Can export for compliance
  • Simple to implement
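Writing an audit row from Python could look like the sketch below. It assumes a driver that uses `%s` placeholders (as psycopg does); the helper name `audit_params` is hypothetical, and the column list matches the `audit_log` table above.

```python
import json
from typing import Any, Dict, Optional, Tuple

# id and timestamp are omitted so their column defaults apply.
INSERT_AUDIT_SQL = (
    "INSERT INTO audit_log "
    "(initiative_id, actor_id, action, previous_state, new_state, metadata) "
    "VALUES (%s, %s, %s, %s::jsonb, %s::jsonb, %s::jsonb)"
)


def audit_params(initiative_id: str, actor_id: str, action: str,
                 previous_state: Optional[Dict[str, Any]],
                 new_state: Optional[Dict[str, Any]],
                 metadata: Optional[Dict[str, Any]] = None) -> Tuple[Any, ...]:
    """Build the parameter tuple for INSERT_AUDIT_SQL."""

    def to_json(value: Optional[Dict[str, Any]]) -> Optional[str]:
        # None stays None so the column is stored as SQL NULL.
        return None if value is None else json.dumps(value, sort_keys=True)

    return (initiative_id, actor_id, action,
            to_json(previous_state), to_json(new_state), to_json(metadata))
```

With a psycopg-style cursor this would be executed as `cursor.execute(INSERT_AUDIT_SQL, audit_params(...))`, ideally in the same transaction as the state change so the log cannot drift from the data.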

Complete Simplified Stack

Backend

FastAPI (Python)
├── Database: PostgreSQL 15+
│   ├── Core tables (initiatives, authors, reviews, etc.)
│   ├── JSONB for flexible data
│   ├── Full-text search (GIN indexes)
│   ├── Trigram similarity (pg_trgm)
│   └── Audit log table
├── Business Logic: Custom Python classes
│   ├── NoveltyChecker
│   ├── ScoringEngine
│   ├── WorkflowStateMachine
│   └── DuplicateDetector
├── Document Storage: Local filesystem
│   └── Organized folder structure
├── Event System: In-memory dispatcher + PostgreSQL NOTIFY
└── API: FastAPI REST endpoints

Frontend (Already in place)

React + TypeScript
├── Feature-based architecture
├── React Query for data fetching
└── Existing UI components

Implementation Priority

Phase 1: Core Foundation (Week 1-2)

  1. Database schema (PostgreSQL)
  2. Basic CRUD APIs (FastAPI)
  3. Document upload/storage (local filesystem)
  4. Basic state machine (Python class)

Phase 2: Business Rules (Week 3-4)

  1. Novelty checking (PostgreSQL similarity)
  2. Author contribution validation
  3. Scoring algorithm (Group 01)
  4. Auto-classification (Group 02)

Phase 3: Workflow & Notifications (Week 5-6)

  1. Complete state machine transitions
  2. Deadline tracking & alerts
  3. Email notifications (SMTP)
  4. Duplicate detection & mediation

Phase 4: Advanced Features (Week 7-8)

  1. Reporting & analytics
  2. Audit trail queries
  3. Role-based permissions
  4. Appeal workflow

Technology Comparison

Original Stack Complexity

  • 8+ services to manage
  • External dependencies (Kafka, Elasticsearch, S3)
  • Complex deployment
  • Higher resource usage
  • Steeper learning curve

Simplified Stack

  • 2 services (FastAPI + PostgreSQL)
  • Minimal external dependencies
  • Simple deployment
  • Lower resource usage
  • Easier to maintain

When to Scale Up

Consider adding complexity only if:

  • >10,000 initiatives/year: Add Elasticsearch for search
  • >100 concurrent users: Add Redis for caching
  • Multi-server deployment: Add message queue (RabbitMQ)
  • Advanced ML needed: Add dedicated ML service
  • Cloud deployment: Use S3 for documents

For a local application with <5,000 initiatives/year, the simplified stack is sufficient.


Code Structure Example

be0/
├── src/
│   ├── domain/
│   │   ├── entities/
│   │   │   ├── initiative.py
│   │   │   ├── author.py
│   │   │   └── review.py
│   │   └── rules/
│   │       ├── novelty_checker.py
│   │       ├── scoring_engine.py
│   │       └── duplicate_detector.py
│   ├── application/
│   │   ├── services/
│   │   │   ├── workflow_service.py
│   │   │   └── notification_service.py
│   │   └── state_machine.py
│   ├── infrastructure/
│   │   ├── database/
│   │   │   └── models.py
│   │   ├── storage/
│   │   │   └── file_storage.py
│   │   └── events/
│   │       └── dispatcher.py
│   └── api/
│       └── routes/
│           └── initiatives.py
└── storage/
    └── documents/
        └── initiatives/

Summary

Simplified Stack:

  • PostgreSQL (database + search + similarity)
  • FastAPI (API framework)
  • Python (business rules + workflow)
  • Local filesystem (document storage)
  • In-memory events (or PostgreSQL NOTIFY)

Removed:

  • Camunda/Temporal (use custom state machine)
  • Elasticsearch (use PostgreSQL full-text search)
  • Kafka/RabbitMQ (use simple event dispatcher)
  • S3/MinIO (use local filesystem)
  • Drools (use Python functions)

Result: Simpler, easier to maintain, sufficient for local deployment, can scale up later if needed.