tlam89/sciagent

Thinh Lam 688fac73e9

sciagent code + Gitea Actions CI/CD

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-30 09:38:30 +07:00

Simplified Tech Stack for Local Governance Layer

Analysis & Simplification Strategy

Key Observations

Local Application Context: Single-server deployment, not distributed
Existing Stack: Already using FastAPI + PostgreSQL
Complexity Overkill: Enterprise tools (Kafka, Camunda, Elasticsearch) are unnecessary for local deployment
Core Needs: State machine, rules engine, document storage, audit logging

Simplified Tech Stack Recommendation

✅ Core Stack (Keep These)

Component	Technology	Rationale
Database	PostgreSQL 15+	✅ Already in use, supports JSONB, excellent for local deployment
API Framework	FastAPI (Python)	✅ Already in use, fast, async, great for this use case
Document Storage	Local filesystem + PostgreSQL (metadata)	✅ Simple, no external service needed
Business Rules	Custom Python classes/functions	✅ Lightweight, maintainable, no external engine needed

🔄 Replace Complex Components

Original Suggestion	Simplified Alternative	Why
Camunda/Temporal	Custom state machine (Python)	Simple workflow states, no need for enterprise orchestration
Elasticsearch + ML	PostgreSQL full-text search + `pg_trgm` (trigram similarity)	Built-in, sufficient for duplicate detection
Apache Kafka/RabbitMQ	PostgreSQL NOTIFY/LISTEN or in-memory event queue	Simple pub/sub, no separate service
AWS S3/MinIO	Local filesystem with organized folders	Direct file storage, simpler for local
Drools	Python rule functions/classes	More maintainable, easier to debug

Recommended Simplified Architecture

1. Database Layer

# Single PostgreSQL database with:
- Core tables (initiatives, authors, reviews, etc.)
- JSONB columns for flexible metadata
- Full-text search indexes (GIN indexes on text fields)
- pg_trgm extension for similarity matching

Benefits:

No additional services
ACID compliance
Built-in full-text search
Trigram similarity for duplicate detection

2. Business Rules Engine

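# Custom Python classes
# (Initiative, Review, ValidationResult, Score and State are domain types defined elsewhere)
from typing import List

class NoveltyChecker:
    def check(self, initiative: "Initiative") -> "ValidationResult":
        ...  # implementation omitted

class ScoringEngine:
    def calculate_score(self, reviews: List["Review"]) -> "Score":
        ...  # implementation omitted

class WorkflowStateMachine:
    def transition(self, initiative: "Initiative", action: str) -> "State":
        ...  # implementation omitted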

Benefits:

Easy to test and debug
No external dependencies
Version control friendly
Can be extended incrementally
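To illustrate the plain-Python rules approach, here is a minimal sketch in which each rule is a function returning a `ValidationResult` and a small runner collects the failures. The `Initiative` fields, both example rules, and the `run_rules` helper are hypothetical and not taken from the project; the real novelty and scoring rules would replace them.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Initiative:
    # Hypothetical fields, for illustration only
    title: str
    description: str
    author_shares: List[float] = field(default_factory=list)


@dataclass
class ValidationResult:
    ok: bool
    message: str = ""


Rule = Callable[[Initiative], ValidationResult]


def title_present(initiative: Initiative) -> ValidationResult:
    if initiative.title.strip():
        return ValidationResult(True)
    return ValidationResult(False, "title is required")


def shares_sum_to_100(initiative: Initiative) -> ValidationResult:
    # Assumed rule: author contribution percentages must total 100
    total = sum(initiative.author_shares)
    if abs(total - 100.0) < 1e-6:
        return ValidationResult(True)
    return ValidationResult(False, f"author shares total {total}, expected 100")


def run_rules(initiative: Initiative, rules: List[Rule]) -> List[ValidationResult]:
    """Run every rule and return only the failures."""
    results = [rule(initiative) for rule in rules]
    return [r for r in results if not r.ok]
```

Because each rule is an ordinary function, it can be unit-tested in isolation and new rules can be appended to the list without touching existing ones.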

3. Workflow Engine

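# Simple state machine
class InitiativeWorkflow:
    STATES = ['DRAFT', 'SUBMITTED', 'UNIT_REVIEW', ...]
    TRANSITIONS = {
        'DRAFT': ['SUBMITTED'],
        'SUBMITTED': ['UNIT_REVIEW', 'REJECTED'],
        # ... remaining states elided
    }

    def can_transition(self, from_state, to_state, user_role):
        # Structural check only: permission checks based on user_role
        # and other business rules still need to be added here
        return to_state in self.TRANSITIONS.get(from_state, [])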

Benefits:

No external workflow engine
Easy to understand and modify
Can store state in database
Lightweight
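A slightly fuller sketch of the state machine shows how a role check can sit next to the transition table. Only the two transitions listed in the example above are included; the role names (`author`, `unit_reviewer`), the `ALLOWED_ROLES` mapping, and `TransitionError` are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Transitions taken from the example above; the remaining states are elided there
TRANSITIONS: Dict[str, Set[str]] = {
    "DRAFT": {"SUBMITTED"},
    "SUBMITTED": {"UNIT_REVIEW", "REJECTED"},
}

# Hypothetical role names: which role may perform which transition
ALLOWED_ROLES: Dict[Tuple[str, str], Set[str]] = {
    ("DRAFT", "SUBMITTED"): {"author"},
    ("SUBMITTED", "UNIT_REVIEW"): {"unit_reviewer"},
    ("SUBMITTED", "REJECTED"): {"unit_reviewer"},
}


class TransitionError(Exception):
    """Raised when a transition is not defined or not permitted for the role."""


def can_transition(from_state: str, to_state: str, user_role: str) -> bool:
    # The edge must exist in the table and the role must be allowed to take it
    if to_state not in TRANSITIONS.get(from_state, set()):
        return False
    return user_role in ALLOWED_ROLES.get((from_state, to_state), set())


def transition(current_state: str, to_state: str, user_role: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if not can_transition(current_state, to_state, user_role):
        raise TransitionError(f"{user_role} cannot move {current_state} -> {to_state}")
    return to_state
```

The returned state would then be written to the initiative row, ideally in the same database transaction as the matching audit log entry.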

4. Document Storage

# Local filesystem structure
/initiatives/
  /{initiative_id}/
    /forms/
      form_01_v1.pdf
      form_03_v1.pdf
    /reviews/
      review_001.pdf
    /attachments/
      evidence_001.pdf

# Metadata in PostgreSQL
CREATE TABLE document_metadata (
    id UUID PRIMARY KEY,
    initiative_id UUID REFERENCES initiatives(id),
    file_path TEXT,
    form_type VARCHAR(50),
    version INT,
    uploaded_by UUID,
    uploaded_at TIMESTAMP,
    checksum VARCHAR(64)
);

Benefits:

No object storage service needed
Easy backup (just copy folder)
Direct file access
Simple versioning
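The folder layout and metadata table above can be tied together with a small helper. This is a sketch under assumptions: the function name `store_document`, its parameters, and the use of SHA-256 for the `checksum` column (whose 64-character width matches a SHA-256 hex digest) are not confirmed by the project. In real use, `initiative_id` and `form_name` should be validated first so they cannot escape the storage root.

```python
import hashlib
from pathlib import Path


def store_document(root: Path, initiative_id: str, category: str,
                   form_name: str, version: int, content: bytes) -> dict:
    """Write a file under root/initiatives/<id>/<category>/ and return its metadata."""
    folder = root / "initiatives" / initiative_id / category
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{form_name}_v{version}.pdf"
    if path.exists():
        # Never overwrite: a changed document gets a new version number
        raise FileExistsError(f"{path} already exists; bump the version instead")
    path.write_bytes(content)
    return {
        "initiative_id": initiative_id,
        "file_path": str(path.relative_to(root)),  # stored relative to the root
        "form_type": form_name,
        "version": version,
        "checksum": hashlib.sha256(content).hexdigest(),
    }
```

Storing the path relative to the storage root keeps the metadata valid if the folder is moved or restored from a backup to a different location.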

5. Duplicate Detection

-- Use PostgreSQL trigram similarity
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Similarity query: each candidate pair is reported once (i1.id < i2.id)
SELECT
    i1.id AS initiative_id,
    i1.title AS initiative_title,
    i2.id AS similar_id,
    i2.title AS similar_title,
    similarity(i1.description, i2.description) AS sim_score
FROM initiatives i1
JOIN initiatives i2 ON i1.id < i2.id
WHERE similarity(i1.description, i2.description) > 0.7
ORDER BY sim_score DESC;

Benefits:

Built into PostgreSQL
No ML model training needed
Fast enough for local scale
Can be enhanced with custom logic
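To make the 0.7 threshold easier to reason about, the following sketch approximates in pure Python what trigram similarity measures: the share of distinct three-character sequences two texts have in common. It follows the documented pg_trgm behaviour (lowercasing, splitting on non-alphanumeric characters, padding each word with two leading spaces and one trailing space), but it is an illustration only and may differ from the extension on locale-specific input.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Approximate pg_trgm: lowercase, split on non-alphanumerics, pad each word."""
    result: Set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

A helper like this can be useful in unit tests for the duplicate detector, while the database query remains the source of truth for production results.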

6. Event System

# Simple in-memory event dispatcher
class EventDispatcher:
    def __init__(self):
        self.listeners = {}
    
    def subscribe(self, event_type, callback):
        if event_type not in self.listeners:
            self.listeners[event_type] = []
        self.listeners[event_type].append(callback)
    
    def emit(self, event_type, data):
        for callback in self.listeners.get(event_type, []):
            callback(data)

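# Or use PostgreSQL NOTIFY/LISTEN for cross-process delivery.
# Note: notifications are not stored; write events to a table if they must survive restarts.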

Benefits:

No message broker needed
Simple pub/sub pattern
Can persist events to database if needed
Easy to add email notifications
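As a usage sketch, the dispatcher can be wired to a notification handler as shown here. This variant also catches listener exceptions so one failing handler does not block the others; that behaviour, the `errors` list, the `state_changed` event name, and the `queue_email` handler are assumptions added for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Callback = Callable[[Any], None]


class EventDispatcher:
    """In-memory pub/sub where a failing listener does not block the rest."""

    def __init__(self) -> None:
        self.listeners: DefaultDict[str, List[Callback]] = defaultdict(list)
        self.errors: List[Exception] = []

    def subscribe(self, event_type: str, callback: Callback) -> None:
        self.listeners[event_type].append(callback)

    def emit(self, event_type: str, data: Any) -> None:
        # Iterate over a copy so listeners may subscribe during delivery
        for callback in list(self.listeners.get(event_type, [])):
            try:
                callback(data)
            except Exception as exc:  # keep delivering to remaining listeners
                self.errors.append(exc)


# Hypothetical listener: queue an email message instead of sending it inline
outbox: List[str] = []


def queue_email(data: dict) -> None:
    outbox.append(f"Initiative {data['initiative_id']} moved to {data['new_state']}")


dispatcher = EventDispatcher()
dispatcher.subscribe("state_changed", queue_email)
dispatcher.emit("state_changed", {"initiative_id": "abc", "new_state": "SUBMITTED"})
```

Queuing the message rather than calling SMTP inside the listener keeps `emit` fast and lets a separate step retry failed sends.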

7. Audit Logging

-- Simple append-only table
CREATE TABLE audit_log (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    initiative_id UUID,
    actor_id UUID,
    action VARCHAR(100),
    timestamp TIMESTAMP DEFAULT NOW(),
    previous_state JSONB,
    new_state JSONB,
    metadata JSONB
);

CREATE INDEX idx_audit_initiative ON audit_log(initiative_id);
CREATE INDEX idx_audit_timestamp ON audit_log(timestamp);

Benefits:

No separate audit system
Queryable with SQL
Can export for compliance
Simple to implement
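Writing to the audit table from Python could look like the sketch below. It only builds the statement and its parameters; executing it is left to whichever driver the project uses. The `%s` placeholder style assumes a psycopg-like driver, and the helper name `build_audit_params` is hypothetical.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Column names match the audit_log table above; id and timestamp use their defaults
INSERT_AUDIT_SQL = (
    "INSERT INTO audit_log "
    "(initiative_id, actor_id, action, previous_state, new_state, metadata) "
    "VALUES (%s, %s, %s, %s::jsonb, %s::jsonb, %s::jsonb)"
)


def build_audit_params(initiative_id: str, actor_id: str, action: str,
                       previous_state: Dict[str, Any], new_state: Dict[str, Any],
                       metadata: Optional[Dict[str, Any]] = None) -> Tuple[str, ...]:
    """Serialize the state snapshots to JSON text for the JSONB columns."""
    return (
        initiative_id,
        actor_id,
        action,
        json.dumps(previous_state),
        json.dumps(new_state),
        json.dumps(metadata or {}),
    )
```

Passing values as parameters instead of formatting them into the SQL string avoids injection issues, and the table stays append-only as long as the application never issues `UPDATE` or `DELETE` against it.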

Complete Simplified Stack

Backend

FastAPI (Python)
├── Database: PostgreSQL 15+
│   ├── Core tables (initiatives, authors, reviews, etc.)
│   ├── JSONB for flexible data
│   ├── Full-text search (GIN indexes)
│   ├── Trigram similarity (pg_trgm)
│   └── Audit log table
├── Business Logic: Custom Python classes
│   ├── NoveltyChecker
│   ├── ScoringEngine
│   ├── WorkflowStateMachine
│   └── DuplicateDetector
├── Document Storage: Local filesystem
│   └── Organized folder structure
├── Event System: In-memory dispatcher + PostgreSQL NOTIFY
└── API: FastAPI REST endpoints

Frontend (Already in place)

React + TypeScript
├── Feature-based architecture
├── React Query for data fetching
└── Existing UI components

Implementation Priority

Phase 1: Core Foundation (Week 1-2)

✅ Database schema (PostgreSQL)
✅ Basic CRUD APIs (FastAPI)
✅ Document upload/storage (local filesystem)
✅ Basic state machine (Python class)

Phase 2: Business Rules (Week 3-4)

✅ Novelty checking (PostgreSQL similarity)
✅ Author contribution validation
✅ Scoring algorithm (Group 01)
✅ Auto-classification (Group 02)

Phase 3: Workflow & Notifications (Week 5-6)

✅ Complete state machine transitions
✅ Deadline tracking & alerts
✅ Email notifications (SMTP)
✅ Duplicate detection & mediation

Phase 4: Advanced Features (Week 7-8)

✅ Reporting & analytics
✅ Audit trail queries
✅ Role-based permissions
✅ Appeal workflow

Technology Comparison

Original Stack Complexity

8+ services to manage
External dependencies (Kafka, Elasticsearch, S3)
Complex deployment
Higher resource usage
Steeper learning curve

Simplified Stack

2 services (FastAPI + PostgreSQL)
Minimal external dependencies
Simple deployment
Lower resource usage
Easier to maintain

When to Scale Up

Consider adding complexity only if:

>10,000 initiatives/year: Add Elasticsearch for search
>100 concurrent users: Add Redis for caching
Multi-server deployment: Add message queue (RabbitMQ)
Advanced ML needed: Add dedicated ML service
Cloud deployment: Use S3 for documents

For a local application handling fewer than 5,000 initiatives per year, the simplified stack is sufficient.

Code Structure Example

be0/
├── src/
│   ├── domain/
│   │   ├── entities/
│   │   │   ├── initiative.py
│   │   │   ├── author.py
│   │   │   └── review.py
│   │   └── rules/
│   │       ├── novelty_checker.py
│   │       ├── scoring_engine.py
│   │       └── duplicate_detector.py
│   ├── application/
│   │   ├── services/
│   │   │   ├── workflow_service.py
│   │   │   └── notification_service.py
│   │   └── state_machine.py
│   ├── infrastructure/
│   │   ├── database/
│   │   │   └── models.py
│   │   ├── storage/
│   │   │   └── file_storage.py
│   │   └── events/
│   │       └── dispatcher.py
│   └── api/
│       └── routes/
│           └── initiatives.py
└── storage/
    └── documents/
        └── initiatives/

Summary

Simplified Stack:

✅ PostgreSQL (database + search + similarity)
✅ FastAPI (API framework)
✅ Python (business rules + workflow)
✅ Local filesystem (document storage)
✅ In-memory events (or PostgreSQL NOTIFY)

Removed:

❌ Camunda/Temporal (use custom state machine)
❌ Elasticsearch (use PostgreSQL full-text search)
❌ Kafka/RabbitMQ (use simple event dispatcher)
❌ S3/MinIO (use local filesystem)
❌ Drools (use Python functions)

Result: Simpler, easier to maintain, sufficient for local deployment, can scale up later if needed.
