sciagent/docs/TECH_STACK_COMPARISON.md
Thinh Lam 688fac73e9
sciagent code + Gitea Actions CI/CD
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 09:38:30 +07:00


Tech Stack Comparison: Original vs Simplified

Quick Reference

Original Suggestions → Simplified Alternatives

| Requirement | Original Tech | Simplified Tech | Complexity Reduction |
| --- | --- | --- | --- |
| Workflow Engine | Camunda / Temporal | Custom Python state machine | 90% simpler |
| Document Storage | AWS S3 / MinIO | Local filesystem + PostgreSQL metadata | 80% simpler |
| Search & Duplicate Detection | Elasticsearch + ML (Sentence-BERT) | PostgreSQL full-text + pg_trgm | 85% simpler |
| Event Bus | Apache Kafka / RabbitMQ | PostgreSQL NOTIFY/LISTEN or in-memory | 90% simpler |
| Business Rules | Drools | Custom Python classes/functions | 70% simpler |
| Audit Log | Separate WORM storage | PostgreSQL append-only table | 60% simpler |

Detailed Simplifications

1. Workflow Engine

Original: Camunda or Temporal

  • Separate service to run
  • Complex BPMN diagrams
  • Additional database
  • Learning curve

Simplified: Custom Python State Machine

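# The full implementation is roughly 100 lines; this is the core.
class InitiativeWorkflow:
    # Maps each state to the states it may move to next.
    STATES = {
        'DRAFT': ['SUBMITTED'],
        'SUBMITTED': ['UNIT_REVIEW', 'REJECTED'],
        'UNIT_REVIEW': ['COUNCIL_REVIEW', 'REJECTED'],
        'COUNCIL_REVIEW': ['APPROVED', 'REJECTED'],
        'APPROVED': ['FINALIZED', 'APPEAL'],
        'REJECTED': ['APPEAL'],
        'APPEAL': ['APPROVED', 'REJECTED', 'FINALIZED'],
        'FINALIZED': []  # terminal state
    }

    def can_transition(self, from_state, to_state, user_role):
        # NOTE: user_role is accepted but not checked here; role-based
        # permissions still have to be enforced separately.
        return to_state in self.STATES.get(from_state, [])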

Savings:

  • No separate service
  • No BPMN learning
  • Easier to debug
  • Version controlled
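
The `can_transition` method above takes a `user_role` argument, but the lookup only consults the transition table. A minimal sketch of one way to enforce roles follows. The role names (`author`, `unit_reviewer`, `council_member`, `admin`) and the `ROLES_FOR_STATE` mapping are hypothetical and not part of the original design:

```python
from typing import Dict, List, Set

# Same transition table as InitiativeWorkflow.STATES.
STATES: Dict[str, List[str]] = {
    'DRAFT': ['SUBMITTED'],
    'SUBMITTED': ['UNIT_REVIEW', 'REJECTED'],
    'UNIT_REVIEW': ['COUNCIL_REVIEW', 'REJECTED'],
    'COUNCIL_REVIEW': ['APPROVED', 'REJECTED'],
    'APPROVED': ['FINALIZED', 'APPEAL'],
    'REJECTED': ['APPEAL'],
    'APPEAL': ['APPROVED', 'REJECTED', 'FINALIZED'],
    'FINALIZED': [],
}

# Hypothetical: which roles may move an initiative out of each state.
ROLES_FOR_STATE: Dict[str, Set[str]] = {
    'DRAFT': {'author'},
    'SUBMITTED': {'unit_reviewer'},
    'UNIT_REVIEW': {'unit_reviewer'},
    'COUNCIL_REVIEW': {'council_member'},
    'APPROVED': {'author', 'admin'},
    'REJECTED': {'author'},
    'APPEAL': {'council_member'},
}


def can_transition(from_state: str, to_state: str, user_role: str) -> bool:
    """Allow a move only if the table permits it and the role may act."""
    if to_state not in STATES.get(from_state, []):
        return False
    return user_role in ROLES_FOR_STATE.get(from_state, set())


print(can_transition('DRAFT', 'SUBMITTED', 'author'))          # True
print(can_transition('DRAFT', 'SUBMITTED', 'council_member'))  # False
```

Keeping the role mapping as plain data next to the transition table means both can be reviewed and versioned together.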

2. Document Storage

Original: AWS S3 / MinIO

  • Separate service
  • API calls for every operation
  • Network latency
  • Additional configuration

Simplified: Local Filesystem

/initiatives/
  /{initiative_id}/
    /forms/
    /reviews/
    /attachments/

Savings:

  • Direct file access
  • No API calls
  • Simpler backup (copy folder)
  • No network dependency
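
The folder layout above can be created and addressed with a few `pathlib` helpers. This is a sketch under assumptions: the function names and the storage root are illustrative, and the file-name sanitising is only a basic guard against path traversal, not a complete upload policy.

```python
import tempfile
from pathlib import Path

SUBDIRS = ("forms", "reviews", "attachments")


def initiative_dir(root: Path, initiative_id: int) -> Path:
    """Return <root>/initiatives/<initiative_id>."""
    return root / "initiatives" / str(initiative_id)


def create_layout(root: Path, initiative_id: int) -> Path:
    """Create the per-initiative folders if they do not exist yet."""
    base = initiative_dir(root, initiative_id)
    for name in SUBDIRS:
        (base / name).mkdir(parents=True, exist_ok=True)
    return base


def attachment_path(root: Path, initiative_id: int, filename: str) -> Path:
    """Build a path for an uploaded file, keeping only its final component."""
    safe_name = Path(filename).name  # drops any leading directories
    if safe_name in ("", ".", ".."):
        raise ValueError(f"invalid file name: {filename!r}")
    return initiative_dir(root, initiative_id) / "attachments" / safe_name


# Usage with a temporary directory standing in for the real storage root.
with tempfile.TemporaryDirectory() as tmp:
    base = create_layout(Path(tmp), 42)
    print(sorted(p.name for p in base.iterdir()))
```

The file's metadata (owner, upload time, relative path) would still live in PostgreSQL, as the quick-reference table suggests.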

3. Search & Duplicate Detection

Original: Elasticsearch + ML Model (Sentence-BERT)

  • Separate service
  • Model hosting (and possibly fine-tuning) required
  • Complex deployment
  • Resource intensive

Simplified: PostgreSQL Full-Text + Trigram Similarity

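-- Enable the trigram extension (full-text search is built in)
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Full-text index for keyword search
CREATE INDEX idx_initiative_description_gin
ON initiatives USING gin(to_tsvector('english', description));

-- Trigram index for similarity search (duplicate detection)
CREATE INDEX idx_initiative_description_trgm
ON initiatives USING gin(description gin_trgm_ops);

-- Similarity search. The % operator can use the trigram index and
-- applies pg_trgm.similarity_threshold (default 0.3).
SELECT id, title,
       similarity(description, 'search text') AS score
FROM initiatives
WHERE description % 'search text'
ORDER BY score DESC;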

Savings:

  • Built into PostgreSQL
  • No model training
  • No separate service
  • Good enough for local scale
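
To get a feel for what the 0.3 threshold means, the trigram score can be approximated in plain Python. This sketch is an approximation of `pg_trgm`, not its implementation: it assumes ASCII text, lowercases the input, pads each word with two leading spaces and one trailing space, and divides shared trigrams by total distinct trigrams.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Collect padded 3-character windows for every alphanumeric word."""
    grams: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(round(similarity("word", "two words"), 4))  # 0.3636
```

Here "word" and "two words" share 4 of 11 distinct trigrams, so the pair would just pass a 0.3 threshold.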

4. Event Bus

Original: Apache Kafka / RabbitMQ

  • Separate service
  • Complex configuration
  • Message persistence
  • Consumer groups

Simplified: PostgreSQL NOTIFY/LISTEN

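import json
import asyncpg

# Publisher: db is an asyncpg connection or pool
async def notify_event(db, event_type, data):
    # asyncpg uses $1-style placeholders, not %s
    await db.execute(
        "SELECT pg_notify('initiative_events', $1)",
        json.dumps({'type': event_type, 'data': data})
    )

# Listener callback: asyncpg passes (connection, pid, channel, payload)
def handle_event(connection, pid, channel, payload):
    event = json.loads(payload)
    print(event['type'], event['data'])

async def listen_events():
    # The connection must stay open for as long as events are expected
    conn = await asyncpg.connect(...)
    await conn.add_listener('initiative_events', handle_event)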

Savings:

  • No separate service
  • Built into database
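  • Not persistent on its own: notifications reach only currently connected listeners, so write events to a table first if they must survive restarts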
  • Simple pub/sub
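
The quick-reference table also lists an in-memory bus as an alternative. A minimal sketch of that option follows, assuming a single process and synchronous handlers. The class name `InMemoryEventBus` and the event name `initiative.submitted` are illustrative only.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class InMemoryEventBus:
    """Synchronous pub/sub inside one process; nothing is persisted."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, data: Any) -> int:
        """Call every handler for event_type; return how many were called."""
        event = {'type': event_type, 'data': data}
        handlers = list(self._handlers.get(event_type, []))
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = InMemoryEventBus()
received: List[Dict[str, Any]] = []
bus.subscribe('initiative.submitted', received.append)
bus.publish('initiative.submitted', {'id': 1})
print(received)  # [{'type': 'initiative.submitted', 'data': {'id': 1}}]
```

This only works while publisher and subscribers share one process. With several workers, NOTIFY/LISTEN or a real broker is needed.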

5. Business Rules Engine

Original: Drools

  • Java-based
  • Separate rule files
  • Complex syntax
  • Additional dependency

Simplified: Python Functions/Classes

# ValidationResult is assumed to be a small result type with
# `valid` and `reason` fields.
class NoveltyChecker:
    def check(self, initiative):
        # Look for existing initiatives that resemble this one
        similar = self.find_similar(initiative)
        if similar:
            return ValidationResult(valid=False, reason="Duplicate found")
        return ValidationResult(valid=True)

class ScoringEngine:
    def calculate(self, reviews):
        # Average the scores, ignoring reviews that have none yet
        scores = [r.score for r in reviews if r.score is not None]
        if not scores:
            return None
        return sum(scores) / len(scores)

Savings:

  • Native Python
  • Easy to test
  • Version controlled
  • No external engine
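
The "easy to test" point can be illustrated by passing the similarity lookup in as a plain function, so a test can replace the database with a stub. This is a sketch, not the project's actual classes: the `ValidationResult` dataclass, the constructor argument, and `average_score` are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ValidationResult:
    valid: bool
    reason: Optional[str] = None


class NoveltyChecker:
    def __init__(self, find_similar: Callable[[str], List[str]]) -> None:
        # find_similar returns titles of existing, similar initiatives.
        self._find_similar = find_similar

    def check(self, description: str) -> ValidationResult:
        if self._find_similar(description):
            return ValidationResult(valid=False, reason="Duplicate found")
        return ValidationResult(valid=True)


def average_score(scores: List[Optional[float]]) -> Optional[float]:
    """Mean of the scores that are present, or None if there are none."""
    present = [s for s in scores if s is not None]
    return sum(present) / len(present) if present else None


# Stubs stand in for the database lookup.
print(NoveltyChecker(lambda d: []).check("new idea").valid)       # True
print(NoveltyChecker(lambda d: ["old"]).check("old idea").valid)  # False
print(average_score([4, None, 2]))                                # 3.0
```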

Resource Usage Comparison

Original Stack

  • PostgreSQL: ~200MB RAM
  • FastAPI: ~100MB RAM
  • Elasticsearch: ~1GB RAM
  • Kafka: ~500MB RAM
  • MinIO: ~200MB RAM
  • Total: ~2GB RAM minimum

Simplified Stack

  • PostgreSQL: ~200MB RAM
  • FastAPI: ~100MB RAM
  • Total: ~300MB RAM

Savings: 85% less memory
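
The 85% figure follows directly from the estimates listed above. The short check below treats ~1GB as 1000MB, which is an assumption made only to keep the arithmetic simple:

```python
# Rough RAM estimates from the lists above, in MB.
ORIGINAL_MB = {
    "PostgreSQL": 200,
    "FastAPI": 100,
    "Elasticsearch": 1000,
    "Kafka": 500,
    "MinIO": 200,
}
SIMPLIFIED_MB = {"PostgreSQL": 200, "FastAPI": 100}


def percent_saved(before: float, after: float) -> float:
    """Relative reduction from before to after, as a percentage."""
    return 100 * (before - after) / before


original_total = sum(ORIGINAL_MB.values())      # 2000 MB
simplified_total = sum(SIMPLIFIED_MB.values())  # 300 MB
print(percent_saved(original_total, simplified_total))  # 85.0
```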


Deployment Complexity

Original Stack

docker-compose.yml:
  - postgres
  - fastapi
  - elasticsearch
  - kafka
  - minio
  - zookeeper (for Kafka)
  
Total: 6+ containers

Simplified Stack

docker-compose.yml:
  - postgres
  - fastapi
  
Total: 2 containers

Savings: 67% fewer services


Maintenance Effort

| Task | Original | Simplified | Time Saved |
| --- | --- | --- | --- |
| Setup | 2-3 days | 2-3 hours | 90% |
| Debugging | Complex (multiple services) | Simple (2 services) | 70% |
| Updates | Multiple services | 2 services | 80% |
| Monitoring | Multiple dashboards | Single dashboard | 75% |

When to Upgrade

Upgrade to the original stack only if at least one of these applies:

  1. Scale: >10,000 initiatives/year
  2. Users: >100 concurrent users
  3. Performance: Response time >2s
  4. Distribution: Multi-server deployment
  5. Advanced ML: Need sophisticated NLP

For a local application with a typical load (<5,000 initiatives/year), the simplified stack is the better fit.
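
The criteria above can be turned into a simple check that runs against measured load. This is a sketch: the `LoadMetrics` fields and the reason labels are hypothetical names, and only the numeric thresholds come from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoadMetrics:
    initiatives_per_year: int
    concurrent_users: int
    response_time_s: float
    multi_server: bool = False
    needs_advanced_nlp: bool = False


def upgrade_reasons(m: LoadMetrics) -> List[str]:
    """Return the upgrade criteria that the measured load triggers."""
    reasons = []
    if m.initiatives_per_year > 10_000:
        reasons.append("scale")
    if m.concurrent_users > 100:
        reasons.append("users")
    if m.response_time_s > 2:
        reasons.append("performance")
    if m.multi_server:
        reasons.append("distribution")
    if m.needs_advanced_nlp:
        reasons.append("advanced_ml")
    return reasons


print(upgrade_reasons(LoadMetrics(4_000, 20, 0.4)))   # []
print(upgrade_reasons(LoadMetrics(12_000, 20, 3.0)))  # ['scale', 'performance']
```

An empty list means the simplified stack is still sufficient by these criteria.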


Migration Path

If you need to scale later:

  1. Add Redis for caching (if queries become slow)
  2. Add Elasticsearch for advanced search (if PostgreSQL search is insufficient)
  3. Add RabbitMQ for async processing (if you need background jobs)
  4. Move to S3 for documents (if you need cloud storage)

Start simple and scale only when needed.