sciagent code + Gitea Actions CI/CD

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 09:38:30 +07:00
commit 688fac73e9
1167 changed files with 158244 additions and 0 deletions
@@ -0,0 +1,259 @@
+# Tech Stack Comparison: Original vs Simplified
+
+## Quick Reference
+
+### Original Suggestions → Simplified Alternatives
+
+| Requirement | Original Tech | Simplified Tech | Complexity Reduction (rough estimate) |
+|------------|--------------|----------------|---------------------|
+| **Workflow Engine** | Camunda / Temporal | Custom Python state machine | 90% simpler |
+| **Document Storage** | AWS S3 / MinIO | Local filesystem + PostgreSQL metadata | 80% simpler |
+| **Search & Duplicate Detection** | Elasticsearch + ML (Sentence-BERT) | PostgreSQL full-text + pg_trgm | 85% simpler |
+| **Event Bus** | Apache Kafka / RabbitMQ | PostgreSQL NOTIFY/LISTEN or in-memory | 90% simpler |
+| **Business Rules** | Drools | Custom Python classes/functions | 70% simpler |
+| **Audit Log** | Separate WORM storage | PostgreSQL append-only table | 60% simpler |
+
+---
+
+## Detailed Simplifications
+
+### 1. Workflow Engine
+
+**Original:** Camunda or Temporal
+- Separate service to run
+- Complex BPMN diagrams
+- Additional database
+- Learning curve
+
+**Simplified:** Custom Python State Machine
+```python
+# ~100 lines of code
+class InitiativeWorkflow:
+    STATES = {
+        'DRAFT': ['SUBMITTED'],
+        'SUBMITTED': ['UNIT_REVIEW', 'REJECTED'],
+        'UNIT_REVIEW': ['COUNCIL_REVIEW', 'REJECTED'],
+        'COUNCIL_REVIEW': ['APPROVED', 'REJECTED'],
+        'APPROVED': ['FINALIZED', 'APPEAL'],
+        'REJECTED': ['APPEAL'],
+        'APPEAL': ['APPROVED', 'REJECTED', 'FINALIZED'],
+        'FINALIZED': []
+    }
+    
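+    def can_transition(self, from_state, to_state, user_role):
+        # Only the state graph is enforced here; user_role is accepted
+        # but not checked yet (role rules would be added on top).
+        return to_state in self.STATES.get(from_state, [])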
+```
+
+**Savings:**
+- No separate service
+- No BPMN learning
+- Easier to debug
+- Version controlled
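+
+The `can_transition` method above accepts `user_role` but does not use it. A minimal sketch of how a role check could be layered on the same state table follows. The role names and the `ROLE_PERMISSIONS` table are hypothetical and not part of the original design.
+
+```python
+# State graph copied from the InitiativeWorkflow example above.
+STATES = {
+    'DRAFT': ['SUBMITTED'],
+    'SUBMITTED': ['UNIT_REVIEW', 'REJECTED'],
+    'UNIT_REVIEW': ['COUNCIL_REVIEW', 'REJECTED'],
+    'COUNCIL_REVIEW': ['APPROVED', 'REJECTED'],
+    'APPROVED': ['FINALIZED', 'APPEAL'],
+    'REJECTED': ['APPEAL'],
+    'APPEAL': ['APPROVED', 'REJECTED', 'FINALIZED'],
+    'FINALIZED': [],
+}
+
+# Hypothetical: which roles may move an initiative INTO each state.
+ROLE_PERMISSIONS = {
+    'SUBMITTED': {'author'},
+    'UNIT_REVIEW': {'unit_reviewer'},
+    'COUNCIL_REVIEW': {'unit_reviewer'},
+    'APPROVED': {'council'},
+    'REJECTED': {'unit_reviewer', 'council'},
+    'APPEAL': {'author'},
+    'FINALIZED': {'council'},
+}
+
+
+class TransitionError(Exception):
+    """Raised when a requested transition is not allowed."""
+
+
+def can_transition(from_state, to_state, user_role):
+    # The edge must exist in the graph AND the role must be allowed to enter it.
+    edge_exists = to_state in STATES.get(from_state, [])
+    role_allowed = user_role in ROLE_PERMISSIONS.get(to_state, set())
+    return edge_exists and role_allowed
+
+
+def transition(current_state, to_state, user_role):
+    # Return the new state, or raise if the move is not permitted.
+    if not can_transition(current_state, to_state, user_role):
+        raise TransitionError(
+            f"{user_role} cannot move {current_state} -> {to_state}"
+        )
+    return to_state
+```
+
+Keeping the role table separate from the state graph means either one can change without touching the other.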
+
+---
+
+### 2. Document Storage
+
+**Original:** AWS S3 / MinIO
+- Separate service
+- API calls for every operation
+- Network latency
+- Additional configuration
+
+**Simplified:** Local Filesystem
+```
+/initiatives/
+  /{initiative_id}/
+    /forms/
+    /reviews/
+    /attachments/
+```
+
+**Savings:**
+- Direct file access
+- No API calls
+- Simpler backup (copy folder)
+- No network dependency
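+
+The folder layout above could be wrapped in a small helper, sketched here with `pathlib`. The function names, the `root` argument, and the choice to reject unknown subfolders are assumptions for illustration only.
+
+```python
+from pathlib import Path
+
+# Subfolders taken from the layout above.
+SUBFOLDERS = ('forms', 'reviews', 'attachments')
+
+
+def initiative_dir(root, initiative_id):
+    # <root>/initiatives/<initiative_id>/
+    return Path(root) / 'initiatives' / str(initiative_id)
+
+
+def save_document(root, initiative_id, kind, filename, data):
+    """Write bytes into the initiative's subfolder and return the file path."""
+    if kind not in SUBFOLDERS:
+        raise ValueError(f"unknown document kind: {kind}")
+    # Keep only the final path component so a name like '../x' cannot
+    # place the file outside the target folder.
+    safe_name = Path(filename).name
+    if safe_name in ('', '.', '..'):
+        raise ValueError(f"invalid filename: {filename!r}")
+    target = initiative_dir(root, initiative_id) / kind / safe_name
+    target.parent.mkdir(parents=True, exist_ok=True)
+    target.write_bytes(data)
+    return target
+```
+
+The returned path (or its parts) is what would be recorded in the PostgreSQL metadata row.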
+
+---
+
+### 3. Search & Duplicate Detection
+
+**Original:** Elasticsearch + ML Model (Sentence-BERT)
+- Separate service
+- Embedding model to download, host, and possibly fine-tune
+- Complex deployment
+- Resource intensive
+
+**Simplified:** PostgreSQL Full-Text + Trigram Similarity
+```sql
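+-- Enable trigram matching (full-text search is built in)
+CREATE EXTENSION IF NOT EXISTS pg_trgm;
+
+-- Full-text index for keyword search
+CREATE INDEX idx_initiative_description_gin
+ON initiatives USING gin (to_tsvector('english', description));
+
+-- Trigram index so similarity searches can use an index
+CREATE INDEX idx_initiative_description_trgm
+ON initiatives USING gin (description gin_trgm_ops);
+
+-- Similarity search; % matches rows above pg_trgm.similarity_threshold (default 0.3)
+SELECT id, title,
+       similarity(description, 'search text') AS score
+FROM initiatives
+WHERE description % 'search text'
+ORDER BY score DESC;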
+```
+
+**Savings:**
+- Built into PostgreSQL
+- No ML model to host or tune
+- No separate service
+- Good enough for local scale
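+
+To make the trigram score less of a black box, here is a pure-Python approximation of what `similarity()` computes: shared trigrams divided by all distinct trigrams. It is a sketch for intuition only. The helper names are made up, and PostgreSQL's own tokenization may differ in edge cases (for example, how underscores and non-ASCII characters are handled).
+
+```python
+import re
+
+
+def trigrams(text):
+    """Set of 3-character sequences, with each word padded pg_trgm-style."""
+    grams = set()
+    for word in re.findall(r"\w+", text.lower()):
+        padded = f"  {word} "  # two leading spaces, one trailing space
+        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
+    return grams
+
+
+def trgm_similarity(a, b):
+    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
+    ta, tb = trigrams(a), trigrams(b)
+    if not ta or not tb:
+        return 0.0
+    return len(ta & tb) / len(ta | tb)
+
+
+print(round(trgm_similarity('word', 'two words'), 4))
+```
+
+Because the score is based on character overlap, it catches near-identical wording and typos but not paraphrases, which is the trade-off against an embedding model.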
+
+---
+
+### 4. Event Bus
+
+**Original:** Apache Kafka / RabbitMQ
+- Separate service
+- Complex configuration
+- Message persistence
+- Consumer groups
+
+**Simplified:** PostgreSQL NOTIFY/LISTEN
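+```python
+import json
+import asyncpg
+
+# Publisher (db is assumed to be an asyncpg connection or pool)
+async def notify_event(event_type, data):
+    await db.execute(
+        "SELECT pg_notify('initiative_events', $1)",  # asyncpg uses $n placeholders
+        json.dumps({'type': event_type, 'data': data})
+    )
+
+# Listener
+async def listen_events():
+    conn = await asyncpg.connect(...)
+    # asyncpg calls handle_event(connection, pid, channel, payload)
+    await conn.add_listener('initiative_events', handle_event)
+```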
+
+**Savings:**
+- No separate service
+- Built into database
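+- Can be made durable by also writing events to a table (NOTIFY alone is not persistent; only connected listeners receive a message)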
+- Simple pub/sub
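+
+The listener above registers a `handle_event` callback but does not show it. One possible shape is sketched below: it decodes the JSON payload and dispatches on the `type` field. The `on` decorator, the `HANDLERS` registry, and the `initiative_submitted` event name are hypothetical. The four-argument signature follows what asyncpg passes to listener callbacks.
+
+```python
+import json
+
+# Hypothetical registry: event type -> function that receives the event data.
+HANDLERS = {}
+
+
+def on(event_type):
+    """Decorator that registers a function for one event type."""
+    def register(func):
+        HANDLERS[event_type] = func
+        return func
+    return register
+
+
+@on('initiative_submitted')
+def log_submission(data):
+    return f"initiative {data['id']} submitted"
+
+
+def handle_event(connection, pid, channel, payload):
+    """Listener callback: decode the payload and dispatch by event type."""
+    event = json.loads(payload)
+    handler = HANDLERS.get(event.get('type'))
+    if handler is None:
+        return None  # unknown event types are ignored
+    return handler(event.get('data'))
+```
+
+Since the handler only needs the payload string, it can be unit tested without a database connection.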
+
+---
+
+### 5. Business Rules Engine
+
+**Original:** Drools
+- Java-based
+- Separate rule files
+- Complex syntax
+- Additional dependency
+
+**Simplified:** Python Functions/Classes
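+```python
+from dataclasses import dataclass
+
+@dataclass
+class ValidationResult:
+    valid: bool
+    reason: str = ""
+
+class NoveltyChecker:
+    def find_similar(self, initiative):
+        # Placeholder: look up existing initiatives (e.g. with pg_trgm)
+        raise NotImplementedError
+
+    def check(self, initiative):
+        # Reject the initiative if a similar one already exists
+        similar = self.find_similar(initiative)
+        if similar:
+            return ValidationResult(valid=False, reason="Duplicate found")
+        return ValidationResult(valid=True)
+
+class ScoringEngine:
+    def calculate(self, reviews):
+        # Average the submitted scores; None when no reviewer has scored yet
+        scores = [r.score for r in reviews if r.score is not None]
+        if not scores:
+            return None
+        return sum(scores) / len(scores)
+```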
+
+**Savings:**
+- Native Python
+- Easy to test
+- Version controlled
+- No external engine
+
+---
+
+## Resource Usage Comparison
+
+### Original Stack
+- PostgreSQL: ~200MB RAM
+- FastAPI: ~100MB RAM
+- Elasticsearch: ~1GB RAM
+- Kafka: ~500MB RAM
+- MinIO: ~200MB RAM
+- **Total: ~2GB RAM minimum**
+
+### Simplified Stack
+- PostgreSQL: ~200MB RAM
+- FastAPI: ~100MB RAM
+- **Total: ~300MB RAM**
+
+**Savings: 85% less memory**
+
+---
+
+## Deployment Complexity
+
+### Original Stack
+```
+docker-compose.yml:
+  - postgres
+  - fastapi
+  - elasticsearch
+  - kafka
+  - minio
+  - zookeeper (for Kafka)
+  
+Total: 6+ containers
+```
+
+### Simplified Stack
+```
+docker-compose.yml:
+  - postgres
+  - fastapi
+  
+Total: 2 containers
+```
+
+**Savings: 67% fewer services**
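+
+The two savings percentages quoted in this document (memory and container count) can be reproduced from the listed figures. The short check below assumes 1GB is counted as 1000MB and uses the six containers named above. The `reduction` helper is only for illustration.
+
+```python
+def reduction(before, after):
+    """Fractional reduction when going from `before` to `after`."""
+    return 1 - after / before
+
+
+# RAM estimates in MB, taken from the Resource Usage Comparison lists.
+original_ram_mb = 200 + 100 + 1000 + 500 + 200  # PostgreSQL, FastAPI, Elasticsearch, Kafka, MinIO
+simplified_ram_mb = 200 + 100                   # PostgreSQL, FastAPI
+
+print(f"Memory: {reduction(original_ram_mb, simplified_ram_mb):.0%} less")
+print(f"Containers: {reduction(6, 2):.0%} fewer")
+```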
+
+---
+
+## Maintenance Effort
+
+| Task | Original | Simplified | Time Saved |
+|------|----------|------------|------------|
+| Setup | 2-3 days | 2-3 hours | 90% |
+| Debugging | Complex (multiple services) | Simple (2 services) | 70% |
+| Updates | Multiple services | 2 services | 80% |
+| Monitoring | Multiple dashboards | Single dashboard | 75% |
+
+---
+
+## When to Upgrade
+
+Consider upgrading toward the original stack only if one of the following holds:
+
+1. **Scale:** >10,000 initiatives/year
+2. **Users:** >100 concurrent users
+3. **Performance:** Response time >2s
+4. **Distribution:** Multi-server deployment
+5. **Advanced ML:** Need sophisticated NLP
+
+For a local application with a typical load (<5,000 initiatives/year), the simplified stack is the better choice.
+
+---
+
+## Migration Path
+
+If you need to scale later:
+
+1. **Add Redis** for caching (if queries become slow)
+2. **Add Elasticsearch** for advanced search (if PostgreSQL search proves insufficient)
+3. **Add RabbitMQ** for async processing (if you need background jobs)
+4. **Move to S3** for documents (if you need cloud storage)
+
+But start simple and scale only when needed.