sciagent code + Gitea Actions CI/CD

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Thinh Lam
2026-06-30 09:38:30 +07:00
commit 688fac73e9
1167 changed files with 158244 additions and 0 deletions
# Tech Stack Comparison: Original vs Simplified
## Quick Reference
### Original Suggestions → Simplified Alternatives
| Requirement | Original Tech | Simplified Tech | Complexity Reduction |
|------------|--------------|----------------|---------------------|
| **Workflow Engine** | Camunda / Temporal | Custom Python state machine | 90% simpler |
| **Document Storage** | AWS S3 / MinIO | Local filesystem + PostgreSQL metadata | 80% simpler |
| **Search & Duplicate Detection** | Elasticsearch + ML (Sentence-BERT) | PostgreSQL full-text + pg_trgm | 85% simpler |
| **Event Bus** | Apache Kafka / RabbitMQ | PostgreSQL NOTIFY/LISTEN or in-memory | 90% simpler |
| **Business Rules** | Drools | Custom Python classes/functions | 70% simpler |
| **Audit Log** | Separate WORM storage | PostgreSQL append-only table | 60% simpler |
---
## Detailed Simplifications
### 1. Workflow Engine
**Original:** Camunda or Temporal
- Separate service to run
- Complex BPMN diagrams
- Additional database
- Learning curve
**Simplified:** Custom Python State Machine
```python
# ~100 lines of code
class InitiativeWorkflow:
    # Maps each state to the states it may move to next
    STATES = {
        'DRAFT': ['SUBMITTED'],
        'SUBMITTED': ['UNIT_REVIEW', 'REJECTED'],
        'UNIT_REVIEW': ['COUNCIL_REVIEW', 'REJECTED'],
        'COUNCIL_REVIEW': ['APPROVED', 'REJECTED'],
        'APPROVED': ['FINALIZED', 'APPEAL'],
        'REJECTED': ['APPEAL'],
        'APPEAL': ['APPROVED', 'REJECTED', 'FINALIZED'],
        'FINALIZED': [],  # terminal state
    }

    def can_transition(self, from_state, to_state, user_role):
        # NOTE: user_role is accepted but not checked here; role-based
        # permissions would need an additional lookup.
        return to_state in self.STATES.get(from_state, [])
```
**Savings:**
- No separate service
- No BPMN learning
- Easier to debug
- Version controlled
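
The `user_role` argument of `can_transition` is not used in the snippet above. The following is a minimal sketch of one way it could be used, assuming a hypothetical mapping from target state to the roles allowed to enter it. The role names (`author`, `unit_reviewer`, `council`, `admin`) and the `ALLOWED_ROLES` table are illustrative assumptions, not part of the original design.

```python
# Transition table copied from the workflow above.
STATES = {
    'DRAFT': ['SUBMITTED'],
    'SUBMITTED': ['UNIT_REVIEW', 'REJECTED'],
    'UNIT_REVIEW': ['COUNCIL_REVIEW', 'REJECTED'],
    'COUNCIL_REVIEW': ['APPROVED', 'REJECTED'],
    'APPROVED': ['FINALIZED', 'APPEAL'],
    'REJECTED': ['APPEAL'],
    'APPEAL': ['APPROVED', 'REJECTED', 'FINALIZED'],
    'FINALIZED': [],
}

# Hypothetical: roles that may move an initiative INTO each state.
ALLOWED_ROLES = {
    'SUBMITTED': {'author'},
    'UNIT_REVIEW': {'unit_reviewer'},
    'COUNCIL_REVIEW': {'unit_reviewer'},
    'APPROVED': {'council'},
    'REJECTED': {'unit_reviewer', 'council'},
    'APPEAL': {'author'},
    'FINALIZED': {'council', 'admin'},
}


def can_transition(from_state, to_state, user_role):
    """Return True only if the edge exists and the role may take it."""
    if to_state not in STATES.get(from_state, []):
        return False
    return user_role in ALLOWED_ROLES.get(to_state, set())


print(can_transition('DRAFT', 'SUBMITTED', 'author'))   # edge exists, role allowed
print(can_transition('DRAFT', 'SUBMITTED', 'council'))  # edge exists, role not allowed
```

Keeping the permission table separate from the transition table means either can change without touching the other.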
---
### 2. Document Storage
**Original:** AWS S3 / MinIO
- Separate service
- API calls for every operation
- Network latency
- Additional configuration
**Simplified:** Local Filesystem
```
/initiatives/
  /{initiative_id}/
    /forms/
    /reviews/
    /attachments/
```
**Savings:**
- Direct file access
- No API calls
- Simpler backup (copy folder)
- No network dependency
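
A minimal sketch of how a file might be stored under this layout, returning a metadata record that could be written to PostgreSQL. The function name `store_document`, the metadata fields, and the choice of a SHA-256 checksum are assumptions for illustration; the original only specifies the folder structure.

```python
import hashlib
import shutil
from pathlib import Path

# Subfolders taken from the layout above.
SUBFOLDERS = ("forms", "reviews", "attachments")


def store_document(root, initiative_id, kind, source_path):
    """Copy a file into <root>/<initiative_id>/<kind>/ and describe it."""
    if kind not in SUBFOLDERS:
        raise ValueError(f"kind must be one of {SUBFOLDERS}, got {kind!r}")
    source = Path(source_path)
    target_dir = Path(root) / str(initiative_id) / kind
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # copies content and file timestamps
    data = target.read_bytes()
    # Hypothetical metadata row; the actual table schema is not given.
    return {
        "initiative_id": initiative_id,
        "kind": kind,
        "path": str(target),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

Storing a checksum alongside the path makes it possible to detect files that were changed or lost outside the application, which a plain folder copy cannot do on its own.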
---
### 3. Search & Duplicate Detection
**Original:** Elasticsearch + ML Model (Sentence-BERT)
- Separate service
- Model training required
- Complex deployment
- Resource intensive
**Simplified:** PostgreSQL Full-Text + Trigram Similarity
```sql
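-- Enable the trigram extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Full-text index (used by to_tsvector/@@ queries)
CREATE INDEX idx_initiative_description_gin
ON initiatives USING gin(to_tsvector('english', description));
-- Trigram index (used by the % similarity operator below)
CREATE INDEX idx_initiative_description_trgm
ON initiatives USING gin(description gin_trgm_ops);
-- Similarity search; % matches rows whose similarity exceeds
-- pg_trgm.similarity_threshold (default 0.3)
SELECT id, title,
       similarity(description, 'search text') AS score
FROM initiatives
WHERE description % 'search text'
ORDER BY score DESC;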
```
**Savings:**
- Built into PostgreSQL
- No model training
- No separate service
- Good enough for local scale
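
To show what the trigram score measures, here is a rough pure-Python approximation of `similarity()`: each word is lowercased and padded with two leading spaces and one trailing space, split into three-character windows, and the score is the number of shared trigrams divided by the number of distinct trigrams across both strings. This is a sketch for intuition only; the function names are hypothetical and PostgreSQL's handling of non-alphanumeric and multibyte characters may differ.

```python
import re


def trigrams(text):
    """Return the set of padded, lowercased trigrams for every word in text."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def trigram_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# 4 shared trigrams out of 11 distinct ones
print(trigram_similarity("word", "two words"))
```

Because the score depends on shared character sequences rather than meaning, it catches near-identical wording and typos but not paraphrases.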
---
### 4. Event Bus
**Original:** Apache Kafka / RabbitMQ
- Separate service
- Complex configuration
- Message persistence
- Consumer groups
**Simplified:** PostgreSQL NOTIFY/LISTEN
```python
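import json
import asyncpg

# Publisher (`db` is assumed to be an asyncpg connection or pool)
async def notify_event(event_type, data):
    # asyncpg uses $1-style placeholders, not %s
    await db.execute(
        "SELECT pg_notify('initiative_events', $1)",
        json.dumps({'type': event_type, 'data': data})
    )

# Listener
async def listen_events():
    conn = await asyncpg.connect(...)
    # handle_event is called as handle_event(connection, pid, channel, payload)
    await conn.add_listener('initiative_events', handle_event)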
```
**Savings:**
- No separate service
- Built into database
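- Can be made persistent if needed (notifications are not stored, so write events to a table when they must survive a disconnected listener)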
- Simple pub/sub
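
The quick reference table also lists an in-memory alternative. A minimal sketch of that option, assuming a single process and synchronous handlers, is shown here; the class name `InMemoryEventBus` and its methods are hypothetical, not taken from the codebase.

```python
class InMemoryEventBus:
    """Single-process pub/sub: handlers are plain callables keyed by event type."""

    def __init__(self):
        self._handlers = {}

    def subscribe(self, event_type, handler):
        """Register a callable that receives the event data."""
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, data):
        """Call every handler for event_type; return how many were called."""
        handlers = list(self._handlers.get(event_type, []))
        for handler in handlers:
            handler(data)
        return len(handlers)


bus = InMemoryEventBus()
received = []
bus.subscribe("initiative_submitted", received.append)
bus.publish("initiative_submitted", {"id": 1})
print(received)
```

Events published this way are lost on restart and never leave the process, so this only fits cases where the publisher and all subscribers run in the same application instance.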
---
### 5. Business Rules Engine
**Original:** Drools
- Java-based
- Separate rule files
- Complex syntax
- Additional dependency
**Simplified:** Python Functions/Classes
```python
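from dataclasses import dataclass

@dataclass
class ValidationResult:
    valid: bool
    reason: str = ""

class NoveltyChecker:
    def check(self, initiative):
        # Check similarity with existing initiatives
        # (find_similar is assumed to be implemented elsewhere)
        similar = self.find_similar(initiative)
        if similar:
            return ValidationResult(valid=False, reason="Duplicate found")
        return ValidationResult(valid=True)

class ScoringEngine:
    def calculate(self, reviews):
        # Average the scores, ignoring reviews that have none
        scores = [r.score for r in reviews if r.score is not None]
        if not scores:
            return None
        return sum(scores) / len(scores)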
```
**Savings:**
- Native Python
- Easy to test
- Version controlled
- No external engine
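
As an illustration of the "easy to test" point, rules can be written as plain functions and run in sequence. The sketch below assumes initiatives are dictionaries and invents two example rules (`require_title`, `require_description`) along with a 50-character minimum; none of these come from the original requirements.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    valid: bool
    reason: str = ""


def require_title(initiative):
    """Hypothetical rule: the title must not be blank."""
    if not initiative.get("title", "").strip():
        return ValidationResult(False, "Title is required")
    return ValidationResult(True)


def require_description(initiative, minimum=50):
    """Hypothetical rule: the description must reach a minimum length."""
    if len(initiative.get("description", "")) < minimum:
        return ValidationResult(False, f"Description must be at least {minimum} characters")
    return ValidationResult(True)


RULES = [require_title, require_description]


def validate(initiative, rules=RULES):
    """Run every rule and return the reasons for all failures."""
    results = (rule(initiative) for rule in rules)
    return [result.reason for result in results if not result.valid]


print(validate({"title": "", "description": "too short"}))
```

Each rule can be unit-tested on its own with a plain dictionary, and adding a rule means appending one function to `RULES`.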
---
## Resource Usage Comparison
### Original Stack
- PostgreSQL: ~200MB RAM
- FastAPI: ~100MB RAM
- Elasticsearch: ~1GB RAM
- Kafka: ~500MB RAM
- MinIO: ~200MB RAM
- **Total: ~2GB RAM minimum**
### Simplified Stack
- PostgreSQL: ~200MB RAM
- FastAPI: ~100MB RAM
- **Total: ~300MB RAM**
**Savings: 85% less memory**
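
The totals and the savings figure can be checked with a few lines of arithmetic. This sketch uses the approximate figures listed above and treats ~1GB as 1000MB, which is an assumption made for round numbers.

```python
# Approximate RAM figures from the lists above, in MB.
original_mb = {
    "PostgreSQL": 200,
    "FastAPI": 100,
    "Elasticsearch": 1000,  # ~1GB, taken as 1000MB
    "Kafka": 500,
    "MinIO": 200,
}
simplified_mb = {"PostgreSQL": 200, "FastAPI": 100}

original_total = sum(original_mb.values())
simplified_total = sum(simplified_mb.values())
savings_pct = 100 * (original_total - simplified_total) / original_total

print(f"Original: {original_total} MB")      # Original: 2000 MB
print(f"Simplified: {simplified_total} MB")  # Simplified: 300 MB
print(f"Savings: {savings_pct:.0f}%")        # Savings: 85%
```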
---
## Deployment Complexity
### Original Stack
```
docker-compose.yml:
- postgres
- fastapi
- elasticsearch
- kafka
- minio
- zookeeper (for Kafka)
Total: 6+ containers
```
### Simplified Stack
```
docker-compose.yml:
- postgres
- fastapi
Total: 2 containers
```
**Savings: 67% fewer services**
---
## Maintenance Effort
| Task | Original | Simplified | Time Saved |
|------|----------|------------|------------|
| Setup | 2-3 days | 2-3 hours | 90% |
| Debugging | Complex (multiple services) | Simple (2 services) | 70% |
| Updates | Multiple services | 2 services | 80% |
| Monitoring | Multiple dashboards | Single dashboard | 75% |
---
## When to Upgrade
Upgrade to the original stack only if one or more of these apply:
1. **Scale:** >10,000 initiatives/year
2. **Users:** >100 concurrent users
3. **Performance:** Response time >2s
4. **Distribution:** Multi-server deployment
5. **Advanced ML:** Need sophisticated NLP
For a local application with a typical load (<5,000 initiatives/year), the simplified stack is optimal.
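
The criteria above can be expressed as a small check that reports which ones are met. The `LoadProfile` class, its field names, and the function `upgrade_reasons` are hypothetical; the thresholds are the ones listed, and the original does not say which response-time measure (average, percentile) is meant.

```python
from dataclasses import dataclass


@dataclass
class LoadProfile:
    initiatives_per_year: int
    concurrent_users: int
    response_seconds: float  # which measure to use is left unspecified
    multi_server: bool = False
    needs_advanced_nlp: bool = False


def upgrade_reasons(profile):
    """Return the names of the upgrade criteria that this profile meets."""
    reasons = []
    if profile.initiatives_per_year > 10_000:
        reasons.append("Scale")
    if profile.concurrent_users > 100:
        reasons.append("Users")
    if profile.response_seconds > 2:
        reasons.append("Performance")
    if profile.multi_server:
        reasons.append("Distribution")
    if profile.needs_advanced_nlp:
        reasons.append("Advanced ML")
    return reasons


typical = LoadProfile(initiatives_per_year=4_000, concurrent_users=20, response_seconds=0.4)
print(upgrade_reasons(typical))  # []
```

An empty list means none of the criteria are met and the simplified stack remains the recommended choice.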
---
## Migration Path
If you need to scale later:
1. **Add Redis** for caching (if queries are slow)
2. **Add Elasticsearch** for advanced search (if PostgreSQL search is insufficient)
3. **Add RabbitMQ** for async processing (if you need background jobs)
4. **Move to S3** for documents (if you need cloud storage)
Start simple and scale only when needed.