sciagent code + Gitea Actions CI/CD
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -0,0 +1,343 @@
# Simplified Tech Stack for Local Governance Layer

## Analysis & Simplification Strategy

### Key Observations
1. **Local Application Context**: Single-server deployment, not distributed
2. **Existing Stack**: Already using FastAPI + PostgreSQL
3. **Complexity Overkill**: Enterprise tools (Kafka, Camunda, Elasticsearch) are unnecessary for local deployment
4. **Core Needs**: State machine, rules engine, document storage, audit logging

---

## Simplified Tech Stack Recommendation

### ✅ **Core Stack (Keep These)**

| Component | Technology | Rationale |
|-----------|-----------|-----------|
| **Database** | PostgreSQL 15+ | ✅ Already in use, supports JSONB, excellent for local deployment |
| **API Framework** | FastAPI (Python) | ✅ Already in use, fast, async, great for this use case |
| **Document Storage** | Local filesystem + PostgreSQL (metadata) | ✅ Simple, no external service needed |
| **Business Rules** | Custom Python classes/functions | ✅ Lightweight, maintainable, no external engine needed |

### 🔄 **Replace Complex Components**

| Original Suggestion | Simplified Alternative | Why |
|-------------------|----------------------|-----|
| **Camunda/Temporal** | Custom state machine (Python) | Simple workflow states, no need for enterprise orchestration |
| **Elasticsearch + ML** | PostgreSQL full-text search + `pg_trgm` (trigram similarity) | Built-in, sufficient for duplicate detection |
| **Apache Kafka/RabbitMQ** | PostgreSQL NOTIFY/LISTEN or in-memory event queue | Simple pub/sub, no separate service |
| **AWS S3/MinIO** | Local filesystem with organized folders | Direct file storage, simpler for local |
| **Drools** | Python rule functions/classes | More maintainable, easier to debug |

---

## Recommended Simplified Architecture

### 1. **Database Layer**
```text
Single PostgreSQL database with:
- Core tables (initiatives, authors, reviews, etc.)
- JSONB columns for flexible metadata
- Full-text search indexes (GIN indexes on text fields)
- pg_trgm extension for similarity matching
```

**Benefits:**
- No additional services
- ACID compliance
- Built-in full-text search
- Trigram similarity for duplicate detection

### 2. **Business Rules Engine**
```python
# Custom Python classes (interface outline; method bodies omitted).
# Initiative, Review, ValidationResult, Score and State come from the domain layer.
from __future__ import annotations


class NoveltyChecker:
    def check(self, initiative: Initiative) -> ValidationResult:
        ...


class ScoringEngine:
    def calculate_score(self, reviews: list[Review]) -> Score:
        ...


class WorkflowStateMachine:
    def transition(self, initiative: Initiative, action: str) -> State:
        ...
```

**Benefits:**
- Easy to test and debug
- No external dependencies
- Version control friendly
- Can be extended incrementally
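
One way to keep rules testable is to write each rule as a plain function and combine them with a small runner. The sketch below is illustrative only: the `Initiative` fields, the shape of `ValidationResult`, both example rules, and the 20-character threshold are assumptions, not part of the existing codebase.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Initiative:
    # Hypothetical minimal entity; the real one would live in domain/entities.
    title: str
    description: str


@dataclass
class ValidationResult:
    passed: bool
    messages: List[str] = field(default_factory=list)


# A rule is any callable that inspects an initiative and reports a result.
Rule = Callable[[Initiative], ValidationResult]


def title_present(initiative: Initiative) -> ValidationResult:
    if initiative.title.strip():
        return ValidationResult(True)
    return ValidationResult(False, ["title is empty"])


def description_long_enough(initiative: Initiative) -> ValidationResult:
    # The 20-character minimum is an illustrative threshold, not a real policy.
    if len(initiative.description) >= 20:
        return ValidationResult(True)
    return ValidationResult(False, ["description is shorter than 20 characters"])


def run_rules(initiative: Initiative, rules: List[Rule]) -> ValidationResult:
    """Run every rule and collect all failure messages."""
    messages: List[str] = []
    for rule in rules:
        result = rule(initiative)
        if not result.passed:
            messages.extend(result.messages)
    return ValidationResult(passed=not messages, messages=messages)
```

Because every rule has the same signature, a new check can be added by writing one function and appending it to the list passed to `run_rules`.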

### 3. **Workflow Engine**
```python
# Simple state machine
class InitiativeWorkflow:
    STATES = ['DRAFT', 'SUBMITTED', 'UNIT_REVIEW']  # ... (further states elided)
    TRANSITIONS = {
        'DRAFT': ['SUBMITTED'],
        'SUBMITTED': ['UNIT_REVIEW', 'REJECTED'],
        # ... (further transitions elided)
    }

    def can_transition(self, from_state, to_state, user_role):
        # Check permissions and business rules
        pass
```

**Benefits:**
- No external workflow engine
- Easy to understand and modify
- Can store state in database
- Lightweight
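
The outline above leaves `can_transition` unimplemented. A minimal sketch of how it could work follows, using only the transitions listed in the outline. The role names and the `ALLOWED_ROLES` table are hypothetical and would need to match the real permission model.

```python
from typing import Dict, List, Set


class TransitionError(Exception):
    """Raised when a requested state change is not allowed."""


class InitiativeWorkflow:
    # Only the transitions shown in the outline; the remaining ones are elided there.
    TRANSITIONS: Dict[str, List[str]] = {
        "DRAFT": ["SUBMITTED"],
        "SUBMITTED": ["UNIT_REVIEW", "REJECTED"],
    }

    # Hypothetical role table: which roles may move an initiative INTO a state.
    ALLOWED_ROLES: Dict[str, Set[str]] = {
        "SUBMITTED": {"author"},
        "UNIT_REVIEW": {"unit_reviewer"},
        "REJECTED": {"unit_reviewer"},
    }

    def can_transition(self, from_state: str, to_state: str, user_role: str) -> bool:
        allowed_targets = self.TRANSITIONS.get(from_state, [])
        allowed_roles = self.ALLOWED_ROLES.get(to_state, set())
        return to_state in allowed_targets and user_role in allowed_roles

    def transition(self, from_state: str, to_state: str, user_role: str) -> str:
        """Return the new state, or raise if the change is not permitted."""
        if not self.can_transition(from_state, to_state, user_role):
            raise TransitionError(
                f"{user_role!r} may not move an initiative from {from_state} to {to_state}"
            )
        return to_state
```

Keeping the transition table and the role table as plain dictionaries means both can be reviewed at a glance and, if needed, loaded from the database later.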

### 4. **Document Storage**
```text
# Local filesystem structure
/initiatives/
  /{initiative_id}/
    /forms/
      form_01_v1.pdf
      form_03_v1.pdf
    /reviews/
      review_001.pdf
    /attachments/
      evidence_001.pdf
```

```sql
-- Metadata in PostgreSQL
CREATE TABLE document_metadata (
    id UUID PRIMARY KEY,
    initiative_id UUID REFERENCES initiatives(id),
    file_path TEXT,
    form_type VARCHAR(50),
    version INT,
    uploaded_by UUID,
    uploaded_at TIMESTAMP,
    checksum VARCHAR(64)
);
```

**Benefits:**
- No object storage service needed
- Easy backup (just copy folder)
- Direct file access
- Simple versioning
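
As a rough illustration of how a file could land in this folder layout together with the values for `file_path` and `checksum`, consider the sketch below. The function name `store_document`, its parameters, and the choice of SHA-256 are assumptions; the schema only fixes the checksum length at 64 characters, which happens to match a SHA-256 hex digest.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict


def store_document(root: Path, initiative_id: str, category: str, source: Path) -> Dict[str, str]:
    """Copy a file into <root>/initiatives/<initiative_id>/<category>/ and return metadata."""
    if category not in {"forms", "reviews", "attachments"}:
        raise ValueError(f"unknown category: {category}")

    target_dir = root / "initiatives" / initiative_id / category
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)

    # Assumption: SHA-256, whose hex digest is 64 characters long.
    # The whole file is read into memory, which is fine for a sketch.
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    return {"file_path": str(target), "checksum": digest}
```

The returned dictionary maps directly onto two columns of `document_metadata`; the remaining columns (version, uploader, timestamp) would be filled in by the calling service.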

### 5. **Duplicate Detection**
```sql
-- Use PostgreSQL trigram similarity
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Similarity query (i1.id < i2.id reports each pair once instead of twice)
SELECT
    i1.id,
    i1.title,
    i2.id AS similar_id,
    similarity(i1.description, i2.description) AS sim_score
FROM initiatives i1
CROSS JOIN initiatives i2
WHERE i1.id < i2.id
  AND similarity(i1.description, i2.description) > 0.7
ORDER BY sim_score DESC;
```

**Benefits:**
- Built into PostgreSQL
- No ML model training needed
- Fast enough for local scale
- Can be enhanced with custom logic
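
To make the `similarity()` score less of a black box, here is a small pure-Python approximation of what `pg_trgm` computes: each word is lowercased and padded with two leading spaces and one trailing space, split into three-character windows, and the score is the number of shared trigrams divided by the number of distinct trigrams. This is a sketch for intuition and unit tests. The function names are made up here, and the real extension may differ in details such as locale handling.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Return the set of trigrams, padding each word the way pg_trgm does."""
    result: Set[str] = set()
    # Lowercase, then treat every run of letters/digits as one word.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "  # two spaces before, one after
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

For example, `similarity("word", "two words")` shares 4 of 11 distinct trigrams, which gives roughly 0.36 and falls well below the 0.7 threshold used in the query.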

### 6. **Event System**
```python
# Simple in-memory event dispatcher
class EventDispatcher:
    def __init__(self):
        self.listeners = {}

    def subscribe(self, event_type, callback):
        if event_type not in self.listeners:
            self.listeners[event_type] = []
        self.listeners[event_type].append(callback)

    def emit(self, event_type, data):
        for callback in self.listeners.get(event_type, []):
            callback(data)

# Or use PostgreSQL LISTEN/NOTIFY to reach other processes. Notifications are
# not stored, so write events to a table if they must survive a restart.
```

**Benefits:**
- No message broker needed
- Simple pub/sub pattern
- Can persist events to database if needed
- Easy to add email notifications
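
A possible way to hang email notifications off the dispatcher is sketched below. The event name `state_changed`, the event fields, and the `outbox` list (a stand-in for a real SMTP client) are all hypothetical; the message is built with the standard library's `email.message.EmailMessage` and is queued rather than sent.

```python
from collections import defaultdict
from email.message import EmailMessage
from typing import Any, Callable, DefaultDict, Dict, List

Event = Dict[str, Any]


class EventDispatcher:
    def __init__(self) -> None:
        self.listeners: DefaultDict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, callback: Callable[[Event], None]) -> None:
        self.listeners[event_type].append(callback)

    def emit(self, event_type: str, data: Event) -> None:
        for callback in self.listeners.get(event_type, []):
            callback(data)


# Stand-in for an SMTP client: messages are queued here instead of being sent.
outbox: List[EmailMessage] = []


def notify_author(event: Event) -> None:
    """Build a notification email for a state change (hypothetical event fields)."""
    message = EmailMessage()
    message["To"] = event["author_email"]
    message["Subject"] = f"Initiative {event['initiative_id']} is now {event['new_state']}"
    message.set_content("The status of your initiative has changed.")
    outbox.append(message)


dispatcher = EventDispatcher()
dispatcher.subscribe("state_changed", notify_author)
dispatcher.emit(
    "state_changed",
    {"initiative_id": "42", "new_state": "SUBMITTED", "author_email": "author@example.org"},
)
```

Swapping the `outbox.append` call for an actual SMTP send would leave the workflow code untouched, since it only knows about the dispatcher.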

### 7. **Audit Logging**
```sql
-- Simple append-only table
CREATE TABLE audit_log (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    initiative_id UUID,
    actor_id UUID,
    action VARCHAR(100),
    timestamp TIMESTAMP DEFAULT NOW(),
    previous_state JSONB,
    new_state JSONB,
    metadata JSONB
);

CREATE INDEX idx_audit_initiative ON audit_log(initiative_id);
CREATE INDEX idx_audit_timestamp ON audit_log(timestamp);
```

**Benefits:**
- No separate audit system
- Queryable with SQL
- Can export for compliance
- Simple to implement
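
On the application side, a row for `audit_log` could be assembled roughly as follows. The helper name `build_audit_entry` and the `changed_keys` entry in `metadata` are assumptions made for illustration; the JSONB columns are passed as JSON strings and the insert itself is left to whatever database layer is in use.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_audit_entry(
    initiative_id: str,
    actor_id: str,
    action: str,
    previous_state: Optional[Dict[str, Any]],
    new_state: Dict[str, Any],
) -> Dict[str, Any]:
    """Return a dict whose keys match the audit_log columns (id is left to the default)."""
    before = previous_state or {}
    # Keys that were added or whose value differs from the previous state.
    changed = sorted(k for k in new_state if k not in before or before[k] != new_state[k])
    return {
        "initiative_id": initiative_id,
        "actor_id": actor_id,
        "action": action,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "previous_state": json.dumps(previous_state),
        "new_state": json.dumps(new_state),
        "metadata": json.dumps({"changed_keys": changed}),
    }
```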

---

## Complete Simplified Stack

### **Backend**
```
FastAPI (Python)
├── Database: PostgreSQL 15+
│   ├── Core tables (initiatives, authors, reviews, etc.)
│   ├── JSONB for flexible data
│   ├── Full-text search (GIN indexes)
│   ├── Trigram similarity (pg_trgm)
│   └── Audit log table
├── Business Logic: Custom Python classes
│   ├── NoveltyChecker
│   ├── ScoringEngine
│   ├── WorkflowStateMachine
│   └── DuplicateDetector
├── Document Storage: Local filesystem
│   └── Organized folder structure
├── Event System: In-memory dispatcher + PostgreSQL NOTIFY
└── API: FastAPI REST endpoints
```

### **Frontend** (Already in place)
```
React + TypeScript
├── Feature-based architecture
├── React Query for data fetching
└── Existing UI components
```

---

## Implementation Priority

### **Phase 1: Core Foundation** (Weeks 1-2)
1. ✅ Database schema (PostgreSQL)
2. ✅ Basic CRUD APIs (FastAPI)
3. ✅ Document upload/storage (local filesystem)
4. ✅ Basic state machine (Python class)

### **Phase 2: Business Rules** (Weeks 3-4)
1. ✅ Novelty checking (PostgreSQL similarity)
2. ✅ Author contribution validation
3. ✅ Scoring algorithm (Group 01)
4. ✅ Auto-classification (Group 02)

### **Phase 3: Workflow & Notifications** (Weeks 5-6)
1. ✅ Complete state machine transitions
2. ✅ Deadline tracking & alerts
3. ✅ Email notifications (SMTP)
4. ✅ Duplicate detection & mediation

### **Phase 4: Advanced Features** (Weeks 7-8)
1. ✅ Reporting & analytics
2. ✅ Audit trail queries
3. ✅ Role-based permissions
4. ✅ Appeal workflow

---

## Technology Comparison

### **Original Stack Complexity**
- 8+ services to manage
- External dependencies (Kafka, Elasticsearch, S3)
- Complex deployment
- Higher resource usage
- Steeper learning curve

### **Simplified Stack**
- 2 services (FastAPI + PostgreSQL)
- Minimal external dependencies
- Simple deployment
- Lower resource usage
- Easier to maintain

---

## When to Scale Up

Consider adding complexity only if:
- **>10,000 initiatives/year**: Add Elasticsearch for search
- **>100 concurrent users**: Add Redis for caching
- **Multi-server deployment**: Add message queue (RabbitMQ)
- **Advanced ML needed**: Add dedicated ML service
- **Cloud deployment**: Use S3 for documents

For a local application with fewer than 5,000 initiatives per year, the simplified stack is sufficient.

---

## Code Structure Example

```
be0/
├── src/
│   ├── domain/
│   │   ├── entities/
│   │   │   ├── initiative.py
│   │   │   ├── author.py
│   │   │   └── review.py
│   │   └── rules/
│   │       ├── novelty_checker.py
│   │       ├── scoring_engine.py
│   │       └── duplicate_detector.py
│   ├── application/
│   │   ├── services/
│   │   │   ├── workflow_service.py
│   │   │   └── notification_service.py
│   │   └── state_machine.py
│   ├── infrastructure/
│   │   ├── database/
│   │   │   └── models.py
│   │   ├── storage/
│   │   │   └── file_storage.py
│   │   └── events/
│   │       └── dispatcher.py
│   └── api/
│       └── routes/
│           └── initiatives.py
└── storage/
    └── documents/
        └── initiatives/
```

---

## Summary

**Simplified Stack:**
- ✅ PostgreSQL (database + search + similarity)
- ✅ FastAPI (API framework)
- ✅ Python (business rules + workflow)
- ✅ Local filesystem (document storage)
- ✅ In-memory events (or PostgreSQL NOTIFY)

**Removed:**
- ❌ Camunda/Temporal (use custom state machine)
- ❌ Elasticsearch (use PostgreSQL full-text search)
- ❌ Kafka/RabbitMQ (use simple event dispatcher)
- ❌ S3/MinIO (use local filesystem)
- ❌ Drools (use Python functions)

**Result:** Simpler, easier to maintain, sufficient for local deployment, can scale up later if needed.