sciagent/docs/ARCHITECTURE_REDESIGN.md
Thinh Lam 688fac73e9
sciagent code + Gitea Actions CI/CD
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 09:38:30 +07:00


# Architecture Redesign Proposal
## Overview
This document outlines a comprehensive architectural redesign for the ProfytAI Compliance Management Platform, addressing critical issues identified in the current implementation.
## Design Principles
1. **Separation of Concerns**: Clear boundaries between layers
2. **Dependency Injection**: Loose coupling, easy testing
3. **Domain-Driven Design**: Business logic in domain layer
4. **Security First**: Authentication, authorization, input validation
5. **Testability**: All components should be easily testable
6. **Scalability**: Support for horizontal scaling
7. **Maintainability**: Clear structure, minimal complexity
---
## Proposed Architecture: Layered Architecture with Clean Architecture Principles
```
┌─────────────────────────────────────────────────────────┐
│                   Presentation Layer                    │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │  API Routes  │  │  Middleware  │  │  WebSocket   │   │
│  └──────────────┘  └──────────────┘  └──────────────┘   │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│                    Application Layer                    │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │   Services   │  │  Use Cases   │  │     DTOs     │   │
│  └──────────────┘  └──────────────┘  └──────────────┘   │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│                      Domain Layer                       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │   Entities   │  │  Interfaces  │  │  Value Obj.  │   │
│  └──────────────┘  └──────────────┘  └──────────────┘   │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│                  Infrastructure Layer                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ Repositories │  │   External   │  │    Config    │   │
│  │              │  │   Services   │  │              │   │
│  └──────────────┘  └──────────────┘  └──────────────┘   │
└─────────────────────────────────────────────────────────┘
```
---
## New Directory Structure
```
be0/
├── src/
│   ├── api/  # API Layer
│   │   ├── __init__.py
│   │   ├── dependencies.py  # Dependency injection
│   │   ├── middleware/
│   │   │   ├── __init__.py
│   │   │   ├── auth.py  # Authentication middleware
│   │   │   ├── cors.py  # CORS configuration
│   │   │   ├── rate_limit.py  # Rate limiting
│   │   │   └── error_handler.py  # Global error handling
│   │   ├── routes/
│   │   │   ├── __init__.py
│   │   │   ├── workflows.py  # Workflow endpoints
│   │   │   ├── documents.py  # Document endpoints
│   │   │   ├── compliance.py  # Compliance endpoints
│   │   │   ├── health.py  # Health check
│   │   │   └── auth.py  # Authentication endpoints
│   │   └── schemas/  # Request/Response schemas
│   │       ├── __init__.py
│   │       ├── workflow.py
│   │       ├── document.py
│   │       └── compliance.py
│   │
│   ├── application/  # Application Layer
│   │   ├── __init__.py
│   │   ├── services/
│   │   │   ├── __init__.py
│   │   │   ├── workflow_service.py
│   │   │   ├── document_service.py
│   │   │   ├── compliance_service.py
│   │   │   └── ai_service.py
│   │   ├── use_cases/
│   │   │   ├── __init__.py
│   │   │   ├── create_workflow.py
│   │   │   ├── update_workflow_item.py
│   │   │   ├── analyze_compliance.py
│   │   │   └── process_document.py
│   │   └── dto/  # Data Transfer Objects
│   │       ├── __init__.py
│   │       ├── workflow_dto.py
│   │       └── compliance_dto.py
│   │
│   ├── domain/  # Domain Layer
│   │   ├── __init__.py
│   │   ├── entities/
│   │   │   ├── __init__.py
│   │   │   ├── workflow.py
│   │   │   ├── workflow_item.py
│   │   │   ├── document.py
│   │   │   └── compliance_rule.py
│   │   ├── value_objects/
│   │   │   ├── __init__.py
│   │   │   ├── task_status.py
│   │   │   └── workflow_phase.py
│   │   ├── interfaces/  # Repository interfaces
│   │   │   ├── __init__.py
│   │   │   ├── workflow_repository.py
│   │   │   ├── document_repository.py
│   │   │   └── compliance_repository.py
│   │   └── exceptions/
│   │       ├── __init__.py
│   │       ├── domain_exceptions.py
│   │       └── service_exceptions.py
│   │
│   ├── infrastructure/  # Infrastructure Layer
│   │   ├── __init__.py
│   │   ├── database/
│   │   │   ├── __init__.py
│   │   │   ├── connection.py  # DB connection pool
│   │   │   ├── repositories/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── workflow_repository_impl.py
│   │   │   │   ├── document_repository_impl.py
│   │   │   │   └── neo4j_repository.py
│   │   │   └── migrations/
│   │   ├── external/
│   │   │   ├── __init__.py
│   │   │   ├── ollama_client.py  # Ollama service client
│   │   │   └── storage/
│   │   │       ├── __init__.py
│   │   │       └── file_storage.py  # File storage abstraction
│   │   ├── config/
│   │   │   ├── __init__.py
│   │   │   ├── settings.py  # Pydantic settings
│   │   │   └── logging_config.py
│   │   └── security/
│   │       ├── __init__.py
│   │       ├── auth.py  # JWT, password hashing
│   │       └── permissions.py
│   │
│   ├── core/  # Core utilities
│   │   ├── __init__.py
│   │   ├── logging.py
│   │   ├── exceptions.py
│   │   └── constants.py
│   │
│   └── main.py  # Application entry point
├── tests/  # Test suite
│   ├── __init__.py
│   ├── unit/
│   │   ├── domain/
│   │   ├── application/
│   │   └── infrastructure/
│   ├── integration/
│   │   ├── api/
│   │   └── database/
│   ├── fixtures/
│   └── conftest.py
├── alembic/  # Database migrations
│   ├── versions/
│   └── env.py
├── requirements.txt
├── requirements-dev.txt
├── .env.example
└── Dockerfile
```
---
## Key Architectural Components
### 1. API Layer (Presentation)
**Purpose**: Handle HTTP requests, validate input, return responses
**Responsibilities**:
- Route definitions
- Request/Response serialization
- Input validation
- Authentication/Authorization checks
- Error handling
### 2. Application Layer
**Purpose**: Orchestrate business logic, coordinate between domain and infrastructure
**Responsibilities**:
- Use case implementation
- Service orchestration
- DTO transformation
- Transaction management
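
The DTO transformation responsibility can be illustrated with a minimal sketch. `WorkflowStub` and `WorkflowSummaryDTO` are hypothetical names used only for this illustration; the real DTOs in `application/dto/` may carry different fields.

```python
from dataclasses import dataclass, field
from typing import List
from uuid import UUID, uuid4


@dataclass
class WorkflowStub:
    """Hypothetical, trimmed stand-in for the domain Workflow entity."""
    id: UUID
    project_name: str
    current_phase: str
    checklist_items: List[str] = field(default_factory=list)
    completed_items: List[int] = field(default_factory=list)


@dataclass(frozen=True)
class WorkflowSummaryDTO:
    """Flat, read-only view passed from the application layer to the API layer."""
    id: str
    project_name: str
    current_phase: str
    total_items: int
    completed_items: int

    @classmethod
    def from_entity(cls, workflow: WorkflowStub) -> "WorkflowSummaryDTO":
        # Copy only what callers need; the entity itself never leaves the layer.
        return cls(
            id=str(workflow.id),
            project_name=workflow.project_name,
            current_phase=workflow.current_phase,
            total_items=len(workflow.checklist_items),
            completed_items=len(workflow.completed_items),
        )
```

Freezing the DTO keeps the API layer from mutating state that belongs to the domain entity.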
### 3. Domain Layer
**Purpose**: Core business logic, entities, and business rules
**Responsibilities**:
- Domain entities
- Business rules
- Value objects
- Domain events
- Repository interfaces (abstractions)
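
The value objects referenced throughout this document (`TaskStatus`, `WorkflowPhase`) could be sketched as string-backed enums. Only `PENDING`, `COMPLETED`, and `CONCEPT_DEVELOPMENT` appear in this proposal; every other member, the string values, and the `next_phase` helper are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class TaskStatus(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"  # assumed member, not named in this document
    COMPLETED = "completed"


class WorkflowPhase(str, Enum):
    CONCEPT_DEVELOPMENT = "concept_development"
    DESIGN = "design"  # assumed member
    IMPLEMENTATION = "implementation"  # assumed member

    def next_phase(self) -> Optional["WorkflowPhase"]:
        """Return the phase that follows this one, or None for the last phase."""
        phases = list(WorkflowPhase)  # members iterate in definition order
        index = phases.index(self)
        return phases[index + 1] if index + 1 < len(phases) else None
```

Backing the enums with strings lets the API layer build them directly from request payloads, for example `TaskStatus("completed")`, and makes them straightforward to store as plain properties.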
### 4. Infrastructure Layer
**Purpose**: External concerns - database, file system, external APIs
**Responsibilities**:
- Database access
- External API clients
- File storage
- Configuration
- Security implementation
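
As one illustration of an infrastructure concern hidden behind an abstraction, the file storage component might look like the sketch below. `FileStorage`, `LocalFileStorage`, and `FileTooLargeError` are hypothetical names; the size limit mirrors the `max_upload_size` setting (10 MB) used elsewhere in this proposal.

```python
from pathlib import Path
from typing import Protocol


class FileTooLargeError(ValueError):
    """Raised when an upload exceeds the configured size limit."""


class FileStorage(Protocol):
    """Hypothetical abstraction the application layer would depend on."""

    def save(self, filename: str, content: bytes) -> Path: ...


class LocalFileStorage:
    """Stores uploads on the local file system."""

    def __init__(self, upload_dir: Path, max_upload_size: int = 10 * 1024 * 1024) -> None:
        self.upload_dir = Path(upload_dir)
        self.max_upload_size = max_upload_size

    def save(self, filename: str, content: bytes) -> Path:
        if len(content) > self.max_upload_size:
            raise FileTooLargeError(
                f"{len(content)} bytes exceeds limit of {self.max_upload_size}"
            )
        # Keep only the final path component so "../x" cannot escape upload_dir.
        safe_name = Path(filename).name
        if safe_name in {"", ".", ".."}:
            raise ValueError("filename must name a file")
        self.upload_dir.mkdir(parents=True, exist_ok=True)
        target = self.upload_dir / safe_name
        target.write_bytes(content)
        return target
```

Because services would depend only on the `FileStorage` protocol, a cloud-backed implementation could replace `LocalFileStorage` without touching the application layer.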
---
## Implementation Examples
### Example 1: Configuration Management
```python
# infrastructure/config/settings.py
from typing import List

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Load values from .env; environment variable names are matched case-insensitively
    model_config = SettingsConfigDict(env_file=".env", case_sensitive=False)

    # Application
    app_name: str = "ProfytAI Compliance Platform"
    app_version: str = "1.0.0"
    debug: bool = False

    # Server
    host: str = "0.0.0.0"
    port: int = 4402

    # Database (no defaults: these must be supplied by the environment)
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str

    # Security
    secret_key: str
    algorithm: str = "HS256"
    access_token_expire_minutes: int = 30
    cors_origins: List[str] = []

    # AI/ML
    ollama_base_url: str = "http://localhost:11434"
    ollama_model: str = "gemma3:27b"
    embedding_model: str = "embeddinggemma:300m"

    # Storage
    upload_dir: str = "./assets/data/uploads"
    max_upload_size: int = 10 * 1024 * 1024  # 10MB

    # Rate Limiting
    rate_limit_per_minute: int = 60


settings = Settings()
```
### Example 2: Domain Entity
```python
# domain/entities/workflow.py
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
from uuid import UUID

from domain.value_objects.task_status import TaskStatus
from domain.value_objects.workflow_phase import WorkflowPhase


@dataclass
class WorkflowItem:
    id: int
    task: str
    status: TaskStatus
    requires_approval: bool
    approver: Optional[str] = None
    comment: Optional[str] = None
    updated_by: Optional[str] = None
    updated_at: Optional[datetime] = None


@dataclass
class Workflow:
    id: UUID
    project_name: str
    project_description: Optional[str]
    records_officer_email: Optional[str]
    current_phase: WorkflowPhase
    checklist_items: List[WorkflowItem] = field(default_factory=list)
    completed_items: List[int] = field(default_factory=list)
    pending_approvals: List[str] = field(default_factory=list)
    comments: dict = field(default_factory=dict)
    validation_results: dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=datetime.utcnow)
    updated_at: datetime = field(default_factory=datetime.utcnow)

    def add_item(self, item: WorkflowItem) -> None:
        """Add a checklist item to the workflow."""
        self.checklist_items.append(item)
        self.updated_at = datetime.utcnow()

    def update_item_status(
        self,
        item_id: int,
        status: TaskStatus,
        updated_by: str,
        comment: Optional[str] = None,
    ) -> None:
        """Update the status of a workflow item."""
        item = next((i for i in self.checklist_items if i.id == item_id), None)
        if item is None:
            raise ValueError(f"Item {item_id} not found")
        item.status = status
        item.updated_by = updated_by
        item.updated_at = datetime.utcnow()
        if comment:
            item.comment = comment
        if status == TaskStatus.COMPLETED:
            if item_id not in self.completed_items:
                self.completed_items.append(item_id)
        elif item_id in self.completed_items:
            # Item was reopened: drop it so completion_percentage stays accurate
            self.completed_items.remove(item_id)
        self.updated_at = datetime.utcnow()

    def can_advance_phase(self) -> bool:
        """Check if workflow can advance to next phase."""
        all_completed = all(
            item.status == TaskStatus.COMPLETED
            for item in self.checklist_items
        )
        no_pending_approvals = len(self.pending_approvals) == 0
        return all_completed and no_pending_approvals

    @property
    def completion_percentage(self) -> float:
        """Calculate completion percentage."""
        if not self.checklist_items:
            return 0.0
        completed = len(self.completed_items)
        total = len(self.checklist_items)
        return (completed / total) * 100
```
### Example 3: Repository Interface (Domain)
```python
# domain/interfaces/workflow_repository.py
from abc import ABC, abstractmethod
from typing import List, Optional
from uuid import UUID

from domain.entities.workflow import Workflow


class IWorkflowRepository(ABC):
    """Repository interface for workflow persistence."""

    @abstractmethod
    async def create(self, workflow: Workflow) -> Workflow:
        """Create a new workflow."""
        pass

    @abstractmethod
    async def get_by_id(self, workflow_id: UUID) -> Optional[Workflow]:
        """Get workflow by ID."""
        pass

    @abstractmethod
    async def get_all(self, skip: int = 0, limit: int = 100) -> List[Workflow]:
        """Get all workflows with pagination."""
        pass

    @abstractmethod
    async def update(self, workflow: Workflow) -> Workflow:
        """Update an existing workflow."""
        pass

    @abstractmethod
    async def delete(self, workflow_id: UUID) -> bool:
        """Delete a workflow."""
        pass
```
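
Because services would depend only on this interface, an in-memory implementation can stand in for Neo4j in unit tests. The sketch below is self-contained, so it uses a trimmed, hypothetical `Workflow` dataclass and does not subclass `IWorkflowRepository`; in the real tree it would import the entity and inherit from the interface.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional
from uuid import UUID, uuid4


@dataclass
class Workflow:
    """Trimmed stand-in for domain.entities.workflow.Workflow."""
    id: UUID
    project_name: str


class InMemoryWorkflowRepository:
    """Hypothetical test double with the same methods as IWorkflowRepository."""

    def __init__(self) -> None:
        self._items: Dict[UUID, Workflow] = {}

    async def create(self, workflow: Workflow) -> Workflow:
        self._items[workflow.id] = workflow
        return workflow

    async def get_by_id(self, workflow_id: UUID) -> Optional[Workflow]:
        return self._items.get(workflow_id)

    async def get_all(self, skip: int = 0, limit: int = 100) -> List[Workflow]:
        # Dicts preserve insertion order, so pagination is stable.
        return list(self._items.values())[skip : skip + limit]

    async def update(self, workflow: Workflow) -> Workflow:
        if workflow.id not in self._items:
            raise KeyError(f"Workflow {workflow.id} not found")
        self._items[workflow.id] = workflow
        return workflow

    async def delete(self, workflow_id: UUID) -> bool:
        return self._items.pop(workflow_id, None) is not None


async def demo() -> None:
    repo = InMemoryWorkflowRepository()
    created = await repo.create(Workflow(id=uuid4(), project_name="Demo"))
    assert await repo.get_by_id(created.id) is created
    assert await repo.delete(created.id) is True
    assert await repo.get_by_id(created.id) is None


asyncio.run(demo())
```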
### Example 4: Repository Implementation (Infrastructure)
```python
# infrastructure/database/repositories/workflow_repository_impl.py
from typing import Optional
from uuid import UUID

# The queries below are Cypher, so the session is the Neo4j driver's async
# session rather than a SQLAlchemy one.
from neo4j import AsyncSession

from domain.entities.workflow import Workflow
from domain.interfaces.workflow_repository import IWorkflowRepository


class WorkflowRepository(IWorkflowRepository):
    """Neo4j implementation of workflow repository."""

    def __init__(self, session: AsyncSession):
        self.session = session

    async def create(self, workflow: Workflow) -> Workflow:
        """Create workflow in Neo4j."""
        query = """
        CREATE (w:Workflow {
            id: $id,
            project_name: $project_name,
            project_description: $project_description,
            records_officer_email: $records_officer_email,
            current_phase: $current_phase,
            created_at: $created_at,
            updated_at: $updated_at
        })
        RETURN w
        """
        # Implementation details...
        return workflow

    async def get_by_id(self, workflow_id: UUID) -> Optional[Workflow]:
        """Get workflow by ID from Neo4j."""
        query = """
        MATCH (w:Workflow {id: $workflow_id})
        OPTIONAL MATCH (w)-[:HAS_ITEM]->(i:WorkflowItem)
        RETURN w, collect(i) as items
        """
        # Implementation details...
        pass

    # ... other methods
```
### Example 5: Service Layer
```python
# application/services/workflow_service.py
from typing import List, Optional
from uuid import UUID, uuid4

from domain.entities.workflow import Workflow, WorkflowItem
from domain.exceptions.domain_exceptions import WorkflowNotFoundError
from domain.interfaces.workflow_repository import IWorkflowRepository
from domain.value_objects.task_status import TaskStatus
from domain.value_objects.workflow_phase import WorkflowPhase


class WorkflowService:
    """Service for workflow business logic."""

    def __init__(self, workflow_repository: IWorkflowRepository):
        self.workflow_repository = workflow_repository

    async def create_workflow(
        self,
        project_name: str,
        project_description: Optional[str],
        records_officer_email: Optional[str],
    ) -> Workflow:
        """Create a new workflow with initial phase."""
        workflow = Workflow(
            id=uuid4(),  # UUID() with no arguments raises TypeError
            project_name=project_name,
            project_description=project_description,
            records_officer_email=records_officer_email,
            current_phase=WorkflowPhase.CONCEPT_DEVELOPMENT,
        )
        # Initialize Phase 1 items
        phase1_items = self._get_phase1_items()
        for item in phase1_items:
            workflow.add_item(item)
        return await self.workflow_repository.create(workflow)

    async def get_workflow(self, workflow_id: UUID) -> Workflow:
        """Get workflow by ID."""
        workflow = await self.workflow_repository.get_by_id(workflow_id)
        if not workflow:
            raise WorkflowNotFoundError(f"Workflow {workflow_id} not found")
        return workflow

    async def update_workflow_item(
        self,
        workflow_id: UUID,
        item_id: int,
        status: TaskStatus,
        updated_by: str,
        comment: Optional[str] = None,
    ) -> Workflow:
        """Update a workflow item."""
        workflow = await self.get_workflow(workflow_id)
        workflow.update_item_status(item_id, status, updated_by, comment)
        return await self.workflow_repository.update(workflow)

    async def advance_workflow(self, workflow_id: UUID) -> Workflow:
        """Advance workflow to next phase."""
        workflow = await self.get_workflow(workflow_id)
        if not workflow.can_advance_phase():
            raise ValueError("Cannot advance: Phase requirements not met")
        # Advance to next phase logic...
        return await self.workflow_repository.update(workflow)

    def _get_phase1_items(self) -> List[WorkflowItem]:
        """Get Phase 1 checklist items."""
        return [
            WorkflowItem(
                id=1,
                task="Include Records Officer in system design process",
                status=TaskStatus.PENDING,
                requires_approval=True,
                approver="Records Officer",
            ),
            # ... more items
        ]
```
### Example 6: API Route with Dependency Injection
```python
# api/routes/workflows.py
from uuid import UUID

from fastapi import APIRouter, Depends, HTTPException, status

from api.dependencies import get_current_user, get_workflow_service
from api.schemas.workflow import (
    WorkflowCreateRequest,
    WorkflowItemUpdateRequest,
    WorkflowResponse,
)
from application.services.workflow_service import WorkflowService
# Needed by the 404 handlers below
from domain.exceptions.domain_exceptions import WorkflowNotFoundError
from domain.value_objects.task_status import TaskStatus

router = APIRouter(prefix="/workflows", tags=["workflows"])


@router.post("", response_model=WorkflowResponse, status_code=status.HTTP_201_CREATED)
async def create_workflow(
    request: WorkflowCreateRequest,
    workflow_service: WorkflowService = Depends(get_workflow_service),
    current_user=Depends(get_current_user),
):
    """Create a new workflow."""
    try:
        workflow = await workflow_service.create_workflow(
            project_name=request.project_name,
            project_description=request.project_description,
            records_officer_email=request.records_officer_email,
        )
        return WorkflowResponse.from_entity(workflow)
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e),
        )


@router.get("/{workflow_id}", response_model=WorkflowResponse)
async def get_workflow(
    workflow_id: UUID,
    workflow_service: WorkflowService = Depends(get_workflow_service),
    current_user=Depends(get_current_user),
):
    """Get workflow by ID."""
    try:
        workflow = await workflow_service.get_workflow(workflow_id)
        return WorkflowResponse.from_entity(workflow)
    except WorkflowNotFoundError:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Workflow not found",
        )


@router.put("/{workflow_id}/items", response_model=WorkflowResponse)
async def update_workflow_item(
    workflow_id: UUID,
    request: WorkflowItemUpdateRequest,
    workflow_service: WorkflowService = Depends(get_workflow_service),
    current_user=Depends(get_current_user),
):
    """Update a workflow item."""
    try:
        workflow = await workflow_service.update_workflow_item(
            workflow_id=workflow_id,
            item_id=request.item_id,
            status=TaskStatus(request.status),
            updated_by=current_user.email,
            comment=request.comment,
        )
        return WorkflowResponse.from_entity(workflow)
    except WorkflowNotFoundError:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Workflow not found",
        )
```
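
The routes above translate `WorkflowNotFoundError` into a 404 by hand in each handler. A global error handler could centralize that translation; the sketch below shows only the mapping logic, using the standard library. `DomainError`, `PhaseRequirementsNotMetError`, and `status_for` are hypothetical names, and the choice of 409 for unmet phase requirements is an assumption.

```python
from http import HTTPStatus
from typing import Dict, Type


class DomainError(Exception):
    """Hypothetical base class for errors raised by the domain layer."""


class WorkflowNotFoundError(DomainError):
    """Raised when a workflow ID does not resolve to a stored workflow."""


class PhaseRequirementsNotMetError(DomainError):
    """Hypothetical error for advancing a phase before its checklist is done."""


# Single place where domain errors are assigned HTTP status codes.
_STATUS_BY_ERROR: Dict[Type[DomainError], HTTPStatus] = {
    WorkflowNotFoundError: HTTPStatus.NOT_FOUND,
    PhaseRequirementsNotMetError: HTTPStatus.CONFLICT,
}


def status_for(error: Exception) -> HTTPStatus:
    """Return the HTTP status for an exception, defaulting to 500."""
    for error_type, http_status in _STATUS_BY_ERROR.items():
        if isinstance(error, error_type):
            return http_status
    return HTTPStatus.INTERNAL_SERVER_ERROR
```

An exception handler registered on the application could call `status_for` and build the response, which would let the route functions drop their repeated `try`/`except` blocks.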
### Example 7: Dependency Injection Setup
```python
# api/dependencies.py
from fastapi import Depends
from fastapi.security import OAuth2PasswordBearer

from application.services.compliance_service import ComplianceService
from application.services.workflow_service import WorkflowService
from domain.interfaces.workflow_repository import IWorkflowRepository
from infrastructure.config.settings import settings
from infrastructure.database.connection import get_db_session
from infrastructure.database.repositories.workflow_repository_impl import WorkflowRepository
from infrastructure.external.ollama_client import OllamaClient

# Assumption: the token endpoint is served at /auth/token; adjust to the real route.
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="auth/token")


# Repository dependencies
async def get_workflow_repository():
    async for session in get_db_session():
        yield WorkflowRepository(session)


# Service dependencies
def get_workflow_service(
    workflow_repo: IWorkflowRepository = Depends(get_workflow_repository),
) -> WorkflowService:
    return WorkflowService(workflow_repo)


def get_compliance_service() -> ComplianceService:
    ollama_client = OllamaClient(
        base_url=settings.ollama_base_url,
        model=settings.ollama_model,
    )
    return ComplianceService(ollama_client)


# Auth dependencies
async def get_current_user(
    token: str = Depends(oauth2_scheme),
):
    # JWT validation logic
    pass
```
### Example 8: Main Application Setup
```python
# main.py
from fastapi import FastAPI

from api.middleware.cors import setup_cors
from api.middleware.error_handler import setup_exception_handlers
from api.routes import auth, compliance, documents, health, workflows
from infrastructure.config.logging_config import setup_logging
from infrastructure.config.settings import settings

# Setup logging
setup_logging()

# Create FastAPI app
app = FastAPI(
    title=settings.app_name,
    version=settings.app_version,
    debug=settings.debug,
)

# Setup middleware
setup_cors(app, settings.cors_origins)
setup_exception_handlers(app)

# Include routers
app.include_router(auth.router)
app.include_router(workflows.router)
app.include_router(documents.router)
app.include_router(compliance.router)
app.include_router(health.router)


@app.on_event("startup")
async def startup_event():
    """Initialize services on startup."""
    # Initialize database connections
    # Initialize external services
    pass


@app.on_event("shutdown")
async def shutdown_event():
    """Cleanup on shutdown."""
    # Close database connections
    # Cleanup resources
    pass
```
---
## Security Improvements
### 1. Authentication & Authorization
```python
# infrastructure/security/auth.py
from datetime import datetime, timedelta
from typing import Optional

from jose import JWTError, jwt
from passlib.context import CryptContext

from infrastructure.config.settings import settings

pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")


def verify_password(plain_password: str, hashed_password: str) -> bool:
    """Verify a password against a hash."""
    return pwd_context.verify(plain_password, hashed_password)


def get_password_hash(password: str) -> str:
    """Hash a password."""
    return pwd_context.hash(password)


def create_access_token(data: dict, expires_delta: Optional[timedelta] = None) -> str:
    """Create JWT access token."""
    to_encode = data.copy()
    if expires_delta:
        expire = datetime.utcnow() + expires_delta
    else:
        expire = datetime.utcnow() + timedelta(
            minutes=settings.access_token_expire_minutes
        )
    to_encode.update({"exp": expire})
    encoded_jwt = jwt.encode(
        to_encode,
        settings.secret_key,
        algorithm=settings.algorithm,
    )
    return encoded_jwt
```
### 2. CORS Configuration
```python
# api/middleware/cors.py
from typing import List

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware


def setup_cors(app: FastAPI, allowed_origins: List[str]):
    """Configure CORS middleware."""
    app.add_middleware(
        CORSMiddleware,
        allow_origins=allowed_origins,  # Specific origins, not "*"
        allow_credentials=True,
        allow_methods=["GET", "POST", "PUT", "DELETE", "PATCH"],
        allow_headers=["Content-Type", "Authorization"],
    )
```
### 3. Rate Limiting
```python
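# api/middleware/rate_limit.py
from fastapi import APIRouter, FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)


def setup_rate_limiting(app: FastAPI) -> None:
    """Attach the limiter and its 429 handler (helper name is illustrative)."""
    app.state.limiter = limiter
    app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)


# Usage in a route module; slowapi needs the `request: Request` parameter
router = APIRouter(prefix="/workflows", tags=["workflows"])


@router.post("")
@limiter.limit("10/minute")  # 10 requests per minute per client address
async def create_workflow(request: Request):
    # Implementation
    pass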
```
---
## Testing Structure
```python
# tests/conftest.py
import pytest
from fastapi.testclient import TestClient

from main import app


@pytest.fixture
def client():
    return TestClient(app)


@pytest.fixture
def test_db():
    # Setup test database
    yield
    # Teardown


# tests/unit/application/services/test_workflow_service.py
from unittest.mock import AsyncMock

import pytest

from application.services.workflow_service import WorkflowService
from domain.interfaces.workflow_repository import IWorkflowRepository
from domain.value_objects.workflow_phase import WorkflowPhase


@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_create_workflow():
    # Mock repository: create() echoes back the workflow it receives
    mock_repo = AsyncMock(spec=IWorkflowRepository)
    mock_repo.create.side_effect = lambda workflow: workflow
    service = WorkflowService(mock_repo)

    workflow = await service.create_workflow(
        project_name="Test Project",
        project_description="Test Description",
        records_officer_email="test@example.com",
    )

    assert workflow.project_name == "Test Project"
    assert workflow.current_phase == WorkflowPhase.CONCEPT_DEVELOPMENT
```
---
## Migration Strategy
### Phase 1: Foundation (Week 1-2)
1. Create new directory structure
2. Set up configuration management
3. Implement dependency injection
4. Set up database connection
### Phase 2: Domain Layer (Week 3)
1. Create domain entities
2. Define repository interfaces
3. Implement value objects
### Phase 3: Infrastructure (Week 4)
1. Implement repository classes
2. Set up external service clients
3. Configure security
### Phase 4: Application Layer (Week 5)
1. Create service classes
2. Implement use cases
3. Create DTOs
### Phase 5: API Layer (Week 6)
1. Create route modules
2. Implement middleware
3. Set up error handling
### Phase 6: Testing & Migration (Week 7-8)
1. Write unit tests
2. Write integration tests
3. Migrate existing endpoints
4. Deploy and monitor
---
## Benefits of This Architecture
1. **Testability**: Each layer can be tested independently
2. **Maintainability**: Clear separation of concerns
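3. **Scalability**: New features slot into an existing layer, and the layered design supports horizontal scaling
4. **Security**: Authentication, authorization, input validation, restricted CORS origins, and rate limiting are built in from the start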
5. **Flexibility**: Easy to swap implementations (e.g., different databases)
6. **Team Collaboration**: Different teams can work on different layers
---
## Next Steps
1. Review and approve this architecture
2. Create detailed implementation plan
3. Set up project structure
4. Begin Phase 1 implementation
5. Establish coding standards and review process