sciagent/docs/ARCHITECTURE_REDESIGN.md
Thinh Lam 688fac73e9
sciagent code + Gitea Actions CI/CD
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 09:38:30 +07:00

Architecture Redesign Proposal

Overview

This document outlines a comprehensive architectural redesign for the ProfytAI Compliance Management Platform, addressing critical issues identified in the current implementation.

Design Principles

  1. Separation of Concerns: Clear boundaries between layers
  2. Dependency Injection: Loose coupling, easy testing
  3. Domain-Driven Design: Business logic in domain layer
  4. Security First: Authentication, authorization, input validation
  5. Testability: All components should be easily testable
  6. Scalability: Support for horizontal scaling
  7. Maintainability: Clear structure, minimal complexity

Proposed Architecture: Layered Architecture with Clean Architecture Principles

┌─────────────────────────────────────────────────────────┐
│                    Presentation Layer                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │   API Routes │  │  Middleware  │  │   WebSocket  │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  │
└─────────────────────────────────────────────────────────┘
                          │
┌─────────────────────────────────────────────────────────┐
│                    Application Layer                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │   Services   │  │   Use Cases  │  │   DTOs        │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  │
└─────────────────────────────────────────────────────────┘
                          │
┌─────────────────────────────────────────────────────────┐
│                      Domain Layer                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │   Entities   │  │  Interfaces  │  │  Value Obj.  │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  │
└─────────────────────────────────────────────────────────┘
                          │
┌─────────────────────────────────────────────────────────┐
│                  Infrastructure Layer                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │ Repositories │  │   External   │  │   Config     │  │
│  │              │  │   Services   │  │              │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  │
└─────────────────────────────────────────────────────────┘

New Directory Structure

be0/
├── src/
│   ├── api/                          # API Layer
│   │   ├── __init__.py
│   │   ├── dependencies.py          # Dependency injection
│   │   ├── middleware/
│   │   │   ├── __init__.py
│   │   │   ├── auth.py              # Authentication middleware
│   │   │   ├── cors.py              # CORS configuration
│   │   │   ├── rate_limit.py        # Rate limiting
│   │   │   └── error_handler.py    # Global error handling
│   │   ├── routes/
│   │   │   ├── __init__.py
│   │   │   ├── workflows.py         # Workflow endpoints
│   │   │   ├── documents.py         # Document endpoints
│   │   │   ├── compliance.py        # Compliance endpoints
│   │   │   ├── health.py           # Health check
│   │   │   └── auth.py             # Authentication endpoints
│   │   └── schemas/                 # Request/Response schemas
│   │       ├── __init__.py
│   │       ├── workflow.py
│   │       ├── document.py
│   │       └── compliance.py
│   │
│   ├── application/                 # Application Layer
│   │   ├── __init__.py
│   │   ├── services/
│   │   │   ├── __init__.py
│   │   │   ├── workflow_service.py
│   │   │   ├── document_service.py
│   │   │   ├── compliance_service.py
│   │   │   └── ai_service.py
│   │   ├── use_cases/
│   │   │   ├── __init__.py
│   │   │   ├── create_workflow.py
│   │   │   ├── update_workflow_item.py
│   │   │   ├── analyze_compliance.py
│   │   │   └── process_document.py
│   │   └── dto/                     # Data Transfer Objects
│   │       ├── __init__.py
│   │       ├── workflow_dto.py
│   │       └── compliance_dto.py
│   │
│   ├── domain/                      # Domain Layer
│   │   ├── __init__.py
│   │   ├── entities/
│   │   │   ├── __init__.py
│   │   │   ├── workflow.py
│   │   │   ├── workflow_item.py
│   │   │   ├── document.py
│   │   │   └── compliance_rule.py
│   │   ├── value_objects/
│   │   │   ├── __init__.py
│   │   │   ├── task_status.py
│   │   │   └── workflow_phase.py
│   │   ├── interfaces/              # Repository interfaces
│   │   │   ├── __init__.py
│   │   │   ├── workflow_repository.py
│   │   │   ├── document_repository.py
│   │   │   └── compliance_repository.py
│   │   └── exceptions/
│   │       ├── __init__.py
│   │       ├── domain_exceptions.py
│   │       └── service_exceptions.py
│   │
│   ├── infrastructure/              # Infrastructure Layer
│   │   ├── __init__.py
│   │   ├── database/
│   │   │   ├── __init__.py
│   │   │   ├── connection.py       # DB connection pool
│   │   │   ├── repositories/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── workflow_repository_impl.py
│   │   │   │   ├── document_repository_impl.py
│   │   │   │   └── neo4j_repository.py
│   │   │   └── migrations/
│   │   ├── external/
│   │   │   ├── __init__.py
│   │   │   ├── ollama_client.py    # Ollama service client
│   │   │   └── storage/
│   │   │       ├── __init__.py
│   │   │       └── file_storage.py  # File storage abstraction
│   │   ├── config/
│   │   │   ├── __init__.py
│   │   │   ├── settings.py          # Pydantic settings
│   │   │   └── logging_config.py
│   │   └── security/
│   │       ├── __init__.py
│   │       ├── auth.py              # JWT, password hashing
│   │       └── permissions.py
│   │
│   ├── core/                        # Core utilities
│   │   ├── __init__.py
│   │   ├── logging.py
│   │   ├── exceptions.py
│   │   └── constants.py
│   │
│   └── main.py                      # Application entry point
│
├── tests/                           # Test suite
│   ├── __init__.py
│   ├── unit/
│   │   ├── domain/
│   │   ├── application/
│   │   └── infrastructure/
│   ├── integration/
│   │   ├── api/
│   │   └── database/
│   ├── fixtures/
│   └── conftest.py
│
├── alembic/                         # Database migrations
│   ├── versions/
│   └── env.py
│
├── requirements.txt
├── requirements-dev.txt
├── .env.example
└── Dockerfile

Key Architectural Components

1. API Layer (Presentation)

Purpose: Handle HTTP requests, validate input, return responses

Responsibilities:

  • Route definitions
  • Request/Response serialization
  • Input validation
  • Authentication/Authorization checks
  • Error handling
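
As a minimal sketch of the error-handling responsibility, the table below maps domain exceptions to HTTP status codes in one place, so routes do not need their own try/except blocks. Only WorkflowNotFoundError is named in this document; DomainError, InvalidPhaseTransitionError, and to_http_error are hypothetical names used for illustration.

```python
from http import HTTPStatus


# Stand-ins for classes the proposal places in domain/exceptions/.
# Only WorkflowNotFoundError appears in this document; the rest are hypothetical.
class DomainError(Exception):
    """Hypothetical common base class for domain-level errors."""


class WorkflowNotFoundError(DomainError):
    pass


class InvalidPhaseTransitionError(DomainError):
    """Hypothetical: raised when a workflow cannot advance."""


# Most specific classes first; the first isinstance match wins.
STATUS_BY_EXCEPTION = [
    (WorkflowNotFoundError, HTTPStatus.NOT_FOUND),
    (InvalidPhaseTransitionError, HTTPStatus.CONFLICT),
    (DomainError, HTTPStatus.BAD_REQUEST),
]


def to_http_error(exc: Exception) -> tuple[int, str]:
    """Translate an exception into (status_code, client-safe detail)."""
    for exc_type, status in STATUS_BY_EXCEPTION:
        if isinstance(exc, exc_type):
            return int(status), str(exc)
    # Unknown errors: return a generic message so internals are not leaked.
    return int(HTTPStatus.INTERNAL_SERVER_ERROR), "Internal server error"
```

A global exception handler (api/middleware/error_handler.py in the proposed layout) could call such a function and build the JSON response from its result.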

2. Application Layer

Purpose: Orchestrate business logic, coordinate between domain and infrastructure

Responsibilities:

  • Use case implementation
  • Service orchestration
  • DTO transformation
  • Transaction management
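
DTO transformation could look roughly like the sketch below: the application layer copies an entity into a flat, immutable object that holds only JSON-friendly types. The Workflow class here is a trimmed stand-in for the real entity, and WorkflowDTO with its field selection is an assumption, not the project's actual DTO.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class Workflow:
    """Trimmed stand-in for the domain entity, just enough for the example."""
    id: UUID
    project_name: str
    project_description: Optional[str]
    updated_at: datetime


@dataclass(frozen=True)
class WorkflowDTO:
    """Hypothetical DTO: immutable, with string/None field types only."""
    id: str
    project_name: str
    project_description: Optional[str]
    updated_at: str

    @classmethod
    def from_entity(cls, workflow: Workflow) -> "WorkflowDTO":
        # Convert rich types (UUID, datetime) to strings at the layer boundary.
        return cls(
            id=str(workflow.id),
            project_name=workflow.project_name,
            project_description=workflow.project_description,
            updated_at=workflow.updated_at.isoformat(),
        )


entity = Workflow(
    id=uuid4(),
    project_name="Records migration",
    project_description=None,
    updated_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
dto = WorkflowDTO.from_entity(entity)
```

Because the DTO is frozen, outer layers cannot mutate it and accidentally bypass the entity's business rules.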

3. Domain Layer

Purpose: Core business logic, entities, and business rules

Responsibilities:

  • Domain entities
  • Business rules
  • Value objects
  • Domain events
  • Repository interfaces (abstractions)
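
The value objects referenced throughout the examples (TaskStatus, WorkflowPhase) are not defined in this document. A minimal sketch, assuming they are string-valued enums: PENDING, COMPLETED, and CONCEPT_DEVELOPMENT come from the examples, while every other member, all string values, and the next_phase helper are hypothetical placeholders.

```python
from enum import Enum
from typing import Optional


class TaskStatus(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"  # hypothetical member
    COMPLETED = "completed"


class WorkflowPhase(str, Enum):
    CONCEPT_DEVELOPMENT = "concept_development"
    # The two phases below are hypothetical placeholders, not the real phase list.
    DESIGN = "design"
    IMPLEMENTATION = "implementation"

    def next_phase(self) -> Optional["WorkflowPhase"]:
        """Return the phase after this one, or None for the last phase."""
        phases = list(WorkflowPhase)  # enum members iterate in definition order
        index = phases.index(self)
        return phases[index + 1] if index + 1 < len(phases) else None
```

Deriving from str lets a request value such as "completed" be parsed with TaskStatus("completed") and serialized back without a custom encoder.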

4. Infrastructure Layer

Purpose: External concerns - database, file system, external APIs

Responsibilities:

  • Database access
  • External API clients
  • File storage
  • Configuration
  • Security implementation
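
The file storage abstraction is only named in the directory layout. One possible shape, sketched under assumptions: a FileStorage protocol plus a local-disk implementation that enforces the upload size limit and strips directory components from client-supplied names. FileStorage, LocalFileStorage, and FileTooLargeError are hypothetical names; upload_dir and max_upload_size mirror the settings fields in Example 1.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class FileTooLargeError(ValueError):
    """Hypothetical error for uploads over the size limit."""


class FileStorage(Protocol):
    def save(self, filename: str, content: bytes) -> Path: ...


class LocalFileStorage:
    def __init__(self, upload_dir: str, max_upload_size: int) -> None:
        self.upload_dir = Path(upload_dir)
        self.max_upload_size = max_upload_size
        self.upload_dir.mkdir(parents=True, exist_ok=True)

    def save(self, filename: str, content: bytes) -> Path:
        if len(content) > self.max_upload_size:
            raise FileTooLargeError(
                f"{len(content)} bytes exceeds limit of {self.max_upload_size}"
            )
        # Keep only the final path component so "../../x" cannot escape upload_dir.
        safe_name = Path(filename).name
        if safe_name in {"", ".", ".."}:
            raise ValueError("Invalid filename")
        target = self.upload_dir / safe_name
        target.write_bytes(content)
        return target


storage = LocalFileStorage(tempfile.mkdtemp(), max_upload_size=10 * 1024 * 1024)
saved = storage.save("../../report.pdf", b"%PDF-1.7")
```

Services would depend on the FileStorage protocol, so a cloud-backed implementation could replace LocalFileStorage without changes in the application layer.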

Implementation Examples

Example 1: Configuration Management

# infrastructure/config/settings.py
from pydantic_settings import BaseSettings
from typing import List

class Settings(BaseSettings):
    # Application
    app_name: str = "ProfytAI Compliance Platform"
    app_version: str = "1.0.0"
    debug: bool = False
    
    # Server
    host: str = "0.0.0.0"
    port: int = 4402
    
    # Database
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    
    # Security
    secret_key: str
    algorithm: str = "HS256"
    access_token_expire_minutes: int = 30
    cors_origins: List[str] = []
    
    # AI/ML
    ollama_base_url: str = "http://localhost:11434"
    ollama_model: str = "gemma3:27b"
    embedding_model: str = "embeddinggemma:300m"
    
    # Storage
    upload_dir: str = "./assets/data/uploads"
    max_upload_size: int = 10 * 1024 * 1024  # 10MB
    
    # Rate Limiting
    rate_limit_per_minute: int = 60
    
    # pydantic-settings v2 style; replaces the deprecated inner `class Config`
    model_config = {"env_file": ".env", "case_sensitive": False}

settings = Settings()

Example 2: Domain Entity

# domain/entities/workflow.py
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
from uuid import UUID, uuid4
from domain.value_objects.task_status import TaskStatus
from domain.value_objects.workflow_phase import WorkflowPhase

@dataclass
class WorkflowItem:
    id: int
    task: str
    status: TaskStatus
    requires_approval: bool
    approver: Optional[str] = None
    comment: Optional[str] = None
    updated_by: Optional[str] = None
    updated_at: Optional[datetime] = None

@dataclass
class Workflow:
    id: UUID
    project_name: str
    project_description: Optional[str]
    records_officer_email: Optional[str]
    current_phase: WorkflowPhase
    checklist_items: List[WorkflowItem] = field(default_factory=list)
    completed_items: List[int] = field(default_factory=list)
    pending_approvals: List[str] = field(default_factory=list)
    comments: dict = field(default_factory=dict)
    validation_results: dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=datetime.utcnow)
    updated_at: datetime = field(default_factory=datetime.utcnow)
    
    def add_item(self, item: WorkflowItem) -> None:
        """Add a checklist item to the workflow."""
        self.checklist_items.append(item)
        self.updated_at = datetime.utcnow()
    
    def update_item_status(
        self, 
        item_id: int, 
        status: TaskStatus, 
        updated_by: str,
        comment: Optional[str] = None
    ) -> None:
        """Update the status of a workflow item."""
        item = next((i for i in self.checklist_items if i.id == item_id), None)
        if not item:
            raise ValueError(f"Item {item_id} not found")
        
        item.status = status
        item.updated_by = updated_by
        item.updated_at = datetime.utcnow()
        if comment:
            item.comment = comment
        
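        # Keep completed_items in sync when a status moves to or away from COMPLETED
        if status == TaskStatus.COMPLETED:
            if item_id not in self.completed_items:
                self.completed_items.append(item_id)
        elif item_id in self.completed_items:
            self.completed_items.remove(item_id)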
        
        self.updated_at = datetime.utcnow()
    
    def can_advance_phase(self) -> bool:
        """Check if workflow can advance to next phase."""
        all_completed = all(
            item.status == TaskStatus.COMPLETED 
            for item in self.checklist_items
        )
        no_pending_approvals = len(self.pending_approvals) == 0
        return all_completed and no_pending_approvals
    
    @property
    def completion_percentage(self) -> float:
        """Calculate completion percentage."""
        if not self.checklist_items:
            return 0.0
        completed = len(self.completed_items)
        total = len(self.checklist_items)
        return (completed / total) * 100

Example 3: Repository Interface (Domain)

# domain/interfaces/workflow_repository.py
from abc import ABC, abstractmethod
from typing import List, Optional
from uuid import UUID
from domain.entities.workflow import Workflow

class IWorkflowRepository(ABC):
    """Repository interface for workflow persistence."""
    
    @abstractmethod
    async def create(self, workflow: Workflow) -> Workflow:
        """Create a new workflow."""
        pass
    
    @abstractmethod
    async def get_by_id(self, workflow_id: UUID) -> Optional[Workflow]:
        """Get workflow by ID."""
        pass
    
    @abstractmethod
    async def get_all(self, skip: int = 0, limit: int = 100) -> List[Workflow]:
        """Get all workflows with pagination."""
        pass
    
    @abstractmethod
    async def update(self, workflow: Workflow) -> Workflow:
        """Update an existing workflow."""
        pass
    
    @abstractmethod
    async def delete(self, workflow_id: UUID) -> bool:
        """Delete a workflow."""
        pass
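
Because services depend only on this interface, a test double can stand in for the database. The sketch below is a hypothetical dict-backed implementation; Workflow is a trimmed stand-in for the entity, the interface is repeated in shortened form (update is omitted) so the block runs on its own, and InMemoryWorkflowRepository is not part of the proposal.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from uuid import UUID, uuid4


@dataclass
class Workflow:
    """Trimmed stand-in for domain.entities.workflow.Workflow."""
    project_name: str
    id: UUID = field(default_factory=uuid4)


class IWorkflowRepository(ABC):
    """Shortened copy of the interface so the sketch is self-contained."""

    @abstractmethod
    async def create(self, workflow: Workflow) -> Workflow: ...

    @abstractmethod
    async def get_by_id(self, workflow_id: UUID) -> Optional[Workflow]: ...

    @abstractmethod
    async def get_all(self, skip: int = 0, limit: int = 100) -> List[Workflow]: ...

    @abstractmethod
    async def delete(self, workflow_id: UUID) -> bool: ...


class InMemoryWorkflowRepository(IWorkflowRepository):
    """Hypothetical dict-backed implementation for unit tests."""

    def __init__(self) -> None:
        self._items: Dict[UUID, Workflow] = {}

    async def create(self, workflow: Workflow) -> Workflow:
        self._items[workflow.id] = workflow
        return workflow

    async def get_by_id(self, workflow_id: UUID) -> Optional[Workflow]:
        return self._items.get(workflow_id)

    async def get_all(self, skip: int = 0, limit: int = 100) -> List[Workflow]:
        return list(self._items.values())[skip : skip + limit]

    async def delete(self, workflow_id: UUID) -> bool:
        # True only if something was actually removed.
        return self._items.pop(workflow_id, None) is not None


async def demo() -> tuple:
    repo = InMemoryWorkflowRepository()
    created = await repo.create(Workflow(project_name="Test Project"))
    found = await repo.get_by_id(created.id)
    deleted = await repo.delete(created.id)
    missing = await repo.get_by_id(created.id)
    return created, found, deleted, missing


created, found, deleted, missing = asyncio.run(demo())
```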

Example 4: Repository Implementation (Infrastructure)

# infrastructure/database/repositories/workflow_repository_impl.py
from typing import List, Optional
from uuid import UUID
from domain.entities.workflow import Workflow
from domain.interfaces.workflow_repository import IWorkflowRepository
from infrastructure.database.connection import get_db_session
from neo4j import AsyncSession  # async session type from the official Neo4j Python driver

class WorkflowRepository(IWorkflowRepository):
    """Neo4j implementation of workflow repository."""
    
    def __init__(self, session: AsyncSession):
        self.session = session
    
    async def create(self, workflow: Workflow) -> Workflow:
        """Create workflow in Neo4j."""
        query = """
        CREATE (w:Workflow {
            id: $id,
            project_name: $project_name,
            project_description: $project_description,
            records_officer_email: $records_officer_email,
            current_phase: $current_phase,
            created_at: $created_at,
            updated_at: $updated_at
        })
        RETURN w
        """
        # Implementation details...
        return workflow
    
    async def get_by_id(self, workflow_id: UUID) -> Optional[Workflow]:
        """Get workflow by ID from Neo4j."""
        query = """
        MATCH (w:Workflow {id: $workflow_id})
        OPTIONAL MATCH (w)-[:HAS_ITEM]->(i:WorkflowItem)
        RETURN w, collect(i) as items
        """
        # Implementation details...
        pass
    
    # ... other methods

Example 5: Service Layer

# application/services/workflow_service.py
from typing import List, Optional
from uuid import UUID, uuid4
from domain.entities.workflow import Workflow, WorkflowItem
from domain.interfaces.workflow_repository import IWorkflowRepository
from domain.value_objects.workflow_phase import WorkflowPhase
from domain.value_objects.task_status import TaskStatus
from domain.exceptions.domain_exceptions import WorkflowNotFoundError

class WorkflowService:
    """Service for workflow business logic."""
    
    def __init__(self, workflow_repository: IWorkflowRepository):
        self.workflow_repository = workflow_repository
    
    async def create_workflow(
        self,
        project_name: str,
        project_description: Optional[str],
        records_officer_email: Optional[str]
    ) -> Workflow:
        """Create a new workflow with initial phase."""
        workflow = Workflow(
            id=uuid4(),  # UUID() with no arguments raises TypeError
            project_name=project_name,
            project_description=project_description,
            records_officer_email=records_officer_email,
            current_phase=WorkflowPhase.CONCEPT_DEVELOPMENT
        )
        
        # Initialize Phase 1 items
        phase1_items = self._get_phase1_items()
        for item in phase1_items:
            workflow.add_item(item)
        
        return await self.workflow_repository.create(workflow)
    
    async def get_workflow(self, workflow_id: UUID) -> Workflow:
        """Get workflow by ID."""
        workflow = await self.workflow_repository.get_by_id(workflow_id)
        if not workflow:
            raise WorkflowNotFoundError(f"Workflow {workflow_id} not found")
        return workflow
    
    async def update_workflow_item(
        self,
        workflow_id: UUID,
        item_id: int,
        status: TaskStatus,
        updated_by: str,
        comment: Optional[str] = None
    ) -> Workflow:
        """Update a workflow item."""
        workflow = await self.get_workflow(workflow_id)
        workflow.update_item_status(item_id, status, updated_by, comment)
        return await self.workflow_repository.update(workflow)
    
    async def advance_workflow(self, workflow_id: UUID) -> Workflow:
        """Advance workflow to next phase."""
        workflow = await self.get_workflow(workflow_id)
        
        if not workflow.can_advance_phase():
            raise ValueError("Cannot advance: Phase requirements not met")
        
        # Advance to next phase logic...
        return await self.workflow_repository.update(workflow)
    
    def _get_phase1_items(self) -> List[WorkflowItem]:
        """Get Phase 1 checklist items."""
        return [
            WorkflowItem(
                id=1,
                task="Include Records Officer in system design process",
                status=TaskStatus.PENDING,
                requires_approval=True,
                approver="Records Officer"
            ),
            # ... more items
        ]

Example 6: API Route with Dependency Injection

# api/routes/workflows.py
from fastapi import APIRouter, Depends, HTTPException, status
from uuid import UUID
from typing import List
from api.schemas.workflow import (
    WorkflowCreateRequest,
    WorkflowResponse,
    WorkflowItemUpdateRequest
)
from application.services.workflow_service import WorkflowService
from api.dependencies import get_workflow_service, get_current_user
from domain.value_objects.task_status import TaskStatus
from domain.exceptions.domain_exceptions import WorkflowNotFoundError

router = APIRouter(prefix="/workflows", tags=["workflows"])

@router.post("", response_model=WorkflowResponse, status_code=status.HTTP_201_CREATED)
async def create_workflow(
    request: WorkflowCreateRequest,
    workflow_service: WorkflowService = Depends(get_workflow_service),
    current_user = Depends(get_current_user)
):
    """Create a new workflow."""
    try:
        workflow = await workflow_service.create_workflow(
            project_name=request.project_name,
            project_description=request.project_description,
            records_officer_email=request.records_officer_email
        )
        return WorkflowResponse.from_entity(workflow)
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )

@router.get("/{workflow_id}", response_model=WorkflowResponse)
async def get_workflow(
    workflow_id: UUID,
    workflow_service: WorkflowService = Depends(get_workflow_service),
    current_user = Depends(get_current_user)
):
    """Get workflow by ID."""
    try:
        workflow = await workflow_service.get_workflow(workflow_id)
        return WorkflowResponse.from_entity(workflow)
    except WorkflowNotFoundError:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Workflow not found"
        )

@router.put("/{workflow_id}/items", response_model=WorkflowResponse)
async def update_workflow_item(
    workflow_id: UUID,
    request: WorkflowItemUpdateRequest,
    workflow_service: WorkflowService = Depends(get_workflow_service),
    current_user = Depends(get_current_user)
):
    """Update a workflow item."""
    try:
        workflow = await workflow_service.update_workflow_item(
            workflow_id=workflow_id,
            item_id=request.item_id,
            status=TaskStatus(request.status),
            updated_by=current_user.email,
            comment=request.comment
        )
        return WorkflowResponse.from_entity(workflow)
    except WorkflowNotFoundError:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Workflow not found"
        )

Example 7: Dependency Injection Setup

# api/dependencies.py
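from fastapi import Depends
from fastapi.security import OAuth2PasswordBearer
from infrastructure.database.connection import get_db_session
from infrastructure.database.repositories.workflow_repository_impl import WorkflowRepository
from application.services.workflow_service import WorkflowService
from infrastructure.external.ollama_client import OllamaClient
from application.services.compliance_service import ComplianceService
from infrastructure.config.settings import settings

# Extracts the bearer token from the Authorization header.
# tokenUrl is an assumed path; point it at the real login endpoint in api/routes/auth.py.
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="auth/token")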

# Repository dependencies
async def get_workflow_repository():
    async for session in get_db_session():
        yield WorkflowRepository(session)

# Service dependencies
def get_workflow_service(
    workflow_repo: WorkflowRepository = Depends(get_workflow_repository)
) -> WorkflowService:
    return WorkflowService(workflow_repo)

def get_compliance_service() -> ComplianceService:
    ollama_client = OllamaClient(
        base_url=settings.ollama_base_url,
        model=settings.ollama_model
    )
    return ComplianceService(ollama_client)

# Auth dependencies
async def get_current_user(
    token: str = Depends(oauth2_scheme)
):
    # JWT validation logic
    pass

Example 8: Main Application Setup

# main.py
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from infrastructure.config.settings import settings
from infrastructure.config.logging_config import setup_logging
from api.middleware.error_handler import setup_exception_handlers
from api.middleware.cors import setup_cors
from api.routes import workflows, documents, compliance, health, auth

# Setup logging
setup_logging()

# Create FastAPI app
app = FastAPI(
    title=settings.app_name,
    version=settings.app_version,
    debug=settings.debug
)

# Setup middleware
setup_cors(app, settings.cors_origins)
setup_exception_handlers(app)

# Include routers
app.include_router(auth.router)
app.include_router(workflows.router)
app.include_router(documents.router)
app.include_router(compliance.router)
app.include_router(health.router)

@app.on_event("startup")
async def startup_event():
    """Initialize services on startup."""
    # Initialize database connections
    # Initialize external services
    pass

@app.on_event("shutdown")
async def shutdown_event():
    """Cleanup on shutdown."""
    # Close database connections
    # Cleanup resources
    pass

Security Improvements

1. Authentication & Authorization

# infrastructure/security/auth.py
from datetime import datetime, timedelta
from typing import Optional
from jose import JWTError, jwt
from passlib.context import CryptContext
from infrastructure.config.settings import settings

pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")

def verify_password(plain_password: str, hashed_password: str) -> bool:
    """Verify a password against a hash."""
    return pwd_context.verify(plain_password, hashed_password)

def get_password_hash(password: str) -> str:
    """Hash a password."""
    return pwd_context.hash(password)

def create_access_token(data: dict, expires_delta: Optional[timedelta] = None):
    """Create JWT access token."""
    to_encode = data.copy()
    if expires_delta:
        expire = datetime.utcnow() + expires_delta
    else:
        expire = datetime.utcnow() + timedelta(
            minutes=settings.access_token_expire_minutes
        )
    to_encode.update({"exp": expire})
    encoded_jwt = jwt.encode(
        to_encode, 
        settings.secret_key, 
        algorithm=settings.algorithm
    )
    return encoded_jwt
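
The validation side ("JWT validation logic" in get_current_user) is left open above. With python-jose it would typically be a call to jwt.decode(token, settings.secret_key, algorithms=[settings.algorithm]) wrapped in an except JWTError block. To show what that check involves, here is a standard-library-only sketch of HS256 signing and verification; it is a teaching stand-in, not a replacement for the library, and encode_hs256 and decode_hs256 are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret_key: str) -> bytes:
    return hmac.new(secret_key.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def encode_hs256(claims: dict, secret_key: str) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signature = _sign(f"{header}.{payload}", secret_key)
    return f"{header}.{payload}.{_b64url(signature)}"


def decode_hs256(token: str, secret_key: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature and `exp` are valid, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return None
    # Pin the algorithm instead of trusting whatever the header asks for.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    expected = _sign(f"{header_b64}.{payload_b64}", secret_key)
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        return None
    exp = claims.get("exp") if isinstance(claims, dict) else None
    current = time.time() if now is None else now
    if not isinstance(exp, (int, float)) or exp <= current:
        return None  # missing or expired `exp` claim
    return claims


token = encode_hs256({"sub": "user@example.com", "exp": 2_000_000_000}, "secret")
```

A get_current_user dependency would turn a None result into a 401 response and otherwise load the user identified by the sub claim.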

2. CORS Configuration

# api/middleware/cors.py
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from typing import List

def setup_cors(app: FastAPI, allowed_origins: List[str]):
    """Configure CORS middleware."""
    app.add_middleware(
        CORSMiddleware,
        allow_origins=allowed_origins,  # Specific origins, not "*"
        allow_credentials=True,
        allow_methods=["GET", "POST", "PUT", "DELETE", "PATCH"],
        allow_headers=["Content-Type", "Authorization"],
    )

3. Rate Limiting

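# api/middleware/rate_limit.py
from fastapi import APIRouter, FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)

def setup_rate_limiting(app: FastAPI) -> None:
    """Register the limiter and its 429 handler (helper name is illustrative)."""
    app.state.limiter = limiter
    app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

# Usage in a route module; slowapi requires a `request: Request` parameter
router = APIRouter(prefix="/workflows", tags=["workflows"])

@router.post("")
@limiter.limit("10/minute")  # 10 requests per minute per client IP
async def create_workflow(request: Request):
    # Implementation
    pass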

Testing Structure

# tests/conftest.py
import pytest
from fastapi.testclient import TestClient
from main import app
from infrastructure.database.connection import get_test_db

@pytest.fixture
def client():
    return TestClient(app)

@pytest.fixture
def test_db():
    # Setup test database
    yield
    # Teardown

# tests/unit/application/services/test_workflow_service.py
import pytest
from unittest.mock import AsyncMock
from application.services.workflow_service import WorkflowService
from domain.interfaces.workflow_repository import IWorkflowRepository
from domain.value_objects.workflow_phase import WorkflowPhase

@pytest.mark.asyncio
async def test_create_workflow():
    # Mock repository: create() returns the workflow it was given
    mock_repo = AsyncMock(spec=IWorkflowRepository)
    mock_repo.create.side_effect = lambda workflow: workflow
    service = WorkflowService(mock_repo)
    
    workflow = await service.create_workflow(
        project_name="Test Project",
        project_description="Test Description",
        records_officer_email="test@example.com"
    )
    
    assert workflow.project_name == "Test Project"
    assert workflow.current_phase == WorkflowPhase.CONCEPT_DEVELOPMENT
    mock_repo.create.assert_awaited_once()

Migration Strategy

Phase 1: Foundation (Week 1-2)

  1. Create new directory structure
  2. Set up configuration management
  3. Implement dependency injection
  4. Set up database connection

Phase 2: Domain Layer (Week 3)

  1. Create domain entities
  2. Define repository interfaces
  3. Implement value objects

Phase 3: Infrastructure (Week 4)

  1. Implement repository classes
  2. Set up external service clients
  3. Configure security

Phase 4: Application Layer (Week 5)

  1. Create service classes
  2. Implement use cases
  3. Create DTOs

Phase 5: API Layer (Week 6)

  1. Create route modules
  2. Implement middleware
  3. Set up error handling

Phase 6: Testing & Migration (Week 7-8)

  1. Write unit tests
  2. Write integration tests
  3. Migrate existing endpoints
  4. Deploy and monitor

Benefits of This Architecture

  1. Testability: Each layer can be tested independently
  2. Maintainability: Clear separation of concerns
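  3. Extensibility: New features can be added within one layer without touching the others
  4. Security: Authentication, CORS, and rate limiting are enforced centrally in middleware rather than per endpoint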
  5. Flexibility: Easy to swap implementations (e.g., different databases)
  6. Team Collaboration: Different teams can work on different layers

Next Steps

  1. Review and approve this architecture
  2. Create detailed implementation plan
  3. Set up project structure
  4. Begin Phase 1 implementation
  5. Establish coding standards and review process