tlam89/sciagent

Thinh Lam 688fac73e9

sciagent code + Gitea Actions CI/CD

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-30 09:38:30 +07:00

Production Docker deployment (`docker-compose.prod.yml`)

This guide walks through common failures when running the prod-style stack locally or on a VPS, in a fixed order: validate environment, reconcile Postgres credentials with the Docker volume, then confirm frontend wiring.

Stack topology (frontend → backend → DB → MinIO): deploy-stack-overview.md

Related files: .env.example (copy to .env), scripts/deploy-prod.sh, scripts/verify-prod-env.sh.

1. `.env` in the repo root (cloud / VPS)

Docker Compose substitutes ${PUBLIC_HOST}, ${POSTGRES_USER}, etc. from a file named .env in the same directory as docker-compose.prod.yml (or from --env-file when you use the deploy script).

It may already be there: plain `ls` hides it

Plain `ls` does not list dotfiles, so a file named `.env` will not show up. Check with either of:

```
ls -a                    # lists .env alongside . and ..
test -f .env && echo ok  # prints ok if the file exists
```

Create it when it is missing

From the repo root on the server:

```
cp .env.example .env
nano .env        # or vim / your editor: set PUBLIC_HOST, secrets, Postgres identifiers (see section 3)
chmod 600 .env   # optional: restrict reads to your user/root
```

./scripts/deploy-prod.sh refuses to run if .env is absent. If you start Compose by hand without a .env file, ${POSTGRES_*} interpolates empty and Postgres health checks / connections can misbehave — always keep a populated .env next to the compose file.
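As a rough illustration of catching empty interpolation before Compose runs, the sketch below lists variables that are missing or empty in an env file. The function name `missing_env_vars` is hypothetical and this is not the project's own script; it assumes simple `NAME=value` lines and treats quoted empty values (`NAME=""`) as non-empty.

```shell
# Sketch: print each requested variable that is absent or empty in an env file.
# Assumes plain NAME=value lines; quotes are taken literally.
missing_env_vars() {
  env_file=$1; shift
  for name in "$@"; do
    # Use the last assignment of NAME=...; commented lines do not match.
    value=$(grep -E "^${name}=" "$env_file" | tail -n 1 | cut -d= -f2-)
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Example call (variable names taken from this guide):
#   missing_env_vars .env PUBLIC_HOST POSTGRES_USER POSTGRES_DB POSTGRES_PASSWORD FE_PORT
```

An empty result means every listed variable has a value; any printed name would interpolate as an empty string in `docker-compose.prod.yml`.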

2. Run validation before compose

Always fix script failures before restarting containers.

```
./scripts/verify-prod-env.sh
```

verify-prod-env.sh rejects:

- Empty `PUBLIC_HOST`, ports, MinIO or Postgres variables.
- `POSTGRES_USER` / `POSTGRES_DB` that are not plain SQL identifiers (letters, digits, underscore only; no `!`, spaces, unicode).
- `POSTGRES_PASSWORD` containing `@`, `:`, `/`, or `%`, which breaks `INITIATIVE_DATABASE_URL` in Compose (assembled without URL-encoding).
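The identifier and password rules above can be sketched as two small shell predicates. This is an illustration of the stated rules only, not the contents of `verify-prod-env.sh`; the function names are hypothetical.

```shell
# Sketch: accept only letters, digits and underscore (non-empty).
# Runs in a subshell so LC_ALL=C keeps the A-Z ranges ASCII-only.
is_plain_identifier() (
  LC_ALL=C
  case $1 in
    ''|*[!A-Za-z0-9_]*) exit 1 ;;
  esac
  exit 0
)

# Sketch: reject characters that are structural in a URL and would
# corrupt a connection string assembled without URL-encoding.
password_is_url_safe() {
  case $1 in
    *[@:/%]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, a password such as `p@ss` placed unencoded into a `user:password@host` URL leaves two `@` characters, so a parser can no longer tell where the password ends and the host begins.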

If deploy-prod.sh exits early, rerun verify-prod-env.sh and edit .env until it prints OK.

3. Postgres — `FATAL: role "<name>" does not exist`

Why it happens

The official Postgres image creates POSTGRES_USER and POSTGRES_DB only when the data directory is empty (first start of the named volume). After that, changing .env does not rename or recreate roles inside the volume.
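The official image's entrypoint treats a data directory as already initialized when it contains a non-empty `PG_VERSION` file. The sketch below mirrors that test so you can tell whether changing `.env` can still have any effect; the function name is hypothetical, and the commented usage assumes the Compose service is called `postgres`, as elsewhere in this guide.

```shell
# Sketch: succeed if the given data directory has already been initialized.
data_dir_initialized() {
  [ -s "$1/PG_VERSION" ]
}

# Possible use against the running service (PGDATA is set by the official image):
#   docker compose --env-file .env -f docker-compose.prod.yml exec postgres \
#     sh -c 'test -s "$PGDATA/PG_VERSION" && echo "initialized: .env changes will not recreate roles"'
```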

Typical triggers:

| Situation | Result |
| --- | --- |
| Volume was initialized with `POSTGRES_USER=initiative`; `.env` now uses a different username | The existing cluster has role `initiative`, not your new name. |
| Username with special characters (`user_pkhcn2025!`) | Some setups never created the role cleanly. Use plain identifiers (see the validation rules in section 2). |

Fix (pick one track)

A. Keep existing data — align .env with the roles that already exist

1. Discover the logical volume name Compose uses:
   ```
   docker compose --env-file .env -f docker-compose.prod.yml down
   docker volume ls | grep initiative_pg_data
   ```
   The name looks like `<project>_initiative_pg_data` (Compose names the volume from your project directory).
2. Start only Postgres temporarily, with a `.env` that matches credentials you know worked on first bootstrap (often your dev values from `docker-compose.yml`: user `initiative`, DB `initiatives`):
   ```
   docker compose --env-file .env -f docker-compose.prod.yml up -d postgres
   ```
3. List roles inside the cluster (substitute `-U`/`-d`/`PGPASSWORD` to match credentials that succeed):
   ```
   docker compose --env-file .env -f docker-compose.prod.yml exec postgres \
     psql -U initiative -d initiatives -c '\du'
   ```
4. Set `POSTGRES_USER` / `POSTGRES_DB` / `POSTGRES_PASSWORD` in `.env` to match an existing role and database. Do not change only the username without aligning to an existing login.

Password-only mismatch: If the role and database names are already correct but someone changed POSTGRES_PASSWORD in .env after the volume was first created, run from the repo root (with postgres running):
```
./scripts/sync-postgres-app-password.sh
```
That executes ALTER ROLE … PASSWORD to match .env when psql inside the container can connect without the old password (typical with the official image’s local socket rules). If it fails, use the psql steps above with credentials that still work, or re-init the volume (B). Optional: POSTGRES_SUPERUSER in .env if you must connect as another superuser (e.g. postgres).
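If you need to issue the statement by hand, the main pitfall is quoting. The sketch below builds an `ALTER ROLE … PASSWORD` statement with the role as a quoted identifier and the password as a quoted literal. It is an assumption-laden illustration, not the contents of `sync-postgres-app-password.sh`; the function name is hypothetical.

```shell
# Sketch: emit an ALTER ROLE statement with SQL-safe quoting.
alter_role_password_sql() {
  role=$1 password=$2
  # Double embedded double quotes in the identifier and single quotes in the literal.
  role_q=$(printf '%s' "$role" | sed 's/"/""/g')
  pass_q=$(printf '%s' "$password" | sed "s/'/''/g")
  printf 'ALTER ROLE "%s" PASSWORD '\''%s'\'';\n' "$role_q" "$pass_q"
}

# Possible use, piping the statement to psql inside the container
# (credentials as in track A; adjust to ones that still work):
#   alter_role_password_sql initiative 'new-password' |
#     docker compose --env-file .env -f docker-compose.prod.yml exec -T postgres \
#       psql -U initiative -d initiatives
```

Piping the statement on stdin keeps the password out of the process argument list on the host.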

B. You can afford to lose Postgres data — re-init the volume

1. Stop the stack and remove the volume (this deletes all DB data):
   ```
   docker compose --env-file .env -f docker-compose.prod.yml down
   docker volume rm <project>_initiative_pg_data   # exact name from `docker volume ls`
   ```
2. Ensure `./scripts/verify-prod-env.sh` passes.
3. Bring the stack up fresh so scripts in `docker-entrypoint-initdb.d/` run:
   ```
   ./scripts/deploy-prod.sh
   ```

C. Rename or add roles without wiping data (advanced)

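Connect as your currently working database superuser, then do one of the following:

- Rename the role: `ALTER ROLE initiative RENAME TO new_name;`. If the role's password is stored as MD5, PostgreSQL clears it on rename, so set the password again afterwards.
- Create a parallel role and password with matching grants, if your app expects a dedicated user only.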

Operational details vary with your retention and backup policy; involve your DBA playbook if applicable.

4. Frontend (`fe0`) — port mismatch (host cannot reach UI)

Compose maps `${FE_PORT}:8080`, so the dev server inside fe0 must listen on port 8080.

Vite defaults to 5173 if nothing overrides it. Before the port was pinned, that left the mapped host port forwarding to nothing or to the wrong listener.

Required state

The Vite dev server must be configured with:

- `server.port: 8080` in `vite.config.ts`
- host `0.0.0.0` and port `8080` on the command line (Compose/Dockerfile pass `npm run dev -- --host 0.0.0.0 --port 8080`, so bind-mounted trees without an updated `vite.config.ts` still match `${FE_PORT}:8080`)

If logs show:

```
Local: http://localhost:5173/
```

fix vite.config.ts so the dev server uses 8080, then recreate or restart fe0.

After that, browsers use:

```
http://${PUBLIC_HOST}:${FE_PORT}
```
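To check the listening port without reading the logs by eye, a small filter can pull the port out of Vite's `Local:` line. This is a sketch with a hypothetical function name; it assumes the log line is plain text without ANSI colour codes and that the Compose service is named `fe0`.

```shell
# Sketch: read log text on stdin and print the port from the first "Local:" line.
vite_local_port() {
  sed -n 's|.*Local: *http://[^:/]*:\([0-9][0-9]*\).*|\1|p' | head -n 1
}

# Possible use:
#   port=$(docker compose --env-file .env -f docker-compose.prod.yml logs fe0 | vite_local_port)
#   [ "$port" = 8080 ] || echo "fe0 listens on ${port:-unknown}, expected 8080"
```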

5. Different IPs in logs (`fe0` vs MinIO)

This is usually correct, not contradictory:

| Log line | Meaning |
| --- | --- |
| `fe0` “Network”: `http://10.5.0.x:…` | Static container IP on Compose bridge `profyt-net` (`docker-compose.prod.yml` `ipv4_address`). |
| MinIO banner: `http://<PUBLIC_HOST>:19000` | Public/browser URL, from `MINIO_SERVER_URL` using `PUBLIC_HOST` and `MINIO_API_PORT`. |

be0 still talks to MinIO as http://minio:9000 internally; browsers use ${PUBLIC_HOST} unless you override presign with S3_PUBLIC_ENDPOINT_URL.

When the UI is https://, embedding plain http://…:${MINIO_API_PORT} presigned URLs is blocked (mixed content). In-app PDF preview can use GET …/evidence/content; for direct presigned links in the browser, terminate TLS on the MinIO API host and set S3_PUBLIC_ENDPOINT_URL / MINIO_SERVER_URL to that https://… base — see minio-behind-https.md and deploy/nginx/minio-s3-proxy.conf.example.
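The mixed-content condition is simple enough to test mechanically: an `https://` page loading an `http://` resource. The sketch below encodes only that rule; the function name and the example URLs are hypothetical.

```shell
# Sketch: succeed when an https page would load a plain-http resource.
is_mixed_content() {
  ui_url=$1 resource_url=$2
  case $ui_url in
    https://*)
      case $resource_url in
        http://*) return 0 ;;
      esac ;;
  esac
  return 1
}

# Possible use with the public UI URL and a presigned link from the app:
#   is_mixed_content "https://app.example.org" "$presigned_url" &&
#     echo "browser will block this link; set S3_PUBLIC_ENDPOINT_URL to an https base"
```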

6. Operational checklist after changes

```
./scripts/verify-prod-env.sh
docker compose --env-file .env -f docker-compose.prod.yml config >/dev/null
./scripts/deploy-prod.sh           # or: up without -d for foreground logs
docker compose --env-file .env -f docker-compose.prod.yml ps
```

For Postgres persistence issues, skim section 3 before editing .env again.

7. Postgres — `relation "audit_events" does not exist`

Why it happens

docker-entrypoint-initdb.d on the Postgres image runs only when the data volume is empty. If the volume was created before 008_audit_events.sql existed in compose, that migration never ran. be0 then fails when it tries to write audit rows.

Fix

After pulling a current be0 image / repo: restart be0. On startup, scripts/apply_initiative_migrations.py applies 008_audit_events.sql automatically if public.audit_events is missing (same pattern as migration 009).

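Or apply it by hand from the repo root on the server. `${POSTGRES_USER}` and `${POSTGRES_DB}` are expanded by your host shell, not read from `.env`, so export them first or substitute the literal values that match `.env`:

```
docker compose --env-file .env -f docker-compose.prod.yml exec -T postgres \
  psql -U "${POSTGRES_USER}" -d "${POSTGRES_DB}" \
  < be0/migrations/008_audit_events.sql
```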
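Before applying the migration by hand, you can ask Postgres whether the relation is already there. The sketch below uses `to_regclass`, which returns NULL for a missing relation; the variable and function names are hypothetical, and this is not how `apply_initiative_migrations.py` is known to perform its check.

```shell
# Sketch: query that yields "t" when public.audit_events exists, "f" otherwise.
audit_table_query="SELECT to_regclass('public.audit_events') IS NOT NULL;"

# Interpret the output of psql -tA for the query above.
migration_needed() {
  [ "$1" != "t" ]
}

# Possible use (export POSTGRES_USER / POSTGRES_DB in your shell first):
#   result=$(docker compose --env-file .env -f docker-compose.prod.yml exec -T postgres \
#     psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -tAc "$audit_table_query")
#   migration_needed "$result" && echo "008_audit_events.sql has not been applied"
```

An empty result (for example when the connection fails) is treated as "needed", so check the psql error output before acting on it.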

8. Large uploads — `413 Request Entity Too Large` (evidence PDF, etc.)

The app allows evidence up to 50 MB end-to-end, but an HTTPS reverse proxy (nginx in front of www.rcc-ump.com) uses the nginx default of `client_max_body_size 1m` unless configured otherwise, which rejects a multi-megabyte PDF before Docker sees the request. The browser console may show nginx's HTML error page, including its padding comment that mentions a “friendly error page”.

Fix (nginx)

In the server { } (or the location that proxies to your fe0 port), set at least:

```
client_max_body_size 64m;
```

Reload nginx after editing. If uploads are slow, you may also need longer timeouts on the same location:

```
proxy_read_timeout 300s;
proxy_connect_timeout 300s;   # nginx documents that this usually cannot exceed 75 seconds
proxy_send_timeout 300s;
```

If Cloudflare (or another CDN) sits in front of the origin, confirm it does not impose a smaller upload limit than nginx.

Note: Browsers hit fe0 (Vite proxy /api → be0). The body limit must allow the full multipart upload on the first hop (usually nginx → origin), not only inside Docker.
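To sanity-check a configured limit against the 50 MB application maximum, the arithmetic can be sketched as below. The function names are hypothetical; only the `k` and `m` suffixes are handled, and `0` is treated as "no limit", as nginx does for `client_max_body_size`.

```shell
# Sketch: convert an nginx size such as 1m or 512k to bytes.
nginx_size_to_bytes() {
  case $1 in
    *[kK]) echo $(( ${1%?} * 1024 )) ;;
    *[mM]) echo $(( ${1%?} * 1024 * 1024 )) ;;
    *)     echo "$1" ;;
  esac
}

# Sketch: succeed if a body of $2 bytes fits under the limit $1.
limit_allows_upload() {
  [ "$1" = 0 ] && return 0   # 0 disables the body size check
  [ "$(nginx_size_to_bytes "$1")" -ge "$2" ]
}

# 50 MB = 52428800 bytes: the default 1m limit rejects it, 64m accepts it.
#   limit_allows_upload 1m 52428800 || echo "raise client_max_body_size"
```

A multipart request is slightly larger than the file itself, which is one reason to leave headroom above 50 MB rather than setting the limit to exactly that value.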
