sciagent code + Gitea Actions CI/CD

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Thinh Lam
2026-06-30 09:38:30 +07:00
commit 688fac73e9
1167 changed files with 158244 additions and 0 deletions
# Production Docker deployment (`docker-compose.prod.yml`)
This guide walks through **common failures** when running the prod-style stack locally or on a VPS, in a fixed order: validate environment, reconcile Postgres credentials with the Docker volume, then confirm frontend wiring.
**Stack topology (frontend → backend → DB → MinIO):** [deploy-stack-overview.md](./deploy-stack-overview.md)
Related files: `.env.example` (copy to `.env`), `scripts/deploy-prod.sh`, `scripts/verify-prod-env.sh`.
---
## 1. `.env` in the repo root (cloud / VPS)
Docker Compose substitutes `${PUBLIC_HOST}`, `${POSTGRES_USER}`, etc. from a file named `.env` in the **same directory** as `docker-compose.prod.yml` (or from `--env-file` when you use the deploy script).
### It may already be there: plain `ls` hides it
Plain `ls` does **not** list dotfiles, so a file named `.env` will **not** show up. Use `ls -a`, or test for the file directly:
```bash
ls -a # lists .env alongside . ..
test -f .env && echo ok # exits 0 if the file exists
```
### Create it when it is missing
From the repo root on the server:
```bash
cp .env.example .env
nano .env # or vim / your editor: set PUBLIC_HOST, secrets, Postgres identifiers (see section 3)
chmod 600 .env # optional: restrict reads to your user/root
```
`./scripts/deploy-prod.sh` refuses to run if `.env` is absent. If you start Compose by hand **without** a `.env` file, the `${POSTGRES_*}` variables interpolate to empty strings, and Postgres health checks and connections can misbehave. Always keep a populated `.env` next to the compose file.
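A quick way to catch empty interpolation before starting Compose by hand is to scan `.env` for required keys. The sketch below is a hypothetical helper, not part of the repo: the name `missing_env_keys` and the key list in the usage line are assumptions, and `./scripts/verify-prod-env.sh` remains the authoritative check.

```bash
# Hypothetical helper (not part of the repo): print every key that is missing
# or has an empty value in an env file. It only understands plain KEY=value
# lines; quoting and "export KEY=..." forms are not handled.
missing_env_keys() {
  env_file=$1
  shift
  for key in "$@"; do
    # If a key appears more than once, only its last line is checked.
    value=$(sed -n "s/^${key}=//p" "$env_file" | tail -n 1)
    [ -n "$value" ] || printf '%s\n' "$key"
  done
}
```

Example: `missing_env_keys .env PUBLIC_HOST POSTGRES_USER POSTGRES_DB POSTGRES_PASSWORD` prints nothing when all four keys are populated.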
---
## 2. Run validation before compose
Always fix script failures before restarting containers.
```bash
./scripts/verify-prod-env.sh
```
`verify-prod-env.sh` rejects:
- Empty `PUBLIC_HOST`, ports, MinIO or Postgres variables.
- `POSTGRES_USER` / `POSTGRES_DB` that are not plain SQL identifiers (letters, digits, underscore only — no `!`, spaces, unicode).
- `POSTGRES_PASSWORD` containing `@`, `:`, `/`, or `%`, which breaks `INITIATIVE_DATABASE_URL` in Compose (assembled without URL-encoding).
If `deploy-prod.sh` exits early, rerun `verify-prod-env.sh` and edit `.env` until it prints `OK`.
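The identifier and password rules above can be expressed as two small shell tests. This is an illustrative sketch only, not the contents of `scripts/verify-prod-env.sh`; the function names are assumptions. The password rule exists because a value such as `p@ss` placed unencoded into a database URL makes it ambiguous where the password ends and the host begins.

```bash
# Illustrative only: these functions mirror the rules listed above; they are
# not copied from scripts/verify-prod-env.sh.

# Succeeds when the name is non-empty and has only letters, digits, underscores.
is_plain_identifier() {
  case $1 in
    ''|*[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Succeeds when the password contains none of the characters @ : / %
password_is_url_safe() {
  case $1 in
    *[@:/%]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `is_plain_identifier 'user_pkhcn2025!'` fails because of the `!`, while `is_plain_identifier initiative` succeeds.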
---
## 3. Postgres — `FATAL: role "<name>" does not exist`
### Why it happens
The official Postgres image **creates `POSTGRES_USER` and `POSTGRES_DB` only when the data directory is empty** (first start of the named volume). After that, changing `.env` does **not** rename or recreate roles inside the volume.
Typical triggers:
| Situation | Result |
|-----------|--------|
| Volume was initialized with `POSTGRES_USER=initiative`; `.env` now uses a different username | Existing DB has role `initiative`, not your new name. |
| Username with special characters (`user_pkhcn2025!`) | Some setups never created the role cleanly. Use plain identifiers (see the validation rules in section 2). |
### Fix (pick one track)
**A. Keep existing data — align `.env` with the roles that already exist**
1. Discover the logical volume name Compose uses:
```bash
docker compose --env-file .env -f docker-compose.prod.yml down
docker volume ls | grep initiative_pg_data
```
The name looks like `<project>_initiative_pg_data` (Compose prefixes the volume with the project name, which defaults to the name of the directory containing the compose file).
2. Start only Postgres temporarily, using a `.env` that matches credentials you **know** worked on first bootstrap (often your dev values from `docker-compose.yml`: user `initiative`, DB `initiatives`):
```bash
docker compose --env-file .env -f docker-compose.prod.yml up -d postgres
```
3. List roles inside the cluster (substitute `-U`/`-d`/`PGPASSWORD` to match credentials that succeed):
```bash
docker compose --env-file .env -f docker-compose.prod.yml exec postgres \
psql -U initiative -d initiatives -c '\du'
```
Set `POSTGRES_USER` / `POSTGRES_DB` / `POSTGRES_PASSWORD` in `.env` to match an existing role and database. Do **not** change only the username without aligning to an existing login.
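For step 1 above, the expected volume name can be guessed from the project name. The helper below is a hypothetical sketch: it assumes the volume key is `initiative_pg_data` with no explicit `name:` override, and it only approximates how Compose normalizes a directory name (lowercase, keeping `a-z`, `0-9`, `_`, `-`).

```bash
# Hypothetical helper: guess the full volume name Compose created.
# COMPOSE_PROJECT_NAME wins when set; otherwise the current directory name is
# used, so run this from the directory that holds docker-compose.prod.yml.
expected_pg_volume() {
  project=${COMPOSE_PROJECT_NAME:-$(basename "$PWD")}
  # Approximate Compose's normalization of the project name.
  project=$(printf '%s' "$project" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cd 'a-z0-9_-' \
    | sed 's/^[_-]*//')
  printf '%s_initiative_pg_data\n' "$project"
}
```

Compare the output with `docker volume ls`. A top-level `name:` in the compose file or a `-p` flag changes the prefix, so treat the listing as the source of truth.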
**Password-only mismatch:** If the role and database names are already correct but someone changed `POSTGRES_PASSWORD` in `.env` after the volume was first created, run from the repo root (with `postgres` running):
```bash
./scripts/sync-postgres-app-password.sh
```
That executes `ALTER ROLE … PASSWORD` to match `.env` when `psql` inside the container can connect without the old password (typical with the official image's local socket rules). If it fails, use the `psql` steps above with credentials that still work, or re-init the volume (**B**). Optionally, set `POSTGRES_SUPERUSER` in `.env` if you must connect as another superuser (e.g. `postgres`).
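If you need to run the statement by hand, the main pitfall is quoting. The sketch below is hypothetical and is not the repo's `sync-postgres-app-password.sh`; it only shows one way to build the SQL so that quote characters in the role or password do not break it.

```bash
# Hypothetical sketch: print an ALTER ROLE statement with the role name
# double-quoted and the password single-quoted, doubling any embedded quotes.
# Note that a double-quoted role name is matched case-sensitively.
alter_role_password_sql() {
  role=$(printf '%s' "$1" | sed 's/"/""/g')
  password=$(printf '%s' "$2" | sed "s/'/''/g")
  printf 'ALTER ROLE "%s" PASSWORD '\''%s'\'';\n' "$role" "$password"
}
```

The output can be piped into `docker compose --env-file .env -f docker-compose.prod.yml exec -T postgres psql -U <working-user> -d <db>`, where the user and database are whichever credentials still connect.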
**B. You can afford to lose Postgres data — re-init the volume**
1. Stop the stack and remove the volume (this **deletes** all DB data):
```bash
docker compose --env-file .env -f docker-compose.prod.yml down
docker volume rm <project>_initiative_pg_data # exact name from `docker volume ls`
```
2. Ensure `./scripts/verify-prod-env.sh` passes.
3. Bring the stack up fresh so the scripts in `docker-entrypoint-initdb.d/` run:
```bash
./scripts/deploy-prod.sh
```
**C. Rename or add roles without wiping data (advanced)**
Connect as your **currently working** database superuser, then:
- `ALTER ROLE initiative RENAME TO new_name;`
- Create a parallel role/password with matching grants if your app expects a dedicated user only.
Operational details vary with your retention and backup policy; follow your DBA playbook if you have one.
---
## 4. Frontend (`fe0`) — port mismatch (host cannot reach UI)
Compose maps **`${FE_PORT}:8080`**: traffic to the container must hit **port 8080** inside `fe0`.
Vite defaults to **5173** if nothing overrides it. Previously that meant the mapped port forwarded to nothing or the wrong listener.
### Required state
[Vite](../fe0/vite.config.ts) must set:
- `server.port: 8080`
- `server.host: '0.0.0.0'` (Compose/Dockerfile also pass `npm run dev -- --host 0.0.0.0 --port 8080`, so bind-mounted trees without an updated `vite.config.ts` still match `${FE_PORT}:8080`)
If logs show:
```text
Local: http://localhost:5173/
```
fix `vite.config.ts` so the dev server uses **8080**, then recreate or restart `fe0`.
After that, browsers use:
```text
http://${PUBLIC_HOST}:${FE_PORT}
```
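To check which port the dev server actually bound, you can pull it out of the `fe0` startup banner. The function below is a hypothetical helper (its name is an assumption); it strips ANSI color codes first because the banner may be colorized.

```bash
# Hypothetical helper: read Vite's startup log on stdin and print the port
# from the first "Local:" line.
vite_local_port() {
  esc=$(printf '\033')
  # Remove color escape sequences, then capture the digits after host:
  sed "s/${esc}\[[0-9;]*m//g" \
    | sed -n 's|.*Local:[^:]*://[^:/]*:\([0-9][0-9]*\).*|\1|p' \
    | head -n 1
}
```

Usage: `docker compose --env-file .env -f docker-compose.prod.yml logs fe0 | vite_local_port`. Anything other than `8080` means the mapping `${FE_PORT}:8080` points at the wrong listener.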
---
## 5. Different IPs in logs (`fe0` vs MinIO)
This is usually **correct**, not contradictory:
| Log line | Meaning |
|----------|---------|
| `fe0` “Network”: `http://10.5.0.x:…` | **Static container IP** on Compose bridge `profyt-net` (`docker-compose.prod.yml` `ipv4_address`). |
| MinIO banner: `http://<PUBLIC_HOST>:19000` | **Public/browser URL**, from `MINIO_SERVER_URL` using `PUBLIC_HOST` and `MINIO_API_PORT`. |
`be0` still talks to MinIO as `http://minio:9000` internally; browsers use `${PUBLIC_HOST}` unless you override presign with **`S3_PUBLIC_ENDPOINT_URL`**.
When the UI is **`https://`**, embedding plain **`http://…:${MINIO_API_PORT}`** presigned URLs is blocked (**mixed content**). In-app PDF preview can use **`GET …/evidence/content`**; for direct presigned links in the browser, terminate TLS on the MinIO API host and set **`S3_PUBLIC_ENDPOINT_URL`** / **`MINIO_SERVER_URL`** to that **`https://…`** base — see **[minio-behind-https.md](./minio-behind-https.md)** and **`deploy/nginx/minio-s3-proxy.conf.example`**.
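The mixed-content rule comes down to comparing URL schemes. The sketch below is a hypothetical helper for sanity-checking your `.env` values before deploying; the function name and the example hostnames are assumptions.

```bash
# Hypothetical helper: succeed (exit 0) when an https:// page would embed an
# http:// resource, which browsers block as mixed content.
is_mixed_content() {
  page_url=$1
  resource_url=$2
  case $page_url in
    https://*)
      case $resource_url in
        http://*) return 0 ;;
      esac
      ;;
  esac
  return 1
}
```

For example, `is_mixed_content https://ui.example.test http://ui.example.test:19000` succeeds, signalling that `S3_PUBLIC_ENDPOINT_URL` should point at an `https://` base.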
---
## 6. Operational checklist after changes
```bash
./scripts/verify-prod-env.sh
docker compose --env-file .env -f docker-compose.prod.yml config >/dev/null
./scripts/deploy-prod.sh # or: up without -d for foreground logs
docker compose --env-file .env -f docker-compose.prod.yml ps
```
For Postgres persistence issues, skim **section 3** before editing `.env` again.
---
## 7. Postgres — `relation "audit_events" does not exist`
### Why it happens
`docker-entrypoint-initdb.d` on the Postgres image runs **only when the data volume is empty**. If the volume was created **before** `008_audit_events.sql` existed in compose, that migration never ran. **`be0`** then fails when it tries to write audit rows.
### Fix
**After pulling a current `be0` image / repo:** restart **`be0`**. On startup, `scripts/apply_initiative_migrations.py` applies **`008_audit_events.sql`** automatically if `public.audit_events` is missing (same pattern as migration 009).
Or apply by hand from the repo root on the server (adjust user/db to match `.env`):
```bash
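# ${POSTGRES_USER} and ${POSTGRES_DB} are expanded by your host shell, not read from .env:
# export them first or substitute the literal values.
docker compose --env-file .env -f docker-compose.prod.yml exec -T postgres \
  psql -U "${POSTGRES_USER}" -d "${POSTGRES_DB}" \
  < be0/migrations/008_audit_events.sql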
```
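To confirm whether the table exists before or after applying the migration, you can ask Postgres directly. The helper below is a hypothetical sketch that only builds the query; `to_regclass` returns NULL for a missing relation, so the result is `t` or `f`.

```bash
# Hypothetical helper: print a query that reports whether a relation exists.
# The argument is inserted as-is, so pass only trusted, schema-qualified names.
relation_exists_sql() {
  printf "SELECT to_regclass('%s') IS NOT NULL;\n" "$1"
}
```

Usage: `relation_exists_sql public.audit_events | docker compose --env-file .env -f docker-compose.prod.yml exec -T postgres psql -U <user> -d <db> -tA`, with the user and database taken from `.env`.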
---
## 8. Large uploads — `413 Request Entity Too Large` (evidence PDF, etc.)
The app allows evidence up to **50 MB** end-to-end, but **HTTPS reverse proxies** (for example nginx in front of `www.rcc-ump.com`) often default to **`client_max_body_size 1m`**, which rejects a **multi-megabyte** PDF **before** Docker sees the request. The browser console may show nginx's HTML error page, recognizable by its comment about a “friendly error page”.
### Fix (nginx)
In the `server { }` (or the `location` that proxies to your `fe0` port), set at least:
```nginx
client_max_body_size 64m;
```
Reload nginx after editing. If uploads are slow, you may also need longer timeouts on the same `location`:
```nginx
proxy_read_timeout 300s;
proxy_connect_timeout 300s;
proxy_send_timeout 300s;
```
If **Cloudflare** (or another CDN) sits in front of the origin, confirm it does not impose a smaller upload limit than nginx.
**Note:** Browsers hit **`fe0`** (Vite proxy `/api` → `be0`). The body limit must allow the full multipart upload on the **first** hop (usually nginx → origin), not only inside Docker.
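To see why `1m` fails and `64m` works for a 50 MB file, convert the nginx size to bytes and compare. The functions below are a hypothetical sketch (names are assumptions) that handles only the `k` and `m` suffixes.

```bash
# Hypothetical helper: convert an nginx size such as 64m or 512k to bytes.
nginx_size_to_bytes() {
  number=${1%[kKmM]}
  case $1 in
    *[kK]) echo $((number * 1024)) ;;
    *[mM]) echo $((number * 1024 * 1024)) ;;
    *) echo "$number" ;;
  esac
}

# Succeeds when the limit is strictly larger than an upload of the given size
# in MB. Strictly larger, because multipart encoding adds some overhead on top
# of the file itself.
limit_allows_upload() {
  [ "$(nginx_size_to_bytes "$1")" -gt $(( $2 * 1024 * 1024 )) ]
}
```

`limit_allows_upload 1m 50` fails while `limit_allows_upload 64m 50` succeeds, which matches the suggested setting above.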