
# Production Docker deployment (`docker-compose.prod.yml`)

This guide walks through **common failures** when running the prod-style stack locally or on a VPS, in a fixed order: validate environment, reconcile Postgres credentials with the Docker volume, then confirm frontend wiring.

**Stack topology (frontend → backend → DB → MinIO):** [deploy-stack-overview.md](./deploy-stack-overview.md)

Related files: `.env.example` (copy to `.env`), `scripts/deploy-prod.sh`, `scripts/verify-prod-env.sh`.

---

## 1. `.env` in the repo root (cloud / VPS)

Docker Compose substitutes `${PUBLIC_HOST}`, `${POSTGRES_USER}`, etc. from a file named `.env` in the **same directory** as `docker-compose.prod.yml` (or from `--env-file` when you use the deploy script).

### It may already be there: plain `ls` hides it

By default, `ls` does **not** list dotfiles, so a file named `.env` will **not** show up. Check with `ls -a`, or test for the file directly:

```bash
ls -a                    # lists .env alongside . ..
test -f .env && echo ok  # exits 0 if the file exists
```

### Create it when it is missing

From the repo root on the server:

```bash
cp .env.example .env
nano .env   # or vim / your editor: set PUBLIC_HOST, secrets, Postgres identifiers (see section 3 below)
chmod 600 .env   # optional: restrict access to the file owner
```

`./scripts/deploy-prod.sh` refuses to run if `.env` is absent. If you start Compose by hand **without** a `.env` file, every `${POSTGRES_*}` reference interpolates to an empty string, so Postgres health checks and connections can fail. Always keep a populated `.env` next to the compose file.
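
A quick way to spot empty or missing keys before starting Compose by hand is sketched below. It is a minimal illustration, not part of the repo: the helper names `env_value` and `require_env_keys` are hypothetical, and the parsing assumes simple unquoted `KEY=value` lines. `verify-prod-env.sh` remains the authoritative check.

```bash
# Hypothetical helpers; they assume plain KEY=value lines without quotes or "export".

# Prints the value of key $1 from env file $2 (defaults to .env).
# If the key appears more than once, this sketch reports the last assignment.
env_value() {
  sed -n "s/^$1=//p" "${2:-.env}" | tail -n 1
}

# Reports every listed key that is missing or empty in file $1.
# Returns non-zero if at least one key is missing or empty.
require_env_keys() {
  file=$1; shift
  missing=0
  for key in "$@"; do
    if [ -z "$(env_value "$key" "$file")" ]; then
      echo "missing or empty: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   require_env_keys .env PUBLIC_HOST POSTGRES_USER POSTGRES_DB POSTGRES_PASSWORD
```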

---

## 2. Run validation before compose

Run the validation script first, and fix every failure it reports before restarting containers.

```bash
./scripts/verify-prod-env.sh
```

`verify-prod-env.sh` rejects:

- Empty `PUBLIC_HOST`, ports, MinIO or Postgres variables.
- `POSTGRES_USER` / `POSTGRES_DB` that are not plain SQL identifiers (letters, digits, underscore only — no `!`, spaces, unicode).
- `POSTGRES_PASSWORD` containing `@`, `:`, `/`, or `%`. Compose assembles `INITIATIVE_DATABASE_URL` without URL-encoding, so any of these characters breaks the URL.

If `deploy-prod.sh` exits early, rerun `verify-prod-env.sh` and edit `.env` until it prints `OK`.
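
The identifier and password rules above can be sketched as two small shell predicates. This is an illustration of the rules as described here, not the contents of `verify-prod-env.sh`; the function names `is_plain_identifier` and `password_is_url_safe` are hypothetical.

```bash
# Hypothetical helpers illustrating the rules listed above; not the real script.
export LC_ALL=C   # make the A-Z / a-z ranges plain ASCII, so unicode letters are rejected

# Succeeds when $1 is non-empty and contains only letters, digits and underscore.
is_plain_identifier() {
  case "$1" in
    ''|*[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Succeeds when $1 contains none of @ : / % (characters that would need URL-encoding).
password_is_url_safe() {
  case "$1" in
    *[@:/%]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example:
#   is_plain_identifier "$POSTGRES_USER" || echo "POSTGRES_USER is not a plain identifier"
#   password_is_url_safe "$POSTGRES_PASSWORD" || echo "POSTGRES_PASSWORD would break the URL"
```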

---

## 3. Postgres — `FATAL: role "<name>" does not exist`

### Why it happens

The official Postgres image **creates `POSTGRES_USER` and `POSTGRES_DB` only when the data directory is empty** (first start of the named volume). After that, changing `.env` does **not** rename or recreate roles inside the volume.

Typical triggers:

| Situation | Result |
|-----------|--------|
| Volume was initialized with `POSTGRES_USER=initiative`; `.env` now uses a different username | Existing DB has role `initiative`, not your new name. |
| Username contains special characters (e.g. `user_pkhcn2025!`) | The role may never have been created cleanly. Use plain identifiers (see section 2). |

### Fix (pick one track)

**A. Keep existing data — align `.env` with the roles that already exist**

1. Discover the logical volume name Compose uses:

   ```bash
   docker compose --env-file .env -f docker-compose.prod.yml down
   docker volume ls | grep initiative_pg_data
   ```

   The name looks like `<project>_initiative_pg_data`. Compose prefixes the volume with the project name, which defaults to the name of your project directory.

2. Start only Postgres temporarily with `.env` that matches credentials you **know** worked on first bootstrap (often your dev values from `docker-compose.yml`: user `initiative`, DB `initiatives`):

   ```bash
   docker compose --env-file .env -f docker-compose.prod.yml up -d postgres
   ```

3. List roles inside the cluster (substitute `-U`/`-d`/`PGPASSWORD` to match credentials that succeed):

   ```bash
   docker compose --env-file .env -f docker-compose.prod.yml exec postgres \
     psql -U initiative -d initiatives -c '\du'
   ```

   Set `POSTGRES_USER` / `POSTGRES_DB` / `POSTGRES_PASSWORD` in `.env` to match an existing role and database. Do **not** change only the username without aligning to an existing login.

   **Password-only mismatch:** If the role and database names are already correct but someone changed `POSTGRES_PASSWORD` in `.env` after the volume was first created, run from the repo root (with `postgres` running):

   ```bash
   ./scripts/sync-postgres-app-password.sh
   ```

   That executes `ALTER ROLE … PASSWORD` to match `.env` when `psql` inside the container can connect without the old password (typical with the official image’s local socket rules). If it fails, use the `psql` steps above with credentials that still work, or re-init the volume (**B**). Optional: `POSTGRES_SUPERUSER` in `.env` if you must connect as another superuser (e.g. `postgres`).
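
For orientation, the kind of statement a password sync needs is sketched below. This is an assumption about the general shape, not the contents of `sync-postgres-app-password.sh`; the helper name `alter_role_password_sql` is hypothetical, and the role and database names in the usage comment are the dev values mentioned above.

```bash
# Hypothetical helper: prints an ALTER ROLE statement for role $1 with password $2.
# Single quotes in the password are doubled, which is the standard SQL string escape.
alter_role_password_sql() {
  escaped=$(printf '%s' "$2" | sed "s/'/''/g")
  printf "ALTER ROLE \"%s\" PASSWORD '%s';\n" "$1" "$escaped"
}

# Example (pipe the statement into psql inside the running postgres service):
#   alter_role_password_sql initiative 'new-secret' |
#     docker compose --env-file .env -f docker-compose.prod.yml exec -T postgres \
#       psql -U initiative -d initiatives
```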

**B. You can afford to lose Postgres data — re-init the volume**

1. Stop stack; remove volume (this **deletes** all DB data):

   ```bash
   docker compose --env-file .env -f docker-compose.prod.yml down
   docker volume rm <project>_initiative_pg_data   # exact name from `docker volume ls`
   ```

2. Ensure `./scripts/verify-prod-env.sh` passes.

3. Bring stack up fresh so scripts in `docker-entrypoint-initdb.d/` run:

   ```bash
   ./scripts/deploy-prod.sh
   ```

**C. Rename or add roles without wiping data (advanced)**

Connect as your **currently working** database superuser, then:

- `ALTER ROLE initiative RENAME TO new_name;`
- Create a parallel role/password with matching grants if your app expects a dedicated user only.

Operational details vary with your retention and backup policy; involve your DBA playbook if applicable.
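
Two Postgres behaviours are worth knowing before a rename: the current session user cannot be renamed, and renaming a role whose password is MD5-encrypted clears that password, so set it again afterwards. For the parallel-role option, one possible shape is sketched below. The helper name `parallel_role_sql`, the `change-me` placeholder password, and the choice to grant membership in the existing role are all assumptions; adapt the grants to what your app actually needs.

```bash
# Hypothetical helper: prints SQL that creates a second login role.
# $1 = new role, $2 = database, $3 = existing role whose privileges the new role inherits.
parallel_role_sql() {
  cat <<SQL
CREATE ROLE "$1" LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE "$2" TO "$1";
GRANT "$3" TO "$1";
SQL
}

# Example (review the output before piping it into psql as a superuser):
#   parallel_role_sql app_user initiatives initiative
```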

---

## 4. Frontend (`fe0`) — port mismatch (host cannot reach UI)

Compose maps **`${FE_PORT}:8080`**: traffic to the container must hit **port 8080** inside `fe0`.

Vite defaults to **5173** if nothing overrides it. Before the port was pinned, that meant the mapped port forwarded to nothing, or to the wrong listener.

### Required state

[Vite](../fe0/vite.config.ts) must set:

- `server.port: 8080`
- `server.host: '0.0.0.0'`

Compose and the Dockerfile also pass `npm run dev -- --host 0.0.0.0 --port 8080`, so bind-mounted trees without an updated `vite.config.ts` still match `${FE_PORT}:8080`.

If logs show:

```text
Local: http://localhost:5173/
```

fix `vite.config.ts` so the dev server uses **8080**, then recreate or restart `fe0`.

After that, browsers use:

```text
http://${PUBLIC_HOST}:${FE_PORT}
```
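
The log check described above can be scripted. The sketch below is a minimal illustration that assumes plain-text logs (no ANSI colour codes around the port); the helper names `vite_local_port` and `vite_port_ok` are hypothetical.

```bash
# Hypothetical helpers; they read log text on stdin.

# Prints the port from the first Vite "Local:" line, e.g. 5173 or 8080.
vite_local_port() {
  sed -n 's|.*Local: *http://[^:/]*:\([0-9][0-9]*\).*|\1|p' | head -n 1
}

# Succeeds when the logged port is 8080, the container port Compose maps.
vite_port_ok() {
  [ "$(vite_local_port)" = "8080" ]
}

# Example:
#   docker compose --env-file .env -f docker-compose.prod.yml logs fe0 \
#     | vite_port_ok || echo "fe0 is not listening on 8080"
```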

---

## 5. Different IPs in logs (`fe0` vs MinIO)

This is usually **correct**, not contradictory:

| Log line | Meaning |
|----------|---------|
| `fe0` “Network”: `http://10.5.0.x:…` | **Static container IP** on Compose bridge `profyt-net` (`docker-compose.prod.yml` `ipv4_address`). |
| MinIO banner: `http://<PUBLIC_HOST>:19000` | **Public/browser URL**, from `MINIO_SERVER_URL` using `PUBLIC_HOST` and `MINIO_API_PORT`. |

`be0` still talks to MinIO as `http://minio:9000` internally; browsers use `${PUBLIC_HOST}` unless you override the presigned-URL base with **`S3_PUBLIC_ENDPOINT_URL`**.

When the UI is **`https://`**, embedding plain **`http://…:${MINIO_API_PORT}`** presigned URLs is blocked (**mixed content**). In-app PDF preview can use **`GET …/evidence/content`**; for direct presigned links in the browser, terminate TLS on the MinIO API host and set **`S3_PUBLIC_ENDPOINT_URL`** / **`MINIO_SERVER_URL`** to that **`https://…`** base — see **[minio-behind-https.md](./minio-behind-https.md)** and **`deploy/nginx/minio-s3-proxy.conf.example`**.
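
The mixed-content condition comes down to a scheme comparison, which the sketch below spells out. The helper names `url_scheme` and `is_mixed_content` are hypothetical, and the check only looks at URL schemes; it does not reproduce every browser rule.

```bash
# Hypothetical helpers illustrating the scheme comparison.

# Prints the scheme of a URL (the text before "://"), or nothing if there is none.
url_scheme() {
  case "$1" in
    *://*) printf '%s\n' "${1%%://*}" ;;
  esac
}

# Succeeds when an https page would embed a plain-http resource.
# $1 = page URL, $2 = resource URL (for example a presigned MinIO link).
is_mixed_content() {
  [ "$(url_scheme "$1")" = "https" ] && [ "$(url_scheme "$2")" = "http" ]
}

# Example:
#   is_mixed_content "https://$PUBLIC_HOST" "http://$PUBLIC_HOST:$MINIO_API_PORT/bucket/file.pdf" \
#     && echo "browser will block this presigned URL"
```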

---

## 6. Operational checklist after changes

```bash
./scripts/verify-prod-env.sh
docker compose --env-file .env -f docker-compose.prod.yml config >/dev/null
./scripts/deploy-prod.sh           # or: up without -d for foreground logs
docker compose --env-file .env -f docker-compose.prod.yml ps
```

For Postgres persistence issues, skim **section 3** before editing `.env` again.

---

## 7. Postgres — `relation "audit_events" does not exist`

### Why it happens

`docker-entrypoint-initdb.d` on the Postgres image runs **only when the data volume is empty**. If the volume was created **before** `008_audit_events.sql` existed in compose, that migration never ran. **`be0`** then fails when it tries to write audit rows.

### Fix

**After pulling a current `be0` image / repo:** restart **`be0`**. On startup, `scripts/apply_initiative_migrations.py` applies **`008_audit_events.sql`** automatically if `public.audit_events` is missing (same pattern as migration 009).

Or apply by hand from the repo root on the server (adjust user/db to match `.env`):

```bash
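# ${POSTGRES_USER} and ${POSTGRES_DB} are expanded by your host shell, not read from .env.
# Export them first (or type the values literally); otherwise psql receives empty arguments.
docker compose --env-file .env -f docker-compose.prod.yml exec -T postgres \
  psql -U "${POSTGRES_USER}" -d "${POSTGRES_DB}" \
  < be0/migrations/008_audit_events.sql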
```
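
To confirm whether the migration is still needed, or that it took effect, one option is to ask Postgres whether the table exists. The sketch below assumes the Compose service is named `postgres` as in the commands above and that `POSTGRES_USER` and `POSTGRES_DB` are set in your shell; the function name `audit_events_exists` is hypothetical.

```bash
# Hypothetical helper: succeeds when public.audit_events exists.
# psql -tA prints a bare "t" or "f" for the boolean result.
audit_events_exists() {
  result=$(docker compose --env-file .env -f docker-compose.prod.yml exec -T postgres \
    psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -tAc \
    "SELECT to_regclass('public.audit_events') IS NOT NULL")
  [ "$result" = "t" ]
}

# Example:
#   audit_events_exists || echo "audit_events is missing; apply 008_audit_events.sql"
```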

---

## 8. Large uploads — `413 Request Entity Too Large` (evidence PDF, etc.)

The app allows evidence up to **50 MB** end-to-end, but **HTTPS reverse proxies** (for example nginx in front of `www.rcc-ump.com`) often run with the default **`client_max_body_size 1m`**, which rejects a **multi-megabyte** PDF **before** Docker sees the request. The browser console may show nginx's HTML error page, recognizable by its comment about a “friendly error page”.

### Fix (nginx)

In the `server { }` (or the `location` that proxies to your `fe0` port), set at least:

```nginx
client_max_body_size 64m;
```

Reload nginx after editing. If uploads are slow, you may also need longer timeouts on the same `location`:

```nginx
proxy_read_timeout 300s;
proxy_connect_timeout 300s;
proxy_send_timeout 300s;
```

If **Cloudflare** (or another CDN) sits in front of the origin, confirm it does not impose a smaller upload limit than nginx.

**Note:** Browsers hit **`fe0`** (Vite proxy `/api` → `be0`). The body limit must allow the full multipart upload on the **first** hop (usually nginx → origin), not only inside Docker.
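
To sanity-check a limit against a file size, the arithmetic can be sketched as follows. The helper names `nginx_size_to_bytes` and `fits_body_limit` are hypothetical, and the sketch ignores the special value `0`, which disables the check in nginx. A multipart request is slightly larger than the file itself, which is one reason to leave headroom (64m for a 50 MB cap).

```bash
# Hypothetical helpers for comparing an upload size with an nginx size value.

# Converts a size such as 1m, 64m or 512k to bytes (k/m/g suffixes, either case).
nginx_size_to_bytes() {
  num=${1%[kKmMgG]}
  case "$1" in
    *[kK]) echo $(( $num * 1024 )) ;;
    *[mM]) echo $(( $num * 1024 * 1024 )) ;;
    *[gG]) echo $(( $num * 1024 * 1024 * 1024 )) ;;
    *)     echo "$num" ;;
  esac
}

# Succeeds when a body of $1 bytes fits within the nginx limit $2.
fits_body_limit() {
  [ "$1" -le "$(nginx_size_to_bytes "$2")" ]
}

# Example:
#   fits_body_limit "$(wc -c < evidence.pdf)" 1m || echo "nginx would answer 413"
```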