sciagent code + Gitea Actions CI/CD

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Thinh Lam
2026-06-30 09:38:30 +07:00
commit 688fac73e9
1167 changed files with 158244 additions and 0 deletions
# fe0: Dashboard data refresh and API polling
This document explains **why the browser repeatedly calls `/api/applications` and `/api/notifications/unread-count`** on the dashboard, how that fits the **React + TanStack Query + Axios** stack, and **design tradeoffs** for tuning behavior.
It also encodes a **stabilization plan** for frontend, backend pressure, and predictable data loading—refined from a stability review (`assets/docs/feedback.md`) focused on removing implicit globals, polite polling, and consistent refresh semantics.
## 1. High-level flow
```mermaid
flowchart LR
  subgraph ui [Dashboard UI]
    D[Dashboard.tsx]
    AAL[Admin ApprovedApplicationsList]
    CAL[Council ApprovedApplicationsList]
    NB[NotificationBell]
  end
  subgraph rq [TanStack Query]
    QApps["useQuery applications"]
    QNotif["useQuery unread count"]
  end
  subgraph http [HTTP]
    AC[apiClient axios]
    BE["Backend APIs"]
  end
  D -->|admin role| AAL
  D -->|editor role| CAL
  D -.-> NB
  AAL --> QApps
  NB --> QNotif
  QApps --> AC
  QNotif --> AC
  AC --> BE
```
- **`Dashboard.tsx`** chooses which shell to render by role: **admin** sees the admin applications list (inbox), **editor** (council) sees a different list implementation, and **applicant** sees the registration workspace, which does not poll `/api/applications` the way the inbox does.
- **`apiClient`** (`fe0/src/shared/api/client.ts`) is the shared Axios instance used by queries and mutations.
- **TanStack Query** caches by `queryKey`, runs `queryFn` on mount, and can **refetch on an interval** or **when the window regains focus**, depending on per-query options and **explicit** `QueryClient.defaultOptions` (see §7).
## 2. What triggers repeated `/api/applications` (admin inbox)
The route **`/dashboard`** for users with the **admin** role renders:
`fe0/src/pages/Dashboard.tsx` → `AdminApprovedApplicationsList` with `lifecycle="inbox"`.
The list loads data with `useQuery` in `fe0/src/components/admin/ApprovedApplicationsList.tsx`.
### Current behavior (as implemented today)
| Option | Value | Effect |
|--------|--------|--------|
| `queryKey` | `["applications", filters]` | Separate cache per filter set; **must** be a stable key—see §11. |
| `refetchInterval` | `10 * 1000` (10 seconds) | **Automatic polling** while mounted. **Target:** visibility-aware + optional jitter (§8, §12). |
| `refetchOnWindowFocus` | `"always"` (today) | Refetch on every focus regardless of staleness—**high load**; **target** is `true` + sensible `staleTime` (§8). |
| `refetchOnReconnect` | `true` | Refetch when the browser regains network after offline. |
| `placeholderData` | `(previous) => previous` | Keeps showing the last page while a refetch runs (less table flicker). **Keep this.** |
So the “every few seconds” pattern you see in DevTools is **intentional polling**, not a runaway bug—but the combination of **10s polling + `"always"` focus** multiplies traffic when admins tab frequently (§8).
### Same component, other lifecycles
`ApprovedApplicationsList` is also reused for the **decided** list (`lifecycle="decided"`) from `DecidedApplicationsPanel`. The **same** `refetchInterval: 10s` applies there as well—polling is tied to the component, not only the inbox title.
## 3. Council dashboard: different refresh strategy (target: unify)
**Editors** (`hasRole("editor")`) get `CouncilApprovedApplicationsList` (`fe0/src/components/council/ApprovedApplicationsList.tsx`).
That file's `applicationsQuery` **does not set `refetchInterval`** today. Updates are driven more by:
- Normal Query behavior (mount, default focus rules, etc.).
- **`reportSyncQuery`**: when its `dataUpdatedAt` changes, an effect runs `queryClient.invalidateQueries({ queryKey: ["applications"] })`, which pulls a fresh `/api/applications` without a fixed timer.
**Problem:** admin (time-based polling) and council (event-driven invalidation) are two mental models for similar surfaces, in different files—cognitive load, bug asymmetry, and drift (fixes in one place may not land in the other).
**Target architecture (single strategy everywhere):**
1. **Primary:** invalidation on mutations (`approve`, `reject`, `submit`, `assign`, etc.) plus invalidation on lightweight **report sync** / version signals where applicable.
2. **Secondary:** a **slow safety-net poll** (e.g. **60-120s**, visibility-aware, optionally jittered) so a missed invalidation does not leave the UI stale forever.
3. **Later (product-driven only):** **SSE** behind `apiClient` if true realtime is required—one long-lived connection per tab scales better than many short polls; WebSockets only if the server must push high-frequency updates.
Until unified, treat **both** admin and council lists as in scope for **`isFetching` audits** and query-key stability (§10, §11).
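The "invalidation primary, slow poll secondary" strategy above can be sketched as follows. This is a minimal illustration, not the project's implementation: `mutateAndInvalidate`, `QueryInvalidator`, and the 90s value of `SAFETY_NET_POLL_MS` are hypothetical names and numbers. The interface is a structural subset that TanStack Query's `QueryClient` should satisfy through its `invalidateQueries` method.

```typescript
// Structural subset of QueryClient, so the helper can also be tested with a fake.
interface QueryInvalidator {
  invalidateQueries(filters: { queryKey: readonly unknown[] }): Promise<void>;
}

// Assumed safety-net interval, picked from the 60-120s range suggested above.
const SAFETY_NET_POLL_MS = 90_000;

// Hypothetical wrapper: run a mutation, then mark every applications list as stale.
async function mutateAndInvalidate<T>(
  client: QueryInvalidator,
  mutation: () => Promise<T>,
): Promise<T> {
  const result = await mutation();
  // Keys match by prefix, so ["applications"] also covers ["applications", filters].
  await client.invalidateQueries({ queryKey: ["applications"] });
  return result;
}
```

Both the admin and council lists could then share the same call after `approve`, `reject`, `submit`, or `assign`, with `SAFETY_NET_POLL_MS` as the only timer.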
## 4. Notifications unread count
`fe0/src/components/notifications/NotificationBell.tsx`:
- `queryKey`: `["notifications-unread-count"]`
- `queryFn`: `fetchNotificationsUnreadCount` → `GET /api/notifications/unread-count`
- `refetchInterval`: **60_000 ms** (once per minute)
- `refetchOnWindowFocus`: `true`
- `staleTime`: **30_000 ms**
`NotificationManager.tsx` uses a similar **60s** interval for the list and calls `queryClient.invalidateQueries({ queryKey: ["notifications-unread-count"] })` after mutations so the bell can update sooner than the next minute tick. **This invalidation pattern is the model** for other features (§3).
## 5. Other polling in the admin area
These are separate from the inbox but follow the same idea (“keep dashboards somewhat fresh”):
| Location | Interval | Purpose |
|----------|----------|----------|
| `OverviewTab.tsx` | 30s | Health/status style data |
| `AIManagementTab.tsx` | 30s | AI service health |
| `NotificationBell` / `NotificationManager` | 60s | Notifications |
**Target:** centralize intervals in one module (e.g. `fe0/src/shared/config/polling.ts`) so ops and load tests can tune without hunting magic numbers across files (§12).
## 6. `client.ts` dev logging (stability and privacy)
In `fe0/src/shared/api/client.ts`, the Axios **response interceptor** logs successful responses when `import.meta.env.DEV` is true.
**Risks:** full `data` payloads on large lists **flood the console**; a misconfigured deploy that runs “dev-like” builds could **leak user data** to browser consoles.
**Target:**
1. **Sample or summarize** responses in dev; prefer `console.debug` over `console.log` for high-volume paths so DevTools defaults stay readable.
2. **Guard production** with a build-time assertion (or strict env contract), not `import.meta.env.DEV` alone.
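One way to summarize rather than dump responses is sketched below. `summarizeResponse` and `ResponseLike` are hypothetical names; the interface lists only the fields of an Axios response that the summary reads, and the output format is an assumption.

```typescript
// Structural subset of an Axios response: only what the summary needs.
interface ResponseLike {
  status: number;
  config: { method?: string; url?: string };
  data: unknown;
}

// Hypothetical helper: describe the payload's shape without printing its contents.
function summarizeResponse(res: ResponseLike): string {
  const method = (res.config.method ?? "get").toUpperCase();
  const url = res.config.url ?? "(unknown url)";
  const shape = Array.isArray(res.data)
    ? `array(${res.data.length})`
    : res.data === null
      ? "null"
      : typeof res.data;
  return `${method} ${url} -> ${res.status} [${shape}]`;
}
```

Inside the response interceptor, the dev branch would then call `console.debug(summarizeResponse(response))` instead of logging `response.data`, so a large list shows up as a single short line.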
## 7. QueryClient defaults (critical: one entrypoint, explicit defaults)
Having **two** `App.tsx` files with **different** `QueryClient` configuration is a **silent global switch**: a refactor, import cleanup, or rebase can change refetch behavior app-wide without touching feature code—and per-query `refetchInterval` would still “look” correct in review.
**Required actions (do first):**
1. **Pick one entrypoint.** Remove the duplicate in the **same** change set; do not leave a long-lived TODO.
2. **Prevent regression:** CI or ESLint `no-restricted-imports` forbidding the removed path if it could be revived.
3. **Set explicit `defaultOptions`** on the surviving `QueryClient`, even when values match library defaults—**implicit defaults are a major-version upgrade hazard** for TanStack Query.
Illustrative shape (adjust `staleTime` / `gcTime` / retry helpers to match product decisions):
```ts
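import { QueryClient } from "@tanstack/react-query";

// `isAuthError` is a placeholder for a project helper that detects 401/403 responses.
const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      // Focus refetch is off by default; individual queries opt in.
      refetchOnWindowFocus: false,
      refetchOnReconnect: true,
      staleTime: 30_000,
      gcTime: 5 * 60_000,
      // Never retry auth failures; otherwise allow at most two retries.
      retry: (failureCount, err) => {
        if (isAuthError(err)) return false;
        return failureCount < 2;
      },
      // Exponential backoff, capped at 8 seconds.
      retryDelay: (attempt) => Math.min(1000 * 2 ** attempt, 8000),
    },
    mutations: { retry: false },
  },
});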
```
Then **`refetchInterval`**, `refetchOnWindowFocus: true`, and other overrides become **deliberate opt-ins** at the query level.
**Today's split (legacy):** `fe0/src/main.tsx` imports `fe0/src/App.tsx` (`new QueryClient()` with no defaults). `fe0/src/app/App.tsx` uses different defaults and is **not** wired from `main.tsx` until consolidated.
## 8. Polling and focus: polite defaults (frontend + backend load)
### Why `"always"` on focus is wrong for an inbox
`refetchOnWindowFocus: "always"` refetches **on every focus event regardless of staleness**. With **10s** polling, admins who tab in and out can drive **12-20+ requests/minute per tab**; many admins at start of shift create **synchronized bursts** the backend cannot absorb gracefully.
**Target:** use **`true`** (refetch only when stale) with a **sensible `staleTime`** for that query. Approvals inboxes are not trading screens; the UX difference is negligible; server load is not.
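The difference between `"always"` and `true` can be made concrete with a small model. This is a simplified sketch of the decision, not the library's source: `refetchesOnFocus` is a hypothetical function, and it ignores cases such as explicitly invalidated queries.

```typescript
type FocusMode = boolean | "always";

// Simplified model: does a focus event trigger a refetch for this query?
function refetchesOnFocus(
  mode: FocusMode,
  dataUpdatedAt: number, // ms timestamp of the last successful fetch
  staleTime: number, // ms during which data counts as fresh
  now: number,
): boolean {
  if (mode === "always") return true; // staleness is ignored
  if (mode === false) return false; // focus never refetches
  return now - dataUpdatedAt >= staleTime; // `true`: only when stale
}
```

With `staleTime` left at `0`, the `true` mode is stale on every focus event and behaves like `"always"`, which is why the target pairs `true` with an explicit `staleTime`.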
### Visibility-aware polling (default pattern, not optional)
Background tabs still run timers (throttling varies). Dashboards left open all day waste work that scales with headcount.
**Default for every polling query:** pause when the document is hidden.
```ts
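// Returns a `refetchInterval` callback: poll every `ms` while the tab is visible, pause otherwise.
function useVisibilityAwareInterval(ms: number) {
  return () => (document.visibilityState === "visible" ? ms : false);
}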
```
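Pass this as the function form of `refetchInterval`, which TanStack Query supports, so engineers do not re-implement the check ad hoc. TanStack Query's `refetchIntervalInBackground` option defaults to `false`, which already skips interval refetches while the window is not focused; the helper makes that policy explicit at the call site.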
### Jitter (optional smoothing)
Fixed intervals measured from **mount** align across users (for example at start of shift). **±10-20% jitter** on poll delays spreads load with negligible UX impact; it is worth adopting once the concurrent admin count grows.
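A minimal sketch of jitter combined with the visibility rule follows. `withJitter` and `nextPollDelay` are hypothetical helpers, and the 15% default ratio is an assumption inside the range above. The random source is injectable so the behavior can be tested deterministically.

```typescript
// Spread a base delay uniformly by ±ratio (0.1 means ±10%).
function withJitter(
  baseMs: number,
  ratio: number,
  random: () => number = Math.random,
): number {
  const offset = (random() * 2 - 1) * ratio; // in [-ratio, +ratio)
  return Math.round(baseMs * (1 + offset));
}

// Jittered delay while visible; `false` (no polling) while hidden.
function nextPollDelay(
  baseMs: number,
  visibility: "visible" | "hidden",
  ratio = 0.15,
  random: () => number = Math.random,
): number | false {
  return visibility === "visible" ? withJitter(baseMs, ratio, random) : false;
}
```

A query could pass `() => nextPollDelay(intervalMs, document.visibilityState)` as its `refetchInterval`, so each tab drifts slightly away from the others instead of polling in lockstep.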
## 9. HTTP, timeouts, retries, and auth (document + implement gaps)
The happy path is documented elsewhere; **stability** requires explicit policy—**even when nothing fails in tests**.
| Concern | Risk if ignored | Target |
|--------|------------------|--------|
| **No Axios timeout** | Hung requests pile up; **10s polling** stacks in-flight work; **per-host concurrency** pins the tab; UI looks frozen. | Set **explicit timeouts** on `apiClient` (or per-route overrides for long operations). |
| **Default Query retries** | TanStack Query retries **3×** by default, so a bad poll tick can **amplify load** during an outage (up to 4 failed requests per cycle). | Align `retry` / `retryDelay` with **`defaultOptions`** (§7); cap retries on read-heavy queries. |
| **401 / 403** | **Silent loops:** auth expired → poll → 401 → retry → poll again; “dashboard broken” reports. | **Never retry** auth failures; interceptor should **logout / redirect / refresh** in one documented path—**no** infinite poll on unauthenticated sessions. |
| **Offline** | `refetchOnReconnect: true` helps, but users may see **blank** data and assume loss. | **Surface offline / reconnect** in UI where lists are empty or stale. |
Add or link implementation details in `fe0/src/shared/api/client.ts` and auth helpers as these behaviors are codified.
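The retry and auth rows of the table can be sketched as plain functions. This is an illustration under assumptions: `HttpErrorLike` is a structural subset of an Axios error, and `isAuthError` is a hypothetical implementation of the placeholder of the same name used in §7.

```typescript
// Structural subset of an Axios error: only the HTTP status is inspected.
interface HttpErrorLike {
  response?: { status?: number };
}

// Hypothetical helper: treat 401 and 403 as auth failures.
function isAuthError(err: unknown): boolean {
  const status = (err as HttpErrorLike | null)?.response?.status;
  return status === 401 || status === 403;
}

// Retry policy: never retry auth failures, otherwise allow at most two retries.
function shouldRetry(failureCount: number, err: unknown): boolean {
  if (isAuthError(err)) return false;
  return failureCount < 2;
}

// Exponential backoff starting at 1s and capped at 8s.
function retryDelay(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 8000);
}
```

These would plug into `retry` and `retryDelay` in `defaultOptions`. The timeout row is separate: Axios accepts a `timeout` value in milliseconds when the instance is created, and the right number depends on the slowest legitimate endpoint.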
## 10. `isLoading` vs `isFetching` (UI coupling)
**Pattern problem:** wiring **`isFetching`** from a **list query** into controls that are **conceptually independent** (export, filters, “new application”, pagination) causes bugs that **localhost hides** (fast requests → flicker too quick to see) and **cloud exposes** (slow polls keep `isFetching` true → controls look “stuck refreshing”).
**Rules of thumb:**
- **`isLoading`** (no cached data yet) is usually safe for gating skeletons or first-load UI.
- **`isFetching`** should **almost never** disable user-initiated actions; use a **subtle indicator** or **local** loading only for that action (e.g. export-only state).
**Action:** audit every consumer of `["applications", ...]` (and similar list keys) for `isFetching` / `isLoading`. Consider a lint rule or review checklist: *if a button is disabled on `isFetching`, require an inline justification.*
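The rules of thumb above can be expressed as a small mapping from query flags to UI state. `deriveListUiState` and its field names are hypothetical; the point is that the export control depends only on its own state, never on the list's `isFetching`.

```typescript
// The two flags read from the list query result.
interface ListQueryFlags {
  isLoading: boolean;
  isFetching: boolean;
}

interface ListUiState {
  showSkeleton: boolean; // first load, no cached data yet
  showRefreshHint: boolean; // subtle indicator during background refetch
  exportDisabled: boolean; // follows the export action only
}

function deriveListUiState(query: ListQueryFlags, isExporting: boolean): ListUiState {
  return {
    showSkeleton: query.isLoading,
    showRefreshHint: query.isFetching && !query.isLoading,
    exportDisabled: isExporting,
  };
}
```

During a slow background poll (`isFetching` true, `isLoading` false) this yields a refresh hint while the export button stays enabled.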
## 11. Query key stability
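TanStack Query hashes query keys by value, so a `filters` object literal created in render (`{ status, page, q }`) maps to the same cache entry as long as its contents are unchanged, even though its **reference changes every render**. The key does change whenever a value inside it changes, so raw search input in the key means a refetch on **every keystroke**, and always-fresh values (such as `new Date()` or `Date.now()`) produce a new key on each render → **extra requests**. An unstable reference also retriggers any `useEffect` or `useMemo` that depends on the object.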
**Mitigations:**
- **`useMemo`** for the filters object keyed by primitive fields, **or**
- **Prefer primitive keys:** `["applications", status, page, q, ...]`—verbose but **serializable** and easy to debug.
Encode the chosen rule in team TanStack Query conventions.
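The primitive-key option can be captured in a key factory so every call site builds the key the same way. `applicationsKey` and the `ApplicationFilters` field types are assumptions; only the field names `status`, `page`, and `q` come from the example above.

```typescript
// Assumed filter shape; the real one may carry more fields.
interface ApplicationFilters {
  status: string;
  page: number;
  q: string;
}

// Hypothetical key factory: primitives only, in a fixed order.
function applicationsKey(filters: ApplicationFilters) {
  return ["applications", filters.status, filters.page, filters.q] as const;
}
```

Two separately created filter objects with equal values produce equal keys, and the key stays a prefix match for `invalidateQueries({ queryKey: ["applications"] })`.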
## 12. Centralize polling constants
Intervals such as `10s`, `30s`, `60s` scattered across files are hard to tune for load tests or incidents.
**Target module** (example):
```ts
// fe0/src/shared/config/polling.ts
export const POLL_INTERVALS = {
  adminInbox: 10_000,
  notificationsCount: 60_000,
  notificationsList: 60_000,
  adminOverview: 30_000,
  aiHealth: 30_000,
} as const;
```
Optionally drive values from **env** later without touching every callsite.
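A sketch of the env-driven variant, assuming values arrive as strings as they do on Vite's `import.meta.env`. `resolveIntervalMs` is a hypothetical parser, and the variable name mentioned below is illustrative only.

```typescript
// Accept a positive integer number of milliseconds; otherwise use the fallback.
function resolveIntervalMs(raw: string | undefined, fallbackMs: number): number {
  if (raw === undefined || raw.trim() === "") return fallbackMs;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallbackMs;
}
```

An entry in `POLL_INTERVALS` could then read `resolveIntervalMs(import.meta.env.VITE_POLL_ADMIN_INBOX_MS, 10_000)`, so a typo in the environment falls back to the coded default instead of disabling or flooding polling.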
## 13. Phased implementation order
Pragmatic sequencing when work must land incrementally (from stability review):
1. **First:** **One `App.tsx`**, explicit **`QueryClient.defaultOptions`**, CI/ESLint guard against the removed path (**§7**).
2. **Next:** **`isFetching` audit**; **visibility-aware** polling helper; replace admin inbox **`"always"`** with **`true` + `staleTime`** (**§8, §10**).
3. **Then:** **Centralize `POLL_INTERVALS`**; **document and implement** timeout / retry / auth behavior (**§9, §12**); verify **query key stability** (**§11**).
4. **Horizon:** **Unify admin + council** refresh: invalidation primary, **slow safety-net poll** (**§3**); **SSE** only if realtime becomes a product requirement.
## 14. Quick file map
| File | Role |
|------|------|
| `fe0/src/pages/Dashboard.tsx` | Role-based dashboard shell; wires admin inbox list. |
| `fe0/src/components/admin/ApprovedApplicationsList.tsx` | Admin `/api/applications` query; **10s** poll, focus **"always"** today—**targets in §8, §10, §11**. |
| `fe0/src/components/council/ApprovedApplicationsList.tsx` | Council list; invalidates on report sync—**unify with §3**. |
| `fe0/src/components/notifications/NotificationBell.tsx` | Unread count; **60s** polling. |
| `fe0/src/components/notifications/NotificationManager.tsx` | Notification list + invalidates unread count query. |
| `fe0/src/lib/userNotificationsApi.ts` | HTTP helper for unread count. |
| `fe0/src/shared/api/client.ts` | Axios instance; dev logging—**§6, §9**. |
| `fe0/src/App.tsx` | `QueryClientProvider` + router (**actual** entry today). |
| `fe0/src/app/App.tsx` | Alternate shell—**remove as part of §7**. |
## 15. Local machine vs cloud server (why behavior can *look* different)
**The admin inbox polling interval is not environment-specific** in code: `refetchInterval: 10s` runs the same in dev, local production builds, and cloud deploys. If the admin dashboard is open and focused, you should see the same *intent* (repeated `GET /api/applications`) everywhere.
What often *differs* is how **noticeable** that is.
### Higher latency on the cloud
On a remote host, each poll typically spends **longer** in flight. While a query is in progress, TanStack Query sets **`isFetching === true`** for that query.
- **Localhost**: UI tied to `isFetching` may **flicker too fast to see**.
- **Cloud**: the same coupling looks like a steady “refresh” problem (**§10**).
The export button was stabilized by giving it its own **export-only loading state**, so it no longer follows list refetches; slow networks still poll at the same rate, but the control stays calm.
### Dev vs production logging
- **Local (`vite dev`)**: success logs per response can make the **console look very busy**. This is usually extra **logging**, not extra requests; production runs the same code paths.
- **Cloud (typical production build)**: those success logs are off; use **Network** in DevTools to see polling.
### Deployment or asset skew
If the server serves an **older bundle** (cached `index.html`/`assets`, wrong image, or different branch), behavior can diverge from your laptop until deploys and caches align.
### Tab visibility and throttling
Browsers may **throttle timers** for background tabs. Testing with the dashboard tab **in the background** locally can make polls appear rarer than when the tab is **focused**. **Visibility-aware polling (§8)** makes behavior match operator expectations and reduces waste.
### How to verify locally
Open the **admin inbox**, keep the tab **focused**, wait **15-20 seconds**, and watch **Network** for repeating `GET /api/applications` (same pattern as cloud).
---
## What to preserve from the current design
- **`placeholderData: (previous) => previous`** to limit table flicker.
- **Invalidating `notifications-unread-count` after mutations** rather than waiting for the next poll.
- **A single shared `apiClient`**: the work above layers policy on top of it rather than replacing it.
- **Documenting local-vs-cloud differences** (latency, logging, `isFetching`) as institutional knowledge.
---
*Update this doc when `refetchInterval` / focus policies change, `App` entrypoints are consolidated, or admin/council refresh strategies are unified.*