sciagent/docs/fe0-dashboard-data-refresh-architecture.md
Thinh Lam 688fac73e9
sciagent code + Gitea Actions CI/CD
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 09:38:30 +07:00

# fe0: Dashboard data refresh and API polling
This document explains **why the browser repeatedly calls `/api/applications` and `/api/notifications/unread-count`** on the dashboard, how that fits the **React + TanStack Query + Axios** stack, and **design tradeoffs** for tuning behavior.
It also records a **stabilization plan** covering frontend behavior, backend load, and predictable data loading. The plan was refined from a stability review (`assets/docs/feedback.md`) focused on removing implicit globals, polite polling, and consistent refresh semantics.
## 1. High-level flow
```mermaid
flowchart LR
subgraph ui [Dashboard UI]
D[Dashboard.tsx]
AAL[Admin ApprovedApplicationsList]
CAL[Council ApprovedApplicationsList]
NB[NotificationBell]
end
subgraph rq [TanStack Query]
QApps["useQuery applications"]
QNotif["useQuery unread count"]
end
subgraph http [HTTP]
AC[apiClient axios]
BE["Backend APIs"]
end
D -->|admin role| AAL
D -->|editor role| CAL
D -.-> NB
AAL --> QApps
NB --> QNotif
QApps --> AC
QNotif --> AC
AC --> BE
```
- **`Dashboard.tsx`** chooses which shell to render by role: **admin** sees the admin applications list (inbox), **editor** (council) sees a different list implementation, **applicant** sees the registration workspace (which does not poll `/api/applications` the way the inbox does).
- **`apiClient`** (`fe0/src/shared/api/client.ts`) is the shared Axios instance used by queries and mutations.
- **TanStack Query** caches by `queryKey`, runs `queryFn` on mount, and can **refetch on an interval** or **when the window regains focus**, depending on per-query options and **explicit** `QueryClient.defaultOptions` (see §7).
## 2. What triggers repeated `/api/applications` (admin inbox)
The route **`/dashboard`** for users with the **admin** role renders:
`fe0/src/pages/Dashboard.tsx` → `AdminApprovedApplicationsList` with `lifecycle="inbox"`.
The list loads data with `useQuery` in `fe0/src/components/admin/ApprovedApplicationsList.tsx`.
### Current behavior (as implemented today)
| Option | Value | Effect |
|--------|--------|--------|
| `queryKey` | `["applications", filters]` | Separate cache per filter set; **must** be a stable key—see §11. |
| `refetchInterval` | `10 * 1000` (10 seconds) | **Automatic polling** while mounted. **Target:** visibility-aware + optional jitter (§8, §12). |
| `refetchOnWindowFocus` | `"always"` (today) | Refetch on every focus regardless of staleness—**high load**; **target** is `true` + sensible `staleTime` (§8). |
| `refetchOnReconnect` | `true` | Refetch when the browser regains network after offline. |
| `placeholderData` | `(previous) => previous` | Keeps showing the last page while a refetch runs (less table flicker). **Keep this.** |
So the “every few seconds” pattern you see in DevTools is **intentional polling**, not a runaway bug—but the combination of **10s polling + `"always"` focus** multiplies traffic when admins tab frequently (§8).
### Same component, other lifecycles
`ApprovedApplicationsList` is also reused for the **decided** list (`lifecycle="decided"`) from `DecidedApplicationsPanel`. The **same** `refetchInterval: 10s` applies there as well—polling is tied to the component, not only the inbox title.
## 3. Council dashboard: different refresh strategy (target: unify)
**Editors** (`hasRole("editor")`) get `CouncilApprovedApplicationsList` (`fe0/src/components/council/ApprovedApplicationsList.tsx`).
That file's `applicationsQuery` **does not set `refetchInterval`** today. Updates are driven more by:
- Normal Query behavior (mount, default focus rules, etc.).
- **`reportSyncQuery`**: when its `dataUpdatedAt` changes, an effect runs `queryClient.invalidateQueries({ queryKey: ["applications"] })`, which pulls a fresh `/api/applications` without a fixed timer.
**Problem:** admin (time-based polling) and council (event-driven invalidation) are two mental models for similar surfaces, in different files—cognitive load, bug asymmetry, and drift (fixes in one place may not land in the other).
**Target architecture (single strategy everywhere):**
1. **Primary:** invalidation on mutations (`approve`, `reject`, `submit`, `assign`, etc.) plus invalidation on lightweight **report sync** / version signals where applicable.
2. **Secondary:** a **slow safety-net poll** (e.g. **60–120s**, visibility-aware, optionally jittered) so a missed invalidation does not leave the UI stale forever.
3. **Later (product-driven only):** **SSE** behind `apiClient` if true realtime is required—one long-lived connection per tab scales better than many short polls; WebSockets only if the server must push high-frequency updates.
Until unified, treat **both** admin and council lists as in scope for **`isFetching` audits** and query-key stability (§10, §11).
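A minimal sketch of the "invalidation primary, slow poll secondary" strategy, written without a TanStack Query dependency so it can be read in isolation. The mutation-to-key mapping, the helper name `invalidateAfter`, and the 90s value are all assumptions for illustration; only the `["applications"]` and `["notifications-unread-count"]` keys come from this document.

```typescript
// Hypothetical names throughout: the mapping below is an assumption, not the real code.
type QueryKey = readonly unknown[];
type MutationName = "approve" | "reject" | "submit" | "assign";

// Which cached queries each mutation should mark stale.
const INVALIDATES: Record<MutationName, readonly QueryKey[]> = {
  approve: [["applications"], ["notifications-unread-count"]],
  reject: [["applications"], ["notifications-unread-count"]],
  submit: [["applications"]],
  assign: [["applications"]],
};

// `invalidate` stands in for:
//   (key) => queryClient.invalidateQueries({ queryKey: key })
function invalidateAfter(
  mutation: MutationName,
  invalidate: (key: QueryKey) => void,
): void {
  for (const key of INVALIDATES[mutation]) {
    invalidate(key);
  }
}

// Safety-net poll used only to recover from a missed invalidation (60-120s range).
const SAFETY_NET_POLL_MS = 90_000;
```

Keeping the mapping in one table means admin and council lists would share the same refresh rules, which is the point of unifying the two strategies.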
## 4. Notifications unread count
`fe0/src/components/notifications/NotificationBell.tsx`:
- `queryKey`: `["notifications-unread-count"]`
- `queryFn`: `fetchNotificationsUnreadCount` → `GET /api/notifications/unread-count`
- `refetchInterval`: **60_000 ms** (once per minute)
- `refetchOnWindowFocus`: `true`
- `staleTime`: **30_000 ms**
`NotificationManager.tsx` uses a similar **60s** interval for the list and calls `queryClient.invalidateQueries({ queryKey: ["notifications-unread-count"] })` after mutations so the bell can update sooner than the next minute tick. **This invalidation pattern is the model** for other features (§3).
## 5. Other polling in the admin area
These are separate from the inbox but follow the same idea (“keep dashboards somewhat fresh”):
| Location | Interval | Purpose |
|----------|----------|----------|
| `OverviewTab.tsx` | 30s | Health/status style data |
| `AIManagementTab.tsx` | 30s | AI service health |
| `NotificationBell` / `NotificationManager` | 60s | Notifications |
**Target:** centralize intervals in one module (e.g. `fe0/src/shared/config/polling.ts`) so ops and load tests can tune without hunting magic numbers across files (§12).
## 6. `client.ts` dev logging (stability and privacy)
In `fe0/src/shared/api/client.ts`, the Axios **response interceptor** logs successful responses when `import.meta.env.DEV` is true.
**Risks:** full `data` payloads on large lists **flood the console**; a misconfigured deploy that runs “dev-like” builds could **leak user data** to browser consoles.
**Target:**
1. **Sample or summarize** responses in dev; prefer `console.debug` over `console.log` for high-volume paths so DevTools defaults stay readable.
2. **Guard production** with a build-time assertion (or strict env contract), not `import.meta.env.DEV` alone.
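One way to summarize and sample dev logs is sketched below. The helper names (`summarizeResponse`, `shouldLogResponse`), the `ResponseLike` shape, and the 10% sample rate are assumptions, not what `client.ts` does today.

```typescript
// Hypothetical helpers; only the fields used here are modeled.
interface ResponseLike {
  status: number;
  config: { method?: string; url?: string };
  data: unknown;
}

// Describe the payload by shape and size instead of printing it in full.
function summarizeResponse(res: ResponseLike): string {
  const method = (res.config.method ?? "get").toUpperCase();
  const url = res.config.url ?? "(unknown url)";
  const shape = Array.isArray(res.data)
    ? `array(${res.data.length})`
    : res.data === null
      ? "null"
      : typeof res.data;
  return `${method} ${url} -> ${res.status} ${shape}`;
}

// Log roughly `sampleRate` of responses; `random` is injectable for tests.
function shouldLogResponse(
  sampleRate: number,
  random: () => number = Math.random,
): boolean {
  return random() < sampleRate;
}

// Inside the response interceptor this might read:
//   if (import.meta.env.DEV && shouldLogResponse(0.1)) {
//     console.debug(summarizeResponse(response));
//   }
```

Because the summary never includes `data` itself, a dev-like build that reaches users would log request shapes rather than user records.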
## 7. QueryClient defaults (critical: one entrypoint, explicit defaults)
Having **two** `App.tsx` files with **different** `QueryClient` configuration is a **silent global switch**: a refactor, import cleanup, or rebase can change refetch behavior app-wide without touching feature code—and per-query `refetchInterval` would still “look” correct in review.
**Required actions (do first):**
1. **Pick one entrypoint.** Remove the duplicate in the **same** change set; do not leave a long-lived TODO.
2. **Prevent regression:** CI or ESLint `no-restricted-imports` forbidding the removed path if it could be revived.
3. **Set explicit `defaultOptions`** on the surviving `QueryClient`, even when values match library defaults—**implicit defaults are a major-version upgrade hazard** for TanStack Query.
Illustrative shape (adjust `staleTime` / `gcTime` / retry helpers to match product decisions):
```ts
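import { QueryClient } from "@tanstack/react-query";

// Assumed project helper (not defined in this doc); replace with the real import.
declare function isAuthError(err: unknown): boolean;

const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      // Focus refetch is off by default; individual queries opt in.
      refetchOnWindowFocus: false,
      refetchOnReconnect: true,
      staleTime: 30_000,
      gcTime: 5 * 60_000,
      // Never retry auth failures; otherwise allow up to 2 retries.
      retry: (failureCount, err) => {
        if (isAuthError(err)) return false;
        return failureCount < 2;
      },
      // Exponential backoff, capped at 8s.
      retryDelay: (attempt) => Math.min(1000 * 2 ** attempt, 8000),
    },
    mutations: { retry: false },
  },
});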
```
Then **`refetchInterval`**, `refetchOnWindowFocus: true`, and other overrides become **deliberate opt-ins** at the query level.
**Today's split (legacy):** `fe0/src/main.tsx` imports `fe0/src/App.tsx` (`new QueryClient()` with no defaults). `fe0/src/app/App.tsx` uses different defaults and is **not** wired from `main.tsx` until consolidated.
## 8. Polling and focus: polite defaults (frontend + backend load)
### Why `"always"` on focus is wrong for an inbox
`refetchOnWindowFocus: "always"` refetches **on every focus event regardless of staleness**. With **10s** polling, admins who tab in and out can drive **12–20+ requests/minute per tab**; many admins at start of shift create **synchronized bursts** the backend cannot absorb gracefully.
**Target:** use **`true`** (refetch only when stale) with a **sensible `staleTime`** for that query. Approvals inboxes are not trading screens; the UX difference is negligible; server load is not.
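The per-tab request rate can be estimated with simple arithmetic. The sketch below assumes every focus event triggers exactly one refetch (which is what `"always"` implies); the focus-event counts are illustrative, not measured.

```typescript
// Rough per-tab estimate: interval polls plus focus-triggered refetches.
function requestsPerMinute(
  pollIntervalMs: number,
  focusRefetchesPerMinute: number,
): number {
  const polls = 60_000 / pollIntervalMs;
  return polls + focusRefetchesPerMinute;
}

// 10s polling alone: 6 requests/minute.
const pollingOnly = requestsPerMinute(10_000, 0);

// 10s polling plus 14 tab switches under "always": 20 requests/minute.
const busyAdmin = requestsPerMinute(10_000, 14);

// Best case for a 60s safety-net poll where focus refetches are
// suppressed because the data is still fresh: 1 request/minute.
const safetyNetOnly = requestsPerMinute(60_000, 0);
```

Multiplying `busyAdmin` by the number of admins online at shift start gives a first-order view of the burst the backend has to absorb.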
### Visibility-aware polling (default pattern, not optional)
Background tabs still run timers (throttling varies). Dashboards left open all day waste work that scales with headcount.
**Default for every polling query:** pause when the document is hidden.
```ts
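// Returns a `refetchInterval` callback: poll every `ms` while the tab is
// visible, and pause (return false) while it is hidden.
function useVisibilityAwareInterval(ms: number) {
  return () => (document.visibilityState === "visible" ? ms : false);
}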
```
Use the function form of `refetchInterval` supported by TanStack Query so engineers do not re-implement this ad hoc.
### Jitter (optional smoothing)
Fixed intervals measured from **mount** align across users who open the dashboard at the same time (start of shift). **±10–20% jitter** on poll delays spreads that load with negligible UX impact, so it is worth adopting once concurrent admin count grows.
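A minimal jitter helper might look like the following. The name `withJitter` and its signature are assumptions; the random source is injectable so the behavior can be tested deterministically.

```typescript
// Hypothetical helper: spread a base interval by +/- `fraction`
// (for example 0.15 means +/-15%).
function withJitter(
  baseMs: number,
  fraction: number,
  random: () => number = Math.random,
): number {
  // Map random() in [0, 1) to an offset in [-fraction, +fraction).
  const offset = (random() * 2 - 1) * fraction;
  return Math.round(baseMs * (1 + offset));
}

// Example: a 10s poll with +/-15% jitter lands between 8.5s and 11.5s.
const jitteredInboxPollMs = withJitter(10_000, 0.15);
```

Computing the jittered value once per mount (rather than on every evaluation of `refetchInterval`) is probably the simplest approach, since it keeps each tab on a steady cadence while still de-synchronizing tabs from each other.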
## 9. HTTP, timeouts, retries, and auth (document + implement gaps)
The happy path is documented elsewhere; **stability** requires explicit policy—**even when nothing fails in tests**.
| Concern | Risk if ignored | Target |
|--------|------------------|--------|
| **No Axios timeout** | Hung requests pile up; **10s polling** stacks in-flight work; **per-host concurrency** pins the tab; UI looks frozen. | Set **explicit timeouts** on `apiClient` (or per-route overrides for long operations). |
| **Default Query retries** | TanStack Query retries **3×** by default; a bad poll tick can **amplify load** during an outage (up to 4 failed attempts per cycle: the initial request plus 3 retries). | Align `retry` / `retryDelay` with **`defaultOptions`** (§7); cap retries on read-heavy queries. |
| **401 / 403** | **Silent loops:** auth expired → poll → 401 → retry → poll again; “dashboard broken” reports. | **Never retry** auth failures; interceptor should **logout / redirect / refresh** in one documented path—**no** infinite poll on unauthenticated sessions. |
| **Offline** | `refetchOnReconnect: true` helps, but users may see **blank** data and assume loss. | **Surface offline / reconnect** in UI where lists are empty or stale. |
Add or link implementation details in `fe0/src/shared/api/client.ts` and auth helpers as these behaviors are codified.
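The retry and auth rows of the table can be expressed as small pure functions. This is a sketch under assumptions: `HttpErrorLike` models only the `response.status` field of an Axios-style error, and `isAuthError` here is one possible shape for the helper, not the project's actual implementation.

```typescript
// Structural view of an Axios-style error; only the status is needed.
interface HttpErrorLike {
  response?: { status?: number };
}

// Treat 401 and 403 as auth failures that must never be retried.
function isAuthError(err: unknown): boolean {
  const status = (err as HttpErrorLike | null | undefined)?.response?.status;
  return status === 401 || status === 403;
}

// Retry predicate: no retries for auth failures, otherwise at most `maxRetries`.
function shouldRetry(
  failureCount: number,
  err: unknown,
  maxRetries = 2,
): boolean {
  if (isAuthError(err)) return false;
  return failureCount < maxRetries;
}

// Exponential backoff by attempt index, capped at 8s.
function retryDelay(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 8000);
}
```

These functions have the same shape as the `retry` and `retryDelay` options in §7, so they could be passed to `defaultOptions` directly. The logout / redirect / refresh path still belongs in the Axios interceptor.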
## 10. `isLoading` vs `isFetching` (UI coupling)
**Pattern problem:** wiring **`isFetching`** from a **list query** into controls that are **conceptually independent** (export, filters, “new application”, pagination) causes bugs that **localhost hides** (fast requests → flicker too quick to see) and **cloud exposes** (slow polls keep `isFetching` true → controls look “stuck refreshing”).
**Rules of thumb:**
- **`isLoading`** (no cached data yet) is usually safe for gating skeletons or first-load UI.
- **`isFetching`** should **almost never** disable user-initiated actions; use a **subtle indicator** or **local** loading only for that action (e.g. export-only state).
**Action:** audit every consumer of `["applications", ...]` (and similar list keys) for `isFetching` / `isLoading`. Consider a lint rule or review checklist: *if a button is disabled on `isFetching`, require an inline justification.*
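The rules of thumb above can be captured in a small derivation function. The names (`deriveListUiState`, `showRefreshHint`, `exportDisabled`) are hypothetical; the point is that the export control depends only on its own state, never on the list query's `isFetching`.

```typescript
// Flags as exposed by a list query result.
interface ListQueryFlags {
  isLoading: boolean;
  isFetching: boolean;
}

interface ListUiState {
  showSkeleton: boolean; // first load only, no cached data yet
  showRefreshHint: boolean; // subtle background-refresh indicator
  exportDisabled: boolean; // tied to the export action alone
}

function deriveListUiState(
  query: ListQueryFlags,
  isExporting: boolean,
): ListUiState {
  return {
    showSkeleton: query.isLoading,
    // Background refetch with cached data on screen: hint, do not block.
    showRefreshHint: query.isFetching && !query.isLoading,
    // Deliberately ignores query.isFetching.
    exportDisabled: isExporting,
  };
}
```

With this shape, a slow poll on a cloud host changes only `showRefreshHint`; buttons stay enabled unless their own action is in flight.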
## 11. Query key stability
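If `filters` is an **object literal created in render** (`{ status, page, q }`), its **reference changes every render**. TanStack Query hashes keys by value, so identical contents still resolve to the same cache entry, but any field whose **value** changes (an undebounced `q`, a timestamp computed in render) produces a new key → **extra requests**, refetch on **keystrokes**, refetch on **unrelated state** updates.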
**Mitigations:**
- **`useMemo`** for the filters object keyed by primitive fields, **or**
- **Prefer primitive keys:** `["applications", status, page, q, ...]`—verbose but **serializable** and easy to debug.
Encode the chosen rule in team TanStack Query conventions.
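A key builder is one way to enforce the primitive-key convention in a single place. The function name `applicationsKey`, the `ApplicationFilters` shape, and the defaults (`"all"`, page `1`) are assumptions for illustration.

```typescript
// Hypothetical filter shape; the real list may carry more fields.
interface ApplicationFilters {
  status?: string;
  page?: number;
  q?: string;
}

// Build a flat, primitive-only key so equal filters always yield an equal key.
function applicationsKey(filters: ApplicationFilters) {
  const status = filters.status ?? "all";
  const page = filters.page ?? 1;
  // Normalize whitespace so " foo " and "foo" share one cache entry.
  const q = (filters.q ?? "").trim();
  return ["applications", status, page, q] as const;
}
```

Because the key still starts with `"applications"`, the existing `invalidateQueries({ queryKey: ["applications"] })` calls would continue to match it by prefix.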
## 12. Centralize polling constants
Intervals such as `10s`, `30s`, `60s` scattered across files are hard to tune for load tests or incidents.
**Target module** (example):
```ts
// fe0/src/shared/config/polling.ts
export const POLL_INTERVALS = {
adminInbox: 10_000,
notificationsCount: 60_000,
notificationsList: 60_000,
adminOverview: 30_000,
aiHealth: 30_000,
} as const;
```
Optionally drive values from **env** later without touching every callsite.
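An env-driven override could be layered on top of the constants as sketched below. The variable naming scheme (`VITE_POLL_<NAME>_MS`), the helper name `resolvePollInterval`, and the 1s floor are assumptions, not an existing convention in this repo.

```typescript
// Defaults mirror the constants listed above.
const POLL_DEFAULTS = {
  adminInbox: 10_000,
  notificationsCount: 60_000,
  notificationsList: 60_000,
  adminOverview: 30_000,
  aiHealth: 30_000,
} as const;

type PollName = keyof typeof POLL_DEFAULTS;

// Hypothetical env naming: VITE_POLL_<NAME>_MS, e.g. VITE_POLL_ADMININBOX_MS.
function resolvePollInterval(
  name: PollName,
  env: Record<string, string | undefined>,
): number {
  const raw = env[`VITE_POLL_${name.toUpperCase()}_MS`];
  const parsed = raw === undefined ? NaN : Number(raw);
  // Fall back to the default on missing or invalid values; enforce a 1s floor
  // so a typo cannot turn a poll into a tight loop.
  return Number.isFinite(parsed) && parsed >= 1000 ? parsed : POLL_DEFAULTS[name];
}

// In a Vite app the call might look like:
//   resolvePollInterval("adminInbox", import.meta.env)
```

Passing the env object as a parameter keeps the function testable and avoids reading `import.meta.env` at every callsite.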
## 13. Phased implementation order
Pragmatic sequencing when work must land incrementally (from stability review):
1. **First:** **One `App.tsx`**, explicit **`QueryClient.defaultOptions`**, CI/ESLint guard against the removed path (**§7**).
2. **Next:** **`isFetching` audit**; **visibility-aware** polling helper; replace admin inbox **`"always"`** with **`true` + `staleTime`** (**§8, §10**).
3. **Then:** **Centralize `POLL_INTERVALS`**; **document and implement** timeout / retry / auth behavior (**§9, §12**); verify **query key stability** (**§11**).
4. **Horizon:** **Unify admin + council** refresh: invalidation primary, **slow safety-net poll** (**§3**); **SSE** only if realtime becomes a product requirement.
## 14. Quick file map
| File | Role |
|------|------|
| `fe0/src/pages/Dashboard.tsx` | Role-based dashboard shell; wires admin inbox list. |
| `fe0/src/components/admin/ApprovedApplicationsList.tsx` | Admin `/api/applications` query; **10s** poll, focus **"always"** today—**targets in §8, §10, §11**. |
| `fe0/src/components/council/ApprovedApplicationsList.tsx` | Council list; invalidates on report sync—**unify with §3**. |
| `fe0/src/components/notifications/NotificationBell.tsx` | Unread count; **60s** polling. |
| `fe0/src/components/notifications/NotificationManager.tsx` | Notification list + invalidates unread count query. |
| `fe0/src/lib/userNotificationsApi.ts` | HTTP helper for unread count. |
| `fe0/src/shared/api/client.ts` | Axios instance; dev logging—**§6, §9**. |
| `fe0/src/App.tsx` | `QueryClientProvider` + router (**actual** entry today). |
| `fe0/src/app/App.tsx` | Alternate shell—**remove as part of §7**. |
## 15. Local machine vs cloud server (why behavior can *look* different)
**The admin inbox polling interval is not environment-specific** in code: `refetchInterval: 10s` runs the same in dev, local production builds, and cloud deploys. If the admin dashboard is open and focused, you should see the same *intent* (repeated `GET /api/applications`) everywhere.
What often *differs* is how **noticeable** that is.
### Higher latency on the cloud
On a remote host, each poll typically spends **longer** in flight. While a query is in progress, TanStack Query sets **`isFetching === true`** for that query.
- **Localhost**: UI tied to `isFetching` may **flicker too fast to see**.
- **Cloud**: the same coupling looks like a steady “refresh” problem (**§10**).
The export button was stabilized with an **export-only loading state**, so it no longer follows list refetches; slow networks still poll the same, but the control stays calm.
### Dev vs production logging
- **Local (`vite dev`)**: success logs per response can make the **console look very busy**. That is usually extra **logging**, not extra requests compared with prod running the same code paths.
- **Cloud (typical production build)**: those success logs are off; use **Network** in DevTools to see polling.
### Deployment or asset skew
If the server serves an **older bundle** (cached `index.html`/`assets`, wrong image, or different branch), behavior can diverge from your laptop until deploys and caches align.
### Tab visibility and throttling
Browsers may **throttle timers** for background tabs. Testing with the dashboard tab **in the background** locally can make polls appear rarer than when the tab is **focused**. **Visibility-aware polling (§8)** makes behavior match operator expectations and reduces waste.
### How to verify locally
Open the **admin inbox**, keep the tab **focused**, wait **15–20 seconds**, and watch **Network** for repeating `GET /api/applications` (same pattern as cloud).
---
## What to preserve from the current design
- **`placeholderData: (previous) => previous`** to limit table flicker.
- **Invalidating `notifications-unread-count` after mutations** rather than waiting for the next poll.
- **A single shared `apiClient`**—work above layers policy on top of it, not a replacement.
- **Documenting local-vs-cloud differences** (latency, logging, `isFetching`) as institutional knowledge.
---
*Update this doc when `refetchInterval` / focus policies change, `App` entrypoints are consolidated, or admin/council refresh strategies are unified.*