sciagent/docs/fe0-dashboard-data-refresh-architecture.md
Thinh Lam 688fac73e9
2026-06-30 09:38:30 +07:00


fe0: Dashboard data refresh and API polling

This document explains why the browser repeatedly calls /api/applications and /api/notifications/unread-count on the dashboard, how that fits the React + TanStack Query + Axios stack, and design tradeoffs for tuning behavior.

It also lays out a stabilization plan covering frontend behavior, backend load, and predictable data loading. The plan was refined from a stability review (assets/docs/feedback.md) that focused on removing implicit globals, polite polling, and consistent refresh semantics.

1. High-level flow

```mermaid
flowchart LR
  subgraph ui [Dashboard UI]
    D[Dashboard.tsx]
    AAL[Admin ApprovedApplicationsList]
    CAL[Council ApprovedApplicationsList]
    NB[NotificationBell]
  end
  subgraph rq [TanStack Query]
    QApps["useQuery applications"]
    QNotif["useQuery unread count"]
  end
  subgraph http [HTTP]
    AC[apiClient axios]
    BE["Backend APIs"]
  end
  D -->|admin role| AAL
  D -->|editor role| CAL
  D -.-> NB
  AAL --> QApps
  NB --> QNotif
  QApps --> AC
  QNotif --> AC
  AC --> BE
```
  • Dashboard.tsx chooses which shell to render by role: admin sees the admin applications list (inbox), editor (council) sees a different list implementation, and applicant sees the registration workspace (which does not poll /api/applications the way the inbox does).
  • apiClient (fe0/src/shared/api/client.ts) is the shared Axios instance used by queries and mutations.
  • TanStack Query caches by queryKey, runs queryFn on mount, and can refetch on an interval or when the window regains focus, depending on per-query options and explicit QueryClient.defaultOptions (see §7).

2. What triggers repeated /api/applications (admin inbox)

The route /dashboard for users with the admin role renders:

fe0/src/pages/Dashboard.tsx → AdminApprovedApplicationsList with lifecycle="inbox".

The list loads data with useQuery in fe0/src/components/admin/ApprovedApplicationsList.tsx.

Current behavior (as implemented today)

| Option | Value | Effect |
| --- | --- | --- |
| queryKey | ["applications", filters] | Separate cache per filter set; must be a stable key (see §11). |
| refetchInterval | 10 * 1000 (10 seconds) | Automatic polling while mounted. Target: visibility-aware + optional jitter (§8, §12). |
| refetchOnWindowFocus | "always" (today) | Refetch on every focus regardless of staleness, which is high load; target is true + sensible staleTime (§8). |
| refetchOnReconnect | true | Refetch when the browser regains network after offline. |
| placeholderData | (previous) => previous | Keeps showing the last page while a refetch runs (less table flicker). Keep this. |

So the “every few seconds” pattern you see in DevTools is intentional polling, not a runaway bug—but the combination of 10s polling + "always" focus multiplies traffic when admins tab frequently (§8).

Same component, other lifecycles

ApprovedApplicationsList is also reused for the decided list (lifecycle="decided") from DecidedApplicationsPanel. The same refetchInterval: 10s applies there as well—polling is tied to the component, not only the inbox title.

3. Council dashboard: different refresh strategy (target: unify)

Editors (hasRole("editor")) get CouncilApprovedApplicationsList (fe0/src/components/council/ApprovedApplicationsList.tsx).

That file's applicationsQuery does not set refetchInterval today. Updates are driven more by:

  • Normal Query behavior (mount, default focus rules, etc.).
  • reportSyncQuery: when its dataUpdatedAt changes, an effect runs queryClient.invalidateQueries({ queryKey: ["applications"] }), which pulls a fresh /api/applications without a fixed timer.

Problem: admin (time-based polling) and council (event-driven invalidation) are two mental models for similar surfaces, in different files—cognitive load, bug asymmetry, and drift (fixes in one place may not land in the other).

Target architecture (single strategy everywhere):

  1. Primary: invalidation on mutations (approve, reject, submit, assign, etc.) plus invalidation on lightweight report sync / version signals where applicable.
  2. Secondary: a slow safety-net poll (e.g. 60–120s, visibility-aware, optionally jittered) so a missed invalidation does not leave the UI stale forever.
  3. Later (product-driven only): SSE behind apiClient if true realtime is required—one long-lived connection per tab scales better than many short polls; WebSockets only if the server must push high-frequency updates.

Until unified, treat both admin and council lists as in scope for isFetching audits and query-key stability (§10, §11).
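The primary/secondary split in the target architecture can be sketched as a small lookup from mutation to the query keys it should invalidate, plus one slow safety-net interval. This is a minimal sketch under stated assumptions: the mutation names, the `INVALIDATES` table, and the 90s value are hypothetical; only the `["applications"]` and `["notifications-unread-count"]` keys come from this document.

```typescript
// Hypothetical mutation names; the real set depends on the product workflow.
type ApplicationMutation = "approve" | "reject" | "submit" | "assign";

type QueryKey = readonly unknown[];

// Assumed mapping: every application mutation invalidates the list prefix;
// decisions are assumed to also affect the unread-count badge.
const INVALIDATES: Record<ApplicationMutation, readonly QueryKey[]> = {
  approve: [["applications"], ["notifications-unread-count"]],
  reject: [["applications"], ["notifications-unread-count"]],
  submit: [["applications"]],
  assign: [["applications"]],
};

// Secondary safety net: an illustrative value inside the 60–120s range.
const SAFETY_NET_POLL_MS = 90_000;

function keysToInvalidate(mutation: ApplicationMutation): readonly QueryKey[] {
  return INVALIDATES[mutation];
}

// In a mutation's onSuccess handler this could be wired roughly as:
//   for (const queryKey of keysToInvalidate("approve")) {
//     queryClient.invalidateQueries({ queryKey });
//   }
```

Keeping the mapping in one place means admin and council lists would share the same invalidation rules instead of each file deciding on its own.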

4. Notifications unread count

fe0/src/components/notifications/NotificationBell.tsx:

  • queryKey: ["notifications-unread-count"]
  • queryFn: fetchNotificationsUnreadCount → GET /api/notifications/unread-count
  • refetchInterval: 60_000 ms (once per minute)
  • refetchOnWindowFocus: true
  • staleTime: 30_000 ms

NotificationManager.tsx uses a similar 60s interval for the list and calls queryClient.invalidateQueries({ queryKey: ["notifications-unread-count"] }) after mutations so the bell can update sooner than the next minute tick. This invalidation pattern is the model for other features (§3).

5. Other polling in the admin area

These are separate from the inbox but follow the same idea (“keep dashboards somewhat fresh”):

| Location | Interval | Purpose |
| --- | --- | --- |
| OverviewTab.tsx | 30s | Health/status style data |
| AIManagementTab.tsx | 30s | AI service health |
| NotificationBell / NotificationManager | 60s | Notifications |

Target: centralize intervals in one module (e.g. fe0/src/shared/config/polling.ts) so ops and load tests can tune without hunting magic numbers across files (§12).

6. client.ts dev logging (stability and privacy)

In fe0/src/shared/api/client.ts, the Axios response interceptor logs successful responses when import.meta.env.DEV is true.

Risks: full data payloads on large lists flood the console; a misconfigured deploy that runs “dev-like” builds could leak user data to browser consoles.

Target:

  1. Sample or summarize responses in dev; prefer console.debug over console.log for high-volume paths so DevTools defaults stay readable.
  2. Guard production with a build-time assertion (or strict env contract), not import.meta.env.DEV alone.
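One way to summarize and sample responses instead of logging full payloads is sketched below. It is an illustration, not the current interceptor: `ResponseLike` is a minimal structural stand-in for an Axios response, and `summarizeResponse` and `makeSampledLogger` are hypothetical names.

```typescript
// Minimal structural type so the sketch does not depend on Axios itself.
interface ResponseLike {
  status: number;
  config: { method?: string; url?: string };
  data: unknown;
}

interface ResponseSummary {
  method: string;
  url: string;
  status: number;
  shape: string;
}

// Describe the payload without printing it: arrays by length, objects by key count.
function describeShape(data: unknown): string {
  if (data === null) return "null";
  if (Array.isArray(data)) return `array(${data.length})`;
  if (typeof data === "object") return `object(${Object.keys(data).length} keys)`;
  return typeof data;
}

function summarizeResponse(res: ResponseLike): ResponseSummary {
  return {
    method: (res.config.method ?? "get").toUpperCase(),
    url: res.config.url ?? "(unknown url)",
    status: res.status,
    shape: describeShape(res.data),
  };
}

// Log only every Nth response so high-volume polling paths stay readable.
function makeSampledLogger(everyNth: number, log: (summary: ResponseSummary) => void) {
  let seen = 0;
  return (res: ResponseLike): void => {
    seen += 1;
    if (seen % everyNth === 0) log(summarizeResponse(res));
  };
}
```

In the dev-only branch of the response interceptor, the logger could be created once (for example with `console.debug` as the `log` callback) and called for each successful response.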

7. QueryClient defaults (critical: one entrypoint, explicit defaults)

Having two App.tsx files with different QueryClient configuration is a silent global switch: a refactor, import cleanup, or rebase can change refetch behavior app-wide without touching feature code—and per-query refetchInterval would still “look” correct in review.

Required actions (do first):

  1. Pick one entrypoint. Remove the duplicate in the same change set; do not leave a long-lived TODO.
  2. Prevent regression: CI or ESLint no-restricted-imports forbidding the removed path if it could be revived.
  3. Set explicit defaultOptions on the surviving QueryClient, even when values match library defaults—implicit defaults are a major-version upgrade hazard for TanStack Query.

Illustrative shape (adjust staleTime / gcTime / retry helpers to match product decisions):

```typescript
import { QueryClient } from "@tanstack/react-query";

// isAuthError is assumed to be a project helper that returns true for 401/403 errors.
const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      refetchOnWindowFocus: false,
      refetchOnReconnect: true,
      staleTime: 30_000,
      gcTime: 5 * 60_000,
      retry: (failureCount, err) => {
        if (isAuthError(err)) return false;
        return failureCount < 2;
      },
      retryDelay: (attempt) => Math.min(1000 * 2 ** attempt, 8000),
    },
    mutations: { retry: false },
  },
});
```

Then refetchInterval, refetchOnWindowFocus: true, and other overrides become deliberate opt-ins at the query level.

Today's split (legacy): fe0/src/main.tsx imports fe0/src/App.tsx (new QueryClient() with no defaults). fe0/src/app/App.tsx uses different defaults and is not wired from main.tsx until consolidated.

8. Polling and focus: polite defaults (frontend + backend load)

Why "always" on focus is wrong for an inbox

refetchOnWindowFocus: "always" refetches on every focus event regardless of staleness. With 10s polling, admins who tab in and out can drive 12–20+ requests/minute per tab; many admins starting a shift at the same time create synchronized bursts that the backend cannot absorb gracefully.

Target: use true (refetch only when stale) with a sensible staleTime for that query. Approvals inboxes are not trading screens; the UX difference is negligible; server load is not.
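The per-tab request rate can be estimated with simple arithmetic. The sketch below is a rough model, not a measurement: it assumes each focus event under "always" adds exactly one request, that polls always succeed, and it ignores any pause in polling while the tab is hidden. The function names are hypothetical.

```typescript
// Requests per minute produced by a fixed polling interval alone.
function pollsPerMinute(pollMs: number): number {
  return 60_000 / pollMs;
}

// Assumption: with refetchOnWindowFocus "always", every focus event adds one request.
function perTabRateAlways(pollMs: number, focusEventsPerMinute: number): number {
  return pollsPerMinute(pollMs) + focusEventsPerMinute;
}

// Assumption: with refetchOnWindowFocus true, a focus event only refetches when the
// cached data is older than staleTime. If polls land at least that often, focus adds
// nothing; otherwise the focus count is used as an upper bound.
function perTabRateStaleAware(
  pollMs: number,
  focusEventsPerMinute: number,
  staleTimeMs: number,
): number {
  const focusCost = pollMs <= staleTimeMs ? 0 : focusEventsPerMinute;
  return pollsPerMinute(pollMs) + focusCost;
}
```

Under these assumptions a 10s interval alone yields 6 requests per minute per tab; 6 to 14 focus events per minute on top of that gives 12 to 20 with "always", while true with a 30s staleTime stays at 6.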

Visibility-aware polling (default pattern, not optional)

Background tabs still run timers (throttling varies). Dashboards left open all day waste work that scales with headcount.

Default for every polling query: pause when the document is hidden.

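```typescript
// Returns a refetchInterval callback: poll every `ms` while the tab is visible,
// and return false (no polling) while it is hidden.
function useVisibilityAwareInterval(ms: number) {
  return () => (document.visibilityState === "visible" ? ms : false);
}
```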

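Pass this through the function form of refetchInterval, which TanStack Query supports, so engineers do not re-implement the check ad hoc. Note that TanStack Query's refetchIntervalInBackground option defaults to false, which already skips interval refetches while the tab is not focused; confirm that behavior against the installed version before relying on either mechanism alone.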

Jitter (optional smoothing)

Fixed intervals that start at mount align across users (for example, at the start of a shift). Jitter of ±10–20% on poll delays spreads load with negligible UX impact and is worth adopting once the concurrent admin count grows.
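Jitter and the visibility check can live in one small helper. The following is a minimal sketch: `withJitter` and `pollDelay` are hypothetical names, the random source is injectable so the behavior is testable, and the visibility value is passed in rather than read from `document` so the function stays pure.

```typescript
type Visibility = "visible" | "hidden";

// Spread a base interval by ±fraction. `random` must return a value in [0, 1).
function withJitter(
  baseMs: number,
  fraction: number,
  random: () => number = Math.random,
): number {
  const offset = (random() * 2 - 1) * fraction; // uniform in [-fraction, +fraction)
  return Math.round(baseMs * (1 + offset));
}

// Visibility-aware delay: false (no polling) unless the tab is visible.
function pollDelay(
  baseMs: number,
  fraction: number,
  visibility: Visibility,
  random: () => number = Math.random,
): number | false {
  return visibility === "visible" ? withJitter(baseMs, fraction, random) : false;
}
```

A query could then use something like `refetchInterval: () => pollDelay(10_000, 0.15, document.visibilityState)`, so each tab drifts slightly relative to the others.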

9. HTTP, timeouts, retries, and auth (document + implement gaps)

The happy path is documented elsewhere; stability requires explicit policy—even when nothing fails in tests.

| Concern | Risk if ignored | Target |
| --- | --- | --- |
| No Axios timeout | Hung requests pile up; 10s polling stacks in-flight work; the browser's per-host connection limit saturates; UI looks frozen. | Set explicit timeouts on apiClient (or per-route overrides for long operations). |
| Default Query retries | TanStack Query retries 3× by default; a bad poll tick can amplify load during an outage (up to 4 failed attempts per cycle). | Align retry / retryDelay with defaultOptions (§7); cap retries on read-heavy queries. |
| 401 / 403 | Silent loops: auth expired → poll → 401 → retry → poll again; “dashboard broken” reports. | Never retry auth failures; the interceptor should log out, redirect, or refresh through one documented path, with no infinite polling on unauthenticated sessions. |
| Offline | refetchOnReconnect: true helps, but users may see blank data and assume loss. | Surface offline / reconnect in UI where lists are empty or stale. |

Add or link implementation details in fe0/src/shared/api/client.ts and auth helpers as these behaviors are codified.
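The retry and auth rows of the table can be expressed as two small pure functions. This is a sketch under assumptions: `HttpErrorLike` is a minimal structural view of an error carrying `response.status` (the shape Axios errors expose), and `isAuthError` here is a hypothetical implementation of the helper named in §7, not the project's actual code.

```typescript
// Minimal structural view of an HTTP error.
interface HttpErrorLike {
  response?: { status?: number };
}

function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "response" in err) {
    return (err as HttpErrorLike).response?.status;
  }
  return undefined;
}

// Hypothetical helper: treat 401 and 403 as auth failures.
function isAuthError(err: unknown): boolean {
  const status = statusOf(err);
  return status === 401 || status === 403;
}

// Never retry auth failures; otherwise allow at most two retries.
function shouldRetry(failureCount: number, err: unknown): boolean {
  if (isAuthError(err)) return false;
  return failureCount < 2;
}

// Exponential backoff capped at 8 seconds.
function retryDelay(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 8000);
}
```

These could be passed as `retry` and `retryDelay` in the QueryClient defaults. The timeout row is separate: Axios accepts a `timeout` option in milliseconds when the instance is created, and the value to use is a product decision.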

10. isLoading vs isFetching (UI coupling)

Pattern problem: wiring isFetching from a list query into controls that are conceptually independent (export, filters, “new application”, pagination) causes bugs that localhost hides (fast requests → flicker too quick to see) and cloud exposes (slow polls keep isFetching true → controls look “stuck refreshing”).

Rules of thumb:

  • isLoading (no cached data yet) is usually safe for gating skeletons or first-load UI.
  • isFetching should almost never disable user-initiated actions; use a subtle indicator or local loading only for that action (e.g. export-only state).

Action: audit every consumer of ["applications", ...] (and similar list keys) for isFetching / isLoading. Consider a lint rule or review checklist: if a button is disabled on isFetching, require an inline justification.
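The rules of thumb can be captured in one derivation function, so components do not wire query flags directly into controls. This is an illustrative sketch: `deriveListUiState` and the field names are hypothetical, and `exportInFlight` stands for an export-only loading state owned by the export action itself.

```typescript
// The two flags a list query exposes that matter here.
interface QueryFlags {
  isLoading: boolean; // no cached data yet
  isFetching: boolean; // any request in flight, including background polls
}

interface ListUiState {
  showSkeleton: boolean;
  showRefreshHint: boolean;
  disableExport: boolean;
}

function deriveListUiState(query: QueryFlags, exportInFlight: boolean): ListUiState {
  return {
    // Gate first-load UI on isLoading only.
    showSkeleton: query.isLoading,
    // Background refetches get a subtle hint, never a disabled control.
    showRefreshHint: query.isFetching && !query.isLoading,
    // The export button follows its own state, not the list query.
    disableExport: exportInFlight,
  };
}
```

With this shape, a slow background poll changes only `showRefreshHint`; filters, pagination, and export stay interactive.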

11. Query key stability

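If filters is an object literal created in render ({ status, page, q }), its reference changes every render. TanStack Query hashes query keys by value, so a new reference with identical contents still maps to the same cache entry. The instability to watch for is in the contents: unnormalized values ("" vs. undefined, untrimmed search text), a search string that changes on every keystroke, or non-serializable values in the key each produce a new key → extra requests, refetch on keystrokes, and one cache entry per variation.

Mitigations:

  • Normalize and debounce filters before they enter the key; useMemo keyed by primitive fields still helps effects and memoized children that depend on the filters object, or
  • Prefer primitive keys: ["applications", status, page, q, ...], which are verbose but serializable and easy to debug.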

Encode the chosen rule in team TanStack Query conventions.
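TanStack Query's documentation describes query keys as hashed deterministically by value. The sketch below is a simplified re-implementation for illustration only (not the library's code) that shows why key contents, rather than object identity, decide whether two keys match. `normalizeFilters` and the filter fields are hypothetical.

```typescript
// Simplified value-based hash: JSON.stringify with plain-object keys sorted.
function hashKey(key: readonly unknown[]): string {
  return JSON.stringify(key, (_k, value: unknown) => {
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      const source = value as Record<string, unknown>;
      const sorted: Record<string, unknown> = {};
      for (const name of Object.keys(source).sort()) sorted[name] = source[name];
      return sorted;
    }
    return value;
  });
}

interface ApplicationFilters {
  status?: string;
  page?: number;
  q?: string;
}

interface NormalizedFilters {
  status: string | undefined;
  page: number;
  q: string | undefined;
}

// Collapse equivalent inputs ("" vs. undefined, stray whitespace, missing page)
// into one canonical shape before the filters enter a query key.
function normalizeFilters(filters: ApplicationFilters): NormalizedFilters {
  const q = filters.q?.trim();
  return {
    status: filters.status ? filters.status : undefined,
    page: filters.page ?? 1,
    q: q ? q : undefined,
  };
}
```

Two filter objects with the same contents hash identically regardless of property order, while `{ q: "  " }` and `{}` only match after normalization.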

12. Centralize polling constants

Intervals such as 10s, 30s, 60s scattered across files are hard to tune for load tests or incidents.

Target module (example):

```typescript
// fe0/src/shared/config/polling.ts
export const POLL_INTERVALS = {
  adminInbox: 10_000,
  notificationsCount: 60_000,
  notificationsList: 60_000,
  adminOverview: 30_000,
  aiHealth: 30_000,
} as const;
```

Optionally drive values from env later without touching every callsite.
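An env-driven override could be layered on top of the constants without touching call sites. This is a minimal sketch: the `VITE_POLL_<NAME>_MS` naming scheme, the 1000 ms floor, and `resolvePollInterval` are assumptions, and the env object is passed in explicitly so the function does not depend on a particular bundler.

```typescript
const POLL_INTERVALS = {
  adminInbox: 10_000,
  notificationsCount: 60_000,
  notificationsList: 60_000,
  adminOverview: 30_000,
  aiHealth: 30_000,
} as const;

type PollName = keyof typeof POLL_INTERVALS;

// Hypothetical override scheme: VITE_POLL_ADMININBOX_MS=30000, and so on.
function resolvePollInterval(
  name: PollName,
  env: Record<string, string | undefined>,
): number {
  const raw = env[`VITE_POLL_${name.toUpperCase()}_MS`];
  const parsed = raw === undefined ? Number.NaN : Number(raw);
  // Ignore missing, non-numeric, or implausibly small overrides.
  const valid = Number.isFinite(parsed) && parsed >= 1000;
  return valid ? parsed : POLL_INTERVALS[name];
}
```

Call sites would ask for `resolvePollInterval("adminInbox", env)` once at module load, so a load test or incident can change one variable instead of editing components.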

13. Phased implementation order

Pragmatic sequencing when work must land incrementally (from stability review):

  1. First: one App.tsx, explicit QueryClient.defaultOptions, CI/ESLint guard against the removed path (§7).
  2. Next: isFetching audit; visibility-aware polling helper; replace admin inbox "always" with true + staleTime (§8, §10).
  3. Then: centralize POLL_INTERVALS; document and implement timeout / retry / auth behavior (§9, §12); verify query key stability (§11).
  4. Horizon: unify admin + council refresh: invalidation primary, slow safety-net poll (§3); SSE only if realtime becomes a product requirement.

14. Quick file map

| File | Role |
| --- | --- |
| fe0/src/pages/Dashboard.tsx | Role-based dashboard shell; wires admin inbox list. |
| fe0/src/components/admin/ApprovedApplicationsList.tsx | Admin /api/applications query; 10s poll, focus "always" today; targets in §8, §10, §11. |
| fe0/src/components/council/ApprovedApplicationsList.tsx | Council list; invalidates on report sync; unify with §3. |
| fe0/src/components/notifications/NotificationBell.tsx | Unread count; 60s polling. |
| fe0/src/components/notifications/NotificationManager.tsx | Notification list + invalidates unread count query. |
| fe0/src/lib/userNotificationsApi.ts | HTTP helper for unread count. |
| fe0/src/shared/api/client.ts | Axios instance; dev logging (§6, §9). |
| fe0/src/App.tsx | QueryClientProvider + router (actual entry today). |
| fe0/src/app/App.tsx | Alternate shell; remove as part of §7. |

15. Local machine vs cloud server (why behavior can look different)

The admin inbox polling interval is not environment-specific in code: refetchInterval: 10s runs the same in dev, local production builds, and cloud deploys. If the admin dashboard is open and focused, you should see the same intent (repeated GET /api/applications) everywhere.

What often differs is how noticeable that is.

Higher latency on the cloud

On a remote host, each poll typically spends longer in flight. While a query is in progress, TanStack Query sets isFetching === true for that query.

  • Localhost: UI tied to isFetching may flicker too fast to see.
  • Cloud: the same coupling looks like a steady “refresh” problem (§10).

The export button was stabilized by giving it an export-only loading state, so it no longer follows list refetches; slow networks still poll at the same rate, but the control stays calm.

Dev vs production logging

  • Local (vite dev): per-response success logs can make the console look very busy. That is usually extra logging, not extra requests; production runs the same code paths.
  • Cloud (typical production build): those success logs are off; use Network in DevTools to see polling.

Deployment or asset skew

If the server serves an older bundle (cached index.html/assets, wrong image, or different branch), behavior can diverge from your laptop until deploys and caches align.

Tab visibility and throttling

Browsers may throttle timers for background tabs. Testing with the dashboard tab in the background locally can make polls appear rarer than when the tab is focused. Visibility-aware polling (§8) makes behavior match operator expectations and reduces waste.

How to verify locally

Open the admin inbox, keep the tab focused, wait 15–20 seconds, and watch Network for repeating GET /api/applications (same pattern as cloud).


What to preserve from the current design

  • placeholderData: (previous) => previous to limit table flicker.
  • Invalidating notifications-unread-count after mutations rather than waiting for the next poll.
  • A single shared apiClient: the work above layers policy on top of it rather than replacing it.
  • Documenting local-vs-cloud differences (latency, logging, isFetching) as institutional knowledge.

Update this doc when refetchInterval / focus policies change, App entrypoints are consolidated, or admin/council refresh strategies are unified.