A practical deep dive into the RAG architecture, retrieval strategies, and streaming UX behind Clarity’s “earnings intelligence” answers.
Clarity is designed to behave like a disciplined analyst: it retrieves sources first, answers using only the retrieved evidence, and streams progress in real time.
Clarity’s core idea is simple: route questions to the right evidence source, keep tool usage bounded, and stream progress so the system feels responsive and debuggable.
User → /api/chat/stream (SSE) → validate input → infer intent + ticker/timeframe → run tools (financial JSON + transcript retrieval) → compile grounded context → generate answer (stream tokens) → emit metrics (TTFT, total time, tool timings, retrieval stats)
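In code, that flow maps onto a standard SSE response. Here's a minimal sketch of the endpoint shape (the framework wiring and handler body are assumptions, not Clarity's actual implementation):

```js
// Sketch: SSE endpoint shape for /api/chat/stream.
// The route-handler convention and pipeline steps shown as comments
// are assumptions based on the flow described above.
export async function POST(req) {
  const { question } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      // SSE frames: "data: <json>\n\n"
      const send = (event) =>
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(event)}\n\n`));

      send({ type: "status", message: "Analyzing your question..." });
      // ...validate input, infer intent, run tools, stream tokens as
      // {type:"content"} events, then emit {type:"metrics"}...
      send({ type: "end" });
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}
```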
Clarity optimizes for trust. The prompt and tool layer enforce a simple constraint: if a number or factual claim isn’t present in tool output, it shouldn’t be stated. Practically, this means you’ll sometimes see “Not found in provided sources”—and that’s intentional.
What “good” looks like: every claim is anchored to retrieved context or a tool result; numeric values come from structured tools (no guessing); and sources/tool traces are visible in the UI.
Why you might see “Not found”: the requested quarter/year isn’t present in the dataset, the question is underspecified (missing ticker or timeframe), or the metric doesn’t exist in the structured JSON.
How to ask for numbers (best practice): include a ticker and timeframe (example: “AAPL latest quarter revenue and gross margin”). For strategy questions, include a focus area (example: “NVDA Blackwell demand + supply constraints”).
Clarity answers questions from two complementary sources. This separation is intentional: financial answers often require exact numbers, while strategy answers require grounded narrative context.
Financial metrics lane: use this for exact metrics (revenue, EPS, margins, segment figures). Deterministic retrieval reduces extraction errors and hallucination pressure. Data: `data/financials/`; tools: `get_financial_metrics`, `get_multi_quarter_metrics`.

Transcript lane: use this for strategy, guidance, risks, and product positioning. Retrieval is chunk-based and metadata-filtered to keep the context grounded and auditable. Data: `data/transcripts/`; tool: `search_earnings_transcript`.
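Under the hood, the transcript lane amounts to a metadata-filtered hybrid vector query. A minimal sketch (the `index` handle and `embed` function are assumptions; the query shape mirrors common vector-DB APIs):

```js
// Sketch: metadata-filtered hybrid transcript search.
// `index` is a vector-DB index handle and `embed` a dense-embedding
// function; both are assumed, not Clarity's actual client code.
async function searchEarningsTranscript(index, embed, { query, ticker, quarter }) {
  const dense = await embed(query);   // semantic similarity
  const sparse = vectorize(query);    // BM25-style exact-term matching (see below)

  return index.query({
    vector: dense,
    sparseVector: sparse,
    topK: 8,
    // Metadata filter keeps evidence scoped to the right company/period,
    // which is what makes the retrieved context auditable.
    filter: { ticker, ...(quarter ? { quarter } : {}) },
    includeMetadata: true,
  });
}
```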
Different companies have different fiscal calendars, so “FY2025” doesn’t always line up across tickers. For “latest/recent/current” questions, Clarity prefers fetching the most recent available quarter per ticker.

How to ask for clean comparisons: if you want the latest quarter per company, say “latest” explicitly. If you want apples-to-apples comparisons, specify an exact fiscal quarter/year (or constrain the analysis window) so the system doesn’t silently compare mismatched periods.
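In code terms, “latest” can be resolved per ticker before any comparison runs. A minimal sketch (`getAvailableQuarters` is a hypothetical helper over `data/financials/`):

```js
// Sketch: resolve "latest" independently per ticker instead of assuming
// fiscal periods align. getAvailableQuarters is a hypothetical helper.
async function resolveLatestQuarter(ticker) {
  const quarters = await getAvailableQuarters(ticker); // e.g. [{ fy: 2025, q: 3 }, ...]
  return quarters.sort((a, b) => b.fy - a.fy || b.q - a.q)[0];
}

// "Latest quarter for AAPL and NVDA" may resolve to different fiscal
// periods per ticker -- by design, not by accident.
```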
Clarity combines dense vectors (semantic similarity) with sparse vectors (keyword matching). This catches both conceptual matches (“AI strategy”) and exact terms (“Q3 FY2025”, “gross margin”, “Blackwell”).
Dense vectors: capture semantic meaning and work well for thematic questions, but they’re weak at exact quarter/term matching.
Sparse vectors (BM25-style): catch exact tokens (tickers, quarters, product names) and improve precision for metrics and dates. They complement dense retrieval rather than replacing it.
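If the dense and sparse retrievals run as separate queries, their ranked lists need to be fused. Reciprocal rank fusion is one common option (whether Clarity fuses this way, rather than issuing a single hybrid query, is an assumption):

```js
// Sketch: fuse dense and sparse result lists with reciprocal rank fusion.
// Documents ranked highly by either retriever float to the top.
function fuseResults(denseHits, sparseHits, k = 60) {
  const scores = new Map();
  for (const hits of [denseHits, sparseHits]) {
    hits.forEach((hit, rank) => {
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}
```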
```js
// BM25-style sparse vector generation (conceptual)
const boostTerms = {
  revenue: 1.5, margin: 1.5, growth: 1.5,
  earnings: 1.5, guidance: 1.5, outlook: 1.5
};

// Lowercase, split on non-word characters, drop empties.
const tokenize = (text) => text.toLowerCase().split(/\W+/).filter(Boolean);

// Map a term to a stable sparse-vector index with a simple hash.
const hashTerm = (term) =>
  [...term].reduce((h, c) => (h * 31 + c.charCodeAt(0)) >>> 0, 0);

function vectorize(text) {
  // Raw term frequencies over the token stream.
  const tf = {};
  for (const token of tokenize(text)) tf[token] = (tf[token] ?? 0) + 1;
  // Boost finance-specific vocabulary so metric language ranks higher.
  for (const [term, boost] of Object.entries(boostTerms)) {
    if (tf[term]) tf[term] *= boost;
  }
  // Sparse format: parallel arrays of term indices and weights.
  const indices = Object.keys(tf).map(hashTerm);
  const values = Object.values(tf);
  return { indices, values };
}
```

The model doesn’t “browse the whole dataset.” Instead it calls a small set of tools. Tool selection is driven by intent: numeric questions pull structured metrics; narrative questions search transcripts (a routing sketch follows the table below).
| Query type | Typical tools | Evidence source |
|---|---|---|
| Exact metric lookup | get_financial_metrics | Structured financial JSON |
| Trends (“last 4 quarters”) | get_multi_quarter_metrics | Structured financial JSON |
| Strategy / guidance / narrative | search_earnings_transcript | Transcript chunks (vector search) |
| Growth deltas (YoY/QoQ) | compute_growth_rate | Computed from structured metrics |
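A minimal routing sketch over the table above (the intent classifier itself is assumed to run upstream):

```js
// Sketch: intent-driven tool selection. Tool names come from the table
// above; the intent shape is an assumption.
function selectTools(intent) {
  switch (intent.kind) {
    case "metric":    return ["get_financial_metrics"];
    case "trend":     return ["get_multi_quarter_metrics"];
    case "growth":    return ["get_multi_quarter_metrics", "compute_growth_rate"];
    case "narrative": return ["search_earnings_transcript"];
    default:          return []; // underspecified -> ask for ticker/timeframe
  }
}
```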
Clarity uses Server-Sent Events (SSE) so the UI can show progress and stream tokens as they’re generated. This reduces “dead time” and makes the pipeline observable.
| Event type | Meaning |
|---|---|
| metadata | Request ID + dataset freshness |
| status | Human-readable progress (“Analyzing…”) |
| tool_start | A tool began running |
| tool_result | Tool output + latency |
| content | Text tokens/chunks |
| metrics | TTFT, total time, retrieval stats |
| end | Stream complete |

An example event stream:

```
data: {"type":"status","message":"Analyzing your question..."}
data: {"type":"tool_start","tool":"get_multi_quarter_metrics","id":"auto-financials"}
data: {"type":"tool_result","tool":"get_multi_quarter_metrics","success":true,"latencyMs":312}
data: {"type":"content","content":"Here’s the trend..."}
data: {"type":"metrics","metrics":{"timeToFirstTokenMs":16500,"totalTimeMs":20900}}
data: {"type":"end"}Clarity leans on production patterns that keep responses predictable: validated inputs, bounded tool usage, and explicit failure states (including transparent “not found” answers).
Clarity leans on production patterns that keep responses predictable: validated inputs, bounded tool usage, and explicit failure states (including transparent “not found” answers).

Guardrails: inputs are validated before any tool runs; tool usage is bounded per request; numbers come only from structured tool output; and missing evidence yields an explicit “Not found in provided sources” instead of a guess.

Troubleshooting tips: if you hit “Not found,” check that the quarter/year exists in the dataset and that your question names a ticker and timeframe; for comparisons, say “latest” explicitly or specify an exact fiscal quarter/year so periods aren’t silently mismatched.
These are the concrete performance improvements documented during development (see the project’s improvement notes).
| Metric | Before | After | Change |
|---|---|---|---|
| Time to first token (TTFT) | 22.8s | 16.5s | 28% faster |
| Total response time | 30.0s | 20.9s | 30% faster |
| Retrieval time | 1716ms | 985ms | ~43% faster |