
How Clarity 3.0 Works

A practical deep dive into the RAG architecture, retrieval strategies, and streaming UX behind Clarity’s “earnings intelligence” answers.

What happens when you ask a question?

Clarity is designed to behave like a disciplined analyst: it retrieves sources first, answers using only the retrieved evidence, and streams progress in real time.

Runtime flow (high level)

  1. Validate request (input schema + limits).
  2. Detect likely ticker + intent (numbers vs narrative).
  3. Retrieve evidence (tools: financial metrics + transcript search).
  4. Assemble a grounded context block with citations.
  5. Generate an answer and stream tokens to the UI.
  6. Emit metrics (latencies, retrieval stats) at the end.
Important: Clarity intentionally limits tool usage per request to keep latency predictable and to avoid “runaway agent” behavior.
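
A minimal sketch of what such a bound might look like (answerWithBoundedTools, model.nextStep, and MAX_TOOL_CALLS are illustrative names and values, not Clarity’s actual orchestration code):

// Illustrative only: a capped tool loop, not Clarity's actual code.
const MAX_TOOL_CALLS = 3; // assumed bound; the real limit isn't documented here

async function answerWithBoundedTools(question, model, tools) {
  const evidence = [];
  for (let i = 0; i < MAX_TOOL_CALLS; i++) {
    // Ask the model what it needs next, given the evidence gathered so far
    const step = await model.nextStep({ question, evidence });
    if (step.type !== "tool_call") break;   // model is ready to answer
    const tool = tools[step.name];
    if (!tool) break;                        // unknown tool: stop rather than loop
    evidence.push({ tool: step.name, result: await tool(step.args) });
  }
  // The final answer is generated only from the gathered evidence
  return model.answer({ question, evidence });
}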

What you see in the UI

  • Status updates like “Analyzing…” / “Searching…” to reduce dead time.
  • Retrieved sources panel showing which chunks/metrics were used.
  • Streaming response (token-by-token) instead of waiting for a full block.
  • Pipeline metrics (e.g., time-to-first-token, tool latency, avg score).
The goal is simple: you should be able to answer “why did I get this answer?” by looking at sources and tool outputs, without guessing what happened inside the model.

Technical deep dive

System architecture (end-to-end)

Clarity’s core idea is simple: route questions to the right evidence source, keep tool usage bounded, and stream progress so the system feels responsive and debuggable.


User → /api/chat/stream (SSE)
  → validate input
  → infer intent + ticker/timeframe
  → run tools (financial JSON + transcript retrieval)
  → compile grounded context
  → generate answer (stream tokens)
  → emit metrics (TTFT, total time, tool timings, retrieval stats)

Grounding & citations: “no evidence, no claim”

Clarity optimizes for trust. The prompt and tool layer enforce a simple constraint: if a number or factual claim isn’t present in tool output, it shouldn’t be stated. Practically, this means you’ll sometimes see “Not found in provided sources”—and that’s intentional.

What “good” looks like: every claim is anchored to retrieved context or a tool result; numeric values come from structured tools (no guessing); and sources/tool traces are visible in the UI.
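
To make that concrete, the grounded context block from step 4 might be assembled roughly like this (buildContext and the result shape are illustrative assumptions, not the actual prompt-assembly code):

// Hypothetical sketch: turn tool results into a citable, numbered context block.
function buildContext(toolResults) {
  const sources = [];
  const lines = toolResults.map((r, i) => {
    sources.push({ id: i + 1, tool: r.tool, ref: r.ref }); // surfaced in the sources panel
    return `[${i + 1}] (${r.tool}) ${r.text}`;
  });
  return {
    contextBlock: lines.join("\n"),
    sources,
    instruction:
      "Answer only from the numbered sources above. " +
      "If the answer is not present, reply: Not found in provided sources.",
  };
}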

Why you might see “Not found”: the requested quarter/year isn’t present in the dataset, the question is underspecified (missing ticker or timeframe), or the metric doesn’t exist in the structured JSON.

How to ask for numbers (best practice): include a ticker and timeframe (example: “AAPL latest quarter revenue and gross margin”). For strategy questions, include a focus area (example: “NVDA Blackwell demand + supply constraints”).

Evidence sources: structured numbers + narrative transcripts

Clarity answers questions from two complementary sources. This separation is intentional: financial answers often require exact numbers, while strategy answers require grounded narrative context.

Structured financial JSON (numbers)

Use this lane for exact metrics (revenue, EPS, margins, segment figures). Deterministic retrieval reduces extraction errors and hallucination pressure.

  • Source: data/financials/
  • Tools: get_financial_metrics, get_multi_quarter_metrics
  • “Latest” handling: resolves to the most recent available quarter per ticker for up-to-date answers.
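
A simplified sketch of this lane (the per-ticker file layout, record field names, and the getFinancialMetrics signature are assumptions for illustration, not the project’s actual schema):

// Hypothetical sketch of the structured-metrics tool.
// Assumes one JSON file per ticker under data/financials/, holding an array of
// quarterly records like { fiscalYear, fiscalQuarter, revenue, eps, grossMargin }.
import { readFile } from "node:fs/promises";

async function getFinancialMetrics({ ticker, quarter = "latest" }) {
  const quarters = JSON.parse(
    await readFile(`data/financials/${ticker}.json`, "utf8"));

  if (quarter === "latest") {
    // "Latest" resolves per ticker: most recent fiscal year, then quarter
    quarters.sort((a, b) =>
      a.fiscalYear - b.fiscalYear || a.fiscalQuarter - b.fiscalQuarter);
    return quarters.at(-1) ?? null; // null rather than a guessed value
  }
  // Exact lookups match a requested period like "Q3 FY2025"
  return quarters.find(
    (q) => `Q${q.fiscalQuarter} FY${q.fiscalYear}` === quarter) ?? null;
}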

Embedded earnings transcripts (narrative)

Use this lane for strategy, guidance, risks, and product positioning. Retrieval is chunk-based and metadata-filtered to keep the context grounded and auditable.

  • Source: data/transcripts/
  • Tool: search_earnings_transcript
  • Retrieval: hybrid search (dense + sparse) to capture both meaning and exact terms.
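
A sketch of the transcript lane, assuming a vector index that accepts a dense vector, a sparse vector, and a metadata filter in a single query (embed, sparseVectorize, and vectorStore are generic stand-ins, not a specific client library):

// Hypothetical sketch of the transcript-search tool.
async function searchEarningsTranscript({ ticker, quarter, query, topK = 5 }) {
  const [dense, sparse] = await Promise.all([
    embed(query),           // semantic (dense) vector
    sparseVectorize(query), // BM25-style { indices, values }
  ]);

  const matches = await vectorStore.query({
    vector: dense,
    sparseVector: sparse,
    topK,
    // Metadata filter keeps retrieval scoped and auditable
    filter: { ticker, ...(quarter ? { quarter } : {}) },
  });

  // Each match keeps its chunk text + metadata so it can be cited in the UI
  return matches.map((m) => ({ text: m.text, score: m.score, meta: m.metadata }));
}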

Fiscal-year handling (and why “latest” is tricky)

Different companies have different fiscal calendars, so “FY2025” doesn’t always line up across tickers. For “latest/recent/current” questions, Clarity prefers fetching the most recent available quarter per ticker.

How to ask for clean comparisons: if you want the latest quarter per company, say “latest” explicitly. If you want apples-to-apples, specify an exact fiscal quarter/year (or constrain the analysis window) so the system doesn’t silently compare mismatched periods.
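
For intuition, the offset can be made explicit. The sketch below hard-codes a few well-known fiscal year-end months and maps a fiscal quarter to the calendar month it starts in; it is a simplification for illustration, not how Clarity stores or resolves periods:

// Illustrative only: map "Qn FYyyyy" to the calendar month/year the quarter starts in.
// The fiscal year-end months below are public facts; the mapping is a simplification.
const fiscalYearEndMonth = { AAPL: 9, MSFT: 6, NVDA: 1 }; // Sep, Jun, late Jan

function fiscalQuarterStart(ticker, fiscalYear, fiscalQuarter) {
  const end = fiscalYearEndMonth[ticker];
  // FYyyyy is labeled by the calendar year in which the fiscal year ends
  const startMonth = (end % 12) + 1;
  const startYear = end === 12 ? fiscalYear : fiscalYear - 1;
  const offset = startMonth - 1 + (fiscalQuarter - 1) * 3; // months since Jan of startYear
  return { month: (offset % 12) + 1, year: startYear + Math.floor(offset / 12) };
}

// fiscalQuarterStart("AAPL", 2025, 1) → { month: 10, year: 2024 }  (Oct 2024)
// fiscalQuarterStart("NVDA", 2025, 1) → { month: 2,  year: 2024 }  (Feb 2024)
// The same "Q1 FY2025" label starts roughly eight months apart, hence the need to be explicit.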

Retrieval strategy: hybrid (dense + sparse)

Clarity combines dense vectors (semantic similarity) with sparse vectors (keyword matching). This catches both conceptual matches (“AI strategy”) and exact terms (“Q3 FY2025”, “gross margin”, “Blackwell”).

Dense vectors: capture semantic meaning and work well for thematic questions, but they’re weak at exact quarter/term matching.

Sparse vectors (BM25-style): catch exact tokens (tickers, quarters, product names) and improve precision for metrics and dates. They complement dense retrieval rather than replacing it.

sparseVectorizer.js (conceptual)
// BM25-style sparse vector generation (conceptual)
const boostTerms = {
  revenue: 1.5, margin: 1.5, growth: 1.5,
  earnings: 1.5, guidance: 1.5, outlook: 1.5
};

// Lowercase word tokenizer
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) || [];

// Raw term counts
function computeTermFrequency(tokens) {
  const tf = {};
  for (const t of tokens) tf[t] = (tf[t] || 0) + 1;
  return tf;
}

// Hash a term to a stable sparse-vector index
const termIndex = (t) => [...t].reduce((h, c) => (h * 31 + c.charCodeAt(0)) >>> 0, 0);

function vectorize(text) {
  const tf = computeTermFrequency(tokenize(text));
  // Boost domain-relevant financial terms
  for (const [term, boost] of Object.entries(boostTerms)) {
    if (tf[term]) tf[term] *= boost;
  }
  // Parallel index/value arrays form the sparse vector
  return { indices: Object.keys(tf).map(termIndex), values: Object.values(tf) };
}
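
How the dense and sparse scores are combined isn’t shown above; a common approach is a weighted sum over normalized scores, sketched here as one possibility rather than Clarity’s actual fusion method:

// Hypothetical weighted fusion of dense and sparse retrieval scores.
// Assumes both hit lists carry scores already normalized to comparable ranges.
// alpha = 1.0 is purely semantic; alpha = 0.0 is purely keyword-based.
function fuseScores(denseHits, sparseHits, alpha = 0.7) {
  const combined = new Map();
  const add = (hits, weight) => {
    for (const { id, score } of hits) {
      combined.set(id, (combined.get(id) || 0) + weight * score);
    }
  };
  add(denseHits, alpha);
  add(sparseHits, 1 - alpha);
  // Highest combined score first
  return [...combined.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}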

Tool orchestration (bounded, evidence-first)

The model doesn’t “browse the whole dataset.” Instead it calls a small set of tools. Tool selection is driven by intent: numeric questions pull structured metrics; narrative questions search transcripts.

Query type                       | Typical tools               | Evidence source
Exact metric lookup              | get_financial_metrics       | Structured financial JSON
Trends (“last 4 quarters”)       | get_multi_quarter_metrics   | Structured financial JSON
Strategy / guidance / narrative  | search_earnings_transcript  | Transcript chunks (vector search)
Growth deltas (YoY/QoQ)          | compute_growth_rate         | Computed from structured metrics
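
The growth lane is plain arithmetic over the structured records; a sketch of what compute_growth_rate might do (the record shape and rounding are assumptions):

// Hypothetical sketch of compute_growth_rate.
// For YoY, `prior` is the same fiscal quarter one year earlier; for QoQ, the
// immediately preceding quarter. Both come from the structured-metrics tools.
function computeGrowthRate(current, prior, metric = "revenue") {
  const cur = current?.[metric];
  const prev = prior?.[metric];
  if (cur == null || prev == null || prev === 0) {
    // Prefer an explicit "not found" over a guessed number
    return { metric, growthPct: null, note: "Not found in provided sources" };
  }
  const growthPct = ((cur - prev) / Math.abs(prev)) * 100;
  return { metric, growthPct: Number(growthPct.toFixed(1)) };
}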

Streaming UX (Server-Sent Events)

Clarity uses Server-Sent Events (SSE) so the UI can show progress and stream tokens as they’re generated. This reduces “dead time” and makes the pipeline observable.
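
Server-side, each pipeline stage can push one small JSON event as it happens. A minimal write helper might look like this (assuming a Node/Express-style response object; this is not the actual handler code):

// Hypothetical SSE helper: one JSON payload per "data:" line, blank-line terminated.
function sseSend(res, event) {
  res.write(`data: ${JSON.stringify(event)}\n\n`);
}

// Sketch of usage inside the /api/chat/stream handler:
//   res.writeHead(200, { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" });
//   sseSend(res, { type: "status", message: "Analyzing your question..." });
//   sseSend(res, { type: "content", content: token });
//   sseSend(res, { type: "end" });
//   res.end();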

Event types (high-level)

metadata     | Request ID + dataset freshness
status       | Human-readable progress (“Analyzing…”)
tool_start   | A tool began running
tool_result  | Tool output + latency
content      | Text tokens/chunks
metrics      | TTFT, total time, retrieval stats
end          | Stream complete

SSE format (example)
data: {"type":"status","message":"Analyzing your question..."}

data: {"type":"tool_start","tool":"get_multi_quarter_metrics","id":"auto-financials"}

data: {"type":"tool_result","tool":"get_multi_quarter_metrics","success":true,"latencyMs":312}

data: {"type":"content","content":"Here’s the trend..."}

data: {"type":"metrics","metrics":{"timeToFirstTokenMs":16500,"totalTimeMs":20900}}

data: {"type":"end"}
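
On the client, a stream in that format can be consumed with fetch and a streaming reader. The sketch below assumes the endpoint accepts a POSTed question and that each event arrives as a `data:` line followed by a blank line (as in the example above); the UI handler names are placeholders:

// Hypothetical client-side consumer for the SSE stream.
async function streamChat(question, onEvent) {
  const res = await fetch("/api/chat/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  });

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Events are separated by blank lines; payload lines start with "data: "
    const events = buffer.split("\n\n");
    buffer = events.pop(); // keep any partial event for the next chunk
    for (const evt of events) {
      const line = evt.split("\n").find((l) => l.startsWith("data: "));
      if (line) onEvent(JSON.parse(line.slice(6)));
    }
  }
}

// Usage sketch:
//   streamChat("AAPL latest quarter revenue", (e) => {
//     if (e.type === "content") appendToken(e.content);   // placeholder UI hooks
//     if (e.type === "metrics") showMetrics(e.metrics);
//   });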

Reliability & operational guardrails

Clarity leans on production patterns that keep responses predictable: validated inputs, bounded tool usage, and explicit failure states (including transparent “not found” answers).

Guardrails:

  • Input validation and size limits
  • Tool loop bounds (prevents runaway agent loops)
  • Grounding rules that prefer refusal over hallucination
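
As a flavor of the first guardrail, request validation can fail fast with an explicit error before any tool or model call; the limits and field names below are illustrative, not the project’s actual schema:

// Hypothetical request validation with explicit size limits.
const MAX_QUESTION_LENGTH = 2000; // assumed limit

function validateChatRequest(body) {
  const errors = [];
  if (typeof body?.question !== "string" || body.question.trim() === "") {
    errors.push("question must be a non-empty string");
  } else if (body.question.length > MAX_QUESTION_LENGTH) {
    errors.push(`question exceeds ${MAX_QUESTION_LENGTH} characters`);
  }
  // Fail fast with an explicit error state instead of guessing intent
  return errors.length
    ? { ok: false, errors }
    : { ok: true, question: body.question.trim() };
}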

Troubleshooting tips:

  • If an answer is thin, add ticker + quarter/year
  • If you want metrics, name the metric explicitly (“gross margin”, “EPS”)
  • If a provider is overloaded, retry (or the system may fall back to data-only answers)

Measured performance

These are the concrete performance improvements documented during development (see the project’s improvement notes).

Metric                      | Before | After  | Change
Time to first token (TTFT)  | 22.8s  | 16.5s  | 28% faster
Total response time         | 30.0s  | 20.9s  | 30% faster
Retrieval time              | 1716ms | 985ms  | ~42% faster