How it works - Hybrid retrieval, source-attributed, no summarising layer

Dense plus sparse plus re-ranked. Every chunk is returned with its source document, page, and snippet, so the AI client — and the human reading along — can always verify the answer.

Topic: Agentic retrieval
Pipeline: Dense + sparse + re-rank
Output: Source-attributed chunks

Three stages, in order

Hybrid retrieval is not a marketing phrase. It’s three concrete stages, each doing a job the others can’t.

Dense retrieval uses vector embeddings to find chunks whose meaning matches the query. It handles paraphrase, synonyms, and the case where the user has no idea what the document calls the thing they’re asking about.
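
As a rough illustration of the dense stage, the sketch below ranks chunks by cosine similarity between a query vector and chunk vectors. The three-dimensional vectors, the chunk IDs, and the `dense_search` helper are invented for the example. A real deployment would get its vectors from an embedding model, and nothing here describes Laminae's actual implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dense_search(query_vec, chunk_vecs, k=2):
    """Return the k chunk IDs whose vectors are closest to the query."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
CHUNK_VECS = {
    "contract.pdf#p4": [0.9, 0.1, 0.0],   # text about ending an agreement
    "handbook.pdf#p12": [0.1, 0.9, 0.1],  # text about holiday allowance
    "invoice.pdf#p1": [0.0, 0.2, 0.9],    # text about payment amounts
}

# Pretend this is the embedding of "how do I cancel the deal?"
query_vec = [0.8, 0.2, 0.1]
top = dense_search(query_vec, CHUNK_VECS, k=2)
```

The query shares no keywords with the contract chunk, yet it ranks first because the two vectors point in nearly the same direction. That is the paraphrase case the dense stage is there to handle.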

Sparse retrieval is the classic keyword search you already know. It finds chunks whose exact words match the query, and it catches the things dense retrieval misses: identifiers, product codes, legal citations, anything where the exact string matters more than the gist.
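
This page names BM25 as the keyword method. The sketch below is a minimal Okapi BM25 scorer using commonly quoted defaults (k1 = 1.5, b = 0.75). The tokenizer, chunk IDs, and sample text are assumptions made for the example, not Laminae's configuration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep letters, digits, and hyphens so codes stay whole."""
    return re.findall(r"[a-z0-9\-]+", text.lower())


def bm25_scores(query, chunks, k1=1.5, b=0.75):
    """Score every chunk against the query with Okapi BM25."""
    docs = {cid: tokenize(text) for cid, text in chunks.items()}
    n = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    scores = {}
    for cid, tokens in docs.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores[cid] = score
    return scores


# Hypothetical chunks: two near-identical part numbers and an unrelated FAQ.
CHUNKS = {
    "spec.pdf#p3": "Part number AX-4471 requires a 12V supply.",
    "spec.pdf#p9": "Part number AX-4417 is rated for outdoor use.",
    "faq.pdf#p1": "Contact support for supply questions.",
}
scores = bm25_scores("AX-4471", CHUNKS)
best = max(scores, key=scores.get)
```

Only the chunk containing the exact code scores above zero. An embedding model could easily treat AX-4471 and AX-4417 as near neighbours, which is why the sparse stage earns its place.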

A re-ranker takes the combined candidate set from both stages and decides which of those chunks actually answer the question. This is where most “RAG demos” fall over: they skip the re-rank, ship the top-k vector results, and the answer quality collapses on anything more nuanced than a paraphrase.
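
A minimal sketch of the merge-then-re-rank step follows. The source does not say which re-ranking model Laminae uses, so `overlap_score` is a deliberately crude stand-in (the fraction of query words found in the chunk). A production re-ranker would typically be a learned model that reads the query and chunk together. All names and sample texts are hypothetical.

```python
def merge_candidates(dense_hits, sparse_hits):
    """Union the two candidate lists, remembering which stage found each chunk."""
    merged = {}
    for stage, hits in (("dense", dense_hits), ("sparse", sparse_hits)):
        for chunk_id in hits:
            merged.setdefault(chunk_id, set()).add(stage)
    return merged


def overlap_score(query, text):
    """Stand-in relevance score: share of query words present in the chunk."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(query, candidates, texts, score_fn=overlap_score, k=3):
    """Score every merged candidate against the query and keep the best k."""
    scored = [
        (chunk_id, score_fn(query, texts[chunk_id]), sorted(stages))
        for chunk_id, stages in candidates.items()
    ]
    scored.sort(key=lambda row: row[1], reverse=True)
    return scored[:k]


# Hypothetical chunk texts and stage outputs.
TEXTS = {
    "a": "notice period for termination is 30 days",
    "b": "termination of employment policy overview",
    "c": "clause 14.2 sets the notice period",
}
candidates = merge_candidates(dense_hits=["a", "b"], sparse_hits=["c", "a"])
ranked = rerank("notice period for termination", candidates, TEXTS)
```

Passing the scorer in as `score_fn` keeps the merge logic independent of whichever model does the judging, and carrying the stage names through the ranking is what later makes a per-query trace possible.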

Evidence, not narrative

There is no summarising layer in the middle of Laminae. When the MCP client asks for relevant context, what comes back is the chunks themselves — each one stamped with its source document, page, and snippet.
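
To make the shape of that response concrete, here is one way a source-attributed chunk could be modelled. The class name, field names, and JSON layout are assumptions for illustration. The source specifies only that each chunk carries its source document, page, and snippet.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AttributedChunk:
    """A retrieved chunk plus the provenance needed to verify it."""
    source_document: str
    page: int
    snippet: str  # verbatim text from the source, never a paraphrase


def to_payload(chunks):
    """Serialise chunks as JSON for whatever client asked for context."""
    return json.dumps([asdict(chunk) for chunk in chunks], indent=2)


# Hypothetical example data.
chunks = [
    AttributedChunk(
        source_document="supplier-agreement.pdf",
        page=4,
        snippet="Either party may terminate with 30 days written notice.",
    )
]
payload = to_payload(chunks)
```

There is no field for a generated summary, so the structure itself enforces the rule that only original text travels to the client.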

That is intentional. The moment retrieval starts paraphrasing the source, provenance starts dying quietly. Source attribution either lines up to a specific span of text in a specific document or it doesn’t mean anything — and once an LLM has summarised it, it doesn’t.
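
The point about spans can be checked mechanically. In the sketch below, a hypothetical `locate_snippet` helper looks for a snippet inside the page text it claims to come from. A verbatim snippet resolves to character offsets, while a paraphrase resolves to nothing. The sample text is invented.

```python
def locate_snippet(page_text, snippet):
    """Return (start, end) offsets of the snippet in the page, or None."""
    start = page_text.find(snippet)
    if start == -1:
        return None
    return (start, start + len(snippet))


# Hypothetical page text and two candidate snippets.
page_text = "The supplier may terminate with 30 days written notice."
verbatim = "terminate with 30 days"
paraphrase = "cancel after a month"

verbatim_span = locate_snippet(page_text, verbatim)
paraphrase_span = locate_snippet(page_text, paraphrase)
```

The paraphrase may be a fair reading of the clause, but it cannot be pinned to a span, so it cannot serve as attribution.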

What this enables

Because every chunk that reaches the AI client carries its provenance, the client can cite the source in its answer, the user can verify against the original document, and your legal team can audit the trail months later. None of that is possible with a black-box “ask my data” system.

It also means a bad answer can be debugged. Each query emits a retrieval trace: which chunks came from which stage, what scores the re-ranker assigned, what the AI client ultimately used. When something goes wrong, you can see exactly where. We can use this concrete evidence to make retrieval even better over time.
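
A retrieval trace of that kind might look like the sketch below. The class names, field names, and scores are hypothetical. The source states only that a trace records which stage produced each chunk, the re-ranker's score, and what the client ultimately used.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TraceEntry:
    chunk_id: str
    stages: list            # e.g. ["dense"], ["sparse"], or both
    rerank_score: float
    used_by_client: bool = False


@dataclass
class RetrievalTrace:
    query: str
    entries: list = field(default_factory=list)

    def unused(self):
        """Chunk IDs that were retrieved but not used by the client."""
        return [e.chunk_id for e in self.entries if not e.used_by_client]


# Hypothetical trace for a single query.
trace = RetrievalTrace(query="notice period for termination")
trace.entries.append(
    TraceEntry("contract.pdf#p4", ["dense", "sparse"], 0.92, used_by_client=True)
)
trace.entries.append(TraceEntry("handbook.pdf#p12", ["dense"], 0.31))

trace_json = json.dumps(asdict(trace), indent=2)
```

With a record like this, a bad answer can be narrowed down quickly: the right chunk was never retrieved, it was retrieved but scored low, or it was returned and the client ignored it.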

What this is not

  • Not a chat UI
  • Not a RAG wrapper
  • Not fine-tuning
  • Not summarising
  • Not real-time streaming
  • Not tied to one AI client (client-agnostic)

If you can't trace an answer back to a specific chunk in a specific document, it shouldn't have been part of the answer in the first place.

David Bunting, Founder, Laminae

Vector retrieval: Dense
BM25 keyword: Sparse
Final filter: Re-rank
Every query: Traced

More on how Laminae works

Managed cloud, self-hosted, or air-gapped — same product, your call

Run Laminae as a managed per-tenant deployment, or take the same Docker Compose stack and run it on your own infrastructure. The wedge is open-standard, so the choice never locks you in.

Upload to a bucket, expose it as an MCP server, done

Buckets are managed, vectorised knowledge stores. MCP servers are how your AI client talks to them. Non-technical teams can spin up both in an afternoon — no data team required.

Want to see Laminae on your own documents?

Based and hosted in Frankfurt am Main, Germany