The compliance case for self-hosting your knowledge layer

by David Bunting, Founder

1. Where does the data actually live?

This is the first question every legal team asks, and the one most retrieval products fudge. Embeddings are still your data: reconstructing the source text from them is difficult, but embedding-inversion attacks show it is not impossible. Chunks are definitely your data. Query logs are your data. If any of those leave the building, the answer to “where does it live” is no longer “your network.”

Self-hosting collapses the question. With Laminae running inside your own infrastructure, the embeddings, the chunks, and the query logs all stay there. An air-gapped deployment goes a step further: the bundled Ollama runtime replaces the cloud model endpoints, so even the inference call doesn’t leave the network.
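One way to make “nothing leaves the network” checkable is to audit the endpoints a deployment is configured to call. The sketch below is illustrative only: the endpoint names and addresses are assumptions, not Laminae configuration, and the only real-world detail is that Ollama listens on port 11434 by default. It flags any endpoint whose host is not a loopback or private address.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical endpoint map; the keys and URLs are illustrative assumptions,
# not actual Laminae configuration.
ENDPOINTS = {
    "vector_store": "http://10.0.4.12:6333",
    "query_log": "http://10.0.4.13:9200",
    "inference": "http://127.0.0.1:11434",  # Ollama's default local port
}


def is_internal(url: str) -> bool:
    """Return True if the URL's host is localhost or a private/loopback IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name cannot be verified without resolving it,
        # so this sketch conservatively treats it as external.
        return False
    return addr.is_private or addr.is_loopback


def external_endpoints(endpoints: dict[str, str]) -> list[str]:
    """Names of endpoints that would send data outside the network."""
    return sorted(name for name, url in endpoints.items() if not is_internal(url))


if __name__ == "__main__":
    leaks = external_endpoints(ENDPOINTS)
    print("external endpoints:", leaks or "none")
```

A check like this is no substitute for network-level egress controls, but it gives legal and security teams a list they can read.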

Key deadline

Most of the EU AI Act’s obligations start to bite on 2 August 2026. Prohibited practices have been banned since 2 February 2025. If the knowledge layer underneath your AI clients isn’t mapped yet, mapping it is the cheapest piece of preparation you can still do.

2. Can you trace an answer back to a document?

The second question, every time: if a regulator or a customer asks why your AI told them something, can you point to the source? A summarising retrieval layer breaks the chain here. Once an LLM has paraphrased the source, the citation is a claim, not a trace.

Laminae handles this by not summarising in the middle. Retrieval returns the chunks themselves — each one carrying its source document, page, and snippet — and the AI client uses them as evidence. The chain from “answer” to “exact span in exact document” stays intact, which is what an audit actually needs.
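To make the difference between a trace and a claim concrete, here is a minimal sketch of a chunk that carries its provenance, plus a check that its snippet appears verbatim in the cited page. The `Chunk` fields mirror the three items named above (source document, page, snippet); the class, the `locate` function, and the in-memory `pages` mapping are hypothetical and are not Laminae’s API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    """A retrieved chunk with its provenance (hypothetical shape)."""
    document: str
    page: int
    snippet: str


def locate(chunk: Chunk, pages: dict[tuple[str, int], str]) -> Optional[tuple[int, int]]:
    """Return the (start, end) character span of the snippet on its cited page.

    Returns None if the page is unknown or the snippet is not present verbatim,
    which is exactly what happens once text has been paraphrased.
    """
    text = pages.get((chunk.document, chunk.page))
    if text is None:
        return None
    start = text.find(chunk.snippet)
    if start == -1:
        return None
    return start, start + len(chunk.snippet)


if __name__ == "__main__":
    pages = {("policy.pdf", 3): "Records are retained for seven years after contract end."}
    verbatim = Chunk("policy.pdf", 3, "retained for seven years")
    paraphrase = Chunk("policy.pdf", 3, "kept for 7 years")
    print(locate(verbatim, pages))    # a character span on the page
    print(locate(paraphrase, pages))  # None: the citation cannot be verified
```

The verbatim chunk resolves to an exact span; the paraphrased one does not, which is the gap an auditor would find.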

That’s also why the product doesn’t ship a chat surface. The AI client is the right place for the answer to be generated and cited; the knowledge layer is the right place for the evidence to come from. Mixing those concerns is what makes provenance brittle.

3. Who can you turn off, and how fast?

The third question is the operational one: if a control fails, can you cut access cleanly? With a SaaS knowledge product, “turning off” means filing a ticket and hoping the vendor responds quickly. With a self-hosted Laminae deployment, it means revoking a token on a bucket, or stopping a container.

Per-bucket auth helps here. Each bucket has its own access list and its own MCP endpoint, so a control failure in one domain doesn’t cascade to the others. You can rotate keys, retire a bucket, or pull an entire MCP server offline without touching anything else.
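The isolation described here can be sketched as a token registry keyed by bucket. This is an illustration under stated assumptions: `BucketAuth` and its methods are hypothetical names, tokens are held in memory for brevity (a real deployment would more likely store hashes), and none of it reflects how Laminae implements its access lists.

```python
import hmac
import secrets


class BucketAuth:
    """Hypothetical per-bucket token registry: each bucket has its own access list."""

    def __init__(self) -> None:
        self._tokens: dict[str, set[str]] = {}

    def issue(self, bucket: str) -> str:
        """Create a new random token valid for one bucket only."""
        token = secrets.token_urlsafe(32)
        self._tokens.setdefault(bucket, set()).add(token)
        return token

    def allowed(self, bucket: str, token: str) -> bool:
        """Check a token against one bucket's list using constant-time comparison."""
        candidate = token.encode()
        return any(
            hmac.compare_digest(candidate, known.encode())
            for known in self._tokens.get(bucket, ())
        )

    def revoke(self, bucket: str, token: str) -> None:
        """Cut one token's access to one bucket; other buckets are untouched."""
        self._tokens.get(bucket, set()).discard(token)

    def retire(self, bucket: str) -> None:
        """Remove a bucket's entire access list."""
        self._tokens.pop(bucket, None)


if __name__ == "__main__":
    auth = BucketAuth()
    hr_token = auth.issue("hr")
    legal_token = auth.issue("legal")
    auth.revoke("hr", hr_token)
    print(auth.allowed("hr", hr_token))        # False: access cut
    print(auth.allowed("legal", legal_token))  # True: unaffected
```

Because each bucket owns its list, revoking or retiring one is a local operation, which is what keeps a control failure from cascading.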

None of this is novel. It’s the same operational discipline that already applies to your databases and internal services. The only thing that’s changed is that retrieval now lives in that same operational layer, not in a vendor’s console where you can’t see it.

