Stop Dumping Docs Into Vector Stores
The engineering story behind Minoa's remote MCP server. Why we chose structured data over RAG, what we learned building it, and how it works under the hood.
We recently announced Remote MCP, our server that gives AI agents access to structured customer value data. This post is the engineering story behind it.
Every AI integration hits the same wall: the data is garbage.
Doesn't matter how good the model is. Ask it to build an ROI model and it hallucinates numbers. Ask what worked for similar accounts and it has no idea. Ask for a customer outcome summary and it confidently cites something that may or may not be true.
The instinct is to throw everything into a vector store and hope retrieval saves you.
It won't.
The RAG Trap
Vector stores are great for some things. They're terrible when you need:
- Calculations with traceable inputs, not paragraphs describing calculations
- Aggregations across records: "what wins in fintech?" requires counting, not retrieving
- Consistent taxonomy: if every team describes things differently, outputs diverge
- Provenance: "why did the AI say this?" needs a real answer
Retrieval gets you text that might be relevant. Structured data gets you facts you can trust.
The model isn't the bottleneck. Missing structured context is.
The MCP Server
We shipped a remote MCP server that exposes Minoa's structured data to AI agents. MCP (Model Context Protocol) is an open standard for connecting AI to external tools. One integration works with Claude, Dust, n8n, or whatever you're building.
When an agent asks for deal context, it doesn't retrieve a paragraph that might mention ROI. It gets structured data:
```json
{
  "account": "Acme Corp",
  "use_case": "Process Automation",
  "annual_savings": 180000,
  "win_rate_in_segment": "68%",
  "source": "calculated"
}
```
Numbers you can trace. Assumptions you can cite.
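To make the contrast with retrieval concrete, here is a minimal sketch of what a tool handler that returns structured, traceable data looks like. The function name, record shape, and input fields are illustrative assumptions, not Minoa's actual implementation; the point is that the number is computed from stored inputs rather than pulled from prose.

```python
# Hypothetical tool handler: returns structured JSON computed from stored
# inputs. Names and fields are illustrative, not Minoa's real schema.
import json

def get_deal_context(account_id: str, records: dict) -> str:
    """Return structured deal context as JSON, never free-form prose."""
    deal = records[account_id]
    response = {
        "account": deal["name"],
        "use_case": deal["use_case"],
        # Provenance: the number is computed from stored inputs,
        # not retrieved from a paragraph that "might mention ROI".
        "annual_savings": deal["inputs"]["hours_saved"] * deal["inputs"]["hourly_cost"],
        "source": "calculated",
    }
    return json.dumps(response, indent=2)

records = {
    "acme": {
        "name": "Acme Corp",
        "use_case": "Process Automation",
        "inputs": {"hours_saved": 3600, "hourly_cost": 50},
    }
}
print(get_deal_context("acme", records))
```

If the agent asks "where did 180000 come from?", the answer is the two stored inputs, not a guess.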
We went with a remote server (SSE transport) so there's no local setup, permissioning is handled server-side, and we can ship updates without users pulling new images.
The Pattern: Ingest → Normalize → Serve
Most people think of MCP as "AI reads my data." That's half the picture.
The full pattern:
- Ingest: Signals come in from scattered sources. Call notes, CRM, CS updates, usage data.
- Normalize: Structure them into a consistent record with real taxonomy. This is where the work happens.
- Serve: Agents query at runtime. Every request is authenticated, permissioned, and logged.
Build the value record once, use it everywhere.
When someone asks "why did the agent say this?", you can trace back to the exact objects it used. That's table stakes for enterprise.
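The Normalize step above is where most of the engineering lives. A rough sketch of the idea, with an assumed taxonomy and field names that are illustrative rather than Minoa's real schema:

```python
# Sketch of the Normalize step: map free-form signals from scattered
# sources onto one canonical record with a fixed taxonomy, keeping the
# originating object for provenance. Taxonomy values are illustrative.
from dataclasses import dataclass, field

TAXONOMY = {"process automation", "cost reduction", "risk mitigation"}

@dataclass
class ValueRecord:
    account: str
    use_case: str                                 # must come from the shared taxonomy
    sources: list = field(default_factory=list)   # provenance: originating objects

def normalize(raw_signals: list[dict]) -> list[ValueRecord]:
    records = []
    for sig in raw_signals:
        use_case = sig["use_case"].strip().lower()
        if use_case not in TAXONOMY:
            continue  # reject signals that don't map onto the taxonomy
        records.append(ValueRecord(
            account=sig["account"],
            use_case=use_case,
            sources=[sig["source"]],  # answers "why did the agent say this?"
        ))
    return records

raw = [
    {"account": "Acme Corp", "use_case": "Process Automation", "source": "crm:opp-142"},
    {"account": "Acme Corp", "use_case": "vibes", "source": "call-note:9"},  # dropped
]
print(normalize(raw))
```

Rejecting signals that don't fit the taxonomy is the unglamorous part, but it's what keeps outputs consistent across teams.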
What We Expose
The MCP server exposes these tools:
- `list_business_cases`: Search and list business cases by name or status
- `get_business_case`: Get full details of a specific business case
- `get_value_framework`: Get all use cases from your value framework library
- `get_value_hypothesis`: Get the AI-generated value hypothesis for a deal
- `get_pipeline_analytics`: Pipeline metrics, win rates, and trends
- `get_use_case_analytics`: Use case effectiveness and win rates
- `create_smart_business_case`: Create a new business case with AI-recommended use cases
Each tool returns structured JSON. No prose, no hallucination surface area.
Auth is OAuth-based, scoped to the user's Minoa workspace. Claude can only access data you have permission to view, and every action is logged under your user for full auditability.
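In spirit, every tool call passes through a scope check and an audit log before the handler runs. The sketch below assumes a simple per-user tool allowlist and log record shape; Minoa's actual permissioning is OAuth-based and more involved.

```python
# Illustrative server-side dispatch: check the caller's scope, log the
# action under their user, then run the handler. Shapes are assumptions.
import datetime

AUDIT_LOG = []

def dispatch(tool: str, user: dict, args: dict, handlers: dict):
    if tool not in user["allowed_tools"]:
        # Scoped access: the agent can only reach data this user can see.
        raise PermissionError(f"{user['id']} is not scoped for {tool}")
    AUDIT_LOG.append({
        "user": user["id"],   # every action attributed to a user, for auditability
        "tool": tool,
        "args": args,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return handlers[tool](**args)

handlers = {"list_business_cases": lambda status: [{"name": "Acme Corp", "status": status}]}
user = {"id": "u_123", "allowed_tools": {"list_business_cases"}}
print(dispatch("list_business_cases", user, {"status": "WON"}, handlers))
```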
Aggregation > Retrieval
The most powerful capability we built isn't read-only. It's the ability to compute and aggregate data.
When someone creates a business case, we recommend which value drivers to include based on what's actually won before.
```sql
SELECT
  use_case_id,
  COUNT(CASE WHEN status = 'WON' THEN 1 END) AS wins,
  ROUND(
    COUNT(CASE WHEN status = 'WON' THEN 1 END) * 100.0 / COUNT(*), 1
  ) AS win_rate
FROM opportunities_with_use_cases
WHERE industry LIKE @industry
  AND headcount BETWEEN @min AND @max
GROUP BY use_case_id
ORDER BY win_rate DESC
LIMIT 5
```
"These 3 use cases have a 68% win rate in financial services deals your size."
You can't retrieve that. You have to compute it.
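The same aggregation, sketched in plain Python over an in-memory list to make the point concrete. It's a computation over structured records; no amount of retrieval over documents produces it.

```python
# Same win-rate aggregation as the SQL query, over in-memory records.
# Field names mirror the query; the sample data is made up.
from collections import Counter

def top_use_cases(opps: list[dict], industry: str, min_hc: int, max_hc: int, limit: int = 5):
    wins, totals = Counter(), Counter()
    for o in opps:
        if industry in o["industry"] and min_hc <= o["headcount"] <= max_hc:
            totals[o["use_case_id"]] += 1
            if o["status"] == "WON":
                wins[o["use_case_id"]] += 1
    # win rate per use case, rounded to one decimal, sorted descending
    rates = {uc: round(wins[uc] * 100.0 / n, 1) for uc, n in totals.items()}
    return sorted(rates.items(), key=lambda kv: -kv[1])[:limit]

opps = [
    {"use_case_id": "efficiency", "industry": "fintech", "headcount": 500, "status": "WON"},
    {"use_case_id": "efficiency", "industry": "fintech", "headcount": 800, "status": "WON"},
    {"use_case_id": "compliance", "industry": "fintech", "headcount": 600, "status": "LOST"},
]
print(top_use_cases(opps, "fintech", 100, 1000))
# → [('efficiency', 100.0), ('compliance', 0.0)]
```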
What We're Seeing
A few patterns from early users:
Agent orchestration. Platforms running multiple specialized agents need a shared context layer. Otherwise each agent retrieves differently and outputs diverge.
Grounded coaching. Generic sales frameworks don't stick. Coaching grounded in actual proof points (what's won, what evidence validates it) does.
Defensible outputs. Business cases built from scattered signals aren't credible. Structured context makes them auditable.
RAG vs. Structured Data
| Task | With RAG | With Structured Data |
|---|---|---|
| Build ROI model | Retrieves paragraph, maybe wrong | Returns calculation with cited inputs |
| Find what wins | Can't aggregate | "Efficiency messaging wins 68% in fintech" |
| Prep for QBR | Retrieves old notes | Structured: outcomes, evidence, next targets |
| Trace provenance | "It was in the docs somewhere" | Exact object IDs, timestamps, validators |
The Unsexy Truth
We built structured data capture before the MCP server was useful. Templated calculations. Standardized fields. Consistent taxonomies.
This is the boring part that makes the interesting part work.
If you're trying to make AI actually useful, the answer isn't a better prompt. It isn't a smarter model. It isn't dumping more docs into a vector store.
It's fixing the data layer.
The protocol is table stakes. The data model is the moat.
Try It
Minoa's remote MCP server is in early access. If you're building AI workflows and need structured customer value data, we'd like to talk.
Using Claude? Here's the integration guide.
If you want to work on the value intelligence infrastructure that powers the most exciting GTM AI use cases, we're hiring.