Why Most Enterprise RAG Systems Fail in Production
After building RAG systems for Fortune 100 finance clients, here's what actually breaks — and it's not what you think.
At Deloitte, I've watched the same failure modes repeat across every one of those engagements. The industry conversation around RAG focuses on retrieval quality: better embeddings, hybrid search, reranking. That's important, but it's not why most enterprise RAG systems fail.
They fail because of everything around the retrieval.
The Three Failures Nobody Talks About
1. Data Ingestion Is the Real Bottleneck
Your RAG demo works great on 50 clean PDFs. Then the client hands you 200,000 documents across SharePoint, Confluence, email archives, and a legacy document management system from 2008.
The parsing alone takes weeks. Tables break. Headers get merged with body text. Scanned documents need OCR that produces garbage on low-resolution faxes (yes, finance still uses faxes). Metadata is inconsistent or missing entirely.
What works: Build a robust ingestion pipeline before you touch embeddings. Invest 40% of your time here. Use document-type-specific parsers. Validate extraction quality with LLM-as-a-Judge on a sample set before embedding anything.
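A minimal sketch of the routing idea: one parser per document type, with provenance metadata attached at ingest time rather than bolted on later. The parser functions and registry here are hypothetical stand-ins, not any real library's API.

```python
from pathlib import Path

# Stand-ins for real format-specific parsers (e.g. a PDF library, an
# OCR step for scans). Each returns extracted text for one file.
def parse_pdf(path: str) -> str:
    return f"pdf-text:{path}"

def parse_docx(path: str) -> str:
    return f"docx-text:{path}"

# Route by file type instead of forcing one generic extractor.
PARSERS = {".pdf": parse_pdf, ".docx": parse_docx}

def ingest(path: str) -> dict:
    """Parse one document and attach provenance metadata up front."""
    suffix = Path(path).suffix.lower()
    parser = PARSERS.get(suffix)
    if parser is None:
        # Fail loudly: an unparseable document should surface during
        # ingestion, not as a silent gap in retrieval.
        raise ValueError(f"No parser registered for {suffix}")
    return {
        "source": path,
        "doc_type": suffix.lstrip("."),
        "text": parser(path),
    }
```

The point of the registry is that adding a new source system means adding one parser entry, not reworking the pipeline.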
2. Retrieval Without Context Is Retrieval Without Value
Cosine similarity finds text that looks similar. It doesn't understand that a "risk assessment" from the compliance team means something completely different from a "risk assessment" from the trading desk.
Multi-hop RAG helps — but only if your chunking strategy preserves the context boundaries. Most teams chunk by token count. The chunks that work best in enterprise finance are document-section-aware: they respect headers, preserve table integrity, and carry metadata about source, date, and department.
What works: Chunk by semantic structure, not token count. Attach rich metadata to every chunk. Use metadata filters before vector search, not after. Your retrieval pipeline should know where a document came from before it decides if the content is relevant.
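The two ideas above can be sketched together: split on document structure (here, markdown-style headers stand in for real section boundaries), stamp metadata onto every chunk, and filter on that metadata before scoring anything. The word-overlap scorer is a deliberate stand-in for a real embedding model; the function names are illustrative, not from any library.

```python
import re

def chunk_by_sections(text: str, metadata: dict) -> list:
    """Split on section headers rather than token count, and copy the
    source metadata onto every resulting chunk."""
    sections = re.split(r"(?m)^(?=#+ )", text)
    return [dict(metadata, text=s.strip()) for s in sections if s.strip()]

def retrieve(chunks, query: str, department=None, top_k=3):
    """Metadata pre-filter, then rank. Filtering first means a trading
    'risk assessment' never competes with the compliance version."""
    pool = [c for c in chunks
            if department is None or c["department"] == department]
    # Stand-in scorer: word overlap instead of cosine similarity.
    terms = set(query.lower().split())
    pool.sort(key=lambda c: len(terms & set(c["text"].lower().split())),
              reverse=True)
    return pool[:top_k]
```

In production the pre-filter runs inside the vector store (most support metadata filters at query time), but the ordering principle is the same: narrow by metadata first, score second.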
3. Evaluation Is an Afterthought
"It seems to work" is not an evaluation strategy. But that's what I see in 80% of enterprise RAG deployments.
Without systematic evaluation, you can't tell if a model upgrade improved retrieval, if a new chunking strategy helped, or if your system is hallucinating more after the last data refresh.
What works: LLM-as-a-Judge evaluation with a curated test set. Build the evaluation harness in week one, not month three. Define your metrics: answer relevance, faithfulness to source, context precision. Run evals on every pipeline change. If you can't measure it, you can't improve it.
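A week-one harness doesn't need to be elaborate. The sketch below assumes a `pipeline` callable that returns (answer, retrieved context) and a `judge` callable standing in for the LLM-as-a-Judge request; both are hypothetical interfaces, and the real judge would prompt a model to score each dimension.

```python
def run_evals(test_set, pipeline, judge):
    """Score every test case on the three metrics and report averages.
    `judge` returns per-case scores in [0, 1] for each metric."""
    metrics = ("relevance", "faithfulness", "context_precision")
    totals = {m: 0.0 for m in metrics}
    for case in test_set:
        answer, context = pipeline(case["question"])
        scores = judge(case["question"], answer, context)
        for m in metrics:
            totals[m] += scores[m]
    n = len(test_set)
    return {m: totals[m] / n for m in metrics}
```

Run this on the same curated test set after every change, and the "did that help?" question becomes a diff between two dictionaries instead of a debate.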
The Pattern That Actually Ships
The teams that succeed in enterprise RAG follow this sequence:
- Ingestion first — robust parsing, validation, metadata enrichment
- Chunking second — semantic, document-aware, metadata-rich
- Retrieval third — hybrid search with metadata pre-filtering
- Evaluation always — LLM-as-a-Judge from day one
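"Evaluation always" can be made mechanical: gate every pipeline change behind the eval run, shipping only when no metric regresses against a recorded baseline. A minimal sketch, assuming eval scores and the baseline are metric-to-score dictionaries:

```python
def gate_release(eval_scores: dict, baseline: dict):
    """Return (ship?, regressions). A change ships only if every
    baseline metric held steady or improved."""
    regressions = {m: (baseline[m], eval_scores[m])
                   for m in baseline if eval_scores[m] < baseline[m]}
    return len(regressions) == 0, regressions
```

Wire this into CI and "let's try a new chunking strategy" stops being a leap of faith.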
The teams that fail start with a vector database, skip straight to "let's try GPT-4 on our documents," and wonder why the demo works but production doesn't.
Build the boring parts first. The boring parts are the system.
This is a field note from building production RAG systems at Deloitte. Opinions are my own, not my employer's. If you're fighting similar battles, send a raven.