Document Intelligence · February 3, 2026 · 9 min read

Why Your AI Can't Read the Most Important Part of That CIM

Most RAG systems are blind to the tables, charts, and figures that carry the hardest-to-find deal intelligence. Recent research shows just how costly that gap is.

LEDAR Team

M&A Technology Experts


A typical Morgan Stanley 10-Q runs about 120 pages, with over 275 tables and nearly 200 figures. Revenue breakdowns, margin trends, organizational structures, geographic exposure—the information that drives deal decisions lives primarily in structured and visual content. The narrative provides context. The tables and charts provide evidence.

Most AI-powered document systems can only read the narrative.

Where current systems break down

When a standard RAG pipeline processes a financial document, it extracts text, chunks it, and stores it in a vector database. Tables get flattened into comma-separated strings, losing the row-column relationships that give numbers meaning. Charts and diagrams—containing zero extractable text—are skipped entirely.
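
To make the failure concrete, here is a minimal sketch of that flattening step. The table contents are invented for illustration, and the join mirrors what generic PDF-to-text extraction tends to produce, not any particular vendor's pipeline:

```python
# A parsed table, invented for illustration. In a real pipeline this
# would come out of a PDF extraction step.
table = {
    "caption": "Net revenues by segment ($M)",
    "headers": ["Segment", "Q1 2025", "Q1 2024"],
    "rows": [
        ["Institutional Securities", "7,102", "6,845"],
        ["Wealth Management", "6,880", "6,651"],
    ],
}

# Typical text-only ingestion: every cell joined into one flat string.
# "7,102" is no longer tied to "Institutional Securities" or "Q1 2025";
# the row-column relationships that give the number meaning are gone.
flattened = ", ".join(
    cell for row in [table["headers"], *table["rows"]] for cell in row
)
print(flattened)
# Segment, Q1 2025, Q1 2024, Institutional Securities, 7,102, 6,845, ...
```

The damage happens at ingestion, before retrieval even starts: once the cells are joined, no retriever can recover which figure belongs to which segment or period.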

The result is a system that confidently answers questions from the narrative sections while remaining blind to the structured and visual content that often matters most.

Research from S&P Global quantifies this gap. When a standard RAG pipeline was tested against 300 expert-crafted questions on real financial filings, accuracy on table-based questions fell to just 5.6%. On image-based questions: 0%. The hardest category, questions requiring synthesis across text and visual elements (the kind analysts actually ask), also scored 0%.

What proper multimodal processing unlocks

The same S&P Global study tested a purpose-built multimodal RAG system where tables are converted to structured JSON with contextual summaries, charts are described and indexed, and retrieval navigates across all three modalities. The improvements were dramatic: table-based accuracy went from 5.6% to 69.4%, image-based from 0% to 66.7%, and cross-modal questions from 0% to 40%.
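
Here is a hedged sketch of what that table-to-JSON step can look like. The field names and the summary wording are illustrative assumptions, not the schema from the MultiFinRAG paper:

```python
import json

# A parsed table, invented for illustration.
table = {
    "caption": "Pre-tax margin by segment (%)",
    "headers": ["Segment", "FY2023", "FY2024"],
    "rows": [
        ["Wealth Management", "24.9", "26.2"],
        ["Institutional Securities", "28.3", "27.1"],
    ],
}

# Each cell keeps its row and column headers, so "26.2" stays tied to
# "Wealth Management" and "FY2024" instead of floating in a flat string.
cells = [
    {"row": row[0], "column": col, "value": val}
    for row in table["rows"]
    for col, val in zip(table["headers"][1:], row[1:])
]

# The contextual summary is the natural-language hook that gets embedded
# for retrieval; the cells preserve exact structure for answering.
record = {
    "modality": "table",
    "summary": f"Table '{table['caption']}' covers {len(table['rows'])} "
               f"segments across {len(table['headers']) - 1} fiscal years.",
    "cells": cells,
}
print(json.dumps(record, indent=2))
```

The split matters: the summary makes the table findable by a semantic query, while the structured cells let the system quote exact figures once it gets there.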

A separate study from the University of Hong Kong reinforces these findings. Their work on multimodal document understanding shows that the performance advantage widens as documents grow longer: on filings exceeding 100 pages, properly architected multimodal systems outperformed text-only alternatives by over 13 percentage points. The research also points to a critical design insight: the primary gains over traditional chunk-based retrieval come from representing multimodal content as interconnected knowledge entities in a linked graph structure, rather than as isolated data types.
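
The sketch below shows that graph idea in miniature. The node and edge shapes are assumptions for exposition, not RAG-Anything's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One knowledge entity extracted from a document."""
    node_id: str
    modality: str            # "text" | "table" | "image"
    content: str             # raw text, table summary, or chart description
    links: set[str] = field(default_factory=set)

def link(a: Node, b: Node) -> None:
    """Connect two entities bidirectionally."""
    a.links.add(b.node_id)
    b.links.add(a.node_id)

# A paragraph, the table it discusses, and the chart derived from it.
para = Node("p-14", "text", "Management attributes margin expansion to...")
tbl = Node("t-03", "table", "Pre-tax margin by segment, FY2022-FY2024")
fig = Node("f-07", "image", "Bar chart of Wealth Management margin trend")

link(para, tbl)
link(para, fig)
link(tbl, fig)

# A retriever that lands on the chart can hop to the table and the
# narrative in one step, instead of treating each as an isolated chunk.
print(sorted(fig.links))  # ['p-14', 't-03']
```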

What this means for deal execution

During screening, the relevant signals are scattered across investor presentations, annual reports, and earnings transcripts—all heavily visual documents. Text-only systems miss the margin trends, the geographic breakdowns, the organizational complexity.

During due diligence, the stakes compound. A data room question like "has the target's working capital position deteriorated?" requires synthesizing quarterly balance sheets, cash flow trend charts, and management commentary simultaneously. Missing any modality means missing part of the answer.

The solution isn't adding more tools or pushing text-based systems harder. It requires document intelligence architectures that preserve table structure, interpret visual content, maintain cross-modal relationships, and retrieve evidence across all three modalities in a single pass.
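
As a rough illustration, here is what a single-pass query over one cross-modal index can look like. The `embed` stub and the in-memory index are placeholders standing in for a real embedding model and vector store:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a deterministic pseudo-embedding so the sketch runs.
    # A real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

# One index over all three modalities: text chunks, table summaries,
# and chart descriptions are embedded into the same vector space.
corpus = [
    ("text", "Management commentary on working capital discipline"),
    ("table", "Quarterly balance sheet: current assets vs. liabilities"),
    ("image", "Chart: trailing-four-quarter operating cash flow trend"),
]
index = [(mod, doc, embed(doc)) for mod, doc in corpus]

def retrieve(query: str, k: int = 3) -> list[tuple[str, str]]:
    """Rank every entry, whatever its modality, against one query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: -float(q @ item[2]))
    return [(mod, doc) for mod, doc, _ in ranked[:k]]

# A single query pulls candidate evidence from all three modalities.
for mod, doc in retrieve("Has working capital deteriorated?"):
    print(f"[{mod}] {doc}")
```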

The bottom line

When your AI tools can only read part of the document, you're making decisions on incomplete intelligence. The multimodality gap isn't a future roadmap problem. It's a capability gap that affects deal quality today.


LEDAR is building AI infrastructure for the multimodal reality of M&A documents—processing text, tables, and visual content as interconnected knowledge, not isolated data types.

Request early access →


References:

  • Gondhalekar, C., Patel, U., & Yeh, F.-C. (2025). "MultiFinRAG: An Optimized Multimodal RAG Framework for Financial Question Answering." S&P Global Ratings. arXiv:2506.20821
  • Guo, Z., Ren, X., Xu, L., Zhang, J., & Huang, C. (2025). "RAG-Anything: All-in-One RAG Framework." University of Hong Kong. arXiv:2510.12323

Tags: AI · M&A · Due Diligence · Document Intelligence · RAG · Multimodal AI
