🔮 Moving beyond RAG

In this issue:

  1. 2.9x Lower Latency with Prompt Compression
  2. Unified Structure Learning
  3. Is it RAG? Is it FT? No, it’s RAFT!


Meet your new AI-powered data analyst!

Telescope Labs makes quality insights and Data Science more accessible by simplifying the "data to action" journey for everyone.

Want to empower your teams to develop better products and services with the help of AI? Click on the button below and try it out for free.


1. LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

Watching: LLMLingua-2 (paper)

What problem does it solve? Prompts are the primary interface for steering Large Language Models (LLMs). However, as prompts become more detailed to guide the model effectively, they also become longer, which drives up latency and token costs and introduces redundancy. Existing approaches to prompt compression often rely on information entropy estimates from a causal language model, but such estimates only consider leftward context, can miss essential information, and are not explicitly aligned with the compression objective.

How does it solve the problem? The proposed approach addresses the limitations of existing prompt compression methods by introducing a data distillation procedure. This procedure derives knowledge from an LLM to compress prompts without losing crucial information. Additionally, the authors introduce an extractive text compression dataset to support the compression task. By formulating prompt compression as a token classification problem and using a Transformer encoder architecture, the model captures essential information from the full bidirectional context, ensuring the faithfulness of the compressed prompt to the original one.
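
To make the token-classification framing concrete, here is a minimal sketch of how an encoder-based compressor can be used at inference time. This is not the authors' code: the checkpoint name, the "keep" label index, and the fixed keep-rate are illustrative assumptions, and the real LLMLingua-2 pipeline handles additional details that this sketch omits.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Assumed checkpoint name: any encoder fine-tuned to label tokens as
# "keep" vs. "drop" would work in its place.
MODEL_NAME = "microsoft/llmlingua-2-xlm-roberta-large-meetingbank"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME)
model.eval()

def compress(prompt: str, keep_rate: float = 0.5) -> str:
    """Keep the tokens the bidirectional encoder scores as most informative."""
    enc = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits              # (1, seq_len, num_labels)
    keep_scores = logits.softmax(-1)[0, :, 1]     # assumes label 1 == "keep"
    k = max(1, int(keep_rate * keep_scores.numel()))
    keep_idx = keep_scores.topk(k).indices.sort().values  # preserve original order
    kept_ids = enc["input_ids"][0, keep_idx]
    return tokenizer.decode(kept_ids, skip_special_tokens=True)

print(compress("You are a helpful assistant. Please carefully read the ..."))
```

Because the classifier is a bidirectional encoder rather than a causal LM, every keep/drop decision can take the whole prompt into account, which is exactly the faithfulness argument made above.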

What's next? As prompt-based interaction with LLMs becomes increasingly prevalent, efficient and effective prompt compression techniques will be essential for maintaining performance while minimizing computational costs. Further research could explore the application of this approach to a wider range of tasks and LLMs, as well as investigating the potential for integrating prompt compression into the LLM training process itself.


2. mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

Watching: DocOwl 1.5 (paper/code)

What problem does it solve? Multimodal Large Language Models (MLLMs) have shown impressive capabilities in understanding and reasoning about visual documents like forms, receipts, charts, and webpages. However, current MLLMs often struggle with fully capturing the rich structural information present in these documents. Understanding the layout, spatial relationships, and hierarchical organization of elements is crucial for accurately interpreting the semantics of text-rich images.

How does it solve the problem? The researchers propose Unified Structure Learning, which combines structure-aware parsing tasks and multi-grained text localization tasks across various domains. They introduce H-Reducer, a vision-to-text module that preserves layout information while efficiently reducing the length of visual features. This enables the LLM to process high-resolution images more effectively. Additionally, they construct DocStruct4M, a comprehensive training set with structure-aware text sequences and multi-grained text-bounding box pairs, and DocReason25K, a high-quality reasoning tuning dataset for detailed explanations in the document domain.
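
For intuition, below is a conceptual PyTorch sketch of an H-Reducer-style connector: a convolution that merges groups of horizontally adjacent visual features (shrinking the sequence the LLM has to read) and then projects the result into the LLM embedding space. The dimensions, merge factor, and names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class HReducerSketch(nn.Module):
    """Conceptual sketch of an H-Reducer-style vision-to-text connector."""

    def __init__(self, vis_dim: int = 1024, llm_dim: int = 4096, merge: int = 4):
        super().__init__()
        # 1 x merge kernel: combines `merge` neighbors along the width only,
        # so rows (and hence the vertical layout / reading order) stay intact.
        self.conv = nn.Conv2d(vis_dim, vis_dim,
                              kernel_size=(1, merge), stride=(1, merge))
        self.proj = nn.Linear(vis_dim, llm_dim)

    def forward(self, patches: torch.Tensor, grid_h: int, grid_w: int) -> torch.Tensor:
        # patches: (batch, grid_h * grid_w, vis_dim) from the vision encoder;
        # grid_w is assumed divisible by the merge factor.
        b, n, d = patches.shape
        x = patches.transpose(1, 2).reshape(b, d, grid_h, grid_w)
        x = self.conv(x)                          # (b, d, grid_h, grid_w // merge)
        x = x.flatten(2).transpose(1, 2)          # back to (b, shorter_seq, d)
        return self.proj(x)                       # tokens the LLM can consume

reducer = HReducerSketch()
feats = torch.randn(1, 32 * 32, 1024)             # e.g. a 32x32 patch grid
print(reducer(feats, grid_h=32, grid_w=32).shape)  # torch.Size([1, 256, 4096])
```

The key design point is that the merging happens only along the horizontal axis, so a high-resolution page becomes 4x fewer tokens without scrambling the spatial structure the model is supposed to learn from.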

What's next? The proposed DocOwl 1.5 model achieves state-of-the-art performance on 10 visual document understanding benchmarks, significantly outperforming previous MLLMs with a 7B LLM. This demonstrates the importance of incorporating structure learning in MLLMs for text-rich image understanding. Future research could explore extending this approach to other domains, such as scientific literature, medical records, or legal documents, where structure plays a vital role in comprehension. Additionally, investigating more efficient architectures and training strategies for structure-aware MLLMs could further enhance their practicality and scalability.


3. RAFT: Adapting Language Model to Domain Specific RAG

Watching: RAFT (paper)

What problem does it solve? Large Language Models (LLMs) are typically pretrained on vast amounts of general-domain data. However, when applying these models to specific domains or tasks, it is often necessary to incorporate additional knowledge that is not present in the pretraining data. This can be achieved through techniques like Retrieval-Augmented Generation (RAG) or fine-tuning. The challenge lies in finding the most effective way to integrate this new knowledge into the pretrained model to improve its performance on the target task.

How does it solve the problem? Retrieval Augmented FineTuning (RAFT) is a training recipe that improves a model's ability to answer questions in an "open-book", in-domain setting. Given a question and a set of retrieved documents, RAFT trains the model to ignore documents that do not help answer the question, referred to as "distractor documents", and to quote verbatim the passage from the relevant document that does. In addition, RAFT uses chain-of-thought-style responses, which further strengthen the model's reasoning.
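
A rough sketch of how such a training example might be assembled is shown below. The field names, prompt template, quote markers, and the probability of including the oracle document alongside the distractors are illustrative assumptions; the authors' exact data format may differ.

```python
import json
import random

def build_raft_example(question: str,
                       oracle_doc: str,
                       distractor_docs: list[str],
                       cot_answer: str,
                       p_keep_oracle: float = 0.8) -> dict:
    """Assemble one RAFT-style training record (hypothetical field names).

    With probability p_keep_oracle the oracle (golden) document is mixed in
    with the distractors; otherwise the model sees only distractors, which
    pushes it to ignore irrelevant context rather than copy from it blindly.
    """
    docs = list(distractor_docs)
    if random.random() < p_keep_oracle:
        docs.append(oracle_doc)
    random.shuffle(docs)

    context = "\n\n".join(f"[Document {i + 1}]\n{d}" for i, d in enumerate(docs))
    prompt = (f"{context}\n\nQuestion: {question}\n"
              "Answer using only the relevant document(s); quote the "
              "supporting passage before giving the final answer.")

    # cot_answer is expected to reason step by step and quote the oracle
    # passage verbatim before stating the final answer.
    return {"prompt": prompt, "completion": cot_answer}

example = build_raft_example(
    question="Which protein does drug X inhibit?",
    oracle_doc="...drug X is a selective inhibitor of kinase Y...",
    distractor_docs=["Unrelated trial results...", "A review of kinase Z..."],
    cot_answer="##begin_quote##drug X is a selective inhibitor of kinase Y"
               "##end_quote## Therefore, the answer is kinase Y.",
)
print(json.dumps(example, indent=2))
```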

What's next? The effectiveness of RAFT in improving the performance of pretrained LLMs in domain-specific RAG tasks has been consistently demonstrated across various datasets, including PubMed, HotpotQA, and Gorilla. This suggests that RAFT could serve as a valuable post-training recipe for adapting pretrained LLMs to in-domain RAG tasks. Future research could explore the applicability of RAFT to a wider range of domains and investigate potential improvements to the technique, such as incorporating more sophisticated retrieval methods or exploring alternative ways of guiding the model's attention to relevant information within the retrieved documents.


Papers of the Week:
