Tornike Onoprishvili | Why did Meta Superintelligence Labs publish an obscure AI paper?

Things are taking an interesting turn at Meta Superintelligence Labs (MSL). The debut paper from this AI lab is about making retrieval-augmented generation (RAG) 30x faster without any accuracy loss. This might be surprising because MSL employs some of the most ambitious and successful AI researchers and founders on the planet right now, and the first paper they publish is just a well-known RAG method, only faster. Compare this with OpenAI releasing Sora 2 just weeks ago and the wow-effect it had!

But why bother accelerating RAG? Well, most of my business AI engineering tasks have involved, in one way or another, a RAG pipeline over thousands of documents. RAG is, as far as I know, the AI method that businesses most widely apply to actually make money. Building RAGs, maintaining them, tuning them, and updating them are all so vitally important that I could see it becoming a profession at some point. Businesses love RAGs and they pay for RAGs because RAGs make information search easier and they turn a real return on investment.

But RAGs are incredibly slow, and so the paper addressing this problem is really business-oriented. All of the businesses I’ve built RAGs for stand to directly benefit from this paper. It’s not even a question of “How do we monetize this?”, but “When do we update our RAGs to this?”.

The method (called “REFRAG”) is also simple to explain. The core insight can be summarized in just a couple of sentences:

During a traditional RAG pipeline, the text corpus is turned into vectors using an embedding model and stored in vector databases like Pinecone, Chroma, or FAISS.
When the user asks a question, the pipeline retrieves the most relevant pieces of text from the vector database and injects them into the context of an AI model like ChatGPT.
But then embedding happens twice: once by the embedding model and a second time by ChatGPT itself!
The core insight is that these two embeddings can be replaced with just a single embedding. Avoiding unnecessary re-embedding allows the RAG pipeline to become 30x faster.
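
To make the first two steps concrete, here is a minimal sketch of a traditional RAG pipeline in Python. It is only an illustration, not REFRAG and not any production setup: the `embed` function is a toy bag-of-words stand-in for a real embedding model, a plain list plays the role of the vector database, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are all made up for this example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


corpus = [
    "Invoices are due within 30 days of receipt.",
    "The office is closed on public holidays.",
    "Refunds are processed within 5 business days.",
]

# Step 1: embed the corpus once and store the vectors.
# (This list is an assumption standing in for a real vector database.)
index = [(embed(doc), doc) for doc in corpus]


def retrieve(question: str, k: int = 1) -> list:
    """Step 2a: return the k stored texts most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]


def build_prompt(question: str, k: int = 1) -> str:
    """Step 2b: inject the retrieved *text* into the model's context."""
    context = "\n".join(retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


prompt = build_prompt("When are refunds processed?")
print(prompt)
```

Notice that the prompt handed to the model is plain text again. The vectors computed for retrieval are thrown away, so the model has to tokenize and embed the retrieved passage from scratch, which is the duplicated work described above.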

A world-class AI lab just published work that will largely go unnoticed by the public, while solving an actual business issue and driving real value. This is the kind of AI research that is sustainable and can validate the global bet on AI. It’s exciting to see what the Meta Superintelligence team will publish next.

Thanks to Ani Talakhadze for reading drafts of this
