Mathew Jacob
@mat_jacob1002
Senior @IllinoisCDS. Intern @DbrxMosaicAI. HPC + MLSys.
ID: 849976408667545601
http://mjacob1002.github.io 06-04-2017 13:25:42
29 Tweets
116 Followers
49 Following
Existing IR/RAG benchmarks are unrealistic: they’re often derived from easily retrievable topics rather than grounded in solving real user problems. 🧵Introducing 𝐅𝐫𝐞𝐬𝐡𝐒𝐭𝐚𝐜𝐤, a challenging RAG benchmark on niche, recent topics. Work done during my internship at Databricks 🧱
Most RAG benchmarks are way too artificial. They start from documents and then build questions! But no one explicitly wants *RAG*, it's just a method! The actual problem is answering niche questions. FreshStack, from Databricks, is derived from the hard technical questions people actually ask:
🚨New RAG Dataset Release🚨 Led by Nandan Thakur: we’ve curated real, long, and complex questions, each requiring multiple retrieved documents covering a diverse set of concepts (i.e. nuggets).
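To make the nugget idea concrete, here is a minimal sketch of a nugget-coverage score: the fraction of a question's required concepts that are supported by at least one retrieved document. This is an illustrative toy metric under assumed data structures, not FreshStack's official evaluation.

```python
# Hypothetical sketch: score a retrieval run by "nugget coverage" --
# the fraction of a question's nuggets covered by at least one
# retrieved document. Illustrative only; not FreshStack's scoring.

def nugget_coverage(retrieved_doc_ids, nugget_support):
    """nugget_support maps each nugget to the set of doc ids that cover it."""
    if not nugget_support:
        return 0.0
    retrieved = set(retrieved_doc_ids)
    covered = sum(1 for docs in nugget_support.values() if retrieved & docs)
    return covered / len(nugget_support)

# Toy example: 3 nuggets; the retrieved set covers 2 of them.
support = {
    "install": {"d1", "d7"},
    "config":  {"d2"},
    "debug":   {"d9"},
}
print(nugget_coverage(["d1", "d2", "d4"], support))  # -> 0.666... (2 of 3 nuggets)
```

A question scores well only when retrieval spans all of its concepts, which is what makes multi-nugget questions harder than single-document lookups.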
OASYS is cooking up bangers. Omar Khattab, the people want an OASYS lab website.
Ilya Sutskever says the age of scaling is over - good thing we put this paper out in time! Many recent embedding models are finetuned versions of pretrained LLMs. We asked 🤓: How does retrieval performance scale with pretraining FLOPs? 📄 paper: arxiv.org/abs/2508.17400