Mathew Jacob
@mat_jacob1002
Senior @IllinoisCDS. Intern @DbrxMosaicAI. HPC + MLSys.
ID: 849976408667545601
http://mjacob1002.github.io 06-04-2017 13:25:42
29 Tweets
116 Followers
49 Following
Existing IR/RAG benchmarks are unrealistic: they’re often derived from easily retrievable topics rather than grounded in solving real user problems. 🧵Introducing 𝐅𝐫𝐞𝐬𝐡𝐒𝐭𝐚𝐜𝐤, a challenging RAG benchmark on niche, recent topics. Work done during my internship at Databricks 🧱
Most RAG benchmarks are way too artificial. They start from documents and then build questions! But no one explicitly wants *RAG*, it's just a method! The actual problem is answering niche questions. FreshStack, from Databricks, is derived from the hard technical questions people actually ask:
🚨New RAG Dataset Release🚨 Led by Nandan Thakur: we’ve curated real, long, and complex questions, each requiring multiple retrieved documents covering a diverse set of concepts (i.e. nuggets).
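To make the nugget idea concrete, here is a minimal sketch of a nugget-coverage score: the fraction of a question's required concepts that are supported by at least one retrieved document. This is an illustrative toy metric under assumed data structures, not FreshStack's official evaluation.

```python
# Hypothetical sketch: score a retrieval run by "nugget coverage" --
# the fraction of a question's nuggets covered by at least one
# retrieved document. Illustrative only; not FreshStack's scoring.

def nugget_coverage(retrieved_doc_ids, nugget_support):
    """nugget_support maps each nugget to the set of doc ids that cover it."""
    if not nugget_support:
        return 0.0
    retrieved = set(retrieved_doc_ids)
    covered = sum(1 for docs in nugget_support.values() if retrieved & docs)
    return covered / len(nugget_support)

# Toy example: 3 nuggets; the retrieved set covers 2 of them.
support = {
    "install": {"d1", "d7"},
    "config":  {"d2"},
    "debug":   {"d9"},
}
print(nugget_coverage(["d1", "d2", "d4"], support))  # -> 0.666... (2 of 3 nuggets)
```

A question scores well only when retrieval spans all of its concepts, which is what makes multi-nugget questions harder than single-document lookups.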
OASYS is cooking up bangers. Omar Khattab, the people want an OASYS lab website.
Ilya Sutskever says the age of scaling is over - good thing we put this paper out in time! Many recent embedding models are finetuned versions of pretrained LLMs. We asked 🤓: How does retrieval performance scale with pretraining FLOPs? 📄 paper: arxiv.org/abs/2508.17400