Ashutosh Baheti (@abaheti95) 's Twitter Profile
Ashutosh Baheti

@abaheti95

Research Scientist @DbrxMosaicAI

I'm interested in Large Language Models, Multimodal Models, Reinforcement Learning and making a JARVIS 🤖

ID: 3083528599

Link: https://abaheti95.github.io/
Joined: 15-03-2015 04:48:22

84 Tweets

344 Followers

409 Following

Alex Trott (@alexrtrott) 's Twitter Profile Photo

Ever wonder what it'd look like if an LLM Judge and a Reward Model had a baby? So did we, which is why we created PGRM -- the Prompt-Guided Reward Model.

TLDR: You get the instructability of an LLM judge + the calibration of an RM in a single speedy package (1/n)

Jonathan Frankle (@jefrankle) 's Twitter Profile Photo

Not that I have a favorite recent project, but... 🧵

LLM judges are the popular way to evaluate generative models. But they have drawbacks. They're:
* Generative, so slow and expensive.
* Nondeterministic.
* Uncalibrated. They don't know how uncertain they are.

Meet PGRM!

Andrew Drozdov (@mrdrozdov) 's Twitter Profile Photo

We built a thing! The Databricks Reranker is now in Public Preview. It's as easy as changing the arguments to your vector search call, and doesn't require any additional setup. Read more: databricks.com/blog/reranking…

Ali Ghodsi (@alighodsi) 's Twitter Profile Photo

Databricks just signed a Series K term sheet at >$100B valuation to scale two flagship products:
🔥 Lakebase — serverless Postgres with true compute/storage separation
🧠 Agent Bricks — agentic framework with built-in reasoning guardrails for enterprise data

Ivan Zhou (@ivanzhouyq) 's Twitter Profile Photo

Automated prompt optimization (GEPA) can push open-source models beyond frontier performance on enterprise tasks — at a fraction of the cost!

🔑 Key results from our Databricks Mosaic Research:
1⃣ gpt-oss-120b + GEPA beats Claude Opus 4.1 on Information Extraction (+2.2 points) —

Dipendra Misra (@dipendramisra) 's Twitter Profile Photo

Our paper showing that with a general-purpose RLVR recipe, we can get SOTA on the BIRD benchmark is out:

arxiv.org/pdf/2509.21459

Our hybrid approach performs offline RL and online RL to fine-tune a 32B model. This, along with self-consistency, was sufficient to get us to SOTA.

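Self-consistency here just means sampling several candidate answers and keeping the majority. A minimal sketch of that voting step (the function name and toy data below are illustrative, not from the paper):

```python
from collections import Counter

def self_consistency(samples):
    """Majority vote over stochastic model outputs.

    `samples` stands in for n generations from the fine-tuned model,
    e.g. n candidate SQL queries for one BIRD question. Returns the
    most frequent answer and the fraction of samples that agree.
    """
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / len(samples)

# Toy example: 8 sampled queries, 6 of which agree.
candidates = ["SELECT a FROM t"] * 6 + ["SELECT b FROM t"] * 2
best, agreement = self_consistency(candidates)
# best == "SELECT a FROM t", agreement == 0.75
```

For text-to-SQL, votes are often taken over execution results rather than raw query strings, since syntactically different queries can be semantically equivalent.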
Ivan Zhou (@ivanzhouyq) 's Twitter Profile Photo

🚀 Our Databricks Mosaic Research team is looking for research interns for Summer 2026!

Our team explores exciting challenges at the intersection of AI and data, especially in how AI agents can help enterprises reason over knowledge and automate data workflows.

We work on

Databricks (@databricks) 's Twitter Profile Photo

80% of enterprise data is unstructured, locked in PDFs, reports, and diagrams that traditional tools can’t parse or govern. Introducing ai_parse_document, state-of-the-art document intelligence on Databricks. With a single SQL command, teams can now turn any document into

Matei Zaharia (@matei_zaharia) 's Twitter Profile Photo

Great new capability in Databricks powered by our AI research team! We trained a document parsing system that delivers leading quality at 3-5x lower cost and outperforms leading VLMs like GPT-5 and Claude. This is critical to connect AI to so many kinds of data.

Jonathan Frankle (@jefrankle) 's Twitter Profile Photo

I'm missing NeurIPS BUT my extraordinary Databricks colleagues will be there:
🧱 Erich Elsen (multimodal)
🧱 Ashutosh Baheti, Abhay Gupta, Jose Javier Gonzalez (RL at scale)
🧱 Jacob Portes (search)
🧱 Veronica Qing Lyu (feedback)
Hang out with them, and you won't miss me at all 🙂

Abhay Gupta (@gupta__abhay) 's Twitter Profile Photo

I’ll be at #NeurIPS2025 from 2nd-6th Dec. DM if you want to chat about MoEs, scaling training and inference, making GPUs go brrr and other fun stuff!!

Ashutosh Baheti (@abaheti95) 's Twitter Profile Photo

Will be at #NeurIPS2025 from 2nd to 6th Dec. Excited to chat about async RL, Environment Exploration, Agents/Tool use, User Simulator, Synthetic Data Generation or any other topic!! You can find me at the Databricks booth @ Tue 12 - 4pm

Databricks (@databricks) 's Twitter Profile Photo

Today we’re introducing OfficeQA, a new benchmark grounded in ~89,000 pages of U.S. Treasury Bulletins that reflects the complex, document-heavy tasks enterprises actually face.

Unlike existing benchmarks, OfficeQA measures economically valuable, real-world reasoning: parsing

Krista Opsahl-Ong (@kristahopsalong) 's Twitter Profile Photo

Today we’re releasing OfficeQA — a new benchmark for end-to-end grounded reasoning that reflects the real work enterprises need AI agents to do. More details below 👇

Ashutosh Baheti (@abaheti95) 's Twitter Profile Photo

Interested in training RL agents for long horizon enterprise tasks? 🤖👾 Come work with our phenomenal and cracked team! 🚀

Andrew Drozdov (@mrdrozdov) 's Twitter Profile Photo

Instructed Retriever is a multi-tiered declarative approach for building high quality search agents. It's an example of an "instructed system", which goes beyond prompt tuning and tool calling by passing data among modules which work together to fulfill an information need.

Matei Zaharia (@matei_zaharia) 's Twitter Profile Photo

Lakebase is GA! We think this is going to make it radically simpler and more reliable to work with online databases. You can instantly branch your DB, take snapshots, roll back to a point in time, or create a copy for offline analysis, whether it's humans or AI doing the work.

Matei Zaharia (@matei_zaharia) 's Twitter Profile Photo

Agent memory is a simple and powerful way to do continual learning! With the new MemAlign method from Databricks Research, we can build better LLM judges from examples of human ratings, and they scale with more data. Now in Databricks and MLflow. databricks.com/blog/memalign-…
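The tweet describes MemAlign only at a high level. As a hedged sketch of the general idea — a judge improved by retrieving the most similar human-rated examples from memory into its prompt — something like the following; the word-overlap retriever and all names here are illustrative stand-ins, not the actual MemAlign method:

```python
def build_judge_prompt(query, memory, k=2):
    """Sketch of memory-augmented judging: pull the k most similar
    human-rated examples into the judge prompt as few-shot guidance.

    `memory` is a list of (example_text, human_rating) pairs; the
    word-overlap similarity is a toy stand-in for a real retriever.
    """
    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))

    nearest = sorted(memory, key=lambda m: -overlap(m[0], query))[:k]
    shots = "\n".join(f"Example: {t}\nHuman rating: {r}" for t, r in nearest)
    return f"{shots}\n\nNow rate this response:\n{query}"

memory = [
    ("Concise, correct answer with citations", 5),
    ("Rambling answer that ignores the question", 1),
    ("Partially correct answer, missing edge cases", 3),
]
prompt = build_judge_prompt("A concise correct answer", memory, k=2)
```

The point of "scaling with more data" is that appending new rated examples to `memory` improves the judge without any retraining.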

Ali Ghodsi (@alighodsi) 's Twitter Profile Photo

I now constantly get questions about the SaaS meltdown, the role of AI, systems of record, etc. I don't have answers to all of these. But I do know that we saw an acceleration in our business in Q2, Q3, and now finished the year with an accelerating Q4. The question is, why? Short

Wen Sun (@wensun1) 's Twitter Profile Photo

Many recent works try to force GRPO to be on-policy by adding things like extra importance weighting, clipping, masking, data deletion, inference engine edits, router replay… But are these actually needed? We push the other direction: make it maximally off-policy and keep it
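For context, the "extra importance weighting, clipping" being debated is the standard PPO-style off-policy correction applied per token. A minimal generic sketch (not any specific paper's variant):

```python
import math

def clipped_surrogate(logp_new, logp_old, advantage, eps=0.2):
    """PPO-style clipped policy-gradient surrogate for one token.

    r = pi_new / pi_old is the importance ratio correcting for the
    sample coming from a stale (off-)policy; clipping r to
    [1 - eps, 1 + eps] limits how far one update can move the policy.
    Returns the objective to maximize (negate for a loss).
    """
    r = math.exp(logp_new - logp_old)
    unclipped = r * advantage
    clipped = max(min(r, 1.0 + eps), 1.0 - eps) * advantage
    return min(unclipped, clipped)

# On-policy (r = 1): the surrogate is just the advantage.
on_policy = clipped_surrogate(0.0, 0.0, 1.0)        # 1.0
# Off-policy (r = 2): clipping caps the update at (1 + eps) * advantage.
off_policy = clipped_surrogate(math.log(2), 0.0, 1.0)  # 1.2
```

Dropping such corrections entirely, as this thread proposes, amounts to trusting that the group-relative advantages in GRPO stay informative even when the sampling policy is far from the current one.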