Sriram B (@b_shrir) 's Twitter Profile
Sriram B

@b_shrir

PhD student in Computer Science at UMD College Park | Ex - Research Fellow at MSR | IIT Bombay CS undergrad

Make AI more understandable and reliable!

ID: 992008786973003776

linkhttp://sriram.live calendar_today03-05-2018 11:51:40

289 Tweet

136 Followers

118 Following

Sriram B (@b_shrir) 's Twitter Profile Photo

Do AI models really think the way they say they do? In our latest paper, we examine the faithfulness of the chain-of thought (CoT) produced by LLMs and LVLMs when exposed to a wide range of biases, with a special focus on visual biases and more subtler, implicit, biases.

Do AI models really think the way they say they do? In our latest paper, we examine the faithfulness of the chain-of thought (CoT) produced by LLMs and LVLMs when exposed to a wide range of biases, with a special focus on visual biases and more subtler, implicit, biases.
Sriram B (@b_shrir) 's Twitter Profile Photo

This actually seems like a bigger deal than the Deepmind result a year ago. Non-math-specific approach (no formal verification etc) yielding an IMO gold is huge. Timelines shortened!

Sriram B (@b_shrir) 's Twitter Profile Photo

Very interesting. This would imply that CoTs for visual tasks which are less reliant on explicit reasoning are more likely to be unfaithful. We actually showed this in arxiv.org/abs/2505.23945

Soheil Feizi (@feizisoheil) 's Twitter Profile Photo

Introducing Maestro: the holistic optimizer for AI agents. Maestro optimizes the agent graph and tunes prompts/models/tools, fixing agent failure modes that prompt-only or RL weight tuning can’t touch. Maestro outperforms leading prompt optimizers (e.g., MIPROv2, GEPA) on

Introducing Maestro: the holistic optimizer for AI agents.
Maestro optimizes the agent graph and tunes prompts/models/tools, fixing agent failure modes that prompt-only or RL weight tuning can’t touch.

Maestro outperforms leading prompt optimizers (e.g., MIPROv2, GEPA) on
Sriram B (@b_shrir) 's Twitter Profile Photo

This type of stuff used to impress me too, but remember that distinguishing between subtle details is a strength of AI and weakness of humans. Even by 2016 Imagenet models were able to distinguish between fine grained classes like Asian vs African elephant.

RELAI (@reliableai) 's Twitter Profile Photo

πŸš€ RELAI is live β€” a platform for building reliable AI agents πŸ” We complete the learning loop for agents: simulate β†’ evaluate β†’ optimize - Simulate with LLM personas, mocked MCP servers/tools and grounded synthetic data - Evaluate with code + LLM evaluators; turn human

Koustava Goswami (@koustavagoswami) 's Twitter Profile Photo

πŸš€ New research drop! We reimagine attribution not as retrieval, but as a reasoning problem. Introducing DECOMPTUNE 🧩 β†’ a novel RL-driven training framework that teaches small models how to reason through decomposition-based reasoning πŸ“„ arxiv.org/pdf/2510.25766 #AI #Reasoning

πŸš€ New research drop!
We reimagine attribution not as retrieval, but as a reasoning problem. Introducing DECOMPTUNE 🧩
β†’ a novel RL-driven training framework that teaches small models how to reason through decomposition-based reasoning
πŸ“„ arxiv.org/pdf/2510.25766
#AI #Reasoning
Sriram B (@b_shrir) 's Twitter Profile Photo

I had quite a bit of fun training LLMs with the latest RL techniques this summer. Some of our results are in this thread: