DAIR.AI (@dair_ai)'s Twitter Profile
DAIR.AI

@dair_ai

Democratizing AI research, education, and technologies. Learn how to build with AI in our new AI Academy: dair-ai.thinkific.com

ID: 889050642903293953

Link: https://github.com/dair-ai

Joined: 23-07-2017 09:12:45

1.1K Tweets

71.71K Followers

1 Following

DAIR.AI (@dair_ai)

Top AI Papers of The Week (August 4-10):

- CoAct-1
- ReaGAN
- Agentic Web
- Seed Diffusion
- Efficient Agents
- A Taxonomy of Hallucinations
- Unified Retrieval Agent for AI Search

Read on for more:

elvis (@omarsar0)

GPT-5 on Multimodal Medical Reasoning

On MedXpertQA MM, GPT-5 improves reasoning and understanding scores by +29.62% and +36.18% over GPT-4o.

It surpasses pre-licensed human experts by +24.23% in reasoning and +29.40% in understanding.

elvis (@omarsar0)

Unlocking Long-Horizon Agentic Search

AI agents still struggle with long-horizon tasks.

This paper sheds light on how to improve long-horizon agentic search with RL.

Here are my notes:

elvis (@omarsar0)

A Deep Dive into RL for LLM Reasoning

Provides a roadmap for practitioners applying RL for LLM reasoning.

Nice to have some of the latest techniques in one place.

elvis (@omarsar0)

Started to run vibe checks on Claude 4 Sonnet with 1M context support. Compared to Gemini 2.5 Pro on a paper analysis task, Sonnet 4 is fast, less verbose (concise), and pays attention to details. Makes it ideal for AI agents. More expensive, though. More thoughts below:

elvis (@omarsar0)

Custom AI Agents are a game-changer for builders. @Emergentlabshq now allows you to create custom AI agents to build & launch production-ready mobile + web apps 5x faster! Start with a prompt to go from an idea to a working agent to a fully deployable app.

DAIR.AI (@dair_ai)

Anyone can build useful AI Agents.

But it requires having a solid framework to design and improve AI agents.

That's what we'll teach in our new training on Building Effective AI Agents.

Topics include context engineering, augmenting AI agents, multi-agent systems, and more.

elvis (@omarsar0)

The Illusion of Progress

It's well known that there are caveats with benchmarks and metrics that measure LLM capabilities.

It's no different for hallucination detection.

"ROUGE fails to reliably capture true hallucination"

Here are my notes:

elvis (@omarsar0)

GPT-5 (with high reasoning effort) achieves near-perfect accuracy on a high-quality ophthalmology question-answering dataset.

Based on these other reports, GPT-5 seems to be a very strong model at medical reasoning.

elvis (@omarsar0)

Took almost a year to build out the most comprehensive set of courses on AI Agents. 

This is what's on the surface.

Behind the scenes, we have live workshops, office hours, direct support, paper discussions, and much more. 

We focus on AI builders (beginners to advanced).

elvis (@omarsar0)

AI Agents are terrible at long-horizon tasks.

Even the new GPT-5 model struggles with long-horizon tasks.

This is one of the most pressing challenges when building AI agents.

Pay attention, AI devs!

This is a neat paper that went largely unnoticed.

Here are my notes: