DAIR.AI (@dair_ai) Twitter Tweets • TwiCopy

DAIR.AI

@dair_ai

Democratizing AI research, education, and technologies. Learn how to build with AI in our new AI Academy: dair-ai.thinkific.com

ID: 889050642903293953

Link: https://github.com/dair-ai

Joined: 23-07-2017 09:12:45

1.1K Tweets

71.71K Followers

1 Following

DAIR.AI

@dair_ai

4 months ago

Top AI Papers of The Week (August 4-10):

- CoAct-1
- ReaGAN
- Agentic Web
- Seed Diffusion
- Efficient Agents
- A Taxonomy of Hallucinations
- Unified Retrieval Agent for AI Search

Read on for more:

549 likes, 9 replies, 77 reposts

elvis

@omarsar0

4 months ago

The GLM-4.5 technical report is out! Sharing some key details in case you missed it:

284 likes, 10 replies, 46 reposts

GPT-5 on Multimodal Medical Reasoning

On MedXpertQA MM, GPT-5 improves reasoning and understanding scores by +29.62% and +36.18% over GPT-4o. It surpasses pre-licensed human experts by +24.23% in reasoning and +29.40% in understanding.

538 likes, 11 replies, 121 reposts

elvis

@omarsar0

4 months ago

Unlocking Long-Horizon Agentic Search

AI agents still struggle with long-horizon tasks. This paper sheds light on how to improve long-horizon agentic search with RL. Here are my notes:

331 likes, 9 replies, 58 reposts

elvis

@omarsar0

4 months ago

A Deep Dive into RL for LLM Reasoning

Provides a roadmap for practitioners applying RL for LLM reasoning. Nice to have some of the latest techniques in one place.

592 likes, 11 replies, 100 reposts

elvis

@omarsar0

4 months ago

Started to run vibe checks on Claude 4 Sonnet with 1M context support. Compared to Gemini 2.5 Pro on a paper analysis task, Sonnet 4 is fast, less verbose (concise), and pays attention to details. Makes it ideal for AI agents. More expensive, though. More thoughts below:

84 likes, 15 replies, 9 reposts

elvis

@omarsar0

4 months ago

Custom AI Agents are a game-changer for builders. @Emergentlabshq now allows you to create custom AI agents to build & launch production-ready mobile + web apps 5x faster! Start with a prompt to go from an idea to a working agent to a fully deployable app.

90 likes, 11 replies, 12 reposts

DAIR.AI

@dair_ai

4 months ago

Anyone can build useful AI Agents. But it requires having a solid framework to design and improve AI agents. That's what we'll teach in our new training on Building Effective AI Agents. Topics include context engineering, augmenting AI agents, multi-agent systems, and more.

94 likes, 2 replies, 10 reposts

elvis

@omarsar0

4 months ago

The Illusion of Progress

It's well known that there are caveats with benchmarks and metrics that measure LLM capabilities. It's no different for hallucination detection.

"ROUGE fails to reliably capture true hallucination"

Here are my notes:

213 likes, 9 replies, 45 reposts

elvis

@omarsar0

4 months ago

GPT-5 (with high reasoning effort) achieves near-perfect accuracy on a high-quality ophthalmology question-answering dataset. Based on these other reports, GPT-5 seems to be a very strong model at medical reasoning.

207 likes, 13 replies, 30 reposts

elvis

@omarsar0

4 months ago

Speed Always Wins

Very nice and comprehensive new report on recent efficient architectures for LLMs.

518 likes, 12 replies, 83 reposts

elvis

@omarsar0

4 months ago

Took almost a year to build out the most comprehensive set of courses on AI Agents. This is what's on the surface. Behind the scenes, we have live workshops, office hours, direct support, paper discussions, and much more. We focus on AI builders (beginners to advanced)

107 likes, 6 replies, 18 reposts

elvis

@omarsar0

4 months ago

AI Agents are terrible at long-horizon tasks. Even the new GPT-5 model struggles with long-horizon tasks. This is one of the most pressing challenges when building AI agents. Pay attention, AI devs! This is a neat paper that went largely unnoticed. Here are my notes:

102 likes, 4 replies, 27 reposts