Alexandre Drouin (@alexandredrouin) Twitter Tweets • TwiCopy

Dawn Song

@dawnsongtweets

7 months ago

Really excited to announce our Advanced LLM Agents MOOC (Spring 2025)! Building on the success of our LLM Agents MOOC from Fall 2024 (15K+ registered learners, ~9K Discord members, 200K+ lecture views on YouTube), we are excited to extend the MOOC this semester to cover some more

204 likes · 9 replies · 44 retweets

Léo Boisvert

@leoboisvert

6 months ago

📊 Breaking: Claude 3.7 Sonnet scores 51.5% on WorkArena benchmark! Surprising finding: The newer Claude 3.7 Sonnet (51.5%) performs below Claude 3.5 (56.4%) on our tests! 👀 Maybe newer isn't always better? Both Claude 3.7 and o3-mini are underperforming their predecessors.

14 likes · 4 replies · 9 retweets

Alexandre Drouin

@alexandredrouin

6 months ago

A great opportunity to work with Nicolas Gontier Alexandre Lacoste and team on pushing the limits of web agents in enterprise settings! 🤖

9 likes · 0 replies · 1 retweet

Dawn Song

@dawnsongtweets

6 months ago

🚀 Really excited to launch #AgentX competition hosted by UC Berkeley RDI at UC Berkeley alongside our LLM Agents MOOC series (a global community of 22k+ learners & growing fast). Whether you're building the next disruptive AI startup or pushing the research frontier, AgentX is your

410 likes · 20 replies · 108 retweets

Juan A. Rodríguez 💫

@joanrod_ai

6 months ago

I’m excited to announce that 💫StarVector has been accepted at CVPR 2025! Over a year in the making, StarVector opens a new paradigm for Scalable Vector Graphics (SVG) generation by harnessing multimodal LLMs to generate SVG code that aesthetically mirrors input images and text.

162 likes · 13 replies · 56 retweets

Gaurav Sahu

@dem_fier

5 months ago

🚀 Exciting news! Our work LitLLM has been accepted in TMLR! LitLLM helps researchers write literature reviews by combining keyword- and embedding-based search with LLM-powered reasoning to find relevant papers and generate high-quality reviews. LitLLM.github.io 🧵 (1/5)

78 likes · 9 replies · 32 retweets

Dawn Song

@dawnsongtweets

5 months ago

🔥 Thrilled to announce the Agentic AI Summit 2025, the first summit dedicated to #AgenticAI in the Bay Area, hosted by UC Berkeley RDI at UC Berkeley! 🚀 Building on momentum from our LLM Agents MOOC (23k+ global learners!), we're creating the LARGEST gathering of its kind: 1,500+

203 likes · 22 replies · 40 retweets

Andrej Karpathy

@karpathy

5 months ago

x.com/i/article/1909…

5.5K likes · 206 replies · 814 retweets

Torsten Scholak

@tscholak

5 months ago

🚨 SLAM Labs presents Apriel-5B! And it lands right in the green zone 🚨 Speed ⚡ + Accuracy 📈 + Efficiency 💸 This model punches above its weight, beating bigger LLMs while training on a fraction of the compute. Built with Fast-LLM, our in-house training stack. 🧵👇

128 likes · 5 replies · 47 retweets

ServiceNow Research

@servicenowrsrch

5 months ago

10 years on, and now recognized by ICLR 2025 for standing the test of time. Please join us in congratulating 🇺🇦 Dzmitry Bahdanau, Kyunghyun Cho & Yoshua Bengio for their seminal work “Neural Machine Translation by Jointly Learning to Align and Translate”. arxiv.org/abs/1409.0473

35 likes · 0 replies · 7 retweets

Gabriel Huang

@gabrielhuang9

5 months ago

1/ How do we evaluate agent vulnerabilities in situ, in dynamic environments, under realistic threat models? We present 🔥 DoomArena 🔥, a plug-in framework for grounded security testing of AI agents. ✨ Project: servicenow.github.io/DoomArena/ 📝 Paper: arxiv.org/abs/2504.14064

35 likes · 8 replies · 16 retweets

Alexandre Drouin

@alexandredrouin

5 months ago

Can your AI agent make it through DoomArena? 😈 Introducing a plug-in framework that adds a layer of security testing on top of any benchmark for AI agents.

9 likes · 0 replies · 3 retweets

Krishnamurthy (Dj) Dvijotham

@djdvij

4 months ago

1/n Wish you could evaluate AI agents for security vulnerabilities in a realistic setting? Wish no more: today we release DoomArena, a framework that plugs into YOUR agentic benchmark and enables injecting attacks consistent with any threat model YOU specify

27 likes · 1 reply · 7 retweets

Sara Hooker

@sarahookr

4 months ago

It is critical for scientific integrity that we trust our measure of progress. lmarena.ai has become the go-to evaluation for AI progress. Our release today demonstrates the difficulty in maintaining fair evaluations on lmarena.ai, despite best intentions.

712 likes · 21 replies · 132 retweets

Juan A. Rodríguez 💫

@joanrod_ai

3 months ago

Thanks AK for sharing our work! Excited to present our next generation of SVG models, now using Reinforcement Learning from Rendering Feedback (RLRF). 🧠 We think we cracked SVG generalization with this one. Go read the paper! arxiv.org/abs/2505.20793 More details on

122 likes · 3 replies · 41 retweets

Emiliano Penaloza

@emilianopp_

3 months ago

Excited that our paper "Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization" was accepted to ICML 2025! We show how Preference Optimization can reduce the impact of noisy concept labels in CBMs. 🧵/9

30 likes · 1 reply · 21 retweets

Massimo Caccia

@masscaccia

2 months ago

🎉 Our paper “How to Train Your LLM Web Agent: A Statistical Diagnosis” got an oral at next week’s ICML Workshop on Computer Use Agents! 🖥️🧠 We present the first large-scale

197 likes · 5 replies · 46 retweets

AK

@_akhaliq

2 months ago

How to Train Your LLM Web Agent: A Statistical Diagnosis

227 likes · 12 replies · 34 retweets

Alexandre Drouin

@alexandredrouin

2 months ago

📢 Attention, attention! ServiceNow Research is hiring a Research Scientist with a focus on Agent Safety+Security 👩🏻‍🔬 Join us to work on impactful open research projects like 🔹DoomArena: github.com/ServiceNow/doo… 🔹BrowserGym: github.com/ServiceNow/Bro… Apply: jobs.smartrecruiters.com/ServiceNow/744…

18 likes · 0 replies · 9 retweets

Massimo Caccia

@masscaccia

2 months ago

Our oral is tomorrow at 14:40 PDT during ICML Conference’s Workshop on Computer Use Agents (West Meeting Room 211–214)! Attending virtually? Zoom link & details here: icml.cc/virtual/2025/w…

27 likes · 0 replies · 10 retweets