Xing Han Lu (@xhluca)'s Twitter Profile
Xing Han Lu

@xhluca

Vibe agents @Mila_Quebec

ID: 943571700746211328

http://xinghanlu.com

Joined 20-12-2017 19:59:58

2.2K Tweets

2.2K Followers

290 Following

Xing Han Lu (@xhluca)

Incredible how fast Google indexed the AWI paper: it's already discussed in the AI Overview when you google "build the web for agents", and the paper has been out for less than 30 hours...

Rtzr (@ryan_tzr)

The biggest issue web agents face: AUTHENTICATION WALLS 🔐 Twitter, Instagram, LinkedIn, news sites: everything requires a login now. So yeah, if there's an agentic web interface that solves this problem, it would be nice.

Hanseok Oh (@hanseok_oh)

Life update: I am joining Mila - Institut québécois d'IA 🇨🇦 as a visiting researcher. I returned to academia to deepen my understanding of how conversational agents can appropriately use information for better human interaction.

XLANG NLP Lab (@xlangnlp)

🔥New Computer Agent Arena Leaderboard Updates (2k+ user votes)!
🤔Which VLMs act better as computer use agents (CUAs)?

1. Claude Sonnet 4 🥇
2. Claude 3.7 Sonnet 🥈
3. UI-TARS-1.5 🥉
4. Operator

More insights in the thread 👇
arena.xlang.ai
Maksym Andriushchenko @ ICLR (@maksym_andr)

🚨Excited to release OS-Harm! 🚨

The safety of computer use agents has been largely overlooked. 

We created a new safety benchmark based on OSWorld for measuring 3 broad categories of harm:
1. deliberate user misuse,
2. prompt injections,
3. model misbehavior.
Xing Han Lu (@xhluca)

A very important benchmark on the safety of computer use agents. It validates our findings in SafeArena (safearena.github.io) that agents can complete harmful tasks, now with reasoning models and on OS tasks. We need safer digital agents ASAP, before further productization.

Benno Krojer (@benno_krojer)

The video is online now!

3min speed science talk on "From a soup of raw pixels to abstract meaning"

youtu.be/AHsoMYG2Vqk?si…
ACLRollingReview (@reviewacl)

Dear ACL community, we are seeking emergency reviewers for the May cycle. If you can urgently review extra papers (by June 24, AOE), please indicate your availability ASAP. Many thanks!

Benno Krojer (@benno_krojer)

Started a new podcast with Tomás Vergara Browne!

Behind the Research of AI: 
We look behind the scenes, beyond the polished papers 🧐🧪 

If this sounds fun, check out our first "official" episode with the awesome Gauthier Gidel from Mila - Institut québécois d'IA:

open.spotify.com/episode/7oTcqr…
Cesare Spinoso-Di Piano (@cesare_spinoso)

A blizzard is raging in Montreal when your friend says “Wow, the weather is amazing!” Humans easily interpret irony, while LLMs struggle with it. We propose a 𝘳𝘩𝘦𝘵𝘰𝘳𝘪𝘤𝘢𝘭-𝘴𝘵𝘳𝘢𝘵𝘦𝘨𝘺-𝘢𝘸𝘢𝘳𝘦 probabilistic framework as a solution. arxiv.org/abs/2506.09301 @ #acl2025
Xing Han Lu (@xhluca)

WebAgentlab, we would appreciate it if the authors could avoid copying the title of our paper, which was released more than 2 months ago: x.com/xhluca/status/…

Yu Su @#ICLR2025 (@ysu_nlp)

🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️

Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge
- 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor
-
Verna Dankers (@vernadankers)

I miss Edinburgh and its wonderful people already!! Thanks to Tal Linzen and Edoardo Ponti for inspiring discussions during the viva! I'm now exchanging Arthur's Seat for Mont Royal to join Siva Reddy's wonderful lab at Mila - Institut québécois d'IA 🤩

kyutai (@kyutai_labs)

Kyutai TTS and Unmute are now open source! The text-to-speech is natural, customizable, and fast: it can serve 32 users with a 350ms latency on a single L40S. Try it out and get started on the project page: kyutai.org/next/tts

BlackboxNLP (@blackboxnlp)

🚨 Excited to announce two invited speakers at #BlackboxNLP 2025!

Join us to hear from two leading voices in interpretability:
🎙️ Quanshi Zhang (Shanghai Jiao Tong University)
🎙️ Verna Dankers (McGill University)

@vernadankers @QuanshiZhang