Xing Han Lu (@xhluca) Twitter Tweets • TwiCopy

Yu Su @#ICLR2025

@ysu_nlp

6 months ago

we need out-of-the-box thinking for agentic web

19 likes · 2 replies · 5 retweets

Xing Han Lu

@xhluca

6 months ago

incredible how fast Google indexed the AWI paper and is discussing it in the AI overview when googling "build the web for agents"

the paper has been out for barely <30h...

10 likes · 0 replies · 0 retweets

The biggest issue web agents face: AUTHENTICATION WALLS 🔐 Twitter, Instagram, LinkedIn, news sites: everything requires login now. So yeah, if there's an agentic web interface that solves this problem, it would be nice.

4 likes · 0 replies · 1 retweet

Hanseok Oh

@hanseok_oh

6 months ago

Life update: I am joining Mila - Institut québécois d'IA 🇨🇦 as a visiting researcher. I returned to academia to deepen my understanding of how conversational agents can appropriately use information for better human interaction.

61 likes · 0 replies · 4 retweets

XLANG NLP Lab

@xlangnlp

6 months ago

🔥New Computer Agent Arena Leaderboard Updates (2k+ user votes)!
🤔Which VLMs act better as computer use agents (CUAs)?
1. Claude Sonnet 4 🥇
2. Claude 3.7 Sonnet 🥈
3. UI-TARS-1.5 🥉
4. Operator
More insights in the thread 👇 arena.xlang.ai

38 likes · 1 reply · 18 retweets

Julien Chaumond

@julien_c

6 months ago

every Gradio space is now an MCP tool you can add to our MCP server in 1 click 🤯

62 likes · 10 replies · 14 retweets

Maksym Andriushchenko @ ICLR

@maksym_andr

6 months ago

🚨Excited to release OS-Harm! 🚨 The safety of computer use agents has been largely overlooked. We created a new safety benchmark based on OSWorld for measuring 3 broad categories of harm: 1. deliberate user misuse, 2. prompt injections, 3. model misbehavior.

94 likes · 3 replies · 26 retweets

Xing Han Lu

@xhluca

6 months ago

Very important benchmark on the safety of computer use agents. It validates our findings in SafeArena (safearena.github.io) that agents can complete harmful tasks, now with reasoning models and on OS tasks. We need safer digital agents asap, before further productization.

25 likes · 0 replies · 7 retweets

Benno Krojer

@benno_krojer

6 months ago

The video is online now! 3min speed science talk on "From a soup of raw pixels to abstract meaning" youtu.be/AHsoMYG2Vqk?si…

39 likes · 0 replies · 6 retweets

ACLRollingReview

@reviewacl

6 months ago

Dear ACL community, We are seeking emergency reviewers for the May cycle. Please indicate your availability (ASAP) if you can help review extra papers urgently (by the 24th of June AOE). Many thanks!

33 likes · 1 reply · 16 retweets

Benno Krojer

@benno_krojer

6 months ago

Started a new podcast with Tomás Vergara Browne!

Behind the Research of AI:
We look behind the scenes, beyond the polished papers 🧐🧪

If this sounds fun, check out our first "official" episode with the awesome Gauthier Gidel from Mila - Institut québécois d'IA:

open.spotify.com/episode/7oTcqr…

41 likes · 1 reply · 13 retweets

Cesare Spinoso-Di Piano

@cesare_spinoso

5 months ago

A blizzard is raging in Montreal when your friend says “Wow, the weather is amazing!” Humans easily interpret irony, while LLMs struggle with it. We propose a 𝘳𝘩𝘦𝘵𝘰𝘳𝘪𝘤𝘢𝘭-𝘴𝘵𝘳𝘢𝘵𝘦𝘨𝘺-𝘢𝘸𝘢𝘳𝘦 probabilistic framework as a solution. arxiv.org/abs/2506.09301 @ #acl2025

11 likes · 1 reply · 11 retweets

Xing Han Lu

@xhluca

5 months ago

WebAgentlab Would appreciate it if the authors could avoid copying the title of our paper, which was released more than 2 months ago: x.com/xhluca/status/…

7 likes · 0 replies · 1 retweet

Yu Su @#ICLR2025

@ysu_nlp

5 months ago

🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️ Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge - 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor -

211 likes · 3 replies · 45 retweets

Verna Dankers

@vernadankers

5 months ago

I miss Edinburgh and its wonderful people already!! Thanks to Tal Linzen and Edoardo Ponti for inspiring discussions during the viva! I'm now exchanging Arthur's Seat for Mont Royal to join Siva Reddy's wonderful lab at Mila - Institut québécois d'IA 🤩

88 likes · 10 replies · 8 retweets

kyutai

@kyutai_labs

5 months ago

Kyutai TTS and Unmute are now open source! The text-to-speech is natural, customizable, and fast: it can serve 32 users with a 350ms latency on a single L40S. Try it out and get started on the project page: kyutai.org/next/tts

1.1K likes · 49 replies · 171 retweets

BlackboxNLP

@blackboxnlp

5 months ago

🚨 Excited to announce two invited speakers at #BlackboxNLP 2025!

Join us to hear from two leading voices in interpretability:
🎙️ Quanshi Zhang (Shanghai Jiao Tong University)
🎙️ Verna Dankers (McGill University)

36 likes · 0 replies · 10 retweets

Yoav Artzi

@yoavartzi

5 months ago

Conference on Language Modeling decisions are out, and so are we

The strength of submissions this year amazed us! Many many hard decisions 😩

+ Aditi Raghunathan, Eunsol Choi, Ranjay Krishna 😴😴😴

72 likes · 2 replies · 8 retweets