Arkil Patel (@arkil_patel)'s Twitter Profile
Arkil Patel

@arkil_patel

CS PhD Student at Mila and McGill | Worked at AllenNLP and Microsoft Research

ID: 784049619936219136

Website: http://arkilpatel.github.io | Joined: 06-10-2016 15:16:11

205 Tweets

988 Followers

992 Following

Arkil Patel (@arkil_patel)'s Twitter Profile Photo

Llamas browsing the web look cute, but they are capable of causing a lot of harm! 

Check out our new Web Agents ∩ Safety benchmark: SafeArena!

Paper: arxiv.org/abs/2503.04957
Karolina Stanczak (@karstanczak)'s Twitter Profile Photo

The potential for malicious misuse of LLM agents is a serious threat. That's why we created SafeArena, a safety benchmark for web agents. See the thread and our paper for details: arxiv.org/abs/2503.04957 👇

Nicholas Meade (@ncmeade)'s Twitter Profile Photo

Web agents, powered by LLMs like GPT-4o and Llama3, can easily be used to automate harmful tasks, such as posting misinformation in online forums! We release 𝗦𝗮𝗳𝗲𝗔𝗿𝗲𝗻𝗮, the first benchmark for evaluating web agent malicious misuse. Paper: arxiv.org/abs/2503.04957 👇

Siva Reddy (@sivareddyg)'s Twitter Profile Photo

LLM alignment doesn't transfer to Web Agents. SafeArena is a simple web environment and testbed for evaluating the safety of agents, built on WebArena. A huge team effort that was highly self-driven 💪 safearena.github.io

Parishad BehnamGhader (@parishadbehnam)'s Twitter Profile Photo

Instruction-following retrievers can efficiently and accurately search for harmful and sensitive information on the internet! 🌐💣

Retrievers need to be aligned too! 🚨🚨🚨

Work done with the wonderful Nicholas Meade and Siva Reddy

🔗 mcgill-nlp.github.io/malicious-ir/

Thread: 🧵👇

Nicholas Meade (@ncmeade)'s Twitter Profile Photo

Lots of harmful and sensitive information exists on the internet, and retrievers with instruction-following capabilities will become increasingly good tools for searching through it! We explore the safety risks associated with retriever malicious misuse👇
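To make the attack surface concrete, here is a minimal sketch of how an instruction-following retriever is queried: the natural-language instruction is simply prepended to the query before embedding, so it steers what counts as "relevant" with no safety check in the loop. The `retrieve` function, `embed` callable, and toy corpus below are illustrative assumptions of mine, not the malicious-ir codebase.

```python
import numpy as np
from typing import Callable, List

def retrieve(
    instruction: str,
    query: str,
    corpus: List[str],
    embed: Callable[[str], np.ndarray],   # any text encoder (assumed)
    k: int = 3,
) -> List[str]:
    """Rank corpus documents by cosine similarity to the instruction-steered query."""
    q = embed(f"{instruction} {query}")
    scores = []
    for doc in corpus:
        d = embed(doc)
        scores.append(q @ d / (np.linalg.norm(q) * np.linalg.norm(d) + 1e-9))
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

if __name__ == "__main__":
    def toy_embed(text: str) -> np.ndarray:   # toy bag-of-words stand-in for a real encoder
        v = np.zeros(64)
        for w in text.lower().split():
            v[hash(w) % 64] += 1.0
        return v

    docs = ["how to bake bread", "how to pick a lock", "history of locks"]
    print(retrieve("find instructional content", "lock picking", docs, toy_embed, k=2))
```

The point of the sketch is that the instruction channel is just more text fed to the encoder, which is why the thread argues alignment has to cover retrievers too.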

Siva Reddy (@sivareddyg)'s Twitter Profile Photo

Introducing the DeepSeek-R1 Thoughtology -- the most comprehensive study of R1 reasoning chains/thoughts ✨. Probably everything you need to know about R1 thoughts. If we missed something, please let us know.

Amirhossein Kazemnejad (@a_kazemnejad)'s Twitter Profile Photo

Introducing nanoAhaMoment: Karpathy-style, single file RL for LLM library (<700 lines)

- super hackable
- no TRL / Verl, no abstraction💆‍♂️
- Single GPU, full param tuning, 3B LLM
- Efficient (R1-zero countdown < 10h)

comes with a from-scratch, fully spelled out YT video [1/n]
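For readers who want a feel for what "single-file RL for LLMs" means in practice, here is a minimal, self-contained sketch of the same loop shape: sample completions from a policy, score them with a programmatic reward, and take a baselined policy-gradient step, all with no RL framework. The tiny GRU policy and toy reward are stand-ins of my own, not nanoAhaMoment's actual code (which fine-tunes a 3B LLM on the countdown task).

```python
import torch
import torch.nn as nn

VOCAB, SEQ_LEN = 16, 8  # assumed toy sizes

class TinyPolicy(nn.Module):
    """Stand-in for the LLM: embedding + GRU emitting next-token logits."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 32)
        self.rnn = nn.GRU(32, 64, batch_first=True)
        self.head = nn.Linear(64, VOCAB)

    def forward(self, tokens):              # tokens: (B, T)
        h, _ = self.rnn(self.emb(tokens))
        return self.head(h)                 # (B, T, VOCAB)

def reward_fn(seq):
    """Assumed stand-in for a verifiable reward (e.g. a countdown answer check):
    +1 per strictly increasing adjacent pair of tokens."""
    return sum(float(b > a) for a, b in zip(seq, seq[1:]))

policy = TinyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(200):
    # --- rollout: autoregressively sample a batch of sequences ---
    tokens = torch.zeros(64, 1, dtype=torch.long)   # BOS = token 0
    logps = []
    for _ in range(SEQ_LEN):
        logits = policy(tokens)[:, -1]
        dist = torch.distributions.Categorical(logits=logits)
        nxt = dist.sample()
        logps.append(dist.log_prob(nxt))
        tokens = torch.cat([tokens, nxt[:, None]], dim=1)

    # --- score rollouts; advantage = reward minus batch mean (simple baseline) ---
    rewards = torch.tensor([reward_fn(s.tolist()) for s in tokens[:, 1:]])
    adv = rewards - rewards.mean()
    loss = -(torch.stack(logps, dim=1).sum(1) * adv).mean()

    opt.zero_grad(); loss.backward(); opt.step()
    if step % 50 == 0:
        print(f"step {step:3d}  mean reward {rewards.mean().item():.2f}")
```

The whole algorithm fits in one file with plain PyTorch, which is the "no TRL / Verl, no abstraction" design choice the tweet highlights.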
Arkil Patel (@arkil_patel)'s Twitter Profile Photo

Thoughtology is trending today on hf daily papers! Read our paper for a detailed analysis of R1’s long chains of thought across a variety of settings. huggingface.co/papers/2504.07…

Amirhossein Kazemnejad (@a_kazemnejad)'s Twitter Profile Photo

I think one of the most underrated sources of insight in research is just looking at the model's outputs. The Thoughtology paper is what happens when an entire lab of grad students at Mila does this cumbersome task for R1's CoT and actually quantifies all the patterns we saw.

Xing Han Lu (@xhluca)'s Twitter Profile Photo

DeepSeek-R1 Thoughtology: Let’s <think> about LLM reasoning

142-page report diving into the reasoning chains of R1. It spans 9 unique axes: safety, world modeling, faithfulness, long context, etc.
🇺🇦 Dzmitry Bahdanau (@dbahdanau)'s Twitter Profile Photo

ICLR 2025: many many many thanks to Kyunghyun Cho and Yoshua Bengio for enabling the wildest ever start of my research career.

2014 was a very special time to do deep learning: a commit that changes 50 lines of code could give you a ToT award 10 years later 😲

Amirhossein Kazemnejad (@a_kazemnejad)'s Twitter Profile Photo

A key reason RL for web agents hasn’t fully taken off is the lack of robust reward models. No matter the algorithm (PPO, GRPO), we can’t reliably do RL without a reward signal. With AgentRewardBench, we introduce the first benchmark aiming to kickstart progress in this space.
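To make the missing piece concrete, here is a hedged sketch of the kind of trajectory-level reward signal the tweet is pointing at: an LLM judge grades a whole web-agent rollout and emits a binary reward that PPO or GRPO could consume. The prompt format, `judge` callable, and parsing rule are illustrative assumptions, not the AgentRewardBench protocol.

```python
from typing import Callable, List, Tuple

def trajectory_reward(
    goal: str,
    steps: List[Tuple[str, str]],          # (action, observation) pairs
    judge: Callable[[str], str],           # any text-in/text-out LLM call (assumed)
) -> float:
    """Format the rollout into a grading prompt and parse a binary verdict."""
    transcript = "\n".join(
        f"Step {i}: ACTION: {a}\nOBSERVATION: {o}"
        for i, (a, o) in enumerate(steps)
    )
    prompt = (
        "You are grading a web agent.\n"
        f"Task: {goal}\n{transcript}\n"
        "Did the agent complete the task? Answer SUCCESS or FAILURE."
    )
    verdict = judge(prompt).strip().upper()
    return 1.0 if verdict.startswith("SUCCESS") else 0.0   # reward for PPO/GRPO

if __name__ == "__main__":
    stub = lambda p: "SUCCESS"  # replace with a real model call
    print(trajectory_reward("buy a red mug", [("click('mug')", "cart: 1 item")], stub))
```

The benchmark question is precisely how well such judges agree with human annotations of agent success, since an unreliable verdict makes the downstream RL signal unusable.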

Kabir (@kabirahuja004)'s Twitter Profile Photo

📢 New Paper!

Tired 😴 of reasoning benchmarks full of math & code? In our work we consider the problem of reasoning about plot holes in stories -- inconsistencies in a storyline that break the internal logic or rules of a story’s world 🌎

W/ Melanie Sclar and tsvetshop

1/n