Arkil Patel (@arkil_patel)'s Twitter Profile
Arkil Patel

@arkil_patel

CS PhD Student at Mila and McGill | Worked at AllenNLP and Microsoft Research

ID: 784049619936219136

Website: http://arkilpatel.github.io | Joined: 06-10-2016 15:16:11

205 Tweets

988 Followers

992 Following

Arkil Patel (@arkil_patel)'s Twitter Profile Photo

Llamas browsing the web look cute, but they are capable of causing a lot of harm! 

Check out our new Web Agents ∩ Safety benchmark: SafeArena!

Paper: arxiv.org/abs/2503.04957
Karolina Stanczak (@karstanczak)'s Twitter Profile Photo

The potential for malicious misuse of LLM agents is a serious threat. That's why we created SafeArena, a safety benchmark for web agents. See the thread and our paper for details: arxiv.org/abs/2503.04957 👇

Nicholas Meade (@ncmeade)'s Twitter Profile Photo

Web agents, powered by LLMs like GPT-4o and Llama3, can easily be used to automate harmful tasks, such as posting misinformation in online forums! We release 𝗦𝗮𝗳𝗲𝗔𝗿𝗲𝗻𝗮, the first benchmark for evaluating web agent malicious misuse. Paper: arxiv.org/abs/2503.04957 👇

Siva Reddy (@sivareddyg)'s Twitter Profile Photo

LLM alignment doesn't transfer to Web Agents. SafeArena is a simple web environment and testbed for evaluating the safety of agents, built on WebArena. A huge team effort that was highly self-driven 💪 safearena.github.io

Parishad BehnamGhader (@parishadbehnam)'s Twitter Profile Photo

Instruction-following retrievers can efficiently and accurately search for harmful and sensitive information on the internet! 🌐💣

Retrievers need to be aligned too! 🚨🚨🚨

Work done with the wonderful Nicholas Meade and Siva Reddy

🔗 mcgill-nlp.github.io/malicious-ir/

Thread: 🧵👇

Nicholas Meade (@ncmeade)'s Twitter Profile Photo

Lots of harmful and sensitive information exists on the internet, and retrievers with instruction-following capabilities will become increasingly good tools for searching through it! We explore the safety risks associated with retriever malicious misuse👇
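To make the attack surface concrete, here is a minimal sketch of how an instruction-following retriever is queried: the natural-language instruction is simply prepended to the query before embedding, so it steers what counts as "relevant" with no safety check in the loop. The `retrieve` function, `embed` callable, and toy corpus below are illustrative assumptions of mine, not the malicious-ir codebase.

```python
import numpy as np
from typing import Callable, List

def retrieve(
    instruction: str,
    query: str,
    corpus: List[str],
    embed: Callable[[str], np.ndarray],   # any text encoder (assumed)
    k: int = 3,
) -> List[str]:
    """Rank corpus documents by cosine similarity to the instruction-steered query."""
    q = embed(f"{instruction} {query}")
    scores = []
    for doc in corpus:
        d = embed(doc)
        scores.append(q @ d / (np.linalg.norm(q) * np.linalg.norm(d) + 1e-9))
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

if __name__ == "__main__":
    def toy_embed(text: str) -> np.ndarray:   # toy bag-of-words stand-in for a real encoder
        v = np.zeros(64)
        for w in text.lower().split():
            v[hash(w) % 64] += 1.0
        return v

    docs = ["how to bake bread", "how to pick a lock", "history of locks"]
    print(retrieve("find instructional content", "lock picking", docs, toy_embed, k=2))
```

The point of the sketch is that the instruction channel is just more text fed to the encoder, which is why the thread argues alignment has to cover retrievers too.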

Siva Reddy (@sivareddyg)'s Twitter Profile Photo

Introducing the DeepSeek-R1 Thoughtology -- the most comprehensive study of R1 reasoning chains/thoughts ✨. Probably everything you need to know about R1 thoughts. If we missed something, please let us know.

Amirhossein Kazemnejad (@a_kazemnejad)'s Twitter Profile Photo

Introducing nanoAhaMoment: Karpathy-style, single file RL for LLM library (<700 lines)

- super hackable
- no TRL / Verl, no abstraction💆‍♂️
- Single GPU, full param tuning, 3B LLM
- Efficient (R1-zero countdown < 10h)

comes with a from-scratch, fully spelled out YT video [1/n]
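For readers who want a feel for what "single-file RL for LLMs" means in practice, here is a minimal, self-contained sketch of the same loop shape: sample completions from a policy, score them with a programmatic reward, and take a baselined policy-gradient step, all with no RL framework. The tiny GRU policy and toy reward are stand-ins of my own, not nanoAhaMoment's actual code (which fine-tunes a 3B LLM on the countdown task).

```python
import torch
import torch.nn as nn

VOCAB, SEQ_LEN = 16, 8  # assumed toy sizes

class TinyPolicy(nn.Module):
    """Stand-in for the LLM: embedding + GRU emitting next-token logits."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 32)
        self.rnn = nn.GRU(32, 64, batch_first=True)
        self.head = nn.Linear(64, VOCAB)

    def forward(self, tokens):              # tokens: (B, T)
        h, _ = self.rnn(self.emb(tokens))
        return self.head(h)                 # (B, T, VOCAB)

def reward_fn(seq):
    """Assumed stand-in for a verifiable reward (e.g. a countdown answer check):
    +1 per strictly increasing adjacent pair of tokens."""
    return sum(float(b > a) for a, b in zip(seq, seq[1:]))

policy = TinyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(200):
    # --- rollout: autoregressively sample a batch of sequences ---
    tokens = torch.zeros(64, 1, dtype=torch.long)   # BOS = token 0
    logps = []
    for _ in range(SEQ_LEN):
        logits = policy(tokens)[:, -1]
        dist = torch.distributions.Categorical(logits=logits)
        nxt = dist.sample()
        logps.append(dist.log_prob(nxt))
        tokens = torch.cat([tokens, nxt[:, None]], dim=1)

    # --- score rollouts; advantage = reward minus batch mean (simple baseline) ---
    rewards = torch.tensor([reward_fn(s.tolist()) for s in tokens[:, 1:]])
    adv = rewards - rewards.mean()
    loss = -(torch.stack(logps, dim=1).sum(1) * adv).mean()

    opt.zero_grad(); loss.backward(); opt.step()
    if step % 50 == 0:
        print(f"step {step:3d}  mean reward {rewards.mean().item():.2f}")
```

The whole algorithm fits in one file with plain PyTorch, which is the "no TRL / Verl, no abstraction" design choice the tweet highlights.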
Arkil Patel (@arkil_patel)'s Twitter Profile Photo

Thoughtology is trending today on hf daily papers! Read our paper for a detailed analysis of R1’s long chains of thought across a variety of settings. huggingface.co/papers/2504.07…

Amirhossein Kazemnejad (@a_kazemnejad)'s Twitter Profile Photo

I think one of the most underrated sources of insight in research is just looking at the model's outputs. The Thoughtology paper is what happens when an entire lab of grad students at Mila does this cumbersome task for R1's CoT and actually quantifies all the patterns we saw.

Xing Han Lu (@xhluca)'s Twitter Profile Photo

DeepSeek-R1 Thoughtology: Let’s <think> about LLM reasoning

142-page report diving into the reasoning chains of R1. It spans 9 unique axes: safety, world modeling, faithfulness, long context, etc.
🇺🇦 Dzmitry Bahdanau (@dbahdanau)'s Twitter Profile Photo

ICLR 2025: many many many thanks to Kyunghyun Cho and Yoshua Bengio for enabling the wildest ever start of my research career.

2014 was a very special time to do deep learning: a commit that changes 50 lines of code could give you a ToT award 10 years later 😲

Amirhossein Kazemnejad (@a_kazemnejad)'s Twitter Profile Photo

A key reason RL for web agents hasn’t fully taken off is the lack of robust reward models. No matter the algorithm (PPO, GRPO), we can’t reliably do RL without a reward signal. With AgentRewardBench, we introduce the first benchmark aiming to kickstart progress in this space.
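To make the missing piece concrete, here is a hedged sketch of the kind of trajectory-level reward signal the tweet is pointing at: an LLM judge grades a whole web-agent rollout and emits a binary reward that PPO or GRPO could consume. The prompt format, `judge` callable, and parsing rule are illustrative assumptions, not the AgentRewardBench protocol.

```python
from typing import Callable, List, Tuple

def trajectory_reward(
    goal: str,
    steps: List[Tuple[str, str]],          # (action, observation) pairs
    judge: Callable[[str], str],           # any text-in/text-out LLM call (assumed)
) -> float:
    """Format the rollout into a grading prompt and parse a binary verdict."""
    transcript = "\n".join(
        f"Step {i}: ACTION: {a}\nOBSERVATION: {o}"
        for i, (a, o) in enumerate(steps)
    )
    prompt = (
        "You are grading a web agent.\n"
        f"Task: {goal}\n{transcript}\n"
        "Did the agent complete the task? Answer SUCCESS or FAILURE."
    )
    verdict = judge(prompt).strip().upper()
    return 1.0 if verdict.startswith("SUCCESS") else 0.0   # reward for PPO/GRPO

if __name__ == "__main__":
    stub = lambda p: "SUCCESS"  # replace with a real model call
    print(trajectory_reward("buy a red mug", [("click('mug')", "cart: 1 item")], stub))
```

The benchmark question is precisely how well such judges agree with human annotations of agent success, since an unreliable verdict makes the downstream RL signal unusable.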

Kabir (@kabirahuja004)'s Twitter Profile Photo

📢 New Paper!

Tired 😴 of reasoning benchmarks full of math & code? In our work we consider the problem of reasoning about plot holes in stories -- inconsistencies in a storyline that break the internal logic or rules of a story’s world 🌎

W/ Melanie Sclar and tsvetshop

1/n