Davis Brown (@davisbrownr)'s Twitter Profile
Davis Brown

@davisbrownr

Research in science of {deep learning, AI security, safety}. PhD student at UPenn & RS at @PNNLab

ID: 876990739179294720

Link: https://davisrbrown.com/
Joined: 20-06-2017 02:31:00

162 Tweets

406 Followers

977 Following

Daniel Paleka (@dpaleka)

3.7 sonnet: *hands behind back* yes the tests do pass. why do you ask. what did you hear
4o: yes you are Jesus Christ's brother. now go. Nanjing awaits
o3: Listen, sorry, I owe you a straight explanation. This was once revealed to me in a dream

Daniel Paleka (@dpaleka)

How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations. We identify key issues with forecasting evaluations 🧵 (1/7)

hoagy (@hoagycunningham)

New Anthropic blog: We benchmark approaches to making classifiers more cost-effective by reusing activations from the model being queried. We find that using linear probes or retraining just a single layer of the model can push the cost-effectiveness frontier. 🧵 1/

Adam Stein (@adamlsteinl)

Excited to share our new paper: "Instruction Following by Boosting Attention of Large Language Models"! We introduce Instruction Attention Boosting (InstABoost), a simple yet powerful method to steer LLM behavior by making them pay more attention to instructions. (🧵 1/7)

Jaime Sevilla (@jsevillamol)

A conversation with Ryan Greenblatt: can AI automate AI R&D before 2030, and will this lead to a fast takeoff? Highlights: 1. We discuss whether we will achieve AI that can automate AI R&D work by 2030, and whether this could lead to a million-fold improvement in training

Cas (Stephen Casper) (@stephenlcasper)

A personal update:
- I just finished my 6-month residency at AI Security Institute.
- I'm going back to MIT for the final year of my PhD.
- I'm on the postdoc and faculty job markets this fall!

Xander Davies (@alxndrdavies)

Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6

Sagnik Anupam (@sagnikanupam)

Introducing an evaluation platform for web agents: BrowserArena! Combining the awesome lmarena.ai platform with Browser Use, we rank LLMs side-by-side to compare their ability to solve web navigation tasks! Users vote for models using GIFs and text outputs to judge task performance.