Davis Brown (@davisbrownr) Twitter Tweets • TwiCopy

Davis Brown

@davisbrownr


Research in science of {deep learning, AI security, safety}. PhD student at UPenn & RS at @PNNLab

ID: 876990739179294720

https://davisrbrown.com/ · Joined 20-06-2017 02:31:00

162 Tweets

406 Followers

977 Following

Daniel Paleka

@dpaleka

7 months ago

3.7 sonnet: *hands behind back* yes the tests do pass. why do you ask. what did you hear
4o: yes you are Jesus Christ's brother. now go. Nanjing awaits
o3: Listen, sorry, I owe you a straight explanation. This was once revealed to me in a dream

3.3K likes · 36 replies · 269 retweets

Daniel Paleka

@dpaleka

6 months ago

How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations. We identify key issues with forecasting evaluations 🧵 (1/7)

82 likes · 5 replies · 12 retweets

hoagy

@hoagycunningham

5 months ago

New Anthropic blog: We benchmark approaches to making classifiers more cost-effective by reusing activations from the model being queried. We find that using linear probes or retraining just a single layer of the model can push the cost-effectiveness frontier. 🧵1/

125 likes · 9 replies · 15 retweets

Adam Stein

@adamlsteinl

4 months ago

Excited to share our new paper: "Instruction Following by Boosting Attention of Large Language Models"! We introduce Instruction Attention Boosting (InstABoost), a simple yet powerful method to steer LLM behavior by making them pay more attention to instructions. (🧵1/7)

16 likes · 1 reply · 6 retweets

Jaime Sevilla

@jsevillamol

3 months ago

A conversation with Ryan Greenblatt: can AI automate AI R&D before 2030, and will this lead to a fast takeoff? Highlights: 1. We discuss whether we will achieve AI that can automate AI R&D work by 2030, and whether this could lead to a million-fold improvement in training

71 likes · 4 replies · 5 retweets

Cas (Stephen Casper)

@stephenlcasper

3 months ago

A personal update:
- I just finished my 6-month residency at AI Security Institute.
- I'm going back to MIT for the final year of my PhD.
- I'm on the postdoc and faculty job markets this fall!

578 likes · 12 replies · 24 retweets

Xander Davies

@alxndrdavies

2 months ago

Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6

290 likes · 8 replies · 63 retweets

Sagnik Anupam

@sagnikanupam

a month ago

Introducing an evaluation platform for web agents: BrowserArena! Combining the awesome lmarena.ai platform with Browser Use, we rank LLMs side-by-side to compare their ability to solve web navigation tasks! Users vote for models using GIFs and text outputs to judge task performance.

6 likes · 2 replies · 4 retweets