Marius Hobbhahn (@mariushobbhahn)'s Twitter Profile
Marius Hobbhahn

@mariushobbhahn

CEO at Apollo Research @apolloaievals

prev. ML PhD with Philipp Hennig & AI forecasting @EpochAIResearch

ID: 1012809976224641024

Link: https://www.mariushobbhahn.com

Joined: 29-06-2018 21:28:10

944 Tweets

3.3K Followers

1.1K Following

I'm very glad that detailed AI system cards are the norm. There could have been another world in which the general public knew almost nothing about the dangerous capabilities and propensities of frontier systems.

System cards are an example of something that seems irrational in the short term ("makes you look bad"), but rational in the medium and long term ("you're more trustworthy" & sharing safety knowledge). I'm glad that the labs are defying their myopic incentives. Yay humanity.

It's also worth pointing out that models from all providers are willing to do this under the right circumstances. Claude has a higher propensity to do so, but it's not the only one. I think this just emerges from scale+RL+HHH.

LLMs are rapidly getting more eval-aware! AFAIK, nobody has a good plan for what to do when the models constantly say "This is an eval testing for X. Let's say what the developers want to hear" during evals.

We're hiring for an Evals Software Engineer with a heavy focus on infrastructure. Design, build, maintain, and secure our infrastructure. Deadline: 22 June. If in doubt, just apply. It takes 5-10 minutes. jobs.lever.co/apolloresearch…

I often hear that it will take decades for AI non-adopters to be outcompeted by AI adopters due to slow diffusion. I think coding is a datapoint against that. Cursor + a frontier model is already so much faster, and we haven't even started with coding agent swarms yet.