Peter Hase (@peterbhase)'s Twitter Profile
Peter Hase

@peterbhase

AI research @AnthropicAI. Interested in AI safety. PhD from UNC Chapel Hill (Google PhD Fellow). Previously: AI2, Google, Meta

ID: 1119252439050354688

Link: https://peterbhase.github.io/ · Joined: 19-04-2019 14:52:30

399 Tweets

2.2K Followers

813 Following

Mohit Bansal (@mohitban47)'s Twitter Profile Photo

🚨 We have postdoc openings at UNC 🙂 

Exciting+diverse NLP/CV/ML topics**, freedom to create research agenda, competitive funding, very strong students, many collabs w/ other faculty & universities+companies, superb quality of life/weather. Please apply + help spread the word
Mohit Bansal (@mohitban47)'s Twitter Profile Photo

🚨 Kudos to Peter on leading this long 1-year AI+Philosophy effort on fundamental problems with model editing in LLMs (inspired by work on belief revision in philosophy)! 👏👏 ➡️ 12 core challenges across 3 areas: (1) defining the model editing problem (w.r.t. background

Anisha Gunjal (@anisha_gunjal)'s Twitter Profile Photo

Challenges of LLM fact-checking with atomic facts:
👉 Too little info? Atomic facts lack context
👉 Too much info? Risk losing error localization benefits

We study this and present 🧬molecular facts🧬, which balance two key criteria: decontextuality and minimality.

w/ Greg Durrett
Serena Booth (@serenalbooth)'s Twitter Profile Photo

I'll be at ICML next week! I'm looking to hire a postdoc and PhD students in human-centered RL and AI governance starting Fall 2025, so get in touch if you're interested in working with me at Brown! I'm also happy to advise on applying to the AAAS fellowship or AI policy roles.

Dima Krasheninnikov (@dmkrash)'s Twitter Profile Photo

1/ Excited to finally tweet about our paper “Implicit meta-learning may lead LLMs to trust more reliable sources”, to appear at ICML 2024. Our results suggest that during training, LLMs better internalize text that appears useful for predicting other text (e.g. seems reliable).

Peter Hase (@peterbhase)'s Twitter Profile Photo

With System 1.x reasoning, we show how to steer a model toward more or less verbalized reasoning – the model learns to balance explicit/implicit reasoning. Using this method, we can encourage an LLM to verbalize reasoning/planning that would otherwise remain hidden.

Adam Gleave (@argleave)'s Twitter Profile Photo

Anthropic comes out against current SB1047, proposes refocusing it on liability post-catastrophe (tort law++) and safety transparency (releasing RSP-style plan). Would cut: pre-catastrophe enforcement; new regulatory division. Analysis & 🔗 in 🧵👇

Peter Hase (@peterbhase)'s Twitter Profile Photo

Life update: I am starting a residency at Anthropic! I will be working on research in AI safety. I have also relocated to SF! You will now find me there.

Dan Hendrycks (@danhendrycks)'s Twitter Profile Photo

New letter from Geoffrey Hinton, Yoshua Bengio, Lawrence @Lessig, and Stuart Russell urging Gov. Newsom to sign SB 1047.

“We believe SB 1047 is an important and reasonable first step towards ensuring that frontier AI systems are developed responsibly, so that we can all better
Mohit Bansal (@mohitban47)'s Twitter Profile Photo

🚨 Check out an exciting batch of papers this week at #ACL2024!

Say hi to some of our awesome students & collaborators who are attending in person, and feel free to ask about our postdoc openings too 🙂

Topics: 
-- multi-agent reasoning collaboration
-- structured
Asma Ghandeharioun (@ghandeharioun)'s Twitter Profile Photo

🧵Responses to adversarial queries can still remain latent in a safety-tuned model. Why are they revealed sometimes, but not others? And what are the mechanics of this latent misalignment? Does it matter *who* the user is? (1/n)

Christopher Potts (@chrisgpotts)'s Twitter Profile Photo

The Linear Representation Hypothesis is now widely adopted despite its highly restrictive nature. Here, Csordás Róbert, Atticus Geiger, Christopher Manning & I present a counterexample to the LRH and argue for more expressive theories of interpretability: arxiv.org/abs/2408.10920

Senator Scott Wiener (@scott_wiener)'s Twitter Profile Photo

.Anthropic sent a letter to the Governor sharing their analysis of SB 1047. Here are the main takeaways:

1️⃣On balance, the bill is good.
2️⃣Its compliance burdens for companies are reasonable.
3️⃣Catastrophic risks from AI are real.
4️⃣Federal action is uncertain at best.
Usman Anwar (@usmananwar391)'s Twitter Profile Photo

Our agenda paper on alignment and safety of LLMs just got published at TMLR: openreview.net/forum?id=oVTkO… 🥳 The revised version is also now on arxiv arxiv.org/abs/2404.09932.