Hugh Zhang (@hughbzhang)'s Twitter Profile
Hugh Zhang

@hughbzhang

research @scale_AI. co-created @gradientpub.

ID: 1075466686709395457

Link: http://hughbzhang.com | Joined: 19-12-2018 19:03:34

420 Tweets

3.3K Followers

1.1K Following

Furong Huang (@furongh)

Our very own Evan Wang visited us back at UMD today and gave an awesome talk. Check out his paper here to see how planning improves pass@k significantly for coding problems: arxiv.org/abs/2409.03733

Daniel Litt (@littmath)

Really good thread trying to guess the x-axis of the plots OpenAI released showing how GPT-o1 scales on AIME (a fairly tricky math contest for high schoolers) with test-time compute. The upshot, as I understand it, is that you can get near 80% for $50 worth of tokens.

Anna Goldie (@annadgoldie)

In 2020, we introduced an AI method capable of generating superhuman chip layouts. Today, we describe its impact on the field and give it a name: AlphaChip!

Tanay Kothari (@tankots)

Building a voice interface that feels magical was my childhood dream since I was 10. They say you spend your lives chasing your dreams. Today, 16 years later, I think we built magic. Here's to the insane @WisprAI team that made it happen 🔥

Xander Davies (@alxndrdavies)

Jailbreaking evals ~always focus on simple chatbots—excited to announce AgentHarm, a dataset for measuring harmfulness of LLM 𝑎𝑔𝑒𝑛𝑡𝑠 developed at @AISafetyInst in collaboration with Gray Swan AI! 🧵 1/N

Zifan (Sail) Wang (@_zifan_wang)

(1/7) Excited to share our new red teaming work at Scale, Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents. We find jailbreaking LLM agents that use browsers is surprisingly easy. In many cases, you can just direct ask! Paper & Project page: scale.com/research/brows…

Summer Yue (@summeryue0)

Excited to share our latest research on red teaming and agent safety from SEAL team at Scale AI. This work highlights a critical gap: safety mechanisms in advanced LLMs do not generalize well to downstream browser agents. We also found that LLM attacks transfer with high

Miles Turpin (@milesaturpin)

Really excited that this paper is out now! We show that models are capable of a basic form of introspection. Scaling this to more advanced forms would have major ramifications for safety, interpretability, and the moral status of AI systems.

Jason Wei (@_jasonwei)

Excited to open-source a new hallucinations eval called SimpleQA! For a while it felt like there was no great benchmark for factuality, and so we created an eval that was simple, reliable, and easy-to-use for researchers. Main features of SimpleQA: 1. Very simple setup: there

daniel bashir (@spaniel_bashir)

We’re low on editorial bandwidth, so we’re making a few (hopefully temporary!) changes to our process at The Gradient — I sat down with Hugh Zhang and Andrey Kurenkov to discuss our history and where things stand thegradient.pub/podcasts/some-…