Victor Lecomte (@vclecomte)'s Twitter Profile
Victor Lecomte

@vclecomte

CS PhD student at Stanford / Researcher at the Alignment Research Center

ID: 846711415771664384

Website: https://vlecomte.github.io/ · Joined: 28-03-2017 13:11:47

241 Tweets

653 Followers

201 Following

Victor Lecomte (@vclecomte)'s Twitter Profile Photo

My first dabble at studying learning dynamics (and at AI safety-related work)! It was a lot of fun figuring out the exact speed at which encodings get sparser under L1-regularization; I didn't expect the math to end up being so nice. 🙂
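The dynamic described above (encodings getting sparser at an exact, predictable speed under L1 regularization) can be illustrated with a toy NumPy sketch. This is not code from the paper: the vector size, learning rate, and penalty strength are illustrative assumptions. Under a pure L1 penalty, subgradient descent shrinks each weight toward zero at the constant rate `lr * lam`, so a weight starting at `w_0` hits exactly zero after about `|w_0| / (lr * lam)` steps:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)           # dense initial "encoding" (toy values)
lr, lam = 0.01, 1.0              # learning rate and L1 strength (assumed)
steps_to_zero = np.full(w.shape, -1)

w_t = w.copy()
for t in range(1, 500):
    # subgradient step on lam * ||w||_1
    w_new = w_t - lr * lam * np.sign(w_t)
    # a weight that would cross zero is clamped exactly to zero
    w_new = np.where(np.sign(w_new) == np.sign(w_t), w_new, 0.0)
    # record the first step at which each weight reaches zero
    steps_to_zero[np.logical_and(w_t != 0, w_new == 0)] = t
    w_t = w_new

# predicted hitting times |w_0| / (lr * lam) vs. observed ones
print(np.abs(w) / (lr * lam))
print(steps_to_zero)
```

Every weight ends at exactly zero, and the observed hitting times match the predicted `|w_0| / (lr * lam)` up to rounding, which is the "exact speed" flavor of result the tweet alludes to.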

kushal thaman (at ICLR 🇸🇬) (@kushal1t)'s Twitter Profile Photo

Excited to share the first paper of my undergrad: "Incidental Polysemanticity" arxiv.org/abs/2312.03096! We present a second, "incidental" origin story of polysemanticity in task-optimized DNNs. Done in collaboration with Victor Lecomte, trevor (taylor's version), Rylan Schaeffer, and Sanmi Koyejo. (1/n)

Eric Neyman (@ericneyman)'s Twitter Profile Photo

Last week, ARC put out a new paper! The paper is a research update on the "heuristic estimation" direction of our research into explaining neural network behavior. The paper starts by explaining what we mean by "heuristic estimation", through an example and three analogies 🧵

Gabriel Wu (@gabrieldwu1)'s Twitter Profile Photo

The Alignment Research Center (ARC) just released our first empirical paper: Estimating the Probabilities of Rare Outputs in Language Models. In this thread, I'll motivate the problem of low probability estimation and describe our setting/methods. 🧵

METR (@metr_evals)'s Twitter Profile Photo

How close are current AI agents to automating AI R&D? Our new ML research engineering benchmark (RE-Bench) addresses this question by directly comparing frontier models such as Claude 3.5 Sonnet and o1-preview with 50+ human experts on 7 challenging research engineering tasks.

Ryan Greenblatt (@ryanpgreenblatt)'s Twitter Profile Photo

New Redwood Research paper in collaboration with Anthropic: We demonstrate cases where Claude fakes alignment when it strongly dislikes what it is being trained to do. (Thread)

Rob Wiblin (@robertwiblin)'s Twitter Profile Photo

A new legal letter aimed at OpenAI lays out in stark terms the money and power grab OpenAI is trying to trick its board members into accepting — what one analyst calls "the theft of the millennium." The simple facts of the case are both devastating and darkly hilarious. I'll

Victor Lecomte (@vclecomte)'s Twitter Profile Photo

A cute question about inner product sketching came up in our research; any leads would be appreciated! 🙂 cstheory.stackexchange.com/questions/5539…