Luke DH Lee (@luke_lee_ai)'s Twitter Profile
Luke DH Lee

@luke_lee_ai

Incoming AI PhD student at Berkeley. CS master’s at UCL. Formerly visiting student researcher at Stanford AI Lab.

ID: 1025186947763847168

Joined: 03-08-2018 01:09:50

45 Tweets

108 Followers

179 Following

Danny To Eun Kim (@teknology.bsky.social) (@teknologyy)'s Twitter Profile Photo

Timely and fascinating research on LLM security in multi-agent systems! The discovery of 'Prompt Infection' highlights the vulnerabilities of larger models and the critical need for robust safeguards.

Demis Hassabis (@demishassabis)'s Twitter Profile Photo

Winning the Nobel Prize is the honour of a lifetime and the realisation of a lifelong dream - it still hasn’t really sunk in yet. With AlphaFold2 we cracked the 50-year grand challenge of protein structure prediction: predicting the 3D structure of a protein purely from its

Felix Petersen @NeurIPS (@fhkpetersen)'s Twitter Profile Photo

Excited to share our NeurIPS 2024 Oral, Convolutional Differentiable Logic Gate Networks, leading to a range of inference efficiency records, including inference in only 4 nanoseconds 🏎️. We reduce model sizes by factors of 29x-61x over the SOTA. Paper: arxiv.org/abs/2411.04732
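
For intuition, here is a minimal sketch of the basic building block behind differentiable logic gate networks as I understand this line of work (my own paraphrase, not the authors' code; the layer sizes and the subset of gates shown are arbitrary): each node learns a softmax mixture over real-valued relaxations of two-input boolean gates and is hardened to its single best gate at inference, which is where the inference-efficiency records come from. The convolutional arrangement from the paper is not shown.

```python
# Hedged sketch (my paraphrase, not the authors' code) of a differentiable
# logic-gate layer: each node learns a softmax mixture over real-valued
# relaxations of two-input boolean gates (a subset of the 16 is shown),
# then hardens to its argmax gate at inference time.
import torch
import torch.nn as nn

def relaxed_gates(a, b):
    # Probabilistic relaxations, exact on {0, 1} inputs.
    return torch.stack([
        a * b,               # AND
        a + b - a * b,       # OR
        a + b - 2 * a * b,   # XOR
        1 - a * b,           # NAND
        a,                   # pass A
        1 - b,               # NOT B
    ], dim=-1)

class DiffLogicLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        # Fixed random wiring: each output node reads two input coordinates.
        self.register_buffer("ia", torch.randint(0, in_dim, (out_dim,)))
        self.register_buffer("ib", torch.randint(0, in_dim, (out_dim,)))
        self.logits = nn.Parameter(torch.zeros(out_dim, 6))   # gate choice per node

    def forward(self, x):                                      # x in [0, 1], shape (batch, in_dim)
        g = relaxed_gates(x[:, self.ia], x[:, self.ib])        # (batch, out_dim, 6)
        if self.training:
            return (g * torch.softmax(self.logits, -1)).sum(-1)   # soft mixture, differentiable
        idx = self.logits.argmax(-1).expand(x.shape[0], -1).unsqueeze(-1)
        return g.gather(-1, idx).squeeze(-1)                   # hard gates at inference

layer = DiffLogicLayer(in_dim=32, out_dim=64)
out = layer(torch.rand(8, 32))                                 # training-mode soft output
```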

Azalia Mirhoseini (@azaliamirh)'s Twitter Profile Photo

AI as AI compiler? 

Very excited to release KernelBench, a new code generation benchmark for evaluating models' ability to generate correct and efficient CUDA kernels. 

KernelBench has 4 levels:
Level 1 (100 tasks): Single-kernel operators (e.g. matmuls)
Level 2 (100 tasks):
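
To make the evaluation concrete, here is a hedged sketch of how a KernelBench-style check might look (my illustration, not the benchmark's actual harness; reference_op, evaluate, the tolerance, and the timing loop are all assumptions): a candidate kernel is compared against a PyTorch reference operator for numerical correctness on random inputs, then both are timed to report a speedup.

```python
# Hedged sketch of a KernelBench-style check (illustrative only, not the
# benchmark's actual harness): compare a candidate kernel against a PyTorch
# reference op for correctness on random inputs, then time both.
import time
import torch

def reference_op(a, b):
    # Level-1-style task: a single operator, e.g. a matmul.
    return a @ b

def evaluate(candidate_op, size=1024, tol=1e-3, iters=50):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)

    # Correctness: outputs must match the reference within a tolerance.
    ok = torch.allclose(candidate_op(a, b), reference_op(a, b), atol=tol, rtol=tol)

    def timeit(fn):
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            fn(a, b)
        if device == "cuda":
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters

    speedup = timeit(reference_op) / timeit(candidate_op)
    return ok, speedup

# Example: a "candidate" that just calls the reference gives speedup ~1.0.
print(evaluate(lambda a, b: a @ b))
```
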
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

Training Large Language Models to Reason in a Continuous Latent Space

Introduces a new paradigm for LLM reasoning called Chain of Continuous Thought (COCONUT)

Extremely simple change: instead of mapping between hidden states and language tokens using the LLM head and embedding
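
As a rough illustration of the continuous-thought idea (a hedged sketch based only on the summary above, not the paper's code; the model name, the number of latent steps, and the greedy decode at the end are placeholders): the last hidden state is appended directly as the next input embedding for a few latent steps, skipping the LM head and the embedding table, before ordinary token decoding resumes.

```python
# Hedged sketch of chain-of-continuous-thought decoding (not the paper's code):
# feed the last hidden state back as the next input embedding instead of
# projecting to a token, for a few "latent" reasoning steps.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

prompt = "Question: 12 + 35 = ?"
input_ids = tok(prompt, return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(input_ids)         # (1, seq, hidden)

num_latent_steps = 4
with torch.no_grad():
    for _ in range(num_latent_steps):
        out = model(inputs_embeds=embeds)
        last_hidden = out.hidden_states[-1][:, -1:, :]    # (1, 1, hidden)
        # The key change: append the hidden state itself as the next input
        # embedding, skipping the LM head and the token embedding table.
        embeds = torch.cat([embeds, last_hidden], dim=1)

    # After the latent steps, resume ordinary token decoding.
    logits = model(inputs_embeds=embeds).logits[:, -1, :]
    next_token = logits.argmax(-1)
print(tok.decode(next_token))
```
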
Jeffrey Scholz (@jeyffre)'s Twitter Profile Photo

I read Google's paper about their quantum computer so you don't have to. They claim to have run a quantum computation in 5 minutes that would take a normal computer 10^25 years. But what was that computation? Does it live up to the hype? I will break it down.🧵

Andrej Karpathy (@karpathy)'s Twitter Profile Photo

The most bullish AI capability I'm looking for is not whether it's able to solve PhD grade problems. It's whether you'd hire it as a junior intern. Not "solve this theorem" but "get your slack set up, read these onboarding docs, do this task and let's check in next week".

Anthropic (@anthropicai)'s Twitter Profile Photo

New Anthropic research: Alignment faking in large language models.

In a series of experiments with Redwood Research, we found that Claude often pretends to have different views during training, while actually maintaining its original preferences.
Azalia Mirhoseini (@azaliamirh)'s Twitter Profile Photo

Thanks for covering our work on test time scaling! Turns out repeated sampling alone is surprisingly effective (~ log linear relationship between num samples and coverage across many reasoning tasks) and even better if combined with sequential “thinking”!

Jeremy Berman (@jerber888)'s Twitter Profile Photo

Announcing LANG-JEPA — a new language model architecture I’m working on that optimizes in “concept” space instead of “token” space. Inspired by Yann LeCun's JEPA (I-JEPA for images, V-JEPA for video), LANG-JEPA asks: What if we train for conceptual understanding directly, rather
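
For intuition, here is a hedged sketch of what a JEPA-style objective for text could look like (my own guess at the general shape from the description above, not the actual LANG-JEPA code; the encoders, the pooling, the predictor, and the frozen-teacher choice are all assumptions): the model predicts the embedding of a target span from the embedding of its context, so the loss is computed in concept (embedding) space rather than over tokens.

```python
# Hedged sketch of a JEPA-style objective for text (not the LANG-JEPA code):
# predict the target span's embedding from the context embedding, so the loss
# is computed in embedding ("concept") space rather than over tokens.
import torch
import torch.nn as nn

class TinyTextJEPA(nn.Module):
    def __init__(self, vocab_size=32000, dim=256):
        super().__init__()
        self.context_encoder = nn.Sequential(
            nn.Embedding(vocab_size, dim),
            nn.TransformerEncoder(nn.TransformerEncoderLayer(dim, 4, batch_first=True), 2))
        self.target_encoder = nn.Sequential(
            nn.Embedding(vocab_size, dim),
            nn.TransformerEncoder(nn.TransformerEncoderLayer(dim, 4, batch_first=True), 2))
        self.predictor = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, context_ids, target_ids):
        ctx = self.context_encoder(context_ids).mean(dim=1)   # pooled context embedding
        with torch.no_grad():                                 # target encoder as a frozen teacher
            tgt = self.target_encoder(target_ids).mean(dim=1)
        pred = self.predictor(ctx)
        return nn.functional.mse_loss(pred, tgt)              # loss in concept space, no token softmax

model = TinyTextJEPA()
loss = model(torch.randint(0, 32000, (8, 64)), torch.randint(0, 32000, (8, 16)))
loss.backward()
```
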
Azalia Mirhoseini (@azaliamirh)'s Twitter Profile Photo

We are releasing CodeMonkeys, a system for solving SWE-bench problems with a focus on careful parallel and serial scaling of test-time compute! CodeMonkeys solves 57.4% of issues on SWE-bench Verified, and running our selection mechanism on an ensemble of existing top
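
As a toy illustration of parallel sampling plus selection (not the CodeMonkeys system; generate_patch and run_tests are hypothetical placeholders, and the serial-refinement half of the approach is omitted): many candidate patches are drawn independently, the repository's tests act as a filter, and a selection step picks among the survivors.

```python
# Hedged sketch of parallel-sampling-plus-selection for SWE-bench-style issues
# (illustrative only, not the CodeMonkeys system; generate_patch and run_tests
# are hypothetical placeholders).
import random

def generate_patch(issue, seed):
    """Placeholder for an LLM call that drafts a candidate patch."""
    random.seed(seed)
    return {"diff": f"patch-{seed}", "score": random.random()}

def run_tests(patch):
    """Placeholder for executing the repo's test suite against a patch."""
    return patch["score"] > 0.5          # pretend pass/fail signal

def solve(issue, n_parallel=16):
    # Parallel scaling: draw many independent candidate patches.
    candidates = [generate_patch(issue, seed=s) for s in range(n_parallel)]
    # Selection: keep only candidates that pass the tests, then pick one.
    passing = [c for c in candidates if run_tests(c)]
    return max(passing, key=lambda c: c["score"]) if passing else None

print(solve("example issue"))
```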

Andrej Karpathy (@karpathy)'s Twitter Profile Photo

Agency > Intelligence I had this intuitively wrong for decades, I think due to a pervasive cultural veneration of intelligence, various entertainment/media, obsession with IQ etc. Agency is significantly more powerful and significantly more scarce. Are you hiring for agency? Are

Danny To Eun Kim (@teknology.bsky.social) (@teknologyy)'s Twitter Profile Photo

🚨New Breakthrough in Tip-of-the-Tongue (TOT) Retrieval Research! We address data limitations and offer a fresh evaluation method for complex TOT queries. Curious how TREC TOT track test queries are created? Check out this thread🧵 and our paper📄: arxiv.org/abs/2502.17776

Azalia Mirhoseini (@azaliamirh)'s Twitter Profile Photo

In Large Language Monkeys, we showed the scaling laws of inference-time compute with repeated sampling--the power law relationship between the number of repeated attempts and the fraction of problems solved!

The following amazing work theoretically proves the necessary and
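
The shape of that relationship is easy to reproduce in miniature (a hedged illustration, not the paper's data; the Beta distribution over per-problem success rates is an arbitrary choice): if each attempt independently solves problem i with probability p_i, coverage after k attempts is the average of 1 - (1 - p_i)^k, which grows roughly log-linearly in k over a wide range.

```python
# Hedged illustration (not the paper's data): coverage vs. number of samples
# when each attempt independently solves problem i with probability p_i.
import numpy as np

rng = np.random.default_rng(0)
p = rng.beta(0.3, 3.0, size=500)          # assumed spread of per-problem difficulties

for k in [1, 10, 100, 1000, 10000]:
    coverage = np.mean(1 - (1 - p) ** k)  # expected fraction of problems solved in k tries
    print(f"k={k:>6}  coverage={coverage:.3f}")
```
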
Luke DH Lee (@luke_lee_ai)'s Twitter Profile Photo

Beginning of autonomous bug discovery & defense! 🔥 AI agents now match elite hackers — 15 zero-days found, $30K+ bugs patched. Huge milestone by Dawn Song & team!

Seonglae Cho (@seonglaec)'s Twitter Profile Photo

New paper! Rare SAE dataset approach:
We train Sparse Autoencoders using only synthetic data generated by the model itself, revealing features that truly reflect what’s inside the model.
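
Here is a minimal sketch of the general recipe as described above (my own illustration, not the paper's code; the dimensions, the L1 coefficient, and collect_synthetic_activations are placeholders): sample text from the model itself, record hidden activations on that synthetic text, and train a sparse autoencoder with a reconstruction-plus-L1-sparsity objective on those activations.

```python
# Hedged sketch (not the paper's code): train a sparse autoencoder on
# activations collected from the model's own synthetic generations.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_features=8192):
        super().__init__()
        self.enc = nn.Linear(d_model, d_features)
        self.dec = nn.Linear(d_features, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))       # sparse feature activations
        return self.dec(f), f

def collect_synthetic_activations():
    """Placeholder: sample text from the model itself, run it back through the
    model, and return hidden states from a chosen layer (random data here)."""
    return torch.randn(4096, 768)

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = collect_synthetic_activations()

for step in range(100):
    batch = acts[torch.randint(0, acts.shape[0], (256,))]
    recon, feats = sae(batch)
    loss = nn.functional.mse_loss(recon, batch) + 1e-3 * feats.abs().mean()  # reconstruction + L1 sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()
```
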
Dawn Song (@dawnsongtweets)'s Twitter Profile Photo

My group & collaborators have developed many popular benchmarks over the years, e.g., MMLU, MATH, APPS---really excited about our latest benchmark OMEGA Ω: 🔍Can LLMs really think outside the box in math? a new benchmark probing 3 axes of generalization: 1️⃣ Exploratory 2️⃣