John Thickstun (@jwthickstun)'s Twitter Profile
John Thickstun

@jwthickstun

Assistant Professor @Cornell_CS.

Previously @StanfordCRFM @stanfordnlp @uwcse

Controllable Generative Models. AI for Music.

ID: 1232444191889711104

Link: https://johnthickstun.com/
Joined: 25-02-2020 23:16:19

287 Tweets

1.1K Followers

602 Following

Volodymyr Kuleshov 🇺🇦 (@volokuleshov)'s Twitter Profile Photo

If you’re at #iclr2025, you should catch Cornell PhD student Yair Schiff—check out his new paper that derives classifier-based and classifier-free guidance for discrete diffusion models.
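The paper's result is a guidance derivation for discrete diffusion specifically, but the general flavor of classifier-free guidance is easy to sketch. A toy illustration over categorical logits (the function name and mixing rule below are the standard CFG recipe, not the paper's derivation):

```python
import numpy as np

def cfg_logits(cond_logits, uncond_logits, w):
    """Classifier-free guidance on categorical logits: push the
    distribution toward (w > 1) or away from (w < 1) the condition."""
    return uncond_logits + w * (cond_logits - uncond_logits)

# Toy example over a 4-token vocabulary.
uncond = np.log(np.array([0.25, 0.25, 0.25, 0.25]))
cond = np.log(np.array([0.10, 0.60, 0.20, 0.10]))
guided = cfg_logits(cond, uncond, w=2.0)
probs = np.exp(guided) / np.exp(guided).sum()
```

With w=2 the conditional model's preference for token 1 is amplified; w=1 recovers the conditional logits exactly.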
Rose (@rose_e_wang)'s Twitter Profile Photo

I defended my PhD from Stanford CS Stanford NLP Group 🌲 w/ Stanford CS's first all-female committee!! My dissertation focused on AI methods, evaluations & interventions to improve Education.

So much gratitude for the support & love - and SO excited for the next chapter!!!! 🥳
Wenting Zhao (@wzhao_nlp)'s Twitter Profile Photo

Excited to announce our workshop on Visions of Language Modeling at COLM'25! 🔥

We thought that current LM research overly focuses on a narrow set of popular topics (e.g., test-time scaling and LLM agents), and we'd love to bring some entropy back 💪 To do this, we invited a
Wenting Zhao (@wzhao_nlp)'s Twitter Profile Photo

Some personal news: I'll join UMass Amherst CS as an assistant professor in fall 2026. Until then, I'll postdoc at Meta NYC. Reasoning will continue to be my main interest, with a focus on data-centric approaches 🤩 If you're also interested, apply to work with me (PhDs & a postdoc)!

Oliver Li (@oliver54244160)'s Twitter Profile Photo

🤯 GPT-4o knows H&M left Russia in 2022 but still recommends shopping at H&M in Moscow.

🤔 LLMs store conflicting facts from different times, leading to inconsistent responses. We dig into how to better update LLMs with fresh facts that contradict their prior knowledge.

🧵 1/6
Percy Liang (@percyliang)'s Twitter Profile Photo

What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
Rishi Jha (@rishi_d_jha)'s Twitter Profile Photo

I’m stoked to share our new paper: “Harnessing the Universal Geometry of Embeddings” with jack morris, Collin Zhang, and Vitaly Shmatikov.

We present the first method to translate text embeddings across different spaces without any paired data or encoders.

Here's why we're excited: 🧵👇🏾
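For context on why "without any paired data" is the notable part: the classical way to translate between two embedding spaces, orthogonal Procrustes, needs paired examples of the same texts in both spaces. A sketch of that paired baseline (all names and the synthetic rotation below are illustrative; the paper's contribution is doing this *without* the pairs):

```python
import numpy as np

def procrustes_map(X, Y):
    """Orthogonal map W minimizing ||X @ W - Y||_F.
    Requires *paired* rows: row i of X and row i of Y must
    embed the same text -- the assumption the paper removes."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))                   # embeddings in space A
R = np.linalg.qr(rng.normal(size=(16, 16)))[0]   # hidden ground-truth rotation
Y = X @ R                                        # same texts embedded in space B
W = procrustes_map(X, Y)                         # recovers R from the pairs
```

Here `X @ W` matches `Y` because the spaces differ by an exact rotation; real embedding spaces differ non-orthogonally, which is part of what makes the unpaired setting hard.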
Dasaem Jeong (@dasaemj)'s Twitter Profile Photo

🎶 Now a neural network can read a scanned score image and generate performance audio end-to-end 😎 I'm super excited to introduce our work on Unified Cross-modal Translation between Score Image, Symbolic Music, and Audio. Why does it matter, and how did we make it? Check the thread 🧵

Andrew Ng (@andrewyng)'s Twitter Profile Photo

I am alarmed by the proposed cuts to U.S. funding for basic research, and the impact this would have for U.S. competitiveness in AI and other areas. Funding research that is openly shared benefits the whole world, but the nation it benefits most is the one where the research is

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

Esoteric Language Models

"In this work, we introduce Eso-LMs, a new family of models that fuses AR and MDM paradigms, enabling smooth interpolation between their perplexities while overcoming their respective limitations."

"our method achieves up to **65x** faster inference
Subham Sahoo (@ssahoo_)'s Twitter Profile Photo

🚨 [New paper alert] Esoteric Language Models (Eso-LMs)

First Diffusion LM to support KV caching w/o compromising parallel generation.

🔥 Sets new SOTA on the sampling speed–quality Pareto frontier 🔥
🚀 65× faster than MDLM
⚡ 4× faster than Block Diffusion

📜 Paper:
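KV caching, the optimization this thread highlights, amortizes attention by storing each token's key/value projections instead of recomputing them at every decoding step. A minimal single-head numpy sketch (shapes and names are illustrative; this is the generic autoregressive cache, not the paper's diffusion variant):

```python
import numpy as np

def attend(q, K, V):
    """Single-head scaled dot-product attention for one query vector."""
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

rng = np.random.default_rng(0)
d = 8
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))

# Incremental decoding: project only the newest token, append its
# key/value to the cache, and attend over everything cached so far.
K_cache, V_cache = [], []
for step in range(5):
    x = rng.normal(size=d)      # hidden state of the new token
    K_cache.append(x @ Wk)      # cached: never recomputed on later steps
    V_cache.append(x @ Wv)
    out = attend(x @ Wq, np.array(K_cache), np.array(V_cache))
```

The win is that step t does O(t) cached lookups instead of reprojecting all t tokens; making this compatible with parallel masked-diffusion generation is the part Eso-LMs claim as new.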
jack morris (@jxmnop)'s Twitter Profile Photo

new paper from our work at Meta!

**GPT-style language models memorize 3.6 bits per param**

we compute capacity by measuring total bits memorized, using some theory from Shannon (1953)

shockingly, the memorization-datasize curves look like this:

    ___________
   /
  /

(🧵)
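The headline number implies a quick back-of-envelope capacity calculation. A sketch (the 3.6 bits/param constant is the tweet's figure; the 124M-parameter model size is just an example):

```python
BITS_PER_PARAM = 3.6  # capacity estimate quoted in the thread

def capacity_bytes(n_params, bits_per_param=BITS_PER_PARAM):
    """Back-of-envelope memorization capacity in bytes."""
    return n_params * bits_per_param / 8

# e.g. a 124M-parameter GPT-style model:
print(f"{capacity_bytes(124_000_000) / 1e6:.1f} MB")  # prints "55.8 MB"
```

The plateau in the sketched curve is this capacity: once the training set exceeds it, per-example memorization must fall.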
Zhihan Yang (@zhihanyangzy)'s Twitter Profile Photo

📢 Thrilled to share our new paper: Esoteric Language Models (Eso-LMs)

> 🔀 Fuses autoregressive (AR) and masked diffusion (MDM) paradigms
> 🚀 First to unlock KV caching for MDMs (65x speedup!)
> 🥇 Sets new SOTA on generation speed-vs-quality Pareto frontier

How? Dive in 👇
LLM360 (@llm360)'s Twitter Profile Photo

KV caching is great, but will it work for diffusion language models? Zhihan Yang and team showed how to make it work with a 65x speedup 🚀! Check out the new preprint: arxiv.org/abs/2506.01928 The LLM360 team is very interested in exploring new architectures.

TuringPost (@theturingpost)'s Twitter Profile Photo

.@NVIDIA never stops surprising

Together with Cornell University they presented Eso-LMs (Esoteric Language Models) — a new kind of LM that combines the best parts of autoregressive (AR) and diffusion models.

• It’s the first diffusion-based model that supports full KV caching.
• At the
Kevin Ellis (@ellisk_kellis)'s Twitter Profile Photo

New paper: World models + Program synthesis, by Wasu Top Piriyakulkij

1. World modeling on-the-fly by synthesizing programs w/ 4000+ lines of code
2. Learns new environments from minutes of experience
3. Positive score on Montezuma's Revenge
4. Compositional generalization to new environments

hardmaru (@hardmaru)'s Twitter Profile Photo

I agree with Jensen. If you want AI development to be done safely and responsibly, you do it in the open. Don’t do it in a dark room and tell me it’s “safe”.

Article archive:
archive.md/CC5VZ
Chris Donahue (@chrisdonahuey)'s Twitter Profile Photo

Excited to announce 🎵 Magenta RealTime, the first open-weights music generation model capable of real-time audio generation with real-time control.

👋 **Try Magenta RT on Colab TPUs**: colab.research.google.com/github/magenta…
👀 Blog post: g.co/magenta/rt
🧵 below