Hadi Pouransari (@hpouransari)'s Twitter Profile
Hadi Pouransari

@hpouransari

ML Research @Apple, PhD @Stanford.

ID: 1150495006521585664

Joined: 14-07-2019 19:59:19

137 Tweets

536 Followers

263 Following

Shenao Zhang (@shenaozhang)'s Twitter Profile Photo

🚀 Excited to share our recent research: 🚀

“Learning to Reason as Action Abstractions with Scalable Mid-Training RL”

We theoretically study how mid-training shapes post-training RL.
The findings lead to a scalable algorithm for learning action
Huangjie Zheng (@undergroundjeg)'s Twitter Profile Photo

We're excited to share our new paper: Continuously-Augmented Discrete Diffusion (CADD), a simple yet effective way to bridge discrete and continuous diffusion models on discrete data, such as language modeling. [1/n] 

Paper: arxiv.org/abs/2510.01329
Awni Hannun (@awnihannun)'s Twitter Profile Photo

I love this line of research from my colleagues at Apple:

Augmenting a language model with a hierarchical memory makes perfect sense for several reasons:

- Intuitively the memory parameters should be accessed much less frequently than the weights responsible for reasoning. You
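
The access-frequency intuition above can be made concrete with a small sketch. This is not the preprint's architecture, just a minimal PyTorch illustration under assumptions of my own (the module names, sizes, and the once-per-sequence routing are all made up): a dense core layer runs on every token, while a large bank of memory parameters is only scanned once per sequence for routing, and only a few selected rows contribute values.

```python
import torch
import torch.nn as nn

class MemoryAugmentedBlock(nn.Module):
    def __init__(self, d_model=256, n_mem=50_000, k=4):
        super().__init__()
        self.core = nn.Linear(d_model, d_model)      # "reasoning" weights, applied to every token
        self.memory = nn.Embedding(n_mem, d_model)   # large bank of memory parameters
        self.k = k

    def forward(self, h):                            # h: (batch, seq, d_model)
        # Route once per sequence: pool the context into a query, pick k memory
        # rows, and reuse them for the whole segment. The bank is scanned once
        # per sequence, whereas the core weights run on every token.
        query = h.mean(dim=1)                        # (batch, d_model)
        scores = query @ self.memory.weight.T        # (batch, n_mem)
        topv, topi = scores.topk(self.k, dim=-1)     # (batch, k)
        mem = self.memory(topi)                      # (batch, k, d_model)
        weights = torch.softmax(topv, dim=-1).unsqueeze(-1)
        mem_summary = (weights * mem).sum(dim=1)     # (batch, d_model)
        return self.core(h) + mem_summary.unsqueeze(1)  # broadcast over seq

x = torch.randn(2, 8, 256)
print(MemoryAugmentedBlock()(x).shape)               # torch.Size([2, 8, 256])
```
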
Michael Kirchhof (@mkirchhof_)'s Twitter Profile Photo

LLMs are currently this one big parameter block that stores all sorts of facts. In our new preprint, we add context-specific memory parameters to the model, and pretrain the model along with a big bank of memories. 📑 arxiv.org/abs/2510.02375 Thread 👇
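
To illustrate the "pretrain the model along with a big bank of memories" part, here is a hedged continuation of the hypothetical MemoryAugmentedBlock sketched above, not the preprint's training setup: the bank is simply another parameter group, so model and memories are trained by one optimizer, and only the memory rows selected for a given context receive gradient in a step. The loss and data below are toy stand-ins.

```python
import torch
import torch.nn.functional as F

model = MemoryAugmentedBlock()                  # hypothetical module from the sketch above
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(3):                           # toy steps on random data
    x = torch.randn(4, 16, 256)
    target = torch.randn(4, 16, 256)
    loss = F.mse_loss(model(x), target)         # stand-in objective, not the paper's
    opt.zero_grad()
    loss.backward()                             # only the selected memory rows get nonzero gradient
    opt.step()
    print(step, loss.item())
```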

Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

📣 We have PhD research internship positions available at Apple MLR. DM me your brief research background, resume, and availability (earliest start date and latest end date) if interested in the topics below.

Fartash Faghri (@fartashfg)'s Twitter Profile Photo

🚨 While booking your travel for #NeurIPS2025, make sure to stay through Sunday, December 7, 8am-5pm, for the CCFM Workshop (Continual and Compatible Foundation Model Updates). We have received exciting paper contributions and have an amazing lineup of speakers.

Fartash Faghri (@fartashfg)'s Twitter Profile Photo

📣 Internship at Apple ML Research

We're looking for a PhD research intern with interests in efficient multimodal models and video. For our recent works see machinelearning.apple.com/research/fast-…

This is a pure-research internship where the objective is to publish high-quality work. Internship

Awni Hannun (@awnihannun)'s Twitter Profile Photo

I'm super excited about M5. It's going to help a lot with compute-bound workloads in MLX.

For example:
- Much faster prefill. In other words, time-to-first-token will go down.
- Faster image / video generation
- Faster fine-tuning (LoRA or otherwise)
- Higher throughput for
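
A rough sense of why a faster chip helps prefill specifically: prefill pushes the whole prompt through the weights in one large matmul (FLOP-bound), while decoding re-reads the same weights once per generated token (bandwidth-bound). The toy timing below is a generic PyTorch sketch with made-up sizes, not MLX code, but this asymmetry is what drives time-to-first-token.

```python
import time
import torch

d_model, prompt_len, new_tokens = 1024, 2048, 64
W = torch.randn(d_model, d_model)                  # stand-in for the model weights
prompt = torch.randn(prompt_len, d_model)

t0 = time.perf_counter()
_ = prompt @ W                                     # prefill: every prompt token in one big matmul
prefill_s = time.perf_counter() - t0               # time-to-first-token is dominated by this

x = torch.randn(1, d_model)
t0 = time.perf_counter()
for _ in range(new_tokens):                        # decode: the weights are re-read per token
    x = x @ W
decode_s = time.perf_counter() - t0

print(f"prefill {prefill_s*1e3:.2f} ms for {prompt_len} tokens; "
      f"decode {decode_s*1e3:.2f} ms for {new_tokens} tokens")
```
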
Eran Malach (@eranmalach)'s Twitter Profile Photo

SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. 
Arxiv: arxiv.org/pdf/2510.14826
🧵
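
For context on the efficiency claim: the textbook (diagonal, discretized) state-space recurrence keeps a fixed-size state, so per-token compute and memory stay constant with context length, unlike attention's growing KV cache. The sketch below is this generic recurrence under my own toy sizes, not the specific models analyzed in the paper.

```python
import torch

d_state, seq_len = 16, 1000
A = torch.rand(d_state) * 0.9              # per-dimension decay, |A| < 1 for stability
B = torch.randn(d_state)
C = torch.randn(d_state)

x = torch.randn(seq_len)                   # one scalar input channel over time
h = torch.zeros(d_state)                   # fixed-size state, independent of context length
ys = []
for t in range(seq_len):
    h = A * h + B * x[t]                   # h_t = A * h_{t-1} + B * x_t (diagonal A, elementwise)
    ys.append(torch.dot(C, h))             # y_t = C . h_t
y = torch.stack(ys)
print(y.shape)                             # torch.Size([1000])
```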