Hadi Pouransari (@hpouransari)'s Twitter Profile
Hadi Pouransari

@hpouransari

ML Research @Apple, PhD @Stanford.

ID: 1150495006521585664

Joined: 14-07-2019 19:59:19

137 Tweets

536 Followers

263 Following

Shenao Zhang (@shenaozhang)'s Twitter Profile Photo

🚀 Excited to share our recent research: 🚀

“Learning to Reason as Action Abstractions with Scalable Mid-Training RL”

We theoretically study how mid-training shapes post-training RL.
The findings lead to a scalable algorithm for learning action
Huangjie Zheng (@undergroundjeg)'s Twitter Profile Photo

We're excited to share our new paper: Continuously-Augmented Discrete Diffusion (CADD), a simple yet effective way to bridge discrete and continuous diffusion models on discrete data, such as language modeling. [1/n] 

Paper: arxiv.org/abs/2510.01329
Awni Hannun (@awnihannun)'s Twitter Profile Photo

I love this line of research from my colleagues at Apple:

Augmenting a language model with a hierarchical memory makes perfect sense for several reasons:

- Intuitively the memory parameters should be accessed much less frequently than the weights responsible for reasoning. You
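
The access-frequency intuition above can be made concrete with a small sketch. This is not the preprint's architecture, just a minimal PyTorch illustration under assumptions of my own (the module names, sizes, and the once-per-sequence routing are all made up): a dense core layer runs on every token, while a large bank of memory parameters is only scanned once per sequence for routing, and only a few selected rows contribute values.

```python
import torch
import torch.nn as nn

class MemoryAugmentedBlock(nn.Module):
    def __init__(self, d_model=256, n_mem=50_000, k=4):
        super().__init__()
        self.core = nn.Linear(d_model, d_model)      # "reasoning" weights, applied to every token
        self.memory = nn.Embedding(n_mem, d_model)   # large bank of memory parameters
        self.k = k

    def forward(self, h):                            # h: (batch, seq, d_model)
        # Route once per sequence: pool the context into a query, pick k memory
        # rows, and reuse them for the whole segment. The bank is scanned once
        # per sequence, whereas the core weights run on every token.
        query = h.mean(dim=1)                        # (batch, d_model)
        scores = query @ self.memory.weight.T        # (batch, n_mem)
        topv, topi = scores.topk(self.k, dim=-1)     # (batch, k)
        mem = self.memory(topi)                      # (batch, k, d_model)
        weights = torch.softmax(topv, dim=-1).unsqueeze(-1)
        mem_summary = (weights * mem).sum(dim=1)     # (batch, d_model)
        return self.core(h) + mem_summary.unsqueeze(1)  # broadcast over seq

x = torch.randn(2, 8, 256)
print(MemoryAugmentedBlock()(x).shape)               # torch.Size([2, 8, 256])
```
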
Michael Kirchhof (@mkirchhof_)'s Twitter Profile Photo

LLMs are currently this one big parameter block that stores all sorts of facts. In our new preprint, we add context-specific memory parameters to the model, and pretrain the model along with a big bank of memories. 📑 arxiv.org/abs/2510.02375 Thread 👇
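
To illustrate the "pretrain the model along with a big bank of memories" part, here is a hedged continuation of the hypothetical MemoryAugmentedBlock sketched above, not the preprint's training setup: the bank is simply another parameter group, so model and memories are trained by one optimizer, and only the memory rows selected for a given context receive gradient in a step. The loss and data below are toy stand-ins.

```python
import torch
import torch.nn.functional as F

model = MemoryAugmentedBlock()                  # hypothetical module from the sketch above
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(3):                           # toy steps on random data
    x = torch.randn(4, 16, 256)
    target = torch.randn(4, 16, 256)
    loss = F.mse_loss(model(x), target)         # stand-in objective, not the paper's
    opt.zero_grad()
    loss.backward()                             # only the selected memory rows get nonzero gradient
    opt.step()
    print(step, loss.item())
```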

Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

📣 We have PhD research internship positions available at Apple MLR. DM me your brief research background, resume, and availability (earliest start date and latest end date) if interested in the topics below.

Fartash Faghri (@fartashfg)'s Twitter Profile Photo

🚨 While booking your travel for #NeurIPS2025, make sure to stay through Sunday, December 7, 8am-5pm, for the CCFM Workshop (Continual and Compatible Foundation Model Updates). We have received exciting paper contributions and have an amazing lineup of speakers.

Fartash Faghri (@fartashfg)'s Twitter Profile Photo

📣 Internship at Apple ML Research

We're looking for a PhD research intern with interests in efficient multimodal models and video. For our recent works see machinelearning.apple.com/research/fast-…

This is a pure-research internship where the objective is to publish high-quality work. Internship

Awni Hannun (@awnihannun)'s Twitter Profile Photo

I'm super excited about M5. It's going to help a lot with compute-bound workloads in MLX.

For example:
- Much faster prefill. In other words, time-to-first-token will go down.
- Faster image / video generation
- Faster fine-tuning (LoRA or otherwise)
- Higher throughput for
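
A rough sense of why a faster chip helps prefill specifically: prefill pushes the whole prompt through the weights in one large matmul (FLOP-bound), while decoding re-reads the same weights once per generated token (bandwidth-bound). The toy timing below is a generic PyTorch sketch with made-up sizes, not MLX code, but this asymmetry is what drives time-to-first-token.

```python
import time
import torch

d_model, prompt_len, new_tokens = 1024, 2048, 64
W = torch.randn(d_model, d_model)                  # stand-in for the model weights
prompt = torch.randn(prompt_len, d_model)

t0 = time.perf_counter()
_ = prompt @ W                                     # prefill: every prompt token in one big matmul
prefill_s = time.perf_counter() - t0               # time-to-first-token is dominated by this

x = torch.randn(1, d_model)
t0 = time.perf_counter()
for _ in range(new_tokens):                        # decode: the weights are re-read per token
    x = x @ W
decode_s = time.perf_counter() - t0

print(f"prefill {prefill_s*1e3:.2f} ms for {prompt_len} tokens; "
      f"decode {decode_s*1e3:.2f} ms for {new_tokens} tokens")
```
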
Eran Malach (@eranmalach)'s Twitter Profile Photo

SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. 
Arxiv: arxiv.org/pdf/2510.14826
🧵
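
For context on the efficiency claim: the textbook (diagonal, discretized) state-space recurrence keeps a fixed-size state, so per-token compute and memory stay constant with context length, unlike attention's growing KV cache. The sketch below is this generic recurrence under my own toy sizes, not the specific models analyzed in the paper.

```python
import torch

d_state, seq_len = 16, 1000
A = torch.rand(d_state) * 0.9              # per-dimension decay, |A| < 1 for stability
B = torch.randn(d_state)
C = torch.randn(d_state)

x = torch.randn(seq_len)                   # one scalar input channel over time
h = torch.zeros(d_state)                   # fixed-size state, independent of context length
ys = []
for t in range(seq_len):
    h = A * h + B * x[t]                   # h_t = A * h_{t-1} + B * x_t (diagonal A, elementwise)
    ys.append(torch.dot(C, h))             # y_t = C . h_t
y = torch.stack(ys)
print(y.shape)                             # torch.Size([1000])
```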