Jonathan Pilault (@j_pilault)'s Twitter Profile
Jonathan Pilault

@j_pilault

• ML Research Scientist at Silicon Valley startup @ZyphraAI
• Former researcher @GoogleDeepMind @nvidia
• PhD @Mila_Quebec

ID: 110540208

Joined: 01-02-2010 22:32:22

106 Tweets

324 Followers

484 Following

utku (@utkuevci)'s Twitter Profile Photo

Hyped to share JaxPruner: a concise library for sparsity research.

JaxPruner includes 10+ easy-to-modify baseline algorithms and provides integration with popular libraries like t5x, scenic, dopamine and fedjax. 1/7

Code: github.com/google-researc…
Paper: arxiv.org/pdf/2304.14082…
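The tweet doesn't show JaxPruner's actual interface; as a flavor of the kind of baseline such a library ships, here is a generic unstructured magnitude-pruning sketch in plain NumPy (the function name and signature are illustrative, not JaxPruner's API):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    assert 0.0 <= sparsity < 1.0
    k = int(sparsity * weights.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value; everything at or below it is pruned.
    flat = np.abs(weights).ravel()
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([[0.9, -0.05, 0.3], [-0.7, 0.01, 0.2]])
pruned = magnitude_prune(w, sparsity=0.5)  # keeps the 3 largest-magnitude entries
```

Real sparsity libraries wrap this kind of mask logic into training schedules (gradual pruning, rigged lottery tickets, etc.), which is where the "10+ baseline algorithms" come in.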
Mahan Fathi (@mahanfathi)'s Twitter Profile Photo

Why not get the best of both worlds by combining SSMs and Transformers?

Excited to share our work at #NeurIPS2023: "Block-State Transformers."

BST hits new highs in long-range language modeling and LRA tasks.

paper: arxiv.org/abs/2306.09539

1/
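The tweet doesn't include code; a heavily simplified sketch of the general idea — a state-space recurrence supplying long-range context, combined with attention restricted to local blocks — might look like the following. This illustrates the concept only, not the Block-State Transformer architecture from the paper:

```python
import numpy as np

def ssm_scan(x, a=0.9):
    """Toy diagonal linear recurrence h_t = a*h_{t-1} + x_t (long-range memory)."""
    h = np.zeros_like(x)
    prev = np.zeros(x.shape[-1])
    for t in range(x.shape[0]):
        prev = a * prev + x[t]
        h[t] = prev
    return h

def block_attention(x, block=4):
    """Softmax self-attention restricted to non-overlapping blocks (local mixing)."""
    out = np.zeros_like(x)
    for s in range(0, x.shape[0], block):
        b = x[s:s + block]
        scores = b @ b.T / np.sqrt(b.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[s:s + block] = w @ b
    return out

T, d = 8, 4
x = np.random.default_rng(0).normal(size=(T, d))
y = block_attention(x, block=4) + ssm_scan(x)  # local attention + long-range state
```

The appeal of this combination is that the recurrence carries information across blocks cheaply while attention handles fine-grained mixing within each block.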
Richard Socher (@richardsocher)'s Twitter Profile Photo

Rishi, I like the SSM/Hyena/Block-State Transformers papers arxiv.org/pdf/2306.09539… arxiv.org/pdf/2302.10866… They remind me of Q-RNNs (arxiv.org/abs/1611.01576) and play around with different parallelization ideas. I don't think transformers are that special and there are many equivalent

David Krueger (@davidskrueger)'s Twitter Profile Photo

My research group Krueger AI Safety Lab is looking for interns! Applications are due in 2 weeks ***January 29***. The long-awaited form: forms.gle/iLU1uQAxZ2UKEN… Please share widely!!

Mahan Fathi (@mahanfathi)'s Twitter Profile Photo

Course Correcting Koopman Representations
Accepted at #ICLR2024!

We identify problems with unrolling in imagination and propose an unconventional, simple, yet effective solution: periodically "𝒓𝒆𝒆𝒏𝒄𝒐𝒅𝒊𝒏𝒈" the latent. 

📄 arxiv.org/abs/2310.15386
Google DeepMind

1/🧵
Ross Goroshin (@rgoroshin)'s Twitter Profile Photo

Last week, I gave a talk at Mila - Institut québécois d'IA. The talk should be of interest to anyone working on predictive models, particularly in latent space. In collab. with Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, and Pierre-Luc Bacon. See you at ICLR 2024 in 🇦🇹! drive.google.com/file/d/1mQSXFa…

Quentin Anthony (@quentinanthon15)'s Twitter Profile Photo

Zyphra is pleased to announce Zamba-7B:
- 7B Mamba/Attention hybrid
- Competitive with Mistral-7B and Gemma-7B on only 1T fully open training tokens
- Outperforms Llama-2 7B and OLMo-7B
- All checkpoints across training to be released (Apache 2.0)
- Achieved by 7 people, on 128
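The tweet doesn't detail the hybrid layout. As a loose illustration of what a "Mamba/Attention hybrid" can mean — a stack of SSM blocks with an attention block interleaved periodically — here is a toy layer schedule (the `attn_every` ratio and the use of a shared attention block are illustrative choices, not Zamba's actual configuration):

```python
def hybrid_schedule(n_layers: int, attn_every: int = 6) -> list:
    """Toy layer plan: mostly SSM blocks, with a shared attention block interleaved
    every `attn_every` layers. Returns the ordered list of block types."""
    plan = []
    for i in range(n_layers):
        plan.append("ssm")
        if (i + 1) % attn_every == 0:
            plan.append("shared_attention")
    return plan

plan = hybrid_schedule(12, attn_every=6)
```

The motivation for such layouts is that SSM blocks give near-linear-cost sequence mixing, while occasional attention blocks recover the precise token-to-token retrieval that pure SSMs can struggle with.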
Quentin Anthony (@quentinanthon15)'s Twitter Profile Photo

Zyphra is ecstatic to release Zamba2-small:
- 2.7B Mamba2/Attention hybrid
- Pre-trained on 3T tokens + annealed on 100B high-quality tokens
- Model released on HuggingFace and standalone PyTorch
- SOTA evaluation performance and superior inference efficiency.
Vasu Shyam (@vasud3vshyam)'s Twitter Profile Photo

Yann LeCun Thanks for sharing! Another little trick that might amuse you is that we identified a function which upon minimization produces the forward pass of the attention block:

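The tweet doesn't give the function itself. A classic instance of this idea is that softmax attention weights arise as the minimizer of an entropy-regularized linear objective over the probability simplex: f(p) = -⟨s, p⟩ + Σᵢ pᵢ log pᵢ, where s are the query-key scores. The sketch below verifies this numerically (it illustrates the general principle, not necessarily the specific function in the tweet):

```python
import numpy as np

def objective(p, s):
    """Entropy-regularized linear objective: f(p) = -<s, p> + sum_i p_i log p_i."""
    return -(s @ p) + np.sum(p * np.log(p))

rng = np.random.default_rng(0)
s = rng.normal(size=5)            # attention scores q.k_i for one query

p_star = np.exp(s - s.max())
p_star /= p_star.sum()            # softmax(s): the closed-form minimizer

# f is strictly convex on the simplex, so any other distribution scores worse.
for _ in range(100):
    p = rng.dirichlet(np.ones(5))
    assert objective(p_star, s) <= objective(p, s) + 1e-12
```

Setting the gradient -s + log p + 1 to a constant (the simplex Lagrange multiplier) gives p ∝ exp(s), i.e. exactly the softmax; the attention output is then the p-weighted average of the values.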
Nick Alonso (@nick__alonso)'s Twitter Profile Photo

1) RAG often struggles on complex multi-hop queries. In this blog, we at Zyphra discuss and build a graph-based RAG system which tops the leaderboard on a QA benchmark with multi-hop queries and outperforms frontier long-context models for 60x less cost.

zyphra.com/post/understan…
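To make "multi-hop" concrete: answering such a query means chaining facts through intermediate entities rather than retrieving one passage. A toy illustration of hopping through a knowledge graph (purely illustrative — the actual Zyphra system is described in the linked blog post):

```python
# Tiny knowledge graph: entity -> list of (relation, target) edges.
graph = {
    "Marie Curie": [("born_in", "Warsaw"), ("field", "physics")],
    "Warsaw": [("capital_of", "Poland")],
}

def multi_hop(entity, relations, graph):
    """Follow a chain of relations from an entity, one hop per relation.
    Returns the final entity, or None if a hop is missing."""
    for rel in relations:
        edges = dict(graph.get(entity, []))
        if rel not in edges:
            return None
        entity = edges[rel]
    return entity

# "In which country was Marie Curie born?" needs two hops:
answer = multi_hop("Marie Curie", ["born_in", "capital_of"], graph)  # 'Poland'
```

Flat retrieval over text chunks often surfaces only one of these facts at a time; indexing the corpus as a graph lets the system traverse the chain explicitly.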
Jonathan Pilault (@j_pilault)'s Twitter Profile Photo

33% of online shoppers in Canada leave their cart before check-out due to high shipping costs (eMarketer). Free shipping is a must in Canada.

Jonathan Pilault (@j_pilault)'s Twitter Profile Photo

#Montreal should be the #innovation and #startup gateway between Europe and the US. Wouldn't that make entering each other's markets easier?