Jonathan Pilault (@j_pilault)'s Twitter Profile
Jonathan Pilault

@j_pilault

• ML Research Scientist at Silicon Valley startup @ZyphraAI
• Former researcher @GoogleDeepMind @nvidia
• PhD @Mila_Quebec

ID: 110540208

Joined: 01-02-2010 22:32:22

106 Tweets

324 Followers

484 Following

utku (@utkuevci)'s Twitter Profile Photo

Hyped to share JaxPruner: a concise library for sparsity research.

JaxPruner includes 10+ easy-to-modify baseline algorithms and provides integration with popular libraries like t5x, scenic, dopamine and fedjax. 1/7

Code: github.com/google-researc…
Paper: arxiv.org/pdf/2304.14082…
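The tweet doesn't show JaxPruner's actual interface; as a flavor of the kind of baseline such a library ships, here is a generic unstructured magnitude-pruning sketch in plain NumPy (the function name and signature are illustrative, not JaxPruner's API):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    assert 0.0 <= sparsity < 1.0
    k = int(sparsity * weights.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value; everything at or below it is pruned.
    flat = np.abs(weights).ravel()
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([[0.9, -0.05, 0.3], [-0.7, 0.01, 0.2]])
pruned = magnitude_prune(w, sparsity=0.5)  # keeps the 3 largest-magnitude entries
```

Real sparsity libraries wrap this kind of mask logic into training schedules (gradual pruning, rigged lottery tickets, etc.), which is where the "10+ baseline algorithms" come in.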
Mahan Fathi (@mahanfathi)'s Twitter Profile Photo

Why not get the best of both worlds by combining SSMs and Transformers?

Excited to share our work at #NeurIPS2023: "Block-State Transformers."

BST hits new highs in long-range language modeling and LRA tasks.

paper: arxiv.org/abs/2306.09539

1/
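The tweet doesn't include code; a heavily simplified sketch of the general idea — a state-space recurrence supplying long-range context, combined with attention restricted to local blocks — might look like the following. This illustrates the concept only, not the Block-State Transformer architecture from the paper:

```python
import numpy as np

def ssm_scan(x, a=0.9):
    """Toy diagonal linear recurrence h_t = a*h_{t-1} + x_t (long-range memory)."""
    h = np.zeros_like(x)
    prev = np.zeros(x.shape[-1])
    for t in range(x.shape[0]):
        prev = a * prev + x[t]
        h[t] = prev
    return h

def block_attention(x, block=4):
    """Softmax self-attention restricted to non-overlapping blocks (local mixing)."""
    out = np.zeros_like(x)
    for s in range(0, x.shape[0], block):
        b = x[s:s + block]
        scores = b @ b.T / np.sqrt(b.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[s:s + block] = w @ b
    return out

T, d = 8, 4
x = np.random.default_rng(0).normal(size=(T, d))
y = block_attention(x, block=4) + ssm_scan(x)  # local attention + long-range state
```

The appeal of this combination is that the recurrence carries information across blocks cheaply while attention handles fine-grained mixing within each block.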
Richard Socher (@richardsocher)'s Twitter Profile Photo

Rishi, I like the SSM/Hyena/Block-State Transformers papers arxiv.org/pdf/2306.09539… arxiv.org/pdf/2302.10866… They remind me of Q-RNNs (arxiv.org/abs/1611.01576) and play around with different parallelization ideas. I don't think transformers are that special and there are many equivalent

David Krueger (@davidskrueger)'s Twitter Profile Photo

My research group Krueger AI Safety Lab is looking for interns! Applications are due in 2 weeks ***January 29***. The long-awaited form: forms.gle/iLU1uQAxZ2UKEN… Please share widely!!

Mahan Fathi (@mahanfathi)'s Twitter Profile Photo

Course Correcting Koopman Representations
Accepted at #ICLR2024!

We identify problems with unrolling in imagination and propose an unconventional, simple, yet effective solution: periodically "𝒓𝒆𝒆𝒏𝒄𝒐𝒅𝒊𝒏𝒈" the latent. 

📄 arxiv.org/abs/2310.15386
Google DeepMind

1/🧵
Ross Goroshin (@rgoroshin)'s Twitter Profile Photo

Last week, I gave a talk at Mila - Institut québécois d'IA. The talk should be of interest to anyone working on predictive models, particularly in latent space. In collab. with Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, and Pierre-Luc Bacon. See you at ICLR 2024 in 🇦🇹! drive.google.com/file/d/1mQSXFa…

Quentin Anthony (@quentinanthon15)'s Twitter Profile Photo

Zyphra is pleased to announce Zamba-7B:
- 7B Mamba/Attention hybrid
- Competitive with Mistral-7B and Gemma-7B on only 1T fully open training tokens
- Outperforms Llama-2 7B and OLMo-7B
- All checkpoints across training to be released (Apache 2.0)
- Achieved by 7 people, on 128
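The tweet doesn't detail the hybrid layout. As a loose illustration of what a "Mamba/Attention hybrid" can mean — a stack of SSM blocks with an attention block interleaved periodically — here is a toy layer schedule (the `attn_every` ratio and the use of a shared attention block are illustrative choices, not Zamba's actual configuration):

```python
def hybrid_schedule(n_layers: int, attn_every: int = 6) -> list:
    """Toy layer plan: mostly SSM blocks, with a shared attention block interleaved
    every `attn_every` layers. Returns the ordered list of block types."""
    plan = []
    for i in range(n_layers):
        plan.append("ssm")
        if (i + 1) % attn_every == 0:
            plan.append("shared_attention")
    return plan

plan = hybrid_schedule(12, attn_every=6)
```

The motivation for such layouts is that SSM blocks give near-linear-cost sequence mixing, while occasional attention blocks recover the precise token-to-token retrieval that pure SSMs can struggle with.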
Quentin Anthony (@quentinanthon15)'s Twitter Profile Photo

Zyphra is ecstatic to release Zamba2-small:
- 2.7B Mamba2/Attention hybrid
- Pre-trained on 3T tokens + annealed on 100B high-quality tokens
- Model released on HuggingFace and standalone PyTorch
- SOTA evaluation performance and superior inference efficiency.
Vasu Shyam (@vasud3vshyam)'s Twitter Profile Photo

Yann LeCun Thanks for sharing! Another little trick that might amuse you is that we identified a function which upon minimization produces the forward pass of the attention block:

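The tweet doesn't give the function itself. A classic instance of this idea is that softmax attention weights arise as the minimizer of an entropy-regularized linear objective over the probability simplex: f(p) = -⟨s, p⟩ + Σᵢ pᵢ log pᵢ, where s are the query-key scores. The sketch below verifies this numerically (it illustrates the general principle, not necessarily the specific function in the tweet):

```python
import numpy as np

def objective(p, s):
    """Entropy-regularized linear objective: f(p) = -<s, p> + sum_i p_i log p_i."""
    return -(s @ p) + np.sum(p * np.log(p))

rng = np.random.default_rng(0)
s = rng.normal(size=5)            # attention scores q.k_i for one query

p_star = np.exp(s - s.max())
p_star /= p_star.sum()            # softmax(s): the closed-form minimizer

# f is strictly convex on the simplex, so any other distribution scores worse.
for _ in range(100):
    p = rng.dirichlet(np.ones(5))
    assert objective(p_star, s) <= objective(p, s) + 1e-12
```

Setting the gradient -s + log p + 1 to a constant (the simplex Lagrange multiplier) gives p ∝ exp(s), i.e. exactly the softmax; the attention output is then the p-weighted average of the values.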
Nick Alonso (@nick__alonso)'s Twitter Profile Photo

1) RAG often struggles on complex multi-hop queries. In this blog, we at Zyphra discuss and build a graph-based RAG system which tops the leaderboard on a QA benchmark with multi-hop queries and outperforms frontier long-context models for 60x less cost.

zyphra.com/post/understan…
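To make "multi-hop" concrete: answering such a query means chaining facts through intermediate entities rather than retrieving one passage. A toy illustration of hopping through a knowledge graph (purely illustrative — the actual Zyphra system is described in the linked blog post):

```python
# Tiny knowledge graph: entity -> list of (relation, target) edges.
graph = {
    "Marie Curie": [("born_in", "Warsaw"), ("field", "physics")],
    "Warsaw": [("capital_of", "Poland")],
}

def multi_hop(entity, relations, graph):
    """Follow a chain of relations from an entity, one hop per relation.
    Returns the final entity, or None if a hop is missing."""
    for rel in relations:
        edges = dict(graph.get(entity, []))
        if rel not in edges:
            return None
        entity = edges[rel]
    return entity

# "In which country was Marie Curie born?" needs two hops:
answer = multi_hop("Marie Curie", ["born_in", "capital_of"], graph)  # 'Poland'
```

Flat retrieval over text chunks often surfaces only one of these facts at a time; indexing the corpus as a graph lets the system traverse the chain explicitly.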
Jonathan Pilault (@j_pilault)'s Twitter Profile Photo

33% of online shoppers in Canada leave their cart before check-out due to high shipping costs (eMarketer). Free shipping is a must in Canada.

Jonathan Pilault (@j_pilault)'s Twitter Profile Photo

#Montreal should be the #innovation and #startup gateway between Europe and the US. Wouldn't that make entering each other's markets easier?