Trevor Gale (@tgale96)'s Twitter Profile
Trevor Gale

@tgale96

Research Scientist @ Google DeepMind | PhD Candidate @ Stanford CS

ID: 1332660512

Joined: 06-04-2013 22:55:47

220 Tweets

1.1K Followers

276 Following

Samuel L Smith (@samuelmlsmith)

Announcing RecurrentGemma! github.com/google-deepmin… - A 2B model with open weights based on Griffin - Replaces transformer with mix of gated linear recurrences and local attention - Competitive with Gemma-2B on downstream evals - Higher throughput when sampling long sequences

Mihir Patel (@mvpatel2000)

🚨Open Source Drop🚨 Databricks is adopting MegaBlocks, and we're releasing the MegaBlocks integration into LLMFoundry. This is a critical component in our Dbrx training stack, and we're super excited to bring MoE training to the community (1/N)

Trevor Gale (@tgale96)

“But to us a “register” is a 16x16 tile of data.” Sounds like you guys might like TPUs 😁 Very fun post to read!

Mihir Patel (@mvpatel2000)

Fun collaboration between Databricks Mosaic Research and PyTorch team! We've been working hard to scale MoEs and PyTorch distributed to thousands of GPUs, and this is a great summary of a lot of the cool things we've added to PyTorch. Quick rundown (1/N)

Adam Paszke (@apaszke)

Many of you are excited about H100 attention, so it’s a good time to show you Mosaic GPU: a Python DSL for H100s. The attention example matches FA3 performance, while being only ~200 lines of Python: github.com/google/jax/blo… It's easy to install too! Latest JAX packages have it.

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

Exciting News from Chatbot Arena! Google DeepMind's new Gemini 1.5 Pro (Experimental 0801) has been tested in Arena for the past week, gathering over 12K community votes. For the first time, Google Gemini has claimed the #1 spot, surpassing GPT-4o/Claude-3.5 with an impressive

Davis Blalock (@davisblalock)

🚀 Today we’re releasing FlashOptim: better implementations of Adam, SGD, etc, that compute the same updates but save tons of memory. You can use it right now via `pip install flashoptim`. 🚀 arxiv.org/abs/2602.23349 A bunch of cool ideas make this possible: [1/n]
