Trevor Gale (@tgale96) Twitter Tweets • TwiCopy

Trevor Gale

@tgale96

2 years ago

Mad respect to Tré Cool for his open-source work 🫡

10 likes, 0 replies, 0 retweets

Samuel L Smith

Announcing RecurrentGemma! github.com/google-deepmin…
- A 2B model with open weights based on Griffin
- Replaces transformer with mix of gated linear recurrences and local attention
- Competitive with Gemma-2B on downstream evals
- Higher throughput when sampling long sequences

273 likes, 9 replies, 65 retweets

Mihir Patel

@mvpatel2000

2 years ago

🚨Open Source Drop🚨 Databricks is adopting MegaBlocks, and we're releasing the MegaBlocks integration into LLMFoundry. This is a critical component in our Dbrx training stack, and we're super excited to bring MoE training to the community (1/N)

174 likes, 3 replies, 32 retweets

Jonathan Frankle

@jefrankle

2 years ago

Please welcome MegaBlocks to the Databricks family! databricks.com/blog/bringing-…

152 likes, 4 replies, 13 retweets

Trevor Gale

@tgale96

2 years ago

“But to us a “register” is a 16x16 tile of data.” Sounds like you guys might like TPUs 😁 Very fun post to read!

38 likes, 2 replies, 0 retweets

Mihir Patel

@mvpatel2000

2 years ago

Fun collaboration between Databricks Mosaic Research and PyTorch team! We've been working hard to scale MoEs and PyTorch distributed to thousands of GPUs, and this is a great summary of a lot of the cool things we've added to PyTorch. Quick rundown (1/N)

118 likes, 4 replies, 27 retweets

Trevor Gale

@tgale96

2 years ago

Awesome work 😁🔥 Great to see some of these details being communicated more widely.

23 likes, 0 replies, 2 retweets

Adam Paszke

@apaszke

2 years ago

Many of you are excited about H100 attention, so it’s a good time to show you Mosaic GPU: a Python DSL for H100s. The attention example matches FA3 performance, while being only ~200 lines of Python: github.com/google/jax/blo… It's easy to install too! Latest JAX packages have it.

667 likes, 14 replies, 111 retweets

lmarena.ai (formerly lmsys.org)

@lmarena_ai

2 years ago

Exciting News from Chatbot Arena! Google DeepMind's new Gemini 1.5 Pro (Experimental 0801) has been tested in Arena for the past week, gathering over 12K community votes. For the first time, Google Gemini has claimed the #1 spot, surpassing GPT-4o/Claude-3.5 with an impressive

1.1K likes, 83 replies, 410 retweets

Trevor Gale

@tgale96

a year ago

oh good now i get to watch dan fight all the anime anons all morning

2 likes, 0 replies, 0 retweets

Soumith Chintala

@soumithchintala

a year ago

dr. jack morris unclear tbh, data mix is not specified. probably is the big reason for performance

68 likes, 5 replies, 3 retweets

Trevor Gale

@tgale96

a year ago

If you’re going to read one thing, read this. Incredible job by these guys putting this together!

12 likes, 0 replies, 0 retweets

Trevor Gale

@tgale96

a year ago

Vertical integration is a good way to push efficiency. It also makes working on these models incredibly fun 😁

2 likes, 0 replies, 0 retweets

Trevor Gale

@tgale96

6 months ago

lol

3 likes, 0 replies, 0 retweets

Davis Blalock

@davisblalock

2 months ago

🚀 Today we’re releasing FlashOptim: better implementations of Adam, SGD, etc, that compute the same updates but save tons of memory. You can use it right now via `pip install flashoptim`. 🚀 arxiv.org/abs/2602.23349 A bunch of cool ideas make this possible: [1/n]

1.1K likes, 26 replies, 211 retweets
