Richard Kuzma (@rskuzma)'s Twitter Profile
Richard Kuzma

@rskuzma

GenAI @googlecloud, ex-LLMs @CerebrasSystems, ML @USSOCOM, Tech for Public Good @DIU_x, and Harvard @Kennedy_School

ID: 887464277615013888

Joined: 19-07-2017 00:09:06

233 Tweets

438 Followers

1.1K Following

Richard Kuzma (@rskuzma)'s Twitter Profile Photo

Why train LLMs for only 1 epoch? Do we have to stop there? Demand is growing for small-parameter LLMs trained on more data. Internet-scale data isn't available for all languages or for domain-specific (e.g. finance, life science) text. Training for more epochs seems advisable!
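
A minimal sketch of what multi-epoch training looks like mechanically, assuming a PyTorch-style loop; the model, data, and loss below are toy stand-ins for an LLM and a tokenized corpus, not anything from the tweet:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins: in practice the model would be a small LLM and the
# dataset a tokenized domain-specific corpus (e.g. finance text).
model = torch.nn.Linear(512, 512)
data = torch.randn(1024, 512)
targets = torch.randn(1024, 512)
loader = DataLoader(TensorDataset(data, targets), batch_size=32, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

NUM_EPOCHS = 4  # more than the conventional single pass over the data
for epoch in range(NUM_EPOCHS):
    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)  # placeholder loss
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch + 1}/{NUM_EPOCHS} done")
```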

Richard Kuzma (@rskuzma)'s Twitter Profile Photo

Great work by my colleagues Daria Soboleva, Nolan Dey, and others to further clean and deduplicate RedPajama to make SlimPajama, a (still massive) extremely high-quality dataset that gives practitioners more control over how much (if any) duplication they want!
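
As a quick sketch, assuming the Hugging Face `datasets` library, SlimPajama can be streamed straight from the Hub without downloading the full corpus:

```python
from datasets import load_dataset

# Stream SlimPajama from the Hugging Face Hub rather than downloading
# all ~627B tokens up front; "cerebras/SlimPajama-627B" is the public repo.
ds = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)

for example in ds.take(3):
    print(example["text"][:200])  # each record carries a raw "text" field
```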

Cerebras (@cerebrassystems)'s Twitter Profile Photo

πŸ“£ Today we are announcing Condor Galaxy-1: a 4 exaflop AI supercomputer built in partnership with G42. Powered by 64 Cerebras CS-2 systems, 54M cores, and 82TB of memory – it's the largest AI supercomputer we've ever built. But that's not all: CG-1 is just the start..

OpenΟ„ensor FoundaΟ„ion (@opentensor)'s Twitter Profile Photo

The Opentensor Foundation and Cerebras are pleased to announce Bittensor Language Model (BTLM), a new state-of-the-art 3 billion parameter language model that achieves breakthrough accuracy across a dozen AI benchmarks

Cerebras (@cerebrassystems)'s Twitter Profile Photo

Introducing BTLM-3B-8K: an open, state-of-the-art 3B parameter model with 7B level performance. When quantized, it fits in as little as 3GB of memory 🀯. It runs on iPhone, Google Pixel, even Raspberry Pi. BTLM goes live on Bittensor later this week! πŸ§΅πŸ‘‡ buff.ly/3Q5dtY5

Richard Kuzma (@rskuzma)'s Twitter Profile Photo

Announcing BTLM-3B-8k-base!
- 7B performance in a 3B model βœ…
- 8k context length βœ…
- quantize to fit in 3GB of memory βœ…
- trained on high-quality data βœ…
- Apache 2.0 license βœ…
huggingface.co/cerebras/btlm-…
Great work by my colleagues Daria Soboleva, Nolan Dey, Faisal Al-khateeb, and others πŸ‘
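
A minimal sketch of loading the model from the Hub, assuming the `transformers` library with `bitsandbytes` quantization; the exact recipe behind the ~3GB figure isn't specified in the tweet, so this 4-bit config is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative 4-bit config; the ~3GB figure likely comes from a similar
# low-bit quantization, but the exact recipe is an assumption here.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tok = AutoTokenizer.from_pretrained("cerebras/btlm-3b-8k-base")
model = AutoModelForCausalLM.from_pretrained(
    "cerebras/btlm-3b-8k-base",
    quantization_config=bnb,
    trust_remote_code=True,  # BTLM ships custom model code on the Hub
    device_map="auto",
)

inputs = tok("The key idea of BTLM is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```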

Cerebras (@cerebrassystems)'s Twitter Profile Photo

The Cerebras team has had a great time sharing our work at #ICML23. Below is a summary of the posters we presented; let us know if you are interested in discussing any of them further!

Richard Kuzma (@rskuzma)'s Twitter Profile Photo

πŸ₯³ Cerebras-GPT proved to the world in March how effectively you can train LLMs on Cerebras hardware. Now BTLM surpasses 1M downloads in ~3 weeks on Hugging Face! πŸš€

Ritwik Gupta πŸ‡ΊπŸ‡¦ (@ritwik_g)'s Twitter Profile Photo

I read Leopold Aschenbrenner's essay on the future of AI research and geopolitical competition. It's well-researched, well-presented, and passionate. However, Leopold advocates for an unreasonably strict and exclusionary future for AI developmentβ€”a view that's gaining traction. (1/9)

Richard Kuzma (@rskuzma)'s Twitter Profile Photo

Crazy speed from the team at @CerebrasSystems! Unlocks lots of interesting use cases across fast agent tool calling, multi-agent systems, self-consistency, and more!
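
For a sense of what "fast agent tool calling" looks like in code, here is a minimal sketch of calling Cerebras inference, assuming its publicly documented OpenAI-compatible endpoint; the base URL and model id are assumptions and may change:

```python
import os
from openai import OpenAI

# Cerebras inference exposes an OpenAI-compatible API; base URL and model
# name below are assumptions based on public docs, not from the tweet.
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ["CEREBRAS_API_KEY"],
)

resp = client.chat.completions.create(
    model="llama3.1-8b",  # example model id; check the current catalog
    messages=[{
        "role": "user",
        "content": "One sentence on why low latency matters for agents.",
    }],
)
print(resp.choices[0].message.content)
```

Low per-call latency compounds in agent loops: a plan-act-observe cycle or a self-consistency vote issues many sequential model calls, so each call's speedup multiplies across the whole chain.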

Ted Mabrey (@mabreyted)'s Twitter Profile Photo

Man, so excited we could finally unveil this. This is THE applied AI project. Google walked away from it. We embraced it. The world is a different place because of it. It provides so many foundational learnings that we are now applying to the commercial world via AIP. The…

Lydia Hylton (@lyd_hylton)'s Twitter Profile Photo

Thrilled to officially announce what I've been working on for the last year: Strella.io! At Strella, we believe that the customer’s needs should be a company’s North Star. Using Strella’s AI, we enable companies to make informed decisions in hours, not weeks β­οΈπŸŒŸπŸš€

Daria Soboleva (@dmsobol)'s Twitter Profile Photo

This might be the most information-dense blog I've ever written. Added a "show me the math" section to the MoE 101 part 4 episode. We believe it fully models MoE training perf on both GPU and Cerebras WSE devices. cerebras.ai/blog/moe-guide… 🧡1/n
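
For a flavor of the arithmetic involved (a rough back-of-envelope sketch, not the blog's full performance model), a common approximation charges each training token about 6 FLOPs per active parameter, and in an MoE only the routed top-k experts' parameters are active:

```python
# Rough MoE training-FLOPs estimate (illustrative only; the blog's model
# is more detailed). Standard approximation: 6 * active_params * tokens.
def moe_training_flops(non_expert_params, expert_params, num_experts,
                       top_k, tokens):
    # Only top_k of num_experts experts fire per token, so only that
    # fraction of the expert weights counts toward active parameters.
    active_params = non_expert_params + expert_params * (top_k / num_experts)
    return 6 * active_params * tokens

# Example: hypothetical 8-expert, top-2 model with 1B shared params and
# 4B total expert params, trained on 100B tokens.
print(f"{moe_training_flops(1e9, 4e9, 8, 2, 100e9):.3e} FLOPs")
```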