ray (@rayzarion) Twitter Tweets • TwiCopy

Georgi Gerganov

@ggerganov

2 years ago

Julien Chaumond, you can see how the lm_head is kept in full precision


Eric Hartford

@cognitivecompai

2 years ago

Seems GrokAdamW is an improvement. This is training gemma-2-2b with Dolphin 2.9.4 dataset, all settings equal


vittorio

@iterintellectus

2 years ago


Mathieu

@miniapeur

2 years ago


Sam Altman

@sama

a year ago

there is no wall


yobibyte

@y0b1byte

a year ago

Best illustrations award!


Daniel Han

We made 5 challenges and if you score 47 points we'll offer you $500K/year + equity to join us at 🦥Unsloth AI!

No experience or PhD needed.

$400K - $500K/yr: Founding Engineer (47 points)
$250K - $300K/yr: ML Engineer (32 points)

Challenges:
1. Convert nf4 / BnB 4bit to


DryCathode

@drycathode

a year ago

Nicholas Fabiano, MD Quantum entanglement


TNG Technology Consulting GmbH

@tngtech

a year ago

Today we release DeepSeek-R1T-Chimera, an open weights model adding R1 reasoning to DeepSeek V3-0324 with a novel construction method.

In benchmarks, it appears to be as smart as R1 but much faster, using 40% fewer output tokens.

The Chimera is a child LLM, using V3s


yobibyte

@y0b1byte

a year ago


Tilde

@tilderesearch

10 months ago

Sparse attention (MoBA/NSA) trains faster & beats full attention in key tasks. But we’ve had no idea how they truly work…until now. 🔍

We reverse-engineered them to uncover:
- Novel attention patterns
- Hidden "attention sinks"
- Better performance
- And more

A 🧵… ~1/8~


lmarena.ai (formerly lmsys.org)

@lmarena_ai

9 months ago

🚨 BREAKING: Kimi.ai’s Kimi-K2 is now the #1 open model in the Arena!

With over 3K community votes, it ranks #5 overall, overtaking DeepSeek as the top open model.

Huge congrats to the Moonshot team on this impressive milestone! The leaderboard now features 7 different


kalomaze

@kalomaze

9 months ago

accelerate


gabe

@allgarbled

8 months ago

If you made a workout tracker that looked like the GitHub graph, the average physical fitness among computer programmers would increase by at least 50%


ₕₐₘₚₜₒₙ

@hamptonism

8 months ago

just do it,


Elon Musk

@elonmusk

5 months ago

Optimus will be the Von Neumann probe


Black Forest Labs

@bfl_ml

4 months ago

FLUX.2 [max] is here

Our highest quality model to date.

* Grounded generation - searches the web for real-time context.
* Up to 10 reference images. Products, characters, styles stay consistent.
* #2 on Artificial Analysis in text-to-image and image editing.


Andrej Karpathy

@karpathy

a month ago

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then: - the human iterates on the


elie

@eliebakouch

a month ago

Muon optimizer explained by the new Claude interactive feature, this is so nice

