ray (@rayzarion)'s Twitter Profile

attn

ID: 1749283689547333632

Joined: 22-01-2024 04:12:04

248 Tweets

20 Followers

287 Following

Daniel Han (@danielhanchen)

We made 5 challenges and if you score 47 points we'll offer you $500K/year + equity to join us at 🦥Unsloth AI!

No experience or PhD needed.

$400K - $500K/yr: Founding Engineer (47 points)
$250K - $300K/yr: ML Engineer (32 points)

Challenges:
1. Convert nf4 / BnB 4bit to

TNG Technology Consulting GmbH (@tngtech)

Today we release DeepSeek-R1T-Chimera, an open weights model adding R1 reasoning to DeepSeek V3-0324 with a novel construction method.

In benchmarks, it appears to be as smart as R1 but much faster, using 40% fewer output tokens.

The Chimera is a child LLM, using V3s

Tilde (@tilderesearch)

Sparse attention (MoBA/NSA) trains faster & beats full attention in key tasks. But we’ve had no idea how they truly work…until now. 🔍

We reverse-engineered them to uncover:
- Novel attention patterns
- Hidden "attention sinks"
- Better performance
- And more

A 🧵… ~1/8~

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

🚨 BREAKING: Kimi.ai’s Kimi-K2 is now the #1 open model in the Arena!

With over 3K community votes, it ranks #5 overall, overtaking DeepSeek as the top open model.

Huge congrats to the Moonshot team on this impressive milestone! The leaderboard now features 7 different

gabe (@allgarbled)

If you made a workout tracker that looked like the GitHub graph, the average physical fitness among computer programmers would increase by at least 50%

Black Forest Labs (@bfl_ml)

FLUX.2 [max] is here

Our highest quality model to date.

* Grounded generation - searches the web for real-time context.
* Up to 10 reference images. Products, characters, styles stay consistent.
* #2 on Artificial Analysis in text-to-image and image editing.

Andrej Karpathy (@karpathy)

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then:

- the human iterates on the