350M parameters is all you need! ⚡
Revisiting Meta's MobileLLM paper this morning:
> Reaches the same perf as LLaMA-2 7B in API calling, and is competitive at chat
> Train thin and deep networks (instead of wide)
> Grouped Query Attention (even for smaller networks)
> Block-wise weight sharing
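A rough sketch of two of those ingredients, thin-and-deep stacking plus grouped-query attention (the dims and layer count below are just plausible small-model numbers, not necessarily MobileLLM's exact config; block-wise weight sharing would additionally reuse one block's weights across adjacent layers):

```python
# Minimal sketch: a narrow attention block with grouped-query attention (GQA),
# stacked many times ("thin and deep") instead of a few wide layers.
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    def __init__(self, dim=576, n_heads=9, n_kv_heads=3):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)  # fewer K/V heads
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Share each K/V head across a group of query heads.
        rep = self.n_heads // self.n_kv_heads
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))

# Thin and deep: many narrow blocks instead of a handful of wide ones.
blocks = nn.ModuleList(GroupedQueryAttention() for _ in range(30))
```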
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
- A linear-time RNN whose hidden state is itself a small model, updated by a gradient step on each incoming token, i.e., test-time training
- Achieves better perplexity than Mamba
arxiv.org/abs/2407.04620
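A toy sketch of that test-time-training idea (heavily simplified: the paper's TTT layers use learned projections, a corrupted-view reconstruction loss, and mini-batched updates; the function name and learning rate here are mine):

```python
# Toy TTT-style layer: the "hidden state" W is a linear model that is trained
# on each token as it arrives, so the update cost is linear in sequence length.
import torch

def ttt_layer(tokens, lr=0.1):
    d = tokens.shape[-1]
    W = torch.zeros(d, d)                     # hidden state = a tiny linear model
    outputs = []
    for x in tokens:                          # one gradient step per token
        W = W.detach().requires_grad_(True)
        loss = ((W @ x - x) ** 2).mean()      # self-supervised reconstruction loss
        (grad,) = torch.autograd.grad(loss, W)
        W = W - lr * grad                     # "train" the state at test time
        outputs.append(W @ x)                 # output = updated model applied to input
    return torch.stack(outputs)

out = ttt_layer(torch.randn(16, 32))          # 16 tokens of dimension 32
```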
4 months since we released BitNet b1.58🔥🔥
After compressing LLMs to 1.58 bits, 1-bit LLM inference is no longer memory-bound but compute-bound.
🚀🚀 Today we introduce Q-Sparse, which can significantly speed up LLM computation.
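Roughly, the two ingredients look like this (a simplified sketch, not the papers' exact quantization or sparsification recipes; the matrix size and top-K value are made up):

```python
# b1.58 quantizes weights to {-1, 0, +1} (log2(3) ≈ 1.58 bits per weight);
# Q-Sparse keeps only the top-K activations per token, so the matmul only
# touches a fraction of the weights.
import torch

def ternary_quant(w, eps=1e-8):
    scale = w.abs().mean() + eps
    return (w / scale).round().clamp(-1, 1), scale   # weights in {-1, 0, +1}

def topk_sparsify(x, k):
    idx = x.abs().topk(k, dim=-1).indices
    mask = torch.zeros_like(x).scatter(-1, idx, 1.0)
    return x * mask                                   # keep only top-K activations

W = torch.randn(4096, 4096)
Wq, s = ternary_quant(W)
x = topk_sparsify(torch.randn(4096), k=1024)
y = (x @ Wq.T) * s        # 1.58-bit weights applied to sparse activations
```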
If this actually replicates/works, this is huge
Lifelong learning, reduced forgetting, etc.
I’ve always had iffy experiences with MoEs, but this is very exciting
How soon will we see the first example of a single person, shepherding a humanoid robot fleet, building a billion-dollar revenue business as the sole employee?
LLM model size competition is intensifying… backwards!
My bet is that we'll see models that "think" very well and reliably that are very very small. There is most likely a setting even of GPT-2 parameters for which most people will consider GPT-2 "smart". The reason current models are so large is because we're still being very wasteful during training - we're asking them to memorize the internet and, remarkably, they do.
Currently it looks like Llama3.1-405b beats gpt-4o in almost all benchmarks (except human_eval and mmlu_social_sciences).
Previously there was a lot of concern that Llama 3 would perform worse, but initial tests show excellent results.
Meanwhile, rumors are growing louder that…
Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks.
It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task in compute) and 87.5% in high-compute mode.
Attention has been the key component for most advances in LLMs, but it can’t scale to long context. Does this mean we need to find an alternative?
Presenting Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time. Titans are more effective than Transformers and modern linear RNNs, and can effectively scale to context windows larger than 2M tokens.
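A toy version of that test-time memorization, as I understand it (the surprise/momentum/decay update below is a simplification; the constants and function names are illustrative, not the paper's):

```python
# Toy neural memory updated at test time: "surprise" is the gradient of an
# associative recall loss; momentum accumulates it, and a decay term forgets.
import torch

def update_memory(M, momentum, k, v, lr=0.1, beta=0.9, decay=0.01):
    M = M.detach().requires_grad_(True)
    surprise = ((M @ k - v) ** 2).mean()        # how badly memory recalls v from k
    (grad,) = torch.autograd.grad(surprise, M)
    momentum = beta * momentum - lr * grad      # carry surprise across tokens
    M = (1 - decay) * M.detach() + momentum     # forget a little, then memorize
    return M, momentum

d = 64
M, mom = torch.zeros(d, d), torch.zeros(d, d)
for k, v in zip(torch.randn(128, d), torch.randn(128, d)):   # key/value stream
    M, mom = update_memory(M, mom, k, v)
recalled = M @ torch.randn(d)                   # query the memory at read time
```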
🚀 DeepSeek-R1 is here!
⚡ Performance on par with OpenAI-o1
📖 Fully open-source model & technical report
🏆 MIT licensed: Distill & commercialize freely!
🌐 Website & API are live now! Try DeepThink at chat.deepseek.com today!
🐋 1/n
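Minimal usage sketch, assuming DeepSeek's OpenAI-compatible endpoint and the `deepseek-reasoner` model name from their docs (check the docs for current values):

```python
# Call DeepSeek-R1 through the OpenAI-compatible API (endpoint and model name
# per DeepSeek's documentation; replace the key with your own).
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")
resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
)
print(resp.choices[0].message.content)
```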
swe-bench verified (real world coding benchmark; github issue input, github PR output) went from 5% solved to 65% in one year. prob near 100% before 2026
This is on the scale of the Apollo Program and Manhattan Project when measured as a fraction of GDP. This kind of investment only happens when the science is carefully vetted and people believe it will succeed and be completely transformative. I agree it’s the right time.
🚀 Meet EvaByte: The best open-source tokenizer-free language model! Our 6.5B byte LM matches modern tokenizer-based LMs with 5x less data & 2x faster decoding, naturally extending to multimodal tasks while fixing tokenization quirks.
💻 Blog: bit.ly/3CjEmTC
🧵 1/9
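What "tokenizer-free" means in practice, sketched below (dimensions are illustrative, not EvaByte's actual architecture): the model consumes raw UTF-8 bytes, so the vocabulary is just the 256 byte values plus a few specials, and there is no BPE merge table to cause quirks.

```python
# Byte-level input: every string maps losslessly to a sequence of UTF-8 bytes.
import torch
import torch.nn as nn

text = "Tokenization quirks? 🤔"
ids = torch.tensor(list(text.encode("utf-8")))            # byte IDs in [0, 255]

embed = nn.Embedding(num_embeddings=256 + 4, embedding_dim=512)  # tiny "vocab"
h = embed(ids)                                             # (num_bytes, 512) LM input
print(len(text), "chars ->", len(ids), "bytes")
```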
Kagi products will always be free of ads and trackers. In fact, Kagi Search will actively down-rank sites with lots of ads and trackers in the results and promote sites with little or no advertising.
An ad-free web is better, safer, more private and user-friendly.
I converted one of my favorite talks I've given over the past year into a blog post.
"On the Tradeoffs of SSMs and Transformers"
(or: tokens are bullshit)
In a few days, we'll release what I believe is the next major advance for architectures.
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence.
Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
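A hand-wavy sketch of what dynamic chunking could look like (my simplification ahead of the release, not the actual architecture; the boundary scorer and mean-pooling below are illustrative):

```python
# Learned, content-dependent chunking: a boundary scorer over byte states
# decides where chunks end, and each chunk is pooled into one vector that
# feeds the next level of the hierarchy.
import torch
import torch.nn as nn

d = 256
byte_states = torch.randn(1, 128, d)               # low-level (byte) representations
boundary_scorer = nn.Linear(d, 1)

probs = torch.sigmoid(boundary_scorer(byte_states)).squeeze(-1)   # (1, 128)
is_boundary = probs > 0.5                           # content-dependent chunk ends
chunk_ids = is_boundary.long().cumsum(dim=-1)       # assign each byte to a chunk

# Mean-pool bytes belonging to the same chunk into one high-level token.
chunks = [byte_states[0, chunk_ids[0] == c].mean(dim=0)
          for c in chunk_ids[0].unique()]
high_level_tokens = torch.stack(chunks)
print(byte_states.shape[1], "bytes ->", high_level_tokens.shape[0], "chunks")
```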
again, the AI labs are obsessed with building reasoning-native language models when they need to be building *memory-native* language models
- this is possible (the techniques exist)
- no one has done it yet (no popular LLM has a built-in memory module)
- door = wide open
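One such existing technique, sketched: an external key-value memory the model writes to and reads from across sessions, with retrieval by cosine similarity (the class name, embedding size, and similarity choice below are mine, not any particular lab's module):

```python
# Minimal external memory: store (embedding, text) pairs, retrieve the nearest
# snippets for a new query, and prepend them to the LM's context.
import torch
import torch.nn.functional as F

class MemoryStore:
    def __init__(self, dim=384):
        self.keys = torch.empty(0, dim)
        self.values = []                          # stored text snippets

    def write(self, key_vec, text):
        self.keys = torch.cat([self.keys, key_vec[None]], dim=0)
        self.values.append(text)

    def read(self, query_vec, k=3):
        if not self.values:
            return []
        sims = F.cosine_similarity(self.keys, query_vec[None].expand_as(self.keys))
        top = sims.topk(min(k, len(self.values))).indices
        return [self.values[int(i)] for i in top]  # prepend these to the context

mem = MemoryStore()
mem.write(torch.randn(384), "User prefers concise answers.")
print(mem.read(torch.randn(384)))
```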