🍊 (we/wacc) (@orangeeeeeee9)'s Twitter Profile
🍊 (we/wacc)

@orangeeeeeee9

miorange

ID: 1617020498772389898

Joined: 22-01-2023 04:45:11

177 Tweets

32 Followers

450 Following

Vaibhav (VB) Srivastav (@reach_vb):

350M parameters is all you need! ⚡

Revisiting Meta's MobileLLM paper this morning:

> Reaches the same perf as Llama 2 7B at API calling; competitive at chat
> Train thin and deep networks (instead of wide)
> Grouped Query Attention, even for smaller networks (sketched below)
> Block-wise weight sharing
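
The Grouped Query Attention point is concrete enough to sketch. Below is a minimal, illustrative GQA forward pass in NumPy; it is not MobileLLM's actual code, and all shapes and names are invented for exposition. The idea: several query heads share each key/value head, shrinking the KV cache without giving up multi-head queries.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def grouped_query_attention(q, k, v, n_kv_heads):
    # q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)
    n_q_heads, _, d = q.shape
    group = n_q_heads // n_kv_heads        # query heads per shared K/V head
    k = np.repeat(k, group, axis=0)        # broadcast each K/V head to its group
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores) @ v

# 8 query heads share 2 K/V heads -> the KV cache is 4x smaller
q = np.random.randn(8, 16, 64)
k = np.random.randn(2, 16, 64)
v = np.random.randn(2, 16, 64)
out = grouped_query_attention(q, k, v, n_kv_heads=2)   # (8, 16, 64)
```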
Aran Komatsuzaki (@arankomatsuzaki):

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

- Runs as a linear-time RNN by propagating the gradient to the next step, i.e., test-time training
- Achieves better perplexity than Mamba

arxiv.org/abs/2407.04620
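
The one-line summary ("propagating the gradient to the next step") can be made concrete. Here is a toy sketch of the test-time-training idea, assuming a linear inner model and a made-up reconstruction loss (the paper's actual layer and loss differ): the hidden state is a weight matrix that takes one gradient step per token.

```python
import numpy as np

def ttt_layer(tokens, dim, lr=0.1):
    W = np.zeros((dim, dim))                 # hidden state = weights of an inner model
    outputs = []
    for x in tokens:                         # x: (dim,) embedding for one step
        x_in = 0.5 * x                       # toy corrupted view for self-supervision
        grad = np.outer(W @ x_in - x, x_in)  # grad of 0.5 * ||W x_in - x||^2 w.r.t. W
        W = W - lr * grad                    # one gradient step = the "recurrence"
        outputs.append(W @ x)                # output from the freshly updated state
    return np.stack(outputs)

out = ttt_layer(np.random.randn(16, 8), dim=8)   # (16, 8); cost is linear in length
```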
Hongyu Wang (@realhongyu_wang):

4 months since we released BitNet b1.58🔥🔥

After compressing LLMs to 1.58 bits, the inference of a 1-bit LLM is no longer memory-bound but compute-bound.

🚀🚀Today we introduce Q-Sparse that can significantly speed up LLM computation.
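
A rough illustration of why sparsity attacks the compute bound: if only the top-K activations survive, the following matmul can skip most of its work. This is a generic top-K sparsification sketch, not Q-Sparse's released method; the names and the 25% keep-ratio are assumptions.

```python
import numpy as np

def topk_sparsify(x, k):
    # zero everything except the k largest-magnitude activations
    out = x.copy()
    out[np.argsort(np.abs(x))[:-k]] = 0.0
    return out

x = np.random.randn(4096)
x_sparse = topk_sparsify(x, k=1024)   # 75% of activations dropped
W = np.random.randn(4096, 4096)
y = W @ x_sparse                      # dense stand-in: a sparse kernel would touch
                                      # only the 1024 columns with nonzero inputs
```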
Matt Shumer (@mattshumer_):

If this actually replicates/works, this is huge

Lifelong learning, reduced forgetting, etc.

I’ve always had iffy experiences with MoEs, but this is very exciting
The Humanoid Hub (@thehumanoidhub):

How soon will we see the first example of a single person, shepherding a humanoid robot fleet, building a billion-dollar revenue business as the sole employee?
Andrej Karpathy (@karpathy):

LLM model size competition is intensifying… backwards! My bet is that we'll see models that "think" very well and reliably that are very very small. There is most likely a setting even of GPT-2 parameters for which most people will consider GPT-2 "smart". The reason current…

Chubby♨️ (@kimmonismus):

Currently it looks like Llama3.1-405b beats gpt-4o in almost all benchmarks (except human_eval and mmlu_social_sciences).
Previously there was a lot of concern that Llama 3 would perform worse, but initial tests show excellent results.
Meanwhile, rumors are growing louder that…
François Chollet (@fchollet):

Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks.

It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task…
Ali Behrouz (@behrouz_ali):

Attention has been the key component for most advances in LLMs, but it can’t scale to long context. Does this mean we need to find an alternative? 

Presenting Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time. Titans…
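
A toy rendering of "a memory that learns how to memorize at test time," under the assumption of a linear memory and a simple key-value association loss (the actual Titans update is more elaborate): the memory matrix takes an online gradient step per token and is read like an associative map.

```python
import numpy as np

def memory_update(M, k, v, lr=0.1):
    surprise = np.outer(M @ k - v, k)   # grad of 0.5 * ||M k - v||^2: how wrong was recall?
    return M - lr * surprise            # memorize more when more surprised

d = 8
M = np.zeros((d, d))                    # long-term memory, updated during inference
for k, v in zip(np.random.randn(32, d), np.random.randn(32, d)):
    M = memory_update(M, k, v)          # one online step per (key, value) pair
q = np.random.randn(d)
retrieved = M @ q                       # read the memory like an associative map
```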
DeepSeek (@deepseek_ai):

🚀 DeepSeek-R1 is here!

⚡ Performance on par with OpenAI-o1
📖 Fully open-source model & technical report
🏆 MIT licensed: Distill & commercialize freely!

🌐 Website & API are live now! Try DeepThink at chat.deepseek.com today!

🐋 1/n
🍊 (we/wacc) (@orangeeeeeee9):

swe-bench verified (real world coding benchmark; github issue input, github PR output) went from 5% solved to 65% in one year. prob near 100% before 2026
Noam Brown (@polynoamial):

This is on the scale of the Apollo Program and Manhattan Project when measured as a fraction of GDP. This kind of investment only happens when the science is carefully vetted and people believe it will succeed and be completely transformative. I agree it’s the right time.

Lin Zheng (@linzhengisme):

🚀 Meet EvaByte: The best open-source tokenizer-free language model! Our 6.5B byte LM matches modern tokenizer-based LMs with 5x less data & 2x faster decoding, naturally extending to multimodal tasks while fixing tokenization quirks.

💻 Blog: bit.ly/3CjEmTC 

🧵 1/9
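
For a sense of what "tokenizer-free" buys, here is a minimal byte-level encoder (illustrative only; EvaByte's internals are not shown, and the 512-dim embedding is an invented number): the vocabulary is just the 256 possible byte values, so there are no merges, no <unk>, and no tokenization quirks.

```python
import numpy as np

VOCAB_SIZE = 256                       # every possible byte value

def encode(text: str) -> list[int]:
    return list(text.encode("utf-8"))  # token ids are raw bytes: no merges, no <unk>

def decode(ids: list[int]) -> str:
    return bytes(ids).decode("utf-8")

ids = encode("tokenization quirks? 🤔")          # emoji round-trips exactly
assert decode(ids) == "tokenization quirks? 🤔"

embedding = np.random.randn(VOCAB_SIZE, 512)     # tiny table vs. a 100k+ BPE vocab
x = embedding[ids]                               # (seq_len, 512) input to the model
```

The tradeoff is that byte sequences run several times longer than BPE token sequences, which is why the 2x-faster-decoding claim matters.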
Kagi (@kagihq):

Kagi products will always be free of ads and trackers. In fact, Kagi Search will actively down-rank sites with lots of ads and trackers in the results and promote sites with little or no advertising. An ad-free web is better, safer, more private and user-friendly.

Albert Gu (@_albertgu):

I converted one of my favorite talks I've given over the past year into a blog post.

"On the Tradeoffs of SSMs and Transformers"
(or: tokens are bullshit)

In a few days, we'll release what I believe is the next major advance for architectures.
Albert Gu (@_albertgu):

Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence.

Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
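
Dynamic chunking can be sketched in a few lines, with the caveat that the boundary scorer and mean-pooling below are toy stand-ins for whatever the released architecture actually learns: a scorer marks chunk boundaries in the byte stream, and each variable-length chunk is pooled into one higher-level state.

```python
import numpy as np

def dynamic_chunk(x, boundary_prob, threshold=0.5):
    # x: (seq, dim) byte-level states; boundary_prob: (seq,) learned scores in [0, 1]
    chunks, current = [], []
    for vec, p in zip(x, boundary_prob):
        current.append(vec)
        if p > threshold:                            # scorer says "a chunk ends here"
            chunks.append(np.mean(current, axis=0))  # pool the chunk into one vector
            current = []
    if current:
        chunks.append(np.mean(current, axis=0))
    return np.stack(chunks)                 # (n_chunks, dim); n_chunks is data-dependent

x = np.random.randn(16, 8)
p = np.random.rand(16)                      # in practice a learned module, not noise
high_level = dynamic_chunk(x, p)            # content-dependent "tokens"
```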
jack morris (@jxmnop):

again, the AI labs are obsessed with building reasoning-native language models when they need to be building *memory-native* language models

- this is possible (the techniques exist)
- no one has done it yet (no popular LLM has a built-in memory module)
- door = wide open
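
For a sense of what a "built-in memory module" could mean, here is a generic slot-memory sketch (purely illustrative; this is no lab's design, and every name is invented): the model writes key-value pairs during use and reads them back with soft attention, so facts persist outside the context window.

```python
import numpy as np

class SlotMemory:
    def __init__(self, n_slots=64, dim=8):
        self.keys = np.zeros((n_slots, dim))
        self.vals = np.zeros((n_slots, dim))
        self.ptr = 0

    def write(self, k, v):
        # persist a fact as a (key, value) pair, overwriting the oldest slot
        i = self.ptr % len(self.keys)
        self.keys[i], self.vals[i] = k, v
        self.ptr += 1

    def read(self, q):
        # soft attention over the stored slots
        w = np.exp(self.keys @ q)
        w /= w.sum()
        return w @ self.vals

mem = SlotMemory()
mem.write(np.ones(8), np.arange(8.0))   # remembered across contexts, not re-prompted
recalled = mem.read(np.ones(8))         # approximate recall of the stored value
```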