evolvingstuff (@evolvingstuff) 's Twitter Profile
evolvingstuff

@evolvingstuff

I post about machine learning and occasionally some other stuff.

ID: 97275971

Joined: 16-12-2009 19:48:10

4.4K Tweets

2.2K Followers

2.2K Following

hardmaru (@hardmaru) 's Twitter Profile Photo

Text-to-LoRA: Instant Transformer Adaption arxiv.org/abs/2506.06105 Generative models can produce text, images, video. They should also be able to generate models! Here, we trained a Hypernetwork to generate new task-specific LoRAs by simply describing the task as a text prompt.
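
A minimal sketch of the general idea (toy dimensions and architecture, not the paper's): a small hypernetwork maps a task-description embedding to the low-rank A/B matrices of a LoRA adapter for one target weight.

```python
import torch
import torch.nn as nn

class LoRAHyperNetwork(nn.Module):
    """Toy hypernetwork: task embedding -> LoRA A/B matrices for one linear layer."""
    def __init__(self, task_dim=512, d_model=768, rank=8, hidden=1024):
        super().__init__()
        self.d_model, self.rank = d_model, rank
        self.net = nn.Sequential(
            nn.Linear(task_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * d_model * rank),  # flattened A and B
        )

    def forward(self, task_emb):                     # task_emb: (task_dim,)
        flat = self.net(task_emb)
        A, B = flat.split(self.d_model * self.rank)
        A = A.view(self.rank, self.d_model)          # down-projection
        B = B.view(self.d_model, self.rank)          # up-projection
        return A, B

# Generate an adapter from a stand-in task embedding (in Text-to-LoRA this would
# come from encoding the text prompt), then apply it as W' = W + B @ A.
hyper = LoRAHyperNetwork()
A, B = hyper(torch.randn(512))
W = torch.randn(768, 768)          # frozen base weight
print((W + B @ A).shape)           # torch.Size([768, 768])
```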

hardmaru (@hardmaru) 's Twitter Profile Photo

DeepSWE is a new state-of-the-art open-source software engineering model trained entirely using reinforcement learning, based on Qwen3-32B.

together.ai/blog/deepswe

Fantastic work from Together AI Agentica Project‼
Andrej Karpathy (@karpathy) 's Twitter Profile Photo

How to build a thriving open source community by writing code like bacteria do 🦠. Bacterial code (genomes) are:

- small (each line of code costs energy)
- modular (organized into groups of swappable operons)
- self-contained (easily "copy paste-able" via horizontal gene transfer)
evolvingstuff (@evolvingstuff) 's Twitter Profile Photo

Looking for examples of questions that stump SOTA LLMs. My current favorite: 'I have a problem with my order from the shoe shop. I received a left shoe instead of a right shoe, and a right shoe instead of a left shoe. What can I do? Can I still wear them?'

DeepLearning.AI (@deeplearningai) 's Twitter Profile Photo

Google researchers introduced ATLAS, a transformer-like language model architecture. ATLAS replaces attention with a trainable memory module and processes inputs up to 10 million tokens. 

The team trained a 1.3 billion-parameter model on FineWeb, updating only the memory module
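
A rough, illustrative stand-in for the attention-free idea (this is not ATLAS's actual memory module): a fixed-size memory matrix is written to and read from token by token, so cost grows linearly with sequence length instead of quadratically.

```python
import torch
import torch.nn as nn

class ToyMemoryLayer(nn.Module):
    """A fixed-size memory written/read per token instead of full attention.
    Purely illustrative: NOT the actual ATLAS module, just the general shape
    of replacing attention with a trainable memory of constant size."""
    def __init__(self, d_model=256, d_mem=64):
        super().__init__()
        self.d_mem = d_mem
        self.key = nn.Linear(d_model, d_mem)
        self.value = nn.Linear(d_model, d_mem)
        self.query = nn.Linear(d_model, d_mem)
        self.out = nn.Linear(d_mem, d_model)

    def forward(self, x):                               # x: (seq_len, d_model)
        mem = torch.zeros(self.d_mem, self.d_mem)       # constant-size state
        ys = []
        for t in range(x.size(0)):
            k, v, q = self.key(x[t]), self.value(x[t]), self.query(x[t])
            mem = mem + torch.outer(k, v)                # write: accumulate outer products
            read = torch.softmax(q, dim=-1) @ mem        # read with the current query
            ys.append(x[t] + self.out(read))             # residual connection
        return torch.stack(ys)

layer = ToyMemoryLayer()
print(layer(torch.randn(32, 256)).shape)  # torch.Size([32, 256]), linear in length
```
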
hardmaru (@hardmaru) 's Twitter Profile Photo

Proud to release ShinkaEvolve, our open-source framework that evolves programs for scientific discovery with very good sample-efficiency! 🐙 Paper: arxiv.org/abs/2509.19349 Blog: sakana.ai/shinka-evolve/ Project: github.com/SakanaAI/Shink…

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

Thinking Augmented Pre-training "we propose Thinking augmented Pre-Training (TPT), a universal methodology that augments text with automatically generated thinking trajectories. Such augmentation effectively increases the volume of the training data and makes high-quality tokens

Thinking Augmented Pre-training

"we propose Thinking augmented Pre-Training (TPT), a universal methodology that augments text with automatically generated thinking trajectories. Such augmentation effectively increases the volume of the training data and makes high-quality tokens
Akshay 🚀 (@akshay_pachaar) 's Twitter Profile Photo

How do LLMs work under the hood? This is the best place to visually understand the internal workings of a transformer-based LLM. Explore tokenization, self-attention, and more in an interactive way:

DailyPapers (@huggingpapers) 's Twitter Profile Photo

LLM reasoning: longer isn't always better.

Meta Research just dropped new insights! We challenge the idea that longer CoT traces are always more effective. Our study shows that *failing less* is key, introducing a new metric 'Failed-Step Fraction' to predict reasoning accuracy.
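
A hedged sketch of what a Failed-Step Fraction style metric could look like (the paper's exact definition, and how steps get judged as failed, may differ): given per-step labels over a chain-of-thought trace, FSF is just the share of failed steps.

```python
def failed_step_fraction(step_labels: list[bool]) -> float:
    """step_labels[i] is True if step i is judged a failed/dead-end step."""
    if not step_labels:
        return 0.0
    return sum(step_labels) / len(step_labels)

# Two traces of equal length: the one that fails less would be predicted
# to yield a more accurate final answer under this view.
trace_a = [False, False, True, False, False]   # 1 failed step out of 5
trace_b = [True, True, False, True, False]     # 3 failed steps out of 5
print(failed_step_fraction(trace_a))  # 0.2
print(failed_step_fraction(trace_b))  # 0.6
```
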
Jason Weston (@jaseweston) 's Twitter Profile Photo

🚨New paper: Stochastic activations 🚨

We introduce stochastic activations. This novel strategy consists of randomly selecting between several non-linear functions in the feed-forward layers of a large language model.
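
A toy feed-forward block matching that description: on each training forward pass it picks one of several non-linearities at random (the actual selection scheme and inference-time behavior in the paper may differ).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticActivationFFN(nn.Module):
    """Feed-forward block that randomly selects a non-linearity per forward pass."""
    def __init__(self, d_model=256, d_ff=1024):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)
        self.activations = [F.silu, F.relu, F.gelu]

    def forward(self, x):
        if self.training:
            # Random choice among the candidate non-linearities during training.
            act = self.activations[torch.randint(len(self.activations), (1,)).item()]
        else:
            act = F.silu  # assumed fixed choice at inference time
        return self.down(act(self.up(x)))

ffn = StochasticActivationFFN()
ffn.train()
print(ffn(torch.randn(4, 256)).shape)  # torch.Size([4, 256])
```
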
Awni Hannun (@awnihannun) 's Twitter Profile Photo

The sparse attention in the new DeepSeek v3.2 is quite simple. Here's a little sketch.

- You have a full attention layer (or MLA as in DSV3).
- You also have a lite-attention layer which only computes query-key scores.
- From the lite layer you get the top-k indices for each
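
A rough sketch of those steps in plain PyTorch (ignoring MLA, multiple heads, causal masking, and all efficiency details; not DeepSeek's implementation): a cheap scoring pass selects top-k keys per query, then ordinary attention runs only over those keys.

```python
import torch
import torch.nn.functional as F

def sparse_attention_sketch(x, Wq, Wk, Wv, Wq_lite, Wk_lite, k=4):
    """Lite scoring pass picks top-k keys per query; full attention runs on them."""
    # Lite pass: query-key scores only, no values.
    lite_scores = (x @ Wq_lite) @ (x @ Wk_lite).T          # (seq, seq)
    topk_idx = lite_scores.topk(k, dim=-1).indices         # (seq, k)

    # Full pass restricted to the selected indices.
    q, keys, v = x @ Wq, x @ Wk, x @ Wv
    out = torch.empty_like(q)
    for i in range(x.size(0)):
        sel = topk_idx[i]                                   # keys chosen for query i
        attn = F.softmax(q[i] @ keys[sel].T / q.size(-1) ** 0.5, dim=-1)
        out[i] = attn @ v[sel]
    return out

d, d_lite, seq = 64, 16, 12
x = torch.randn(seq, d)
out = sparse_attention_sketch(
    x, torch.randn(d, d), torch.randn(d, d), torch.randn(d, d),
    torch.randn(d, d_lite), torch.randn(d, d_lite),
)
print(out.shape)  # torch.Size([12, 64])
```
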
Andrej Karpathy (@karpathy) 's Twitter Profile Photo

Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea

Sauers (@sauers_) 's Twitter Profile Photo

Sparse autoencoder after being fed vectors from the final hidden state of transformers trained on each author with reconstruction + contrastive loss
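
One guess at what that setup could look like (the tweet does not spell out the losses or encoder): a sparse autoencoder over final-hidden-state vectors, trained with reconstruction plus an L1 sparsity term and a simple supervised-contrastive term over author labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Plain sparse autoencoder over hidden-state vectors."""
    def __init__(self, d_in=768, d_latent=4096):
        super().__init__()
        self.enc = nn.Linear(d_in, d_latent)
        self.dec = nn.Linear(d_latent, d_in)

    def forward(self, h):
        z = F.relu(self.enc(h))
        return self.dec(z), z

def loss_fn(model, h, author_ids, l1=1e-3, temp=0.1):
    recon, z = model(h)
    rec_loss = F.mse_loss(recon, h)
    sparsity = z.abs().mean()
    # Toy contrastive term: latents of same-author vectors pulled together,
    # different-author vectors pushed apart (a simplified SupCon-style loss).
    zn = F.normalize(z, dim=-1)
    sim = zn @ zn.T / temp
    same = (author_ids[:, None] == author_ids[None, :]).float()
    same.fill_diagonal_(0)
    log_prob = sim - sim.logsumexp(dim=-1, keepdim=True)
    contrast = -(log_prob * same).sum() / same.sum().clamp(min=1)
    return rec_loss + l1 * sparsity + contrast

model = SparseAutoencoder()
h = torch.randn(8, 768)                         # stand-ins for final hidden states
authors = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(loss_fn(model, h, authors).item())
```
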
Ethan Mollick (@emollick) 's Twitter Profile Photo

This paper shows that you can predict actual purchase intent (90% accuracy) by asking an LLM to impersonate a customer with a demographic profile, giving it a product & having it give its impressions, which another AI rates.

No fine-tuning or training & beats classic ML methods.
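
A bare-bones sketch of the described pipeline; `call_llm` is a hypothetical placeholder for any chat-model API, and the real prompts, demographic fields, and rating scale from the paper are not reproduced here.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a chat-model API call")

def predict_purchase_intent(profile: str, product: str) -> str:
    # Step 1: the LLM impersonates a customer with the given profile
    # and describes its impressions of the product.
    impressions = call_llm(
        f"You are a customer with this profile: {profile}\n"
        f"Here is a product: {product}\n"
        "Describe your honest impressions of it."
    )
    # Step 2: a second model rates purchase intent from those impressions.
    return call_llm(
        "Given these customer impressions, rate purchase intent "
        f"from 1 (will not buy) to 5 (will definitely buy):\n{impressions}"
    )
```
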
Akshay 🚀 (@akshay_pachaar) 's Twitter Profile Photo

Did Stanford just kill LLM fine-tuning?

This new paper from Stanford, called Agentic Context Engineering (ACE), proves something wild: you can make models smarter without changing a single weight.

Here's how it works:

Instead of retraining the model, ACE evolves the context
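
The tweet is cut off, so this is only a loose sketch of "evolving the context instead of the weights": keep a persistent playbook of strategies, attempt tasks with it in the prompt, and revise it from critiques. `call_llm` is again a hypothetical stand-in for a chat-model API.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a chat-model API call")

def evolve_context(tasks, playbook="", rounds=3):
    for _ in range(rounds):
        for task in tasks:
            answer = call_llm(f"Playbook:\n{playbook}\n\nTask: {task}\nAnswer:")
            feedback = call_llm(
                f"Task: {task}\nAnswer: {answer}\nCritique the answer briefly."
            )
            playbook = call_llm(
                "Update this playbook with any reusable lesson from the critique, "
                f"keeping it concise.\nPlaybook:\n{playbook}\nCritique:\n{feedback}"
            )
    return playbook  # the model's weights were never touched
```
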
alphaXiv (@askalphaxiv) 's Twitter Profile Photo

Tiny Recursive Models: A tiny 7M parameter model that recursively refines its answer beats LLMs 100x larger on hard puzzles like ARC-AGI We independently reproduced the paper, corroborated results, and released the weights + API access for those looking to benchmark it 🔍
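
An illustrative recursive-refinement loop in the same spirit (not the paper's architecture or training setup): one tiny network is applied repeatedly to its own answer, so effective depth comes from iteration rather than parameter count.

```python
import torch
import torch.nn as nn

class ToyRecursiveRefiner(nn.Module):
    """Small network applied repeatedly to refine its own running answer."""
    def __init__(self, d=128):
        super().__init__()
        self.step = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, problem, n_steps=8):
        answer = torch.zeros_like(problem)
        for _ in range(n_steps):
            # Each pass sees the problem plus the current answer and proposes a refinement.
            answer = answer + self.step(torch.cat([problem, answer], dim=-1))
        return answer

model = ToyRecursiveRefiner()
print(model(torch.randn(4, 128)).shape)  # torch.Size([4, 128])
```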

Jürgen Schmidhuber (@schmidhuberai) 's Twitter Profile Photo

Our Huxley-Gödel Machine learns to rewrite its own code, estimating its own long-term self-improvement potential. It generalizes on new tasks (SWE-Bench Lite), matching the best officially checked human-engineered agents. arXiv:2510.21614. With Wenyi Wang, Piotr Piękos,
机器之心 JIQIZHIXIN (@synced_global) 's Twitter Profile Photo

Huge breakthrough from DeepMind!

In their latest Nature paper, “Discovering state-of-the-art reinforcement learning algorithms,” they show that AI can autonomously discover better RL algorithms.

"Enabling machines to discover learning algorithms for themselves is one of the