evolvingstuff (@evolvingstuff) 's Twitter Profile
evolvingstuff

@evolvingstuff

I post about machine learning and occasionally some other stuff.

ID: 97275971

Joined: 16-12-2009 19:48:10

4.4K Tweets

2.2K Followers

2.2K Following

hardmaru (@hardmaru) 's Twitter Profile Photo

Text-to-LoRA: Instant Transformer Adaption arxiv.org/abs/2506.06105 Generative models can produce text, images, video. They should also be able to generate models! Here, we trained a Hypernetwork to generate new task-specific LoRAs by simply describing the task as a text prompt.
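
A minimal sketch of the general idea (toy dimensions and architecture, not the paper's): a small hypernetwork maps a task-description embedding to the low-rank A/B matrices of a LoRA adapter for one target weight.

```python
import torch
import torch.nn as nn

class LoRAHyperNetwork(nn.Module):
    """Toy hypernetwork: task embedding -> LoRA A/B matrices for one linear layer."""
    def __init__(self, task_dim=512, d_model=768, rank=8, hidden=1024):
        super().__init__()
        self.d_model, self.rank = d_model, rank
        self.net = nn.Sequential(
            nn.Linear(task_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * d_model * rank),  # flattened A and B
        )

    def forward(self, task_emb):                     # task_emb: (task_dim,)
        flat = self.net(task_emb)
        A, B = flat.split(self.d_model * self.rank)
        A = A.view(self.rank, self.d_model)          # down-projection
        B = B.view(self.d_model, self.rank)          # up-projection
        return A, B

# Generate an adapter from a stand-in task embedding (in Text-to-LoRA this would
# come from encoding the text prompt), then apply it as W' = W + B @ A.
hyper = LoRAHyperNetwork()
A, B = hyper(torch.randn(512))
W = torch.randn(768, 768)          # frozen base weight
print((W + B @ A).shape)           # torch.Size([768, 768])
```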

hardmaru (@hardmaru) 's Twitter Profile Photo

DeepSWE is a new state-of-the-art open-source software engineering model trained entirely using reinforcement learning, based on Qwen3-32B.

together.ai/blog/deepswe

Fantastic work from Together AI Agentica Project‼
Andrej Karpathy (@karpathy) 's Twitter Profile Photo

How to build a thriving open source community by writing code like bacteria do 🦠. Bacterial code (genomes) are:

- small (each line of code costs energy)
- modular (organized into groups of swappable operons)
- self-contained (easily "copy paste-able" via horizontal gene transfer)
evolvingstuff (@evolvingstuff) 's Twitter Profile Photo

Looking for examples of questions that stump SOTA LLMs. My current favorite: 'I have a problem with my order from the shoe shop. I received a left shoe instead of a right shoe, and a right shoe instead of a left shoe. What can I do? Can I still wear them?'

DeepLearning.AI (@deeplearningai) 's Twitter Profile Photo

Google researchers introduced ATLAS, a transformer-like language model architecture. ATLAS replaces attention with a trainable memory module and processes inputs up to 10 million tokens. 

The team trained a 1.3 billion-parameter model on FineWeb, updating only the memory module
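
A rough, illustrative stand-in for the attention-free idea (this is not ATLAS's actual memory module): a fixed-size memory matrix is written to and read from token by token, so cost grows linearly with sequence length instead of quadratically.

```python
import torch
import torch.nn as nn

class ToyMemoryLayer(nn.Module):
    """A fixed-size memory written/read per token instead of full attention.
    Purely illustrative: NOT the actual ATLAS module, just the general shape
    of replacing attention with a trainable memory of constant size."""
    def __init__(self, d_model=256, d_mem=64):
        super().__init__()
        self.d_mem = d_mem
        self.key = nn.Linear(d_model, d_mem)
        self.value = nn.Linear(d_model, d_mem)
        self.query = nn.Linear(d_model, d_mem)
        self.out = nn.Linear(d_mem, d_model)

    def forward(self, x):                               # x: (seq_len, d_model)
        mem = torch.zeros(self.d_mem, self.d_mem)       # constant-size state
        ys = []
        for t in range(x.size(0)):
            k, v, q = self.key(x[t]), self.value(x[t]), self.query(x[t])
            mem = mem + torch.outer(k, v)                # write: accumulate outer products
            read = torch.softmax(q, dim=-1) @ mem        # read with the current query
            ys.append(x[t] + self.out(read))             # residual connection
        return torch.stack(ys)

layer = ToyMemoryLayer()
print(layer(torch.randn(32, 256)).shape)  # torch.Size([32, 256]), linear in length
```
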
hardmaru (@hardmaru) 's Twitter Profile Photo

Proud to release ShinkaEvolve, our open-source framework that evolves programs for scientific discovery with very good sample-efficiency! 🐙 Paper: arxiv.org/abs/2509.19349 Blog: sakana.ai/shinka-evolve/ Project: github.com/SakanaAI/Shink…

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

Thinking Augmented Pre-training "we propose Thinking augmented Pre-Training (TPT), a universal methodology that augments text with automatically generated thinking trajectories. Such augmentation effectively increases the volume of the training data and makes high-quality tokens

Thinking Augmented Pre-training

"we propose Thinking augmented Pre-Training (TPT), a universal methodology that augments text with automatically generated thinking trajectories. Such augmentation effectively increases the volume of the training data and makes high-quality tokens
Akshay 🚀 (@akshay_pachaar) 's Twitter Profile Photo

How do LLMs work under the hood? This is the best place to visually understand the internal workings of a transformer-based LLM. Explore tokenization, self-attention, and more in an interactive way:

DailyPapers (@huggingpapers) 's Twitter Profile Photo

LLM reasoning: longer isn't always better.

Meta Research just dropped new insights! We challenge the idea that longer CoT traces are always more effective. Our study shows that *failing less* is key, introducing a new metric 'Failed-Step Fraction' to predict reasoning accuracy.
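
A hedged sketch of what a Failed-Step Fraction style metric could look like (the paper's exact definition, and how steps get judged as failed, may differ): given per-step labels over a chain-of-thought trace, FSF is just the share of failed steps.

```python
def failed_step_fraction(step_labels: list[bool]) -> float:
    """step_labels[i] is True if step i is judged a failed/dead-end step."""
    if not step_labels:
        return 0.0
    return sum(step_labels) / len(step_labels)

# Two traces of equal length: the one that fails less would be predicted
# to yield a more accurate final answer under this view.
trace_a = [False, False, True, False, False]   # 1 failed step out of 5
trace_b = [True, True, False, True, False]     # 3 failed steps out of 5
print(failed_step_fraction(trace_a))  # 0.2
print(failed_step_fraction(trace_b))  # 0.6
```
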
Jason Weston (@jaseweston) 's Twitter Profile Photo

🚨New paper: Stochastic activations 🚨

We introduce stochastic activations. This novel strategy consists of randomly selecting between several non-linear functions in the feed-forward layers of a large language model.
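
A toy feed-forward block matching that description: on each training forward pass it picks one of several non-linearities at random (the actual selection scheme and inference-time behavior in the paper may differ).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticActivationFFN(nn.Module):
    """Feed-forward block that randomly selects a non-linearity per forward pass."""
    def __init__(self, d_model=256, d_ff=1024):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)
        self.activations = [F.silu, F.relu, F.gelu]

    def forward(self, x):
        if self.training:
            # Random choice among the candidate non-linearities during training.
            act = self.activations[torch.randint(len(self.activations), (1,)).item()]
        else:
            act = F.silu  # assumed fixed choice at inference time
        return self.down(act(self.up(x)))

ffn = StochasticActivationFFN()
ffn.train()
print(ffn(torch.randn(4, 256)).shape)  # torch.Size([4, 256])
```
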
Awni Hannun (@awnihannun) 's Twitter Profile Photo

The sparse attention in the new DeepSeek v3.2 is quite simple. Here's a little sketch.

- You have a full attention layer (or MLA as in DSV3).
- You also have a lite-attention layer which only computes query-key scores.
- From the lite layer you get the top-k indices for each
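
A rough sketch of those steps in plain PyTorch (ignoring MLA, multiple heads, causal masking, and all efficiency details; not DeepSeek's implementation): a cheap scoring pass selects top-k keys per query, then ordinary attention runs only over those keys.

```python
import torch
import torch.nn.functional as F

def sparse_attention_sketch(x, Wq, Wk, Wv, Wq_lite, Wk_lite, k=4):
    """Lite scoring pass picks top-k keys per query; full attention runs on them."""
    # Lite pass: query-key scores only, no values.
    lite_scores = (x @ Wq_lite) @ (x @ Wk_lite).T          # (seq, seq)
    topk_idx = lite_scores.topk(k, dim=-1).indices         # (seq, k)

    # Full pass restricted to the selected indices.
    q, keys, v = x @ Wq, x @ Wk, x @ Wv
    out = torch.empty_like(q)
    for i in range(x.size(0)):
        sel = topk_idx[i]                                   # keys chosen for query i
        attn = F.softmax(q[i] @ keys[sel].T / q.size(-1) ** 0.5, dim=-1)
        out[i] = attn @ v[sel]
    return out

d, d_lite, seq = 64, 16, 12
x = torch.randn(seq, d)
out = sparse_attention_sketch(
    x, torch.randn(d, d), torch.randn(d, d), torch.randn(d, d),
    torch.randn(d, d_lite), torch.randn(d, d_lite),
)
print(out.shape)  # torch.Size([12, 64])
```
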
Andrej Karpathy (@karpathy) 's Twitter Profile Photo

Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea

Sauers (@sauers_) 's Twitter Profile Photo

Sparse autoencoder after being fed vectors from the final hidden state of transformers trained on each author with reconstruction + contrastive loss
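
One guess at what that setup could look like (the tweet does not spell out the losses or encoder): a sparse autoencoder over final-hidden-state vectors, trained with reconstruction plus an L1 sparsity term and a simple supervised-contrastive term over author labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Plain sparse autoencoder over hidden-state vectors."""
    def __init__(self, d_in=768, d_latent=4096):
        super().__init__()
        self.enc = nn.Linear(d_in, d_latent)
        self.dec = nn.Linear(d_latent, d_in)

    def forward(self, h):
        z = F.relu(self.enc(h))
        return self.dec(z), z

def loss_fn(model, h, author_ids, l1=1e-3, temp=0.1):
    recon, z = model(h)
    rec_loss = F.mse_loss(recon, h)
    sparsity = z.abs().mean()
    # Toy contrastive term: latents of same-author vectors pulled together,
    # different-author vectors pushed apart (a simplified SupCon-style loss).
    zn = F.normalize(z, dim=-1)
    sim = zn @ zn.T / temp
    same = (author_ids[:, None] == author_ids[None, :]).float()
    same.fill_diagonal_(0)
    log_prob = sim - sim.logsumexp(dim=-1, keepdim=True)
    contrast = -(log_prob * same).sum() / same.sum().clamp(min=1)
    return rec_loss + l1 * sparsity + contrast

model = SparseAutoencoder()
h = torch.randn(8, 768)                         # stand-ins for final hidden states
authors = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(loss_fn(model, h, authors).item())
```
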
Ethan Mollick (@emollick) 's Twitter Profile Photo

This paper shows that you can predict actual purchase intent (90% accuracy) by asking an LLM to impersonate a customer with a demographic profile, giving it a product & having it give its impressions, which another AI rates.

No fine-tuning or training & beats classic ML methods.
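
A bare-bones sketch of the described pipeline; `call_llm` is a hypothetical placeholder for any chat-model API, and the real prompts, demographic fields, and rating scale from the paper are not reproduced here.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a chat-model API call")

def predict_purchase_intent(profile: str, product: str) -> str:
    # Step 1: the LLM impersonates a customer with the given profile
    # and describes its impressions of the product.
    impressions = call_llm(
        f"You are a customer with this profile: {profile}\n"
        f"Here is a product: {product}\n"
        "Describe your honest impressions of it."
    )
    # Step 2: a second model rates purchase intent from those impressions.
    return call_llm(
        "Given these customer impressions, rate purchase intent "
        f"from 1 (will not buy) to 5 (will definitely buy):\n{impressions}"
    )
```
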
Akshay 🚀 (@akshay_pachaar) 's Twitter Profile Photo

Did Stanford just kill LLM fine-tuning?

This new paper from Stanford, called Agentic Context Engineering (ACE), proves something wild: you can make models smarter without changing a single weight.

Here's how it works:

Instead of retraining the model, ACE evolves the context
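
The tweet is cut off, so this is only a loose sketch of "evolving the context instead of the weights": keep a persistent playbook of strategies, attempt tasks with it in the prompt, and revise it from critiques. `call_llm` is again a hypothetical stand-in for a chat-model API.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a chat-model API call")

def evolve_context(tasks, playbook="", rounds=3):
    for _ in range(rounds):
        for task in tasks:
            answer = call_llm(f"Playbook:\n{playbook}\n\nTask: {task}\nAnswer:")
            feedback = call_llm(
                f"Task: {task}\nAnswer: {answer}\nCritique the answer briefly."
            )
            playbook = call_llm(
                "Update this playbook with any reusable lesson from the critique, "
                f"keeping it concise.\nPlaybook:\n{playbook}\nCritique:\n{feedback}"
            )
    return playbook  # the model's weights were never touched
```
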
alphaXiv (@askalphaxiv) 's Twitter Profile Photo

Tiny Recursive Models: A tiny 7M parameter model that recursively refines its answer beats LLMs 100x larger on hard puzzles like ARC-AGI We independently reproduced the paper, corroborated results, and released the weights + API access for those looking to benchmark it 🔍
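
An illustrative recursive-refinement loop in the same spirit (not the paper's architecture or training setup): one tiny network is applied repeatedly to its own answer, so effective depth comes from iteration rather than parameter count.

```python
import torch
import torch.nn as nn

class ToyRecursiveRefiner(nn.Module):
    """Small network applied repeatedly to refine its own running answer."""
    def __init__(self, d=128):
        super().__init__()
        self.step = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, problem, n_steps=8):
        answer = torch.zeros_like(problem)
        for _ in range(n_steps):
            # Each pass sees the problem plus the current answer and proposes a refinement.
            answer = answer + self.step(torch.cat([problem, answer], dim=-1))
        return answer

model = ToyRecursiveRefiner()
print(model(torch.randn(4, 128)).shape)  # torch.Size([4, 128])
```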

Jürgen Schmidhuber (@schmidhuberai) 's Twitter Profile Photo

Our Huxley-Gödel Machine learns to rewrite its own code, estimating its own long-term self-improvement potential. It generalizes on new tasks (SWE-Bench Lite), matching the best officially checked human-engineered agents. arXiv:2510.21614. With Wenyi Wang, Piotr Piękos,
机器之心 JIQIZHIXIN (@synced_global) 's Twitter Profile Photo

Huge breakthrough from DeepMind!

In their latest Nature paper, “Discovering state-of-the-art reinforcement learning algorithms,” they show that AI can autonomously discover better RL algorithms.

"Enabling machines to discover learning algorithms for themselves is one of the