Jinsheng Wang (@wolfwjs)'s Twitter Profile
Jinsheng Wang

@wolfwjs

Make it simple. Make it work.

ID: 1061955660156329984

Joined: 12-11-2018 12:15:34

97 Tweets

60 Followers

742 Following

Andrej Karpathy (@karpathy):

For friends of open source: imo the highest leverage thing you can do is help construct a high diversity of RL environments that help elicit LLM cognitive strategies. To build a gym of sorts. This is a highly parallelizable task, which favors a large community of collaborators.

Mustafa Shukor (@mustafashukor1):

We propose new scaling laws that predict the optimal data mixture for pretraining LLMs, native multimodal models, and large vision encoders!

Only small-scale experiments are needed; we can then extrapolate to large-scale ones. These laws allow 1/n 🧵
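The recipe in this tweet (fit scaling laws on cheap small-scale runs, then extrapolate to pick the best data mixture at the target scale) can be sketched generically. This is not the paper's actual functional form; the mixtures `mix_A`/`mix_B` and their loss curves below are invented for illustration, using a simple power law `loss ≈ a * N^(-alpha)` fit in log-log space:

```python
import math

def fit_power_law(tokens, losses):
    """Least-squares fit of loss ~ a * N^(-alpha) in log-log space."""
    xs = [math.log(n) for n in tokens]
    ys = [math.log(l) for l in losses]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - slope * mx)
    return a, -slope  # alpha is the negative log-log slope

def predict(a, alpha, n):
    return a * n ** (-alpha)

# Hypothetical small-scale results for two candidate data mixtures.
small_ns = [1e5, 1e6, 1e7]
runs = {
    "mix_A": [1.0 * n ** -0.05 for n in small_ns],   # flat curve: wins early
    "mix_B": [10.0 * n ** -0.15 for n in small_ns],  # steep curve: wins late
}

# Fit each mixture's law on small runs, extrapolate to the target budget.
target_n = 1e12
preds = {}
for name, losses in runs.items():
    a, alpha = fit_power_law(small_ns, losses)
    preds[name] = predict(a, alpha, target_n)

best = min(preds, key=preds.get)
```

Note the point of extrapolating: in this toy setup `mix_A` has the lower loss at every small scale, yet the fitted laws predict `mix_B` wins at 1e12 tokens, so picking a mixture from small-scale losses alone would choose wrongly.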
Oleksii Kuchaiev (@kuchaev):

Everything about Llama-Nemotron-Super-V1.5 post-training is now open:
Synthetic data: huggingface.co/datasets/nvidi…
Human data: huggingface.co/datasets/nvidi…
Reward models (trained on HS3 data): huggingface.co/collections/nv…
RL toolkit: github.com/NVIDIA-NeMo/RL

elvis (@omarsar0):

Hierarchical Reasoning Model

This is one of the most interesting ideas on reasoning I've read in the past couple of months. It uses a recurrent architecture for impressive hierarchical reasoning.

Here are my notes:
Google DeepMind (@googledeepmind):

What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵

ARC Prize (@arcprize):

Analyzing the Hierarchical Reasoning Model by Guan Wang

We verified scores on hidden tasks, ran ablations, and found that performance comes from an unexpected source.

ARC-AGI Semi-Private Scores:
* ARC-AGI-1: 32%
* ARC-AGI-2: 2%

Our 4 findings:
François Chollet (@fchollet):

We were able to reproduce the strong findings of the HRM paper on ARC-AGI-1. Further, we ran a series of ablation experiments to get to the bottom of what's behind it.

Key findings:
1. The HRM model architecture itself (the centerpiece of the paper) is not an important factor.

Andrej Karpathy (@karpathy):

My pleasure to come on Dwarkesh last week, I thought the questions and conversation were really good. I re-watched the pod just now too. First of all, yes I know, and I'm sorry that I speak so fast :). It's to my detriment because sometimes my speaking thread out-executes my

Lilian Weng (@lilianweng):

On-policy distillation provides an elegant way to use the teacher model as a process reward model to provide dense reward while preventing SFT style "OOD shock" during rollout.
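The mechanism behind this tweet can be shown with a toy sketch (not from the tweet itself; the unigram "models" and all names below are hypothetical stand-ins for real LLM log-probs): on the student's own sampled rollout, the teacher scores every token, so the signal is dense per-token rather than a single sequence-level reward, and because the rollout is on-policy the student never trains on states it wouldn't visit itself:

```python
import math

def per_token_rewards(tokens, teacher_logp, student_logp):
    """Dense per-token reward on the student's own rollout:
    teacher log-prob minus student log-prob for each sampled token.
    Summed over the sequence, this is a one-sample estimate of the
    negative reverse KL between student and teacher."""
    rewards = []
    for t, tok in enumerate(tokens):
        prefix = tokens[:t]
        rewards.append(teacher_logp(prefix, tok) - student_logp(prefix, tok))
    return rewards

# Toy unigram "models" standing in for real LLM next-token log-probs.
teacher_dist = {"yes": 0.7, "no": 0.3}
student_dist = {"yes": 0.5, "no": 0.5}
teacher_logp = lambda prefix, tok: math.log(teacher_dist[tok])
student_logp = lambda prefix, tok: math.log(student_dist[tok])

# Student rollout: tokens the teacher favors relative to the student
# get positive reward, disfavored tokens negative, at every position.
rollout = ["yes", "yes", "no"]
rewards = per_token_rewards(rollout, teacher_logp, student_logp)
```

Here "yes" tokens earn positive reward (teacher assigns them more probability than the student) and "no" earns negative reward, giving credit assignment at every step of the rollout.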

BAAI (@baaibeijing):

🎃 This Halloween, Explore Multimodal Universe! 🚀 Explore, navigate, and interact across space and time — all with long-horizon consistency. Guess how many worlds Emu3.5 jumps through? 👀👀 #Emu3_5 #WorldModel #AI #Halloween2025

Tairan He (@tairanhe99):

Classic scaling law curves from Generalist: more data, lower validation loss — standard ML story. But if we swap the y-axis to real-world success rate, do the same trends hold? Does 0.0105 vs 0.0115 loss actually mean a more reliable policy?

Stefano Ermon (@stefanoermon):

When we began applying diffusion to language in my lab at Stanford, many doubted it could work. That research became Mercury diffusion LLM: 10X faster, more efficient, and now the foundation of Inception. Proud to raise $50M with support from top investors.