Silun (@silunwang) 's Twitter Profile
Silun

@silunwang

LLM Post-training

ID: 2634851233

Joined: 13-07-2014 05:00:42

24 Tweets

47 Followers

131 Following

AI Will (@financeyf5) 's Twitter Profile Photo

It's been only a day since Anthropic released Claude's Model Context Protocol (MCP), and it already looks like a universal adapter for AI. Connecting AI to tools and data has suddenly become remarkably simple. People are already using it like crazy to get all kinds of work done. Here are 10 amazing examples:
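For readers who want to see what that "universal adapter" looks like in practice: an MCP server is just a process that exposes typed tools to a client. Below is a minimal sketch assuming the official Python SDK (the `mcp` package) and its FastMCP helper; exact module paths and transport defaults may vary across SDK versions.

```python
# Minimal sketch of an MCP tool server, assuming the official Python SDK
# ("mcp" package) and its FastMCP helper; details may differ by SDK version.
from mcp.server.fastmcp import FastMCP

# A server named "demo-tools" that an MCP-capable client (e.g. Claude Desktop)
# can connect to over stdio.
mcp = FastMCP("demo-tools")


@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two integers and return the sum."""
    return a + b


@mcp.tool()
def read_note(path: str) -> str:
    """Return the contents of a local text file."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # stdio transport is the usual default for local desktop clients.
    mcp.run()
```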

random (@zeyu_kap) 's Twitter Profile Photo

Fu Peng's article has been all over my feed these past two days. It points out that distribution is a huge problem, and in the AI era it looks even more unsolvable. AI delivers such a large boost to productivity that in every industry it enters, companies can trade fewer jobs for more and better output. For now the threat mainly hits middle-class white-collar workers, but within 10 years, once TSLA's Optimus reaches mass production, plenty of blue-collar jobs will likely be replaced as well.

Silun (@silunwang) 's Twitter Profile Photo

For me, 2023-2024 should be read as one stretch, probably the two years of my life with the most "breadth" and "depth": sudden misfortune, struggle, disappointment, fear, exploration, receiving help, awakening, and finally settling into calm and happiness. A friend who knows astrology says a man's Saturn return comes at thirty; another versed in Qimen Dunjia divined that the Azure Dragon with a broken foot brings disaster and financial loss. All I know is that I must be grateful for this season, grateful for one drama after another that raised my understanding of life. 2025, amen.

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 Introducing DeepSeek-V3! Biggest leap forward yet: ⚡ 60 tokens/second (3x faster than V2!) 💪 Enhanced capabilities 🛠 API compatibility intact 🌍 Fully open-source models & papers 🐋 1/n
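"API compatibility intact" means the endpoint stays OpenAI-compatible, so switching to V3 is mostly a base-URL and model-name change. A minimal sketch, assuming the `openai` Python client and DeepSeek's documented `https://api.deepseek.com` endpoint with the `deepseek-chat` model id:

```python
# Sketch of calling DeepSeek-V3 through the OpenAI-compatible API.
# Assumes the "openai" Python client and the documented base URL / model id;
# check DeepSeek's docs for the current values.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-chat",  # V3 is served under this chat model id
    messages=[
        {"role": "user", "content": "Summarize the DeepSeek-V3 release in one sentence."}
    ],
)
print(resp.choices[0].message.content)
```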

wh (@nrehiew_) 's Twitter Profile Photo

How to train a 670B parameter model. Let's talk about the DeepSeek v3 report + some comparisons with what Meta did with Llama 405B

howie.serious (@howie_serious) 's Twitter Profile Photo

A new paradigm for using reasoning models: have a thoughtful dialogue, not mindless chat. Last night I had an exchange with o1, four rounds back and forth; even though I was typing on my phone, I still used line breaks, paragraphs, and punctuation. After o1 finishes, I read it carefully, then put my new thoughts into writing, usually a minute to a few minutes later.

Silun (@silunwang) 's Twitter Profile Photo

DeepSeek is an example of tactical success but strategic foolishness. By cleverly reporting only the single-run cost of the final successful training run, it tactically won applause, attention, and favor from domestic capital. Strategically, it sharply raised America's wariness of the big Eastern power. From now on, forget about buying even the H20, while domestic GPUs aren't usable. This could leave the country lagging in the arms race for a long time to come, and it burns the road for the other domestic AI companies.

Kimi.ai (@kimi_moonshot) 's Twitter Profile Photo

🚀 Meet Kimi-VL and Kimi-VL-Thinking! 🌟 Our latest open source lightweight yet powerful Vision-Language Model with reasoning capability.

✨ Key Highlights:
💡 An MoE VLM and an MoE Reasoning VLM with only ~3B activated parameters
🧠 Strong multimodal reasoning (36.8% on
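For reference, a checkpoint like this would typically be used through Hugging Face transformers. A rough sketch, assuming the weights are published under an id such as `moonshotai/Kimi-VL-A3B-Instruct` and follow the usual `trust_remote_code` pattern; consult the model card for the exact prompt format and processor calls.

```python
# Rough sketch of loading a Kimi-VL checkpoint with Hugging Face transformers.
# The model id and the processor calls below are assumptions based on the
# common VLM pattern; see the official model card for the exact usage.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "moonshotai/Kimi-VL-A3B-Instruct"  # assumed checkpoint name
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("chart.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What does this chart show?"},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens before decoding the reply.
reply = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(reply)
```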

elvis (@omarsar0) 's Twitter Profile Photo

Why does RL work for enhancing agentic reasoning? This paper studies what actually works when using RL to improve tool-using LLM agents, across three axes: data, algorithm, and reasoning mode. Instead of chasing bigger models or fancy algorithms, the authors find that real,

vLLM (@vllm_project) 's Twitter Profile Photo

🚀 No More Train–Inference Mismatch! We demonstrate bitwise consistent on-policy RL with TorchTitan (training) + vLLM (inference) — the first open-source run where training and inference numerics match exactly. It only takes 3 steps: 1️⃣ Make vLLM batch-invariant (same seq →
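The "bitwise consistent" claim is directly checkable: for the same prompts and sampled tokens, the per-token logprobs from the training stack and the inference engine should be identical, not merely close. A small illustrative check in plain PyTorch (the actual run uses TorchTitan and vLLM; the tensors here are synthetic stand-ins):

```python
# Sketch of checking train/inference numerics agreement on sampled tokens.
# trainer_logprobs / rollout_logprobs are per-token logprobs for the same
# sampled sequences, computed by the training stack and the inference engine.
import torch

def check_consistency(trainer_logprobs: torch.Tensor, rollout_logprobs: torch.Tensor) -> None:
    # Off-policy drift is usually reported as a small-but-nonzero gap...
    max_abs_diff = (trainer_logprobs - rollout_logprobs).abs().max().item()
    # ...while "bitwise consistent" means the two tensors match exactly.
    bitwise_equal = torch.equal(trainer_logprobs, rollout_logprobs)
    print(f"max |diff| = {max_abs_diff:.3e}, bitwise equal = {bitwise_equal}")

# Example with synthetic values standing in for real rollouts.
lp = torch.randn(4, 128, dtype=torch.float32)
check_consistency(lp, lp.clone())  # identical numerics -> bitwise equal = True
check_consistency(lp, lp + 1e-6)   # typical mismatch -> small but nonzero gap
```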

Chujie Zheng (@chujiezheng) 's Twitter Profile Photo

Glad to introduce our research on understanding the "mathematical principles" behind reinforcement learning (RL) with LLMs, and how stabilization techniques work 🧠 📄 huggingface.co/papers/2512.01… 👇 Thread below

Kimbo Chen (@kimbochen) 's Twitter Profile Photo

Hot topics in RL

On-policy RL
Everyone faces training-rollout mismatch
- Truncated importance sampling: fengyao.notion.site/off-policy-rl#…
- IcePop: double-ended importance ratio clipping
- Rollout Routing Replay: arxiv.org/abs/2510.11370

Efficient rollout systems design
PipelineRL:
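The mismatch fixes listed above all come down to how the per-token importance ratio between the training policy and the rollout policy is bounded before it weights the update. A hedged sketch of that general shape in PyTorch; illustrative only, not the exact formulation used by truncated importance sampling or IcePop:

```python
# Sketch of bounding the train/rollout importance ratio before it weights the
# policy-gradient update. Illustrative only; the methods above differ in details.
import torch

def is_corrected_pg_loss(
    train_logprobs: torch.Tensor,    # logprobs of sampled tokens under the training policy (requires grad)
    rollout_logprobs: torch.Tensor,  # logprobs of the same tokens under the rollout engine's policy
    advantages: torch.Tensor,        # per-token advantages
    ratio_low: float = 0.5,          # lower truncation bound ("double-ended": clip both sides)
    ratio_high: float = 2.0,         # upper truncation bound
) -> torch.Tensor:
    # Per-token importance ratio pi_train / pi_rollout, detached so it acts as
    # a reweighting coefficient rather than a gradient path.
    ratio = torch.exp(train_logprobs - rollout_logprobs).detach()
    weight = ratio.clamp(min=ratio_low, max=ratio_high)
    # Importance-weighted REINFORCE-style surrogate over tokens.
    return -(weight * advantages * train_logprobs).mean()

# Tiny usage example with synthetic tensors standing in for real rollouts.
lp_train = torch.randn(2, 16, requires_grad=True)
lp_roll = (lp_train + 0.05 * torch.randn(2, 16)).detach()  # slight train/rollout drift
adv = torch.randn(2, 16)
loss = is_corrected_pg_loss(lp_train, lp_roll, adv)
loss.backward()
print(float(loss))
```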

Rosinality (@rosinality) 's Twitter Profile Photo

Single rollout RL for multimodal RL. It is similar to the previous approach of single rollout RL (arxiv.org/abs/2509.13232) but they were able to stabilize this only after applying advantage shaping with an entropy bonus.
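"Advantage shaping with an entropy bonus" roughly means folding a per-token entropy term into the advantage so that a single-rollout update keeps exploring instead of collapsing. A minimal sketch of one common form; this is not necessarily the referenced paper's exact recipe.

```python
# Sketch of advantage shaping with an entropy bonus for single-rollout RL.
# Illustrative only; the referenced paper's exact shaping may differ.
import torch
import torch.nn.functional as F

def shaped_advantage(
    rewards: torch.Tensor,   # scalar reward per sequence, shape (batch,)
    baseline: torch.Tensor,  # value / running-mean baseline, shape (batch,)
    logits: torch.Tensor,    # policy logits for sampled steps, shape (batch, seq, vocab)
    beta: float = 0.01,      # entropy bonus coefficient
) -> torch.Tensor:
    # With a single rollout per prompt there is no group mean to subtract
    # (as in GRPO-style baselines), so a baseline plus shaping has to do that job.
    advantage = rewards - baseline                              # (batch,)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * F.log_softmax(logits, dim=-1)).sum(-1)  # (batch, seq)
    # Broadcast the sequence-level advantage to tokens and add the bonus.
    return advantage.unsqueeze(-1) + beta * entropy

# Usage with synthetic shapes: one sequence of 8 steps over a 32k vocab.
adv = shaped_advantage(
    rewards=torch.tensor([1.0]),
    baseline=torch.tensor([0.3]),
    logits=torch.randn(1, 8, 32000),
)
print(adv.shape)  # torch.Size([1, 8])
```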

Boris Cherny (@bcherny) 's Twitter Profile Photo

I'm Boris and I created Claude Code. Lots of people have asked how I use Claude Code, so I wanted to show off my setup a bit. My setup might be surprisingly vanilla! Claude Code works great out of the box, so I personally don't customize it much. There is no one correct way to

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

A few random notes from claude coding quite a bit last few weeks. Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in