Yuhao Dong (@dyhthu)'s Twitter Profile
Yuhao Dong

@dyhthu

ID: 1709110084834598912

Joined: 03-10-2023 07:36:26

61 Tweets

81 Followers

172 Following

Zhoujun (Jorge) Cheng (@chengzhoujun)

Pretraining has scaling laws to guide compute allocation. But for RL on LLMs, we lack a practical guide on how to spend compute wisely. We show the optimal compute allocation in LLM RL scales predictably. ↓ Key takeaways below
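
The thread gives the takeaways rather than the fitting code, so the snippet below is only a hypothetical sketch of how a predictable scaling relationship could be fit and then used to extrapolate RL performance to larger budgets; the saturating power-law form, the toy numbers, and every name in it are assumptions, not the authors' method.

```python
# Illustrative only: fit a saturating power law of reward vs. RL compute for one
# allocation split; comparing such fits across splits is one way a predictable
# scaling relationship could guide where to spend the next unit of compute.
import numpy as np
from scipy.optimize import curve_fit

def saturating_power_law(compute, r_max, a, b):
    """Hypothetical form: reward approaches r_max as RL compute grows."""
    return r_max - a * np.power(compute, -b)

# Toy measurements (compute budget in arbitrary units, observed reward); real
# numbers would come from actual RL training runs.
compute = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
reward = np.array([0.42, 0.51, 0.58, 0.63, 0.66])

params, _ = curve_fit(saturating_power_law, compute, reward, p0=[0.8, 0.4, 0.5])
print(f"predicted reward at 64x budget: {saturating_power_law(64.0, *params):.3f}")
```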

Artificial Analysis (@artificialanlys)

Moonshot’s Kimi K2.5 is the new leading open weights model, now closer than ever to the frontier - with only OpenAI, Anthropic and Google models ahead

Key takeaways:

➤ Impressive performance on agentic tasks: Kimi.ai's Kimi K2.5 achieves an Elo of 1309 on our GDPval-AA
Ziwei Liu (@liuziwei7)

🚤Real-Time Streaming VLA for Dynamic Manipulation🚤

#DynamicVLA is a 0.4B vision-language-action model that manipulates *moving* objects in real-time, with continuous inference and latent-aware action streaming

- Project: infinitescript.com/project/dynami…
- Code: github.com/hzxie/DynamicV…
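
The post describes the capability rather than the mechanics, so the following is only a hypothetical sketch of what continuous inference with streamed action chunks could look like in a control loop: the robot keeps executing previously predicted actions while a new chunk is computed from the latest frame. `predict_action_chunk`, the timings, and the buffer policy are all stand-ins, not DynamicVLA's actual interface.

```python
import collections
import time

def predict_action_chunk(frame, chunk_len=4):
    """Stand-in for a VLA forward pass: returns a short horizon of actions."""
    time.sleep(0.02)  # pretend inference latency
    return [f"action({frame}, t+{i})" for i in range(chunk_len)]

# Seed the buffer from the first frame, then keep the control loop fed:
# execute buffered actions while refilling from the newest observation.
action_queue = collections.deque(predict_action_chunk("frame_0"))
for step in range(1, 8):
    frame = f"frame_{step}"             # latest observation of the moving object
    if len(action_queue) <= 2:          # refill before the buffer runs dry
        action_queue.extend(predict_action_chunk(frame))
    print("executing", action_queue.popleft())
```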

Ziqi Huang (@ziqi_huang_)

𝗧𝗵𝗲 𝗔𝗜 𝗧𝗮𝗹𝗸𝘀 will be hosting SAM 3D (weiyaow) and SAM 3D Body (Xitong Yang) from @MetaAI.

🕐 Feb 3 (Tue) - 13:00 SGT | Feb 2 (Mon) - 21:00 PST

📩 PM me for the Zoom link
🔔 Get notified of future talks via The AI Talks: theaitalks.org/subscribe/
Kimi.ai (@kimi_moonshot)

We're introducing WorldVQA, a new benchmark to measure atomic vision-centric world knowledge in Multimodal Large Language Models. 

Current evaluations often conflate visual knowledge retrieval with reasoning. In contrast, WorldVQA decouples these capabilities to strictly measure
Yuhao Dong (@dyhthu)

✨Moving beyond static knowledge in Video Understanding!

We are thrilled to unveil Demo-ICL🧠, a new framework that challenges current models to do more than just "remember": we want them to learn from context.

We introduce:
1️⃣ Demo-ICL-Bench📜: A massive, challenging

Shulin Tian (@shulin_tian)

Can MLLMs learn from video demonstrations just like humans do? 🤔

Introducing 𝗗𝗲𝗺𝗼-𝗜𝗖𝗟: 𝗜𝗻-𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗳𝗼𝗿 𝗣𝗿𝗼𝗰𝗲𝗱𝘂𝗿𝗮𝗹 𝗩𝗶𝗱𝗲𝗼 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗔𝗰𝗾𝘂𝗶𝘀𝗶𝘁𝗶𝗼𝗻

Most video MLLMs rely on static internal knowledge. This work
Ziwei Liu (@liuziwei7)

🤔In-Context Learning (ICL) in Video LLMs🤔

🎞️Demo-ICL🎞️ equips video LLMs with the ability to learn and adapt from *dynamic, novel contexts given only a few examples*, rather than relying on static internal knowledge.

- Paper: arxiv.org/pdf/2602.08439
- Code: github.com/dongyh20/Demo-…
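
As an illustration of the in-context learning setup these posts describe, here is a minimal sketch of how a few-shot prompt built from video demonstrations might be assembled for a video MLLM; the message schema, file names, and field names are hypothetical, not Demo-ICL's actual interface.

```python
from typing import Dict, List

def build_demo_icl_prompt(demos: List[Dict], query_video: str, question: str) -> List[Dict]:
    """Interleave demonstration videos with their annotations, then append the query video."""
    messages = [{
        "role": "system",
        "content": "Learn the procedure shown in the demonstrations, then answer for the new video.",
    }]
    for demo in demos:
        messages.append({"role": "user", "content": [
            {"type": "video", "path": demo["video"]},
            {"type": "text", "text": demo["question"]},
        ]})
        messages.append({"role": "assistant", "content": demo["answer"]})
    messages.append({"role": "user", "content": [
        {"type": "video", "path": query_video},
        {"type": "text", "text": question},
    ]})
    return messages

# Hypothetical usage: one procedural demonstration, then a query about a new video.
demos = [{
    "video": "demo_assemble_shelf.mp4",
    "question": "What is the next step after attaching the side panel?",
    "answer": "Insert the wooden dowels into the pre-drilled holes.",
}]
prompt = build_demo_icl_prompt(demos, "query_assemble_shelf.mp4", "What is the next step?")
print(len(prompt), "messages")  # this message list would then be passed to the video MLLM
```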
Ziwei Liu (@liuziwei7)

🚀Codec-Aligned Sparsity for Multimodal Intelligence🚀

lmms-lab presents 🎇OneVision-Encoder🎇, a *scalable, efficient & powerful vision encoder* for next-gen LMMs with streaming inputs

🧠Insight: codec-aligned, patch-level sparsity as a foundational principle
📊Performance:

Z.ai (@zai_org)

Introducing GLM-5: From Vibe Coding to Agentic Engineering

GLM-5 is built for complex systems engineering and long-horizon agentic tasks. Compared to GLM-4.5, it scales from 355B params (32B active) to 744B (40B active), with pre-training data growing from 23T to 28.5T tokens.
Tianzhu Ye ✈️ ICLR Singapore (@ytz2024)

(1/n) Introducing On-Policy Context Distillation (OPCD), a framework that internalizes transient in-context knowledge into model parameters via on-policy learning.

This also launches our series, Experiential Learning -- Part I: On-Policy Context Distillation for Experiential Learning
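
The announcement describes the mechanism only at a high level, so below is a deliberately tiny, hypothetical sketch of the idea as I read it: the same model conditioned on the context serves as a frozen teacher, the context-free model is the student, and distillation happens on trajectories the student samples itself. The tabular two-step "policies", the forward-KL objective, and every name here are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

vocab = 8
# logits[state] = next-token distribution in that state; state 0 is the start state,
# states 1..vocab stand in for "after first token t" (a minimal notion of a prefix).
student_logits = torch.zeros(vocab + 1, vocab, requires_grad=True)  # model without the context
teacher_logits = torch.randn(vocab + 1, vocab)                      # same model with the context, frozen
optimizer = torch.optim.Adam([student_logits], lr=0.1)

for _ in range(300):
    # On-policy rollout: the student samples its own first token ...
    first = torch.distributions.Categorical(logits=student_logits[0]).sample((32,))
    states = torch.cat([torch.zeros_like(first), first + 1])  # start state plus the states it visited

    # ... and is distilled toward the teacher only on the states it actually visits.
    s_logp = F.log_softmax(student_logits[states], dim=-1)
    t_prob = F.softmax(teacher_logits[states], dim=-1)
    loss = F.kl_div(s_logp, t_prob, reduction="batchmean")    # KL(teacher || student) per visited state

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final distillation loss on visited states: {loss.item():.4f}")
```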
Li Dong (@donglixp)

On-Policy Context Distillation for Experiential Learning: learning from experience (consolidated from trajectories) at test time.