Sicong (@leon_l_s_c)'s Twitter Profile
Sicong

@leon_l_s_c

CS graduate; Ph.D. student in multi-modality AI research, Alibaba-NTU Talent Programme.

ID: 976350326688071680

Joined: 21-03-2018 06:50:32

234 Tweets

228 Followers

619 Following

Xin (Ted) Li (@lixin4ever)'s Twitter Profile Photo

🔥🔥Introducing VideoLLaMA2-72B 🔥🔥
Supercharge VideoLLaMA2's video understanding capability with #Qwen2 from Qwen
- Egoschema (full set): 63.9% 
- Perception-Test (test): 57.5%
- MVBench: 62.0%
- VideoMME: w/o subs: 61.4%, w subs: 63.1%

The progress of …
Adina Yakup (@adinayakup)'s Twitter Profile Photo

🎥 New Video-LLMs update from the Chinese community! VideoLLaMA 2-72B released by DAMO Academy 🔥 Model: huggingface.co/collections/DA… Demo: huggingface.co/spaces/lixin4e… Paper: huggingface.co/papers/2406.07… ✨ Join the discussion thread and communicate with the authors on the paper page!

Sicong (@leon_l_s_c)'s Twitter Profile Photo

🚀🚀 Excited to share our latest research: "The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio". CMM presents the first systematic investigation of hallucinations in LMMs involving the three most common …
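Hallucination benchmarks of this kind typically probe the model with yes/no questions about objects or events that are absent from the input and measure how often it wrongly answers "yes". A minimal sketch of that scoring, assuming a simple probe format (`probes`, `model_answer`, and the stub below are illustrative, not CMM's actual interface):

```python
# Minimal sketch: scoring probing-based hallucination evaluation.
# Each probe pairs a yes/no question about a (possibly absent) object or
# event with its ground truth; this format is illustrative, not CMM's API.

def hallucination_rate(probes, model_answer):
    """Fraction of absent-item probes the model wrongly answers 'yes' to."""
    absent = [p for p in probes if not p["exists"]]
    false_yes = sum(1 for p in absent if model_answer(p["question"]) == "yes")
    return false_yes / len(absent) if absent else 0.0

probes = [
    {"question": "Is there a dog barking in the audio?", "exists": False},
    {"question": "Is there a person in the video?", "exists": True},
]

def model_answer(question: str) -> str:
    # Stub standing in for a real LMM call; always saying "yes" maximizes
    # the hallucination rate on the absent-item probes.
    return "yes"

print(hallucination_rate(probes, model_answer))  # -> 1.0
```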

Sicong (@leon_l_s_c)'s Twitter Profile Photo

🚀 Big thanks to all the co-authors for their incredible efforts! We’re truly excited to see how scaling up further can unlock new possibilities and drive groundbreaking research. This is just the beginning! 💡

Hao AI Lab (@haoailab)'s Twitter Profile Photo

🎥 Frustrated by Sora's credit limits? Still waiting for Veo 2? 🚀 Open-source video DiTs are actually on par. We introduce FastVideo, an open-source stack to support fast video generation for SoTA open models. We currently support Mochi and Hunyuan: 8x faster inference, 720P …

Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

Alibaba presents:

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Open-sources VideoLLaMA 3, the SotA open-source model on both image and video understanding benchmarks
AIGCLINK (@aigclink)'s Twitter Profile Photo

Alibaba's DAMO Academy has released VideoLLaMA 3, a multimodal foundation model focused on image and video understanding: an intelligent video assistant that can understand video content, interpret images, and hold a conversation.

Built on the latest Qwen2.5 architecture, it supports multi-frame video understanding.

#VideoLLaMA3 #VideoUnderstandingModel #LLM
Adina Yakup (@adinayakup)'s Twitter Profile Photo

VideoLLaMA 3 🔥 multimodal foundation models for Image and Video Understanding by DAMO Alibaba
Model: huggingface.co/collections/DA…
Paper: huggingface.co/papers/2501.13…
✨ 2B/7B ✨ Apache 2.0
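For readers who want to try the checkpoints, a hedged sketch of loading one with Hugging Face transformers; the exact repo id and the need for `trust_remote_code` are assumptions based on the collection linked above, so defer to the model card for authoritative usage:

```python
# Hedged sketch: loading a VideoLLaMA 3 checkpoint from the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "DAMO-NLP-SG/VideoLLaMA3-7B"  # assumed repo id; check the collection
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,      # assumes the repo ships custom multimodal code
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```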

Sicong (@leon_l_s_c)'s Twitter Profile Photo

🚀🚀 So excited to release an MLLM series that delivers strong performance on both image and video across different sizes! ♥️♥️ It feels really great to work with such a good team.

Chuanyang Jin (@chuanyang_jin)'s Twitter Profile Photo

How to achieve human-level open-ended machine Theory of Mind?

Introducing #AutoToM: a fully automated and open-ended ToM reasoning method combining the flexibility of LLMs with the robustness of Bayesian inverse planning, achieving SOTA results across five benchmarks. 🧵[1/n]
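The Bayesian inverse planning core can be summarized as inferring a posterior over an agent's goals from observed actions: P(goal | actions) ∝ P(actions | goal) · P(goal). A minimal sketch with a hand-written likelihood table (AutoToM itself proposes and refines such models automatically with LLMs; all names and numbers here are illustrative):

```python
# Minimal sketch of Bayesian inverse planning over goals.
# P(goal | actions) ∝ P(goal) * prod over actions of P(action | goal)

def goal_posterior(actions, goals, prior, likelihood):
    """Return a normalized posterior over goals given observed actions."""
    scores = {}
    for g in goals:
        p = prior[g]
        for a in actions:
            p *= likelihood(a, g)
        scores[g] = p
    z = sum(scores.values())
    return {g: s / z for g, s in scores.items()}

goals = ["get_coffee", "get_tea"]
prior = {g: 0.5 for g in goals}
table = {  # hand-written stand-in for an LLM-derived action likelihood model
    ("walk_to_kitchen", "get_coffee"): 0.9, ("walk_to_kitchen", "get_tea"): 0.9,
    ("open_coffee_jar", "get_coffee"): 0.8, ("open_coffee_jar", "get_tea"): 0.1,
}
posterior = goal_posterior(["walk_to_kitchen", "open_coffee_jar"],
                           goals, prior, lambda a, g: table[(a, g)])
print(posterior)  # heavily favors "get_coffee"
```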
Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

🚀Exciting to see how recent advancements like OpenAI’s O1/O3 & DeepSeek’s R1 are pushing the boundaries!
Check out our latest survey on Complex Reasoning with LLMs. Analyzed over 300 papers to explore the progress.
Paper: arxiv.org/pdf/2502.17419
GitHub: github.com/zzli2022/Aweso…
Yi Xu (@_yixu)'s Twitter Profile Photo

🔥Are we ranking LLMs correctly?🔥

Large Language Models (LLMs) are widely used as automatic judges, but what if their rankings are unstable?😯Our latest study finds non-transitivity in LLM-as-a-judge evaluations—where A > B, B > C, but… C > A?! 🔄
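Non-transitivity means the judge's pairwise verdicts contain cycles, so no consistent ranking exists. A minimal sketch that flags such cycles in a set of pairwise preferences (the `prefers` data below is illustrative, not taken from the paper):

```python
# Minimal sketch: detecting non-transitive cycles in pairwise judge verdicts.
from itertools import permutations

# prefers[(a, b)] = True means the judge ranked model a over model b.
prefers = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}

def wins(x, y):
    """Did x beat y? Falls back to the reverse-order entry if needed."""
    return prefers[(x, y)] if (x, y) in prefers else not prefers[(y, x)]

def find_cycles(models):
    """Return every 3-cycle x > y > z > x, i.e., a transitivity violation."""
    return [(x, y, z) for x, y, z in permutations(models, 3)
            if wins(x, y) and wins(y, z) and wins(z, x)]

print(find_cycles(["A", "B", "C"]))  # non-empty -> no consistent ranking
```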
Han Wu (@hahahawu2)'s Twitter Profile Photo

💡Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging

We comprehensively study existing model merging methods on efficient long-to-short LLM reasoning tasks, and find that they hold huge potential in this field.
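The simplest baseline in this family is linear interpolation of the two models' weights. A minimal sketch assuming two checkpoints with identical architectures (the checkpoint names and mixing weight are placeholders, and the study covers several merging methods beyond this one):

```python
# Minimal sketch: linear weight interpolation between a long-CoT model
# and its short-answer counterpart.

def linear_merge(state_a, state_b, alpha=0.5):
    """Per-parameter merge: theta = alpha * theta_a + (1 - alpha) * theta_b.

    state_a / state_b are state dicts (parameter name -> tensor) from two
    models with identical shapes.
    """
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

# Usage with hypothetical checkpoints:
# merged = linear_merge(long_model.state_dict(), short_model.state_dict(), 0.7)
# short_model.load_state_dict(merged)
```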
AK (@_akhaliq)'s Twitter Profile Photo

Video Game Bench: introducing a research preview of VideoGameBench, a benchmark that challenges vision-language models to complete, in real time, a suite of 20 different popular video games from both hand-held consoles and PC. GPT-4o, Claude Sonnet 3.7, Gemini 2.5 Pro, and Gemini …

Hou Pong (Ken) Chan (@kenchanhp)'s Twitter Profile Photo

✨ Meet our latest multimodal reasoning model — VL-Cogito! Inspired by the Latin word “Cogito” (“I think”), VL-Cogito is built for complex and diverse multimodal reasoning tasks, with a strong focus on autonomous thinking and adaptability 💡 🧠 What makes it special? VL-Cogito …

Sicong (@leon_l_s_c)'s Twitter Profile Photo

We are excited to officially release RynnVLA-001, a new open-source Vision-Language-Action model! 🤖 Our model outperforms strong baselines like Pi-0 & GR00T-N1.5 in real-world robot manipulation. This is achieved through several key innovations: 🔹 Generative Pre-training: …