Sicong (@leon_l_s_c)'s Twitter Profile
Sicong

@leon_l_s_c

CS graduate; Ph.D. student in multi-modality AI research, Alibaba-NTU Talent Programme.

ID: 976350326688071680

Joined: 21-03-2018 06:50:32

234 Tweets

228 Followers

619 Following

Xin (Ted) Li (@lixin4ever)'s Twitter Profile Photo

🔥🔥Introducing VideoLLaMA2-72B 🔥🔥
Supercharge VideoLLaMA2's video understanding capability with #Qwen2 from Qwen
- Egoschema (full set): 63.9% 
- Perception-Test (test): 57.5%
- MVBench: 62.0%
- VideoMME: w/o subs: 61.4%, w subs: 63.1%

The progress of …
Adina Yakup (@adinayakup)'s Twitter Profile Photo

🎥 New Video-LLMs update from the Chinese community! VideoLLaMA 2-72B released by DAMO Academy 🔥 Model: huggingface.co/collections/DA… Demo: huggingface.co/spaces/lixin4e… Paper: huggingface.co/papers/2406.07… ✨ Join the discussion thread and communicate with the authors on the paper page!

Sicong (@leon_l_s_c)'s Twitter Profile Photo

🚀🚀 Excited to share our latest research: "The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio". CMM presents the first systematic investigation of hallucinations in LMMs involving the three most common …
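Hallucination benchmarks of this kind typically probe the model with yes/no questions about objects or events that are absent from the input and measure how often it wrongly answers "yes". A minimal sketch of that scoring, assuming a simple probe format (`probes`, `model_answer`, and the stub below are illustrative, not CMM's actual interface):

```python
# Minimal sketch: scoring probing-based hallucination evaluation.
# Each probe pairs a yes/no question about a (possibly absent) object or
# event with its ground truth; this format is illustrative, not CMM's API.

def hallucination_rate(probes, model_answer):
    """Fraction of absent-item probes the model wrongly answers 'yes' to."""
    absent = [p for p in probes if not p["exists"]]
    false_yes = sum(1 for p in absent if model_answer(p["question"]) == "yes")
    return false_yes / len(absent) if absent else 0.0

probes = [
    {"question": "Is there a dog barking in the audio?", "exists": False},
    {"question": "Is there a person in the video?", "exists": True},
]

def model_answer(question: str) -> str:
    # Stub standing in for a real LMM call; always saying "yes" maximizes
    # the hallucination rate on the absent-item probes.
    return "yes"

print(hallucination_rate(probes, model_answer))  # -> 1.0
```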

Sicong (@leon_l_s_c)'s Twitter Profile Photo

🚀 Big thanks to all the co-authors for their incredible efforts! We’re truly excited to see how scaling up further can unlock new possibilities and drive groundbreaking research. This is just the beginning! 💡

Hao AI Lab (@haoailab)'s Twitter Profile Photo

🎥 Frustrated by Sora's credit limits? Still waiting for Veo 2? 🚀 Open-source video DiTs are actually on par. We introduce FastVideo, an open-source stack to support fast video generation for SoTA open models. We currently support Mochi and Hunyuan: 8x faster inference, 720P …

Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

Alibaba presents:

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Open-sources VideoLLaMA 3, the SotA open-source model on both image and video understanding benchmarks
AIGCLINK (@aigclink)'s Twitter Profile Photo

Alibaba's DAMO Academy has released VideoLLaMA 3, a multimodal foundation model focused on image and video understanding: an intelligent video assistant that can understand video content, interpret images, and hold a conversation.

Built on the latest Qwen2.5 architecture, it supports multi-frame video understanding.

#VideoLLaMA3 #VideoUnderstandingModel #LLM
Adina Yakup (@adinayakup)'s Twitter Profile Photo

VideoLLaMA 3 🔥 multimodal foundation models for Image and Video Understanding by DAMO Alibaba
Model: huggingface.co/collections/DA…
Paper: huggingface.co/papers/2501.13…
✨ 2B/7B ✨ Apache 2.0
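For readers who want to try the checkpoints, a hedged sketch of loading one with Hugging Face transformers; the exact repo id and the need for `trust_remote_code` are assumptions based on the collection linked above, so defer to the model card for authoritative usage:

```python
# Hedged sketch: loading a VideoLLaMA 3 checkpoint from the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "DAMO-NLP-SG/VideoLLaMA3-7B"  # assumed repo id; check the collection
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,      # assumes the repo ships custom multimodal code
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```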

Sicong (@leon_l_s_c)'s Twitter Profile Photo

🚀🚀 So excited to release an MLLM series that delivers strong performance on both image and video across different sizes! ♥️♥️ It feels really great to work with such a good team.

Chuanyang Jin (@chuanyang_jin)'s Twitter Profile Photo

How to achieve human-level open-ended machine Theory of Mind?

Introducing #AutoToM: a fully automated and open-ended ToM reasoning method combining the flexibility of LLMs with the robustness of Bayesian inverse planning, achieving SOTA results across five benchmarks. 🧵[1/n]
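The Bayesian inverse planning core can be summarized as inferring a posterior over an agent's goals from observed actions: P(goal | actions) ∝ P(actions | goal) · P(goal). A minimal sketch with a hand-written likelihood table (AutoToM itself proposes and refines such models automatically with LLMs; all names and numbers here are illustrative):

```python
# Minimal sketch of Bayesian inverse planning over goals.
# P(goal | actions) ∝ P(goal) * prod over actions of P(action | goal)

def goal_posterior(actions, goals, prior, likelihood):
    """Return a normalized posterior over goals given observed actions."""
    scores = {}
    for g in goals:
        p = prior[g]
        for a in actions:
            p *= likelihood(a, g)
        scores[g] = p
    z = sum(scores.values())
    return {g: s / z for g, s in scores.items()}

goals = ["get_coffee", "get_tea"]
prior = {g: 0.5 for g in goals}
table = {  # hand-written stand-in for an LLM-derived action likelihood model
    ("walk_to_kitchen", "get_coffee"): 0.9, ("walk_to_kitchen", "get_tea"): 0.9,
    ("open_coffee_jar", "get_coffee"): 0.8, ("open_coffee_jar", "get_tea"): 0.1,
}
posterior = goal_posterior(["walk_to_kitchen", "open_coffee_jar"],
                           goals, prior, lambda a, g: table[(a, g)])
print(posterior)  # heavily favors "get_coffee"
```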
Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

🚀Exciting to see how recent advancements like OpenAI’s O1/O3 & DeepSeek’s R1 are pushing the boundaries!
Check out our latest survey on Complex Reasoning with LLMs. Analyzed over 300 papers to explore the progress.
Paper: arxiv.org/pdf/2502.17419
GitHub: github.com/zzli2022/Aweso…
Yi Xu (@_yixu)'s Twitter Profile Photo

🔥Are we ranking LLMs correctly?🔥

Large Language Models (LLMs) are widely used as automatic judges, but what if their rankings are unstable?😯Our latest study finds non-transitivity in LLM-as-a-judge evaluations—where A > B, B > C, but… C > A?! 🔄
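Non-transitivity means the judge's pairwise verdicts contain cycles, so no consistent ranking exists. A minimal sketch that flags such cycles in a set of pairwise preferences (the `prefers` data below is illustrative, not taken from the paper):

```python
# Minimal sketch: detecting non-transitive cycles in pairwise judge verdicts.
from itertools import permutations

# prefers[(a, b)] = True means the judge ranked model a over model b.
prefers = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}

def wins(x, y):
    """Did x beat y? Falls back to the reverse-order entry if needed."""
    return prefers[(x, y)] if (x, y) in prefers else not prefers[(y, x)]

def find_cycles(models):
    """Return every 3-cycle x > y > z > x, i.e., a transitivity violation."""
    return [(x, y, z) for x, y, z in permutations(models, 3)
            if wins(x, y) and wins(y, z) and wins(z, x)]

print(find_cycles(["A", "B", "C"]))  # non-empty -> no consistent ranking
```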
Han Wu (@hahahawu2)'s Twitter Profile Photo

💡Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging

We comprehensively study existing model merging methods on efficient long-to-short LLM reasoning tasks, and find that they hold huge potential in this field.
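The simplest baseline in this family is linear interpolation of the two models' weights. A minimal sketch assuming two checkpoints with identical architectures (the checkpoint names and mixing weight are placeholders, and the study covers several merging methods beyond this one):

```python
# Minimal sketch: linear weight interpolation between a long-CoT model
# and its short-answer counterpart.

def linear_merge(state_a, state_b, alpha=0.5):
    """Per-parameter merge: theta = alpha * theta_a + (1 - alpha) * theta_b.

    state_a / state_b are state dicts (parameter name -> tensor) from two
    models with identical shapes.
    """
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

# Usage with hypothetical checkpoints:
# merged = linear_merge(long_model.state_dict(), short_model.state_dict(), 0.7)
# short_model.load_state_dict(merged)
```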
AK (@_akhaliq)'s Twitter Profile Photo

Video Game Bench: introducing a research preview of VideoGameBench, a benchmark that challenges vision-language models to complete, in real time, a suite of 20 different popular video games from both hand-held consoles and PC. GPT-4o, Claude Sonnet 3.7, Gemini 2.5 Pro, and Gemini …

Hou Pong (Ken) Chan (@kenchanhp)'s Twitter Profile Photo

✨ Meet our latest multimodal reasoning model — VL-Cogito! Inspired by the Latin word “Cogito” (“I think”), VL-Cogito is built for complex and diverse multimodal reasoning tasks, with a strong focus on autonomous thinking and adaptability 💡 🧠 What makes it special? VL-Cogito …

Sicong (@leon_l_s_c)'s Twitter Profile Photo

We are excited to officially release RynnVLA-001, a new open-source Vision-Language-Action model! 🤖 Our model outperforms strong baselines like Pi-0 & GR00T-N1.5 in real-world robot manipulation. This is achieved through several key innovations: 🔹 Generative Pre-training: …