Lin Chen (@lin_chen_98)'s Twitter Profile
Lin Chen

@lin_chen_98

PhD student at USTC | Large multimodal models | Research intern at Shanghai AI Lab

ID: 1727375153376759808

Website: http://lin-chen.site · Joined: 22-11-2023 17:15:40

25 Tweets

54 Followers

47 Following

AK (@_akhaliq)'s Twitter Profile Photo

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

paper page: huggingface.co/papers/2311.12…

In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet often constrained by the scarcity of high-quality image-text data. To address
AK (@_akhaliq)'s Twitter Profile Photo

Are We on the Right Way for Evaluating Large Vision-Language Models?

Large vision-language models (LVLMs) have recently achieved rapid progress, sparking numerous studies to evaluate their multi-modal capabilities. However, we dig into current evaluation works and identify
Lin Chen (@lin_chen_98)'s Twitter Profile Photo

Looking forward to working on a longer version together! You can preview our ShareGPT4Video project at the link below! sharegpt4video.github.io

Lin Chen (@lin_chen_98)'s Twitter Profile Photo

Built our Gradio app and deployed ShareCaptioner-Video on Hugging Face Spaces with ZeroGPU. Now you can try generating a detailed caption for your own video. Have fun! huggingface.co/spaces/Lin-Che…

Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

- Excels in various text-image tasks w/ GPT-4V level capabilities with merely 7B LLM backend
- Opensourced

arxiv.org/abs/2407.03320
AK (@_akhaliq)'s Twitter Profile Photo

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language model that supports long-contextual input and output. IXC-2.5 excels in various text-image

Vaibhav (VB) Srivastav (@reach_vb)'s Twitter Profile Photo

New SoTA VLM: InternLM XComposer 2.5 🐐
> Beats GPT-4V and Gemini Pro across a myriad of benchmarks.
> 7B params, 96K context window (w/ RoPE ext)
> Trained w/ 24K high-quality image-text pairs
> InternLM 7B text backbone
> Supports high-resolution (4K) image understanding tasks
Lin Chen (@lin_chen_98) 's Twitter Profile Photo

Thrilled to see myself in the #3 spot on HuggingFace's list of most influential users for July! I look forward to doing more impactful work to give back to the community in the future.

Haodong Duan (@kennyutc)'s Twitter Profile Photo

Excited to share several of our recent works:

1. MMBench (ECCV'24 Oral@6C, Oct 3, 13:30): A comprehensive multi-modal evaluation benchmark adopted by hundreds of teams working on LMMs.
mmbench.opencompass.org.cn
2. Prism (NeurIPS'24): A framework that can disentangle and assess the