Lin Chen (@lin_chen_98)'s Twitter Profile
Lin Chen

@lin_chen_98

PhD student at USTC | Large multimodal models | Research intern at Shanghai AI Lab

ID: 1727375153376759808

Website: http://lin-chen.site · Joined: 22-11-2023 17:15:40

25 Tweets

54 Followers

47 Following

AK (@_akhaliq)'s Twitter Profile Photo

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

paper page: huggingface.co/papers/2311.12…

In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet often constrained by the scarcity of high-quality image-text data. To address
AK (@_akhaliq)'s Twitter Profile Photo

Are We on the Right Way for Evaluating Large Vision-Language Models?

Large vision-language models (LVLMs) have recently achieved rapid progress, sparking numerous studies to evaluate their multi-modal capabilities. However, we dig into current evaluation works and identify
Lin Chen (@lin_chen_98)'s Twitter Profile Photo

Looking forward to working on a longer version together! You can preview our ShareGPT4Video project at the link below! sharegpt4video.github.io

Lin Chen (@lin_chen_98)'s Twitter Profile Photo

Built our Gradio app and deployed ShareCaptioner-Video on Hugging Face Spaces with ZeroGPU. Now you can try generating a detailed caption for your own video. Have fun! huggingface.co/spaces/Lin-Che…

Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

- Excels in various text-image tasks w/ GPT-4V level capabilities with merely 7B LLM backend
- Opensourced

arxiv.org/abs/2407.03320
AK (@_akhaliq)'s Twitter Profile Photo

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language model that supports long-contextual input and output. IXC-2.5 excels in various text-image

Vaibhav (VB) Srivastav (@reach_vb)'s Twitter Profile Photo

New SoTA VLM: InternLM XComposer 2.5 🐐
> Beats GPT-4V and Gemini Pro across a myriad of benchmarks.
> 7B params, 96K context window (w/ RoPE ext)
> Trained w/ 24K high-quality image-text pairs
> InternLM 7B text backbone
> Supports high-resolution (4K) image understanding tasks
Lin Chen (@lin_chen_98) 's Twitter Profile Photo

Thrilled to see myself in the #3 spot on HuggingFace's list of most influential users for July! I look forward to doing more impactful work to give back to the community in the future.

Haodong Duan (@kennyutc)'s Twitter Profile Photo

Excited to share several of our recent works:

1. MMBench (ECCV'24 Oral@6C, Oct 3, 13:30): A comprehensive multi-modal evaluation benchmark adopted by hundreds of teams working on LMMs.
mmbench.opencompass.org.cn
2. Prism (NeurIPS'24): A framework that can disentangle and assess the