Xinting Huang (@timhuangxt)'s Twitter Profile
Xinting Huang

@timhuangxt

Senior Researcher @TencentGlobal, working on LLMs.
Ph.D. at @UniMelb; Ex @BytedanceTalk, @MSFTResearch

ID: 744818118317400064

Link: https://timhuang1.github.io/ | Joined: 20-06-2016 09:04:12

12 Tweets

132 Followers

333 Following

Longyue Wang (@wangly0229)'s Twitter Profile Photo

🚀 A game-changing benchmark: LLM-Uncertainty-Bench 🌟

📚 We introduce "Benchmarking LLMs via Uncertainty Quantification", which challenges the status quo in LLM evaluation.
💡 Uncertainty matters too: we propose a novel uncertainty-aware metric, which tests 8 LLMs across 5
AK (@_akhaliq)'s Twitter Profile Photo

FuseChat

Knowledge Fusion of Chat Models

While training large language models (LLMs) from scratch can indeed lead to models with distinct capabilities and strengths, this approach incurs substantial costs and may lead to potential redundancy in competencies. An alternative
Xinting Huang (@timhuangxt)'s Twitter Profile Photo

Open-sourced multimodal models -- fascinating.
Open-sourced MoE models -- fascinating.
Open-sourced multimodal MoE models -- WOW! Check this out 👇

Longyue Wang (@wangly0229)'s Twitter Profile Photo

🚀 Check out VideoVista, a comprehensive video-LMMs evaluation benchmark! videovista.github.io 🚀

Dive into our leaderboard:
- 📊 Evaluating 33 Video-LMMs across 27 tasks;
- 🥉 The latest GPT-4o-Mini clinches 3rd place;
- 🏆 InternLM-XComposer-2.5 emerges as the

AK (@_akhaliq)'s Twitter Profile Photo

To Code, or Not To Code? 

Exploring Impact of Code in Pre-training

discuss: huggingface.co/papers/2408.10…

Including code in the pre-training data mixture, even for models not specifically designed for code, has become a common practice in LLMs pre-training. While there has been
Xinting Huang (@timhuangxt)'s Twitter Profile Photo

These findings resonate with my impressions. AFAIC, structured prompting outperforms CoT & ICL by steering LLMs through workflows. Great to see this 'rebuttal' backed by such rigorous analysis; it reminds me of the insights in "LLMs Cannot Self-Correct". We need more like this!

Longyue Wang (@wangly0229)'s Twitter Profile Photo

🌺 GPT-4o's image generation is stunning, but how well does it handle complex scenarios? 🤔

We introduce 🚀CIGEVAL🚀, a novel method to evaluate models' capabilities in Conditional Image Generation 🖼️➕🖼️🟰🖼️. Find out how top models perform when conditions get truly