Xinting Huang (@timhuangxt)'s Twitter Profile
Xinting Huang

@timhuangxt

Senior Researcher @TencentGlobal, working on LLMs.
Ph.D. at @UniMelb; Ex @BytedanceTalk, @MSFTResearch

ID: 744818118317400064

Link: https://timhuang1.github.io/
Joined: 20-06-2016 09:04:12

12 Tweets

132 Followers

333 Following

Longyue Wang (@wangly0229)

🚀 A game-changer benchmark: LLM-Uncertainty-Bench 🌟

📚 We introduce "Benchmarking LLMs via Uncertainty Quantification", which challenges the status quo in LLM evaluation.
💡 Uncertainty matters too: we propose a novel uncertainty-aware metric, which tests 8 LLMs across 5
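
For context, below is a minimal sketch of one common way an uncertainty-aware evaluation for multiple-choice questions can be set up: split conformal prediction over the probabilities a model assigns to each answer option, where larger prediction sets signal higher uncertainty. The helpers (`conformal_threshold`, `prediction_set`), the synthetic calibration data, and alpha=0.1 are illustrative assumptions, not necessarily the paper's exact metric.

```python
# Sketch: split conformal prediction over multiple-choice option probabilities.
# Larger prediction sets mean the model is less certain about its answer.
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a threshold from held-out questions.

    cal_probs : (n, k) option probabilities per calibration question.
    cal_labels: (n,) index of the correct option.
    alpha     : target miscoverage rate (0.1 -> ~90% coverage).
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability given to the true option.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected empirical quantile of the scores.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_set(probs, threshold):
    """All options whose nonconformity score falls under the threshold."""
    return [i for i, p in enumerate(probs) if 1.0 - p <= threshold]

# Illustrative usage with synthetic numbers (hypothetical, not real results).
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(4), size=500)   # 500 questions, 4 options each
cal_labels = rng.integers(0, 4, size=500)
q = conformal_threshold(cal_probs, cal_labels, alpha=0.1)

test_probs = np.array([0.70, 0.20, 0.06, 0.04])
print(prediction_set(test_probs, q))  # bigger set => more uncertain answer
```
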
AK (@_akhaliq)

FuseChat

Knowledge Fusion of Chat Models

While training large language models (LLMs) from scratch can indeed lead to models with distinct capabilities and strengths, this approach incurs substantial costs and may lead to potential redundancy in competencies. An alternative
Xinting Huang (@timhuangxt)

Open-sourced Multimodal models -- fascinating
Open-sourced MOE models -- fascinating
Open-sourced Multimodal MOE models -- WOW! Check this out 👇

Longyue Wang (@wangly0229)

🚀 Check out VideoVista, a comprehensive video-LMMs evaluation benchmark! videovista.github.io
🚀 Dive into our leaderboard:
- 📊 Evaluating 33 Video-LMMs across 27 tasks;
- 🥉 The latest GPT-4o-Mini clinches 3rd place;
- 🏆 InternLM-XComposer-2.5 emerges as the

AK (@_akhaliq)

To Code, or Not To Code? 

Exploring Impact of Code in Pre-training

discuss: huggingface.co/papers/2408.10…

Including code in the pre-training data mixture, even for models not specifically designed for code, has become a common practice in LLM pre-training. While there has been
Xinting Huang (@timhuangxt)

These findings resonate with my impressions. AFAIC, structured prompting outperforms CoT & ICL by steering LLMs through workflows. Great to see this ‘rebuttal’ backed by such rigorous analysis — reminds me of the insights in LLMs Cannot Self-Correct. We need more like this!

Longyue Wang (@wangly0229)

🌺GPT-4o’s image generation is stunning — but how well does it handle complex scenarios? 🤔

We introduce 🚀CIGEVAL🚀, a novel method to evaluate models' capabilities in Conditional Image Generation 🖼️➕🖼️🟰🖼️. Find out how top models perform when conditions get truly