Xinting Huang (@timhuangxt)'s Twitter Profile
Xinting Huang

@timhuangxt

Senior Researcher @TencentGlobal, working on LLMs.
Ph.D. at @UniMelb; Ex @BytedanceTalk, @MSFTResearch

ID: 744818118317400064

Link: https://timhuang1.github.io/ | Joined: 20-06-2016 09:04:12

12 Tweets

132 Followers

333 Following

Longyue Wang (@wangly0229)'s Twitter Profile Photo

🚀 A game-changing benchmark: LLM-Uncertainty-Bench 🌟

📚 We introduce "Benchmarking LLMs via Uncertainty Quantification", which challenges the status quo in LLM evaluation.
💡 Uncertainty matters too: we propose a novel uncertainty-aware metric, which tests 8 LLMs across 5
AK (@_akhaliq)'s Twitter Profile Photo

FuseChat

Knowledge Fusion of Chat Models

While training large language models (LLMs) from scratch can indeed lead to models with distinct capabilities and strengths, this approach incurs substantial costs and may lead to potential redundancy in competencies. An alternative
Xinting Huang (@timhuangxt)'s Twitter Profile Photo

Open-sourced multimodal models -- fascinating.
Open-sourced MoE models -- fascinating.
Open-sourced multimodal MoE models -- WOW! Check this out 👇

Longyue Wang (@wangly0229)'s Twitter Profile Photo

🚀 Check out VideoVista, a comprehensive video-LMMs evaluation benchmark! videovista.github.io 🚀

Dive into our leaderboard:
- 📊 Evaluating 33 Video-LMMs across 27 tasks;
- 🥉 The latest GPT-4o-Mini clinches 3rd place;
- 🏆 InternLM-XComposer-2.5 emerges as the

AK (@_akhaliq)'s Twitter Profile Photo

To Code, or Not To Code? 

Exploring Impact of Code in Pre-training

discuss: huggingface.co/papers/2408.10…

Including code in the pre-training data mixture, even for models not specifically designed for code, has become a common practice in LLMs pre-training. While there has been
Xinting Huang (@timhuangxt)'s Twitter Profile Photo

These findings resonate with my impressions. AFAIC, structured prompting outperforms CoT & ICL by steering LLMs through workflows. Great to see this 'rebuttal' backed by such rigorous analysis; it reminds me of the insights in "LLMs Cannot Self-Correct". We need more like this!

Longyue Wang (@wangly0229)'s Twitter Profile Photo

🌺 GPT-4o's image generation is stunning, but how well does it handle complex scenarios? 🤔

We introduce 🚀CIGEVAL🚀, a novel method to evaluate models' capabilities in Conditional Image Generation 🖼️➕🖼️🟰🖼️. Find out how top models perform when conditions get truly