ML@CMU (@mlcmublog)'s Twitter Profile
ML@CMU

@mlcmublog

Official Twitter account for the ML@CMU blog @mldcmu @SCSatCMU

ID: 1233552889055834112

Link: https://blog.ml.cmu.edu/
Joined: 29-02-2020 00:42:45

110 Tweets

2.2K Followers

20 Following

blog.ml.cmu.edu/2024/10/07/vqa… With the rapid advancement of text-to-visual models like Sora, Midjourney, and Stable Diffusion, evaluating how well the generated imagery follows input text prompts has become a major challenge. However, work by Zhiqiu Lin, Deepak Pathak, Baiqi Li, Emily

blog.ml.cmu.edu/2024/10/29/jai… AI-powered robots are alarmingly easy to jailbreak to perform dangerous tasks, including delivering bombs, surveilling humans, and ignoring traffic laws. What does the future hold for AI-powered robots? Learn more in our latest blog post, based on work

blog.ml.cmu.edu/2024/11/07/ide… Demining 70+ war-affected countries could take 1,100 years at the current pace. This AI-powered tool, developed in close collaboration with the UN in work led by Mateo Dulce, halves false alarms and speeds up clearance. Now tested in Afghanistan &

blog.ml.cmu.edu/2024/12/06/scr… A critical question arises when using large language models: should we fine-tune them or rely on prompting with in-context examples? Recent work led by Junhong Shen and collaborators demonstrates that we can develop state-of-the-art web agents by

blog.ml.cmu.edu/2024/12/12/hum… Have you had difficulty using a new machine for DIY or latte-making? Have you forgotten to add spice during cooking? Riku Arakawa, Hiromu Yakura, Vimal Mollyn, Jill Fain Lehman, and Mayank Goel are leveraging multimodal sensing to improve the

blog.ml.cmu.edu/2025/01/02/ind… Why is our brain 🧠 modular with specialized areas? Recent research by Ruiyi Zhang @Xaqlab shows that artificial agents 🤖 with modular architectures—mirroring brain-like specialization—achieve better learning and generalization in naturalistic navigation

blog.ml.cmu.edu/2025/01/08/opt… How can we train LLMs to solve complex challenges beyond just data scaling? In a new blog post, Amrith Setlur, Yuxiao Qu, Matthew Yang, Lunjun Zhang, Virginia Smith, and Aviral Kumar demonstrate that Meta RL can help LLMs better optimize test-time compute

blog.ml.cmu.edu/2025/04/09/cop… How do real-world developer preferences compare to existing evaluations? A CMU and UC Berkeley team led by Wayne Chi and Valerie Chen created Copilot Arena to collect user preferences on in-the-wild workflows. This blog post gives an overview of the design and

blog.ml.cmu.edu/2025/04/18/llm… 📈⚠️ Is your LLM unlearning benchmark measuring what you think it is? In a new blog post authored by Pratiksha Thaker, Shengyuan Hu, Neil Kale, Yash Maurya, Steven Wu, and Virginia Smith, we discuss why empirical benchmarks are necessary but not

blog.ml.cmu.edu/2025/04/21/all… Check out our new blog post on ALLIE, a new chess AI that actually plays like a human! Unlike Stockfish or AlphaZero, which focus on winning at all costs, ALLIE uses a transformer model trained on human chess games to make moves, ponder and resign like

blog.ml.cmu.edu/2025/05/22/unl… Are your LLMs truly forgetting unwanted data? In this new blog post authored by Shengyuan Hu, Yiwei Fu, Steven Wu, and Virginia Smith, we discuss how benign relearning can jog an unlearned LLM's memory to recover knowledge that is supposed to be forgotten.

blog.ml.cmu.edu/2025/06/01/rlh… In this in-depth coding tutorial, Zhaolin Gao and Gokul Swamy walk through the steps to train an LLM via RL from Human Feedback!