Wei Xu (@cocoweixu) Twitter Tweets • TwiCopy

Wei Xu

10 months ago

It was a pleasure to host Mike Lewis from Meta to speak at Georgia Tech's ML Seminar this week. His insights into training LLMs, based on his firsthand experience with successfully developing Llama-3 🦙, were incredibly engaging and informative. Thanks, Mike Lewis!

Likes: 39 · Replies: 1 · Retweets: 1

Yu Lu Liu 🦋@ liuyulu.bsky.social

@liu_yu_lu

9 months ago

Human-centered Evaluation and Auditing of Language models (HEAL) workshop is back for #CHI2025, with this year's special theme: “Mind the Context”! Come join us on this bridge between #HCI and #NLProc! Submission deadline: Feb 17 AoE. More info at heal-workshop.github.io.

Likes: 42 · Replies: 1 · Retweets: 9

Chao Jiang

@chaojiang06

9 months ago

I defended my doctoral thesis, "Studying Text Revision in Scientific Writing," yesterday! 🎓 A big thanks to my advisor Wei Xu for training me to become a researcher, and thanks to my respected committee: Alan Ritter, kartik goyal, Violet Peng, and Dr. Cheng Li from DeepMind!


Likes: 62 · Replies: 7 · Retweets: 2

Ziang Xiao

@ziangxiao

8 months ago

Dr. Susu Zhang (sites.google.com/view/susuzhang/) and I are recruiting a postdoc to work on Gen AI evaluation design and assessment. If you want to join the team, please contact me and apply through the DSAI fellowship!

Likes: 43 · Replies: 0 · Retweets: 12

Jim Fan

@drjimfan

8 months ago

This is the most gut-wrenching blog I've read, because it's so real and so close to heart. The author is no longer with us. I'm in tears. AI is not supposed to be 200B weights of stress and pain. It used to be a place of coffee-infused eureka moments, of exciting late-night arxiv

Likes: 3,3K · Replies: 93 · Retweets: 401

Tarek Naous

@tareknaous

8 months ago

What causes entity-related cultural biases in LMs? Is it just pre-training data? Our latest paper shows how varying linguistic phenomena exhibited by entities (such as word sense in Arabic) impact the cross-cultural performance of LMs. arxiv.org/abs/2501.04662

Likes: 46 · Replies: 2 · Retweets: 13

Tu Vu

@tuvllms

8 months ago

📢📢 If you're interested in a full-time Research Scientist/Engineer position at Google DeepMind (Mountain View, CA) working on connecting retrieval to Gemini, RAG, generative retrieval, open-book QA, etc., please email me at [email protected] with your CV and/or website.

Likes: 447 · Replies: 10 · Retweets: 46

Georgia Tech Computing

@gtcomputing

8 months ago

As artificial intelligence (AI) continues to evolve, its impact on society becomes increasingly profound. To gain insights into the trends shaping the AI landscape in 2025, we spoke with Wei Xu, an associate professor at Georgia Tech’s School of Interactive Computing

Likes: 21 · Replies: 0 · Retweets: 3

Mohit

@mohit_r9a

7 months ago

🚨 Just out: Targeted data curation for SFT and RLHF is a significant cost factor 💰 for improving LLM performance during post-training. How should you allocate your data annotation budgets between SFT and Preference Data? We ran 1000+ experiments to find out! 1/7

Likes: 139 · Replies: 2 · Retweets: 30

Hanna Hajishirzi

@hannahajishirzi

6 months ago

Excited to drive innovation and push the boundaries of open, scientific AI research & development! 🚀 Join us at Ai2 to shape the future of OLMo, Molmo, Tulu, and more. We’re hiring at all levels—apply now! 👇 #AI #Hiring Research Engineer job-boards.greenhouse.io/thealleninstit… Research

Likes: 66 · Replies: 1 · Retweets: 15

Jungsoo Park

@jungsoo___park

6 months ago

🚨 Just out: Can LLMs extract experimental data about themselves from scientific literature to improve understanding of their behavior? We propose a semi-automated approach for large-scale, continuously updatable meta-analysis to uncover intriguing behaviors in frontier LLMs. 🧵

Likes: 39 · Replies: 1 · Retweets: 12

Jonathan Zheng

@jonathanqzheng

6 months ago

🚨 o3-mini vastly outperforms DeepSeek-R1 on an unseen probabilistic reasoning task! Introducing k-anonymity estimation: a novel task to assess privacy risks in sensitive texts. Unlike conventional math and logical reasoning, this is difficult for both humans and AI models. 1/7

Likes: 18 · Replies: 1 · Retweets: 9

Alan Ritter

@alan_ritter

5 months ago

Wondering what review scores you need to get accepted at ACL? Maybe this data from NAACL 2025 can help: gist.github.com/aritter/8b65a9…

Likes: 77 · Replies: 3 · Retweets: 13

Agam A. Shah

@shahagam4

5 months ago

Thrilled to share our new preprint: "Beyond the Reported Cutoff: Where LLMs Fall Short on Financial Knowledge". We evaluated 197,011 revenue questions across 17,621 U.S. companies (1980–2022) using 6 top LLMs. Key insights 🧵

Likes: 14 · Replies: 1 · Retweets: 5

Wei Xu

@cocoweixu

4 months ago

I am giving a keynote at PrivateNLP Workshop (sites.google.com/view/privatenl…) at #NAACL2025 (Sunday 9am CT).
* GPT4-v is a performant geolocator, predicting exact GPS coordinates of image > any SOTA
* LLMs can estimate privacy risk based on probabilistic reasoning > chain-of-thoughts

Likes: 79 · Replies: 1 · Retweets: 8

Wei Xu

@cocoweixu

3 months ago

Thank you JinYeong Bak for hosting!

Likes: 6 · Replies: 1 · Retweets: 0

Andrej Karpathy

@karpathy

3 months ago

An attempt to explain (current) ChatGPT versions. I still run into many, many people who don't know that: - o3 is the obvious best thing for important/hard things. It is a reasoning model that is much stronger than 4o and if you are using ChatGPT professionally and not using o3

Likes: 11,11K · Replies: 558 · Retweets: 1,1K

Geyang Guo

@cherylolguo

2 months ago

❤️🌎 Introducing CARE: Multilingual Multicultural Human Preference Learning
3490 culturally relevant prompts + 31.7k Human/AI-written responses rated by multilingual speakers
💡 Key insights:
- Even a small amount of cultural data improves popular LLMs consistently.
- Deepseek-v3

Likes: 61 · Replies: 2 · Retweets: 14