Xinyan Velocity Yu (@xinyanvyu)'s Twitter Profile
Xinyan Velocity Yu

@xinyanvyu

#NLProc PhD @usc, BS/MS @uwcse | Previously @Meta @Microsoft @Pinterest | Doing random walks in Seattle

ID: 1288519465756315648

Link: https://velocitycavalry.github.io · Joined: 29-07-2020 16:59:25

117 Tweets

802 Followers

662 Following

Leo Du (@leoduw)'s Twitter Profile Photo

Following up a weekend effort with another weekend effort: llama2.rs 🦀 github.com/leo-du/llama2.…
In a single Rust file w/
* zero dependencies (i.e. custom rng w/ PCG)
* zero lines of `unsafe` code (very 🦀!)
* support for user prompts
* (almost) the same performance
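
For readers curious about the "custom rng w/ PCG" line: below is a minimal PCG32 (XSH-RR) sketch in Python for illustration only. llama2.rs implements this idea in Rust; the class and names here are mine, not code from the repo.

```python
# Illustration only: a minimal PCG32 (XSH-RR) generator in Python.
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

class PCG32:
    MULTIPLIER = 6364136223846793005  # standard PCG 64-bit LCG multiplier

    def __init__(self, seed, seq=0):
        self.inc = ((seq << 1) | 1) & MASK64  # increment must be odd
        self.state = 0
        self.next_u32()                       # mix in the increment
        self.state = (self.state + seed) & MASK64
        self.next_u32()                       # mix in the seed

    def next_u32(self):
        old = self.state
        self.state = (old * self.MULTIPLIER + self.inc) & MASK64
        xorshifted = (((old >> 18) ^ old) >> 27) & MASK32
        rot = old >> 59                       # top 5 bits pick the rotation
        return ((xorshifted >> rot) | (xorshifted << ((32 - rot) & 31))) & MASK32

    def random(self):
        # uniform float in [0, 1), enough for temperature sampling
        return self.next_u32() / (1 << 32)

rng = PCG32(seed=42)
print([rng.next_u32() for _ in range(3)])
```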

Xinyan Velocity Yu (@xinyanvyu)'s Twitter Profile Photo

Very cool work! Happy to see it confirm our findings in CREPE (arxiv.org/abs/2211.17257): false presuppositions/premises are still challenging for LLMs 😆, and integrating a search engine makes the answers better!

Fangyuan Xu (@brunchavecmoi)'s Twitter Profile Photo

🔌Enhancing language models with retrieval boosts performance but demands more compute for encoding the retrieved documents. Do we need all the documents for the gains?
We present 𝐑etrieve 𝐂𝐨𝐦press 𝐏repend (𝐑𝐄𝐂𝐎𝐌𝐏)
arxiv.org/abs/2310.04408 (w/ Weijia Shi, Eunsol Choi)
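
A minimal sketch of the retrieve-compress-prepend loop the tweet describes; `retriever`, `compressor`, and `lm` are hypothetical stand-ins, not the paper's actual components.

```python
# A sketch of Retrieve-Compress-Prepend under assumed interfaces.
def recomp_generate(question, retriever, compressor, lm, k=5):
    # 1. Retrieve: fetch the top-k documents for the question.
    docs = retriever.search(question, k=k)
    # 2. Compress: distill the documents into a short summary, so the LM
    #    only pays encoding cost for content that actually helps. The
    #    compressor may return an empty string when retrieval isn't useful.
    summary = compressor.summarize(question, docs)
    # 3. Prepend: condition the LM on the compressed context.
    prompt = f"{summary}\n\nQuestion: {question}\nAnswer:"
    return lm.generate(prompt)
```
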
Kaiser Sun (@kaiserwholearns)'s Twitter Profile Photo

🤔 How much do compositional generalization datasets agree with each other?
We compare common compositional generalization benchmarks and find that they rank modeling approaches differently (❗) 🧵👇
#CoNLL2023 arxiv.org/abs/2310.17514
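
A sketch of the kind of agreement check this motivates: rank the modeling approaches on each benchmark and compare rankings with Kendall's tau. The benchmark names are real compositional generalization datasets, but the scores below are invented for illustration, and scipy is assumed to be available.

```python
from itertools import combinations
from scipy.stats import kendalltau

# Made-up accuracies per (benchmark, modeling approach).
scores = {
    "COGS": {"seq2seq": 0.35, "tree": 0.80, "pretrained": 0.88},
    "SCAN": {"seq2seq": 0.10, "tree": 0.95, "pretrained": 0.60},
}
models = ["seq2seq", "tree", "pretrained"]

for b1, b2 in combinations(scores, 2):
    r1 = [scores[b1][m] for m in models]
    r2 = [scores[b2][m] for m in models]
    tau, _ = kendalltau(r1, r2)  # tau = 1.0 means identical model rankings
    print(f"{b1} vs {b2}: Kendall tau = {tau:.2f}")
```
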
Deqing Fu (@deqingfu)'s Twitter Profile Photo

Do multimodal foundation models treat every modality equally?

Hint: Humans have picture superiority. How about machines?

Introducing IsoBench, a benchmark for multimodal models with isomorphic inputs.

🔗 IsoBench.github.io
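
A hypothetical sketch of the isomorphic-input protocol: feed the same underlying problem to a model in several representations and compare per-representation accuracy. `model.answer` and the problem schema are assumptions, not IsoBench's API.

```python
def isobench_eval(model, problems):
    # Each problem carries isomorphic inputs (e.g. an image rendering and
    # a LaTeX/text rendering of the same function) plus one gold label.
    hits = {"image": 0, "text": 0}
    for p in problems:
        for modality in hits:
            if model.answer(p["inputs"][modality]) == p["label"]:
                hits[modality] += 1
    n = len(problems)
    # A gap between the two numbers means the model does not treat
    # the modalities equally.
    return {m: hits[m] / n for m in hits}
```
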
Zhaofeng Wu (@zhaofeng_wu)'s Twitter Profile Photo

Want to train an aligned LM in a new language 🌏 but don’t have preference data for training the reward model (RM)?

💡 Just use an RM for another language: it often works well, sometimes even BETTER than if you had an RM in your target language! 🤯 arxiv.org/abs/2404.12318
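
A minimal best-of-n sketch of the recipe: score target-language samples with a reward model trained on another language's preference data. `policy_lm` and `source_lang_rm` are hypothetical stand-ins, not the paper's code.

```python
def best_of_n(prompt, policy_lm, source_lang_rm, n=8):
    # Sample n candidate responses in the target language.
    candidates = [policy_lm.generate(prompt, temperature=0.8) for _ in range(n)]
    # The cross-lingual RM scores them despite the language mismatch.
    scored = [(source_lang_rm.score(prompt, c), c) for c in candidates]
    return max(scored, key=lambda sc: sc[0])[1]
```
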
Xinyan Velocity Yu (@xinyanvyu)'s Twitter Profile Photo

My takeaways from figuring out living arrangements: (1) PhD students need to be paid better, since 50-75% of my salary goes to rent and commuting; (2) accessible and affordable on-campus housing should be provided; and (3) learn to drive early and live in less sketchy places. 🥲

Xinyan Velocity Yu (@xinyanvyu)'s Twitter Profile Photo

It has been a great pleasure working with Ting-Rui and others on this project to understand retrieval augmentation and LM training a little better!

Yushi Hu (@huyushi98)'s Twitter Profile Photo

Humans draw to facilitate reasoning and communication. Why not let LLMs do so?

🚀We introduce ✏️Sketchpad, which gives multimodal LLMs a sketchpad to draw on and facilitate reasoning! arxiv.org/abs/2406.09403

Sketchpad gives GPT-4o great boosts on many vision and math tasks 📈 The…
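
A hypothetical sketch of the loop the tweet describes: the model alternates between emitting drawing code and reasoning over the rendered result. `mm_lm` and `render` are stand-ins, not the paper's interface.

```python
def sketchpad_solve(mm_lm, render, task, max_turns=5):
    context = [task]
    for _ in range(max_turns):
        step = mm_lm.step(context)              # one model turn as a dict
        if step["type"] == "draw":
            image = render(step["code"])        # run the drawing program
            context += [step["code"], image]    # model sees its own sketch
        else:                                   # step["type"] == "answer"
            return step["text"]
    return None  # gave up within the turn budget
```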

Xinyan Velocity Yu (@xinyanvyu)'s Twitter Profile Photo

So happy to meet new and old friends at NAACL ❤️! I’ll be presenting our work BUFFET 🎉:
⏰ Monday, June 17th at 14:00
📍 Don Alberto 4
If you’re into multilinguality and seeking a benchmark for fair comparison across both languages & methods, don’t miss it! 🤩 #NAACL2024

Michael Saxon (in Seattle) (@m2saxon)'s Twitter Profile Photo

<a href="/sintelion/">Venelin Kovatchev</a> <a href="/BenZhou96/">Ben Zhou</a> <a href="/muhao_chen/">🌴Muhao Chen🌴</a> Awesome analysis of what KNN-LM says abt training:

Is the seeming "free lunch" of KNN-LM (replacing top LM layers with embedding store and KNN lookup) due to a weakness of the LM objctve? Seems no!

Training a replacement MLP on the KNN does better! 🤔

aclanthology.org/2024.naacl-sho…
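
For context, a minimal sketch of the standard kNN-LM interpolation the paper analyzes: neighbor lookups in an embedding datastore turned into a next-token distribution and mixed with the parametric LM. Shapes and helper names below are assumed, not the paper's code.

```python
import numpy as np

# keys:   (N, d) array of stored context embeddings
# values: (N,)   array of the next-token id recorded at each key
def knn_lm_probs(query_emb, keys, values, lm_probs, vocab_size, k=16, lam=0.25):
    # Nearest neighbors by squared L2 distance in embedding space.
    dists = np.sum((keys - query_emb) ** 2, axis=1)
    nn = np.argsort(dists)[:k]
    # Softmax over negative distances gives neighbor weights.
    s = -dists[nn]
    w = np.exp(s - s.max())
    w /= w.sum()
    # Aggregate weight onto the token stored at each neighbor.
    p_knn = np.zeros(vocab_size)
    for weight, tok in zip(w, values[nn]):
        p_knn[tok] += weight
    # Interpolate with the parametric LM's distribution.
    return lam * p_knn + (1 - lam) * lm_probs
```
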
Mukund Srinath @ NAACL (@mukundsrinath3)'s Twitter Profile Photo

#NAACL2024
<a href="/naaclmeeting/">NAACL HLT 2024</a>
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
Zhaofeng Wu (<a href="/zhaofeng_wu/">Zhaofeng Wu @ ACL</a>)
arxiv.org/pdf/2307.02477
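
A tiny worked example of the paper's counterfactual setup: the same skill (two-digit addition) under the default base-10 condition and a counterfactual base-9 condition. A model that truly reasons should handle both; one that recites memorized base-10 patterns should drop on base 9.

```python
def to_base(n, b):
    digits = ""
    while n:
        digits = str(n % b) + digits
        n //= b
    return digits or "0"

def addition_prompt(x, y, base):
    return (f"In base-{base}: {to_base(x, base)} + {to_base(y, base)} = "
            f"{to_base(x + y, base)}")

print(addition_prompt(27, 35, 10))  # "In base-10: 27 + 35 = 62"
print(addition_prompt(27, 35, 9))   # "In base-9: 30 + 38 = 68"
```
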
Belinda Li (@belindazli)'s Twitter Profile Photo

As the world changes, documents go out of date. How can we adapt RAG systems to a stream of changing world data?

We introduce ERASE, a way of updating and propagating facts within knowledge bases, and CLARK, a dataset targeting these update problems

arxiv.org/abs/2406.11830…

1/
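
A hypothetical sketch of the update-and-propagate idea: on each incoming document, retract contradicted facts, retract facts inferred from them, then add the new assertions. `contradicts`, `entails_from`, and `extract_facts` are illustrative stand-ins, not ERASE's actual components.

```python
def update_kb(kb, new_doc, contradicts, entails_from, extract_facts):
    # 1. Retract stored facts the incoming document contradicts.
    retracted = {f for f in kb if contradicts(new_doc, f)}
    kb = [f for f in kb if f not in retracted]
    # 2. Propagate: facts inferred from a retracted fact are stale too.
    #    entails_from(f) returns the set of facts f was derived from.
    kb = [f for f in kb if not (entails_from(f) & retracted)]
    # 3. Add the facts asserted by the new document.
    kb.extend(extract_facts(new_doc))
    return kb
```
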
Xinyan Velocity Yu (@xinyanvyu)'s Twitter Profile Photo

CodeRAG-Bench is an extremely meaningful effort! We experiment with different retrievers, types of retrieval source documents, code generation tasks, and language models to find out how retrieval can help! For more, please read our exciting paper 👉👉

CLS (@chengleisi)'s Twitter Profile Photo

Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas?

After a year-long study, we obtained the first statistically significant conclusion: LLM-generated ideas are more novel than ideas written by expert human researchers.