Haoran Xu (@fe1ixxu)'s Twitter Profile
Haoran Xu

@fe1ixxu

PhD student in CS @jhuclsp | Intern @Microsoft Research | ex-intern @Meta AI and @Amazon Alexa AI

ID: 899148379279818752

Website: http://www.fe1ixxu.com | Joined: 20-08-2017 05:57:33

59 Tweets

247 Followers

158 Following

Weiting (Steven) Tan (@weiting_nlp):

Is your model struggling with high latency and huge memory costs for real-time sequence processing?

🚀 Introducing STAR: A transformer-based model for streaming seq2seq transduction with compression.
arxiv.org/abs/2402.01172

Young (@yjkim362):

We love DPO for its elegance and simplicity. So, we are making it even better! By eliminating the reference model, the loss function becomes contrastive and we call it CPO (Contrastive Preference Optimization). It's even more effective at our target task than DPO!
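To make the contrast with DPO concrete, here is a minimal, hypothetical sketch of what a reference-free, CPO-style loss could look like in PyTorch. The function name, the `beta` temperature, and the added NLL term on the preferred outputs are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cpo_style_loss(chosen_logps: torch.Tensor,
                   rejected_logps: torch.Tensor,
                   beta: float = 0.1) -> torch.Tensor:
    """Reference-free preference loss (hypothetical CPO-style sketch).

    chosen_logps / rejected_logps: summed token log-probabilities of the
    preferred and dispreferred sequences under the *policy* model only;
    unlike DPO, no frozen reference model is required.
    """
    # Contrastive term: push the preferred sequence above the rejected one.
    prefer = -F.logsigmoid(beta * (chosen_logps - rejected_logps)).mean()
    # Assumed NLL regularizer on the preferred outputs.
    nll = -chosen_logps.mean()
    return prefer + nll
```

Because the reference-model log-probabilities drop out of the sigmoid term, only the policy model has to be kept in memory during training.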

Young (@yjkim362):

Opening up a new generation of machine translation leveraging the power of LLMs! It's now in (w/ Haoran Xu, Amr Sharaf, Hany Hassan Awadalla).

Teaser: Another breakthrough is coming, soonish..

Lingfeng Shen (@Lingfeng_nlp):

So happy to share that our paper 'The Trickle-down Impact of Reward (In-)consistency on RLHF' (arxiv.org/abs/2309.16155…) has been accepted by ICLR this year.

I believe we should explore and enhance RLHF from a data-centric perspective! JHU CLSP

JHU Computer Science (@JHUCompSci):

Multi-language mastery: minimized hardware, maximized efficiency! Johns Hopkins computer scientists (feat. Haoran Xu & Kenton Murray) introduce a new method to reduce the size of multilingual language models. hub.jhu.edu/2023/12/07/mul…

JHU CLSP (@jhuclsp):

“Condensing Multilingual Knowledge with Lightweight Language-Specific Modules”

Draft: arxiv.org/abs/2305.13993
By: Haoran Xu and co-authors

TLDR: We propose lightweight yet parameter-efficient language-specific modules and then fuse the multilingual knowledge into a shared module.
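As a rough illustration of the idea, here is a hedged sketch of a shared linear layer augmented with low-rank, per-language deltas; the class name, the low-rank parameterization, and the routing by `lang_id` are assumptions for illustration, not the paper's exact module design.

```python
import torch
import torch.nn as nn

class LanguageSpecificLinear(nn.Module):
    """Shared linear layer plus lightweight per-language modules (sketch)."""

    def __init__(self, d_model: int, num_languages: int, rank: int = 8):
        super().__init__()
        self.shared = nn.Linear(d_model, d_model)            # shared module
        # One low-rank (A, B) pair per language: delta_W = B @ A.
        self.A = nn.Parameter(torch.randn(num_languages, rank, d_model) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_languages, d_model, rank))

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # x: (batch, seq, d_model); lang_id selects the language-specific delta.
        delta = torch.einsum("bsd,rd->bsr", x, self.A[lang_id])
        delta = torch.einsum("bsr,dr->bsd", delta, self.B[lang_id])
        return self.shared(x) + delta
```

Keeping the per-language part low-rank is what makes the modules cheap to add for many languages while the shared layer carries the common multilingual knowledge.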

Lingfeng Shen (@Lingfeng_nlp):

Is In-Context Learning (ICL) equivalent to Gradient Descent (GD)? There is a common belief that applying ICL in LLMs functions like GD-based fine-tuning. But does this hold in real-world LLMs? 🤔

Find out in our latest paper: arxiv.org/abs/2310.08540

Lingfeng Shen (@Lingfeng_nlp):

Happy to share our findings paper: Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency: arxiv.org/abs/2305.10713

Tianjian Li (@tli104):

(1/5) The standard MLE objective is notoriously vulnerable to noise! How can we make LLMs robust to noise in the training data? 🤔

We propose Error Norm Truncation (ENT), a modified training objective that ignores noisy tokens in the training corpus.

📰: arxiv.org/abs/2310.00840
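A minimal sketch of the idea, assuming the error norm is the L2 distance between the model's predicted distribution and the one-hot target, and that tokens above a threshold are simply masked out of the cross-entropy; the function name and the `threshold` hyperparameter are hypothetical.

```python
import torch
import torch.nn.functional as F

def ent_style_loss(logits: torch.Tensor, targets: torch.Tensor,
                   threshold: float = 1.0, ignore_index: int = -100) -> torch.Tensor:
    """Error-Norm-Truncation-style objective (hedged sketch).

    logits: (batch, seq, vocab); targets: (batch, seq).
    Tokens whose error norm exceeds `threshold` are treated as likely
    noise and excluded from the loss.
    """
    probs = F.softmax(logits, dim=-1)
    one_hot = F.one_hot(targets.clamp(min=0), probs.size(-1)).float()
    err_norm = (probs - one_hot).norm(dim=-1)                 # (batch, seq)
    keep = ((err_norm <= threshold) & (targets != ignore_index)).float()
    ce = F.cross_entropy(logits.transpose(1, 2), targets,
                         ignore_index=ignore_index, reduction="none")
    return (ce * keep).sum() / keep.sum().clamp(min=1.0)
```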

Lingfeng Shen (@Lingfeng_nlp):

'Does the consistency/robustness of the reward model matter in RLHF?' Check out our latest work 🥳! arxiv.org/pdf/2309.16155…

Had a lot of excitement with this work! Feel so lucky to have Sihao Chen and Daniel Khashabi 🕊️ as collaborators!
