Haoran Xu (@fe1ixxu)'s Twitter Profile
Haoran Xu

@fe1ixxu

PhD student in CS @jhuclsp | Intern @Microsoft Research | ex-intern @Meta AI and @Amazon Alexa AI

ID: 899148379279818752

Website: http://www.fe1ixxu.com | Joined: 20-08-2017 05:57:33

64 Tweets

372 Followers

169 Following

Lingfeng Shen (@lingfeng_nlp)'s Twitter Profile Photo

Is In-Context Learning (ICL) equivalent to Gradient Descent (GD)? There is a common belief that applying ICL in #LLM functions like GD-based fine-tuning. But does this hold in real-world LLMs? 🤔 Find out in our latest paper: arxiv.org/abs/2310.08540

JHU CLSP (@jhuclsp)'s Twitter Profile Photo

“Condensing Multilingual Knowledge with Lightweight Language-Specific Modules” Draft: arxiv.org/abs/2305.13993 By: Haoran Xu and authors TLDR: We propose lightweight yet parameter-efficient language-specific modules and further fuse multilingual knowledge into a shared module.

JHU Computer Science (@jhucompsci)'s Twitter Profile Photo

Multi-language mastery: minimized hardware, maximized efficiency! Johns Hopkins computer scientists (feat. Haoran Xu & Kenton Murray) introduce a new method to reduce the size of multilingual language models. hub.jhu.edu/2023/12/07/mul…

Lingfeng Shen (@lingfeng_nlp)'s Twitter Profile Photo

So happy to share that our paper 'The Trickle-down Impact of Reward (In-)consistency on RLHF' (arxiv.org/abs/2309.16155…) has been accepted by ICLR this year. #ICLR #RLHF I believe that we should explore/enhance RLHF through a data-centric perspective! JHU CLSP

Young (@yjkim362)'s Twitter Profile Photo

Opening up a new generation of machine translation leveraging the power of LLMs! It's now in #ICLR2024 (w/ Haoran Xu, Amr Sharaf, Hany Awadalla). Teaser: Another breakthrough is coming, soonish..

Young (@yjkim362)'s Twitter Profile Photo

We love DPO for its elegance and simplicity. So, we are making it even better! By eliminating the reference model, the loss function becomes contrastive and we call it CPO (Contrastive Preference Optimization). It's even more effective at our target task than DPO!
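To make the idea concrete, here is a rough sketch (not the authors' implementation) of how dropping the reference model turns the DPO objective into a CPO-style contrastive loss. The β and NLL-weight values are placeholder hyperparameters, and real implementations compute sequence log-probs from the model rather than taking them as floats:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # DPO compares the policy's log-probs of the preferred (w) and
    # dispreferred (l) outputs against a frozen reference model.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(sigmoid(margin))

def cpo_loss(logp_w, logp_l, beta=0.1, nll_weight=1.0):
    # CPO drops the reference model: the margin contrasts the policy's
    # own log-probs directly, plus an NLL (behavior-cloning) term that
    # anchors the model on the preferred output.
    prefer = -math.log(sigmoid(beta * (logp_w - logp_l)))
    nll = -logp_w  # negative log-likelihood of the preferred sequence
    return prefer + nll_weight * nll
```

Note that removing the reference model also halves the forward passes needed per training pair, which is part of CPO's efficiency argument.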

Weiting (Steven) Tan (@weiting_nlp)'s Twitter Profile Photo

Is your model struggling with high latency and huge memory costs for real-time sequence processing? 🚀 Introducing STAR: A transformer-based model for streaming seq2seq transduction with compression. arxiv.org/abs/2402.01172 #NLProc #Streaming #Seq2seq #Compression #SpeechToText

Lingfeng Shen (@lingfeng_nlp)'s Twitter Profile Photo

📢 Happy to share that our paper on #LLM safety in multilingual contexts has been accepted at #ACL 2024! ✨ We show the difficulty of alleviating multilingual safety issues in LLMs through standard alignment methods. arxiv.org/abs/2401.13136 🧵1/7

Lingfeng Shen (@lingfeng_nlp)'s Twitter Profile Photo

Super excited that our work got picked for an #Oral presentation at #ICML this year! Had an awesome time collaborating with Aayush Mishra and Daniel Khashabi 🕊️ at JHU CLSP. Pity I can't make it to Vienna because of visa issues😅

Haoran Xu (@fe1ixxu)'s Twitter Profile Photo

We recently had multiple rounds of discussions with the SimPO authors regarding the lack of comparison to CPO in their main paper. We both agree that it was an unintentional oversight, and they will update the paper to address it. We appreciate their positive and prompt response.

Haoran Xu (@fe1ixxu)'s Twitter Profile Photo

Here’s some better news: Combining CPO and SimPO can likely improve the model! Check out more details in our GitHub code: github.com/fe1ixxu/CPO_SI…
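One plausible way to combine the two objectives, sketched below under stated assumptions: SimPO contributes a length-normalized margin with a target reward margin γ, while CPO contributes the extra NLL term on the preferred output. The exact loss in the linked repo may differ, and β, γ, and the NLL weight are placeholders:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cpo_simpo_loss(logp_w, len_w, logp_l, len_l,
                   beta=2.0, gamma=0.5, nll_weight=1.0):
    # SimPO-style margin: length-normalized sequence log-probs minus a
    # target reward margin gamma (no reference model, as in CPO).
    margin = beta * (logp_w / len_w - logp_l / len_l) - gamma
    prefer = -math.log(sigmoid(margin))
    # CPO-style anchor: NLL on the preferred output (length-normalized
    # here to stay on the same scale as the margin term).
    nll = -logp_w / len_w
    return prefer + nll_weight * nll
```

As with CPO alone, there is still no reference model, so training cost per preference pair stays the same while the margin becomes length-aware.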

Young (@yjkim362)'s Twitter Profile Photo

"What if Phi meets MoE?" I am super excited to share our new Phi-3.5-MoE. Phi-3.5-MoE is a 16 x 3.8B MoE model that activates only 6.6B params with 2 experts per token. MMLU score of 78.9! It outperforms Llama-3.1 8B, Gemma-2-9B, and Gemini-1.5-Flash, and is close to GPT-4o-mini. MIT license.
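The "16 experts, 2 active" arithmetic comes from top-2 gating: a router scores all experts but only the two highest-gated ones run per token, so roughly 6.6B of the total parameters are active. A minimal sketch of that routing step, with a hypothetical score function and scalar "experts" standing in for real feed-forward blocks:

```python
import math

def top2_moe(score_fn, experts, x):
    # Router scores every expert for input x, softmaxes the scores,
    # then activates only the top-2 experts and mixes their outputs
    # by renormalized gate weights. The other experts never run,
    # which is why a 16 x 3.8B model can activate only ~6.6B params.
    scores = [score_fn(i, x) for i in range(len(experts))]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # stable softmax
    total = sum(exps)
    gates = [e / total for e in exps]
    top2 = sorted(range(len(experts)), key=lambda i: gates[i],
                  reverse=True)[:2]
    norm = sum(gates[i] for i in top2)
    return sum((gates[i] / norm) * experts[i](x) for i in top2)
```

In a real transformer the router is a learned linear layer over the token's hidden state and each expert is a full feed-forward block; this sketch only shows the select-and-mix logic.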