Yao Dou (@yaooo01)'s Twitter Profile
Yao Dou

@yaooo01

PhD student @GeorgiaTech, previously @uwnlp, @allen_ai.

ID: 912061597413122048

Website: https://yao-dou.github.io/ · Joined: 24-09-2017 21:10:04

50 Tweets

211 Followers

287 Following

Yang Chen (@ychennlp)'s Twitter Profile Photo

If you are interested in multimodal RAG, check out our new paper!

- We propose a unified multimodal retriever trained with instruction-tuning on 8 tasks (🖼️image/📝text query -> target)

arxiv.org/abs/2311.17136 #RAG #multimodal

Yang Chen (@ychennlp)'s Twitter Profile Photo

Do you know GPT-4o is super good at image geolocation 🌎? I've been taking random photos and testing it. The model just keeps surprising me.

What does this mean to user privacy when they post photos on social media?

Check out our new paper and benchmark GPTGeoChat!👇
Alan Ritter (@alan_ritter)'s Twitter Profile Photo

Check out our recent work that aims to protect privacy by helping people make more informed decisions about what they share online.

Drop by our poster tomorrow morning at 11am if you are at ACL in Bangkok.

<a href="/aclmeeting/">ACL 2025</a>
arxiv.org/abs/2311.09538
Wei Xu (@cocoweixu)'s Twitter Profile Photo

Excited to be in Thailand ☀️😎 🌺🥥 to attend #ACL2024!

If you run into my PhD students from Georgia Tech NLP group - <a href="/Yaooo01/">Yao Dou</a>, <a href="/tareknaous/">Tarek Naous</a>, and <a href="/JonathanQZheng/">Jonathan Zheng</a>, please say hi! 👋

Please also say hi to my other collaborators!
Mu Cai (@mucai7)'s Twitter Profile Photo

1/N) Are current large multimodal models like #GPT4o really good at video understanding?

🚀 We are thrilled to introduce TemporalBench to examine temporal dynamics understanding for LMMs! 

Our TemporalBench reveals even the SOTA LMM #GPT4o achieves only 38.5, far from…
Mu Cai (@mucai7)'s Twitter Profile Photo

Now TemporalBench is fully public! See how your video understanding model performs on TemporalBench before CVPR!

🤗 Dataset: huggingface.co/datasets/micro…
📎 Integrated to lmms-eval (systematic eval): github.com/EvolvingLMMs-L… (great work by <a href="/ChunyuanLi/">Chunyuan Li</a> <a href="/zhang_yuanhan/">Yuanhan (John) Zhang</a> )
📗 Our…
Tarek Naous (@tareknaous)'s Twitter Profile Photo

What causes entity-related cultural biases in LMs? Is it just pre-training data?

Our latest paper shows how varying linguistic phenomena exhibited by entities (such as word sense in Arabic) impact the cross-cultural performance of LMs.

arxiv.org/abs/2501.04662
Yang Chen (@ychennlp)'s Twitter Profile Photo

Excited to introduce AceMath, a series of math LLMs and reward models!

🤗 Checkpoints and training data are on Hugging Face: huggingface.co/collections/nv…
📑 Paper: arxiv.org/abs/2412.15084

Ethan Mendes (@ethanmendes3)'s Twitter Profile Photo

🚨New Paper: Better search for reasoning (e.g., web tasks) usually requires costly💰demos/rewards

What if we only self-improve LLMs on state transitions—capturing a classic RL method in natural language?

Spoiler: It works (⬆️39% over base model) & enables efficient search!🚀
🧵
Jonathan Zheng (@jonathanqzheng)'s Twitter Profile Photo

🚨o3-mini vastly outperforms DeepSeek-R1 on an unseen probabilistic reasoning task!

Introducing k-anonymity estimation: a novel task to assess privacy risks in sensitive texts.

Unlike conventional math and logical reasoning, this is difficult for both humans and AI models.

1/7
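For context, the classic k-anonymity notion behind the task can be illustrated with a small sketch: a dataset is k-anonymous if every record shares its quasi-identifier values with at least k-1 other records. The records and helper name below are hypothetical, not taken from the paper, which instead estimates k from free-form text:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the k-anonymity of a dataset: the size of the smallest
    group of records that share the same quasi-identifier values."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values())

records = [
    {"zip": "30332", "age": 28, "name": "A"},
    {"zip": "30332", "age": 28, "name": "B"},
    {"zip": "30318", "age": 40, "name": "C"},
]
# k = 1 here: the (30318, 40) group contains only a single record,
# so that individual is uniquely re-identifiable from zip + age alone.
print(k_anonymity(records, ["zip", "age"]))
```

The estimation task in the paper is harder than this lookup, since the model must reason probabilistically about how many people plausibly match a textual description.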
Kai Zhang (@drogokhal4)'s Twitter Profile Photo

🚀Big WebDreamer update!
We train 💭Dreamer-7B, a small but strong world model for real-world web planning.
💥Beats Qwen2-72B
⚖️Matches #GPT-4o
Trained on 3M synthetic examples — and yes, all data + models are open-sourced.
Agam A. Shah (@shahagam4)'s Twitter Profile Photo

Thrilled to share our new preprint:
"Beyond the Reported Cutoff: Where LLMs Fall Short on Financial Knowledge"

We evaluated 197,011 revenue questions across 17,621 U.S. companies (1980–2022) using 6 top LLMs.

Key insights 🧵
Yang Chen (@ychennlp)'s Twitter Profile Photo

Had a lot of fun scaling up RL to improve math reasoning! Excited to introduce AceMath-RL-Nemotron-7B with a scalable training recipe.

📑 Full blog: research.nvidia.com/labs/adlr/acem…
🔗 Model: huggingface.co/nvidia/AceMath…

Wei Xu (@cocoweixu)'s Twitter Profile Photo

I am giving a keynote at the PrivateNLP Workshop (sites.google.com/view/privatenl…) at #NAACL2025 (Sunday 9am CT).

* GPT-4V is a performant geolocator, predicting exact GPS coordinates of an image > any SOTA
* LLMs can estimate privacy risk based on probabilistic reasoning > chain-of-thought
Yang Chen (@ychennlp)'s Twitter Profile Photo

With just math-RL, AceReason-Nemotron-14B surpasses DeepCoder-14B on LiveCodeBench v5.
We then did code-RL and found training became so much easier.
Yang Chen (@ychennlp)'s Twitter Profile Photo

Does RL incentivize reasoning capability beyond the starting SFT model? We show an interesting result with our recently published AceReason-Nemotron-7B model, which was trained with RL: its pass@K from K=1 to 1024 is consistently +10% on LiveCodeBench v6. Perhaps scaling RL is the key.
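For reference, the pass@K metric mentioned above is commonly computed with the unbiased estimator from the Codex paper (Chen et al., 2021). A minimal sketch of that standard estimator, not the authors' exact evaluation code:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k: the probability that at least one of
    k samples is correct, given c correct answers among n generations."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any size-k draw
        # must include at least one correct answer.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 3 correct out of 10 generations, a single draw succeeds ~30% of the time.
print(pass_at_k(10, 3, 1))  # ≈ 0.3
```

The complement form `comb(n - c, k) / comb(n, k)` is the probability that all k drawn samples are incorrect, which avoids the numerical instability of multiplying many per-sample probabilities.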