JIE GAO
@jerrygaodextrys
Researcher in NLP/text analysis, semantic/content technology, misinformation/disinformation; retweets are bookmarks for myself; Husband, father; reasonable cook
ID: 38875070
https://jerrygaolondon.github.io/
09-05-2009 15:54:30
916 Tweets
212 Followers
977 Following
Relabeling datasets for Information Retrieval improves NDCG@10 of both embedding models & cross-encoder rerankers. This was already the prevalent belief, but now it's been confirmed. Great job Nandan Thakur, Crystina Zhang, Xueguang Ma & Jimmy Lin
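NDCG@10 is the metric cited above. The following is a minimal sketch of how it is computed for a single query, assuming linear gains (one common convention; some tools use 2^rel - 1 instead) and hypothetical document ids. It also shows why relevance labels matter: the same ranking scores differently once a previously unlabeled relevant document is marked relevant.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k graded relevance labels (linear gain)."""
    # Rank 0 is discounted by log2(2) = 1, rank 1 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """NDCG@k for one query.

    ranked_doc_ids: document ids in the order the model returned them.
    qrels: dict mapping doc id -> graded relevance label (unlabeled docs count as 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    # The ideal ranking places every labeled document in descending order of relevance.
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    ranking = ["d3", "d1", "d7"]  # hypothetical retriever output
    original_labels = {"d1": 1}  # d3 unlabeled, so treated as non-relevant
    relabeled = {"d1": 1, "d3": 1}  # d3 now judged relevant
    print(round(ndcg_at_k(ranking, original_labels), 4))  # 0.6309
    print(round(ndcg_at_k(ranking, relabeled), 4))  # 1.0
```

Averaging this per-query score over all queries in a test set gives the dataset-level NDCG@10 that papers typically report.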
Towards Better Instruction Following Retrieval Models: Yuchen Zhuang et al. introduce a large-scale training corpus of over 38,000 instruction-query-passage triplets for enhancing retrieval models in instruction-following IR 📝arxiv.org/abs/2505.21439 👨🏽‍💻huggingface.co/datasets/InF-I…
You know all those arguments that LLMs think like humans? Turns out it's not true. 🧠 In our paper "From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning", we test this by checking whether LLMs form concepts the same way humans do. Yann LeCun, Chen Shani, Dan Jurafsky
Really excited about our recent large collaborative work on NLP for Social Good. The work stems from our discussions at the NLP for Positive Impact Workshop at #EMNLP2024. Thanks to all our awesome collaborators, workshop attendees and all supporters! EMNLP 2025
Excited to announce MIRIAD — a large-scale dataset of 5,821,948 medical question-answer pairs, each rephrased from passages in the medical literature. Great collab with Qinyue Zheng, Salman Abdullah, Sam Rawal, MD, Cyril Zakka, MD, Sophie Ostmeier, Maximilian Purk, Eduardo Reis, Eric Topol &