Meiru Zhang (@zhang_meiru)'s Twitter Profile
Meiru Zhang

@zhang_meiru

4th-year PhD student at @CambridgeLTL, Gates Cambridge Scholar 2021

ID: 1633786294273822721

Joined 09-03-2023 11:06:28

29 Tweets

71 Followers

142 Following

Han Zhou (@hanzhou032)

Which output is better?
[A] or [B]? LLM🤖: B❌
[B] or [A]? LLM🤖: A✅

Thrilled to share our preprint on addressing preference biases in LLM judgments! 🧑‍⚖️ We introduce ZEPO, a zero-shot prompt optimizer that improves your LLM evaluators by optimizing for fairness ⚖️

📰Paper: arxiv.org/abs/2406.11370
Tiancheng Hu (@tiancheng_hu)

Thrilled to share our new paper: "Can LLM be a Personalized Judge?" We investigate the reliability of LLMs in judging user preferences based on personas and propose improvements using verbal uncertainty estimation to enhance accuracy. 🎭👨‍⚖️ 📄 Paper: arxiv.org/abs/2406.11657

Jiahui Gao (@jiahuigao3)

Excited to share our latest work: "Jailbreaking as a Reward Misspecification Problem." We explore why safety-aligned LLMs remain vulnerable to adversarial attacks, identifying reward misspecification during the alignment process as a key factor. Find more details in our paper.

Markus Frohmann (@frohmannm)

Introducing 🪓Segment any Text! 🪓

A new state-of-the-art sentence segmentation tool!
Compared to existing tools (and strong LLMs!), our models are far more:
1. efficient ⚡
2. performant 🔝
3. robust 🚀
4. adaptable 🎯
5. multilingual 🗺
zhongshen (@ruiss1)

Check out our new reasoning benchmark!! 🚀
Are LLMs really reasoning or just parroting? 🦜 Why not test whether LLMs can judge the correctness of different reasoning paths? 🧠 We cover diverse subjects and reasoning paradigms, including logic, coding, maths, and more 🔥
📄arxiv.org/abs/2406.13975
Meiru Zhang (@zhang_meiru)

First, thanks to the workshop organizers. However, we are disappointed by the single brief review, which deemed attention probing of LLMs out of scope even though the call for papers explicitly mentions it. Could you comment on the scope, if possible? Ece Takmaz

Jinyuan Fang (@jinyuanf)

TRACE the Evidence: Constructing Knowledge-Grounded Reasoning Chains for Retrieval-Augmented Generation (w/ Zaiqiao Meng, Craig Macdonald)

Takeaway: using reasoning chains (purely KG triples) built from docs beats using full docs for RAG. #rag

Paper: arxiv.org/pdf/2406.11460
Chenchen Ye (@chenchenye_ccye)

📢New LLM Agents Benchmark! Introducing 🌟MIRAI🌟: A groundbreaking benchmark crafted for evaluating LLM agents in temporal forecasting of international events with tool use and complex reasoning! 📜 Arxiv: arxiv.org/abs/2407.01231 🔗 Project page: mirai-llm.github.io 🧵1/N

Tom Huang (@tuturetom)

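A TypeScript implementation of DSPy, Stanford's viral prompt-programming framework, is here! ax implements DSPy and supports building complex agentic workflows. It is already open source, with 697 stars 🌟
- Modular programming: provides standard modules to help you write prompts
- Automatic compiler: automatically tunes prompts and parameters for a specific LLM
- Supports complex multi-hop retrieval, similar to HippoRAG
github.com/ax-llm/ax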

Zaiqiao Meng (@mengzaiqiao)

Glad to share two papers accepted to EMNLP 2024 (@emnlpmeeting) #EMNLP2024!
One work is on improving RAG using reasoning KG chains, w. Jinyuan Fang and Craig Macdonald.
Another is on reducing position bias of LLMs via instruction, w. Meiru Zhang and Nigel Collier.
Jinyuan Fang (@jinyuanf)

🎉 Glad to share that our paper "TRACE the Evidence: Constructing Knowledge-Grounded Reasoning Chains for Retrieval-Augmented Generation" (w/ Zaiqiao Meng and Craig Macdonald) has been accepted to #EMNLP2024 as a Findings paper!

Yingjia Alisa Wan @ICLR2025 (@yingjia_wan)

💥 Introducing "AutoPSV: Automated Process Supervised Verifier" - accepted at #NeurIPS2024!

AutoPSV automatically annotates reasoning steps via confidence tracking, making it efficient and effective even without ground-truth answers.
🔗 arxiv.org/abs/2405.16802

🧵1/5
KEIR (@keirworkshop)

📢 We are excited to announce the Call for Papers for the 2nd KEIR workshop at ECIR 2025 #ECIR2025! 📅 Submission Deadline: January 12, 2025 🔗 Details: keir-ecir2025.github.io Submit your work and join us to explore knowledge-enhanced IR systems! 🚀

Yinhong Liu (@yinhongliu2)

🚨 New Paper Alert! 🚨
When using LLMs for judgments, have you ever wondered how consistent those judgments are? 🤔
Check out our latest work, where we quantify, evaluate, and enhance the logical/preference consistency of LLMs. 📚

🔗 Read more: arxiv.org/abs/2410.02205
KEIR (@keirworkshop)

📢 Deadline Extension Alert! The deadline for the KEIR workshop has been extended to January 31, 2025 (AOE). Don't miss this chance to finalise and submit your innovative work. 📜 For detailed submission guidelines, please visit: keir-ecir2025.github.io

Chengzu Li (@li_chengzu)

Forget just thinking in words.

🚀 New Era of Multimodal Reasoning🚨
🔍 Imagine While Reasoning in Space with MVoT

Multimodal Visualization-of-Thought (MVoT) revolutionizes reasoning by generating visual "thoughts" that transform how AI thinks, reasons, and explains itself.
Milan Gritta (@milangritta)

1) Delighted to introduce our latest work🥳 (under review) 🙃 🔗arxiv.org/pdf/2502.15572 🔗

We propose **DReSD: Dense Retrieval for Speculative Decoding** with Huiyin Xue and Gerasimos Lampouras