Meiru Zhang (@zhang_meiru) Twitter Tweets • TwiCopy

Han Zhou

a year ago

Which output is better? [A] or [B]? LLM🤖: B❌ [B] or [A]? LLM🤖: A✅ Thrilled to share our preprint on addressing preference biases in LLM judgments!🧑‍⚖️ We introduce ZEPO, a 0-shot prompt optimizer that enhances your LLM evaluators via fairness⚖️ 📰Paper: arxiv.org/abs/2406.11370

Likes: 97 · Replies: 3 · Reposts: 22
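
The order flip in the tweet above ([A] or [B] versus [B] or [A]) can be checked mechanically. The following is a minimal sketch of such a check, not ZEPO itself: the `Judge` interface, the function names, and both toy judges are hypothetical stand-ins for a real LLM call.

```python
from typing import Callable, Optional

# Hypothetical judge interface (an assumption, not from the paper): it receives
# the two candidates in presentation order and returns "first" or "second".
Judge = Callable[[str, str], str]


def order_consistent_winner(judge: Judge, a: str, b: str) -> Optional[str]:
    """Return the preferred candidate if the verdict survives swapping the
    presentation order, otherwise None."""
    forward = a if judge(a, b) == "first" else b
    backward = b if judge(b, a) == "first" else a
    return forward if forward == backward else None


def always_second(first: str, second: str) -> str:
    # Toy stand-in for a position-biased evaluator: it ignores the content
    # and always prefers whatever is shown second.
    return "second"


def longer_wins(first: str, second: str) -> str:
    # Toy stand-in for a content-based evaluator: prefers the longer text.
    return "first" if len(first) >= len(second) else "second"


biased = order_consistent_winner(always_second, "short answer", "a longer answer")
fair = order_consistent_winner(longer_wins, "short answer", "a longer answer")
```

Here `biased` is `None` because the verdict changes with the order, while `fair` is the longer candidate in both orders.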

Tiancheng Hu

@tiancheng_hu

a year ago

Thrilled to share our new paper: "Can LLM be a Personalized Judge?" We investigate the reliability of LLMs in judging user preferences based on personas and propose improvements using verbal uncertainty estimation to enhance accuracy. 🎭👨‍⚖️ 📄 Paper: arxiv.org/abs/2406.11657

Likes: 105 · Replies: 4 · Reposts: 22

Jiahui Gao

@jiahuigao3

a year ago

Excited to share our latest work: "Jailbreaking as a Reward Misspecification Problem." We explore why safety-aligned LLMs remain vulnerable to adversarial attacks, identifying reward misspecification during the alignment process as a key factor. Find more details in our paper.

Likes: 42 · Replies: 0 · Reposts: 12

Markus Frohmann

@frohmannm

a year ago

Introducing 🪓Segment any Text! 🪓 A new state-of-the-art sentence segmentation tool! Compared to existing tools (and strong LLMs!), our models are far more: 1. efficient ⚡ 2. performant 🔝 3. robust 🚀 4. adaptable 🎯 5. multilingual 🗺

Likes: 180 · Replies: 2 · Reposts: 26

zhongshen

@ruiss1

a year ago

Check out our new reasoning benchmark!! 🚀 Is the LLM really reasoning or just parroting 🦜? Why not test whether an LLM can judge the correctness of different reasoning paths 🧠? We cover diverse subjects and reasoning paradigms from logic, coding, maths and more 🔥 📄 arxiv.org/abs/2406.13975

Likes: 20 · Replies: 5 · Reposts: 8

Meiru Zhang

@zhang_meiru

a year ago

First, thanks to the organizers of the workshop. However, we are disappointed by the single brief review, which dismissed attention probing of LLMs as out of scope even though it is explicitly mentioned in the call for papers. Could we get a response about scope, if possible? Ece Takmaz

Likes: 0 · Replies: 1 · Reposts: 0

Jinyuan Fang

@jinyuanf

a year ago

TRACE the Evidence: Constructing Knowledge-Grounded Reasoning Chains for Retrieval-Augmented Generation (w/ Zaiqiao Meng, Craig Macdonald) Takeaway: using reasoning chains (purely KG triples) built from docs beats using full docs for RAGs. #rag Paper: arxiv.org/pdf/2406.11460

Likes: 27 · Replies: 1 · Reposts: 7

Chenchen Ye

@chenchenye_ccye

a year ago

📢New LLM Agents Benchmark! Introducing 🌟MIRAI🌟: A groundbreaking benchmark crafted for evaluating LLM agents in temporal forecasting of international events with tool use and complex reasoning! 📜 Arxiv: arxiv.org/abs/2407.01231 🔗 Project page: mirai-llm.github.io 🧵1/N

Likes: 302 · Replies: 15 · Reposts: 70

Yucheng Li

@liyucheng_2

a year ago

Now you can process 1M context 10x faster, with even better accuracy. Try MInference 1.0 right now!

Likes: 16 · Replies: 1 · Reposts: 3

Tom Huang

@tuturetom

a year ago

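A TypeScript implementation of DSPy, the prompt programming framework that took off at Stanford, is here! ax implements DSPy and supports building complex agentic workflows. It is now open source, with 697 stars 🌟
- Modular programming: provides standard modules to help you write prompts
- Automatic compiler: automatically tunes prompts and parameters for a specific LLM
- Like HippoRAG, supports solving complex multi-hop retrieval
github.com/ax-llm/ax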

Likes: 339 · Replies: 12 · Reposts: 85

Zaiqiao Meng

@mengzaiqiao

a year ago

Glad to share two papers accepted to EMNLP 2024 (@emnlpmeeting) #EMNLP2024! One work is on improving RAG using reasoning KG chains, w. Jinyuan Fang and Craig Macdonald. Another is on reducing position bias of LLMs via instruction, w. Meiru Zhang and Nigel Collier.

Likes: 43 · Replies: 2 · Reposts: 8

Jinyuan Fang

@jinyuanf

a year ago

🎉 Glad to share that our paper "TRACE the Evidence: Constructing Knowledge-Grounded Reasoning Chains for Retrieval-Augmented Generation" (w/ Zaiqiao Meng and Craig Macdonald) has been accepted at #EMNLP2024 (@emnlpmeeting) as a Findings paper!

Likes: 26 · Replies: 1 · Reposts: 6

Caiqi Zhang

@caiqizh

a year ago

🔥Conformity in Large Language Models🔥 Our latest paper dives into how LLMs align their answers with incorrect majorities. We explore the fascinating interplay between LLM behavior and human psychology!🧠 Preprint: arxiv.org/abs/2410.12428 #AI #NLP #LLMs #Conformity

Likes: 31 · Replies: 2 · Reposts: 20

Yingjia Alisa Wan @ICLR2025

@yingjia_wan

a year ago

💥 Introducing "AutoPSV: Automated Process Supervised Verifier" - accepted at #NeurIPS2024! AutoPSV automatically annotates reasoning steps via confidence tracking, making it efficient and effective even without ground-truth answers. 🔗 arxiv.org/abs/2405.16802 🧵1/5

Likes: 110 · Replies: 7 · Reposts: 38
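
One way to read "confidence tracking" in the tweet above is to watch how a verifier's confidence changes from one reasoning step to the next and flag steps where it drops sharply. The sketch below illustrates that idea only. The function name, the relative-drop rule, and the 0.5 threshold are assumptions for illustration, not the procedure from the AutoPSV paper.

```python
from typing import List


def label_steps(confidences: List[float], max_relative_drop: float = 0.5) -> List[bool]:
    """Label each reasoning step True (looks fine) or False (suspect).

    `confidences[i]` is a hypothetical verifier's confidence that the solution
    will end up correct after step i. A step is flagged when confidence falls
    by more than `max_relative_drop` relative to the previous step. Both the
    rule and the default threshold are illustrative assumptions.
    """
    labels: List[bool] = []
    previous = None
    for current in confidences:
        if previous is None or previous <= 0:
            # No usable baseline yet, so the step is not flagged.
            labels.append(True)
        else:
            relative_change = (current - previous) / previous
            labels.append(relative_change > -max_relative_drop)
        previous = current
    return labels


# Confidence collapses at the third step, so only that step is flagged.
labels = label_steps([0.80, 0.82, 0.30, 0.31])
```

No ground-truth answer is needed for this labelling, since it relies only on the verifier's own scores.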

KEIR

@keirworkshop

a year ago

📢 We are excited to announce the Call for Papers for the 2nd KEIR at ECIR2025 #ECIR2025! 📅 Submission Deadline: January 12, 2025 🔗 Details: keir-ecir2025.github.io Submit your work and join us to explore knowledge-enhanced IR systems! 🚀

Likes: 20 · Replies: 1 · Reposts: 9

Zaiqiao Meng

@mengzaiqiao

a year ago

If you missed our tutorial, you can watch the recordings via this link: youtube.com/watch?v=Af-_EL…

Likes: 4 · Replies: 0 · Reposts: 2

Yinhong Liu

@yinhongliu2

10 months ago

🚨 New Paper Alert! 🚨 When using LLMs for judgments, have you ever wondered about the consistency of those judgments? 🤔 Check out our latest work, where we quantify, evaluate, and enhance the logical/preference consistency of LLMs. 📚 🔗 Read more: arxiv.org/abs/2410.02205

Likes: 250 · Replies: 14 · Reposts: 70

KEIR

@keirworkshop

10 months ago

📢 Deadline Extension Alert! The deadline for the KEIR workshop has been extended to January 31, 2025 (AOE). Don't miss out on this chance to finalise and submit your innovative work. 📜 For detailed submission guidelines, please visit: keir-ecir2025.github.io

Likes: 7 · Replies: 0 · Reposts: 5

Chengzu Li

@li_chengzu

10 months ago

Forget just thinking in words. 🚀 New Era of Multimodal Reasoning🚨 🔍 Imagine While Reasoning in Space with MVoT Multimodal Visualization-of-Thought (MVoT) revolutionizes reasoning by generating visual "thoughts" that transform how AI thinks, reasons, and explains itself.

Likes: 739 · Replies: 14 · Reposts: 165

Milan Gritta

@milangritta

8 months ago

1) Delighted to introduce our latest work🥳 (under review) 🙃 🔗arxiv.org/pdf/2502.15572 🔗 We propose **DReSD: Dense Retrieval for Speculative Decoding** with Huiyin Xue and Gerasimos Lampouras

Likes: 12 · Replies: 1 · Reposts: 5