Lucy Zhao (@lucy_xyzhao)'s Twitter Profile
Lucy Zhao

@lucy_xyzhao

NSF CSGrad4US Fellow | Garden leave from @jumptrading doing LLM things @UTCompSci

ID: 1269773816806473728

Website: https://xinyuzhao.io/

Joined: 07-06-2020 23:31:05

11 Tweets

98 Followers

98 Following

Manya Wadhwa (@manyawadhwa1)

Refine LLM responses to improve factuality with our new three-stage process:

🔎Detect errors
🧑‍🏫Critique in language
✏️Refine with those critiques

DCR improves factuality refinement across model scales: Llama 2, Llama 3, GPT-4.
w/ Lucy Zhao (@lucy_xyzhao), Jessy Li (@jessyjli), Greg Durrett (@gregd_nlp) 🧵

Zayne Sprague (@zaynesprague)

To CoT or not to CoT?🤔

300+ experiments with 14 LLMs & systematic meta-analysis of 100+ recent papers

🤯Direct answering is as good as CoT except for math and symbolic reasoning
🤯You don’t need CoT for 95% of MMLU!

CoT mainly helps LLMs track and execute symbolic computation

Greg Durrett (@gregd_nlp)

I won't be at #EMNLP2024, but my students & collaborators are presenting:
🔍 Detecting factual errors from LLMs: Liyan Tang (@LiyanTang4)
🛠️ Detect, critique, & refine pipeline: Manya Wadhwa (@ManyaWadhwa1), Lucy Zhao (@lucy_xyzhao)
🏭 Synthetic data generation: Abhishek Divekar (@adivekar_)
📄 Fact-checking @anxruddy at FEVER.
Links🧵

Manya Wadhwa (@manyawadhwa1)

I'll be presenting this work at #EMNLP2024 🌴 on Tuesday, 4-5:30pm, Poster Session C in Jasmine Hall! Stop by or reach out if you are interested in tools for verification, making explanations useful, or evaluation in general! Updated 📜 arxiv.org/abs/2407.02397 Lucy Zhao

Liyan Tang (@liyantang4)

Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts!

✍🏻Entirely human-written questions by 13 CS researchers
👀Emphasis on visual reasoning – hard to be verbalized via text CoTs
📉Humans reach 93% but 63% from Gemini-2.5-Pro & 38% from Qwen2.5-72B