Prasann Singhal (@prasann_singhal)'s Twitter Profile
Prasann Singhal

@prasann_singhal

4th-year undergrad #NLProc Researcher at UT Austin, advised by @gregd_nlp

ID: 1349785093510934528

Link: https://prasanns.github.io/ · Joined: 14-01-2021 18:27:08

82 Tweets

274 Followers

722 Following

Ryo Kamoi (@ryokamoi)

We will present our survey on self-correction of LLMs (TACL) at #EMNLP2024 in person!
Oral: Nov 12 (Tue) 11:00- (Language Modeling 1)
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
arxiv.org/abs/2406.01297
x.com/RyoKamoi/statu…

Greg Durrett (@gregd_nlp)

I won't be at #EMNLP2024, but my students & collaborators are presenting:
🔍 Detecting factual errors from LLMs (Liyan Tang)
🛠️ Detect, critique, & refine pipeline (Manya Wadhwa, Lucy Zhao)
🏭 Synthetic data generation (Abhishek Divekar)
📄 Fact-checking (@anxruddy) at FEVER.
Links🧵
Manya Wadhwa (@manyawadhwa1)

I'll be presenting this work at #EMNLP2024 🌴 on Tuesday, 4-5:30pm, Poster Session C in Jasmine Hall!
Stop by or reach out if you are interested in tools for verification, making explanations useful, or evaluation in general!
Updated 📜 arxiv.org/abs/2407.02397
Lucy Zhao

Xi Ye (@xiye_nlp)

🔔 I'm recruiting multiple fully funded MSc/PhD students at the University of Alberta for Fall 2025! Join my lab working on NLP, especially reasoning and interpretability (see my website for more details about my research). Apply by December 15!

Zayne Sprague (@zaynesprague)

Interesting perspective, thanks for sharing! As one of the authors of the “CoT mainly helps on math/logic” paper, I agree with a lot of this, especially the connection to generator/validator gaps. One of our aims going into this project was to find datasets beyond math/logic

Jessy Li (@jessyjli)

🌟Job ad🌟 We (Greg Durrett, Matt Lease, and I) are hiring a postdoc fellow within the CosmicAI Institute to do galactic work with LLMs and generative AI! If you would like to push the frontiers of foundation models to help solve mysteries of the universe, please apply!

Jacob Springer (@jacspringer)

Training with more data = better LLMs, right? 🚨

False! Scaling language models by adding more pre-training data can decrease your performance after post-training!

Introducing "catastrophic overtraining." 🥁🧵+arXiv 👇

1/9
Tanishq Kumar (@tanishqkumar07)

trained a nanoGPT? feeling behind before o4-mini?

🚨🚨i'm open-sourcing beyond-nanoGPT, an internal codebase to help people go from LLM basics to research-level understanding. 🚨🚨

it contains thousands of lines of from-scratch, annotated pytorch implementing advanced
Sriram Padmanabhan (@srirampad05)

Are LMs sensitive to suspicious coincidences? Our paper finds that, when given access to knowledge of the hypothesis space, LMs can show sensitivity to such coincidences, displaying parallels with human inductive reasoning. w/Kanishka Misra 🌊, Kyle Mahowald, Eunsol Choi

Greg Durrett (@gregd_nlp)

Check out Ramya et al.'s work on understanding discourse similarities in LLM-generated text! We see this as an important step in quantifying the "sameyness" of LLM text, which we think will be a step towards fixing it!

Manya Wadhwa (@manyawadhwa1)

Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️. EvalAgent identifies 👩‍🏫🎓 expert advice on the web that implicitly addresses the user's prompt 🧵👇

Greg Durrett (@gregd_nlp)

Check out Manya's work on evaluation for open-ended tasks! The criteria from EvalAgent can be plugged into LLM-as-a-judge or used for refinement. Great tool with a ton of potential, and there's LOTS to do here for making LLMs better at writing!

Anirudh Khatry (@anirudhkhatry)

🚀Introducing CRUST-Bench, a dataset for C-to-Rust transpilation for full codebases 🛠️
A dataset of 100 real-world C repositories across various domains, each paired with:
🦀 Handwritten safe Rust interfaces.
🧪 Rust test cases to validate correctness.
🧵[1/6]
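
To make the benchmark's structure concrete, here is a rough sketch of the kind of pairing it describes. This is a hypothetical, made-up example (the function `sum_positive` is not drawn from CRUST-Bench itself): a C routine, the handwritten safe Rust interface a transpiler would target, and a Rust test that validates the transpiled code.

```rust
// Hypothetical illustration of a CRUST-Bench-style pairing (made-up example,
// not taken from the dataset): a C routine, a safe Rust interface to target,
// and a Rust test that validates the transpiled implementation.
//
// C original, for reference:
//   /* sums the strictly positive entries of xs */
//   long long sum_positive(const int *xs, size_t n);

/// Safe Rust interface the transpiled code is expected to implement.
pub fn sum_positive(xs: &[i32]) -> i64 {
    // One possible implementation a transpiler might emit.
    xs.iter().filter(|&&x| x > 0).map(|&x| x as i64).sum()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sums_only_positive_entries() {
        assert_eq!(sum_positive(&[3, -1, 4, -1, 5]), 12);
    }

    #[test]
    fn empty_slice_sums_to_zero() {
        assert_eq!(sum_positive(&[]), 0);
    }
}
```

Running `cargo test` against interfaces and tests like these is what lets correctness be checked beyond compilation alone.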
Greg Durrett (@gregd_nlp)

New work led by Liyan Tang with a strong new model for chart understanding! Check out the blog post, model, and playground! Very fun to play around with Bespoke-MiniChart-7B and see what a 7B VLM can do!

Greg Durrett (@gregd_nlp)

Check out Anirudh's work on a new benchmark for C-to-Rust transpilation! 100 realistic-scale C projects, plus target Rust interfaces + Rust tests that let us validate the transpiled code beyond what prior benchmarks allow.

Mahesh Sathiamoorthy (@madiator)

Happy to announce Bespoke-Minichart-7B! This was a tough cookie to crack, and involved a lot of data curation and modeling work, but overall very happy with the results! Congrats to the team and especially to Liyan Tang for running so many experiments that helped us understand

thom lake (@thomlake)

Interested in how alignment changes the response distribution defined by LLMs? Come check out my poster at 2 PM at #NAACL2025 x.com/thomlake/statu…

Liyan Tang (@liyantang4)

Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts!

✍🏻Entirely human-written questions by 13 CS researchers
👀Emphasis on visual reasoning – hard to verbalize via text CoTs
📉Humans reach 93%, vs. 63% for Gemini-2.5-Pro & 38% for Qwen2.5-72B
Gaurav Ghosal (@gaurav_ghosal)

1/So much of privacy research is designing post-hoc methods to make models mem. free.
It’s time we turn that around with architectural changes. Excited to add Memorization Sinks to the transformer architecture this #ICML2025 to isolate memorization during LLM training🧵
Greg Durrett (@gregd_nlp)

📢I'm joining NYU (Courant CS + Center for Data Science) starting this fall!

I’m excited to connect with new NYU colleagues and keep working on LLM reasoning, reliability, coding, creativity, and more!

I’m also looking to build connections in the NYC area more broadly. Please