Jack Jingyu Zhang @ NAACL🌵 (@jackjingyuzhang)'s Twitter Profile

PhD student @jhuclsp | Student researcher @Microsoft | Previously undergrad @JohnsHopkins @JHUCompSci.

ID: 1560437924

Website: https://jackz.io/ · Joined: 01-07-2013 12:29:46

155 Tweets

430 Followers

646 Following

Jack Jingyu Zhang @ NAACL🌵 (@jackjingyuzhang):

Just arrived in Albuquerque for #NAACL2025! Excited to connect and chat about LLM safety, alignment, reasoning, RLVR, and beyond. Feel free to reach out or DM if you’d like to meet up.

Jack Jingyu Zhang @ NAACL🌵 (@jackjingyuzhang):

Excited to present two papers today and tomorrow at #NAACL2025! Look out for our oral sessions:

TurkingBench: arxiv.org/abs/2403.11905 
📅 4-5:30pm, Thur, May 1
📍 Ballroom A (R&E.4)

Verifiable by Design: arxiv.org/abs/2404.03862 
📅 9-10:30am, Fri, May 2
📍 Ballroom A (HC.1)
Tianjian Li (@tli104):

Excited to be presenting our paper on training language models under heavily imbalanced data tomorrow at #NAACL2025! If you want to chat about data curation for both pre- and post-training, feel free to reach out!

📝 arxiv.org/abs/2410.04579
📅 11am-12:30pm, Fri, May 2
📍 Hall 3

Yining Lu (@yining__lu):

Quick reminder that our paper, Benchmarking Language Model Creativity: A Case Study on Code Generation, will be presented today!

📅 11AM-12:30PM, Fri, May 2
📍 Hall 3
📝 arxiv.org/abs/2407.09007
🎥 youtube.com/watch?v=v1cHyC…
Dongwei Jiang (@dongwei__jiang):

Now accepted by #ACL2025! Thrilled to see our paper also referenced in Lilian Weng's latest blog post on reasoning in LLMs! Check it out: lilianweng.github.io/posts/2025-05-…

Daniel Khashabi 🕊️ (@danielkhashabi):

There have been various efforts to disentangle "task learning" vs. "task recall" in LLMs. We've recently explored a fresh angle borrowed from cryptography: with substitution ciphers, we transform a given task into an equivalent but cryptic (no pun intended!!) form.

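A minimal sketch of the idea (illustrative only, not the paper's actual pipeline), assuming a simple letter-level substitution cipher applied to a task prompt; the seed and example prompt are made up for demonstration:

```python
import random
import string

def make_substitution_cipher(seed: int = 0):
    """Build a random one-to-one mapping over lowercase letters."""
    rng = random.Random(seed)
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return str.maketrans(dict(zip(letters, shuffled)))

def encipher(text: str, table) -> str:
    """Apply the substitution; non-letters pass through unchanged."""
    return text.lower().translate(table)

table = make_substitution_cipher(seed=42)
task = "what is the capital of france?"
# The task is semantically unchanged, but its surface form is now cryptic,
# separating memorized surface patterns from genuine task learning.
print(encipher(task, table))
```
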
Anthony Peng (@realanthonypeng):

🚨 New work: We rethink how we finetune safer LLMs — not by filtering outputs after generation, but by tracking safety risk token by token during training.

We repurpose guardrail models like 🛡️ Llama Guard and Granite Guardian to score evolving risk across each response 📉 — giving
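A rough sketch of what token-by-token risk tracking could look like (an illustration, not the paper's method or any real guardrail API; `guardrail_score` is a hypothetical stand-in for a model such as Llama Guard):

```python
# Hypothetical stand-in for a guardrail model; a real system would call
# Llama Guard / Granite Guardian here. This toy stub flags one keyword.
def guardrail_score(prompt: str, partial_response: str) -> float:
    """Return a risk score in [0, 1] for the response generated so far."""
    return 1.0 if "bomb" in partial_response.lower() else 0.0

def token_level_risk(prompt: str, response_tokens: list[str]) -> list[float]:
    """Score every prefix of the response, yielding a per-token risk curve."""
    scores = []
    prefix = ""
    for tok in response_tokens:
        prefix += tok
        scores.append(guardrail_score(prompt, prefix))
    return scores

tokens = ["Sure", ",", " here", " is", " how", "..."]
print(token_level_risk("How do I stay safe online?", tokens))
```
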
Daniel Khashabi 🕊️ (@danielkhashabi):

Long-form inputs (e.g., needle-in-haystack setups) are a crucial aspect of high-impact LLM applications. While previous studies have flagged issues like positional bias and distracting documents, they've missed a key element: the size of the gold/relevant context.

In our
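For concreteness, a small sketch of a needle-in-haystack construction where the gold-context size is a controllable knob (an illustration of the setup being described, not the paper's benchmark code; all strings are made up):

```python
import random

def build_haystack(gold_sentences: list[str], filler: str,
                   total_sentences: int, seed: int = 0) -> str:
    """Embed a gold context of variable size inside distractor filler."""
    rng = random.Random(seed)
    n_filler = total_sentences - len(gold_sentences)
    sentences = [filler] * n_filler
    pos = rng.randrange(n_filler + 1)
    # Insert the whole gold span contiguously at a random position,
    # so gold size and gold position can be varied independently.
    return " ".join(sentences[:pos] + gold_sentences + sentences[pos:])

gold = ["The secret code is 4217."]  # a 1-sentence gold context
doc = build_haystack(gold, "Nothing to see here.", total_sentences=50)
print("4217" in doc)  # the needle is present somewhere in the haystack
```
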
Anthony Peng (@realanthonypeng):

🚨 Sharing our new #ACL2025NLP main paper!
🎥 Deploying video VLMs at scale? Inference compute is your bottleneck.

We study how to optimally allocate inference FLOPs across LLM size, frame count, and visual tokens.
💡 Large-scale training sweeps (~100k A100 hrs)
📊 Parametric
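As a back-of-envelope illustration of the trade-off (not the paper's parametric fit), one can compare rough decoder FLOPs across allocations using the common ~2N-FLOPs-per-token approximation; the configurations below are made-up examples:

```python
def vlm_inference_flops(params: float, frames: int,
                        tokens_per_frame: int, text_tokens: int = 64) -> float:
    """~2 * N FLOPs per token for an N-parameter decoder (rough rule)."""
    total_tokens = frames * tokens_per_frame + text_tokens
    return 2 * params * total_tokens

configs = [
    ("7B LLM, 32 frames, 144 tok/frame", 7e9, 32, 144),
    ("13B LLM, 16 frames, 144 tok/frame", 13e9, 16, 144),
    ("7B LLM, 16 frames, 288 tok/frame", 7e9, 16, 288),
]
# Same order of magnitude of compute, spent very differently across
# model size, frame count, and visual tokens per frame.
for name, n, f, t in configs:
    print(f"{name}: {vlm_inference_flops(n, f, t):.2e} FLOPs")
```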