Jacqueline He (@jcqln_h)'s Twitter Profile
Jacqueline He

@jcqln_h

cs phd @uwnlp, prev. bse cs @princeton

ID: 992257972955791360

Link: http://jacqueline-he.github.io · Joined: 04-05-2018 04:21:50

196 Tweets

149 Followers

61 Following

Howard Yen (@howardyen1)'s Twitter Profile Photo

Introducing HELMET, a long-context benchmark that supports >=128K length, covering 7 diverse applications.

We evaluated 51 long-context models and found that HELMET provides more reliable signals for model development

github.com/princeton-nlp/…

A 🧵 on why you should use HELMET⛑️
Jacqueline He (@jcqln_h)'s Twitter Profile Photo

Check out our OpenScholar project!! Huge congrats to Akari Asai for leading the project — working with her has been a wonderful experience!! 🌟

Akari Asai (@akariasai)'s Twitter Profile Photo

🚨 I’m on the job market this year! 🚨
I’m completing my Allen School Ph.D. (2025), where I identify and tackle key LLM limitations like hallucinations by developing new models—Retrieval-Augmented LMs—to build more reliable real-world AI systems. Learn more in the thread! 🧵
Hila Gonen (@hila_gonen)'s Twitter Profile Photo

Extremely excited to share that I will be joining UBC Computer Science as an Assistant Professor this summer! I will be recruiting students this coming cycle!

Ai2 (@allen_ai)'s Twitter Profile Photo

Can AI really help with literature reviews? 🧐

Meet Ai2 ScholarQA, an experimental solution that allows you to ask questions that require multiple scientific papers to answer. It gives more in-depth, detailed, and contextual answers with table comparisons and expandable sections.
Hamish Ivison (@hamishivi)'s Twitter Profile Photo

We trained a diffusion LM!

🔁 Adapted from Mistral v0.1/v0.3.
📊 Beats AR models in GSM8k when we finetune on math data.
📈 Performance improves by using more test-time compute (reward guidance or more diffusion steps).

Check out Jake Tae's thread for more details!
Stella Li (@stellalisy)'s Twitter Profile Photo

Asking the right questions can make or break decisions in high-stakes fields like medicine, law, and beyond✴️
Our new framework ALFA—ALignment with Fine-grained Attributes—teaches LLMs to PROACTIVELY seek information through better questions🏥❓ (co-led with Jimin Mun)
👉🏻🧵
Hamish Ivison (@hamishivi)'s Twitter Profile Photo

How well do data-selection methods work for instruction-tuning at scale?

Turns out, when you look at large, varied data pools, lots of recent methods lag behind simple baselines, and a simple embedding-based method (RDS) does best!

More below ⬇️ (1/8)
Zhiyuan Zeng (@zhiyuanzeng_)'s Twitter Profile Photo

Is a single accuracy number all we can get from model evals?🤔

🚨Does NOT tell where the model fails
🚨Does NOT tell how to improve it

Introducing EvalTree🌳
🔍identifying LM weaknesses in natural language
🚀weaknesses serve as actionable guidance

(paper&demo 🔗in🧵) [1/n]

Ilia Shumailov🦔 (@iliaishacked)'s Twitter Profile Photo

Are modern large language models (LLMs) vulnerable to privacy attacks that can determine if given data was used for training? Models and datasets are quite large; what should we even expect? Our new paper looks into this exact question. 🧵 (1/10)
