Yutong (Kelly) He (@electronickale)'s Twitter Profile
Yutong (Kelly) He

@electronickale

PhD student @mldcmu, I’m so delusional that doing generative modeling is my job

ID: 1375541940867887104

Link: https://kellyyutonghe.github.io · Joined: 26-03-2021 20:15:39

86 Tweets

819 Followers

382 Following

Jian Ma (@jmuiuc):

A troubling incident unfolded at #NeurIPS2024, where a keynote speaker used a slide that perpetuated harmful stereotypes and racial biases against Chinese students and researchers. I wasn't attending the conference, but I watched the talk recording and followed this closely. 1/7

Dylan Sam (@dylanjsam):

To trust LLMs in deployment (e.g., agentic frameworks or for generating synthetic data), we should predict how well they will perform. Our paper shows that we can do this by simply asking black-box models multiple follow-up questions! w/ Marc Finzi and Zico Kolter 1/ 🧵

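A minimal sketch of the idea as described in the tweet (not necessarily the paper's exact recipe): probe a black-box model with follow-up questions and turn the replies into features for predicting whether the original answer is reliable. The `ask` callable and the probe wording are hypothetical placeholders for whatever chat API is used.

```python
from typing import Callable, List

def followup_features(ask: Callable[[List[dict]], str],
                      question: str,
                      followups: List[str]) -> List[float]:
    """Turn yes/no follow-up probes into simple 0/1 features."""
    history = [{"role": "user", "content": question}]
    answer = ask(history)                       # model's original answer
    history.append({"role": "assistant", "content": answer})

    feats = []
    for probe in followups:
        reply = ask(history + [{"role": "user", "content": probe}])
        feats.append(1.0 if reply.strip().lower().startswith("yes") else 0.0)
    return feats

# Illustrative probes (made up for this sketch):
PROBES = [
    "Are you confident in your previous answer? Answer yes or no.",
    "Would you give the same answer if asked again? Answer yes or no.",
    "Could any assumption behind your answer be wrong? Answer yes or no.",
]
# A small labeled set could then fit, e.g., a logistic-regression model from
# these features to the probability that the original answer is correct.
```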
Samuel Sokota (@ssokota):

Model-free deep RL algorithms like NFSP, PSRO, ESCHER, & R-NaD are tailor-made for games with hidden information (e.g. poker). We performed the largest-ever comparison of these algorithms. We find that they do not outperform generic policy gradient methods, such as PPO. 1/N

Dylan Sam (@dylanjsam):

Excited to share new work from my internship at Google AI! Curious how we should measure the similarity between examples in pretraining datasets? We study the role of similarity in pretraining 1.7B-parameter language models on the Pile. arxiv: arxiv.org/abs/2502.02494 1/🧵

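For a concrete sense of what "similarity between examples" can mean, here is one simple, generic measure (character n-gram Jaccard overlap); the paper may study different or additional notions, so treat this purely as an illustration.

```python
def ngram_set(text: str, n: int = 3) -> set:
    """Character n-grams of a lowercased string."""
    s = text.lower()
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}

def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap between the n-gram sets of two examples."""
    A, B = ngram_set(a, n), ngram_set(b, n)
    return len(A & B) / len(A | B) if A | B else 0.0

print(jaccard_similarity("the cat sat on the mat", "a cat sat on a mat"))
```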
Fahim Tajwar (@fahimtajwar10):

Interacting with the external world and reacting based on outcomes are crucial capabilities of agentic systems, but existing LLMs’ ability to do so is limited. Introducing Paprika 🌶️, our work on making LLMs general decision makers that can solve new tasks zero-shot. 🧵 1/n

Yutong (Kelly) He (@electronickale):

Dear program chairs of all conferences, please don’t put a 5000-character limit on our rebuttal response, especially when the reviewers get more than ten 7500-character text boxes to write their reviews, thank you so much

Fahim Tajwar (@fahimtajwar10):

RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers? Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training! 🧵 1/n

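One plausible reading of "models provide their own reward" is to reward agreement among a model's own sampled answers when no ground truth exists; whether this matches SRT exactly is not stated in the tweet, so the toy sketch below is only an illustration of that general idea.

```python
from collections import Counter
from typing import List

def self_consistency_rewards(answers: List[str]) -> List[float]:
    """Reward each sampled answer by whether it matches the majority answer."""
    counts = Counter(a.strip() for a in answers)
    majority, _ = counts.most_common(1)[0]
    return [1.0 if a.strip() == majority else 0.0 for a in answers]

samples = ["42", "42", "41", "42"]          # e.g., 4 sampled completions
print(self_consistency_rewards(samples))    # [1.0, 1.0, 0.0, 1.0]
# Such rewards could then be plugged into a standard RL objective (e.g., PPO-style training).
```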
Ricky T. Q. Chen (@rickytqchen):

Padding in our non-AR sequence models? Yuck. 🙅 👉 Instead of unmasking, our new work *Edit Flows* performs iterative refinements via position-relative inserts and deletes, operations naturally suited for variable-length sequence generation. Easily better than using mask tokens.
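To make the edit operations themselves concrete (this sketches only the operations, not the Edit Flows model or how it proposes them): a sequence is refined by applying inserts and deletes at positions, so its length can grow or shrink without any padding or mask tokens.

```python
from typing import List, Tuple

def apply_edits(tokens: List[str],
                inserts: List[Tuple[int, str]],
                deletes: List[int]) -> List[str]:
    """Apply deletes (by index), then inserts of (position, token)."""
    kept = [t for i, t in enumerate(tokens) if i not in set(deletes)]
    # Apply inserts right-to-left so earlier insertion positions stay valid.
    for pos, tok in sorted(inserts, reverse=True):
        kept.insert(pos, tok)
    return kept

seq = ["the", "cat", "sat"]
print(apply_edits(seq, inserts=[(3, "down")], deletes=[1]))
# ['the', 'sat', 'down']  -- length changed with no padding/mask tokens involved
```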