Christina Baek (@_christinabaek)'s Twitter Profile
Christina Baek

@_christinabaek

PhD student @mldcmu | Past: intern @GoogleAI

ID: 1409604978004594688

Link: http://kebaek.github.io | Joined: 28-06-2021 20:11:47

53 Tweets

978 Followers

309 Following

Qing Qu (@qu_1006)'s Twitter Profile Photo

Our recent work won the Best Paper award at the NeurIPS 2023 Workshop on Diffusion Models. We recently made significant revisions to the entire article and re-uploaded a new version to arXiv (arxiv.org/abs/2310.05264), adding many new experiments and insights.
Pratyush Maini (@pratyushmaini)'s Twitter Profile Photo

1/ 🥁Scaling Laws for Data Filtering 🥁

TLDR: Data Curation *cannot* be compute agnostic!
In our #CVPR2024 paper, we develop the first scaling laws for heterogeneous & limited web data.

w/ Sachin Goyal, Zachary Lipton, Aditi Raghunathan, Zico Kolter
📝: arxiv.org/abs/2404.07177
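To make the thread's claim concrete: a scaling law here is a fitted curve of loss versus data quantity, estimated separately for each filtered data pool, so that pools can be compared at a given compute budget. Below is a minimal sketch of fitting one such curve; the power-law form, the data points, and the use of scipy's curve_fit are illustrative assumptions, not the paper's actual law or measurements.

```python
# Illustrative only: fit loss(n) = a * n^(-b) + c to made-up
# (dataset size, eval loss) points for a single filtered data pool.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n, a, b, c):
    # a: scale, b: decay exponent, c: irreducible loss floor
    return a * n ** (-b) + c

n = np.array([1e6, 3e6, 1e7, 3e7, 1e8])          # samples seen (hypothetical)
loss = np.array([3.20, 2.90, 2.60, 2.45, 2.35])  # eval loss (hypothetical)

(a, b, c), _ = curve_fit(scaling_law, n, loss, p0=[10.0, 0.2, 2.0])
print(f"loss(n) = {a:.2f} * n^(-{b:.3f}) + {c:.2f}")
# Fitting one curve per pool is what makes curation compute-dependent:
# an aggressively filtered pool can win at small n yet lose at large n
# once it is exhausted and repeated.
```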
Pratyush Maini (@pratyushmaini)'s Twitter Profile Photo

1/ What does it mean for an LLM to “memorize” a doc? Exactly regurgitating a NYT article? Of course. Just training on NYT? Harder to say.

We take big strides in this discourse w/*Adversarial Compression*
w/ Avi Schwarzschild, Zhili Feng, Zachary Lipton, Zico Kolter
🌐: locuslab.github.io/acr-memorizati… 🧵
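The compression framing can be made concrete: a target string counts as memorized when a prompt shorter than the string elicits it verbatim, i.e. the ratio of target length to prompt length exceeds 1. A rough sketch of that check follows; the gpt2 model is an arbitrary stand-in, and the adversarial search for the *shortest* eliciting prompt (the hard part) is omitted.

```python
# Illustrative only: the "is the prompt a net compression?" check.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # arbitrary stand-in
model = AutoModelForCausalLM.from_pretrained("gpt2")

def compression_ratio(prompt: str, target: str) -> float:
    """Target length / prompt length in tokens, counted only when
    greedy decoding reproduces the target exactly; 0 otherwise."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    target_ids = tok(target, add_special_tokens=False).input_ids
    out = model.generate(prompt_ids, max_new_tokens=len(target_ids),
                         do_sample=False)
    completion = out[0, prompt_ids.shape[1]:].tolist()
    if completion != target_ids:
        return 0.0  # the prompt fails to elicit the target verbatim
    return len(target_ids) / prompt_ids.shape[1]

# A ratio > 1 means the prompt compresses the target: the operational
# signal for memorization under this framing.
print(compression_ratio("my prompt", "some target passage"))
```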
Amrith Setlur (@setlur_amrith)'s Twitter Profile Photo

🚨 Interested in synthetic data and LLM reasoning? Our new work studies scaling laws for synthetic data and RL for math reasoning.
TLDR: Step-level RL (per-step DPO in fig) on self-generated answers improves sample efficiency of synthetic data by 8x! arxiv.org/abs/2406.14532

1/🧵
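For reference, the per-step DPO mentioned in the thread follows the standard DPO objective, applied to preferred vs. dispreferred reasoning steps rather than whole answers. A minimal sketch, assuming per-step log-probabilities have already been computed under the policy and under a frozen reference model:

```python
# Illustrative only: standard DPO loss on a (chosen step, rejected step)
# pair; the tensors stand in for summed token log-probs of each step.
import torch
import torch.nn.functional as F

def step_dpo_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Reward margin = policy log-ratio minus reference log-ratio.
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -F.logsigmoid(beta * margin).mean()

# Toy pair: a correct next reasoning step vs. an incorrect one,
# scored by the policy and by the frozen reference model.
loss = step_dpo_loss(torch.tensor([-5.0]), torch.tensor([-7.0]),
                     torch.tensor([-6.0]), torch.tensor([-6.5]))
print(loss.item())
```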
Zico Kolter (@zicokolter)'s Twitter Profile Photo

I'm excited to announce that I am joining the OpenAI Board of Directors. I'm looking forward to sharing my perspectives and expertise on AI safety and robustness to help guide the amazing work being done at OpenAI.

Kevin Li (@kevinyli_)'s Twitter Profile Photo

Attention is all you need; at least the matrices are, if you want to distill Transformers into alternative architectures, like Mamba, with our new distillation method: MOHAWK!

We also release a fully subquadratic, performant 1.5B model distilled from Phi-1.5 with only 3B tokens!
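A rough sketch of the matrix-matching intuition in the first line: self-attention materializes a sequence-mixing matrix, and a subquadratic student can be trained so its own mixing matrix matches the teacher's, layer by layer. The shapes, the Frobenius loss, and the toy tensors below are illustrative assumptions; MOHAWK's actual recipe involves further alignment and end-to-end distillation stages.

```python
# Illustrative only: Frobenius loss between the teacher's attention
# matrix and the student's materialized mixing matrix for one layer.
import torch

def matrix_alignment_loss(teacher_attn, student_mix):
    # Both shaped (batch, heads, seq_len, seq_len); matrix_norm
    # defaults to the Frobenius norm over the last two dims.
    return torch.linalg.matrix_norm(teacher_attn - student_mix).mean()

B, H, L = 2, 4, 16  # toy sizes
teacher = torch.softmax(torch.randn(B, H, L, L), dim=-1)
student_logits = torch.randn(B, H, L, L, requires_grad=True)
student = torch.softmax(student_logits, dim=-1)

loss = matrix_alignment_loss(teacher, student)
loss.backward()  # gradients flow back to the student's parameters
print(loss.item())
```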