Zhepeng Cen (@zhepengcen)'s Twitter Profile
Zhepeng Cen

@zhepengcen

PhD student @ CMU

ID: 906700014864363521

Joined: 10-09-2017 02:05:04

4 Tweets

24 Followers

23 Following

Ke Yang (@empathyang)'s Twitter Profile Photo

🙌 Happy New Year everyone!
🤖 New preprint: TinyHelen's First Curriculum: Training and Evaluating Tiny Language Models in a Simpler Language Environment
🤖 We train and evaluate tiny language models (LMs) using a novel text dataset with systematically simplified vocabularies and
Zhepeng Cen (@zhepengcen)'s Twitter Profile Photo

🚀 Introducing BRIDGE – a task-agnostic data augmentation strategy to prepare LLMs for RL!

🤖 Why do LLMs often fail to benefit from RL fine-tuning? We pinpoint two key factors: 1) 🔍 Rollout Accuracy 2) 🔗 Data Co-Influence. 💡 BRIDGE injects both exploration & exploitation
Leo Liu (@zeyuliu10)'s Twitter Profile Photo

LLMs trained to memorize new facts can't use those facts well. 🤔

We apply a hypernetwork to ✏️edit✏️ the gradients for fact propagation, improving accuracy by 2x on a challenging subset of RippleEdit! 💡

Our approach, PropMEND, extends MEND with a new objective for propagation.
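The gradient-editing idea can be sketched numerically. Everything below is a hypothetical toy, not PropMEND itself: `edit_gradient`, the low-rank shapes, and the step size are inventions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def edit_gradient(grad, U, V):
    """Hypothetical hypernetwork: a low-rank nonlinear remap of a
    flattened raw gradient into an 'edited' gradient."""
    return U @ np.tanh(V @ grad)  # (d,) -> (r,) -> (d,)

d, r = 8, 3                       # toy dimensions
U = rng.normal(scale=0.1, size=(d, r))
V = rng.normal(scale=0.1, size=(r, d))

raw_grad = rng.normal(size=d)     # gradient from a fact-injection loss
edited = edit_gradient(raw_grad, U, V)

weights = np.zeros(d)
weights -= 0.5 * edited           # the update applies the edited gradient
```

In the real method, the hypernetwork is trained so that the edited update helps the model answer questions that depend on the new fact, rather than merely recite it.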
Salesforce AI Research (@sfresearch)'s Twitter Profile Photo

🚨 Introducing LoCoBench: a comprehensive benchmark for evaluating long-context LLMs in complex software development

📄 Paper: bit.ly/4ponX3P
🔗 GitHub: bit.ly/4pvIfbZ

✨ Key Features:
📊 8,000 evaluation scenarios across 10 programming languages
🔍 Context
Weiran Yao (@iscreamnearby)'s Twitter Profile Photo

Today my team at Salesforce AI Research drops CoDA-1.7B: a text diffusion coding model that outputs tokens bidirectionally in parallel.

⚡️ Faster inference, 1.7B rivaling 7B.
📊 54.3% HumanEval | 47.6% HumanEval+ | 55.4% EvalPlus

🤗 HF: huggingface.co/Salesforce/CoD…

Any questions, lmk!
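The contrast with left-to-right decoding can be illustrated with a toy masked-denoising loop. Nothing below is CoDA's actual architecture: `toy_denoiser` simply "knows" a target string, and the confidence-based fill schedule is a made-up stand-in for how diffusion decoders commit several positions per step in any order.

```python
import random

MASK = "<mask>"
TARGET = list("return x + 1")  # stand-in ground truth the toy denoiser "knows"

def toy_denoiser(seq):
    """Toy stand-in for the diffusion LM: for each still-masked position,
    propose a token plus a confidence score. A real model would predict
    from bidirectional context instead of peeking at TARGET."""
    return {i: (TARGET[i], random.random()) for i, t in enumerate(seq) if t == MASK}

def parallel_decode(length, steps=4):
    seq = [MASK] * length
    for _ in range(steps):
        preds = toy_denoiser(seq)
        if not preds:
            break
        # Commit the highest-confidence half of the remaining masks in one
        # parallel step -- positions unmask in any order, not left to right.
        k = max(1, len(preds) // 2)
        for i, (tok, _) in sorted(preds.items(), key=lambda kv: -kv[1][1])[:k]:
            seq[i] = tok
    for i, (tok, _) in toy_denoiser(seq).items():  # finish any leftovers
        seq[i] = tok
    return "".join(seq)

decoded = parallel_decode(len(TARGET))
```

Because several positions are filled per step, the loop finishes in a handful of iterations instead of one pass per token, which is the intuition behind the faster-inference claim.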
Zhepeng Cen (@zhepengcen)'s Twitter Profile Photo

🚀 Scaling RL to Pretraining Levels with Webscale-RL

RL for LLMs has been bottlenecked by tiny datasets (<10B tokens) vs pretraining (>1T).
Our Webscale-RL pipeline converts pretraining text into diverse RL-ready QA data – scaling RL to pretraining levels!

All codes and
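As a rough illustration of the text-to-QA idea (not the actual pipeline, which is presumably LLM-driven and far more general), a trivial rule-based converter might mine "X is Y" statements from raw text; `text_to_qa` and its regex are inventions for this sketch.

```python
import re

def text_to_qa(passage):
    """Hypothetical extractor: turn declarative 'X is Y' sentences from raw
    pretraining text into verifiable question/answer pairs for RL training."""
    pairs = []
    for sent in re.split(r"(?<=[.!?])\s+", passage.strip()):
        m = re.match(r"(.+?)\s+is\s+(.+?)[.!?]?$", sent)
        if m:
            subject, answer = m.groups()
            # Phrase the subject as a question; the answer is checkable,
            # which is what makes the pair usable as an RL reward signal.
            pairs.append((f"What is {subject[0].lower()}{subject[1:]}?", answer))
    return pairs

qa = text_to_qa("The capital of France is Paris. Water is H2O.")
```

The point of the conversion is that each pair carries a verifiable answer, so ordinary pretraining text becomes reward-checkable RL data at web scale.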
Weiran Yao (@iscreamnearby)'s Twitter Profile Photo

Today I finally get to share something our team has been quietly grinding on for months – we've created an open sourced version of Cursor Bench.

If you've been following Cursor's Composer launch and their internal "Cursor Bench" for testing