Zhepeng Cen (@zhepengcen)'s Twitter Profile
Zhepeng Cen

@zhepengcen

PhD student @ CMU

ID: 906700014864363521

Joined: 10-09-2017 02:05:04

4 Tweets

24 Followers

23 Following

Ke Yang (@empathyang)'s Twitter Profile Photo

🙌 Happy New Year everyone!
🤖 New preprint: TinyHelen's First Curriculum: Training and Evaluating Tiny Language Models in a Simpler Language Environment
🤖 We train and evaluate tiny language models (LMs) using a novel text dataset with systematically simplified vocabularies and
Zhepeng Cen (@zhepengcen)'s Twitter Profile Photo

🚀 Introducing BRIDGE – a task-agnostic data augmentation strategy to prepare LLMs for RL!

🤖 Why do LLMs often fail to benefit from RL fine-tuning? We pinpoint two key factors: 1) 🔍 Rollout Accuracy 2) 🔗 Data Co-Influence. 💡 BRIDGE injects both exploration & exploitation
Leo Liu (@zeyuliu10)'s Twitter Profile Photo

LLMs trained to memorize new facts can't use those facts well. 🤔

We apply a hypernetwork to ✏️edit✏️ the gradients for fact propagation, improving accuracy by 2x on a challenging subset of RippleEdit! 💡

Our approach, PropMEND, extends MEND with a new objective for propagation.
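The gradient-editing idea can be sketched numerically. Everything below is a hypothetical toy, not PropMEND itself: `edit_gradient`, the low-rank shapes, and the step size are inventions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def edit_gradient(grad, U, V):
    """Hypothetical hypernetwork: a low-rank nonlinear remap of a
    flattened raw gradient into an 'edited' gradient."""
    return U @ np.tanh(V @ grad)  # (d,) -> (r,) -> (d,)

d, r = 8, 3                       # toy dimensions
U = rng.normal(scale=0.1, size=(d, r))
V = rng.normal(scale=0.1, size=(r, d))

raw_grad = rng.normal(size=d)     # gradient from a fact-injection loss
edited = edit_gradient(raw_grad, U, V)

weights = np.zeros(d)
weights -= 0.5 * edited           # the update applies the edited gradient
```

In the real method, the hypernetwork is trained so that the edited update helps the model answer questions that depend on the new fact, rather than merely recite it.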
Salesforce AI Research (@sfresearch)'s Twitter Profile Photo

🚨 Introducing LoCoBench: a comprehensive benchmark for evaluating long-context LLMs in complex software development

📄 Paper: bit.ly/4ponX3P
🔗 GitHub: bit.ly/4pvIfbZ

✨ Key Features:
📊 8,000 evaluation scenarios across 10 programming languages
🔍 Context
Weiran Yao (@iscreamnearby)'s Twitter Profile Photo

Today my team at Salesforce AI Research drops CoDA-1.7B: a text diffusion coding model that outputs tokens bidirectionally in parallel.

⚡️ Faster inference, 1.7B rivaling 7B.
📊 54.3% HumanEval | 47.6% HumanEval+ | 55.4% EvalPlus

🤗 HF: huggingface.co/Salesforce/CoD…

Any questions, lmk!
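The contrast with left-to-right decoding can be illustrated with a toy masked-denoising loop. Nothing below is CoDA's actual architecture: `toy_denoiser` simply "knows" a target string, and the confidence-based fill schedule is a made-up stand-in for how diffusion decoders commit several positions per step in any order.

```python
import random

MASK = "<mask>"
TARGET = list("return x + 1")  # stand-in ground truth the toy denoiser "knows"

def toy_denoiser(seq):
    """Toy stand-in for the diffusion LM: for each still-masked position,
    propose a token plus a confidence score. A real model would predict
    from bidirectional context instead of peeking at TARGET."""
    return {i: (TARGET[i], random.random()) for i, t in enumerate(seq) if t == MASK}

def parallel_decode(length, steps=4):
    seq = [MASK] * length
    for _ in range(steps):
        preds = toy_denoiser(seq)
        if not preds:
            break
        # Commit the highest-confidence half of the remaining masks in one
        # parallel step -- positions unmask in any order, not left to right.
        k = max(1, len(preds) // 2)
        for i, (tok, _) in sorted(preds.items(), key=lambda kv: -kv[1][1])[:k]:
            seq[i] = tok
    for i, (tok, _) in toy_denoiser(seq).items():  # finish any leftovers
        seq[i] = tok
    return "".join(seq)

decoded = parallel_decode(len(TARGET))
```

Because several positions are filled per step, the loop finishes in a handful of iterations instead of one pass per token, which is the intuition behind the faster-inference claim.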
Zhepeng Cen (@zhepengcen)'s Twitter Profile Photo

🚀 Scaling RL to Pretraining Levels with Webscale-RL

RL for LLMs has been bottlenecked by tiny datasets (<10B tokens) vs pretraining (>1T).
Our Webscale-RL pipeline converts pretraining text into diverse RL-ready QA data – scaling RL to pretraining levels!

All codes and
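As a rough illustration of the text-to-QA idea (not the actual pipeline, which is presumably LLM-driven and far more general), a trivial rule-based converter might mine "X is Y" statements from raw text; `text_to_qa` and its regex are inventions for this sketch.

```python
import re

def text_to_qa(passage):
    """Hypothetical extractor: turn declarative 'X is Y' sentences from raw
    pretraining text into verifiable question/answer pairs for RL training."""
    pairs = []
    for sent in re.split(r"(?<=[.!?])\s+", passage.strip()):
        m = re.match(r"(.+?)\s+is\s+(.+?)[.!?]?$", sent)
        if m:
            subject, answer = m.groups()
            # Phrase the subject as a question; the answer is checkable,
            # which is what makes the pair usable as an RL reward signal.
            pairs.append((f"What is {subject[0].lower()}{subject[1:]}?", answer))
    return pairs

qa = text_to_qa("The capital of France is Paris. Water is H2O.")
```

The point of the conversion is that each pair carries a verifiable answer, so ordinary pretraining text becomes reward-checkable RL data at web scale.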
Weiran Yao (@iscreamnearby)'s Twitter Profile Photo

Today I finally get to share something our team has been quietly grinding on for months – we've created an open sourced version of Cursor Bench.

If you've been following Cursor's Composer launch and their internal "Cursor Bench" for testing