Tianjun Zhang (@tianjun_zhang) 's Twitter Profile
Tianjun Zhang

@tianjun_zhang

Project Lead of LiveCodeBench, RAFT and Gorilla LLM, PhD student @berkeley_ai

ID: 841759489502121984

Link: http://tianjunz.github.io · Joined: 14-03-2017 21:14:36

151 Tweets

1.1K Followers

964 Following

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Significant performance improvements over previous SOTA methods: 11% on Game of 24, 20% on Geometric Shapes and 51% on Checkmate-in-One.

repo: github.com/YangLing0818/b…
abs: arxiv.org/abs/2406.04271
Tianjun Zhang (@tianjun_zhang) 's Twitter Profile Photo

🤔 Why can LLMs only follow one thought template (e.g., CoT)? In our paper, LLMs can select their own thought process flexibly! Big improvement on agentic tasks! 🎉

elvis (@omarsar0) 's Twitter Profile Photo

Thought-Augmented Reasoning with LLMs

Presents a thought-augmented reasoning approach, Buffer of Thoughts, to enhance the accuracy, efficiency, and robustness of LLM-based reasoning. 

It leverages a meta-buffer containing high-level thoughts (thought templates) distilled from
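
The implementation lives in the repo linked above; purely as an illustration of the meta-buffer idea, here is a hedged Python sketch, not the authors' code. The `embed` and `call_llm` callables are placeholders.

```python
# Illustrative sketch of a Buffer-of-Thoughts-style meta-buffer (not the paper's code).
from dataclasses import dataclass
import numpy as np

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

@dataclass
class ThoughtTemplate:
    task_description: str   # e.g. "24-point arithmetic puzzle"
    template: str           # distilled high-level reasoning recipe

class MetaBuffer:
    def __init__(self, templates):
        self.templates = templates

    def retrieve(self, problem, embed):
        # Pick the stored template whose task description best matches the problem.
        return max(self.templates,
                   key=lambda t: cos_sim(embed(t.task_description), embed(problem)))

def solve_with_bot(problem, buffer, embed, call_llm):
    template = buffer.retrieve(problem, embed)
    # Instantiate the high-level template on the concrete problem instance.
    prompt = (f"Reasoning template:\n{template.template}\n\n"
              f"Problem:\n{problem}\n\nFollow the template step by step.")
    return call_llm(prompt)
```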
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

This paper claims that Llama3-8B+BoT (Buffer of Thoughts) has the potential to surpass the Llama3-70B model. 🤯

'Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models'

- Proposes a buffer-manager to dynamically update the meta-buffer, thus enhancing the capacity
Aviral Kumar (@aviral_kumar2) 's Twitter Profile Photo

Two new papers on self-improvement: paper 1 today ⬇️

In RISE, we build on online imitation to teach LLMs *how* to improve their own responses *sequentially*. 

w/ Llama2/3/Mistral, this gives solid +10-20% in 5 turns, outperforms parallel sampling! 

cohenqu.github.io/rise.github.io/ 🧵⬇️
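
RISE's actual training procedure is described at the link; as a rough, assumed sketch of the inference-time idea (a model sequentially revising its own answer over several turns), with `generate` and `verifier` as placeholder callables:

```python
# Illustrative multi-turn self-improvement loop (a sketch, not the RISE algorithm).
def sequential_improve(question, generate, turns=5, verifier=None):
    answers = [generate(f"Question: {question}\nAnswer step by step.")]
    for _ in range(turns - 1):
        prompt = (f"Question: {question}\n"
                  f"Previous attempt:\n{answers[-1]}\n"
                  "Point out any mistake in the attempt, then give an improved answer.")
        answers.append(generate(prompt))
        if verifier is not None and verifier(answers[-1]):
            break  # stop early once an external checker accepts the answer
    return answers[-1], answers
```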
Tianjun Zhang (@tianjun_zhang) 's Twitter Profile Photo

Congrats to OpenAI on the impressive performance of the o1 model! It seems o1 already achieves 76% on LiveCodeBench; how should we improve it to make it harder? 🤔🤔

Agentica Project (@agentica_) 's Twitter Profile Photo

✨RL magic is in the air! Introducing DeepScaleR-1.5B-Preview—a fully open-source, 1.5B-parameter model trained with RL to surpass o1-preview for general math reasoning.

📜Blog: pretty-radio-b75.notion.site/DeepScaleR-Sur…
💻Github: github.com/agentica-proje…
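
The exact recipe is in the blog and repo above; as a hedged sketch of the kind of verifiable outcome reward typically used for RL on math reasoning (an assumption about the general setup, not DeepScaleR's code):

```python
import re

def extract_boxed(rollout: str):
    """Return the content of the last \\boxed{...} in a model rollout, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", rollout)
    return matches[-1].strip() if matches else None

def outcome_reward(rollout: str, reference: str) -> float:
    """Binary outcome reward: 1.0 if the final answer matches the reference, else 0.0."""
    pred = extract_boxed(rollout)
    return 1.0 if pred is not None and pred == reference.strip() else 0.0

# e.g. outcome_reward("... so the answer is \\boxed{42}", "42") -> 1.0
```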
Brandon Trabucco @ ICLR (@brandontrabucco) 's Twitter Profile Photo

With the success of LLM agents like OpenAI Operator, we are entering a new scaling era, but how do we train these agent models? We present InSTA, the largest training environment for LLM agents, containing live web navigation tasks for 150k diverse websites in multiple

Simon Guo 🦝 (@simonguozirui) 's Twitter Profile Photo

LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench!

Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time.

More 🧵👇
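
The real evaluation harness is in the KernelBench paper and repo; as a rough sketch of how a generated kernel might be timed against the PyTorch eager baseline (the `candidate_op` name is hypothetical):

```python
import time
import torch

def benchmark(fn, *args, warmup=10, iters=100):
    """Wall-clock a CUDA op; synchronize so we time the kernel, not just the launch."""
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

x = torch.randn(4096, 4096, device="cuda")
baseline_t = benchmark(torch.nn.functional.gelu, x)   # PyTorch Eager reference
# candidate_t = benchmark(candidate_op, x)            # LLM-generated kernel (placeholder)
# speedup = baseline_t / candidate_t
```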
Wayne Chi (@iamwaynechi) 's Twitter Profile Photo

What do developers 𝘳𝘦𝘢𝘭𝘭𝘺 think of AI coding assistants?

In October, we launched Copilot Arena to collect user preferences on real dev workflows. After months of live service, we’re here to share our findings in our recent preprint.

Here's what we have learned /🧵
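
The preprint describes the actual methodology; purely as a generic illustration of turning pairwise preference votes into a ranking (an assumption, not Copilot Arena's pipeline), a simple Elo update over battles:

```python
from collections import defaultdict

def elo_ratings(battles, k=32, base=1000.0):
    """battles: iterable of (model_a, model_b, winner) with winner in {"a", "b"}.
    Returns Elo-style ratings. Illustration only, not the paper's method."""
    ratings = defaultdict(lambda: base)
    for a, b, winner in battles:
        expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400))
        score_a = 1.0 if winner == "a" else 0.0
        ratings[a] += k * (score_a - expected_a)
        ratings[b] += k * ((1.0 - score_a) - (1.0 - expected_a))
    return dict(ratings)

# Example: elo_ratings([("model_x", "model_y", "a"), ("model_y", "model_x", "b")])
```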
Yuxiao Qu (@quyuxiao) 's Twitter Profile Photo

🚨 NEW PAPER: "Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning"!

🤔 With all these long-reasoning LLMs, what are we actually optimizing for? Length penalties? Token budgets? We needed a better way to think about it!

Website: cohenqu.github.io/mrt.github.io/

🧵[1/9]
Agentica Project (@agentica_) 's Twitter Profile Photo

Introducing DeepCoder-14B-Preview - our fully open-sourced reasoning model reaching o1 and o3-mini level on coding and math.

The best part is, we’re releasing everything: not just the model, but the dataset, code, and training recipe—so you can train it yourself!🔥

Links below: