Tianjun Zhang (@tianjun_zhang) 's Twitter Profile
Tianjun Zhang

@tianjun_zhang

Project Lead of LiveCodeBench, RAFT and Gorilla LLM, PhD student @berkeley_ai

ID: 841759489502121984

Link: http://tianjunz.github.io · Joined: 14-03-2017 21:14:36

151 Tweets

1.1K Followers

964 Following

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Significant performance improvements over previous SOTA methods: 11% on Game of 24, 20% on Geometric Shapes and 51% on Checkmate-in-One.

repo: github.com/YangLing0818/b…
abs: arxiv.org/abs/2406.04271
Tianjun Zhang (@tianjun_zhang) 's Twitter Profile Photo

🤔 Why can LLMs only follow one thought template (e.g., CoT)? In our paper, LLMs can select their own thought process flexibly! Big improvement on agentic tasks! 🎉

elvis (@omarsar0) 's Twitter Profile Photo

Thought-Augmented Reasoning with LLMs

Presents a thought-augmented reasoning approach, Buffer of Thoughts, to enhance the accuracy, efficiency, and robustness of LLM-based reasoning. 

It leverages a meta-buffer containing high-level thoughts (thought templates) distilled from
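
The implementation lives in the repo linked above; purely as an illustration of the meta-buffer idea, here is a hedged Python sketch, not the authors' code. The `embed` and `call_llm` callables are placeholders.

```python
# Illustrative sketch of a Buffer-of-Thoughts-style meta-buffer (not the paper's code).
from dataclasses import dataclass
import numpy as np

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

@dataclass
class ThoughtTemplate:
    task_description: str   # e.g. "24-point arithmetic puzzle"
    template: str           # distilled high-level reasoning recipe

class MetaBuffer:
    def __init__(self, templates):
        self.templates = templates

    def retrieve(self, problem, embed):
        # Pick the stored template whose task description best matches the problem.
        return max(self.templates,
                   key=lambda t: cos_sim(embed(t.task_description), embed(problem)))

def solve_with_bot(problem, buffer, embed, call_llm):
    template = buffer.retrieve(problem, embed)
    # Instantiate the high-level template on the concrete problem instance.
    prompt = (f"Reasoning template:\n{template.template}\n\n"
              f"Problem:\n{problem}\n\nFollow the template step by step.")
    return call_llm(prompt)
```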
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

This paper claims that Llama3-8B+BoT (Buffer of Thoughts) has the potential to surpass the Llama3-70B model. 🤯

'Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models'

- Proposes a buffer-manager to dynamically update the meta-buffer, thus enhancing the capacity
Aviral Kumar (@aviral_kumar2) 's Twitter Profile Photo

Two new papers on self-improvement: paper 1 today ⬇️

In RISE, we build on online imitation to teach LLMs *how* to improve their own responses *sequentially*. 

w/ Llama2/3/Mistral, this gives solid +10-20% in 5 turns, outperforms parallel sampling! 

cohenqu.github.io/rise.github.io/ 🧵⬇️
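
RISE's actual training procedure is described at the link; as a rough, assumed sketch of the inference-time idea (a model sequentially revising its own answer over several turns), with `generate` and `verifier` as placeholder callables:

```python
# Illustrative multi-turn self-improvement loop (a sketch, not the RISE algorithm).
def sequential_improve(question, generate, turns=5, verifier=None):
    answers = [generate(f"Question: {question}\nAnswer step by step.")]
    for _ in range(turns - 1):
        prompt = (f"Question: {question}\n"
                  f"Previous attempt:\n{answers[-1]}\n"
                  "Point out any mistake in the attempt, then give an improved answer.")
        answers.append(generate(prompt))
        if verifier is not None and verifier(answers[-1]):
            break  # stop early once an external checker accepts the answer
    return answers[-1], answers
```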
Tianjun Zhang (@tianjun_zhang) 's Twitter Profile Photo

Congrats to OpenAI on the impressive performance of the o1 model! It seems o1 already achieves 76% on LiveCodeBench; how should we improve it to make it harder? 🤔🤔

Agentica Project (@agentica_) 's Twitter Profile Photo

✨RL magic is in the air! Introducing DeepScaleR-1.5B-Preview—a fully open-source, 1.5B-parameter model trained with RL to surpass o1-preview for general math reasoning.

📜Blog: pretty-radio-b75.notion.site/DeepScaleR-Sur…
💻Github: github.com/agentica-proje…
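
The exact recipe is in the blog and repo above; as a hedged sketch of the kind of verifiable outcome reward typically used for RL on math reasoning (an assumption about the general setup, not DeepScaleR's code):

```python
import re

def extract_boxed(rollout: str):
    """Return the content of the last \\boxed{...} in a model rollout, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", rollout)
    return matches[-1].strip() if matches else None

def outcome_reward(rollout: str, reference: str) -> float:
    """Binary outcome reward: 1.0 if the final answer matches the reference, else 0.0."""
    pred = extract_boxed(rollout)
    return 1.0 if pred is not None and pred == reference.strip() else 0.0

# e.g. outcome_reward("... so the answer is \\boxed{42}", "42") -> 1.0
```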
Brandon Trabucco @ ICLR (@brandontrabucco) 's Twitter Profile Photo

With the success of LLM agents like OpenAI Operator, we are entering a new scaling era, but how do we train these agent models? We present InSTA, the largest training environment for LLM agents, containing live web navigation tasks for 150k diverse websites in multiple

Simon Guo 🦝 (@simonguozirui) 's Twitter Profile Photo

LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench!

Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time.

More 🧵👇
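
The real evaluation harness is in the KernelBench paper and repo; as a rough sketch of how a generated kernel might be timed against the PyTorch eager baseline (the `candidate_op` name is hypothetical):

```python
import time
import torch

def benchmark(fn, *args, warmup=10, iters=100):
    """Wall-clock a CUDA op; synchronize so we time the kernel, not just the launch."""
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

x = torch.randn(4096, 4096, device="cuda")
baseline_t = benchmark(torch.nn.functional.gelu, x)   # PyTorch Eager reference
# candidate_t = benchmark(candidate_op, x)            # LLM-generated kernel (placeholder)
# speedup = baseline_t / candidate_t
```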
Wayne Chi (@iamwaynechi) 's Twitter Profile Photo

What do developers 𝘳𝘦𝘢𝘭𝘭𝘺 think of AI coding assistants?

In October, we launched Copilot Arena to collect user preferences on real dev workflows. After months of live service, we’re here to share our findings in our recent preprint.

Here's what we have learned /🧵
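
The preprint describes the actual methodology; purely as a generic illustration of turning pairwise preference votes into a ranking (an assumption, not Copilot Arena's pipeline), a simple Elo update over battles:

```python
from collections import defaultdict

def elo_ratings(battles, k=32, base=1000.0):
    """battles: iterable of (model_a, model_b, winner) with winner in {"a", "b"}.
    Returns Elo-style ratings. Illustration only, not the paper's method."""
    ratings = defaultdict(lambda: base)
    for a, b, winner in battles:
        expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400))
        score_a = 1.0 if winner == "a" else 0.0
        ratings[a] += k * (score_a - expected_a)
        ratings[b] += k * ((1.0 - score_a) - (1.0 - expected_a))
    return dict(ratings)

# Example: elo_ratings([("model_x", "model_y", "a"), ("model_y", "model_x", "b")])
```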
Yuxiao Qu (@quyuxiao) 's Twitter Profile Photo

🚨 NEW PAPER: "Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning"!

🤔 With all these long-reasoning LLMs, what are we actually optimizing for? Length penalties? Token budgets? We needed a better way to think about it!

Website: cohenqu.github.io/mrt.github.io/

🧵[1/9]
Agentica Project (@agentica_) 's Twitter Profile Photo

Introducing DeepCoder-14B-Preview - our fully open-sourced reasoning model reaching o1 and o3-mini level on coding and math.

The best part is, we’re releasing everything: not just the model, but the dataset, code, and training recipe—so you can train it yourself!🔥

Links below: