Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile
Aran Komatsuzaki

@arankomatsuzaki

@TeraflopAI

ID:794433401591693312

Link: https://arankomatsuzaki.wordpress.com/about-me/

Joined: 04-11-2016 06:57:37

4.8K Tweets

94.6K Followers

78 Following

Xiuyu Li (@xiuyu_l)

Handling long context in LLMs is expensive, but can we cut the cost by learning them offline for a specific set/genre of documents?

Introducing LLoCO, our new technique that learns documents offline through context compression and in-domain finetuning using LoRA, which achieves…

Jiayi Pan (@pan_jiayipan)

Thanks Aran for sharing!
AI feedback will enable autonomous evaluation and improvement of language agents at scale. We have a thread here if you wanna learn more :)

twitter.com/pan_jiayipan/s…

Aran Komatsuzaki (@arankomatsuzaki)

Autonomous Evaluation and Refinement of Digital Agents

Improves WebArena's GPT4 SotA agent by 30%+ and CogAgent in iOS by 75%, using only a VLM-based evaluator and no extra supervision

repo: github.com/Berkeley-NLP/A…
abs: arxiv.org/abs/2404.06474

Aran Komatsuzaki (@arankomatsuzaki)

GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications

Presents a runtime for LLMs with intuitive undo and damage-confinement abstractions, enabling the safer deployment of LLM agents in practice

repo: github.com/ShishirPatil/g…
abs:…

Zhibin Gou (@zebgou)

🔥 Not All Tokens Are What You Need!
🚀 Releasing the Rho-1 series, including the first 1B LLM to hit 40.6% on MATH.

Rho-1 introduces Selective Language Modeling (#SLM) for token-level pretraining data selection.

Thanks to AK and Aran Komatsuzaki for sharing our work!

Iker García-Ferrero (@iker_garciaf)

We have released the training corpus, models, and many multilingual evaluation benchmarks on Hugging Face: huggingface.co/collections/Hi…

We hope this project begins a wave of multilingual models for the medical domain!

Ruibo Liu (@RuiboLiu)

Thanks Aran for sharing our work!

This is a survey paper I’ve been thinking about for a long time, as we have seen an increasing need for synthetic data. As we will probably run out of fresh tokens soon, the audience of this paper should be everyone who cares about AI progress.

Aran Komatsuzaki (@arankomatsuzaki)

AssemblyAI presents Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping

Presents an end-to-end ASR model trained on 570k hours of speech data

arxiv.org/abs/2404.07341

Aran Komatsuzaki (@arankomatsuzaki)

Google presents Best Practices and Lessons Learned on Synthetic Data for Language Models

Provides an overview of synthetic data research, discussing its applications, challenges, and future directions

arxiv.org/abs/2404.07503

Aran Komatsuzaki (@arankomatsuzaki)

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

Several LLMs (e.g., GPT-4) perform on par w/ supervised methods like Random Forest on regression

repo: github.com/robertvacarean…
abs: arxiv.org/abs/2404.07544

Tianbao Xie (@TianbaoX)

Thanks Aran for sharing!! 🤗🤗

Let’s work together again, this time towards general-purpose computer agents, with multimodal language model agents.

(Will have an official post later.)

Aran Komatsuzaki (@arankomatsuzaki)

ByteDance presents InfiCoder-Eval

InfiCoder-Eval comprises 270 carefully picked high-quality StackOverflow questions, covering 18 programming languages, for evaluating code LLMs

proj: infi-coder.github.io/inficoder-eval/
abs: arxiv.org/abs/2404.07940

Aran Komatsuzaki (@arankomatsuzaki)

Microsoft presents Rho-1: Not All Tokens Are What You Need

RHO-1-1B and 7B achieve SotA results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens.

repo: github.com/microsoft/rho
abs: arxiv.org/abs/2404.07965

Aran Komatsuzaki (@arankomatsuzaki)

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

The first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating…

Aran Komatsuzaki (@arankomatsuzaki)

Apple presents Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Substantial improvements over SotA VLMs, thanks to its high-resolution scaling and fine-grained visual processing

arxiv.org/abs/2404.07973

Aran Komatsuzaki (@arankomatsuzaki)

LLoCO: Learning Long Contexts Offline

- Significantly outperforms ICL while using 30x fewer tokens during inference
- Achieves up to 7.62x speed-up and substantially reduces the cost of long document QA

arxiv.org/abs/2404.07979

Aran Komatsuzaki (@arankomatsuzaki)

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Proposes an approach that improves controllable generation by explicitly optimizing pixel-level cycle consistency

proj: liming-ai.github.io/ControlNet_Plu…
abs: arxiv.org/abs/2404.07987

Aran Komatsuzaki (@arankomatsuzaki)

Attention international students in the U.S. exploring work options!

Did you know that unlike OPT or full-time CPT, part-time CPT (under 20 hours/week) is not subject to a cap?

This means that you can work for as many semesters as you want w/o affecting your future CPT / OPT!
