Mr. Agent (@agenticai)'s Twitter Profile
Mr. Agent

@agenticai

Creator of new things.

ID: 1794485331883892736

Joined: 25-05-2024 21:47:12

84 Tweets

66 Followers

270 Following

AK (@_akhaliq)

Husky

A Unified, Open-Source Language Agent for Multi-Step Reasoning

Language agents perform complex tasks by using tools to execute each step precisely. However, most existing agents are based on proprietary models or designed to target specific tasks, such as
Chief AI Officer (@chiefaioffice)

BREAKING: Mistral raises a $640M Series B led by General Catalyst at a $6B valuation.

Here's their Seed pitch deck to remind you of their vision:
Aran Komatsuzaki (@arankomatsuzaki)

Simple and Effective Masked Diffusion Language Models

Achieves a new SotA among diffusion models on a range of LM tasks and approaches AR perplexity

repo: github.com/kuleshov-group…
abs: arxiv.org/abs/2406.07524
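For context, a minimal sketch of the kind of masked-diffusion training step the abstract points at: sample a noise level, mask tokens at that rate, and train the model to recover the originals, reweighting the loss by the noise level. The 1/t weighting and uniform schedule below are simplifying assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

MASK_ID = 103  # placeholder mask-token id; depends on the tokenizer

def masked_diffusion_step(model, tokens, optimizer):
    """One illustrative training step: mask tokens at a random rate t,
    then train the model to recover the originals at masked positions."""
    b, n = tokens.shape
    t = torch.rand(b, 1, device=tokens.device).clamp(min=1e-3)   # noise level per sequence
    mask = torch.rand(b, n, device=tokens.device) < t            # mask each token w.p. t
    noised = torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)

    logits = model(noised)                                        # (b, n, vocab)
    loss_tok = F.cross_entropy(
        logits.view(-1, logits.size(-1)), tokens.view(-1), reduction="none"
    ).view(b, n)
    # Reweight by 1/t (a common simplification of the diffusion ELBO weighting)
    # and average only over masked positions.
    loss = ((loss_tok * mask) / t).sum() / mask.sum().clamp(min=1)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```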
Aran Komatsuzaki (@arankomatsuzaki)

Google presents Improve Mathematical Reasoning in Language Models by Automated Process Supervision

- MCTS for the efficient collection of high-quality process supervision data
- 51% -> 69.4% on MATH
- No human intervention

arxiv.org/abs/2406.06592
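The paper's collection pipeline is MCTS-based; as a rough illustration of the underlying idea (scoring each reasoning step automatically by whether completions from it still reach the right answer), here is a plain Monte Carlo version. `sample_completion` and `is_correct` are hypothetical callables, and the tree-search and binary-search refinements are omitted.

```python
def step_values(problem, steps, sample_completion, is_correct, n_rollouts=8):
    """Estimate a value for each solution-step prefix by Monte Carlo rollouts:
    complete the solution from that prefix several times and record how often
    the final answer is correct. A sharp drop flags a likely-erroneous step."""
    values = []
    for i in range(1, len(steps) + 1):
        prefix = "\n".join(steps[:i])
        wins = sum(
            is_correct(sample_completion(problem, prefix))
            for _ in range(n_rollouts)
        )
        values.append(wins / n_rollouts)
    return values  # e.g. [0.9, 0.8, 0.1, ...] -> error likely introduced at step 3
```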
Sumit (@_reachsumit)

Synthetic Query Generation using Large Language Models for Virtual Assistants

Apple investigates the use of LLMs to generate synthetic queries for virtual assistants that are similar to real user queries and specific to retrieving relevant entities.

📝arxiv.org/abs/2406.06729
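As a rough idea of what such generation can look like (not Apple's actual prompts or pipeline), one can ask an LLM for spoken-style queries tied to a target entity; `llm` below is a placeholder completion function.

```python
def synthetic_queries(entity, llm, n=5):
    """Prompt an LLM for short, natural virtual-assistant queries that should
    retrieve the given entity. The prompt wording and the `llm` callable are
    placeholders, not the paper's setup."""
    prompt = (
        f"Write {n} short voice-assistant requests a user might say when they "
        f"want '{entity}'. One per line, casual spoken style, no numbering."
    )
    lines = [q.strip() for q in llm(prompt).splitlines() if q.strip()]
    return lines[:n]

# e.g. synthetic_queries("Bohemian Rhapsody by Queen", llm)
# -> ["play bohemian rhapsody", "put on that queen song", ...]
```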
elvis (@omarsar0)

Towards Lifelong Learning of LLMs

Nice survey on techniques to enable LLMs to learn continuously, integrate new knowledge, retain previously learned information, and prevent catastrophic forgetting.

arxiv.org/abs/2406.06391
Bindu Reddy (@bindureddy)

Announcing LiveBench AI - The WORLD'S FIRST LLM Benchmark That Can't Be Gamed!!

We (Abacus AI) partnered with Yann LeCun and his team to create LiveBench AI!

LiveBench is a living/breathing benchmark with new challenges that you CAN'T simply memorize. Unlike blind human eval,
DeepSeek (@deepseek_ai)

DeepSeek-Coder-V2: First Open Source Model Beats GPT4-Turbo in Coding and Math

> Excels in coding and math, beating GPT4-Turbo, Claude3-Opus, Gemini-1.5Pro, Codestral.
> Supports 338 programming languages and 128K context length.
> Fully open-sourced with two sizes: 230B (also
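For anyone wanting to try the open weights, a typical Hugging Face transformers invocation would look roughly like the sketch below; the repo id and chat-template usage are assumptions based on standard releases, so check the official model card before running it.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id is assumed from common hub naming; verify against the official release.
model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Write a Python function that checks whether a number is prime."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```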
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)

Learning Iterative Reasoning through Energy Diffusion

abs: arxiv.org/abs/2406.11179
project page: energy-based-model.github.io/ired/

"IRED learns energy functions to represent the constraints between input conditions and desired outputs. After training, IRED adapts the number of
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)

Transcendence: Generative Models Can Outperform The Experts That Train Them

abs: arxiv.org/abs/2406.11741

Uses chess games as a simple testbed for studying transcendence: generative models trained on human labels that outperform humans.

Transformer models are trained on public
Aran Komatsuzaki (@arankomatsuzaki)

How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Reveals several important insights into the dynamics of factual knowledge acquisition during pretraining

arxiv.org/abs/2406.11813
Harrison Chase (@hwchase17)

I have lots of thoughts on "agents"!

❓What is an agent? Why don't basic agents work reliably? How are teams bringing "agentic" applications to production?

🙏I had a lot of fun talking about these topics (and more!) for nearly an hour with Sonya/Pat

open.spotify.com/episode/786INO…
elvis (@omarsar0)

From RAG to Rich Parameters

Investigates more closely how LLMs utilize external knowledge over parametric information for factual queries.

Finds that in a RAG pipeline, LLMs take a “shortcut” and display a strong bias towards utilizing only the context information to answer the
Sumit (@_reachsumit)

A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges

Explores applications of LLMs in various financial tasks, discussing the challenges, opportunities, and resources for further development in this domain.

📝arxiv.org/abs/2406.11903
Rohan Paul (@rohanpaul_ai)

Transformer models can learn robust reasoning skills (beyond those of GPT-4-Turbo and Gemini-1.5-Pro) through a stage of training dynamics that continues far beyond the point of overfitting (i.e. with 'Grokking') 🤯

For a challenging reasoning task with a large search space,
Aran Komatsuzaki (@arankomatsuzaki)

Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities

Enables MLLMs to express intermediate reasoning as images using code. You probably didn't use typography knowledge to solve this query

proj: whiteboard.cs.columbia.edu
abs: arxiv.org/abs/2406.14562
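A simplified sketch of the loop the tweet describes: the model writes drawing code, the code is executed to render a "whiteboard" image, and the image is fed back to the model for the final answer. The two model calls are placeholders (not the paper's interface), and executing model-written code should of course be sandboxed.

```python
import os
import subprocess
import tempfile

def whiteboard_of_thought(query, ask_for_code, ask_with_image):
    """Have the multimodal model draw its intermediate reasoning as an image
    instead of describing it in text, then answer using that rendering.
    `ask_for_code` and `ask_with_image` are hypothetical model calls."""
    code = ask_for_code(
        f"Write matplotlib code that draws a whiteboard sketch helpful for answering: {query}. "
        "Save the figure to out.png."
    )
    with tempfile.TemporaryDirectory() as d:
        script = os.path.join(d, "draw.py")
        with open(script, "w") as f:
            f.write(code)
        subprocess.run(["python", script], cwd=d, check=True, timeout=30)  # render the whiteboard
        return ask_with_image(query, os.path.join(d, "out.png"))           # answer grounded in the image
```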
Harrison Chase (@hwchase17)

❓What is an agent?

I get asked this question a lot, so I wrote a little blog on this topic and other things:
- What is an agent?
- What does it mean to be agentic?
- Why is “agentic” a helpful concept?
- Agentic is new

Check it out here: blog.langchain.dev/what-is-an-age…
Namgyu Ho (@itsnamgyu)

Do you know your LLM uses less than 1% of your GPU at inference? Too much time is wasted on KV cache memory access ➡️ We tackle this with the 🎁 Block Transformer: a global-to-local architecture that speeds up decoding up to 20x 🚀

<a href="/kaist_ai/">KAIST AI</a> <a href="/LG_AI_Research/">LG AI Research</a> w/ <a href="/GoogleDeepMind/">Google DeepMind</a> 🧵