Fanjia Yan (@fanjia_yan) 's Twitter Profile
Fanjia Yan

@fanjia_yan

EECS @cal

ID: 1520825657541812224

Joined: 01-05-2022 18:01:16

11 Tweets

55 Followers

81 Following

Shishir Patil (@shishirpatil_) 's Twitter Profile Photo

📢 Introducing Gorilla OpenFunctions! 🔥 We've listened to your calls for an open-source function calling model, and are thrilled to present Gorilla OpenFunctions 🦍 And yes, we've made parallel functions a reality in open-source! 😎

Curious about typical scenarios where GPT-4
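Parallel function calling means the model can emit several independent calls for one request, which the client then executes concurrently. A minimal sketch of the client-side half, with a hypothetical function registry (the names and dispatch format are illustrative, not the actual Gorilla OpenFunctions API):

```python
import concurrent.futures

# Hypothetical tools; names and signatures are illustrative only.
def get_weather(city: str) -> str:
    return f"Weather in {city}: sunny"

def get_time(city: str) -> str:
    return f"Time in {city}: 12:00"

REGISTRY = {"get_weather": get_weather, "get_time": get_time}

def execute_parallel_calls(calls):
    """Run a batch of model-emitted function calls concurrently,
    returning results in the same order as the calls."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(REGISTRY[c["name"]], **c["arguments"])
                   for c in calls]
        return [f.result() for f in futures]

# For "weather and time in Paris" the model might emit two calls at once:
calls = [
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "get_time", "arguments": {"city": "Paris"}},
]
results = execute_parallel_calls(calls)
```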
Shishir Patil (@shishirpatil_) 's Twitter Profile Photo

📢Excited to release the live Berkeley Function-Calling Leaderboard! 🔥 Also debuting openfunctions-v2 🤩 the latest open-source SoTA function-calling model on-par with GPT-4🆕Native support for Javascript, Java, REST! 🫡
Leaderboard: gorilla.cs.berkeley.edu/leaderboard.ht…
Blog:
Shishir Patil (@shishirpatil_) 's Twitter Profile Photo

In today's updates to the Berkeley Function Calling leaderboard: 📊Enhanced Leaderboard with Additional Models and Summary Table: Mistral AI-large-2402, Google AI Gemini 1.0 Pro, and Gemma now included. 🤖 Gradio for Interactive Exploration! Includes function calling demos, and

Shishir Patil (@shishirpatil_) 's Twitter Profile Photo

🌀Check out RAFT: Retrieval-Aware Fine Tuning! A simple technique to prepare data for fine-tuning LLMs for in-domain RAG, i.e., question-answering on your set of documents 📄 Exciting collaboration with Berkeley AI Research 🤝 Microsoft Azure 🤝 AI at Meta MSFT-Meta blog:
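The RAFT idea is to prepare fine-tuning data so each question is paired with its oracle document plus sampled distractors, with the oracle occasionally dropped so the model learns to ignore retrieval noise. A rough sketch of that data-prep step; the field names and fractions are illustrative, not the paper's exact recipe:

```python
import random

def make_raft_example(question, answer, oracle_doc, corpus, k=3, p_oracle=0.8):
    """Build one RAFT-style training example: k distractor documents,
    plus the oracle document with probability p_oracle."""
    distractors = random.sample([d for d in corpus if d != oracle_doc], k)
    docs = distractors + ([oracle_doc] if random.random() < p_oracle else [])
    random.shuffle(docs)
    return {"question": question, "context": docs, "answer": answer}

corpus = [f"doc{i}" for i in range(5)]
ex = make_raft_example("What is X?", "X is Y.", "doc0", corpus, k=3)
```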

Shishir Patil (@shishirpatil_) 's Twitter Profile Photo

📢Excited to release GoEx⚡️a runtime for LLM-generated actions like code, API calls, and more. Featuring "post-facto validation" for assessing LLM actions after execution 🔍 Key to our approach is "undo" 🔄 and "damage confinement" abstractions to manage unintended actions &
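The "undo" abstraction can be pictured as a runtime that logs reversible actions and rolls them back in reverse order when post-facto validation rejects them. A toy sketch in the spirit of GoEx, not its actual API:

```python
class ReversibleAction:
    """An action paired with its compensating undo step."""
    def __init__(self, do, undo):
        self.do, self.undo = do, undo

class Runtime:
    def __init__(self):
        self.log = []  # executed actions, kept for rollback

    def execute(self, action):
        result = action.do()
        self.log.append(action)
        return result

    def rollback(self):
        # Undo in reverse order if validation after execution fails.
        while self.log:
            self.log.pop().undo()

state = {"balance": 100}
rt = Runtime()
rt.execute(ReversibleAction(
    do=lambda: state.update(balance=state["balance"] - 30),
    undo=lambda: state.update(balance=state["balance"] + 30),
))
# Suppose post-facto validation rejects the transfer:
rt.rollback()  # balance restored to 100
```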

Shishir Patil (@shishirpatil_) 's Twitter Profile Photo

📊Delighted to welcome Command-R-Plus, Llama-3, and Gemini-Pro-1.5 into the Berkeley Function Calling Leaderboard. Check out how they stack up across different categories, P95 latency, and costs at gorilla.cs.berkeley.edu/leaderboard.ht…

Congratulations to Cohere, AI at Meta, and
Manish Shetty (@slimshetty_) 's Twitter Profile Photo

Want to turn your own GitHub Repos into a playground for 🤖 coding agents?

📢📢 Introducing R2E: Repository to Environment

📈 Scalable, dynamic, real-world repo-level benchmarks
💡 Generate Equivalence Test Harnesses
🔗 r2e.dev | Accepted @ ICML '24

🧵
Shishir Patil (@shishirpatil_) 's Twitter Profile Photo

📢Berkeley Function Calling Leaderboard Update: Discover the enhanced performance and cost-efficiency of Google DeepMind's Gemini-1.5-pro and Gemini-1.5-flash, alongside OpenAI's new gpt-4o models ⚡️Gemini sets a new benchmark in function-calling 🏆and improves its

Shishir Patil (@shishirpatil_) 's Twitter Profile Photo

📣 Announcing BFCL V3 - evaluating how LLMs handle multi-turn, and multi-step function calling! 🚀
For agentic systems, function calling is critical, but a model needs to do more than single-turn tasks. Can it manage multi-turn workflows, handle sequential functions, and adapt to
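A multi-turn workflow of the kind BFCL V3 evaluates is essentially a loop: the model proposes a call, the result is fed back into the conversation, and the model decides the next step. A minimal sketch with a hypothetical model stub and tool:

```python
def model_step(history):
    """Stand-in for an LLM deciding the next function call (or stopping)."""
    if not any(m["role"] == "tool" for m in history):
        return {"name": "search_flights", "arguments": {"dest": "SFO"}}
    return None  # stop once a tool result has been observed

TOOLS = {"search_flights": lambda dest: [f"Flight to {dest} at 9am"]}

def run_agent(user_msg, max_turns=5):
    history = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):
        call = model_step(history)
        if call is None:
            break
        result = TOOLS[call["name"]](**call["arguments"])
        # Feed the tool output back so the next step can depend on it.
        history.append({"role": "tool", "name": call["name"], "content": result})
    return history

hist = run_agent("Book me a flight to SFO")
```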
Charles Packer (@charlespacker) 's Twitter Profile Photo

Excited to finally announce Letta!

The next frontier in AI is in the stateful layer above the base models - the "memory layer", or "LLM OS".

Letta's mission is to build this layer in the open (say "no" 🙅 to privatized chain of thought).
Changran Hu (@changran_hu) 's Twitter Profile Photo

🚀Excited to announce the best open-source tokenizer-free language model! EvaByte, our 6.5B byte-level LM developed by The University of Hong Kong and SambaNova, matches modern tokenizer-based LMs from AI at Meta, Google DeepMind, Ai2, and Apple with 5x data efficiency & 2x faster decoding!

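"Tokenizer-free" here means operating directly on UTF-8 bytes, so the vocabulary is just 256 ids and no trained tokenizer is needed. The idea in two functions (illustrative, not EvaByte's actual code):

```python
def byte_tokenize(text: str) -> list[int]:
    """Map text to its UTF-8 byte values; each byte (0-255) is one token."""
    return list(text.encode("utf-8"))

def byte_detokenize(ids: list[int]) -> str:
    """Invert byte_tokenize: reassemble bytes and decode as UTF-8."""
    return bytes(ids).decode("utf-8")
```

The trade-off is longer sequences than subword tokenization, which is why byte-level models lean on architectural tricks for efficiency.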
Wenhao Chai (@wenhaocha1) 's Twitter Profile Photo

LiveCodeBench Pro remains one of the most challenging code benchmarks, but its evaluation and verification process is still a black box.
We introduce AutoCode, which democratizes evaluation, allowing anyone to run verification locally and perform RL training!
For the first time,
Amazon News (@amazonnews) 's Twitter Profile Photo

Amazon's Nova 2 models are here: ➡️ Lite: Fast and cost-effective reasoning for everyday tasks ➡️ Pro: For highly complex tasks like agentic coding, long-range planning, and sophisticated problem-solving – where the highest accuracy is essential ➡️ Sonic: Expanded multilingual