Goku Mohandas (@GokuMohandas)'s Twitter Profile
Goku Mohandas

@GokuMohandas

ML @anyscalecompute + @raydistributed ← 🌏 Founder @MadeWithML (acq) ← ⚕️ ML Lead @Ciitizen (acq) ← 🍎 ML Engineer @Apple ← 🧬 Bio + ChemE @JohnsHopkins

ID: 3259586191

Link: https://madewithml.com/ | Joined: 29-06-2015 04:21:48

1.0K Tweets

14.1K Followers

115 Following

Anyscale (@anyscalecompute)

Recently, we’ve contributed chunked prefill to vLLM, leading to up to 2x speedup for higher QPS regimes!

In vLLM, prefilling, which fills the KV cache, and decoding, which outputs new tokens, can interfere with each other, resulting in latency degradation. 1/n
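
The scheduling idea can be sketched with a toy cost model (illustrative only; vLLM's real scheduler, batching, and kernel behavior are far more involved): if each batch step costs time proportional to the tokens it processes, an unchunked 4000-token prefill stalls a concurrently decoding request for one long step, while chunking bounds the stall to roughly the chunk size.

```python
def decode_gaps(prefill_tokens, chunk_size, n_decode=8):
    """Inter-token gaps (in per-token cost units) seen by a request that is
    decoding while a new prompt's prefill is processed in chunks alongside it."""
    # split the incoming prompt's prefill work into chunks
    chunks = [chunk_size] * (prefill_tokens // chunk_size)
    if prefill_tokens % chunk_size:
        chunks.append(prefill_tokens % chunk_size)
    gaps, now, last = [], 0.0, 0.0
    for _ in range(n_decode):
        # one batch step: the next prefill chunk (if any left) + one decode token
        chunk = chunks.pop(0) if chunks else 0
        now += chunk + 1
        gaps.append(now - last)
        last = now
    return gaps

# Unchunked: the whole 4000-token prefill lands in one step, so the decoding
# request waits 4001 cost units for its next token. Chunked at 512, the worst
# gap is bounded by one chunk plus one decode token.
print(max(decode_gaps(4000, 4000)))  # → 4001.0
print(max(decode_gaps(4000, 512)))   # → 513.0
```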

Anyscale (@anyscalecompute)

🦙 We're excited to host Meta Llama-3 8b and 70b on Anyscale Endpoints!

➕ Fine-tuning, JSON mode and function calling support coming soon as well!

Pricing:
- 8B: $0.15 / Million tokens
- 70B: $1.00 / Million tokens
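
At those rates, cost scales linearly with token count; a quick sketch (the model keys below are just labels for this example, not the exact API model IDs):

```python
# Posted prices, USD per million tokens (keys are illustrative labels)
PRICE_PER_MILLION = {"llama-3-8b": 0.15, "llama-3-70b": 1.00}

def cost_usd(model: str, tokens: int) -> float:
    """Estimated cost for pushing `tokens` tokens through `model`."""
    return tokens / 1_000_000 * PRICE_PER_MILLION[model]

# e.g. 10M tokens through each model
print(cost_usd("llama-3-8b", 10_000_000))   # ≈ 1.5
print(cost_usd("llama-3-70b", 10_000_000))  # ≈ 10.0
```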

Anyscale (@anyscalecompute)

We're excited to host Mistral AI's Mixtral-8x22B-Instruct-v0.1 on Anyscale Endpoints!

This new model outperforms all other open source models -- function calling support coming soon as well!

Pricing: $0.90/Million Tokens

Samuel Path (@smlpth)

I’ve read dozens of articles on building RAG-based LLM Applications, and this one by Goku Mohandas and Philipp Moritz from Anyscale is the best by far.

If you’re curious about RAG, do yourself a favor by studying this. It will bring you up to speed 🔥

anyscale.com/blog/a-compreh…

Anyscale (@anyscalecompute)

Ready to hear from experts at LangChain, Vercel, Pinecone, and Anyscale and get hands-on with intensive guided trainings? The 2-day RAG Developer Bootcamp is for you! Learn more & register now 👉 hubs.ly/Q02j9QmW0

Robert Nishihara (@robertnishihara)

This is the first hands-on, intensive, two-day bootcamp for learning to build RAG applications.

Cohosted by Pinecone and Anyscale (also featuring lessons from experts at LangChain, Vercel, and others).

Nearly every AI application will be a RAG application, and

Chip Huyen (@chipro)

New post: Sampling for Text Generation

huyenchip.com/2024/01/16/sam…

Many challenges (and opportunities) in working with AI today stem from the way models sample their outputs.

This post covers:

1. Sampling strategies and variables including temperature, top-k, and top-p.
2. How
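
The first item is easy to make concrete. A minimal, self-contained sketch of the chain temperature → top-k → top-p (not code from the post, and simplified relative to production samplers):

```python
import math
import random

def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample a token index: temperature-scaled softmax, then optional
    top-k truncation, then optional top-p (nucleus) truncation."""
    # temperature: divide logits before softmax (T < 1 sharpens, T > 1 flattens)
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    # consider tokens from most to least probable
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    if top_k is not None:                # keep only the k most likely tokens
        order = order[:top_k]
    if top_p is not None:                # keep the smallest prefix with mass >= p
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        order = kept
    # renormalize over the surviving tokens and draw one
    mass = sum(probs[i] for i in order)
    r, cum = rng.random() * mass, 0.0
    for i in order:
        cum += probs[i]
        if r <= cum:
            return i
    return order[-1]

print(sample([0.0, 5.0, 1.0], top_k=1))  # top_k=1 is greedy: always the argmax → 1
```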

Anyscale (@anyscalecompute)

Producing ~1B embeddings can take weeks and cost tens of thousands of dollars ($60K with OpenAI in the example below).

We are thrilled to partner with Pinecone on the launch of their new serverless offering!

Anyscale + Pinecone reduce the cost of computing these embeddings by
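
The $60K figure is consistent with simple back-of-envelope arithmetic; the per-chunk token count and price below are assumptions for illustration, not numbers from the tweet:

```python
# ~1B embeddings, assuming ~600 tokens per chunk and an OpenAI embedding
# price of $0.0001 per 1K tokens (both assumed for illustration)
n_embeddings = 1_000_000_000
tokens_per_chunk = 600
price_per_1k_tokens = 0.0001  # USD

cost = n_embeddings * tokens_per_chunk / 1_000 * price_per_1k_tokens
print(f"${cost:,.0f}")  # → $60,000
```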

Guillermo Rauch (@rauchg)

An AI-generated clone of HN built with Next.js App Router
◆ Uses PPR and streaming Node.js SSR
◆ Fully dynamic, fresh data from Postgres
◆ All the UIs bootstrapped with v0
◆ Content via @mistralai 8x7B and Anyscale Tools

What I've learned 🧡
next-ai-news.vercel.app

Robert Nishihara (@robertnishihara)

Owen Colegrove: HF is a great choice to host the data. To run embedding computations (or training or inference), we do this regularly with Ray and Anyscale.

Here's an example where ByteDance runs embedding computations and inference on 200TB data.

anyscale.com/blog/how-byted…

Anyscale (@anyscalecompute)

Update on the LLMPerf Leaderboard.

We’ve received a lot of valuable feedback from the community on our open-source benchmark for LLM performance, and we would like to thank you all for the responses! We are looking forward to working with the community to establish an open,

Guillermo Rauch (@rauchg)

Very impressed with Anyscale's endpoints, which support tools / function calling.

2LOC to play with Mixtral as a replacement for GPT 🤯
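
With an OpenAI-compatible SDK, the swap is typically just pointing `base_url` at the endpoint and changing the `model` name. The request body itself looks like a standard chat completion with a `tools` array; a sketch of such a payload (the `get_weather` tool is a made-up example, and the model string is illustrative):

```python
import json

payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
body = json.dumps(payload)  # what the client POSTs to the chat completions route
```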

Anyscale (@anyscalecompute)

🔥 Mixtral-8x7B JSON Mode and Function Calling API is now available on Anyscale Endpoints!

Empirically, we observed noticeable improvements in responses to tool messages from Mixtral MoE compared to Mistral 7B. 🚀 👇

Try it out: app.endpoints.anyscale.com

Robert Nishihara (@robertnishihara)

Curious how LLM providers compare on performance (e.g., AWS Bedrock, Fireworks, Replicate, Together, Anyscale)?

Two key metrics:
🚅 Time to first token
🚒 Inter-token latency

And of course, end-to-end latency can be derived from these two numbers.

Importantly, the code and
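
The derivation behind "end-to-end latency can be derived from these two numbers" is straightforward: for a response of n output tokens, end-to-end latency is roughly time-to-first-token plus (n − 1) inter-token gaps. A sketch:

```python
def e2e_latency(ttft_s: float, itl_s: float, n_tokens: int) -> float:
    """End-to-end latency ≈ time to first token + (n - 1) inter-token gaps."""
    return ttft_s + (n_tokens - 1) * itl_s

# 0.5 s to first token, 50 ms per subsequent token, 101 tokens total
print(round(e2e_latency(0.5, 0.05, 101), 3))  # → 5.5
```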

Anyscale (@anyscalecompute)

📈 We’re excited to introduce the LLMPerf leaderboard: the first public and open source leaderboard for benchmarking performance of various LLM inference providers in the market.

Our goal with this leaderboard is to equip users and developers with a clear understanding of the

jason liu (@jxnlco)

If you want to use Pydantic and Mistral for structured outputs, Anyscale's constrained sampling works as well as function calling.

In the latest 0.4.4 we support a patch to use Anyscale's json_schema (rather than JSON mode) to get even better results.

jxnl.github.io/instructor/blo…

kourosh hakhamaneshi (@CyrusHakha)

The Mixtral model keeps shocking us every day. This is the first RAG app I know of where Mixtral has beaten not only GPT-3.5 but also the pre-dev-day GPT-4 🤯

There seems to be a huge advantage in just having a more up-to-date model. If you have a RAG app where the data is likely

TokenBender (e/xperiments) (@4evaBehindSOTA)

Great set of experiments to refer to.

Meanwhile, enough data to convince people to move to Mixtral for lots of production use cases, at a fraction of the cost of GPT-3.5, with performance possibly better than GPT-4 (old).

Goku Mohandas (@GokuMohandas)

It's been nice to see small jumps in output quality in our RAG applications from chunking experiments, contextual preprocessing, prompt engineering, fine-tuned embeddings, lexical search, reranking, etc. but we just added Mixtral-8x7B-Instruct to the mix and we're seeing a 🤯

Robert Nishihara (@robertnishihara)

Faster Mixtral? Much more to come here.

We make deep investments in open source AI. If you'd like to help build open source AI or optimize LLM performance, join us at Anyscale. DM me 🚒
