Goku Mohandas (@GokuMohandas)'s Twitter Profile
Goku Mohandas

@GokuMohandas

ML @anyscalecompute + @raydistributed ← 🌏 Founder @MadeWithML (acq) ← ⚕️ ML Lead @Ciitizen (acq) ← 🍎 ML Engineer @Apple ← 🧬 Bio + ChemE @JohnsHopkins

ID: 3259586191

Link: https://madewithml.com/ | Joined: 29-06-2015 04:21:48

1.0K Tweets

14.1K Followers

115 Following

Anyscale (@anyscalecompute)

Recently, we’ve contributed chunked prefill to vLLM, leading to up to 2x speedup for higher QPS regimes!

In vLLM, prefilling, which fills the KV cache, and decoding, which outputs new tokens, can interfere with each other, resulting in latency degradation. 1/n
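
The scheduling idea can be sketched with a toy cost model (illustrative only; vLLM's real scheduler, batching, and kernel behavior are far more involved): if each batch step costs time proportional to the tokens it processes, an unchunked 4000-token prefill stalls a concurrently decoding request for one long step, while chunking bounds the stall to roughly the chunk size.

```python
def decode_gaps(prefill_tokens, chunk_size, n_decode=8):
    """Inter-token gaps (in per-token cost units) seen by a request that is
    decoding while a new prompt's prefill is processed in chunks alongside it."""
    # split the incoming prompt's prefill work into chunks
    chunks = [chunk_size] * (prefill_tokens // chunk_size)
    if prefill_tokens % chunk_size:
        chunks.append(prefill_tokens % chunk_size)
    gaps, now, last = [], 0.0, 0.0
    for _ in range(n_decode):
        # one batch step: the next prefill chunk (if any left) + one decode token
        chunk = chunks.pop(0) if chunks else 0
        now += chunk + 1
        gaps.append(now - last)
        last = now
    return gaps

# Unchunked: the whole 4000-token prefill lands in one step, so the decoding
# request waits 4001 cost units for its next token. Chunked at 512, the worst
# gap is bounded by one chunk plus one decode token.
print(max(decode_gaps(4000, 4000)))  # → 4001.0
print(max(decode_gaps(4000, 512)))   # → 513.0
```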

Anyscale (@anyscalecompute)

🦙 We're excited to host Meta Llama-3 8b and 70b on Anyscale Endpoints!

➕ Fine-tuning, JSON mode and function calling support coming soon as well!

Pricing:
- 8B: $0.15 / Million tokens
- 70B: $1.00 / Million tokens
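
At those rates, cost scales linearly with token count; a quick sketch (the model keys below are just labels for this example, not the exact API model IDs):

```python
# Posted prices, USD per million tokens (keys are illustrative labels)
PRICE_PER_MILLION = {"llama-3-8b": 0.15, "llama-3-70b": 1.00}

def cost_usd(model: str, tokens: int) -> float:
    """Estimated cost for pushing `tokens` tokens through `model`."""
    return tokens / 1_000_000 * PRICE_PER_MILLION[model]

# e.g. 10M tokens through each model
print(cost_usd("llama-3-8b", 10_000_000))   # ≈ 1.5
print(cost_usd("llama-3-70b", 10_000_000))  # ≈ 10.0
```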

Anyscale (@anyscalecompute)

We're excited to host Mistral AI's Mixtral-8x22B-Instruct-v0.1 on Anyscale Endpoints!

This new model outperforms all other open source models -- function calling support coming soon as well!

Pricing: $0.90/Million Tokens

Samuel Path (@smlpth)

I’ve read dozens of articles on building RAG-based LLM Applications, and this one by Goku Mohandas and Philipp Moritz from Anyscale is the best by far.

If you’re curious about RAG, do yourself a favor by studying this. It will bring you up to speed 🔥

anyscale.com/blog/a-compreh…

Anyscale (@anyscalecompute)

Ready to hear from experts at LangChain, Vercel, Pinecone, and Anyscale and get hands-on with intensive guided trainings? The 2-day RAG Developer Bootcamp is for you! Learn more & register now 👉 hubs.ly/Q02j9QmW0

Robert Nishihara (@robertnishihara)

This is the first hands-on, intensive, two-day bootcamp for learning to build RAG applications.

Cohosted by Pinecone and Anyscale (also featuring lessons from experts at LangChain, Vercel, and others).

Nearly every AI application will be a RAG application, and

Chip Huyen (@chipro)

New post: Sampling for Text Generation

huyenchip.com/2024/01/16/sam…

Many challenges (and opportunities) in working with AI today stem from the way models sample their outputs.

This post covers:

1. Sampling strategies and variables including temperature, top-k, and top-p.
2. How
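
The first item is easy to make concrete. A minimal, self-contained sketch of the chain temperature → top-k → top-p (not code from the post, and simplified relative to production samplers):

```python
import math
import random

def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample a token index: temperature-scaled softmax, then optional
    top-k truncation, then optional top-p (nucleus) truncation."""
    # temperature: divide logits before softmax (T < 1 sharpens, T > 1 flattens)
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    # consider tokens from most to least probable
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    if top_k is not None:                # keep only the k most likely tokens
        order = order[:top_k]
    if top_p is not None:                # keep the smallest prefix with mass >= p
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        order = kept
    # renormalize over the surviving tokens and draw one
    mass = sum(probs[i] for i in order)
    r, cum = rng.random() * mass, 0.0
    for i in order:
        cum += probs[i]
        if r <= cum:
            return i
    return order[-1]

print(sample([0.0, 5.0, 1.0], top_k=1))  # top_k=1 is greedy: always the argmax → 1
```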

Anyscale (@anyscalecompute)

Producing ~1B embeddings can take weeks and cost tens of thousands of dollars ($60K with OpenAI in the example below).

We are thrilled to partner with Pinecone on the launch of their new serverless offering!

Anyscale + Pinecone reduce the cost of computing these embeddings by
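
The $60K figure is consistent with simple back-of-envelope arithmetic; the per-chunk token count and price below are assumptions for illustration, not numbers from the tweet:

```python
# ~1B embeddings, assuming ~600 tokens per chunk and an OpenAI embedding
# price of $0.0001 per 1K tokens (both assumed for illustration)
n_embeddings = 1_000_000_000
tokens_per_chunk = 600
price_per_1k_tokens = 0.0001  # USD

cost = n_embeddings * tokens_per_chunk / 1_000 * price_per_1k_tokens
print(f"${cost:,.0f}")  # → $60,000
```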

Guillermo Rauch (@rauchg)

An AI-generated clone of HN built with Next.js App Router
◆ Uses PPR and streaming Node.js SSR
◆ Fully dynamic, fresh data from Postgres
◆ All the UIs bootstrapped with v0
◆ Content via @mistralai 8x7B and Anyscale Tools

What I've learned 🧡
next-ai-news.vercel.app

Robert Nishihara (@robertnishihara)

Owen Colegrove: HF is a great choice to host the data. To run embedding computations (or training or inference), we do this regularly with Ray and Anyscale.

Here's an example where ByteDance runs embedding computations and inference on 200TB data.

anyscale.com/blog/how-byted…

Anyscale (@anyscalecompute)

Update on the LLMPerf Leaderboard.

We’ve received a lot of valuable feedback from the community on our open-source benchmark for LLM performance, and we would like to thank you all for the responses! We are looking forward to working with the community to establish an open,

Guillermo Rauch (@rauchg)

Very impressed with Anyscale's endpoints, which support tools / function calling.

2LOC to play with Mixtral as a replacement for GPT 🤯
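
With an OpenAI-compatible SDK, the swap is typically just pointing `base_url` at the endpoint and changing the `model` name. The request body itself looks like a standard chat completion with a `tools` array; a sketch of such a payload (the `get_weather` tool is a made-up example, and the model string is illustrative):

```python
import json

payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
body = json.dumps(payload)  # what the client POSTs to the chat completions route
```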

Anyscale (@anyscalecompute)

🔥 Mixtral-8x7B JSON Mode and Function Calling API is now available on Anyscale Endpoints!

Empirically, we observed noticeable improvements in responses to tool messages from Mixtral MoE compared to Mistral 7B. 🚀 👇

Try it out: app.endpoints.anyscale.com

Robert Nishihara (@robertnishihara)

Curious how LLM providers compare on performance (e.g., AWS Bedrock, Fireworks, Replicate, Together, Anyscale)?

Two key metrics:
🚅 Time to first token
🚒 Inter-token latency

And of course, end-to-end latency can be derived from these two numbers.

Importantly, the code and
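
The derivation behind "end-to-end latency can be derived from these two numbers" is straightforward: for a response of n output tokens, end-to-end latency is roughly time-to-first-token plus (n − 1) inter-token gaps. A sketch:

```python
def e2e_latency(ttft_s: float, itl_s: float, n_tokens: int) -> float:
    """End-to-end latency ≈ time to first token + (n - 1) inter-token gaps."""
    return ttft_s + (n_tokens - 1) * itl_s

# 0.5 s to first token, 50 ms per subsequent token, 101 tokens total
print(round(e2e_latency(0.5, 0.05, 101), 3))  # → 5.5
```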

Anyscale (@anyscalecompute)

📈 We’re excited to introduce the LLMPerf leaderboard: the first public and open source leaderboard for benchmarking performance of various LLM inference providers in the market.

Our goal with this leaderboard is to equip users and developers with a clear understanding of the

jason liu (@jxnlco)

If you want to use Pydantic and Mistral for structured outputs, Anyscale's constrained sampling works as well as function calling.

In the latest 0.4.4 we support a patch to use Anyscale's json_schema (rather than JSON mode) to get even better results.

jxnl.github.io/instructor/blo…

kourosh hakhamaneshi (@CyrusHakha)

The Mixtral model keeps shocking us every day. This is the first RAG app I know of where Mixtral has beaten not only GPT-3.5 but also the pre-dev-day GPT-4 🤯

There seems to be a huge advantage in just having a more up-to-date model. If you have a RAG app where the data is likely

TokenBender (e/xperiments) (@4evaBehindSOTA)

Great set of experiments to refer to.

Meanwhile, enough data to convince people to move to Mixtral for lots of production use cases, at a fraction of the cost of GPT-3.5, with performance possibly better than GPT-4 (old).

Goku Mohandas (@GokuMohandas)

It's been nice to see small jumps in output quality in our RAG applications from chunking experiments, contextual preprocessing, prompt engineering, fine-tuned embeddings, lexical search, reranking, etc. but we just added Mixtral-8x7B-Instruct to the mix and we're seeing a 🤯

Robert Nishihara (@robertnishihara)

Faster Mixtral? Much more to come here.

We make deep investments in open source AI. If you'd like to help build open source AI or optimize LLM performance, join us at Anyscale. DM me 🚒
