Rohan Paul (@rohanpaul_ai)'s Twitter Profile
Rohan Paul

@rohanpaul_ai

I Build & Write AI stuff.

→ Join my LLM Newsletter - rohanpaul.substack.com

💼 AI Engineer


Link: https://linktr.ee/rohanpaul | Joined: 25-06-2014 22:38:54

17.17K Tweets

30.30K Followers

374 Following



Self-Hosting LLaMA 3.1 70B on RunPod with the vLLM Inference Engine

For a 70B-parameter model, deploying in 16-bit floating-point precision needs ~140GB of memory for the weights alone, while a 4-bit (INT4) quantization brings that down to ~45GB.
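As a sanity check, the weight figures follow directly from parameter count times bytes per parameter. A minimal sketch; the ~28% INT4 overhead factor is an assumption to account for quantization scales/zero-points and layers left unquantized, chosen so the result lands near the ~45GB quoted above:

```python
def weight_memory_gb(n_params: float, bits_per_param: float,
                     overhead: float = 1.0) -> float:
    """Rough weight-only memory footprint in GB."""
    return n_params * bits_per_param / 8 / 1e9 * overhead

# 70B parameters at FP16 (2 bytes each) -> 140.0 GB
print(weight_memory_gb(70e9, 16))
# 70B at INT4 with an assumed ~1.28x overhead -> ~44.8 GB
print(weight_memory_gb(70e9, 4, overhead=1.28))
```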

And on top of the weights you need additional memory for:

- Context, i.e. the KV cache, which grows with batch size and sequence length (estimated in the sketch below)
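The KV cache stores one key and one value vector per layer per token. A rough per-token estimate, assuming LLaMA 3.1 70B's published shape (80 layers, 8 KV heads under grouped-query attention, head dimension 128) and an FP16 cache:

```python
def kv_cache_gb(tokens: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB: 2 tensors (K and V) per layer per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1e9

# A single 8K-token sequence at FP16 -> ~2.7 GB of cache; vLLM's
# PagedAttention allocates this in fixed-size blocks, so memory for
# many concurrent sequences is shared and reclaimed efficiently.
print(kv_cache_gb(8192))
```

To put it together, a hedged sketch of serving a 4-bit 70B checkpoint through vLLM's Python API; the model id is one public AWQ quantization and an assumption here, as is the 2-GPU tensor-parallel layout (e.g. 2x A100 80GB on RunPod):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",  # assumed checkpoint
    quantization="awq",
    tensor_parallel_size=2,  # assumed: split across 2 GPUs
    max_model_len=8192,      # caps the per-sequence KV cache reservation
)
outputs = llm.generate(["Explain PagedAttention in one sentence."],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```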