Rohan Paul (@rohanpaul_ai)'s Twitter Profile
Rohan Paul

@rohanpaul_ai

ML Engineer (e/acc)

📌 https://t.co/x0IIWfnOt8

🚀 https://t.co/QEO4CKRl1b

Open LLMs is Happiness 💡

Ex Deutsche & HSBC.

DM for collaboration.

ID: 2588345408

Link: https://rohanpaul.gumroad.com/l/python-core-with-under-the-hood-explanations
Joined: 25-06-2014 22:38:54

10.4K Tweets

13.0K Followers

1.0K Following

Rohan Paul (@rohanpaul_ai):

'NExT: Teaching Large Language Models to Reason about Code Execution'

The key problem this paper aims to solve: LLMs of code are typically trained on the surface textual form of programs, and so may lack a semantic understanding of how programs actually execute at run-time. NExT addresses this by teaching models to inspect execution traces and reason about run-time behavior through chain-of-thought rationales.
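To make the idea concrete, here is a minimal sketch of collecting the kind of per-line variable-state trace such an approach can expose to the model, using Python's sys.settrace. This is illustrative only (trace_execution and buggy_mean are made-up names), not the paper's actual pipeline:

```python
import sys

def trace_execution(func, *args):
    """Run func(*args) and record (line number, local variables) at each step."""
    trace = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            trace.append((frame.f_lineno, dict(frame.f_locals)))  # snapshot state
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, trace

def buggy_mean(xs):
    total = 0
    for x in xs:
        total += x
    return total / (len(xs) - 1)  # off-by-one bug the trace makes visible

result, trace = trace_execution(buggy_mean, [2, 4, 6])
for lineno, local_vars in trace:
    print(f"line {lineno}: {local_vars}")
# Serialized (line, locals) pairs like these can be placed in the prompt so the
# model reasons over run-time state rather than source text alone.
```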

Rohan Paul (@rohanpaul_ai):

The perplexity of quantized Llama 3 degrades quite a bit more than that of quantized Llama 2.

But looking at the perplexity numbers below, Llama 3 also has a higher initial perplexity than Llama-2 when not quantized.

Possible explanation 👇

The degree of specialization of each model on the WikiText corpus.
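For reference, a rough sketch of how WikiText perplexity numbers like these are typically computed with Hugging Face transformers. The model name, window size, and non-overlapping windows are simplifying assumptions, not the exact setup behind the quoted figures:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # assumed; swap in the model under test
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Concatenate the WikiText-2 test split into one long string.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Non-overlapping 2048-token windows: a rough approximation (a sliding
# window with stride gives slightly more accurate numbers).
window, nlls = 2048, []
for start in range(0, input_ids.size(1) - 1, window):
    chunk = input_ids[:, start : start + window].to(model.device)
    with torch.no_grad():
        # With labels == inputs, the returned loss is the mean token NLL.
        loss = model(chunk, labels=chunk).loss
    nlls.append(loss.float().cpu() * chunk.size(1))

ppl = torch.exp(torch.stack(nlls).sum() / input_ids.size(1))
print(f"perplexity: {ppl.item():.2f}")
```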

Rohan Paul (@rohanpaul_ai):

Google's new Med-Gemini surpasses the GPT-4 model family on every benchmark where a direct comparison could be made.

Achieves SoTA performance of 91.1% accuracy on the MedQA (USMLE) benchmark, using a novel uncertainty-guided search strategy.

📌 A very significant advancement in medical AI.
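A hedged sketch of the idea behind uncertainty-guided search (not Med-Gemini's actual implementation): sample several reasoning chains, and only fall back to retrieval when the sampled answers disagree too much. generate_answers and web_search are hypothetical stand-ins:

```python
import math
from collections import Counter

def vote_entropy(answers):
    """Shannon entropy (nats) of the empirical distribution over sampled answers."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def answer_with_uncertainty_guided_search(question, generate_answers, web_search,
                                          k=11, threshold=0.5):
    # Sample k reasoning chains; agreement means the model is confident.
    answers = generate_answers(question, n=k)
    if vote_entropy(answers) <= threshold:
        return Counter(answers).most_common(1)[0][0]  # confident: majority vote
    # High disagreement: retrieve external evidence, then answer again with it.
    evidence = web_search(question)
    answers = generate_answers(question, n=k, context=evidence)
    return Counter(answers).most_common(1)[0][0]
```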

Rohan Paul (@rohanpaul_ai):

Meta is really taking a leadership position in OSS contributions.

Beyond Llama-3, we have all of the below from them

(and this is not an exhaustive list)

- React
- PyTorch
- React Native
- GraphQL
- Jest
- Flow
- Yarn
- Hermes
- FBT
- Prophet
- Cassandra
- Mercurial (which is …

Rohan Paul (@rohanpaul_ai):

Comparison of TensorRT-LLM vs llama.cpp on consumer hardware, by Jan (@janframework)

(blog link in 1st comment)
---

Findings: 'TensorRT-LLM was:

- 30-70% faster than llama.cpp on the same hardware

- Consumed less memory on consecutive runs and marginally more GPU VRAM'
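A minimal sketch of the kind of tokens/sec measurement behind such comparisons, using the llama-cpp-python bindings. The model path and parameters are assumptions; Jan's actual benchmark harness and settings may differ:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # assumed path
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=2048,
    verbose=False,
)

prompt = "Explain KV caching in one paragraph."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_generated = out["usage"]["completion_tokens"]
print(f"{n_generated} tokens in {elapsed:.2f}s -> {n_generated / elapsed:.1f} tok/s")
```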

Rohan Paul (@rohanpaul_ai):

Really fantastic paper for a new understanding of in-context learning in Transformers.

'Transformers learn in-context'

In-context learning refers to the ability of Transformers to adapt their predictions based on the context provided in the input sequence, without the need for any updates to the model's weights.
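A tiny illustration of the phenomenon; the model and prompt here are assumptions for the sake of a runnable sketch (small models like GPT-2 show the effect far less reliably than the large models studied in this line of work):

```python
from transformers import pipeline

# Frozen model; no weights change between calls.
generator = pipeline("text-generation", model="gpt2")

few_shot_prompt = (
    "Translate English to French.\n"
    "sea -> mer\n"
    "sky -> ciel\n"
    "moon ->"
)

# The input-output mapping is induced purely from the examples in the prompt;
# no gradient step or parameter update happens here.
out = generator(few_shot_prompt, max_new_tokens=4, do_sample=False)
print(out[0]["generated_text"])
```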

Rohan Paul (@rohanpaul_ai):

Recently Microsoft announced 'ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks' 🔥

📌 It investigates the limitations of existing 4-bit quantization methods like GPTQ for large language models (LLMs), which tend to overfit the calibration data.
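A toy round-trip sketch of why bit width matters: simulate symmetric uniform "fake quantization" at several bit widths and compare reconstruction error. Note this uniform integer grid is a generic illustration, not ZeroQuant's FP6 floating-point format:

```python
import numpy as np

def fake_quantize(w, bits):
    """Quantize to a symmetric integer grid and dequantize back."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit, 31 for 6-bit
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=100_000).astype(np.float32)  # stand-in for a weight tensor

for bits in (4, 6, 8):
    err = np.mean((w - fake_quantize(w, bits)) ** 2)
    print(f"{bits}-bit MSE: {err:.2e}")  # error shrinks rapidly with more bits
```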

Yam Peleg (@Yampeleg):

anton For searching a database? Of course it does, context kills RAGs 300%.

Cache the dataset in the KV cache; it is better in every way, shape, or form.

RAG will only be used for explainability, because it is very easy to explain (and 'blame').

(RAG is also very hard to get to work…)
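A hedged sketch of the "cache the dataset in the KV cache" pattern with Hugging Face transformers (recent versions support passing a precomputed cache to generate); the tiny model and placeholder facts are assumptions so the sketch runs anywhere:

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

# The "dataset" is processed once; its KV cache is kept for every later query.
# It ends at a newline so the prefix tokenization stays stable when text is appended.
dataset_text = (
    "Fact: the capital of France is Paris.\n"
    "Fact: water boils at 100 C.\n"
)
with torch.no_grad():
    ctx = tok(dataset_text, return_tensors="pt")
    cache = model(**ctx, use_cache=True).past_key_values

def answer(query, max_new_tokens=10):
    ids = tok(dataset_text + query, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model.generate(
            ids,
            past_key_values=copy.deepcopy(cache),  # deepcopy: generate mutates the cache
            max_new_tokens=max_new_tokens,
            do_sample=False,
        )
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

print(answer("Q: What is the capital of France? A:"))
print(answer("Q: At what temperature does water boil? A:"))  # cache reused, no re-prefill
```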

Marques Brownlee (@MKBHD):

NEW VIDEO - Rabbit R1: Barely Reviewable

youtu.be/ddTV12hErTc

This is the pinnacle of a trend that's been annoying for years: Delivering barely finished products to win a 'race' and then continuing to build them after charging full price. Games, phones, cars, now AI in a box
