Harsh Maheshwari (@harshmheshwari) 's Twitter Profile
Harsh Maheshwari

@harshmheshwari

Enthusiastic about #GenerativeAI #DataScience 🤖 | Constantly curious learner 🌱 | Applied Scientist II at @amazon | Writer at @medium | @IITKGP Graduate

ID: 1055176610134085632

Joined: 24-10-2018 19:18:02

139 Tweets

1.1K Followers

1.1K Following

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

Recently Hugging Face released a 15-trillion-token dataset for pre-training, and now this for SFT! Hopefully multimodal datasets at this scale get released soon as well. huggingface.co/datasets/Huggi…

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

An interesting paper highlighting that SFT is more about alignment than injecting new knowledge into LLMs, and that trying to inject new knowledge this way leads to increased hallucination. I really liked their approach for determining whether an LLM already holds a given piece of factual information. arxiv.org/pdf/2405.05904
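As a rough illustration of that probing idea, here is a minimal sketch (in the spirit of the paper's known/unknown split, not its exact procedure; the model choice and the substring-match check are my assumptions): sample several answers and see how often the model is already correct before any fine-tuning.

```python
# Minimal sketch: classify a QA pair as "known" vs "unknown" to a model by
# sampling answers. Mirrors the spirit of the paper, not its exact method.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

def knows_fact(question: str, gold_answer: str, n_samples: int = 8) -> str:
    inputs = tok(f"Q: {question}\nA:", return_tensors="pt")
    correct = 0
    for _ in range(n_samples):
        out = model.generate(**inputs, max_new_tokens=16, do_sample=True,
                             temperature=0.7, pad_token_id=tok.eos_token_id)
        answer = tok.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
        correct += int(gold_answer.lower() in answer.lower())  # crude match (assumed)
    if correct == n_samples:
        return "known"
    # fine-tuning on "unknown" pairs is the case the paper flags as risky
    return "weakly known" if correct > 0 else "unknown"
```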

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

This is an important direction towards detecting untrained/under-trained tokens. Such tokens waste tokenizer capacity, can trigger harmful outputs from unexpected inputs, and may be exploited to bypass safety guardrails by pushing the model beyond its training scope.
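One way to build intuition for such detectors (a minimal sketch assuming under-trained tokens keep near-initialization embeddings close to the vocabulary mean; this mirrors the general idea, not any specific paper's exact method):

```python
# Flag candidate under-trained tokens: rows of the embedding matrix that sit
# unusually close to the mean embedding have likely seen few gradient updates.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

emb = model.get_input_embeddings().weight.detach()    # [vocab_size, hidden_dim]
dist = (emb - emb.mean(dim=0, keepdim=True)).norm(dim=1)

# The tokens nearest the mean are candidates worth inspecting by hand.
for idx in dist.argsort()[:10]:
    print(repr(tok.convert_ids_to_tokens(idx.item())), f"{dist[idx]:.3f}")
```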

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

Something all of us take for granted without asking how it is done. A good read to understand why the bottleneck is memory I/O rather than computation speed.
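A back-of-the-envelope sketch of the point (hardware numbers are rough assumptions for an A100-class GPU; only the ratio matters): during decoding, moving the weights from memory takes far longer than the arithmetic performed on them.

```python
# Compare compute time vs memory time for one matrix-vector product in decoding.
flops_peak = 312e12      # ~312 TFLOP/s fp16 peak (assumed)
bandwidth = 2.0e12       # ~2 TB/s HBM bandwidth (assumed)

d = 4096                 # hidden dimension
flops = 2 * d * d        # one multiply-add per weight
bytes_moved = 2 * d * d  # each fp16 weight (2 bytes) read once

print(f"compute time: {flops / flops_peak:.2e} s")       # ~1e-7 s
print(f"memory time:  {bytes_moved / bandwidth:.2e} s")  # ~2e-5 s, >100x slower
```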

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

A good read from Snowflake providing insight into optimal hyperparameters for training MoE-based LLMs. It highlights that model performance is directly proportional to the total number of expert combinations, whether increased via top-k selection or via the frequency of MoE layers. shorturl.at/Mw92v
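To make "number of expert combinations" concrete, a quick illustration (the configs and the exact counting scheme are my assumptions, not the report's): with E experts and top-k routing, each MoE layer can pick C(E, k) expert subsets, and more MoE layers multiply the routing choices available.

```python
# Count per-layer expert combinations for a few illustrative MoE configs.
from math import comb

for n_experts, top_k, n_moe_layers in [(8, 2, 16), (64, 2, 16), (64, 2, 32)]:
    per_layer = comb(n_experts, top_k)
    print(f"E={n_experts}, k={top_k}, layers={n_moe_layers}: "
          f"{per_layer} combinations per layer")
```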

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

I published a blog on Low-Rank Adaptation (LoRA), a common technique for fine-tuning LLMs as well as other heavy models. I have also included the most basic version of the code, which should help readers understand the topic better. shorturl.at/3zQy3
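For those who want the gist before clicking through, a minimal LoRA sketch in PyTorch (my own reconstruction of a "most basic version", not the blog's exact code): the pretrained weight is frozen and a trainable low-rank update B·A is added on top.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False               # freeze pretrained W
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        # W x + scale * B A x, with only A and B trainable
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(512, 512)
y = layer(torch.randn(4, 512))   # gradients flow only through A and B
```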

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

Recently came across this repository. It's a great resource for learning about each component used in a typical LLM and how everything flows together. github.com/naklecha/llama…

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

A good read for those working towards deploying LLMs in production for their application/product. It provides good insight into data, models, product vision, and team roles. oreilly.com/radar/what-we-…

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

A good video by Ofir Press to get a top-level view of the kind of prompting and environment required for LLM-powered agents. Some of the insights are very helpful, e.g. that providing the entire codebase all at once makes it harder for the LLM to work on it! youtube.com/watch?v=RJ6NN8…

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

A good tip: the further the new task deviates from the pretraining data, the more advantageous full fine-tuning is over LoRA when it comes to acquiring new knowledge. Read this paper to understand more about when LoRA is useful and when it is not: arxiv.org/pdf/2405.09673

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

With every LLM scoring very high on MMLU, HumanEval, and other benchmarks, the idea behind LiveBench, where new questions are constantly added to the test set to eliminate potential contamination, is awesome. Check out this link for more information: livebench.ai/#

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

One simple trick to make an LLM produce output in the required format is to add the format's starting token to your prompt. For example, I added "{" at the end of the prompt. This greatly increases the probability of getting the output in the required format.
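A minimal sketch of the trick with a local model (the model and prompt are illustrative assumptions): append "{" to the prompt, then remember to prepend it back when parsing the completion.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = ('Extract the name and age from: "Alice is 30 years old." '
          'Respond as JSON with keys "name" and "age".\n'
          "{")                                        # pre-filled opening brace
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32, pad_token_id=tok.eos_token_id)
completion = tok.decode(out[0][inputs["input_ids"].shape[1]:],
                        skip_special_tokens=True)
print("{" + completion)                               # re-attach the brace before parsing
```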

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

Recently, I wrote about extending context length. Give it a read if you want to dive deeper into this topic along with its code implementation. I describe Position Interpolation, the NTK-aware method, and the Dynamic NTK method in this blog. shorturl.at/KRepb
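As a taste of what the blog covers, a minimal sketch of two of the scaling ideas (my reconstruction, not the blog's exact code): Position Interpolation shrinks positions by the extension factor, while the NTK-aware method rescales the RoPE base instead.

```python
import torch

def rope_freqs(dim, base=10000.0):
    # Standard RoPE inverse frequencies for pairs of dimensions.
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

def rope_angles(positions, dim, method="none", L_train=2048, L_new=8192):
    scale = L_new / L_train
    if method == "pi":    # Position Interpolation: squeeze positions into [0, L_train)
        return torch.outer(positions.float() / scale, rope_freqs(dim))
    if method == "ntk":   # NTK-aware: stretch the base so low frequencies absorb the extension
        return torch.outer(positions.float(),
                           rope_freqs(dim, base=10000.0 * scale ** (dim / (dim - 2))))
    return torch.outer(positions.float(), rope_freqs(dim))

# Dynamic NTK (covered in the blog) recomputes the scale from the current
# sequence length at inference time; omitted here for brevity.
pos = torch.arange(8192)
angles_pi = rope_angles(pos, 128, method="pi")    # [8192, 64] rotation angles
angles_ntk = rope_angles(pos, 128, method="ntk")
```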