Harsh Maheshwari (@harshmheshwari) 's Twitter Profile
Harsh Maheshwari

@harshmheshwari

Enthusiastic about #GenerativeAI #DataScience 🤖 | Constantly curious learner 🌱 | Applied Scientist II at @amazon | Writer at @medium | @IITKGP Graduate

ID: 1055176610134085632

Joined: 24-10-2018 19:18:02

139 Tweets

1.1K Followers

1.1K Following

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

Recently Hugging Face released a 15-trillion-token dataset for pre-training, and now this for SFT! Hopefully multimodal datasets at this scale get released soon as well. huggingface.co/datasets/Huggi…

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

An interesting paper highlighting that SFT is more about alignment than injecting new knowledge into LLMs, and that trying to inject new knowledge this way leads to increased hallucination. I really liked their approach for determining whether an LLM already holds a given piece of factual information. arxiv.org/pdf/2405.05904
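As a rough illustration of that probing idea, here is a minimal sketch (in the spirit of the paper's known/unknown split, not its exact procedure; the model choice and the substring-match check are my assumptions): sample several answers and see how often the model is already correct before any fine-tuning.

```python
# Minimal sketch: classify a QA pair as "known" vs "unknown" to a model by
# sampling answers. Mirrors the spirit of the paper, not its exact method.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

def knows_fact(question: str, gold_answer: str, n_samples: int = 8) -> str:
    inputs = tok(f"Q: {question}\nA:", return_tensors="pt")
    correct = 0
    for _ in range(n_samples):
        out = model.generate(**inputs, max_new_tokens=16, do_sample=True,
                             temperature=0.7, pad_token_id=tok.eos_token_id)
        answer = tok.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
        correct += int(gold_answer.lower() in answer.lower())  # crude match (assumed)
    if correct == n_samples:
        return "known"
    # fine-tuning on "unknown" pairs is the case the paper flags as risky
    return "weakly known" if correct > 0 else "unknown"
```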

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

This is an important direction towards detecting untrained/under-trained tokens. Such tokens waste tokenizer capacity, can trigger harmful outputs from unexpected inputs, and may be exploited to bypass safety guardrails by pushing the model beyond its training scope.
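One way to build intuition for such detectors (a minimal sketch assuming under-trained tokens keep near-initialization embeddings close to the vocabulary mean; this mirrors the general idea, not any specific paper's exact method):

```python
# Flag candidate under-trained tokens: rows of the embedding matrix that sit
# unusually close to the mean embedding have likely seen few gradient updates.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

emb = model.get_input_embeddings().weight.detach()    # [vocab_size, hidden_dim]
dist = (emb - emb.mean(dim=0, keepdim=True)).norm(dim=1)

# The tokens nearest the mean are candidates worth inspecting by hand.
for idx in dist.argsort()[:10]:
    print(repr(tok.convert_ids_to_tokens(idx.item())), f"{dist[idx]:.3f}")
```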

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

Something all of us take for granted without asking how it is done. A good read to understand why the bottleneck is memory I/O rather than computation speed.
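A back-of-the-envelope sketch of the point (hardware numbers are rough assumptions for an A100-class GPU; only the ratio matters): during decoding, moving the weights from memory takes far longer than the arithmetic performed on them.

```python
# Compare compute time vs memory time for one matrix-vector product in decoding.
flops_peak = 312e12      # ~312 TFLOP/s fp16 peak (assumed)
bandwidth = 2.0e12       # ~2 TB/s HBM bandwidth (assumed)

d = 4096                 # hidden dimension
flops = 2 * d * d        # one multiply-add per weight
bytes_moved = 2 * d * d  # each fp16 weight (2 bytes) read once

print(f"compute time: {flops / flops_peak:.2e} s")       # ~1e-7 s
print(f"memory time:  {bytes_moved / bandwidth:.2e} s")  # ~2e-5 s, >100x slower
```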

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

A good read from Snowflake providing insight into optimal hyperparameters for training MoE-based LLMs. It highlights that model performance is directly proportional to the total number of expert combinations, whether increased via top-k selection or via the frequency of MoE layers. shorturl.at/Mw92v
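To make "number of expert combinations" concrete, a quick illustration (the configs and the exact counting scheme are my assumptions, not the report's): with E experts and top-k routing, each MoE layer can pick C(E, k) expert subsets, and more MoE layers multiply the routing choices available.

```python
# Count per-layer expert combinations for a few illustrative MoE configs.
from math import comb

for n_experts, top_k, n_moe_layers in [(8, 2, 16), (64, 2, 16), (64, 2, 32)]:
    per_layer = comb(n_experts, top_k)
    print(f"E={n_experts}, k={top_k}, layers={n_moe_layers}: "
          f"{per_layer} combinations per layer")
```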

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

I published a blog on Low-Rank Adaptation (LoRA), a common technique for fine-tuning LLMs as well as other heavy models. I have also included the most basic version of the code, which should help readers understand the topic better. shorturl.at/3zQy3
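For those who want the gist before clicking through, a minimal LoRA sketch in PyTorch (my own reconstruction of a "most basic version", not the blog's exact code): the pretrained weight is frozen and a trainable low-rank update B·A is added on top.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False               # freeze pretrained W
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        # W x + scale * B A x, with only A and B trainable
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(512, 512)
y = layer(torch.randn(4, 512))   # gradients flow only through A and B
```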

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

Recently came across this repository. It's a great resource for learning about each component used in a typical LLM and how everything flows together. github.com/naklecha/llama…

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

A good read for those working towards deploying LLMs in production for their application/product. It provides good insight into data, models, product vision, and team roles. oreilly.com/radar/what-we-…

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

A good video by Ofir Press to get a top-level view of the kind of prompting and environment required for LLM-powered agents. Some of the insights are very helpful, e.g. that providing the entire codebase all at once makes it harder for the LLM to work on it! youtube.com/watch?v=RJ6NN8…

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

A good tip: the further the new task deviates from the pretraining data, the more advantageous full fine-tuning is over LoRA when it comes to acquiring new knowledge. Read this paper to understand more about when LoRA is useful and when it is not: arxiv.org/pdf/2405.09673

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

With every LLM scoring very high on MMLU, HumanEval, and other benchmarks, the idea behind LiveBench, where new questions are constantly added to the test set to eliminate potential contamination, is awesome. Check out this link for more information: livebench.ai/#

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

One simple trick to make an LLM produce output in the required format is to add the format's starting token to your prompt. For example, I added "{" at the end of the prompt. This greatly increases the probability of getting the output in the required format.
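A minimal sketch of the trick with a local model (the model and prompt are illustrative assumptions): append "{" to the prompt, then remember to prepend it back when parsing the completion.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = ('Extract the name and age from: "Alice is 30 years old." '
          'Respond as JSON with keys "name" and "age".\n'
          "{")                                        # pre-filled opening brace
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32, pad_token_id=tok.eos_token_id)
completion = tok.decode(out[0][inputs["input_ids"].shape[1]:],
                        skip_special_tokens=True)
print("{" + completion)                               # re-attach the brace before parsing
```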

Harsh Maheshwari (@harshmheshwari) 's Twitter Profile Photo

Recently, I wrote about extending context length. Give it a read if you want to dive deeper into this topic along with its code implementation. I describe Position Interpolation, the NTK-aware method, and the Dynamic NTK method in this blog. shorturl.at/KRepb
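As a taste of what the blog covers, a minimal sketch of two of the scaling ideas (my reconstruction, not the blog's exact code): Position Interpolation shrinks positions by the extension factor, while the NTK-aware method rescales the RoPE base instead.

```python
import torch

def rope_freqs(dim, base=10000.0):
    # Standard RoPE inverse frequencies for pairs of dimensions.
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

def rope_angles(positions, dim, method="none", L_train=2048, L_new=8192):
    scale = L_new / L_train
    if method == "pi":    # Position Interpolation: squeeze positions into [0, L_train)
        return torch.outer(positions.float() / scale, rope_freqs(dim))
    if method == "ntk":   # NTK-aware: stretch the base so low frequencies absorb the extension
        return torch.outer(positions.float(),
                           rope_freqs(dim, base=10000.0 * scale ** (dim / (dim - 2))))
    return torch.outer(positions.float(), rope_freqs(dim))

# Dynamic NTK (covered in the blog) recomputes the scale from the current
# sequence length at inference time; omitted here for brevity.
pos = torch.arange(8192)
angles_pi = rope_angles(pos, 128, method="pi")    # [8192, 64] rotation angles
angles_ntk = rope_angles(pos, 128, method="ntk")
```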