Daily AI Papers (@papers_daily) 's Twitter Profile
Daily AI Papers

@papers_daily

ID: 1397114203601657857

Website: https://labml.ai · Joined: 25-05-2021 08:56:03

6.6K Tweets

17.17K Followers

2 Following

labml.ai (@labmlai) 's Twitter Profile Photo

We wrote up some of the best practices we feel are useful for ML projects: github.com/labmlai/labml/… Here's a summary πŸ§΅πŸ‘‡

hehehehe (@luck_not_shit) 's Twitter Profile Photo

Encoding floating point arrays with Base64 gives a 4x compression over JSON πŸš€. Quite useful when you have to transfer larger arrays. Encoded float arrays can be included as a string in the JSON objects keeping existing structures intact. Doesn’t require any external libraries.
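
A minimal sketch of the trick in plain Python (the helper names here are illustrative, not from labml): pack the floats as binary, Base64-encode the bytes, and drop the resulting string into the existing JSON object.

```python
import base64
import json
import struct

def encode_floats(values):
    # Pack as 32-bit little-endian floats (4 bytes each), then Base64-encode.
    # Base64 expands binary by ~4/3, so each float costs ~5.3 characters
    # instead of the 15-20 characters of a decimal literal in JSON text.
    raw = struct.pack(f'<{len(values)}f', *values)
    return base64.b64encode(raw).decode('ascii')

def decode_floats(encoded):
    raw = base64.b64decode(encoded)
    return list(struct.unpack(f'<{len(raw) // 4}f', raw))

# The encoded array is just a string, so it slots into existing JSON structures.
payload = json.dumps({'metric': 'loss', 'values': encode_floats([0.25, 0.125, 0.0625])})
restored = decode_floats(json.loads(payload)['values'])
```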

vpj (@vpj) 's Twitter Profile Photo

.labml.ai deep learning experiment monitoring app got significantly more responsive after hehehehe implemented base64 encoding for long float arrays instead of plain JSON. It uses 4X less data transfer.

labml.ai (@labmlai) 's Twitter Profile Photo

We’ve open-sourced our LLM attention visualization library. It generates interactive visualizations of attention matrices with just a few lines of Python code in notebooks. hehehehe cleaned up and polished the existing code to make it open source.
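
As a rough illustration of where such attention matrices come from, here is a sketch using the Hugging Face transformers API with a plain matplotlib heatmap as a stand-in for the library's own interactive plots; the model name is just an example.

```python
import torch
import matplotlib.pyplot as plt
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2')        # example model
model = AutoModelForCausalLM.from_pretrained('gpt2')

inputs = tokenizer('Attention visualizations help debug LLMs', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions holds one (batch, heads, seq, seq) tensor per layer.
attn = outputs.attentions[-1][0].mean(dim=0).numpy()     # last layer, head-averaged
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])

# Static heatmap as a stand-in for the interactive notebook visualization.
plt.imshow(attn, cmap='viridis')
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.colorbar()
plt.show()
```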

labml.ai (@labmlai) 's Twitter Profile Photo

✨ Annotated DL Paper Implementation repository reached 50K stars.

It has implementations of a wide range of deep learning concepts including Transformers and variations, StyleGAN, Stable Diffusion, Normalization layers, RL, Optimizers...

πŸ§ΆπŸ‘‡
labml.ai (@labmlai) 's Twitter Profile Photo

The machine generated Chinese translation of annotated paper implementations repo is being improved with manual translations by @pengchzn πŸ™

@pengchzn has already finished the basic transformer including multi-head attention.

πŸ‘‡
labml.ai (@labmlai) 's Twitter Profile Photo

πŸŽ‰ Excited to share that we've added a distribution visualization to our library, Inspectus. It plots the full distribution of the data across training steps, which helps you see how training is going and instantly spot the impact of outliers. πŸ‘‡
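
The idea, sketched here with NumPy and matplotlib rather than the Inspectus API: compute percentiles of the metric at every training step and shade the bands, so the spread and the outliers stay visible instead of being averaged away.

```python
import numpy as np
import matplotlib.pyplot as plt

# Fake data: 512 per-sample losses at each of 200 training steps.
rng = np.random.default_rng(0)
steps = np.arange(200)
data = rng.lognormal(mean=-0.01 * steps[:, None], sigma=0.5, size=(200, 512))

# Percentile bands keep the whole distribution visible, not just the mean.
p = np.percentile(data, [5, 25, 50, 75, 95], axis=1)

plt.fill_between(steps, p[0], p[4], alpha=0.2, label='5-95%')
plt.fill_between(steps, p[1], p[3], alpha=0.4, label='25-75%')
plt.plot(steps, p[2], label='median')
plt.xlabel('training step')
plt.ylabel('loss')
plt.legend()
plt.show()
```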

vpj (@vpj) 's Twitter Profile Photo

I first found plotting the distribution useful when I was trying RL algorithms on Atari around 2018/19. I used TensorBoard back then. It was quite useful to look at the score distribution of the rollouts: it showed how the policy was behaving much more clearly than the mean alone.
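
For reference, logging a per-step score distribution with TensorBoard looks roughly like this; the rollout collection here is a random stand-in for running an actual policy.

```python
import numpy as np
from torch.utils.tensorboard import SummaryWriter

rng = np.random.default_rng(0)

def collect_rollout_scores(step, n_envs=16):
    # Stand-in for running the current policy: one episode score per rollout.
    return rng.normal(loc=0.05 * step, scale=5.0 + 0.01 * step, size=n_envs)

writer = SummaryWriter('runs/atari_score_distribution')
for step in range(1000):
    scores = collect_rollout_scores(step)
    # The histogram view shows how the whole score distribution of the policy's
    # rollouts shifts and spreads over training, not just its mean.
    writer.add_histogram('rollout/score', scores, global_step=step)
    writer.add_scalar('rollout/score_mean', float(scores.mean()), global_step=step)
writer.close()
```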

labml.ai (@labmlai) 's Twitter Profile Photo

We should be able to release an update of the labml experiment monitoring library very soon πŸ˜‚ It has a bunch of cool new features.

NOTBAD AI (@notbadai) 's Twitter Profile Photo

We’ve been training NVIDIA Mistral-NeMo-Minitron-8B-Base for math reasoning on the GSM8K-Aug dataset, and we have a version with a 70.2% GSM8K score, up from the 58.5% CoT score reported in the LLM Pruning and Distillation paper. πŸ‘‡

labml.ai (@labmlai) 's Twitter Profile Photo

Annotated PyTorch implementation of LoRA (Low-Rank Adaptation of LLMs)

πŸ“ Code + Notes: nn.labml.ai/lora/index.html
πŸ“Ž Paper: arxiv.org/abs/2106.09685

LoRA freezes the pre-trained model and trains smaller injected weights, enabling faster and more memory-efficient fine-tuning.

πŸ‘‡
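
A bare-bones sketch of the mechanism (the annotated implementation at the link is the full version): the pre-trained linear layer is frozen, and only the injected low-rank factors A and B are trained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Wraps a pre-trained nn.Linear: W and b are frozen, only A and B train."""

    def __init__(self, pretrained: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = pretrained
        self.base.weight.requires_grad_(False)            # freeze pre-trained W
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Injected low-rank factors: A is small random, B starts at zero so the
        # layer's initial output matches the pre-trained model exactly.
        self.lora_a = nn.Parameter(torch.randn(r, pretrained.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(pretrained.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # y = x W^T + b + (alpha / r) * x A^T B^T
        return self.base(x) + self.scaling * F.linear(F.linear(x, self.lora_a), self.lora_b)

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 10, 768))   # only lora_a and lora_b receive gradients
```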
labml.ai (@labmlai) 's Twitter Profile Photo

We added token visualization to Inspectus. It lets you visualize metrics associated with tokens, such as loss, entropy, KL divergence, etc. It works in notebooks and is pretty easy to use. πŸ‘‡
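
As a sketch of what such per-token metrics are, computed here with plain PyTorch on random logits and independent of the Inspectus API:

```python
import torch
import torch.nn.functional as F

# logits: (seq, vocab) next-token predictions; targets: (seq,) actual token ids.
torch.manual_seed(0)
seq_len, vocab = 6, 50_000
logits = torch.randn(seq_len, vocab)
targets = torch.randint(vocab, (seq_len,))

log_probs = F.log_softmax(logits, dim=-1)
per_token_loss = -log_probs[torch.arange(seq_len), targets]       # cross-entropy per token
per_token_entropy = -(log_probs.exp() * log_probs).sum(dim=-1)    # predictive entropy per token

# These per-token values are what the visualization overlays on the token strings.
for loss, ent in zip(per_token_loss.tolist(), per_token_entropy.tolist()):
    print(f'loss={loss:.2f}  entropy={ent:.2f}')
```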

NOTBAD AI (@notbadai) 's Twitter Profile Photo

πŸ“’ We are excited to announce Notbad v1.0 Mistral 24B, a new reasoning model trained in math and Python coding. This model is built upon the Mistral AI Small 24B 2501 and has been further trained with reinforcement learning on math and coding.

NOTBAD AI (@notbadai) 's Twitter Profile Photo

We're open-sourcing a math reasoning dataset with 270k samples, generated by our RL-based self-improved Mistral 24B 2501 model and used to train Notbad v1.0 Mistral 24B. Available on Hugging Face: huggingface.co/datasets/notba…

vpj (@vpj) 's Twitter Profile Photo

Uploaded the dataset of 270k math reasoning samples that we used to finetune Notbad v1.0 Mistral 24B (MATH-500 = 77.52%, GSM8K Platinum = 97.55%) to Hugging Face (link in reply). Follow NOTBAD AI for updates.

NOTBAD AI (@notbadai) 's Twitter Profile Photo

We are releasing an updated reasoning model with an improved IFEval score of 77.9%, up from 51.4% for our previous model.

πŸ‘‡ Links to try the model and to download weights below
vpj (@vpj) 's Twitter Profile Photo

The new training also improved GPQA from 64.2% to 67.3% and MMLU Pro from 64.2% to 67.3%. This model was also trained with the same reasoning datasets we used to train the v1.0 model. We mixed more general instruction data with answers sampled from the