Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile
Tanishq Mathew Abraham, Ph.D.

@iscienceluvr

PhD at 19 |
Founder and CEO at @MedARC_AI |
Research Director at @StabilityAI |
@kaggle Notebooks GM |
Biomed. engineer @ 14 |
TEDx talk➡bit.ly/3tpAuan

ID: 441465751

Link: https://tanishq.ai | Joined: 20-12-2011 03:45:50

14.14K Tweets

60.6K Followers

1.1K Following

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

ContextCite: Attributing Model Generation to Context

abs: arxiv.org/abs/2409.00729

"we introduce the problem of context attribution: pinpointing the parts of the context (if any) that led a model to generate a particular statement. We then present ContextCite, a simple and
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

One of my favorite YouTubers, <a href="/3blue1brown/">Grant Sanderson</a>, recently posted the latest video in his LLM series. 

This one demonstrates MLPs in Transformers by showing how they might store facts.

Definitely give it a watch, along with the other videos in the series!
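As a rough companion to the video's framing, here is a toy sketch (my own, not taken from the video) of a transformer MLP block read as a key-value store: rows of the input projection act as "keys" that match directions in the residual stream, and rows of the output projection are the "values" written back in.

```python
# Toy illustration: a transformer MLP block as a key-value store.
# Row i of W_in is a "key" direction; if the residual-stream vector matches it, the GELU
# gate opens and row i of W_out adds the corresponding "value" direction back in.
import numpy as np

d_model, d_mlp = 8, 4
rng = np.random.default_rng(0)
W_in = rng.standard_normal((d_mlp, d_model))    # keys: one per hidden neuron
W_out = rng.standard_normal((d_mlp, d_model))   # values: written to the residual stream

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x):
    h = gelu(W_in @ x)          # how strongly each key matches the input
    return x + W_out.T @ h      # add the matched values back into the residual stream

x = rng.standard_normal(d_model)   # e.g. a token representation carrying some entity
print(mlp(x))                      # output now carries the associated "value" directions
```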
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

STARTING IN 10 MIN!!!

Papers we will cover:

Building and better understanding vision-language models: insights and future directions - presented by Leo Tronchon

OLMoE: Open Mixture-of-Experts Language Models - presented by Niklas Muennighoff

Diffusion Models Are Real-Time Game

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

People ask me how I filter through all the papers on arXiv. It's kinda hard to explain, but basically what I do is skim through the list of papers (100 papers/page) and look for one of two things:

1. Title catches my eye - these days I like papers on diffusion, LLMs, RLHF,
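For readers who want to reproduce the skim programmatically, here is a hedged sketch using arXiv's public export API; the tweet describes skimming the web listing by hand, so this is only an analogue, and the category and keywords below are placeholders.

```python
# Sketch of skimming recent arXiv titles via the public export API (illustrative only).
import urllib.request
import xml.etree.ElementTree as ET

url = ("http://export.arxiv.org/api/query?"
       "search_query=cat:cs.LG&start=0&max_results=100&sortBy=submittedDate&sortOrder=descending")
with urllib.request.urlopen(url) as resp:
    feed = ET.fromstring(resp.read())

ns = {"atom": "http://www.w3.org/2005/Atom"}
keywords = ("diffusion", "llm", "rlhf")   # rough stand-ins for "title catches my eye"
for entry in feed.findall("atom:entry", ns):
    title = entry.find("atom:title", ns).text.strip().replace("\n", " ")
    if any(k in title.lower() for k in keywords):
        print(title)
```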

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

abs: arxiv.org/abs/2409.02813

Introduces MMMU-Pro, a robust version of the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark. 

Different from MMMU by:
1. filtering out
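The list above is truncated, but per the paper's abstract the first change is filtering out questions that text-only LLMs can already answer. Below is a hedged sketch of that style of filter; `text_only_answer` is a hypothetical stand-in for an actual LLM call, and the real pipeline is more involved than this.

```python
# Hedged sketch of a "text-only filter": drop questions that several text-only LLMs
# answer correctly without seeing the image. Illustrative only, not the MMMU-Pro code.
def is_answerable_without_image(question, options, answer, text_only_answer, models, threshold=0.5):
    correct = sum(text_only_answer(m, question, options) == answer for m in models)
    return correct / len(models) >= threshold

def filter_benchmark(items, text_only_answer, models):
    return [it for it in items
            if not is_answerable_without_image(it["question"], it["options"], it["answer"],
                                               text_only_answer, models)]
```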
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

Unforgettable Generalization in Language Models

abs: arxiv.org/abs/2409.02228

"forgetting" a task by training on task with random labels

for some tasks model forgetfulness generalizes outside of training set, in other cases it doesn't

additionally, linear probes trained on
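A minimal sketch of the forgetting setup as described in the tweet, assuming a generic classifier interface: fine-tune on the task's inputs with shuffled labels, then check whether accuracy also drops on held-out examples. Illustrative only, not the paper's code.

```python
# Hedged sketch: induce "forgetting" via random-label fine-tuning, then measure whether
# the accuracy drop generalizes to held-out data.
import torch

def forget_task(model, train_x, train_y, steps=100, lr=1e-4):
    random_y = train_y[torch.randperm(len(train_y))]   # shuffled (random) labels
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = torch.nn.functional.cross_entropy(model(train_x), random_y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

@torch.no_grad()
def accuracy(model, x, y):
    return (model(x).argmax(-1) == y).float().mean().item()
```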
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining

abs: arxiv.org/abs/2409.02326

New 1.3B model that is SOTA among small language models for code.

Arctic-SnowCoder-1.3B is trained in 3 phases:
1. general pretraining on 500B tokens of raw code data
2. continued
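A hypothetical config-style sketch of a phased pretraining schedule like the one described; the later phases are left as placeholders because the tweet is truncated, and `train_on` stands in for an actual training loop.

```python
# Hypothetical sketch of a phased pretraining schedule (illustrative only; phases 2 and 3
# are placeholders since the tweet above is cut off).
phases = [
    {"name": "general_pretraining",   "tokens": 500_000_000_000, "data": "raw_code"},
    {"name": "continued_pretraining", "tokens": None,            "data": "unspecified"},  # placeholder
    {"name": "final_phase",           "tokens": None,            "data": "unspecified"},  # placeholder
]

def run_pretraining(train_on, phases):
    # train_on(data, tokens) is a stand-in for an actual training loop
    for phase in phases:
        train_on(phase["data"], phase["tokens"])
```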
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

People are wondering why the famous Bollywood actor Anil Kapoor is on the TIME's AI 100 list

but that's cuz he's secretly been AK this entire time!
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

The AdEMAMix Optimizer: Better, Faster, Older

abs: arxiv.org/abs/2409.03137

A novel Adam-based optimizer from Apple that leverages very old gradients to reach better solutions.

Tested on Transformer LM, Mamba LM, and ViT training. 

A 1.3B parameter AdEMAMix Transformer LM
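From my reading of the paper (not official code), the "very old gradients" enter through a second, much slower exponential moving average that is mixed into Adam's update. A hedged NumPy sketch of that update rule:

```python
# Hedged sketch of an AdEMAMix-style update (my reading of the paper, not official code):
# keep Adam's fast EMA m1 and a second, much slower EMA m2 (beta3 close to 1), and mix
# m2 into the numerator so very old gradients still influence the step.
import numpy as np

def ademamix_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, beta3=0.9999,
                  alpha=5.0, eps=1e-8):
    state["t"] += 1
    t = state["t"]
    state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad        # fast EMA (as in Adam)
    state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad        # slow EMA of old gradients
    state["v"]  = beta2 * state["v"]  + (1 - beta2) * grad**2
    m1_hat = state["m1"] / (1 - beta1**t)                         # bias correction
    v_hat  = state["v"]  / (1 - beta2**t)
    return theta - lr * (m1_hat + alpha * state["m2"]) / (np.sqrt(v_hat) + eps)

state = {"t": 0, "m1": 0.0, "m2": 0.0, "v": 0.0}
theta = np.zeros(3)
theta = ademamix_step(theta, np.array([0.1, -0.2, 0.3]), state)
```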
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

xLAM: A Family of Large Action Models to Empower AI Agent Systems

abs: arxiv.org/abs/2409.03215
models: huggingface.co/collections/Sa…

A series of large action models from Salesforce designed for AI agent tasks. Includes five models with both dense and mixture-of-expert architectures,
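A hedged sketch of loading one of the released checkpoints with Hugging Face transformers; the repo id below is a placeholder since the collection link above is truncated.

```python
# Hedged sketch of loading an xLAM checkpoint with Hugging Face transformers.
# The repo id is a placeholder -- check the Salesforce collection for actual model names.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Salesforce/xLAM-..."            # placeholder, not a verified repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "You are a function-calling agent. ..."   # agent-style prompt, illustrative only
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```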