Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile
Tanishq Mathew Abraham, Ph.D.

@iscienceluvr

PhD at 19 |
Founder and CEO at @MedARC_AI |
Research Director at @StabilityAI |
@kaggle Notebooks GM |
Biomed. engineer @ 14 |
TEDx talk➡bit.ly/3tpAuan

ID: 441465751

Link: https://tanishq.ai | Joined: 20-12-2011 03:45:50

14.14K Tweets

60.6K Followers

1.1K Following

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

ContextCite: Attributing Model Generation to Context

abs: arxiv.org/abs/2409.00729

"we introduce the problem of context attribution: pinpointing the parts of the context (if any) that led a model to generate a particular statement. We then present ContextCite, a simple and
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

One of my favorite YouTubers, <a href="/3blue1brown/">Grant Sanderson</a>, recently posted the latest video in his LLM series. 

This one demonstrates MLPs in Transformers by showing how they might store facts.

Definitely give it a watch, along with the other videos in the series!
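As a rough companion to the video's framing, here is a toy sketch (my own, not taken from the video) of a transformer MLP block read as a key-value store: rows of the input projection act as "keys" that match directions in the residual stream, and rows of the output projection are the "values" written back in.

```python
# Toy illustration: a transformer MLP block as a key-value store.
# Row i of W_in is a "key" direction; if the residual-stream vector matches it, the GELU
# gate opens and row i of W_out adds the corresponding "value" direction back in.
import numpy as np

d_model, d_mlp = 8, 4
rng = np.random.default_rng(0)
W_in = rng.standard_normal((d_mlp, d_model))    # keys: one per hidden neuron
W_out = rng.standard_normal((d_mlp, d_model))   # values: written to the residual stream

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x):
    h = gelu(W_in @ x)          # how strongly each key matches the input
    return x + W_out.T @ h      # add the matched values back into the residual stream

x = rng.standard_normal(d_model)   # e.g. a token representation carrying some entity
print(mlp(x))                      # output now carries the associated "value" directions
```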
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

STARTING IN 10 MIN!!!

Papers we will cover:

Building and better understanding vision-language models: insights and future directions - presented by Leo Tronchon

OLMoE: Open Mixture-of-Experts Language Models - presented by Niklas Muennighoff

Diffusion Models Are Real-Time Game

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

People ask me how I filter through all the papers on arXiv. It's kinda hard to explain, but basically what I do is skim through the list of papers (100 papers/page) and look for one of two things:

1. Title catches my eye - these days I like papers on diffusion, LLMs, RLHF,
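For readers who want to reproduce the skim programmatically, here is a hedged sketch using arXiv's public export API; the tweet describes skimming the web listing by hand, so this is only an analogue, and the category and keywords below are placeholders.

```python
# Sketch of skimming recent arXiv titles via the public export API (illustrative only).
import urllib.request
import xml.etree.ElementTree as ET

url = ("http://export.arxiv.org/api/query?"
       "search_query=cat:cs.LG&start=0&max_results=100&sortBy=submittedDate&sortOrder=descending")
with urllib.request.urlopen(url) as resp:
    feed = ET.fromstring(resp.read())

ns = {"atom": "http://www.w3.org/2005/Atom"}
keywords = ("diffusion", "llm", "rlhf")   # rough stand-ins for "title catches my eye"
for entry in feed.findall("atom:entry", ns):
    title = entry.find("atom:title", ns).text.strip().replace("\n", " ")
    if any(k in title.lower() for k in keywords):
        print(title)
```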

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

abs: arxiv.org/abs/2409.02813

Introduces MMMU-Pro, a robust version of the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark. 

Different from MMMU by:
1. filtering out
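The list above is truncated, but per the paper's abstract the first change is filtering out questions that text-only LLMs can already answer. Below is a hedged sketch of that style of filter; `text_only_answer` is a hypothetical stand-in for an actual LLM call, and the real pipeline is more involved than this.

```python
# Hedged sketch of a "text-only filter": drop questions that several text-only LLMs
# answer correctly without seeing the image. Illustrative only, not the MMMU-Pro code.
def is_answerable_without_image(question, options, answer, text_only_answer, models, threshold=0.5):
    correct = sum(text_only_answer(m, question, options) == answer for m in models)
    return correct / len(models) >= threshold

def filter_benchmark(items, text_only_answer, models):
    return [it for it in items
            if not is_answerable_without_image(it["question"], it["options"], it["answer"],
                                               text_only_answer, models)]
```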
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

Unforgettable Generalization in Language Models

abs: arxiv.org/abs/2409.02228

"forgetting" a task by training on task with random labels

for some tasks model forgetfulness generalizes outside of training set, in other cases it doesn't

additionally, linear probes trained on
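A minimal sketch of the forgetting setup as described in the tweet, assuming a generic classifier interface: fine-tune on the task's inputs with shuffled labels, then check whether accuracy also drops on held-out examples. Illustrative only, not the paper's code.

```python
# Hedged sketch: induce "forgetting" via random-label fine-tuning, then measure whether
# the accuracy drop generalizes to held-out data.
import torch

def forget_task(model, train_x, train_y, steps=100, lr=1e-4):
    random_y = train_y[torch.randperm(len(train_y))]   # shuffled (random) labels
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = torch.nn.functional.cross_entropy(model(train_x), random_y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

@torch.no_grad()
def accuracy(model, x, y):
    return (model(x).argmax(-1) == y).float().mean().item()
```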
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining

abs: arxiv.org/abs/2409.02326

New 1.3B model that is SOTA among small language models for code.

Arctic-SnowCoder-1.3B is trained in 3 phases:
1. general pretraining on 500B tokens of raw code data
2. continued
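A hypothetical config-style sketch of a phased pretraining schedule like the one described; the later phases are left as placeholders because the tweet is truncated, and `train_on` stands in for an actual training loop.

```python
# Hypothetical sketch of a phased pretraining schedule (illustrative only; phases 2 and 3
# are placeholders since the tweet above is cut off).
phases = [
    {"name": "general_pretraining",   "tokens": 500_000_000_000, "data": "raw_code"},
    {"name": "continued_pretraining", "tokens": None,            "data": "unspecified"},  # placeholder
    {"name": "final_phase",           "tokens": None,            "data": "unspecified"},  # placeholder
]

def run_pretraining(train_on, phases):
    # train_on(data, tokens) is a stand-in for an actual training loop
    for phase in phases:
        train_on(phase["data"], phase["tokens"])
```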
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

People are wondering why the famous Bollywood actor Anil Kapoor is on the TIME's AI 100 list

but that's cuz he's secretly been AK this entire time!
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

The AdEMAMix Optimizer: Better, Faster, Older

abs: arxiv.org/abs/2409.03137

A novel Adam-based optimizer from Apple that leverages very old gradients to reach better solutions.

Tested on Transformer LM, Mamba LM, and ViT training. 

A 1.3B parameter AdEMAMix Transformer LM
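From my reading of the paper (not official code), the "very old gradients" enter through a second, much slower exponential moving average that is mixed into Adam's update. A hedged NumPy sketch of that update rule:

```python
# Hedged sketch of an AdEMAMix-style update (my reading of the paper, not official code):
# keep Adam's fast EMA m1 and a second, much slower EMA m2 (beta3 close to 1), and mix
# m2 into the numerator so very old gradients still influence the step.
import numpy as np

def ademamix_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, beta3=0.9999,
                  alpha=5.0, eps=1e-8):
    state["t"] += 1
    t = state["t"]
    state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad        # fast EMA (as in Adam)
    state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad        # slow EMA of old gradients
    state["v"]  = beta2 * state["v"]  + (1 - beta2) * grad**2
    m1_hat = state["m1"] / (1 - beta1**t)                         # bias correction
    v_hat  = state["v"]  / (1 - beta2**t)
    return theta - lr * (m1_hat + alpha * state["m2"]) / (np.sqrt(v_hat) + eps)

state = {"t": 0, "m1": 0.0, "m2": 0.0, "v": 0.0}
theta = np.zeros(3)
theta = ademamix_step(theta, np.array([0.1, -0.2, 0.3]), state)
```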
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

xLAM: A Family of Large Action Models to Empower AI Agent Systems

abs: arxiv.org/abs/2409.03215
models: huggingface.co/collections/Sa…

A series of large action models from Salesforce designed for AI agent tasks. Includes five models with both dense and mixture-of-expert architectures,
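A hedged sketch of loading one of the released checkpoints with Hugging Face transformers; the repo id below is a placeholder since the collection link above is truncated.

```python
# Hedged sketch of loading an xLAM checkpoint with Hugging Face transformers.
# The repo id is a placeholder -- check the Salesforce collection for actual model names.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Salesforce/xLAM-..."            # placeholder, not a verified repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "You are a function-calling agent. ..."   # agent-style prompt, illustrative only
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```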