Shashank Rajput (@shashank_r12) 's Twitter Profile
Shashank Rajput

@shashank_r12

LLM Pretraining @DbrxMosaicAI

ID: 1955982469

Link: https://shashankrajput.github.io/ · Joined: 12-10-2013 06:44:48

191 Tweets

761 Followers

599 Following

Jonathan Frankle (@jefrankle) 's Twitter Profile Photo

I guess I should probably include some images, since this is an image generation model. I'm so proud of Cory Stephenson, Landan Seguin, Austin Jacobson, jasmine collins, and our extraordinary collaborators at Shutterstock.

jasmine collins (@jazco) 's Twitter Profile Photo

today we're announcing our Databricks Mosaic Research x Shutterstock partnership, and a new text-to-image diffusion model: ✨ImageAI!!✨ this model is geared towards enterprise use cases and is trained exclusively on shutterstock's trusted data catalog! databricks.com/company/newsro…

Rishab Parthasarathy (@rishab_partha) 's Twitter Profile Photo

We are excited to announce Vid3D, a technique for generating 3D video using only 2D video diffusion models and Gaussian splatting!

Paper: arxiv.org/abs/2406.11196
Github: github.com/rishab-partha/…
Project Page: rishab-partha.github.io/Vid3D

Dimitris Papailiopoulos (@dimitrispapail) 's Twitter Profile Photo

From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities by Finetuning on Synthetic Data

TLDR: FT'ing on randint key-value retrieval tasks improves LLM perf on real retrieval tasks
arxiv.org/abs/2406.19292

Great project led by Zheyang Xiong & Vasilis Papageorgiou

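A minimal sketch of what one such synthetic sample could look like (the prompt format, dictionary size, and integer ranges here are made up for illustration, not taken from the paper):

    import random

    def make_kv_retrieval_sample(num_pairs=32, seed=None):
        """One synthetic example: a dict of random integer key-value pairs
        plus a question asking for the value of a single key."""
        rng = random.Random(seed)
        keys = rng.sample(range(10_000, 100_000), num_pairs)
        pairs = {k: rng.randint(10_000, 99_999) for k in keys}
        query_key = rng.choice(keys)
        prompt = (
            "Below is a dictionary. Answer the question about it.\n"
            f"{pairs}\n"
            f"What is the value associated with key {query_key}?"
        )
        return {"prompt": prompt, "answer": str(pairs[query_key])}

    sample = make_kv_retrieval_sample(seed=0)
    print(sample["prompt"][-80:], "->", sample["answer"])
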
Sasha Doubov (@sashadoubov) 's Twitter Profile Photo

big shoutout to Nikhil and Jacob Portes for spearheading this work on scaling laws that account for inference costs. Come say hi to us at ICML :)))
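
For context, the rule-of-thumb FLOP accounting this line of work builds on: training costs roughly 6·N·D FLOPs (N parameters, D tokens) and inference roughly 2·N FLOPs per generated token, so the compute-optimal model size shifts once lifetime inference demand is counted. A toy illustration with made-up demand numbers, not figures from the paper:

    def lifetime_flops(n_params, train_tokens, inference_tokens):
        """Rule-of-thumb accounting: ~6*N*D to train, ~2*N per inference token."""
        return 6 * n_params * train_tokens, 2 * n_params * inference_tokens

    # With heavy inference demand, the smaller model costs more FLOPs to train
    # here but far less to serve, so its lifetime total comes out lower.
    for n, d in [(70e9, 2e12), (13e9, 12e12)]:
        train, infer = lifetime_flops(n, d, inference_tokens=5e12)
        print(f"N={n:.0e} D={d:.0e}: train={train:.2e} infer={infer:.2e} total={train + infer:.2e}")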

Sasha Doubov (@sashadoubov) 's Twitter Profile Photo

some notes from the paper!
- 405B trained on 15.6T tokens, 3.8e25 flops
- use SFT, rejection sampling and DPO
- annealing is used to judge quality of domain-specific data (s/o dbrx paper)
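
Those numbers line up with the standard compute approximation FLOPs ≈ 6·N·D; a quick sanity check:

    # Sanity check of the quoted training compute: FLOPs ~ 6 * params * tokens
    n_params = 405e9   # 405B parameters
    tokens = 15.6e12   # 15.6T training tokens
    print(f"{6 * n_params * tokens:.2e}")  # -> 3.79e+25, matching the quoted 3.8e25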

Abhay Gupta (@gupta__abhay) 's Twitter Profile Photo

The new Llama-3.1 base models are pretty much the same as the old ones, barring the multilingual + extended context length capabilities. Ran a quick cosine similarity check on projection matrices. Here are some examples from the 8B model

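A minimal sketch of that kind of check (the checkpoint IDs and the "proj" name filter are assumptions, and loading two 8B checkpoints needs a lot of memory):

    import torch
    from transformers import AutoModelForCausalLM

    # Assumed Hugging Face IDs for the old and new base checkpoints.
    old = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16)
    new = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B", torch_dtype=torch.bfloat16)

    old_params = dict(old.named_parameters())
    for name, p_new in new.named_parameters():
        if "proj" in name:  # q/k/v/o attention and MLP projection matrices
            sim = torch.nn.functional.cosine_similarity(
                old_params[name].flatten().float(), p_new.flatten().float(), dim=0
            )
            print(f"{name}: {sim.item():.4f}")  # ~1.0 means near-identical weights
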
Shashank Rajput (@shashank_r12) 's Twitter Profile Photo

Performed gradient ascent from a random starting point in SF and ended up at Jones St & Sacramento St. Wondering if this is a global maximum as well, or should I have added some stochasticity lol

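(For the literal version of the joke: vanilla gradient ascent stops at the first local maximum it reaches, and injecting noise into the updates, as in Langevin dynamics or SGD, gives it a chance to hop out. A toy sketch on a bumpy 1D "terrain":)

    import math, random

    def elevation(x):
        # Toy terrain: one broad hill with local bumps (many local maxima).
        return -0.05 * x**2 + math.sin(3 * x)

    def grad(x, eps=1e-5):
        # Numerical derivative of the elevation.
        return (elevation(x + eps) - elevation(x - eps)) / (2 * eps)

    random.seed(0)
    x = random.uniform(-10, 10)
    best_x, best_h = x, elevation(x)
    for _ in range(5000):
        x += 0.05 * grad(x) + random.gauss(0, 0.1)  # the added stochasticity
        if elevation(x) > best_h:
            best_x, best_h = x, elevation(x)
    print(f"best point found: x={best_x:.2f}, elevation={best_h:.2f}")
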
Alex Dimakis (@alexgdimakis) 's Twitter Profile Photo

Excited to launch the first model from our startup: Bespoke Labs. Bespoke-Minicheck-7B is a grounded factuality checker: super lightweight and fast. Outperforms all big foundation models including Claude 3.5 Sonnet, Mistral-Large m2 and GPT 4o, and it's only 7B. Also, I want to

Databricks Mosaic Research (@dbrxmosaicai) 's Twitter Profile Photo

How well do the latest long context LLMs (Llama-3.1-405b, GPT-4o-mini and Claude-3.5-sonnet) perform on RAG? We benchmarked 13 popular OSS and commercial models on context lengths from 2k to 125k, and the results are very interesting! Full post: databricks.com/blog/long-cont…

Eitan Turok (@eitanturok) 's Twitter Profile Photo

Sharing is caring, especially among KV-caches!

Introducing MixAttention, which shares KV-caches between global and sliding window attention. MixAttention has
* ~2.5x less memory consumption
* ~2x faster inference speed
without sacrificing performance.
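
A minimal sketch of the cache-sharing idea (not the actual MixAttention code): one layer writes a KV cache, and a later sliding-window layer reads the same cache instead of storing its own, so KV memory is paid once for both.

    import torch
    import torch.nn.functional as F

    def attend(q, kv_cache, window=None):
        """Single-head causal attention over a (possibly shared) KV cache.
        If `window` is set, each query only sees the last `window` keys."""
        k, v = kv_cache
        scores = (q @ k.transpose(-2, -1)) / k.shape[-1] ** 0.5
        T = scores.shape[-1]
        i = torch.arange(T)[:, None]
        j = torch.arange(T)[None, :]
        allowed = (j <= i) if window is None else (j <= i) & (i - j < window)
        return F.softmax(scores.masked_fill(~allowed, float("-inf")), dim=-1) @ v

    T, d = 16, 8
    kv_cache = (torch.randn(T, d), torch.randn(T, d))  # written once by a global layer
    out_global = attend(torch.randn(T, d), kv_cache)             # full causal attention
    out_window = attend(torch.randn(T, d), kv_cache, window=4)   # reuses the same cache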

Databricks Mosaic Research (@dbrxmosaicai) 's Twitter Profile Photo

At Databricks, we want to help customers build more #inference-friendly #llms. With MixAttention architecture, you can maintain model quality while improving inference speed and reducing memory footprint: databricks.com/blog/mixattent…

Shashank Rajput (@shashank_r12) 's Twitter Profile Photo

You can now finetune Llama 3.1 models on 131K context length using our optimized stack that uses Sequence Parallelism for training and Provisioned Throughput for serving!
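
The core idea behind sequence parallelism, in the simplest possible sketch (real implementations also exchange activations between ranks so attention can see the whole sequence):

    import torch

    seq_len, hidden, n_gpus = 131_072, 64, 8  # small hidden size to keep the demo light

    # One very long example; activations for all 131K positions won't fit on
    # one GPU, so the sequence dimension itself is sharded across devices.
    x = torch.randn(1, seq_len, hidden)
    shards = torch.chunk(x, n_gpus, dim=1)
    print([tuple(s.shape) for s in shards])  # 8 shards of (1, 16384, 64)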