Abhinav Prasad Yasaswi (@abhinavpy) 's Twitter Profile
Abhinav Prasad Yasaswi

@abhinavpy

Grad Student at @SCAI_ASU trying to keep up with the developments in AI.

ID: 1359920773

Link: http://abhinavpy-asu.github.io · Joined: 17-04-2013 17:10:20

34 Tweets

76 Followers

2.2K Following

Harris Chan (@sirrahchan) 's Twitter Profile Photo

Here's my attempt at visualizing the training pipeline for DeepSeek-R1(-Zero) and the distillation to smaller models. 

Note they retrain DeepSeek-V3-Base with the new 800k curated data instead of continuing to finetune the checkpoint from the first round of cold-start SFT + RL
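
To make the retraining detail concrete, here is a minimal sketch of the staged pipeline in Python. The stage functions (sft, rl, curate_data) are hypothetical placeholders standing in for DeepSeek's actual training code:

```python
# Minimal sketch of the R1 pipeline described above; all stage functions
# are hypothetical stubs, not DeepSeek's real training code.

def sft(model, data):       # supervised fine-tuning stage
    return f"{model}+SFT({data})"

def rl(model):              # reasoning-oriented RL stage
    return f"{model}+RL"

def curate_data(model, n):  # sample and filter model outputs into an SFT set
    return f"{n} curated samples from {model}"

base = "DeepSeek-V3-Base"

# Round 1: cold-start SFT on a small seed set, then RL -> intermediate model.
r1_intermediate = rl(sft(base, "cold-start seed data"))

# Curate ~800k samples (reasoning + non-reasoning) from the intermediate model.
curated = curate_data(r1_intermediate, 800_000)

# Key detail from the tweet: round 2 restarts from the *base* model with the
# curated data, rather than continuing from the round-1 checkpoint.
deepseek_r1 = rl(sft(base, curated))

# Distillation: smaller models are SFT'd directly on the same curated data.
qwen_7b_distilled = sft("Qwen-7B", curated)
```
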
Aurimas Griciūnas (@aurimas_gr) 's Twitter Profile Photo

ML/LLMOps fundamentals: Continuous Training (CT) and what steps are needed to achieve it. CT is the process of automated ML model retraining in production environments on a specific trigger. Let's look into some prerequisites for this: 1) Automation of ML
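
As a concrete illustration of such a trigger, here is a minimal sketch, assuming a drift metric and a retraining job launcher; all names and thresholds are hypothetical:

```python
# Minimal sketch of a continuous-training (CT) trigger loop.
# measure_drift and launch_retraining_job are hypothetical placeholders.
import time

DRIFT_THRESHOLD = 0.15

def measure_drift() -> float:
    # Placeholder: compare live feature/prediction statistics against the
    # training baseline (e.g., population stability index or KL divergence).
    return 0.18

def launch_retraining_job() -> None:
    print("Drift detected: triggering the automated retraining pipeline")

# Daemon-style trigger loop (bounded here so the sketch terminates).
for _ in range(3):
    if measure_drift() > DRIFT_THRESHOLD:
        launch_retraining_job()
    time.sleep(1)  # in production this would run on an hourly/daily schedule
```
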

AshutoshShrivastava (@ai_for_success) 's Twitter Profile Photo

AI agents are so cool! ByteDance just introduced UI-TARS: an end-to-end GUI agent model based on VLM architecture. It processes screenshots as input and performs human-like interactions. Here are 3 examples of complex tasks it can handle without any manual intervention 👇
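
For intuition, here is a minimal sketch of the perceive-act loop such a GUI agent runs; the screenshot, prediction, and execution helpers are hypothetical stand-ins, not ByteDance's API:

```python
# Minimal sketch of a screenshot-in, action-out GUI agent loop.
# All helpers below are hypothetical stubs.

def take_screenshot() -> bytes:
    return b"...raw pixels of the current screen..."

def vlm_predict_action(screenshot: bytes, task: str) -> dict:
    # The VLM maps (screen image, task description) -> a human-like UI action.
    return {"type": "click", "x": 412, "y": 230}

def execute(action: dict) -> bool:
    print(f"Performing {action}")
    return True  # a real agent would return False once the task is complete

task = "Book a one-way flight from PHX to SFO"
for _ in range(20):  # cap the number of steps
    action = vlm_predict_action(take_screenshot(), task)
    if not execute(action):
        break
```
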

Deedy (@deedydas) 's Twitter Profile Photo

China just dropped a new model.
ByteDance Doubao-1.5-pro matches GPT-4o on benchmarks at 50x lower cost

— $0.022/M cached input tokens, $0.11/M input, $0.275/M output
— 5x cheaper than DeepSeek, >200x cheaper than o1
— 32k + 256k context
— sparse MoE architecture

AI truly too cheap to meter.
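
To sanity-check those prices, here is a quick cost calculation using the quoted per-million-token rates; the request sizes are made up for illustration:

```python
# Doubao-1.5-pro rates from the tweet, in USD per million tokens.
PRICE = {"cached_input": 0.022, "input": 0.11, "output": 0.275}

def cost_usd(cached_in: int, fresh_in: int, out: int) -> float:
    return (cached_in * PRICE["cached_input"]
            + fresh_in * PRICE["input"]
            + out * PRICE["output"]) / 1_000_000

# e.g., a 30k-token prompt (20k of it cached) producing a 2k-token answer:
print(f"${cost_usd(20_000, 10_000, 2_000):.6f} per request")
# -> $0.002090 per request
```
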
Deedy (@deedydas) 's Twitter Profile Photo

The "China is crushing the US" rhetoric totally forgets about Gemini 2.0 Flash Thinking.

Likely cheaper, longer context, and better at reasoning.

We're still early in the AI race.
Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Microsoft presents:

Chain-of-Retrieval Augmented Generation

- Observes a more than 10-point improvement in EM score compared to a strong baseline
- Establishes new SotA performance across a diverse range of knowledge-intensive tasks
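
For intuition about the "chain" part, here is a minimal sketch of iterative retrieval interleaved with generation; retrieve and generate are hypothetical stand-ins, not Microsoft's implementation:

```python
# Minimal sketch of chain-of-retrieval: instead of one retrieval round,
# the model alternates sub-query generation and retrieval before answering.

def retrieve(query: str) -> list[str]:
    return [f"doc about {query}"]  # stub retriever

def generate(prompt: str) -> str:
    return f"LLM output for: {prompt[:40]}..."  # stub LLM call

def corag_answer(question: str, max_hops: int = 3) -> str:
    evidence: list[str] = []
    sub_query = question
    for _ in range(max_hops):
        evidence += retrieve(sub_query)
        # Ask the model for the next sub-question given evidence so far.
        sub_query = generate(f"Next sub-query for {question} given {evidence}")
    return generate(f"Answer {question} using {evidence}")

print(corag_answer("Who advised the author of the transformer paper?"))
```
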
Deedy (@deedydas) 's Twitter Profile Photo

DeepSeek just dropped open-source Janus Pro 7B for image understanding and generation!

— SOTA 0.8 on GenEval and 84.19 on DPG-Bench, beating DALL-E 3 and SD3-Medium
— 72M synthetic images in pretraining
— good text rendering

Images are small (384x384), but it's still a huge release.
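
A hedged sketch of pulling the released weights from the Hub; the repo id is assumed from the release name, and the exact multimodal API is model-specific, so check the model card before relying on this:

```python
# Assumed load pattern for Janus Pro 7B; the repo ships custom model code.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/Janus-Pro-7B",  # repo id assumed from the release name
    trust_remote_code=True,      # custom multimodal classes live in the repo
)
# Image understanding and 384x384 generation go through the model's own
# processor classes; see the model card for the exact calls.
```
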
AK (@_akhaliq) 's Twitter Profile Photo

This is HUGE: Hugging Face just shipped Inference Providers on the Hub, partnering with Together AI, fal, Replicate, and SambaNova Systems! Starting today you can access thousands of models like DeepSeek R1, Llama, Flux, Whisper, and more, directly from Hugging Face!
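
A minimal sketch of calling a model through Inference Providers with huggingface_hub's InferenceClient; the provider key and model id follow the announcement, but exact availability may vary:

```python
# Route a chat request to a partner provider via the Hugging Face Hub.
from huggingface_hub import InferenceClient

client = InferenceClient(provider="together")  # or "fal-ai", "replicate", "sambanova"
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(completion.choices[0].message.content)
```
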

Lewis Tunstall (@_lewtun) 's Twitter Profile Photo

Two new reasoning datasets just landed on the Hub:

1. OpenThoughts: 114k samples distilled from R1 on math, code, and science huggingface.co/datasets/open-…

2. R1-Distill-SFT: 1.7M samples (!) distilled from R1-32B on NuminaMath and Ai2's Tulu data huggingface.co/datasets/Servi…
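
A minimal sketch of loading one of these with the datasets library; the repo id is assumed from the truncated link, so verify it on the Hub first:

```python
# Pull the OpenThoughts dataset for local inspection (repo id assumed).
from datasets import load_dataset

ds = load_dataset("open-thoughts/OpenThoughts-114k", split="train")
print(ds[0].keys())  # inspect the fields of one distilled reasoning sample
```
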

Omar Sanseviero (@osanseviero) 's Twitter Profile Photo

Everyone: DeepSeek just appeared out of nowhere! 😱

Me:
- DeepSeek Coder in 2023
- MoE in Feb
- Math in Feb
- VL in March
- V2 in May
- Coder V2 in June
- Prover in August
- V2.5 in September
- VL 2 in December
- V3 in December

They've consistently shipped for 1+ years 😁