Abhinav Prasad Yasaswi (@abhinavpy) 's Twitter Profile
Abhinav Prasad Yasaswi

@abhinavpy

Grad Student at @SCAI_ASU trying to keep up with the developments in AI.

ID: 1359920773

Link: http://abhinavpy-asu.github.io · Joined: 17-04-2013 17:10:20

34 Tweets

76 Followers

2.2K Following

Harris Chan (@sirrahchan) 's Twitter Profile Photo

Here's my attempt at visualizing the training pipeline for DeepSeek-R1(-Zero) and the distillation to smaller models. 

Note they retrain DeepSeek-V3-Base with the new 800k curated data instead of continuing to finetune the checkpoint from the first round of cold-start SFT + RL
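
To make the retraining detail concrete, here is a minimal sketch of the staged pipeline in Python. The stage functions (sft, rl, curate_data) are hypothetical placeholders standing in for DeepSeek's actual training code:

```python
# Minimal sketch of the R1 pipeline described above; all stage functions
# are hypothetical stubs, not DeepSeek's real training code.

def sft(model, data):       # supervised fine-tuning stage
    return f"{model}+SFT({data})"

def rl(model):              # reasoning-oriented RL stage
    return f"{model}+RL"

def curate_data(model, n):  # sample and filter model outputs into an SFT set
    return f"{n} curated samples from {model}"

base = "DeepSeek-V3-Base"

# Round 1: cold-start SFT on a small seed set, then RL -> intermediate model.
r1_intermediate = rl(sft(base, "cold-start seed data"))

# Curate ~800k samples (reasoning + non-reasoning) from the intermediate model.
curated = curate_data(r1_intermediate, 800_000)

# Key detail from the tweet: round 2 restarts from the *base* model with the
# curated data, rather than continuing from the round-1 checkpoint.
deepseek_r1 = rl(sft(base, curated))

# Distillation: smaller models are SFT'd directly on the same curated data.
qwen_7b_distilled = sft("Qwen-7B", curated)
```
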
Aurimas Griciūnas (@aurimas_gr) 's Twitter Profile Photo

ML/LLMOps fundamentals: Continuous Training (CT) and what steps are needed to achieve it. CT is the process of automated ML model retraining in production environments on a specific trigger. Let's look into some prerequisites for this: 1) Automation of ML
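
As a concrete illustration of such a trigger, here is a minimal sketch, assuming a drift metric and a retraining job launcher; all names and thresholds are hypothetical:

```python
# Minimal sketch of a continuous-training (CT) trigger loop.
# measure_drift and launch_retraining_job are hypothetical placeholders.
import time

DRIFT_THRESHOLD = 0.15

def measure_drift() -> float:
    # Placeholder: compare live feature/prediction statistics against the
    # training baseline (e.g., population stability index or KL divergence).
    return 0.18

def launch_retraining_job() -> None:
    print("Drift detected: triggering the automated retraining pipeline")

# Daemon-style trigger loop (bounded here so the sketch terminates).
for _ in range(3):
    if measure_drift() > DRIFT_THRESHOLD:
        launch_retraining_job()
    time.sleep(1)  # in production this would run on an hourly/daily schedule
```
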

AshutoshShrivastava (@ai_for_success) 's Twitter Profile Photo

AI agents are so cool! ByteDance just introduced UI-TARS: an end-to-end GUI agent model based on VLM architecture. It processes screenshots as input and performs human-like interactions. Here are 3 examples of complex tasks it can handle without any manual intervention 👇
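
For intuition, here is a minimal sketch of the perceive-act loop such a GUI agent runs; the screenshot, prediction, and execution helpers are hypothetical stand-ins, not ByteDance's API:

```python
# Minimal sketch of a screenshot-in, action-out GUI agent loop.
# All helpers below are hypothetical stubs.

def take_screenshot() -> bytes:
    return b"...raw pixels of the current screen..."

def vlm_predict_action(screenshot: bytes, task: str) -> dict:
    # The VLM maps (screen image, task description) -> a human-like UI action.
    return {"type": "click", "x": 412, "y": 230}

def execute(action: dict) -> bool:
    print(f"Performing {action}")
    return True  # a real agent would return False once the task is complete

task = "Book a one-way flight from PHX to SFO"
for _ in range(20):  # cap the number of steps
    action = vlm_predict_action(take_screenshot(), task)
    if not execute(action):
        break
```
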

Deedy (@deedydas) 's Twitter Profile Photo

China just dropped a new model.
ByteDance Doubao-1.5-pro matches GPT-4o on benchmarks at 50x lower cost

— $0.022/M cached input tokens, $0.11/M input, $0.275/M output
— 5x cheaper than DeepSeek, >200x cheaper than o1
— 32k + 256k context
— sparse MoE architecture

AI truly too cheap to meter.
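
To sanity-check those prices, here is a quick cost calculation using the quoted per-million-token rates; the request sizes are made up for illustration:

```python
# Doubao-1.5-pro rates from the tweet, in USD per million tokens.
PRICE = {"cached_input": 0.022, "input": 0.11, "output": 0.275}

def cost_usd(cached_in: int, fresh_in: int, out: int) -> float:
    return (cached_in * PRICE["cached_input"]
            + fresh_in * PRICE["input"]
            + out * PRICE["output"]) / 1_000_000

# e.g., a 30k-token prompt (20k of it cached) producing a 2k-token answer:
print(f"${cost_usd(20_000, 10_000, 2_000):.6f} per request")
# -> $0.002090 per request
```
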
Deedy (@deedydas) 's Twitter Profile Photo

The "China is crushing the US" rhetoric totally forgets about Gemini 2.0 Flash Thinking.

Likely cheaper, longer context, and better at reasoning.

We're still early in the AI race.
Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Microsoft presents:

Chain-of-Retrieval Augmented Generation

- Observes a more than 10-point improvement in EM score compared to a strong baseline
- Establishes new SotA performance across a diverse range of knowledge-intensive tasks
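
For intuition about the "chain" part, here is a minimal sketch of iterative retrieval interleaved with generation; retrieve and generate are hypothetical stand-ins, not Microsoft's implementation:

```python
# Minimal sketch of chain-of-retrieval: instead of one retrieval round,
# the model alternates sub-query generation and retrieval before answering.

def retrieve(query: str) -> list[str]:
    return [f"doc about {query}"]  # stub retriever

def generate(prompt: str) -> str:
    return f"LLM output for: {prompt[:40]}..."  # stub LLM call

def corag_answer(question: str, max_hops: int = 3) -> str:
    evidence: list[str] = []
    sub_query = question
    for _ in range(max_hops):
        evidence += retrieve(sub_query)
        # Ask the model for the next sub-question given evidence so far.
        sub_query = generate(f"Next sub-query for {question} given {evidence}")
    return generate(f"Answer {question} using {evidence}")

print(corag_answer("Who advised the author of the transformer paper?"))
```
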
Deedy (@deedydas) 's Twitter Profile Photo

DeepSeek just dropped open-source Janus Pro 7B for image understanding and generation!

— SOTA 0.8 on GenEval and 84.19 on DPG-Bench, beating DALL-E 3 and SD3-Medium
— 72M synthetic images in pretraining
— good text rendering

Images are small (384x384), but it's still a huge release.
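
A hedged sketch of pulling the released weights from the Hub; the repo id is assumed from the release name, and the exact multimodal API is model-specific, so check the model card before relying on this:

```python
# Assumed load pattern for Janus Pro 7B; the repo ships custom model code.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/Janus-Pro-7B",  # repo id assumed from the release name
    trust_remote_code=True,      # custom multimodal classes live in the repo
)
# Image understanding and 384x384 generation go through the model's own
# processor classes; see the model card for the exact calls.
```
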
AK (@_akhaliq) 's Twitter Profile Photo

This is HUGE: Hugging Face just shipped Inference Providers on the Hub, partnering with Together AI, fal, Replicate, and SambaNova Systems! Starting today you can access thousands of models like DeepSeek R1, Llama, Flux, Whisper, and more, directly from Hugging Face!
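
A minimal sketch of calling a model through Inference Providers with huggingface_hub's InferenceClient; the provider key and model id follow the announcement, but exact availability may vary:

```python
# Route a chat request to a partner provider via the Hugging Face Hub.
from huggingface_hub import InferenceClient

client = InferenceClient(provider="together")  # or "fal-ai", "replicate", "sambanova"
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(completion.choices[0].message.content)
```
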

Lewis Tunstall (@_lewtun) 's Twitter Profile Photo

Two new reasoning datasets just landed on the Hub:

1. OpenThoughts: 114k samples distilled from R1 on math, code, and science huggingface.co/datasets/open-…

2. R1-Distill-SFT: 1.7M samples (!) distilled from R1-32B on NuminaMath and Ai2's Tulu data huggingface.co/datasets/Servi…
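
A minimal sketch of loading one of these with the datasets library; the repo id is assumed from the truncated link, so verify it on the Hub first:

```python
# Pull the OpenThoughts dataset for local inspection (repo id assumed).
from datasets import load_dataset

ds = load_dataset("open-thoughts/OpenThoughts-114k", split="train")
print(ds[0].keys())  # inspect the fields of one distilled reasoning sample
```
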

Omar Sanseviero (@osanseviero) 's Twitter Profile Photo

Everyone: DeepSeek just appeared out of nowhere! 😱

Me:
- DeepSeek Coder in 2023
- MoE in Feb
- Math in Feb
- VL in March
- V2 in May
- Coder V2 in June
- Prover in August
- V2.5 in September
- VL 2 in December
- V3 in December

They've consistently shipped for 1+ years 😁