Enrico Shippole (@enricoshippole)'s Twitter Profile

Enrico Shippole

@enricoshippole

We need better evaluations

ID: 1195196276557541376

Joined: 15-11-2019 04:26:55

1.1K Tweets

2.2K Followers

46 Following

Simo Ryu (@cloneofsimo)'s Twitter Profile Photo

I really hate some people in the ML research community who love attaching a new name to existing research and calling it theirs: "Let me add one line of modification and call it something brand new, so next time they will cite mine instead of the original work."

Shayne Longpre (@shayneredford)'s Twitter Profile Photo

Thrilled our global data ecosystem audit was accepted to #ICLR2025!

Empirically, we find:

1⃣ Soaring synthetic text data: ~10M tokens (pre-2018) to 100B+ (2024).

2⃣ YouTube is now 70%+ of speech/video data but could block third-party collection.

3⃣ <0.2% of data from
tomaarsen (@tomaarsen)'s Twitter Profile Photo

I just released Sentence Transformers v4.1, featuring ONNX and OpenVINO backends for rerankers offering 2-3x speedups, and improved hard negative mining, which helps prepare stronger training datasets.

Details in 🧵
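As a rough illustration of the hard negative mining idea mentioned above (a generic sketch, not the Sentence Transformers API; all names here are hypothetical), one can rank candidate passages by similarity to a query and keep the top-scoring candidates that are not the labeled positive:

```python
# Minimal sketch of hard negative mining: given similarity scores between a
# query and candidate passages, keep the highest-scoring candidates that are
# NOT the known positive. These "hard" negatives make training pairs harder
# to distinguish and typically yield stronger retrieval/reranking models.

def mine_hard_negatives(scores, positive_id, num_negatives=2):
    """scores: dict mapping candidate id -> similarity to the query."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    negatives = [cid for cid in ranked if cid != positive_id]
    return negatives[:num_negatives]

# Toy example: candidate "c" is the labeled positive; "b" and "d" score
# highest among the rest, so they are selected as hard negatives.
scores = {"a": 0.21, "b": 0.88, "c": 0.95, "d": 0.74}
print(mine_hard_negatives(scores, positive_id="c"))  # ['b', 'd']
```

In practice the scores would come from an embedding model or reranker rather than a hand-written dictionary; the selection logic stays the same.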
Enrico Shippole (@enricoshippole)'s Twitter Profile Photo

Looks like I am not going anywhere for a long time. This is one of the reasons LLMs rarely produce usable outputs without significant guidance in my line of work.

Enrico Shippole (@enricoshippole)'s Twitter Profile Photo

Tested this out, and there are noticeable issues above ~20 seconds of generated audio. The input needs to be chunked for generation to work properly. Voice cloning also likely needs fine-tuning to work out-of-distribution. Additionally, streaming support is needed for production use.
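The chunking workaround described above can be sketched as follows. The ~20-second limit is the observation from the tweet; the splitting heuristic (pack whole sentences into a word budget) and the assumed speaking rate are illustrative choices, not any specific TTS library's API:

```python
# Hedged sketch: split long input text into chunks small enough that each
# generated audio clip stays under the model's reliable window (~20 s per the
# observation above). Duration is approximated by word count; the
# words-per-second rate is an assumed constant, not a measured value.

import re

WORDS_PER_SECOND = 2.5          # assumed average speaking rate
MAX_SECONDS = 20.0              # observed reliability limit
MAX_WORDS = int(WORDS_PER_SECOND * MAX_SECONDS)  # ~50 words per chunk

def chunk_text(text, max_words=MAX_WORDS):
    """Greedily pack whole sentences into chunks of at most max_words words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

# Each chunk would then be passed to the TTS model separately and the
# resulting clips concatenated (a single over-long sentence still becomes
# its own chunk and may need further splitting).
text = "One short sentence. " * 30
for chunk in chunk_text(text):
    assert len(chunk.split()) <= MAX_WORDS
```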

Alexander Doria (@dorialexander)'s Twitter Profile Photo

Breaking: pleias releases a new generation of small reasoning models for RAG and source synthesis. Pleias-RAG-350M and Pleias-RAG-1B come with built-in support for source citation, SOTA performance, and accuracy comparable to models ten times their size.

Vik Paruchuri (@vikparuchuri)'s Twitter Profile Photo

We shipped an alpha version of the new Surya OCR model. No hype, just facts:

- 90+ languages (focus on en, romance langs, zh, ar, ja, ko)
- LaTeX and formatting
- Char/word/line bboxes
- ~500M non-embed params
- 10-20 pages/s
Simo Ryu (@cloneofsimo)'s Twitter Profile Photo

10B parameter DiT trained on 80M images, all owned by Freepik. Model commercially usable, raw model without distillation, open sourced.

Proud to demonstrate our first model-training project with our client Freepik: "F-Lite", from fal