Mr. Jack Tung (@mrjacktung) Twitter Tweets • TwiCopy

Joe Burnett

@joe_a_burnett

6 months ago

Aadit Sheth, how about posting the source for these posts? arxiv.org/abs/2408.13296

176 likes · 5 replies · 21 reposts

Gabriel Sarch

How can we get VLMs to move their eyes—and reason step-by-step in visually grounded ways? 👀 We introduce ViGoRL, an RL method that anchors reasoning to image regions. 🎯 It outperforms vanilla GRPO and SFT across grounding, spatial tasks, and visual search (86.4% on V*). 👇🧵

420 likes · 11 replies · 57 reposts

Shubham Saboo

@saboo_shubham_

6 months ago

This multimodal AI agent can literally create a poster from any research paper PDF. 100% open source.

307 likes · 16 replies · 42 reposts

Marktechpost AI Research News ⚡

@marktechpost

6 months ago

This AI Paper from Microsoft Introduces WINA: A Training-Free Sparse Activation Framework for Efficient Large Language Model Inference

Researchers from Microsoft, Renmin University of China, New York University, and the South China University of Technology proposed a new method

23 likes · 0 replies · 9 reposts

Umesh

@umesh_ai

6 months ago

Flux Kontext is absolutely brilliant at restoring old photographs. Some tips ⤵️

1.1K likes · 32 replies · 155 reposts

Massimo

@rainmaker1973

6 months ago

Kurt Gödel, who was one of Albert Einstein's best friends in his later years, found a solution to the general theory of relativity that modelled a strange, rotating universe allowing for backward time travel.

1.1K likes · 44 replies · 299 reposts

elvis

@omarsar0

6 months ago

Agent Zero

A personal agentic framework that dynamically grows and learns with you.

- It uses the OS as a tool.
- Has search and terminal execution too.
- It has persistent memory to memorize key information to solve future tasks more reliably.
- Multi-agent support.

361 likes · 17 replies · 57 reposts

thermo

@dionysianagent

6 months ago

here i'm running 12 - 16 claude 4 opus/sonnet code agents through claude code bridged to eigencode for coordination, additional memory layers & additional tools they're running 95% autonomously with minimal input for several hours straight

9.9K likes · 766 replies · 627 reposts

thermo

@dionysianagent

6 months ago

this is from this morning where i had 20 - 24 running at the same time where i managed to record about 30 mins until i started getting mass banned across my accounts at the end

524 likes · 48 replies · 34 reposts

Tanishq Mathew Abraham, Ph.D.

@iscienceluvr

6 months ago

How much do language models memorize? "We formally separate memorization into two components: unintended memorization, the information a model contains about a specific dataset, and generalization, the information a model contains about the true data-generation process. When we

852 likes · 7 replies · 135 reposts

Vaibhav (VB) Srivastav

@reach_vb

6 months ago

Google released an app that allows you to run LLMs from Hugging Face, fully privately and 100% local 🔥

> Generate code on-the-fly
> Chat with images
> Supports multi-turn conversations
> Choose any model from Hugging Face
> Based on LiteRT 🔥
> Sign in with HF Support for iOS

376 likes · 19 replies · 59 reposts

thermo

@dionysianagent

6 months ago

how i set up 12+ claude 4 code agents to work in parallel mostly autonomously for several hours without conflicts or slop code

1.1K likes · 80 replies · 159 reposts

Tony Wu

@tonywu_71

6 months ago

🚀 ColQwen2 just dropped in Transformers! 🤗 Say goodbye to brittle OCR pipelines — now you can retrieve documents directly in the visual space with just a few lines of code. Perfect for your visual RAG workflows. Smarter, simpler, faster. Let's dive in! 👇 (1/N 🧵)

577 likes · 7 replies · 95 reposts

elvis

@omarsar0

6 months ago

Reasoning Models Thinking Slow and Fast at Test Time

Another super cool work on improving reasoning efficiency in LLMs. They show that slow-then-fast reasoning outperforms other strategies. Here are my notes:

258 likes · 9 replies · 56 reposts

Vaibhav (VB) Srivastav

@reach_vb

6 months ago

BOOOOM! PlayAI just open sourced PlayDiffusion - Audio Speech Editing model on Hugging Face - Apache 2.0 licensed! 🔥

> Preserves context at edit boundaries
> Dynamic, fine-grained editing without regenerating entire audio
> Maintains prosody & speaker consistency

Some notes on

568 likes · 10 replies · 95 reposts

Gabriele Trivigno

@gabtrivv

6 months ago

🚀 As #CVPR2025 week kicks off, meet SANSA: Semantically AligNed Segment Anything 2

We turn SAM2 into a semantic few-shot segmenter:

🧠 Unlocks latent semantics in frozen SAM2
✏️ Supports any prompt: fast and scalable annotation
📦 No extra encoders
📎 github.com/ClaudiaCuttano…

168 likes · 1 reply · 31 reposts

Philipp Schmid

@_philschmid

6 months ago

Fuck Yes! Serverless GPU for everyone! Cloud Run just shipped Serverless GPU with no quota request required. 🤯 Deploy Google DeepMind Gemma with a single command!

- Pay-per-second GPU billing.
- Scale to zero instances.
- TTFT of 19 seconds for a Gemma3 as cold start.
- No

1.1K likes · 35 replies · 187 reposts

NVIDIA AI Developer

@nvidiaaidev

6 months ago

🥇Our NVIDIA Llama Nemotron Nano VL model is #1 on the OCRBench V2 leaderboard. Designed for advanced intelligent document processing and understanding, this model extracts diverse info from complex documents with precision, all on a single GPU. 📗 Get the technical details

210 likes · 3 replies · 41 reposts

AK

@_akhaliq

6 months ago

Microsoft just dropped GUI-Actor on Hugging Face

Coordinate-Free Visual Grounding for GUI Agents

114 likes · 1 reply · 23 reposts
