Mr. Jack Tung (@mrjacktung)'s Twitter Profile
Mr. Jack Tung

@mrjacktung

ID: 1659483902329888770

Joined: 19-05-2023 09:00:32

3.3K Tweets

244 Followers

4.4K Following

Gabriel Sarch (@gabrielsarch)

How can we get VLMs to move their eyes, and reason step-by-step in visually grounded ways? 👀 We introduce ViGoRL, an RL method that anchors reasoning to image regions. 🎯 It outperforms vanilla GRPO and SFT across grounding, spatial tasks, and visual search (86.4% on V*). 👇🧵

Marktechpost AI Research News ⚡ (@marktechpost)

This AI Paper from Microsoft Introduces WINA: A Training-Free Sparse Activation Framework for Efficient Large Language Model Inference

Researchers from Microsoft, Renmin University of China, New York University, and the South China University of Technology proposed a new method
Massimo (@rainmaker1973)

Kurt Gödel, who was one of Albert Einstein's best friends in his later years, found a solution to the general theory of relativity that modelled a strange, unusual and rotating universe allowing for backward time travel.

elvis (@omarsar0)

Agent Zero

A personal agentic framework that dynamically grows and learns with you. 

- It uses the OS as a tool.

- Has search and terminal execution too. 

- It has persistent memory to memorize key information to solve future tasks more reliably.

- Multi-agent support.
thermo (@dionysianagent)

here i'm running 12 - 16 claude 4 opus/sonnet code agents through claude code bridged to eigencode for coordination, additional memory layers & additional tools they're running 95% autonomously with minimal input for several hours straight

thermo (@dionysianagent)

this is from this morning where i had 20 - 24 running at the same time where i managed to record about 30 mins until i started getting mass banned across my accounts at the end

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)

How much do language models memorize?

"We formally separate memorization into two components: unintended memorization, the information a model contains about a specific dataset, and generalization, the information a model contains about the true data-generation process. When we
Vaibhav (VB) Srivastav (@reach_vb)

Google released an app that allows you to run LLMs from Hugging Face, fully privately and 100% local 🔥

> Generate code on-the-fly
> Chat with images
> Supports multi-turn conversations
> Choose any model from Hugging Face
> Based on LiteRT 🔥
> Sign in with HF

Support for iOS

thermo (@dionysianagent)

how i set up 12+ claude 4 code agents to work in parallel mostly autonomously for several hours without conflicts or slop code

Tony Wu (@tonywu_71)

🚀 ColQwen2 just dropped in Transformers! 🤗

Say goodbye to brittle OCR pipelines — now you can retrieve documents directly in the visual space with just a few lines of code. Perfect for your visual RAG workflows.

Smarter, simpler, faster. Let's dive in! 👇 (1/N 🧵)
elvis (@omarsar0)

Reasoning Models Thinking Slow and Fast at Test Time

Another super cool work on improving reasoning efficiency in LLMs.

They show that slow-then-fast reasoning outperforms other strategies.

Here are my notes:
Vaibhav (VB) Srivastav (@reach_vb)

BOOOOM! PlayAI just open sourced PlayDiffusion - Audio Speech Editing model on Hugging Face - Apache 2.0 licensed! 🔥

> Preserves context at edit boundaries
> Dynamic, fine-grained editing without regenerating entire audio
> Maintains prosody & speaker consistency

Some notes on

Gabriele Trivigno (@gabtrivv)

🚀 As #CVPR2025 week kicks off, meet SANSA: Semantically AligNed Segment Anything 2

We turn SAM2 into a semantic few-shot segmenter:

🧠 Unlocks latent semantics in frozen SAM2
✏️ Supports any prompt: fast and scalable annotation
📦 No extra encoders
📎 github.com/ClaudiaCuttano…

Philipp Schmid (@_philschmid)

Fuck Yes! Serverless GPU for everyone! Cloud Run just shipped Serverless GPU with no quota request required. 🤯 Deploy Google DeepMind Gemma with a single command!

- Pay-per-second GPU billing.
- Scale to zero instances.
- TTFT of 19 seconds for Gemma 3 on a cold start.
- No
NVIDIA AI Developer (@nvidiaaidev)

🥇Our NVIDIA Llama Nemotron Nano VL model is #1 on the OCRBench V2 leaderboard. 

Designed for advanced intelligent document processing and understanding, this model extracts diverse info from complex documents with precision, all on a single GPU. 

📗 Get the technical details