Ameer azam (@ameerazam18)'s Twitter Profile
Ameer azam

@ameerazam18

LLMs, TTS, Diffusion & Open Source Gen AI || AI changing things || AI papers Tweet || Deep Learning

ID: 739044397065478145

Link: https://huggingface.co/ameerazam08
Joined: 04-06-2016 10:41:30

358 Tweets

122 Followers

730 Following

VibeCode (@vibecodeapp)

It's Labor Day, and Vibe coding is now FREE for EVERYONE.
- Build iOS Apps
- Build Web Apps with Claude Code, Codex w/ GPT-5, Gemini CLI
- And now you can build Android Apps
Like + Reply to this post and we'll DM you the link.

sway (@swaystar123)

You can implement this paper with 2 lines of code

cfm_target = torch.roll(flow_target, shifts=1, dims=0)
cfm_loss = -((model_output - cfm_target) ** 2).mean() * λ

(Official impl is 60 lines btw)
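
The two lines above can be sketched without any tensor library to show what they compute. This is a minimal, dependency-free illustration, not the paper's official implementation: plain nested lists stand in for tensors, and `roll_batch`, `mean_squared_diff`, `contrastive_term`, and the value `lam=0.05` are hypothetical names and numbers chosen for the example. It covers only the term in the tweet; how that term is combined with the rest of the training loss is not stated in the tweet.

```python
def roll_batch(batch, shifts=1):
    """Cyclically shift samples along the batch axis.

    Mirrors torch.roll(x, shifts=shifts, dims=0) for a non-empty list of samples.
    """
    k = shifts % len(batch)
    if k == 0:
        return list(batch)
    return batch[-k:] + batch[:-k]


def mean_squared_diff(a, b):
    """Mean of squared element-wise differences over a 2-D nested list."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def contrastive_term(model_output, flow_target, lam):
    """Negative MSE between each output and another sample's target, scaled by lam."""
    # Shifting the batch by one pairs every output with a mismatched target.
    cfm_target = roll_batch(flow_target, shifts=1)
    return -mean_squared_diff(model_output, cfm_target) * lam


# Toy batch of three 2-D samples (made-up numbers for illustration only).
model_output = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
flow_target = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cfm_loss = contrastive_term(model_output, flow_target, lam=0.05)
```

Because the term is negated, minimizing it increases the distance between each prediction and the target belonging to a different sample in the batch.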

青龍聖者 (@bdsqlsz)

New diffusion RL method from Tencent: SRPO!
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
Training on 32 H20 GPUs for 10 minutes enhances FLUX.1-dev.

Gaurav Sen (@gkcs_)

20 AI terms you need to know.
1. Large Language Models
2. Tokenization
3. Embeddings (Vectors)
4. Attention Mechanism
5. Transformer
6. Self-Supervised Learning
7. Fine-Tuning
8. Quantization
9. Few Shot Prompting
10. Vector Databases
11. Retrieval Augmented Generation
12.

Aran Komatsuzaki (@arankomatsuzaki)

Apple presents Manzano: Simple & scalable unified multimodal LLM

• Hybrid vision tokenizer (continuous ↔ discrete) cuts task conflict
• SOTA on text-rich benchmarks, competitive in gen vs GPT-4o/Nano Banana
• One model for both understanding & generation
• Joint recipe:

Shubham Saboo (@saboo_shubham_)

Paper2Agent automatically transforms research papers into AI agents.

You can use it via MCP with Claude Code or Google Gemini CLI.

100% open source.

moondream (@moondreamai)

Moondream 3 understands UIs, not just pixels. Identify buttons, prices, and labels with a prompt. Perfect for agentic workflows. Open, tiny, blazingly fast.

DailyPapers (@huggingpapers)

ByteDance just released FaceCLIP on Hugging Face! A new vision-language model specializing in understanding and generating diverse human faces. Dive into the future of facial AI. huggingface.co/ByteDance/Face…

Zhe Gan (@zhegan4)

🎁🎁 We release Pico-Banana-400K, a large-scale, high-quality image editing dataset distilled from Nano-Banana across 35 editing types.

🔗 Data link: github.com/apple/pico-ban…

🔗 Paper link: arxiv.org/abs/2510.19808

It includes 258K single-turn image editing data, 72K multi-turn

Niels Rogge (@nielsrogge)

This is a phenomenal video by Jia-Bin Huang explaining seminal papers in computer vision, including CLIP, SimCLR, DINO v1/v2/v3 in 15 minutes.

DINO is actually a brilliant idea, I found the decision of 65k neurons in the output head pretty interesting

weijia wu (@weijiawu7)

🔥 New paper out: WEAVE — a 100K-sample interleaved multimodal dataset + WEAVEBench, a human-annotated benchmark for visual memory, multi-turn editing.
📄 arXiv: arxiv.org/abs/2511.11434
🐙 GitHub: github.com/weichow23/weave
🤗 HF Dataset: huggingface.co/datasets/WeiCh…

Ameer azam (@ameerazam18)

Just enjoy this Gemini-3 Pro vibe-coded video calling app on Hugging Face:
huggingface.co/spaces/ameeraz…
github.com/AMEERAZAM08/Ge…
Star the repo.

Connect with anyone on Hugging Face, share code, and speak.
@huggingface @Google @GeminiApp

Omar Sanseviero (@osanseviero)

Introducing our latest open model: MedASR
🔬 Speech to text model
🏥 for healthcare-based voice applications
🤗 available in Hugging Face
⚡️ run with transformers
Download right now: huggingface.co/google/medasr