AI Bites | YouTube Channel (@ai_bites)'s Twitter Profile
AI Bites | YouTube Channel

@ai_bites

Tweets on AI happenings, papers, and ideas. Open-source online AI education for the world. Formerly @UniofOxford @Oxford_VGG

ID: 2728547289

Link: https://www.youtube.com/c/AIBites · Joined: 29-07-2014 22:36:06

3.3K Tweets

1.1K Followers

693 Following

BiHumanML3D: the first bilingual text-to-motion dataset and the corresponding model for bilingual text-to-motion generation, plus a plug-and-play reward-guided alignment to further enhance generation quality. Paper Title: ReAlign: Bilingual Text-to-Motion Generation via

Awesome-3D-Scene-Generation: this repository collects summaries of over 300 recent studies on 3D scene generation, along with their downstream applications, and will be continuously updated.

Paper Title: 3D Scene Generation: A Survey
Project: github.com/hzxie/Awesome-…
Link:

This work presents Prior Depth Anything, a framework that combines the incomplete but precise metric information of depth measurements with the relative but complete geometric structure of depth predictions, generating accurate, dense, and detailed metric depth maps for any scene. Paper
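The excerpt doesn't spell out how the two sources are fused. As a point of reference only, a common baseline for merging sparse-but-metric measurements with dense-but-relative predictions is a least-squares scale-and-shift alignment; the sketch below is that generic baseline, not the paper's method, and every name in it is illustrative.

import numpy as np

def align_relative_to_metric(rel_depth, sparse_metric, valid_mask):
    # Fit scale s and shift t (least squares) so that s * rel_depth + t matches
    # the sparse metric measurements on valid pixels, then apply them to the
    # full dense map. Illustrative baseline, not Prior Depth Anything itself.
    d = rel_depth[valid_mask].ravel()
    m = sparse_metric[valid_mask].ravel()
    A = np.stack([d, np.ones_like(d)], axis=1)        # [K, 2] design matrix
    (s, t), *_ = np.linalg.lstsq(A, m, rcond=None)    # minimize ||A @ [s, t] - m||
    return s * rel_depth + t                          # dense, metrically scaled depth

# Hypothetical usage: rel from a monocular depth predictor, sparse from LiDAR/ToF
# dense_metric = align_relative_to_metric(rel, sparse, sparse > 0)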

3D-Fixup, a new framework for editing 2D images guided by learned 3D priors. The framework supports difficult editing situations such as object translation and 3D rotation. To achieve this, it leverages a training-based approach that harnesses the generative power of diffusion

3D Gaussian Splatting (3DGS) has rapidly become a leading technique for novel-view synthesis, providing exceptional performance through efficient software-based GPU rasterization. Its versatility enables real-time applications, including on mobile and lower-powered devices.
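For context, the per-pixel rendering at the heart of 3DGS is front-to-back alpha compositing of the depth-sorted, projected Gaussians; in the usual notation,

C(p) = \sum_{i=1}^{N} c_i \, \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)

where c_i is the i-th Gaussian's color and \alpha_i combines its learned opacity with its projected 2D footprint evaluated at pixel p. Because this sum is computed per tile/pixel in a custom compute kernel rather than via the fixed-function triangle pipeline, it is usually described as software rasterization.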

Scalable Vector Graphics (SVGs) are highly favored by designers due to their resolution independence and well-organized layer structure. Although existing text-to-vector (T2V) generation methods can create SVGs from text prompts, they often overlook an important need in practical

CoCoGaussian, a framework for 3D scene reconstruction from defocused images. By modeling the circle of confusion as Gaussians, CoCoGaussian enables customization of defocused images through depth of field adjustments or focus plane changes, while sharp images can be rendered by
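The tweet is cut off before the details; for reference, the thin-lens relation that defocus models of this kind typically start from gives the circle-of-confusion diameter for a point at depth S_2 when a lens of focal length f and aperture diameter A is focused at distance S_1:

c = A \cdot \frac{|S_2 - S_1|}{S_2} \cdot \frac{f}{S_1 - f}

How CoCoGaussian maps this per-point blur onto Gaussian parameters is not stated in this excerpt.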

Latent Action Pretraining for general Action models (LAPA) is the first unsupervised method for pretraining Vision-Language-Action (VLA) models without ground-truth robot action labels. Existing Vision-Language-Action models require action labels typically collected by human

Learning from human videos is a promising direction for addressing data scarcity in robot learning, but existing methods rely on human-robot alignment or intermediate representations (e.g., trajectories), limiting scalability. How can we leverage large-scale video

Vid2World is a general approach for transforming video diffusion models (like SORA) into interactive world models (like Genie), leveraging the high fidelity of full-sequence diffusion to enable causal, autoregressive, and action-conditioned generation. Paper Title: Vid2World:

Reward models (RMs) have driven the state-of-the-art performance of LLMs today by enabling the integration of human feedback into the language modeling process. However, RMs are primarily trained and evaluated in English, and their capabilities in multilingual settings remain
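The excerpt cuts off before the multilingual evaluation. For background, reward models are typically trained with a pairwise Bradley-Terry preference loss over human-labeled chosen/rejected responses; here is a minimal sketch of that standard objective (names are illustrative, and this is background rather than the paper's contribution).

import torch.nn.functional as F

def pairwise_rm_loss(reward_chosen, reward_rejected):
    # Bradley-Terry preference loss: push the reward of the human-preferred
    # response above the rejected one. Inputs are [batch] tensors of scalar rewards.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Hypothetical usage with any scalar-head reward model rm:
# loss = pairwise_rm_loss(rm(prompt, chosen), rm(prompt, rejected))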

Given multiview or monocular videos of a human, this work can reconstruct and animate the human in novel poses, novel lighting, and novel views. The fast rendering speed enables us to animate the human avatar interactively. Paper Title: Interactive Rendering of Relightable and

FastMap, a new global structure-from-motion method focused on speed and simplicity. Previous methods like COLMAP and GLOMAP are able to estimate high-precision camera poses but suffer from poor scalability when the number of matched keypoint pairs becomes large. Two key factors

This work presents a novel generative 3D modeling system, coined CraftsMan3D, which can generate high-fidelity 3D geometries with highly varied shapes, regular mesh topologies and detailed surfaces, and notably, allows for refining the geometry in an interactive manner.

Paper

BAGEL, the open-source unified multimodal model you can fine-tune, distill, and deploy anywhere, offering functionality comparable to proprietary systems like GPT-4o and Gemini 2.0 in an open form, unlocks useful and valuable image generation through a natively multimodal

In our series of videos about LangGraph, here is the second video:

youtu.be/NJ2a6rVsStg?si…

It's all about building a simple chatbot and adding tool use to make it more sophisticated (a minimal code sketch follows below).

Stay tuned for more!

#langgraph #LLMs #AI #AI美少女 #langchain
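For readers who want to follow along in code, here is a minimal sketch of the same idea built on the public LangGraph quickstart pattern; the actual graph in the video may differ, and the multiply tool and model name are placeholders.

from typing import Annotated
from typing_extensions import TypedDict

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition

class State(TypedDict):
    # Conversation history; add_messages appends new messages instead of overwriting.
    messages: Annotated[list, add_messages]

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers."""  # placeholder tool for the demo
    return a * b

llm = ChatOpenAI(model="gpt-4o-mini").bind_tools([multiply])  # model name is a placeholder

def chatbot(state: State):
    # One LLM step: respond to the conversation, possibly emitting a tool call.
    return {"messages": [llm.invoke(state["messages"])]}

builder = StateGraph(State)
builder.add_node("chatbot", chatbot)
builder.add_node("tools", ToolNode([multiply]))
builder.add_edge(START, "chatbot")
builder.add_conditional_edges("chatbot", tools_condition)  # route to tools or end
builder.add_edge("tools", "chatbot")                       # feed tool results back
graph = builder.compile()

print(graph.invoke({"messages": [("user", "What is 6 times 7?")]})["messages"][-1].content)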

MotionPro, a precise motion controller that leverages a region-wise trajectory and a motion mask to regulate fine-grained motion synthesis and to identify the target motion category (i.e., object or camera motion), respectively. Technically, MotionPro first estimates the flow maps

Generalizable active mapping in complex unknown environments remains a critical challenge for mobile robots. Existing methods, constrained by insufficient training data and conservative exploration strategies, exhibit limited generalizability across scenes with diverse layouts

The field of robotics has made significant strides toward developing generalist robot manipulation policies. However, evaluating these policies in real-world scenarios remains time-consuming and challenging, particularly as the number of tasks scales and environmental conditions

Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models, but their roles in enhancing model generalization remain unclear. This work studies the different effects of SFT and RL on

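As background for that comparison, the two post-training objectives being contrasted are, roughly, imitation of curated demonstrations versus maximization of a reward signal:

\mathcal{L}_{\mathrm{SFT}}(\theta) = -\,\mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\log \pi_\theta(y \mid x)\right]

\mathcal{J}_{\mathrm{RL}}(\theta) = \mathbb{E}_{x\sim\mathcal{D},\; y\sim\pi_\theta(\cdot \mid x)}\left[R(x, y)\right]

The exact setup and findings of the study referenced in the tweet are not included in this excerpt.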