AI Bites | YouTube Channel (@ai_bites)'s Twitter Profile
AI Bites | YouTube Channel

@ai_bites

Tweets on AI happenings, papers, and ideas. Open-source online AI education for the world. Formerly @UniofOxford @Oxford_VGG

ID: 2728547289

Link: https://www.youtube.com/c/AIBites · Joined: 29-07-2014 22:36:06

3.3K Tweets

1.1K Followers

693 Following

BiHumanML3D: the first bilingual text-to-motion dataset and the corresponding model for bilingual text-to-motion generation, plus a plug-and-play reward-guided alignment to further enhance generation quality. Paper Title: ReAlign: Bilingual Text-to-Motion Generation via

Awesome-3D-Scene-Generation: this repository collects summaries of over 300 recent studies on 3D scene generation, along with their downstream applications, and will be continuously updated.

Paper Title: 3D Scene Generation: A Survey
Project: github.com/hzxie/Awesome-…
Link:

This work presents Prior Depth Anything, a framework that combines the incomplete but precise metric information of depth measurements with the relative but complete geometric structure of depth predictions, generating accurate, dense, and detailed metric depth maps for any scene. Paper
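The excerpt doesn't spell out how the two sources are fused. As a point of reference only, a common baseline for merging sparse-but-metric measurements with dense-but-relative predictions is a least-squares scale-and-shift alignment; the sketch below is that generic baseline, not the paper's method, and every name in it is illustrative.

import numpy as np

def align_relative_to_metric(rel_depth, sparse_metric, valid_mask):
    # Fit scale s and shift t (least squares) so that s * rel_depth + t matches
    # the sparse metric measurements on valid pixels, then apply them to the
    # full dense map. Illustrative baseline, not Prior Depth Anything itself.
    d = rel_depth[valid_mask].ravel()
    m = sparse_metric[valid_mask].ravel()
    A = np.stack([d, np.ones_like(d)], axis=1)        # [K, 2] design matrix
    (s, t), *_ = np.linalg.lstsq(A, m, rcond=None)    # minimize ||A @ [s, t] - m||
    return s * rel_depth + t                          # dense, metrically scaled depth

# Hypothetical usage: rel from a monocular depth predictor, sparse from LiDAR/ToF
# dense_metric = align_relative_to_metric(rel, sparse, sparse > 0)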

3D-Fixup, a new framework for editing 2D images guided by learned 3D priors. The framework supports difficult editing situations such as object translation and 3D rotation. To achieve this, it leverages a training-based approach that harnesses the generative power of diffusion

3D Gaussian Splatting (3DGS) has rapidly become a leading technique for novel-view synthesis, providing exceptional performance through efficient software-based GPU rasterization. Its versatility enables real-time applications, including on mobile and lower-powered devices.
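For context, the per-pixel rendering at the heart of 3DGS is front-to-back alpha compositing of the depth-sorted, projected Gaussians; in the usual notation,

C(p) = \sum_{i=1}^{N} c_i \, \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)

where c_i is the i-th Gaussian's color and \alpha_i combines its learned opacity with its projected 2D footprint evaluated at pixel p. Because this sum is computed per tile/pixel in a custom compute kernel rather than via the fixed-function triangle pipeline, it is usually described as software rasterization.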

Scalable Vector Graphics (SVGs) are highly favored by designers due to their resolution independence and well-organized layer structure. Although existing text-to-vector (T2V) generation methods can create SVGs from text prompts, they often overlook an important need in practical

CoCoGaussian, a framework for 3D scene reconstruction from defocused images. By modeling the circle of confusion as Gaussians, CoCoGaussian enables customization of defocused images through depth of field adjustments or focus plane changes, while sharp images can be rendered by
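The tweet is cut off before the details; for reference, the thin-lens relation that defocus models of this kind typically start from gives the circle-of-confusion diameter for a point at depth S_2 when a lens of focal length f and aperture diameter A is focused at distance S_1:

c = A \cdot \frac{|S_2 - S_1|}{S_2} \cdot \frac{f}{S_1 - f}

How CoCoGaussian maps this per-point blur onto Gaussian parameters is not stated in this excerpt.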

Latent Action Pretraining for general Action models (LAPA) is the first unsupervised method for pretraining Vision-Language-Action (VLA) models without ground-truth robot action labels. Existing Vision-Language-Action models require action labels typically collected by human

Learning from human videos is a promising direction for addressing data scarcity in robot learning, but existing methods rely on human-robot alignment or intermediate representations (e.g., trajectories), limiting scalability. How can we leverage large-scale video

Vid2World is a general approach for transforming video diffusion models (like SORA) into interactive world models (like Genie), leveraging the high fidelity of full-sequence diffusion to enable causal, autoregressive, and action-conditioned generation. Paper Title: Vid2World:

Reward models (RMs) have driven the state-of-the-art performance of LLMs today by enabling the integration of human feedback into the language modeling process. However, RMs are primarily trained and evaluated in English, and their capabilities in multilingual settings remain
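The excerpt cuts off before the multilingual evaluation. For background, reward models are typically trained with a pairwise Bradley-Terry preference loss over human-labeled chosen/rejected responses; here is a minimal sketch of that standard objective (names are illustrative, and this is background rather than the paper's contribution).

import torch.nn.functional as F

def pairwise_rm_loss(reward_chosen, reward_rejected):
    # Bradley-Terry preference loss: push the reward of the human-preferred
    # response above the rejected one. Inputs are [batch] tensors of scalar rewards.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Hypothetical usage with any scalar-head reward model rm:
# loss = pairwise_rm_loss(rm(prompt, chosen), rm(prompt, rejected))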

Given multiview or monocular videos of a human, this work can reconstruct and animate the human in novel poses, novel lighting, and novel views. The fast rendering speed enables us to animate the human avatar interactively. Paper Title: Interactive Rendering of Relightable and

FastMap, a new global structure-from-motion method focused on speed and simplicity. Previous methods like COLMAP and GLOMAP are able to estimate high-precision camera poses but suffer from poor scalability when the number of matched keypoint pairs becomes large. Two key factors

This work presents a novel generative 3D modeling system, coined CraftsMan3D, which can generate high-fidelity 3D geometries with highly varied shapes, regular mesh topologies and detailed surfaces, and notably, allows for refining the geometry in an interactive manner.

Paper

BAGEL, the open-source unified multimodal model you can fine-tune, distill, and deploy anywhere, offering functionality comparable to proprietary systems like GPT-4o and Gemini 2.0 in an open form, unlocks useful and valuable image generation through a natively multimodal

In our series of videos about LangGraph, here is the second video:

youtu.be/NJ2a6rVsStg?si…

It's all about building a simple chatbot and adding tool use to make it more sophisticated (a minimal code sketch follows below).

Stay tuned for more!

#langgraph #LLMs #AI #AI美少女 #langchain
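For readers who want to follow along in code, here is a minimal sketch of the same idea built on the public LangGraph quickstart pattern; the actual graph in the video may differ, and the multiply tool and model name are placeholders.

from typing import Annotated
from typing_extensions import TypedDict

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition

class State(TypedDict):
    # Conversation history; add_messages appends new messages instead of overwriting.
    messages: Annotated[list, add_messages]

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers."""  # placeholder tool for the demo
    return a * b

llm = ChatOpenAI(model="gpt-4o-mini").bind_tools([multiply])  # model name is a placeholder

def chatbot(state: State):
    # One LLM step: respond to the conversation, possibly emitting a tool call.
    return {"messages": [llm.invoke(state["messages"])]}

builder = StateGraph(State)
builder.add_node("chatbot", chatbot)
builder.add_node("tools", ToolNode([multiply]))
builder.add_edge(START, "chatbot")
builder.add_conditional_edges("chatbot", tools_condition)  # route to tools or end
builder.add_edge("tools", "chatbot")                       # feed tool results back
graph = builder.compile()

print(graph.invoke({"messages": [("user", "What is 6 times 7?")]})["messages"][-1].content)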

MotionPro, a precise motion controller that leverages a region-wise trajectory and a motion mask to regulate fine-grained motion synthesis and to identify the target motion category (i.e., object or camera motion), respectively. Technically, MotionPro first estimates the flow maps

Generalizable active mapping in complex unknown environments remains a critical challenge for mobile robots. Existing methods, constrained by insufficient training data and conservative exploration strategies, exhibit limited generalizability across scenes with diverse layouts

The field of robotics has made significant strides toward developing generalist robot manipulation policies. However, evaluating these policies in real-world scenarios remains time-consuming and challenging, particularly as the number of tasks scales and environmental conditions

Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models, but their roles in enhancing model generalization remain unclear. This work studies the different effects of SFT and RL on

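As background for that comparison, the two post-training objectives being contrasted are, roughly, imitation of curated demonstrations versus maximization of a reward signal:

\mathcal{L}_{\mathrm{SFT}}(\theta) = -\,\mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\log \pi_\theta(y \mid x)\right]

\mathcal{J}_{\mathrm{RL}}(\theta) = \mathbb{E}_{x\sim\mathcal{D},\; y\sim\pi_\theta(\cdot \mid x)}\left[R(x, y)\right]

The exact setup and findings of the study referenced in the tweet are not included in this excerpt.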