Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation

Peking University
CVPR 2025

A man in black coat, yellow shirt, pink trousers, blue leather shoes, a pair of sunglasses with orange boarder, red gloves and green hats is waving.

Abstract

Recent text-to-3D generation models have demonstrated remarkable abilities in producing high-quality 3D assets. Despite these advances, current models struggle to generate satisfying 3D objects with complex attributes. The difficulty of 3D generation with complex attributes arises from two aspects: (1) Existing text-to-3D approaches typically lift text-to-image models and extract semantics via text encoders, but the text encoder has limited comprehension of long descriptions, which leads to deviated cross-attention focus and, in turn, wrong attribute binding in the generated results. (2) Objects with complex attributes often exhibit occlusion relationships between different parts, which demands a reasonable generation order as well as explicit disentanglement of the parts to produce structurally coherent, attribute-following results. Though some works introduce manual effort to alleviate these issues, their quality is unstable and highly reliant on manual information. To tackle the above problems, we propose an automated method, Hierarchical-Chain-of-Generation (HCoG). It leverages a large language model to analyze the long description, decompose it into several blocks representing different object parts, and organize an optimal generation order from inside to outside according to the occlusion relationships between parts, turning the whole generation process into a hierarchical chain. For optimization within each block, we first generate the necessary components coarsely, then bind their attributes precisely through target region localization and optimization of the corresponding 3D Gaussian kernels. For optimization between blocks, we introduce Gaussian Extension and Label Elimination to seamlessly generate new parts by extending new Gaussian kernels, re-assigning semantic labels, and eliminating unnecessary kernels, ensuring that only relevant parts are added without disrupting previously optimized parts. Experiments validate HCoG's effectiveness in handling 3D assets with complex attributes and show high-quality results.

Method Overview

Problem and our method example

The problem with existing work and an example of our method. Our method (HCoG) leverages an LLM to build a hierarchical chain of generation, enabling automatic generation of 3D assets with better binding of complex attributes.
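As a rough illustration of how an inside-to-outside generation order could be derived, the sketch below groups parts into blocks with a topological sort. It is not the authors' implementation: the `occludes` dictionary is a hypothetical stand-in for the LLM's analysis, the `body` base part and the function name `build_generation_chain` are assumptions made for the example, and the real method may group parts differently.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_generation_chain(occludes):
    """Group parts into blocks ordered from inner to outer.

    `occludes` maps each part to the set of parts it covers. This format is
    hypothetical; in HCoG the occlusion analysis comes from an LLM.
    """
    # TopologicalSorter treats the mapped values as predecessors, so every
    # covered (inner) part is emitted before the part that covers it.
    sorter = TopologicalSorter(occludes)
    sorter.prepare()
    chain = []
    while sorter.is_active():
        block = sorted(sorter.get_ready())  # parts whose inner parts are done
        chain.append(block)
        sorter.done(*block)
    return chain


# Hypothetical occlusion relations for a shortened version of the prompt.
occludes = {
    "body": set(),
    "yellow shirt": {"body"},
    "pink trousers": {"body"},
    "blue leather shoes": {"body"},
    "black coat": {"yellow shirt"},
}

chain = build_generation_chain(occludes)
for depth, block in enumerate(chain):
    print(depth, block)
```

Under these assumed relations, the coat lands in a later block than the shirt it covers, which matches the idea of generating occluded parts first.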

Overview of Hierarchical-Chain-of-Generation. a) In the Hierarchical Blocks stage, an LLM analyzes the input text and determines the generation order, from the most occluded parts to the least occluded. b) Part optimization is applied to the parts within each block: Lang-SAM segments the specific parts, and MVDream and ControlNet are used in the fine-grained optimization stage to bind the corresponding attributes to each part while keeping shape and multi-view consistency. c) Gaussian Extension is applied between blocks, extending new parts for the next block. d) Label Elimination generates new parts by extending new Gaussian kernels (red-star-marked), re-assigning semantic labels (blue-star-marked), and finally eliminating unnecessary kernels, ensuring that only relevant parts are generated without disrupting previously optimized parts.
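The extend, re-label, eliminate sequence in steps c) and d) can be pictured with a toy sketch. Everything here is an assumption made for illustration: the `Kernel` class, the provisional `"candidate"` label, the `in_target_region` predicate (standing in for a segmentation-based region test), and the choice to re-label only newly added kernels. The actual method operates on optimized 3D Gaussians and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    position: tuple  # (x, y, z) centre of the Gaussian (toy representation)
    label: str       # semantic label of the part the kernel belongs to


def extend_and_eliminate(kernels, candidates, new_label, in_target_region):
    """Toy version of extending, re-labelling, and pruning kernels."""
    # Extension: add new kernels with a provisional label (assumed name).
    extended = list(kernels) + [Kernel(p, "candidate") for p in candidates]
    # Re-assignment: candidates inside the target region get the part label.
    relabeled = [
        Kernel(k.position, new_label)
        if k.label == "candidate" and in_target_region(k.position)
        else k
        for k in extended
    ]
    # Elimination: drop candidates that never received a part label.
    # Kernels from earlier blocks are passed through untouched.
    return [k for k in relabeled if k.label != "candidate"]


body = [Kernel((0.0, 0.0, 0.0), "body"), Kernel((0.0, 1.0, 0.0), "body")]
candidates = [(0.0, 1.1, 0.0), (0.0, 2.5, 0.0), (0.0, -3.0, 0.0)]

# Hypothetical region test: the shirt occupies 0.5 <= y <= 1.5.
result = extend_and_eliminate(
    body, candidates, "shirt", lambda p: 0.5 <= p[1] <= 1.5
)
print([k.label for k in result])
```

In this sketch the kernels of the earlier block are returned unchanged, mirroring the stated goal of adding only relevant parts without disrupting previously optimized ones.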


More results

A man in black coat, yellow shirt, pink trousers, blue leather shoes and green hats is waving.

A cartoon girl with short hair wears gray shirt, blue skirt, yellow shoes, pink jacket and brown hat is dancing.

A yellow dog wears a pink shirt, two pairs of pink shoes, and a blue collar.

A wooden dog driving an origami sport car.

An orange cat wearing a yellow suit and cyan boots.

A metal monkey wearing a golden crown and driving an origami sport car.

A boy wears blue shirt with a yellow star on it, gray trousers, blue sport shoes, purple wizard hat and blue jacket, holding a magic stick.

An orange cat wearing a yellow suit and red pumps.

A clown with red nose and white face, wears green wig, black shoes, yellow shirt, red jacket, and red pants.
