Controllable Layer Decomposition for Reversible Multi-Layer Image Generation

Abstract

This work presents Controllable Layer Decomposition (CLD), a method for fine-grained, controllable decomposition of a raster image into multiple layers. In practical workflows, designers typically generate and edit each RGBA layer independently before compositing the layers into a final raster image. This process is irreversible: once the layers are composited, layer-level editing is no longer possible. Existing methods commonly rely on image matting and inpainting but remain limited in controllability and segmentation precision. To address these challenges, we propose two key modules: LayerDecompose-DiT (LD-DiT), which decouples image elements into distinct layers and enables fine-grained control; and the Multi-Layer Conditional Adapter (MLCA), which injects target image information into multi-layer tokens to achieve precise conditional generation. To enable comprehensive evaluation, we build a new benchmark and introduce tailored evaluation metrics. Experimental results show that CLD consistently outperforms existing methods in both decomposition quality and controllability. Furthermore, the layers produced by CLD can be manipulated directly in common design tools such as PowerPoint, highlighting its practical value in real-world creative workflows.
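Why compositing is irreversible can be seen at the level of a single pixel. The sketch below is an illustration, not CLD code: it assumes per-pixel Porter-Duff "source-over" blending with straight (non-premultiplied) alpha, which is the common default, although a given design tool may blend differently. The names `over` and `composite` are our own.

```python
from typing import List, Tuple

# Illustrative sketch (not CLD code): one RGBA pixel, straight alpha, channels in [0, 1].
RGBA = Tuple[float, float, float, float]

TRANSPARENT: RGBA = (0.0, 0.0, 0.0, 0.0)


def over(top: RGBA, bottom: RGBA) -> RGBA:
    """Porter-Duff source-over: place `top` above `bottom` (straight alpha)."""
    r_t, g_t, b_t, a_t = top
    r_b, g_b, b_b, a_b = bottom
    # Coverage of the result: top alpha plus whatever the bottom shows through.
    a_out = a_t + a_b * (1.0 - a_t)
    if a_out == 0.0:
        return TRANSPARENT

    def mix(c_t: float, c_b: float) -> float:
        # Alpha-weighted blend, divided by a_out to stay non-premultiplied.
        return (c_t * a_t + c_b * a_b * (1.0 - a_t)) / a_out

    return (mix(r_t, r_b), mix(g_t, g_b), mix(b_t, b_b), a_out)


def composite(layers: List[RGBA]) -> RGBA:
    """Flatten a layer stack ordered from bottom (background) to top."""
    result = TRANSPARENT
    for layer in layers:
        result = over(layer, result)
    return result


# Two different stacks: opaque white under 50%-transparent red, vs. one opaque pink layer.
two_layers = [(1.0, 1.0, 1.0, 1.0), (1.0, 0.0, 0.0, 0.5)]
one_layer = [(1.0, 0.5, 0.5, 1.0)]
print(composite(two_layers))  # (1.0, 0.5, 0.5, 1.0)
print(composite(one_layer))   # (1.0, 0.5, 0.5, 1.0)
```

Both stacks flatten to the same pixel, so the original layers cannot be read back from the composite alone; recovering them requires inferring a plausible decomposition.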


METHOD

Our framework combines a main backbone with a parallel control module for precise layer decomposition. (a) The overall CLD architecture, showing the LayerDecompose-DiT (LD-DiT) backbone, which generates the multi-layer latent. (b) The detailed structure of the Multi-Layer Conditional Adapter (MLCA). MLCA additively fuses features from the conditional image with LD-DiT's hidden states, then performs hierarchical cropping based on the input bounding boxes to produce a multi-layer guidance token sequence.
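The MLCA data flow in panel (b) can be sketched in plain Python to show how bounding boxes select which fused features guide each layer. This is a hypothetical illustration, not CLD's implementation. Assumptions: the function names; the `pixels_per_token=16` default (which would correspond to, e.g., 8x VAE downsampling followed by 2x2 patchification); a single hidden-state grid shared by all layers; and reading "hierarchical cropping" as one box crop per layer, in bottom-to-top order.

```python
from typing import List, Tuple

# Hypothetical sketch of MLCA-style fusion + cropping; names and sizes are assumptions.
Token = List[float]                 # one feature vector
Grid = List[List[Token]]            # [row][col] -> token over the latent grid
BBox = Tuple[int, int, int, int]    # (x0, y0, x1, y1) in pixels; x1, y1 exclusive


def add_grids(a: Grid, b: Grid) -> Grid:
    """Additive fusion: element-wise sum of two equally shaped token grids."""
    return [[[x + y for x, y in zip(ta, tb)] for ta, tb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def bbox_to_tokens(box: BBox, pixels_per_token: int, rows: int, cols: int) -> BBox:
    """Map a pixel box to the (col0, row0, col1, row1) token cells it overlaps."""
    x0, y0, x1, y1 = box
    c0 = max(0, x0 // pixels_per_token)
    r0 = max(0, y0 // pixels_per_token)
    c1 = min(cols, -(-x1 // pixels_per_token))  # ceiling division
    r1 = min(rows, -(-y1 // pixels_per_token))
    return c0, r0, c1, r1


def guidance_tokens(cond: Grid, hidden: Grid, boxes: List[BBox],
                    pixels_per_token: int = 16) -> List[Tuple[int, Token]]:
    """Fuse features, then crop one region per layer into a single sequence.

    `boxes` is ordered bottom to top (background first). Each output entry is
    (layer_index, token), so later blocks can tell which layer a token guides.
    """
    fused = add_grids(cond, hidden)
    rows, cols = len(fused), len(fused[0])
    sequence: List[Tuple[int, Token]] = []
    for layer_idx, box in enumerate(boxes):
        c0, r0, c1, r1 = bbox_to_tokens(box, pixels_per_token, rows, cols)
        for r in range(r0, r1):
            for c in range(c0, c1):
                sequence.append((layer_idx, fused[r][c]))
    return sequence


# 64x64 image -> 4x4 token grid with 1-channel features.
cond = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
hidden = [[[100.0] for _ in range(4)] for _ in range(4)]
boxes = [(0, 0, 64, 64), (16, 16, 40, 32)]  # background, then one foreground box
seq = guidance_tokens(cond, hidden, boxes)
print(len(seq))   # 18: 16 background tokens + 2 foreground tokens
print(seq[16:])   # [(1, [105.0]), (1, [106.0])]
```

A real adapter would presumably work on batched tensors and feed this sequence into attention alongside the multi-layer latent tokens; the list-based version only makes the fuse-then-crop ordering explicit.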


Case 1

Input

Original image and bounding boxes.

Results

Decomposed layers, from bottom to top: Background, then Foreground 0 through Foreground 8.

Case 2

Input

Original image and bounding boxes.

Results

Decomposed layers, from bottom to top: Background, then Foreground 0 through Foreground 15.

Case 3

Input

Original image and bounding boxes.

Results

Decomposed layers, from bottom to top: Background, then Foreground 0 through Foreground 8.

Case 4

Input

Original image and bounding boxes.

Results

Decomposed layers, from bottom to top: Background, then Foreground 0 through Foreground 10.

Case 5

Input

Original image and bounding boxes.

Results

Decomposed layers, from bottom to top: Background, then Foreground 0 through Foreground 10.

Case 6

Input

Original image and bounding boxes.

Results

Decomposed layers, from bottom to top: Background, then Foreground 0 through Foreground 8.