Semantic Browsing: Controllable Diversity for Image Generation

ECCV 2026

Sara Dorfman*, Maya Vishnevsky*, Omer Dahary, Or Patashnik, Daniel Cohen-Or

Tel Aviv University

* Equal contribution

TL;DR

Semantic Browsing introduces an agentic workflow that turns a single text prompt into a structured, browsable gallery of diverse image interpretations, where each variation reflects meaningful and controllable semantic choices rather than stochastic sampling.

Abstract

Modern text-to-image models produce high-fidelity images that closely follow prompts, but repeated sampling often collapses toward a single semantic interpretation. Semantic Browsing introduces controllable diversity: users explore generated images through meaningful, interpretable variations rather than incidental stochastic changes. The method shifts diversity to the text level, using a multi-agent workflow to expand prompts into structured scene representations and to identify plausible under-specified axes of variation. Each generated branch corresponds to a specific semantic decision, creating a navigable design space while preserving the original prompt intent.

Structured gallery of lion and tiger image variations generated from a single prompt. **Semantic Browsing for Image Generation.** From a single text prompt, *A poster featuring animals*, the system produces a structured gallery of images that explore different meaningful interpretations of the same scene. Rather than random variations, each image reflects a distinct, coherent semantic choice (e.g., a change in character, composition, or style), allowing users to browse a space of alternatives in a deliberate and interpretable way. In this visualization, the leftmost image serves as the root for the four variations in the center. The variation highlighted with a purple border is then selected as the parent of the four children displayed on the right.

Overview

Semantic diversity through structured scene refinement

Structured scene-tree expansion

We represent each fully specified scene as a structured JSON object that captures objects, attributes, interactions, and global scene properties.
The method builds a rooted tree of JSON scenes: each node is a complete scene interpretation and each edge applies one semantic constraint.
The tree grows iteratively by invoking the agentic workflow at a selected node, producing children that preserve the branch history.
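
The scene tree described above can be sketched as a small data structure. This is only an illustrative sketch: the class name `SceneNode`, its fields, and the JSON keys (`objects`, `interaction`, `style`) are assumptions made for this example and are not taken from the actual implementation.

```python
import copy
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(eq=False)
class SceneNode:
    """One complete scene interpretation (hypothetical schema)."""
    scene: Dict[str, Any]                 # fully specified JSON-like scene
    constraint: Optional[str] = None      # constraint on the edge from the parent
    parent: Optional["SceneNode"] = field(default=None, repr=False)
    children: List["SceneNode"] = field(default_factory=list, repr=False)

    def add_child(self, constraint: str, updates: Dict[str, Any]) -> "SceneNode":
        """Copy this scene, apply one semantic constraint, and attach the child."""
        child_scene = copy.deepcopy(self.scene)
        child_scene.update(updates)
        child = SceneNode(child_scene, constraint, parent=self)
        self.children.append(child)
        return child

    def history(self) -> List[str]:
        """Constraints committed on the path from the root to this node."""
        path, node = [], self
        while node.parent is not None:
            path.append(node.constraint)
            node = node.parent
        return path[::-1]


# Assumed example scene; the keys are illustrative only.
root = SceneNode({
    "objects": [{"name": "lion"}, {"name": "tiger"}],
    "interaction": "standing side by side",
    "style": "illustrated poster",
})
playful = root.add_child("interaction: playing", {"interaction": "playing together"})
retro = playful.add_child("style: retro", {"style": "retro travel poster"})

print(retro.history())                    # constraints inherited along the branch
print(json.dumps(retro.scene, indent=2))  # the child is still a complete scene
```

Because every child starts from a deep copy of its parent's scene, each node remains a complete, renderable scene, and the branch history can be recovered by walking back to the root.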

Interactive semantic browsing process showing branching choices and selected semantic refinements. **Example of semantic browsing produced by our method.** Starting from an initial scene interpretation inferred from the user prompt, the method explores alternative realizations by committing to an explicit semantic constraint at each step. Each branching point offers alternative realizations of a single semantic aspect, while previously fixed constraints are preserved. Each branching point also includes an option to preserve the current value of the selected aspect, so that exploration can continue along other semantic dimensions. Every node is a fully specified, renderable scene; *preserve* branches carry these states down to the final level, so the leaf nodes together contain every generated scene, ready for rendering.
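
The effect of *preserve* branches can be checked with a toy tree. The sketch below is an assumption-laden illustration, not the method's code: the helper names (`make_node`, `branch`, `leaves`, `all_nodes`) and the scene keys are hypothetical. It shows that when every branching point includes a preserve child, each scene that appears anywhere in the tree also appears at a leaf.

```python
import copy
import json

PRESERVE = "preserve"  # hypothetical label for the keep-current-value branch


def make_node(scene, label="root"):
    return {"scene": scene, "label": label, "children": []}


def branch(node, aspect, alternatives):
    """Attach a preserve child plus one child per alternative value of `aspect`."""
    children = [make_node(copy.deepcopy(node["scene"]), f"{aspect}: {PRESERVE}")]
    for value in alternatives:
        scene = copy.deepcopy(node["scene"])
        scene[aspect] = value
        children.append(make_node(scene, f"{aspect}: {value}"))
    node["children"] = children
    return children


def leaves(node):
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


def all_nodes(node):
    yield node
    for child in node["children"]:
        yield from all_nodes(child)


def scene_key(node):
    # Canonical string form so scenes can be compared as set members.
    return json.dumps(node["scene"], sort_keys=True)


root = make_node({"interaction": "cat sitting beside the bowl", "style": "photograph"})
level_one = branch(root, "interaction",
                   ["cat pawing at the water", "cat asleep by the bowl"])
for child in level_one:
    branch(child, "style", ["watercolor"])

leaf_scenes = {scene_key(n) for n in leaves(root)}
every_scene = {scene_key(n) for n in all_nodes(root)}
print(len(leaves(root)))           # 6 leaves: 3 interaction choices x 2 style choices
print(every_scene == leaf_scenes)  # True: no scene is lost above the leaf level
```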

How to expand the tree?

Tree requirements

Semantic structuring: siblings branch along one shared semantic aspect, such as interaction, composition, or style.
Heterogeneity: each child realizes that aspect in a distinct way, creating meaningful conceptual spread.
Plausibility: every refinement remains consistent with the original prompt and with constraints already fixed along its path.
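
The three requirements above call for semantic judgment, but a rough structural proxy for each can be written down. The sketch below is only that: the function `check_siblings`, the proposal format (`aspect` and `value` keys), and the string-normalization rule are assumptions introduced for illustration, and passing these checks does not guarantee that a set of siblings is meaningfully diverse or plausible.

```python
def check_siblings(fixed, proposals):
    """Return the names of requirements that a set of sibling proposals violates.

    fixed:     aspect -> value committed by the prompt or earlier on the path
    proposals: list of {"aspect": ..., "value": ...}, one per sibling
    """
    problems = []

    # Semantic structuring: all siblings vary the same single aspect.
    if len({p["aspect"] for p in proposals}) != 1:
        problems.append("semantic structuring")

    # Heterogeneity (proxy): no two siblings propose the same normalized value.
    values = [p["value"].strip().lower() for p in proposals]
    if len(set(values)) != len(values):
        problems.append("heterogeneity")

    # Plausibility (proxy): no sibling overrides an already fixed constraint.
    if any(p["aspect"] in fixed and fixed[p["aspect"]] != p["value"]
           for p in proposals):
        problems.append("plausibility")

    return problems


fixed = {"subjects": "a robot and a scarecrow", "setting": "a field"}

good = [
    {"aspect": "interaction", "value": "robot repairing the scarecrow"},
    {"aspect": "interaction", "value": "scarecrow riding on the robot's shoulders"},
    {"aspect": "interaction", "value": "both watching the sunset"},
]
bad = [
    {"aspect": "interaction", "value": "Both watching the sunset"},
    {"aspect": "interaction", "value": "both watching the sunset "},
    {"aspect": "setting", "value": "a city street"},
]

print(check_siblings(fixed, good))  # []
print(check_siblings(fixed, bad))   # all three requirements are violated
```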

Multi-agent workflow

Context Analyst: separates fixed constraints from mutable scene details, defining a plausible search space for modification.
Brainstormer: groups mutable details into high-level semantic aspects, encouraging structured branches rather than isolated edits.
Decision Maker: selects one impactful aspect and instantiates it into diverse alternative constraints for sibling nodes.
Critic: validates and refines the proposed constraints so they remain faithful, non-contradictory, and clearly distinct.
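
The hand-off between the four agents can be sketched as a simple pipeline. In the sketch below each agent is replaced by a deterministic stub so the control flow is runnable; the function names, the `ASPECT_OF` grouping table, the candidate list, and the rule "pick the aspect that covers the most mutable details" are all assumptions made for this example and stand in for whatever model calls and selection criteria the actual agents use.

```python
# Hypothetical grouping of scene fields into high-level semantic aspects.
ASPECT_OF = {"pose": "interaction", "gaze": "interaction", "medium": "style"}


def context_analyst(scene, fixed_keys):
    """Separate fixed constraints from mutable scene details."""
    fixed = {k: v for k, v in scene.items() if k in fixed_keys}
    mutable = {k: v for k, v in scene.items() if k not in fixed_keys}
    return fixed, mutable


def brainstormer(mutable):
    """Group mutable details into high-level aspects."""
    aspects = {}
    for key in mutable:
        aspects.setdefault(ASPECT_OF.get(key, key), []).append(key)
    return aspects


def decision_maker(aspects, candidates):
    """Select one aspect (stub rule: most details) and its candidate constraints."""
    aspect = max(aspects, key=lambda a: len(aspects[a]))
    return aspect, candidates.get(aspect, [])


def critic(scene, fixed, proposals):
    """Drop proposals that touch fixed fields, repeat a sibling, or change nothing."""
    kept, seen = [], set()
    for proposal in proposals:
        key = tuple(sorted(proposal.items()))
        touches_fixed = bool(set(proposal) & set(fixed))
        no_change = all(scene.get(k) == v for k, v in proposal.items())
        if touches_fixed or no_change or key in seen:
            continue
        seen.add(key)
        kept.append(proposal)
    return kept


def expand(scene, fixed_keys, candidates):
    """Run the four stubs in order and return the aspect and the child scenes."""
    fixed, mutable = context_analyst(scene, fixed_keys)
    aspects = brainstormer(mutable)
    aspect, proposals = decision_maker(aspects, candidates)
    children = [{**scene, **p} for p in critic(scene, fixed, proposals)]
    return aspect, children


scene = {
    "subjects": "a cat and a goldfish bowl",
    "pose": "cat sitting beside the bowl",
    "gaze": "cat looking away",
    "medium": "photograph",
}
candidates = {  # assumed alternatives; a real Decision Maker would generate these
    "interaction": [
        {"pose": "cat pawing at the water", "gaze": "cat staring at the fish"},
        {"pose": "cat asleep beside the bowl", "gaze": "eyes closed"},
        {"pose": "cat asleep beside the bowl", "gaze": "eyes closed"},  # duplicate
        {"subjects": "a dog and a goldfish bowl"},                      # breaks the prompt
    ],
}

aspect, children = expand(scene, {"subjects"}, candidates)
print(aspect, len(children))
```

Keeping the Critic as a separate last stage means invalid or redundant proposals are filtered after generation, so the earlier stages can propose freely without each having to enforce every constraint.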

Results

Galleries of semantic alternatives

All images shown are derived from a single initial scene. The outer gray groupings organize results that share a direct common ancestor scene. Inside each grouping, colored boxes distinguish sibling branches: parallel variations that share the same parent but differ from one another in a single semantic aspect. This demonstrates how our method introduces meaningful diversity while preserving the coherence of the original user prompt.

Prompt: A group of people doing yoga.

Prompt: A cat and a goldfish bowl.

Prompt: A robot and a scarecrow in a field.

Prompt: A birthday cake.

Prompt: A boat passes by waterfront houses flanked by trees.

Prompt: A doll on a shelf.

Prompt: A man in uniform riding a horse.

Prompt: A family of monkeys.

Prompt: A group of people riding on a group of elephants.

Prompt: A group of people at a sports event.

Model-Agnostic Generation

Qualitative results demonstrate that our framework transfers to FLUX.2. Using our agentic flow solely for scene generation and FLUX.2 as the rendering backbone, we obtain consistent, structured diversity.

Prompt: A dancer performing a dance.

Prompt: A red fox and a white fox playing a video game.

Citation

BibTeX

@article{dorfman2026semanticbrowsing,
  title   = {Semantic Browsing: Controllable Diversity for Image Generation},
  author  = {Dorfman, Sara and Vishnevsky, Maya and Dahary, Omer and Patashnik, Or and Cohen-Or, Daniel},
  journal = {arXiv preprint},
  year    = {2026}
}