Semantic Browsing: Controllable Diversity for Image Generation

ECCV 2026

Sara Dorfman*, Maya Vishnevsky*, Omer Dahary, Or Patashnik, Daniel Cohen-Or

Tel Aviv University

* Equal contribution

TL;DR

Semantic Browsing introduces an agentic workflow that turns a single text prompt into a structured, browsable gallery of diverse image interpretations, where each variation reflects meaningful and controllable semantic choices rather than stochastic sampling.

Abstract

Modern text-to-image models produce high-fidelity images that closely follow prompts, but repeated sampling often collapses toward a single semantic interpretation. Semantic Browsing introduces controllable diversity: users explore generated images through meaningful, interpretable variations rather than incidental stochastic changes. The method shifts diversity to the text level, using a multi-agent workflow to expand prompts into structured scene representations and to identify plausible under-specified axes of variation. Each generated branch corresponds to a specific semantic decision, creating a navigable design space while preserving the original prompt intent.

Structured gallery of lion and tiger image variations generated from a single prompt.
Semantic Browsing for Image Generation. From the single text prompt "A poster featuring animals", the system produces a structured gallery of images that explore different meaningful interpretations of the same scene. Rather than being random variations, the images each reflect a distinct, coherent semantic choice (e.g., a change in character, composition, or style), allowing users to browse a space of alternatives in a deliberate and interpretable way. In this visualization, the leftmost image is the root of the four variations in the center. The variation highlighted with a purple border is then selected as the parent of the four children shown on the right.

Overview

Semantic diversity through structured scene refinement

Structured scene-tree expansion

  • We represent each fully specified scene as a structured JSON object capturing objects, attributes, interactions, and global scene properties.
  • The method builds a rooted tree of JSON scenes: each node is a complete scene interpretation and each edge applies one semantic constraint.
  • The tree grows iteratively by invoking the agentic workflow at a selected node, producing children that preserve the branch history.
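The tree described above can be pictured as a small recursive data structure. The following is a minimal sketch, not the authors' code: the class `SceneNode`, its fields, and the example scene keys (`subject`, `style`, `composition`) are hypothetical. It only encodes what the page states: every node holds a complete JSON scene, every edge commits one constraint, and a node's branch history is the list of constraints along its root path.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(eq=False)  # nodes compare by identity, which avoids recursing through parent/children
class SceneNode:
    """One node of the scene tree (hypothetical schema): a complete, renderable JSON scene."""
    scene: dict[str, Any]
    constraint: Optional[str] = None  # constraint committed on the edge from the parent; None at the root
    parent: Optional["SceneNode"] = field(default=None, repr=False)
    children: list["SceneNode"] = field(default_factory=list, repr=False)

    def add_child(self, scene: dict[str, Any], constraint: str) -> "SceneNode":
        """Attach a child whose edge commits exactly one new semantic constraint."""
        child = SceneNode(copy.deepcopy(scene), constraint, parent=self)
        self.children.append(child)
        return child

    def path_constraints(self) -> list[str]:
        """Branch history: constraints fixed from the root down to this node, oldest first."""
        out: list[str] = []
        node: Optional[SceneNode] = self
        while node is not None:
            if node.constraint is not None:
                out.append(node.constraint)
            node = node.parent
        return list(reversed(out))


# Hypothetical example scene and constraints.
root = SceneNode({"subject": "lion and tiger", "style": "flat illustration", "composition": "side by side"})
child = root.add_child(dict(root.scene, style="vintage screen print"), "style: vintage screen print")
leaf = child.add_child(dict(child.scene, composition="close-up"), "composition: close-up")
print(leaf.path_constraints())  # ['style: vintage screen print', 'composition: close-up']
```

Storing only the edge constraint on each node, and recovering the history by walking up to the root, keeps each node's scene self-contained while still letting later expansions see every decision already made on the branch.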
Example of semantic browsing produced by our method. Starting from an initial scene interpretation inferred from the user prompt, the method explores alternative realizations by committing an explicit semantic constraint at each step. Each branching point offers alternative realizations of a single semantic aspect, while previously fixed constraints are preserved. Branching points also include an option to keep the current value of the selected aspect, so that exploration can continue along other semantic dimensions. Every node is a fully specified, renderable scene; preserve branches carry these states down to the final level, so the leaf nodes contain every generated scene representation, ready for rendering.
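Why do preserve branches make the leaves sufficient for rendering? The toy sketch below shows the effect. It rests on simplifying assumptions that are not from the page: scenes are flat dicts, every node at a given level branches on the same aspect, and the hypothetical `propose` callables stand in for the agentic workflow, which in the method chooses aspects and alternatives per node.

```python
import copy
from typing import Any, Callable

Scene = dict[str, Any]


def branch(scene: Scene, aspect: str, alternatives: list[Any]) -> list[Scene]:
    """Children of one branching point: one per alternative value of `aspect`,
    plus a preserve child that keeps the current value unchanged."""
    children = []
    for value in alternatives:
        child = copy.deepcopy(scene)
        child[aspect] = value
        children.append(child)
    children.append(copy.deepcopy(scene))  # preserve branch: the parent's state moves down one level
    return children


def grow(root: Scene, steps: list[tuple[str, Callable[[Scene], list[Any]]]]) -> list[Scene]:
    """Expand level by level and return the final level (the leaves)."""
    level = [root]
    for aspect, propose in steps:
        level = [child for scene in level for child in branch(scene, aspect, propose(scene))]
    return level


# Hypothetical example: two levels, one aspect per level.
root = {"subject": "lion and tiger", "style": "flat illustration", "composition": "side by side"}
steps = [
    ("style", lambda s: ["vintage screen print", "watercolor"]),
    ("composition", lambda s: ["close-up"]),
]
leaves = grow(root, steps)
print(len(leaves))  # 6: (2 styles + preserve) x (1 composition + preserve)
```

Because every node has a preserve child, each intermediate state (including the root) reappears unchanged at the final level, so rendering only the leaves covers every scene the tree ever produced.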

How to expand the tree?

Tree requirements

  • Semantic structuring: siblings branch along one shared semantic aspect, such as interaction, composition, or style.
  • Heterogeneity: each child realizes that aspect in a distinct way, creating meaningful conceptual spread.
  • Plausibility: every refinement remains consistent with the original prompt and with constraints already fixed along its path.
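The three requirements can be turned into a mechanical check on a proposed set of sibling constraints, shown below. This is only an illustrative proxy under stated assumptions: the page describes these properties in terms of semantics judged by the agents (notably the Critic), whereas this sketch reduces a constraint to a hypothetical `(aspect, value)` pair. Heterogeneity becomes "no duplicate values", and plausibility becomes "do not re-decide an aspect already fixed on the path". Preserve branches are excluded from the sibling set here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """Hypothetical reduced form of a semantic constraint."""
    aspect: str  # e.g. "style", "composition", "interaction"
    value: str   # e.g. "watercolor"


def check_siblings(siblings: list[Constraint], fixed: list[Constraint]) -> list[str]:
    """Return the violated requirements for a set of alternative siblings (empty list = passes)."""
    problems = []
    aspects = {c.aspect for c in siblings}
    # Semantic structuring: all siblings branch along exactly one shared aspect.
    if len(aspects) != 1:
        problems.append("semantic structuring: siblings do not share exactly one aspect")
    # Heterogeneity (proxy): no two siblings realize the aspect with the same value.
    values = [c.value.strip().lower() for c in siblings]
    if len(set(values)) != len(values):
        problems.append("heterogeneity: duplicate realizations among siblings")
    # Plausibility (proxy): do not re-decide an aspect already fixed along the path.
    if aspects & {c.aspect for c in fixed}:
        problems.append("plausibility: siblings re-decide an aspect fixed on the path")
    return problems


fixed = [Constraint("style", "vintage screen print")]
good = [Constraint("composition", "close-up"), Constraint("composition", "wide shot")]
bad = [Constraint("style", "watercolor"), Constraint("composition", "close-up")]
print(check_siblings(good, fixed))  # []
print(len(check_siblings(bad, fixed)))  # 2
```

In the actual workflow, such judgments require understanding the prompt (for example, whether a new interaction contradicts it), which is why the page assigns them to agents rather than to fixed rules.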

Multi-agent workflow

  • Context Analyst: separates fixed constraints from mutable scene details, defining a plausible search space for modification.
  • Brainstormer: groups mutable details into high-level semantic aspects, encouraging structured branches rather than isolated edits.
  • Decision Maker: selects one impactful aspect and instantiates it into diverse alternative constraints for sibling nodes.
  • Critic: validates and refines the proposed constraints so they remain faithful, non-contradictory, and clearly distinct.
Multi-agent workflow guiding an iterative JSON generation process. The pipeline takes as input the current JSON configuration and the history of constraints derived from previous modifications (including the user prompt). A sequence of agents (Context Analyst, Brainstormer, Decision Maker, and Critic) analyzes these inputs to select an aspect to modify and to formulate specific instructions. The JSON Refiner then translates these instructions into an updated JSON configuration, and the new modifications are added to the constraint set for subsequent iterations.
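The data flow of a single workflow invocation at a node can be sketched as follows. The `Agents` container, the function signatures, and the `expand` function are hypothetical; in the method, each role is an agent, not a Python function. The toy agents in the example are trivial placeholders for those calls and are included only to make the sketch runnable.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable

Scene = dict[str, Any]


@dataclass
class Agents:
    """Hypothetical interfaces for the five roles; in the method each is an agent."""
    context_analyst: Callable[[Scene, list[str]], list[str]]                 # -> mutable scene details
    brainstormer: Callable[[list[str]], dict[str, list[str]]]                # -> aspect: grouped details
    decision_maker: Callable[[dict[str, list[str]]], tuple[str, list[str]]]  # -> (aspect, instructions)
    critic: Callable[[list[str], list[str]], list[str]]                      # -> validated instructions
    json_refiner: Callable[[Scene, str], Scene]                              # -> updated scene


def expand(scene: Scene, constraints: list[str], agents: Agents) -> list[tuple[Scene, list[str]]]:
    """One workflow invocation at a node; returns (child scene, child constraint history) pairs."""
    mutable = agents.context_analyst(scene, constraints)
    aspects = agents.brainstormer(mutable)
    _aspect, instructions = agents.decision_maker(aspects)
    instructions = agents.critic(instructions, constraints)
    children = []
    for instruction in instructions:
        child = agents.json_refiner(scene, instruction)
        children.append((child, constraints + [instruction]))  # the new constraint joins the history
    return children


def toy_refiner(scene: Scene, instruction: str) -> Scene:
    # Toy stand-in: parse "key: value" and overwrite that key in a copy of the scene.
    key, value = instruction.split(": ", 1)
    updated = copy.deepcopy(scene)
    updated[key] = value
    return updated


toy = Agents(
    context_analyst=lambda s, c: [k for k in s if k != "subject"],  # treat the subject as fixed by the prompt
    brainstormer=lambda details: {d: [d] for d in details},          # one aspect per detail
    decision_maker=lambda aspects: ("style", ["style: watercolor", "style: woodcut"]),
    critic=lambda instr, c: [i for i in instr if i not in c],         # drop already-committed constraints
    json_refiner=toy_refiner,
)
root = {"subject": "lion and tiger", "style": "flat illustration", "composition": "side by side"}
for scene, history in expand(root, ["A poster featuring animals"], toy):
    print(scene["style"], history)
```

Passing the roles in as interchangeable callables keeps the orchestration (analyze, group, decide, critique, refine, extend the history) separate from how each role is implemented.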

Results

Galleries of semantic alternatives

All images shown are derived from a single initial scene. The outer gray groupings collect results that share a direct common ancestor scene. Inside each grouping, the colored boxes distinguish sibling branches: parallel variations that share the same parent but differ from one another in a single semantic aspect. This demonstrates how our method introduces meaningful diversity while preserving the coherence of the original user prompt.

Model-Agnostic Generation

Qualitative results demonstrating that our framework transfers to the FLUX.2 architecture. Using our agentic workflow solely for scene generation and FLUX.2 as the rendering backbone, we obtain consistent structured diversity.
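The decoupling described above (scene generation on one side, any rendering backbone on the other) can be expressed as a narrow interface. This is a sketch under an explicit assumption: the page does not say how scenes are handed to FLUX.2, so the code assumes that each leaf JSON scene is serialized to text and passed as the prompt. `Renderer`, `scene_to_prompt`, `render_leaves`, and `EchoRenderer` are hypothetical names, and `EchoRenderer` is a stand-in for a real model that simply returns the prompt it receives.

```python
import json
from typing import Any, Protocol


class Renderer(Protocol):
    """Hypothetical interface: any text-conditioned image model behind one method."""
    def render(self, prompt: str) -> Any: ...


def scene_to_prompt(scene: dict[str, Any]) -> str:
    # Assumption: the structured scene is passed to the backbone as serialized JSON text.
    return json.dumps(scene, sort_keys=True, ensure_ascii=False)


def render_leaves(leaves: list[dict[str, Any]], renderer: Renderer) -> list[Any]:
    """Scene generation is untouched; only the renderer object changes between backbones."""
    return [renderer.render(scene_to_prompt(s)) for s in leaves]


class EchoRenderer:
    """Stand-in backbone for testing: returns the prompt it would have rendered."""
    def render(self, prompt: str) -> str:
        return prompt


print(render_leaves([{"style": "watercolor", "subject": "lion"}], EchoRenderer()))
# ['{"style": "watercolor", "subject": "lion"}']
```

Under this assumption, switching to another backbone means supplying another object with a `render` method; the tree of scenes and the agents that build it stay the same.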

Citation

BibTeX

@article{dorfman2026semanticbrowsing,
  title   = {Semantic Browsing: Controllable Diversity for Image Generation},
  author  = {Dorfman, Sara and Vishnevsky, Maya and Dahary, Omer and Patashnik, Or and Cohen-Or, Daniel},
  journal = {arXiv preprint},
  year    = {2026}
}