Multistable Shape from Shading Emerges from Patch Diffusion

May-26-2025, 22:07:32 GMT–Neural Information Processing Systems

Models for inferring the monocular shape of surfaces with diffuse reflection (shape from shading) ought to produce distributions of outputs, because there are fundamental mathematical ambiguities of both continuous (e.g., bas-relief) and discrete (e.g., convex/concave) types, which humans also experience. Yet the outputs of current models are limited to point estimates or tight distributions around single modes, which prevents them from capturing these effects. We introduce a model that reconstructs a multimodal distribution of shapes from a single shading image, which aligns with the human experience of multistable perception. We train a small denoising diffusion process to generate surface normal fields from 16×16 patches of synthetic images of everyday 3D objects. Despite its relatively small parameter count and predominantly bottom-up structure, we show that multistable shape explanations emerge from this model for ambiguous test images that humans experience as being multistable.
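The discrete convex/concave ambiguity mentioned above can be checked numerically. The following is a minimal sketch, not the authors' model: it assumes an idealized Lambertian surface with unit albedo, a single distant light, and no cast shadows or interreflections. The Gaussian bump surface, the light direction, and all function names are hypothetical choices made for illustration. It shows that a bump lit from one side and the matching dent lit from the mirrored side produce the same 16×16 shading patch.

```python
import math

def lambertian_shading(p, q, light):
    """Shading for depth gradient (p, q) = (dz/dx, dz/dy) under a distant light.

    The unnormalized surface normal is (-p, -q, 1); intensity is max(0, n . l)
    with both vectors normalized (unit albedo assumed).
    """
    lx, ly, lz = light
    norm_n = math.sqrt(p * p + q * q + 1.0)
    norm_l = math.sqrt(lx * lx + ly * ly + lz * lz)
    return max(0.0, (-p * lx - q * ly + lz) / (norm_n * norm_l))

def bump_gradient(x, y, sign=1.0):
    """Gradient of the hypothetical surface z = sign * exp(-(x^2 + y^2))."""
    g = math.exp(-(x * x + y * y))
    return sign * -2.0 * x * g, sign * -2.0 * y * g

def render(sign, light, n=16):
    """Render an n x n shading patch over the square [-1.5, 1.5]^2."""
    coords = [-1.5 + 3.0 * i / (n - 1) for i in range(n)]
    return [
        [lambertian_shading(*bump_gradient(x, y, sign), light) for x in coords]
        for y in coords
    ]

def max_abs_diff(a, b):
    """Largest per-pixel absolute difference between two patches."""
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))

light = (0.3, 0.2, 1.0)
mirrored_light = (-light[0], -light[1], light[2])

convex = render(+1.0, light)                   # bump, original light
concave = render(-1.0, mirrored_light)         # dent, mirrored light
concave_same_light = render(-1.0, light)       # dent, original light

ambiguity_gap = max_abs_diff(convex, concave)
control_gap = max_abs_diff(convex, concave_same_light)
```

Here `ambiguity_gap` is zero up to floating-point error, while `control_gap` is clearly nonzero: the two shapes are only confusable when the light direction is unknown, which is the situation a model faces when given a single shading image.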

artificial intelligence, human experience, multistable shape

Technology:
- Information Technology > Artificial Intelligence (0.42)