Robust Conditional 3D Shape Generation from Casual Captures

https://news.ycombinator.com/rss Hits: 4
Summary

SAM 3D Objects marks a significant improvement in shape generation, but it lacks metric accuracy and requires user interaction. Because it can only exploit a single view, it can sometimes fail to preserve correct aspect ratios, relative scales, and object layouts in complex scenes. ShapeR addresses this by leveraging image sequences and multimodal data (such as SLAM points). By integrating multiple posed views, ShapeR automatically produces metrically accurate and consistent reconstructions. Unlike interactive single-image methods, ShapeR robustly handles casually captured real-world scenes, generating high-quality metric shapes and arrangements without requiring user interaction. Notably, ShapeR achieves this while being trained entirely on synthetic data, whereas SAM 3D exploits large-scale labeled real image-to-3D data. This highlights two different axes of progress: SAM 3D uses large-scale real data for robust single-view inference, while ShapeR uses multi-view geometric constraints to achieve robust, metric scene reconstruction. The two approaches can be combined: by conditioning the second stage of SAM 3D on the output of ShapeR, we can merge the best of both worlds, the metric accuracy and robust layout of ShapeR with the textures and real-world priors of SAM 3D.
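
A minimal sketch of how the two stages could be chained, assuming a ShapeR-like multi-view reconstructor and a SAM-3D-like refinement stage. All class and function names below are illustrative placeholders, not the released APIs of either system.

```python
# Hypothetical pipeline: posed views + SLAM points -> metric shapes,
# then per-shape refinement that adds appearance priors from a real image.
from dataclasses import dataclass
import numpy as np


@dataclass
class PosedView:
    image: np.ndarray        # H x W x 3 RGB frame from the casual capture
    pose: np.ndarray         # 4 x 4 camera-to-world transform
    intrinsics: np.ndarray   # 3 x 3 camera intrinsics


@dataclass
class MetricShape:
    vertices: np.ndarray     # N x 3 vertices in metres (metric scale)
    faces: np.ndarray        # M x 3 triangle indices


def reconstruct_scene(views: list[PosedView],
                      slam_points: np.ndarray) -> list[MetricShape]:
    """ShapeR-style stage (assumed interface): fuse multiple posed views and
    sparse SLAM points into metrically scaled, consistently arranged shapes."""
    raise NotImplementedError("placeholder for a ShapeR-like model")


def refine_with_priors(shape: MetricShape, view: PosedView) -> MetricShape:
    """SAM-3D-style second stage (assumed interface): condition on the metric
    geometry and contribute textures / real-world appearance priors."""
    raise NotImplementedError("placeholder for a SAM-3D-like refiner")


def combined_pipeline(views: list[PosedView],
                      slam_points: np.ndarray) -> list[MetricShape]:
    # 1) Multi-view geometric constraints give metric scale and layout.
    shapes = reconstruct_scene(views, slam_points)
    # 2) Each reconstructed shape conditions the second stage, which keeps the
    #    layout fixed while adding priors learned from large-scale real data.
    return [refine_with_priors(shape, views[0]) for shape in shapes]
```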

First seen: 2026-01-19 14:31

Last seen: 2026-01-19 17:31