Tao, Sirui
HotSpot: Screened Poisson Equation for Signed Distance Function Optimization
Wang, Zimo, Wang, Cheng, Yoshino, Taiki, Tao, Sirui, Fu, Ziyang, Li, Tzu-Mao
Existing losses such as the eikonal loss cannot guarantee that the recovered implicit function is a distance function, even when the implicit function satisfies the eikonal equation almost everywhere. Furthermore, the eikonal loss suffers from stability issues in optimization, and the remedies that introduce area or divergence minimization can lead to oversmoothing. We address these challenges by designing a loss function that, when minimized, can converge to the true distance function, is stable, and naturally penalizes large surface area. We provide theoretical ...
... is to ensure that the implicit function indeed outputs the signed distance. A standard regularization loss is based on the eikonal equation: it constrains the norm of the gradient of an implicit function to be 1 almost everywhere. If the implicit function is a signed distance function, then it satisfies the eikonal equation. However, the converse is not true. Figure 1 shows an example: on the left, we optimize an implicit function to satisfy the eikonal equation; while it successfully does so, it converges to a solution that is far from the actual distance [5, 6].
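The claim that satisfying the eikonal equation does not imply being a distance function can be illustrated with a small 1D sketch. This is not the paper's Figure 1 or its proposed loss. The shape (the interval [-1, 1]), the `zigzag` function, the finite-difference `eikonal_loss`, and the sample grid are all assumptions chosen for illustration. Both functions have the same zero set and a gradient of magnitude 1 away from their kinks, yet only one of them is the signed distance.

```python
# Hypothetical 1D illustration: two functions with |f'(x)| = 1 almost everywhere
# and the same zero set {-1, 1}, only one of which is the signed distance.

def true_sdf(x):
    """Signed distance to the boundary of the interval [-1, 1]."""
    return abs(x) - 1.0

def zigzag(x):
    """Matches true_sdf near the boundary, then folds into a triangle wave.

    The wave stays in [0.25, 0.5], so no new zeros appear, and every
    linear piece has slope +1 or -1.
    """
    r = abs(x)
    if r <= 1.5:
        return r - 1.0
    t = (r - 1.5) % 0.5
    return 0.25 + abs(t - 0.25)

def eikonal_loss(f, xs, h=1e-4):
    """Mean of (|f'(x)| - 1)^2, with f' estimated by central differences."""
    total = 0.0
    for x in xs:
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        total += (abs(grad) - 1.0) ** 2
    return total / len(xs)

# Sample points are offset by 0.005 so that none lies within h of a kink
# (kinks sit at 0 and at multiples of 0.25 beyond |x| = 1.5).
xs = [-3.0 + 0.01 * i + 0.005 for i in range(600)]

loss_sdf = eikonal_loss(true_sdf, xs)
loss_zigzag = eikonal_loss(zigzag, xs)
max_error = max(abs(zigzag(x) - true_sdf(x)) for x in xs)

print("eikonal loss, true SDF:", loss_sdf)
print("eikonal loss, zigzag  :", loss_zigzag)
print("max |zigzag - SDF|    :", max_error)
```

Both eikonal losses come out at numerical zero, while the zigzag function is far from the true distance away from the boundary. A minimizer of the eikonal loss alone is therefore not pinned down to the signed distance function.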
Ensemble Clustering via Co-association Matrix Self-enhancement
Jia, Yuheng, Tao, Sirui, Wang, Ran, Wang, Yongheng
Ensemble clustering integrates a set of base clustering results to generate a stronger one. Existing methods usually rely on a co-association (CA) matrix that measures how many times two samples are grouped into the same cluster according to the base clusterings to achieve ensemble clustering. However, when the constructed CA matrix is of low quality, the performance will degrade. In this paper, we propose a simple yet effective CA matrix self-enhancement framework that can improve the CA matrix to achieve better clustering performance. Specifically, we first extract the high-confidence (HC) information from the base clusterings to form a sparse HC matrix. By propagating the highly-reliable information of the HC matrix to the CA matrix and complementing the HC matrix according to the CA matrix simultaneously, the proposed method generates an enhanced CA matrix for better clustering. Technically, the proposed model is formulated as a symmetric constrained convex optimization problem, which is efficiently solved by an alternating iterative algorithm with convergence and global optimum theoretically guaranteed. Extensive experimental comparisons with twelve state-of-the-art methods on eight benchmark datasets substantiate the effectiveness, flexibility and efficiency of the proposed model in ensemble clustering. The codes and datasets can be downloaded at https://github.com/Siritao/EC-CMS.
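A minimal sketch of a co-association matrix follows, assuming each base clustering is given as a list of integer labels. Entries are normalized by the number of base clusterings, so each is the fraction of clusterings in which two samples share a cluster. The `high_confidence` step is a hypothetical stand-in that simply thresholds the CA matrix; the abstract does not say how the HC matrix is extracted, and the self-enhancement optimization itself is not reproduced here.

```python
# Illustrative only: function names and the threshold value are assumptions.

def co_association(base_clusterings):
    """Fraction of base clusterings in which samples i and j share a cluster."""
    m = len(base_clusterings)
    n = len(base_clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[counts[i][j] / m for j in range(n)] for i in range(n)]

def high_confidence(ca, threshold=0.8):
    """Hypothetical HC extraction: keep entries at or above the threshold, zero the rest."""
    return [[v if v >= threshold else 0.0 for v in row] for row in ca]

# Three base clusterings of four samples (label values are arbitrary per clustering).
base = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]

ca = co_association(base)
hc = high_confidence(ca, threshold=0.8)

for row in ca:
    print([round(v, 2) for v in row])
```

Samples 0 and 1 are grouped together in every base clustering, so their entry is 1 and survives the threshold. Samples 2 and 3 co-occur in only two of three clusterings, so their entry is dropped from the sparse matrix.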
Physion: Evaluating Physical Prediction from Vision in Humans and Machines
Bear, Daniel M., Wang, Elias, Mrowca, Damian, Binder, Felix J., Tung, Hsiau-Yu Fish, Pramod, R. T., Holdaway, Cameron, Tao, Sirui, Smith, Kevin, Sun, Fan-Yun, Fei-Fei, Li, Kanwisher, Nancy, Tenenbaum, Joshua B., Yamins, Daniel L. K., Fan, Judith E.
While machine learning algorithms excel at many challenging visual tasks, it is unclear whether they can make predictions about commonplace real-world physical events. Here, we present a visual and physical prediction benchmark that precisely measures this capability. In realistically simulating a wide variety of physical phenomena -- rigid and soft-body collisions, stable multi-object configurations, rolling and sliding, projectile motion -- our dataset presents a more comprehensive challenge than existing benchmarks. Moreover, we have collected human responses for our stimuli so that model predictions can be directly compared to human judgments. We compare an array of algorithms -- varying in their architecture, learning objective, input-output structure, and training data -- on their ability to make diverse physical predictions. We find that graph neural networks with access to the physical state best capture human behavior, whereas among models that receive only visual input, those with object-centric representations or pretraining do best but fall far short of human accuracy. This suggests that extracting physically meaningful representations of scenes is the main bottleneck to achieving human-like visual prediction. We thus demonstrate how our benchmark can identify areas for improvement and measure progress on this key aspect of physical understanding.