Plotting


The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes

Neural Information Processing Systems

Estimating camera motion in deformable scenes is a complex and open research challenge. Most existing non-rigid structure-from-motion techniques assume that static scene parts are observed alongside deforming ones in order to establish an anchoring reference. However, this assumption does not hold in certain relevant applications such as endoscopies. Deformable odometry and SLAM pipelines, which tackle the most challenging scenario of exploratory trajectories, suffer from a lack of robustness and of proper quantitative evaluation methodologies. To tackle this issue with a common benchmark, we introduce the Drunkard's Dataset, a challenging collection of synthetic data targeting visual navigation and reconstruction in deformable environments. This dataset is the first large set of exploratory camera trajectories with ground truth inside 3D scenes where every surface exhibits non-rigid deformations over time. Simulations in realistic 3D buildings let us obtain a vast amount of data and ground truth labels, including camera poses, RGB images and depth, optical flow and normal maps at high resolution and quality. We further present a novel deformable odometry method, dubbed the Drunkard's Odometry, which decomposes optical flow estimates into rigid-body camera motion and non-rigid scene deformations. In order to validate our data, our work contains an evaluation of several baselines as well as a novel tracking error metric which does not require ground truth data.
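
The decomposition of optical flow into a rigid camera-motion part and a non-rigid residual can be illustrated geometrically. The following is a minimal sketch under stated assumptions, not the paper's learned method: it assumes a pinhole camera, a known per-pixel depth, and a known relative pose (R, t) between two frames. The function names `rigid_flow` and `decompose_flow` and the intrinsics tuple layout are hypothetical and chosen for illustration only.

```python
def rigid_flow(u, v, depth, K, R, t):
    """Flow at pixel (u, v) caused by camera motion alone (scene assumed static).

    K is (fx, fy, cx, cy); R is a 3x3 rotation (nested lists); t is a 3-vector.
    (R, t) maps points from the first camera frame into the second one.
    """
    fx, fy, cx, cy = K
    # Back-project the pixel to a 3D point in the first camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    z = depth
    # Express the point in the second camera frame: X' = R X + t.
    x2 = R[0][0] * x + R[0][1] * y + R[0][2] * z + t[0]
    y2 = R[1][0] * x + R[1][1] * y + R[1][2] * z + t[1]
    z2 = R[2][0] * x + R[2][1] * y + R[2][2] * z + t[2]
    # Project back to the image plane of the second frame.
    u2 = fx * x2 / z2 + cx
    v2 = fy * y2 / z2 + cy
    return (u2 - u, v2 - v)


def decompose_flow(flow, u, v, depth, K, R, t):
    """Split an observed flow vector into (rigid part, non-rigid residual)."""
    rigid = rigid_flow(u, v, depth, K, R, t)
    residual = (flow[0] - rigid[0], flow[1] - rigid[1])
    return rigid, residual


# Usage: identity rotation, small sideways translation, one pixel at the
# principal point. All numbers are made up for illustration.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rigid, residual = decompose_flow(
    flow=(6.0, 1.0), u=50.0, v=50.0, depth=2.0,
    K=(100.0, 100.0, 50.0, 50.0), R=IDENTITY, t=(0.1, 0.0, 0.0),
)
print("rigid:", rigid, "non-rigid residual:", residual)
```

In this toy setting, whatever part of the observed flow is not explained by the camera motion is attributed to scene deformation. In practice neither the pose nor the deformation is known in advance, which is what makes the joint estimation hard.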


Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Neural Information Processing Systems

The Video-to-Audio (V2A) model has recently gained attention for its practical application in generating audio directly from silent videos, particularly in video/film production. However, previous methods in V2A have limited generation quality in terms of temporal synchronization and audio-visual relevance.



A Additional Experiments

Neural Information Processing Systems

A.1 Estimating Test Robust Accuracy

Instead of estimating the robust generalization gap, one might expect an analysis of the relationship between the test robust accuracy and the measures. In this regard, we investigate the correlation between the measures and the test robust accuracy 1 − E(w; ϵ, D) on the test dataset D instead of the robust generalization gap g(w). Figure 1 illustrates the difference in total τ when the robust generalization gap and the test robust accuracy are used as the target variable for correlation analysis.

Figure 6: Comparison of the total τ when the robust generalization gap g(w) (yellow) and the test robust accuracy 1 − E(w; ϵ, D) (blue) are used as the target variables for correlation analysis.

Although we observe some different behavior of the measures, we find that estimating the test robust accuracy can be more challenging.
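
The correlation analysis described above compares how well a measure ranks models under two different target variables. The following is a minimal sketch, assuming the correlation statistic in the text is Kendall's rank correlation coefficient (tau-a, no tie handling). The measure, gap, and accuracy values are invented for illustration and are not results from the paper; how the paper aggregates a "total" value is not modeled here.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical values for four trained models (illustration only).
measure = [0.1, 0.4, 0.2, 0.8]              # some measure of each model
robust_gap = [0.05, 0.20, 0.10, 0.30]       # stands in for g(w)
robust_accuracy = [0.60, 0.50, 0.55, 0.52]  # stands in for test robust accuracy

print("tau vs. gap:     ", kendall_tau(measure, robust_gap))
print("tau vs. accuracy:", kendall_tau(measure, robust_accuracy))
```

With these made-up numbers the measure ranks the models perfectly by gap but only imperfectly by accuracy, which is the kind of difference the comparison in the text is about.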


A tiny shapeshifting robot could be the next big thing in biomedicine

Mashable

Developed by a team of scientists at Seoul National University and Gachon University in South Korea, PB, or the Particle-armored liquid roBot, is designed to behave the way cells do, and imitate biological forms and functions. The morphing bot can ooze around tiny pillars, skim across water to reach a dry surface without bursting, merge with another PB, and swallow a glass bead, all without compromising structural integrity. The robot is still in the research stages, but the promising results so far raise hopes that PB could potentially help advance drug delivery and even tumor cell destruction in the future.



G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering Yifei Sun

Neural Information Processing Systems

Given a graph with textual attributes, we enable users to 'chat with their graph': that is, to ask questions about the graph using a conversational interface. In response to a user's questions, our method provides textual replies and highlights the relevant parts of the graph. While existing works integrate large language models (LLMs) and graph neural networks (GNNs) in various ways, they mostly focus on either conventional graph tasks (such as node, edge, and graph classification), or on answering simple graph queries on small or synthetic graphs. In contrast, we develop a flexible question-answering framework targeting real-world textual graphs, applicable to multiple applications including scene graph understanding, common sense reasoning, and knowledge graph reasoning. Toward this goal, we first develop a Graph Question Answering (GraphQA) benchmark with data collected from different tasks. Then, we propose our G-Retriever method, introducing the first retrieval-augmented generation (RAG) approach for general textual graphs, which can be fine-tuned to enhance graph understanding via soft prompting. To resist hallucination and to allow for textual graphs that greatly exceed the LLM's context window size, G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem. Empirical evaluations show that our method outperforms baselines on textual graph tasks from multiple domains, scales well with larger graph sizes, and mitigates hallucination.
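
To make the Prize-Collecting Steiner Tree idea concrete, here is a minimal sketch on a toy graph. It is not G-Retriever's solver: it assumes prizes on nodes only, a single uniform cost per edge, and it brute-forces every connected node subset, which is only feasible for a handful of nodes. The names `is_connected` and `best_subgraph` and all prize values are hypothetical.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """True if `nodes` induce a connected subgraph of the undirected `edges`."""
    nodes = set(nodes)
    if not nodes:
        return False
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == nodes


def best_subgraph(prizes, edges, edge_cost):
    """Maximize (sum of node prizes) - edge_cost * (number of tree edges).

    A tree spanning k connected nodes always uses k - 1 edges, so with a
    uniform edge cost only the chosen node set matters.
    """
    best_nodes, best_score = (), float("-inf")
    all_nodes = sorted(prizes)
    for k in range(1, len(all_nodes) + 1):
        for subset in combinations(all_nodes, k):
            if not is_connected(subset, edges):
                continue
            score = sum(prizes[n] for n in subset) - edge_cost * (k - 1)
            if score > best_score:
                best_nodes, best_score = subset, score
    return best_nodes, best_score


# Toy path graph a - b - c - d; prizes mimic relevance to a query.
prizes = {"a": 3, "b": 0, "c": 2, "d": 0}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(best_subgraph(prizes, edges, edge_cost=0.5))
```

Here the zero-prize node b is still selected because it connects the two relevant nodes a and c, while d is dropped because its edge costs more than it contributes. This trade-off between relevance and subgraph size is what keeps the retrieved context small.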


SLIBO-Net: Floorplan Reconstruction via Slicing Box Representation with Local Geometry Regularization Supplemental Material, Chi-Han Peng

Neural Information Processing Systems

We compare our method with four competing methods in Table 1 of the main paper. Below, we provide more details about how we obtain the scores on Structured3D [6] for each method. Floor-SP [1] extracts geometry primitives from density maps using deep neural networks and optimizes the floorplan graph structure with room-wise coordinate descent. We use the evaluation score reported by [5]. MonteFloor [3] applies MCTS to select room proposals that maximize an objective function combining the density map predicted by a deep network and regularization terms on the room shapes.


Approximating mutual information of high-dimensional variables using learned representations

Neural Information Processing Systems

Mutual information (MI) is a general measure of statistical dependence with widespread application across the sciences. However, estimating MI between multidimensional variables is challenging because the number of samples necessary to converge to an accurate estimate scales unfavorably with dimensionality. In practice, existing techniques can reliably estimate MI in up to tens of dimensions, but fail in higher dimensions, where sufficient sample sizes are infeasible. Here, we explore the idea that underlying low-dimensional structure in high-dimensional data can be exploited to faithfully approximate MI in high-dimensional settings with realistic sample sizes. We develop a method that we call latent MI (LMI) approximation, which applies a nonparametric MI estimator to low-dimensional representations learned by a simple, theoretically-motivated model architecture.
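
For reference, the standard definition of MI and the inequality that makes estimation on representations meaningful can be written as follows. The maps f and g are hypothetical placeholders for whatever deterministic encoders produce the low-dimensional representations; the abstract does not specify them.

```latex
I(X;Y) = \int p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy,
\qquad
I\big(f(X);\,g(Y)\big) \le I(X;Y).
```

The inequality is the data processing inequality: MI computed on representations can never exceed the true MI, and the approximation is faithful to the extent that the representations preserve the dependence shared by X and Y.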