Model-Based Reasoning
Social Mechanism Design: Making Maximally Acceptable Decisions
Abramowitz, Ben, Mattei, Nicholas
Agents care not only about the outcomes of collective decisions but also about how those decisions are made. In many cases, both the outcome and the procedure affect whether agents see a decision as legitimate, justifiable, or acceptable. We propose a novel model for collective decisions that takes into account both the preferences of the agents and their higher-order concerns about the process of preference aggregation. To this end we (1) propose natural, plausible preference structures and establish key properties thereof, (2) develop mechanisms for aggregating these preferences to maximize the acceptability of decisions, and (3) characterize the performance of our acceptance-maximizing mechanisms. We apply our general approach to the specific setting of dichotomous choice, and compare the worst-case rates of acceptance achievable among populations of agents of different types. We also show that in the special case of rule selection, i.e., amendment procedures, the method proposed by Abramowitz, Shapiro, and Talmon (2021) achieves universal acceptance with certain agent types.
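As a toy illustration in Python of the general idea (hypothetical, not the paper's formal model): suppose each agent in a dichotomous choice cares about the outcome and about which aggregation rule is used, and accepts a decision if it matches either concern; an acceptance-maximizing mechanism can then simply enumerate (rule, outcome) pairs.

# Toy illustration (hypothetical agent types, not the paper's preference structures):
# each agent has a preferred outcome in {0, 1} and a preferred rule, and accepts a
# decision if it matches at least one of the two.
from itertools import product

agents = [
    {"outcome": 1, "rule": "majority"},
    {"outcome": 0, "rule": "majority"},
    {"outcome": 0, "rule": "unanimity"},
]

def accepts(agent, outcome, rule):
    return outcome == agent["outcome"] or rule == agent["rule"]

def acceptance_maximizing_decision(agents, rules=("majority", "unanimity")):
    """Pick the (rule, outcome) pair accepted by the most agents."""
    return max(product(rules, (0, 1)),
               key=lambda ro: sum(accepts(a, ro[1], ro[0]) for a in agents))

print(acceptance_maximizing_decision(agents))  # -> ('majority', 0) for this toy population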
An operator preconditioning perspective on training in physics-informed machine learning
De Ryck, Tim, Bonnet, Florent, Mishra, Siddhartha, de Bézenac, Emmanuel
In this paper, we investigate the behavior of gradient descent algorithms in physics-informed machine learning methods such as PINNs, which minimize residuals connected to partial differential equations (PDEs). Our key result is that the difficulty in training these models is closely related to the conditioning of a specific differential operator, which in turn is associated with the Hermitian square of the differential operator of the underlying PDE. If this operator is ill-conditioned, training is slow or infeasible, so preconditioning it is crucial. We employ both rigorous mathematical analysis and empirical evaluations to investigate various strategies, explaining how they better condition this critical operator and consequently improve training.
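Schematically (our notation, not the paper's): for a linear PDE $\mathcal{D}u = f$, a physics-informed model minimizes the residual loss, and the operator whose conditioning is tied to training difficulty is the Hermitian square of $\mathcal{D}$,
\[
L(\theta) = \tfrac{1}{2}\,\big\| \mathcal{D} u_\theta - f \big\|_{L^2}^2,
\qquad
\mathcal{A} = \mathcal{D}^{*}\mathcal{D}.
\]
Roughly, the larger the condition number $\kappa(\mathcal{A})$ (combined, in the full analysis, with the network's tangent kernel), the slower gradient descent converges; preconditioning replaces the residual $\mathcal{D}u_\theta - f$ with $P(\mathcal{D}u_\theta - f)$ for a preconditioner $P$ chosen to bring the effective condition number close to one.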
Causal Reasoning through Two Layers of Cognition for Improving Generalization in Visual Question Answering
Nguyen, Trang, Okazaki, Naoaki
Generalization in Visual Question Answering (VQA) requires models to answer questions about images with contexts beyond the training distribution. Existing attempts primarily refine unimodal aspects, overlooking enhancements in multimodal aspects. Moreover, diverse interpretations of the input lead to different modes of answer generation, highlighting the role of causal reasoning between the interpreting and answering steps in VQA. Through this lens, we propose Cognitive pathways VQA (CopVQA), which improves multimodal predictions by emphasizing causal reasoning factors. CopVQA first builds a pool of pathways that capture diverse causal reasoning flows through the interpreting and answering stages. Mirroring human cognition, we decompose the responsibility of each stage into distinct experts and a cognition-enabled component (CC). The two CCs strategically execute one expert for each stage at a time. Finally, we prioritize answer predictions governed by pathways involving both CCs while disregarding answers produced by either CC alone, thereby emphasizing causal reasoning and supporting generalization. Our experiments on real-life and medical data consistently verify that CopVQA improves VQA performance and generalization across baselines and domains. Notably, CopVQA achieves a new state of the art (SOTA) on the PathVQA dataset and accuracy comparable to the current SOTA on VQA-CPv2, VQAv2, and VQA-RAD, with one-fourth of the model size.
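A minimal Python sketch of the two-stage expert/gate idea (all names are ours and purely illustrative; the actual CopVQA components are learned neural modules):

# Schematic only: two stages (interpreting, answering), each with a pool of experts
# and a cognition-enabled component (CC) that executes exactly one expert per input.
import random

def make_expert(tag):
    return lambda x: f"{tag}({x})"

interpret_experts = [make_expert(f"interp_{i}") for i in range(3)]
answer_experts = [make_expert(f"answer_{i}") for i in range(3)]

def cc_select(experts, x):
    """Cognition-enabled component: choose one expert for this input.
    A trained gate would score the experts; we pick at random for illustration."""
    return random.choice(experts)

def pathway_forward(image_question):
    interpretation = cc_select(interpret_experts, image_question)(image_question)
    answer = cc_select(answer_experts, interpretation)(interpretation)
    return answer  # a prediction whose pathway passed through both CCs

print(pathway_forward("image+question"))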
Universal Humanoid Motion Representations for Physics-Based Control
Luo, Zhengyi, Cao, Jinkun, Merel, Josh, Winkler, Alexander, Huang, Jing, Kitani, Kris, Xu, Weipeng
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control. Due to the high dimensionality of humanoid control and the inherent difficulties of reinforcement learning, prior methods have focused on learning skill embeddings for a narrow range of movement styles (e.g., locomotion, game characters) from specialized motion datasets. This limited scope hampers their applicability to complex tasks. Our work closes this gap, significantly increasing the coverage of the motion representation space. To achieve this, we first learn a motion imitator that can imitate all of the human motion in a large, unstructured motion dataset. We then create our motion representation by distilling skills directly from the imitator, using an encoder-decoder structure with a variational information bottleneck. Additionally, we jointly learn a prior conditioned on proprioception (the humanoid's own pose and velocities) to improve model expressiveness and sampling efficiency for downstream tasks. Sampling from the prior, we can generate long, stable, and diverse human motions. Using this latent space for hierarchical RL, we show that our policies solve tasks using natural and realistic human behavior. We demonstrate the effectiveness of our motion representation by solving generative tasks (e.g., strike, terrain traversal) and motion tracking using VR controllers.
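A minimal PyTorch sketch of the general construction described above, with assumed layer sizes and module names (not the authors' architecture): an encoder-decoder with a variational bottleneck, where the prior is conditioned only on proprioception.

# Minimal sketch (assumed dimensions and names): variational skill bottleneck with a
# proprioception-conditioned prior.
import torch
import torch.nn as nn

class SkillVAE(nn.Module):
    def __init__(self, obs_dim=300, prop_dim=69, latent_dim=32, act_dim=69):
        super().__init__()
        # Encoder sees the full observation; the prior sees only proprioception.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 2 * latent_dim))
        self.prior = nn.Sequential(nn.Linear(prop_dim, 256), nn.ReLU(),
                                   nn.Linear(256, 2 * latent_dim))
        # Decoder maps (proprioception, latent skill) to an action.
        self.decoder = nn.Sequential(nn.Linear(prop_dim + latent_dim, 256),
                                     nn.ReLU(), nn.Linear(256, act_dim))

    def forward(self, obs, prop):
        mu_q, logvar_q = self.encoder(obs).chunk(2, dim=-1)
        mu_p, logvar_p = self.prior(prop).chunk(2, dim=-1)
        z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()
        action = self.decoder(torch.cat([prop, z], dim=-1))
        # KL between the posterior and the proprioception-conditioned prior
        kl = 0.5 * (logvar_p - logvar_q
                    + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp() - 1).sum(-1)
        return action, kl

model = SkillVAE()
action, kl = model(torch.randn(4, 300), torch.randn(4, 69))

At downstream-task time, a hierarchical policy would output latents z (or sample them from the prior) and let the frozen decoder turn them into low-level actions.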
Robust Model-Based Optimization for Challenging Fitness Landscapes
Ghaffari, Saba, Saleh, Ehsan, Schwing, Alexander G., Wang, Yu-Xiong, Burke, Martin D., Sinha, Saurabh
Protein design, a grand challenge of the day, involves optimization on a fitness landscape, and leading methods adopt a model-based approach in which a model is trained on a training set (protein sequences and their fitness) and then proposes candidates to explore next. These methods are challenged by the sparsity of high-fitness samples in the training set, a problem that has been recognized in the literature. A less recognized but equally important problem stems from the distribution of training samples in the design space: leading methods are not designed for scenarios where the desired optimum lies in a region that is not only poorly represented in the training data but also relatively far from the highly represented low-fitness regions. We show that this problem of "separation" in the design space is a significant bottleneck in existing model-based optimization tools and propose a new approach that uses a novel VAE as its search model to overcome the problem. We demonstrate its advantage over prior methods in robustly finding improved samples, regardless of the imbalance and separation between low- and high-fitness training samples. Our comprehensive benchmark on real and semi-synthetic protein datasets, as well as solution design for physics-informed neural networks, showcases the generality of our approach in discrete and continuous design spaces. Our implementation is available at https://github.com/sabagh1994/PGVAE.
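For orientation, here is a generic model-based optimization loop in Python (schematic only; it is not the PGVAE algorithm from the linked repository, and the surrogate and proposal functions are stand-ins): a search model proposes candidates, a surrogate scores their fitness, and the best proposals seed the next round.

# Generic MBO loop with stand-in components.
import numpy as np

rng = np.random.default_rng(0)

def surrogate_fitness(x):
    # Stand-in surrogate; in practice a model trained on (sequence, fitness) pairs.
    return -np.sum((x - 3.0) ** 2, axis=-1)

def propose(seeds, n=256, sigma=0.5):
    # Stand-in search model; a VAE-based method would decode perturbed latents instead.
    idx = rng.integers(len(seeds), size=n)
    return seeds[idx] + sigma * rng.normal(size=(n, seeds.shape[1]))

population = rng.normal(size=(64, 8))                   # mostly low-fitness starting samples
for _ in range(20):
    candidates = propose(population)
    scores = surrogate_fitness(candidates)
    population = candidates[np.argsort(scores)[-64:]]   # keep the best proposals

print(float(surrogate_fitness(population).max()))

The failure mode the abstract describes arises when the initial population is both far from and sparsely connected to the high-fitness region, so the proposal distribution rarely reaches it.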
It's MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk
Bertsch, Amanda, Xie, Alex, Neubig, Graham, Gormley, Matthew R.
Minimum Bayes Risk (MBR) decoding is a method for choosing the outputs of a machine learning system based not on the output with the highest probability, but on the output with the lowest risk (expected error) among multiple candidates. It is a simple but powerful method: for an additional cost at inference time, MBR provides reliable, several-point improvements across metrics for a wide variety of tasks without any additional data or training. Despite this, MBR is not frequently applied in NLP work, and knowledge of the method itself is limited. We first provide an introduction to the method and to the recent literature. We show that several recent methods that do not reference MBR can be written as special cases of MBR; this reformulation provides additional theoretical justification for the performance of these methods, explaining some results that were previously only empirical. We provide theoretical and empirical results about the effectiveness of various MBR variants and make concrete recommendations for the application of MBR in NLP models, including future directions in this area.
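A minimal Python sketch of sampling-based MBR decoding: among sampled candidates, pick the one with the highest expected utility (equivalently, lowest expected risk) against the other samples. The token-overlap F1 used here is a simple stand-in for a real utility metric such as BLEU or BERTScore.

def utility(hyp, ref):
    hyp_set, ref_set = set(hyp.split()), set(ref.split())
    if not hyp_set or not ref_set:
        return 0.0
    overlap = len(hyp_set & ref_set)
    precision, recall = overlap / len(hyp_set), overlap / len(ref_set)
    return 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)

def mbr_decode(candidates):
    """Return argmax_h (1/|C|) * sum_r utility(h, r), with the candidate set C
    doubling as the pseudo-reference set."""
    return max(candidates,
               key=lambda h: sum(utility(h, r) for r in candidates) / len(candidates))

samples = ["the cat sat on the mat", "a cat sat on a mat", "the dog ran away"]
print(mbr_decode(samples))  # picks the candidate closest on average to the others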
Tangent Model Composition for Ensembling and Continual Fine-tuning
Tangent Model Composition (TMC) is a method to combine component models independently fine-tuned around a pre-trained point. Component models are tangent vectors to the pre-trained model that can be added, scaled, or subtracted to support incremental learning, ensembling, or unlearning. Component models are composed at inference time via scalar combination, reducing the cost of ensembling to that of a single model. TMC improves accuracy by 4.2% compared to ensembling non-linearly fine-tuned models at a 2.5 to 10-fold reduction of inference cost, growing linearly with the number of component models.
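A schematic of the composition step in Python (parameter-space analogue with made-up weights; the paper's TMC composes tangent, i.e., first-order linearized, models rather than raw weight deltas):

# theta = theta_0 + sum_i alpha_i * (theta_i - theta_0): one composed model,
# one forward pass, instead of running every component model separately.
import torch

def compose(pretrained_state, component_states, alphas):
    composed = {}
    for name, theta0 in pretrained_state.items():
        delta = sum(a * (comp[name] - theta0)
                    for a, comp in zip(alphas, component_states))
        composed[name] = theta0 + delta
    return composed

# Tiny illustration with hypothetical one-layer "state dicts":
base = {"w": torch.zeros(2)}
comps = [{"w": torch.tensor([1.0, 0.0])}, {"w": torch.tensor([0.0, 2.0])}]
print(compose(base, comps, alphas=[0.5, 0.5]))  # {'w': tensor([0.5000, 1.0000])}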
Physics-Informed Induction Machine Modelling
Shen, Qing, Zhou, Yifan, Zhang, Peng
This rapid communication devises a Neural Induction Machine (NeuIM) model, which pilots the use of physics-informed machine learning to enable AI-based electromagnetic transient simulations. The contributions are threefold: (1) a formulation of NeuIM to represent the induction machine (IM) in the phase domain; (2) a physics-informed neural network capable of capturing fast and slow IM dynamics even in the absence of data; and (3) a data-physics-integrated hybrid NeuIM approach that is adaptive to various levels of data availability. Extensive case studies validate the efficacy of NeuIM and, in particular, its advantage over purely data-driven approaches.
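A generic sketch in Python of what a data-physics-integrated objective looks like (the induction machine equations themselves are omitted; `physics_residual` is a placeholder for the phase-domain model residual, and the weighting scheme is an assumption, not the NeuIM formulation):

import torch

def hybrid_loss(model, t, measured, physics_residual, w_data=1.0, w_phys=1.0):
    pred = model(t)
    data_term = torch.tensor(0.0)
    if measured is not None:                      # adapt to the level of data availability
        data_term = torch.mean((pred - measured) ** 2)
    phys_term = torch.mean(physics_residual(pred, t) ** 2)
    return w_data * data_term + w_phys * phys_term

# Dummy usage with a toy network and a placeholder ODE residual built via autograd:
net = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 2))
t = torch.linspace(0, 1, 32).unsqueeze(-1).requires_grad_(True)

def residual(pred, t):
    dpred = torch.autograd.grad(pred.sum(), t, create_graph=True)[0]
    return dpred - pred.sum(-1, keepdim=True)     # placeholder dynamics, not the IM model

loss = hybrid_loss(net, t, measured=None, physics_residual=residual)
loss.backward()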
CHORUS: Foundation Models for Unified Data Discovery and Exploration
Kayali, Moe, Lykov, Anton, Fountalis, Ilias, Vasiloglou, Nikolaos, Olteanu, Dan, Suciu, Dan
We apply foundation models to data discovery and exploration tasks. Foundation models are large language models (LLMs) that show promising performance on a range of diverse tasks unrelated to their training. We show that these models are highly applicable to the data discovery and data exploration domain. When carefully used, they have superior capability on three representative tasks: table-class detection, column-type annotation, and join-column prediction. On all three tasks, we show that a foundation-model-based approach outperforms the task-specific models and hence the state of the art. Further, our approach often surpasses human-expert performance. We investigate the fundamental characteristics of this approach, including generalizability across several foundation models, the impact of non-determinism on the outputs, and syntactic/semantic signals. All in all, this suggests a future direction in which disparate data management tasks can be unified under foundation models.
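An illustrative Python sketch of prompting a foundation model for one of the three tasks, column-type annotation (the prompt wording is hypothetical, not the CHORUS prompt, and `complete` is a placeholder for whatever LLM client is available):

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def annotate_column(table_name, column_name, sample_values, candidate_types):
    prompt = (
        f"Table: {table_name}\n"
        f"Column: {column_name}\n"
        f"Sample values: {', '.join(map(str, sample_values))}\n"
        f"Choose the semantic type of this column from: {', '.join(candidate_types)}.\n"
        f"Answer with a single type."
    )
    return complete(prompt).strip()

# annotate_column("movies", "col_3", ["1994", "2001", "1987"],
#                 ["year", "runtime", "budget", "title"])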
Physics-Informed Solution of the Stationary Fokker-Planck Equation for a Class of Nonlinear Dynamical Systems: An Evaluation Study
Alhussein, Hussam, Khasawneh, Mohammed, Daqaq, Mohammed F.
The Fokker-Planck (FP) equation is a linear partial differential equation which governs the temporal and spatial evolution of the probability density function (PDF) associated with the response of stochastic dynamical systems. An exact analytical solution of the FP equation is only available for a limited subset of dynamical systems. Semi-analytical methods are available for a larger, yet still small, subset of systems, while traditional computational methods, e.g., finite elements and finite differences, require dividing the computational domain into a grid of discrete points, which incurs significant computational costs for high-dimensional systems. Physics-informed learning offers a potentially powerful alternative to traditional computational schemes. To evaluate its potential, we present a data-free, physics-informed neural network (PINN) framework to solve the FP equation for a class of nonlinear stochastic dynamical systems. In particular, through several examples concerning the stochastic response of the Duffing, Van der Pol, and Duffing-Van der Pol oscillators, we assess the ability and accuracy of the PINN framework in $i)$ predicting the PDF under the combined effect of additive and multiplicative noise, $ii)$ capturing P-bifurcations of the PDF, and $iii)$ effectively treating high-dimensional systems. Through comparisons with Monte Carlo simulations and the available literature, we show that PINN can effectively address all of the afore-described points. We also demonstrate that the computational time associated with the PINN solution can be substantially reduced by using transfer learning.
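For reference, for an Itô SDE $d\mathbf{x} = \mathbf{f}(\mathbf{x})\,dt + \mathbf{g}(\mathbf{x})\,d\mathbf{W}_t$, the stationary FP equation that such a data-free PINN enforces as its residual reads (schematically)
\[
0 \;=\; -\sum_{i}\frac{\partial}{\partial x_i}\Big[f_i(\mathbf{x})\,p(\mathbf{x})\Big]
\;+\;\frac{1}{2}\sum_{i,j}\frac{\partial^2}{\partial x_i\,\partial x_j}\Big[\big(\mathbf{g}\mathbf{g}^{\top}\big)_{ij}(\mathbf{x})\,p(\mathbf{x})\Big],
\]
with the drift $\mathbf{f}$ and diffusion $\mathbf{g}$ of the specific oscillator substituted in, and with normalization and far-field decay conditions on $p$ imposed alongside the PDE residual in the training loss.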