Goto

Collaborating Authors

 submodel


Efficient Meta Neural Heuristic for Multi-Objective Combinatorial Optimization (Appendix) A Model architecture The architecture of the base model in meta-learning is the same as POMO [ 26

Neural Information Processing Systems

Each sublayer adds a skip-connection (ADD) and batch normalization (BN). The decoder sequentially chooses a node according to a probability distribution produced by the node embeddings to construct a solution. The scaled symmetric sampling method is shown in Algorithm 2. The scaled factor The uniform division of the weight space is illustrated as follows. Thus, its approximate Pareto optimal solutions are commonly pursued. V ehicles must serve all the customers and finally return to the depot.






Bounded rationality in structured density estimation: Supplementary material A Experimental details

Neural Information Processing Systems

A.1 Experiment 1 A.1.1 Participants Experiment 1 recruited 21 participants (11 females, aged 18-25). All participants had provided informed consent before the experiment. Cover story Participants were told that they were apprentice magicians in a magical world. In this world, dangerous magic lava rocks were emitted from an unknown number of invisible volcano(es). On each trial, they observed past landing locations of lava rocks in a specific area (on the screen), and their job was to predict the probability density of future landing locations. More specifically, they were asked to draw a probability density by reporting, using click-and-drag mouse gestures, three key properties of the volcano(es), corresponding to the mean, the weight, and the standard deviation of a Gaussian component. They were told that their bonus payment depended on the accuracy of the reported predictive density.


Bounded rationality in structured density estimation Tianyuan T eng

Neural Information Processing Systems

Learning to accurately represent environmental uncertainty is crucial for adaptive and optimal behaviors in various cognitive tasks. However, it remains unclear how the human brain, constrained by finite cognitive resources, internalise the highly structured environmental uncertainty. In this study, we explore how these learned distributions deviate from the ground truth, resulting in observable inconsistency in a novel structured density estimation task. During each trial, human participants were asked to learn and report the latent probability distribution functions underlying sequentially presented independent observations. As the number of observations increased, the reported predictive density became closer to the ground truth. Nevertheless, we observed an intriguing inconsistency in human structure estimation, specifically a large error in the number of reported clusters.


Causal Inference with the "Napkin Graph"

Guo, Anna, Benkeser, David, Nabi, Razieh

arXiv.org Machine Learning

Unmeasured confounding can render identification strategies based on adjustment functionals invalid. We study the "Napkin graph", a causal structure that encapsulates patterns of M-bias, instrumental variables, and the classical back-door and front-door models within a single graphical framework, yet requires a nonstandard identification strategy: the average treatment effect is expressed as a ratio of two g-formulas. We develop novel estimators for this functional, including doubly robust one-step and targeted minimum loss-based estimators that remain asymptotically linear when nuisance functions are estimated at slower-than-parametric rates using machine learning. We also show how a generalized independence restriction encoded by the Napkin graph, known as a Verma constraint, can be exploited to improve efficiency, illustrating more generally how such constraints in hidden variable DAGs can inform semiparametric inference. The proposed methods are validated through simulations and applied to the Finnish Life Course study to estimate the effect of educational attainment on income. An accompanying R package, napkincausal, implements all proposed procedures.


Scalable branch-and-bound model selection with non-monotonic criteria including AIC, BIC and Mallows's $\mathit{C_p}$

Vanhoefer, Jakob, Körner, Antonia, Doresic, Domagoj, Hasenauer, Jan, Pathirana, Dilan

arXiv.org Machine Learning

Model selection is a pivotal process in the quantitative sciences, where researchers must navigate between numerous candidate models of varying complexity. Traditional information criteria, such as the corrected Akaike Information Criterion (AICc), Bayesian Information Criterion (BIC), and Mallows's $\mathit{C_p}$, are valuable tools for identifying optimal models. However, the exponential increase in candidate models with each additional model parameter renders the evaluation of these criteria for all models -- a strategy known as exhaustive, or brute-force, searches -- computationally prohibitive. Consequently, heuristic approaches like stepwise regression are commonly employed, albeit without guarantees of finding the globally-optimal model. In this study, we challenge the prevailing notion that non-monotonicity in information criteria precludes bounds on the search space. We introduce a simple but novel bound that enables the development of branch-and-bound algorithms tailored for these non-monotonic functions. We demonstrate that our approach guarantees identification of the optimal model(s) across diverse model classes, sizes, and applications, often with orders of magnitude computational speedups. For instance, in one previously-published model selection task involving $2^{32}$ (approximately 4 billion) candidate models, our method achieves a computational speedup exceeding 6,000. These findings have broad implications for the scalability and effectiveness of model selection in complex scientific domains.


Federated Submodel Optimization for Hot and Cold Data Features Y ucheng Ding

Neural Information Processing Systems

The global model for federated optimization typically contains a large and sparse embedding layer, while each client's local data tend to interact with part of features, updating only a small submodel with the feature-related embedding vectors.