parameter value
Fine Tuning a Simulation-Driven Estimator
Lakshminarayanan, Braghadeesh, Guerrero, Margarita A., Rojas, Cristian R.
Many industries now deploy high-fidelity simulators (digital twins) to represent physical systems, yet their parameters must be calibrated to match the true system. This motivated the construction of simulation-driven parameter estimators, built by generating synthetic observations for sampled parameter values and learning a supervised mapping from observations to parameters. However, when the true parameters lie outside the sampled range, predictions suffer from an out-of-distribution (OOD) error. This paper introduces a fine-tuning approach for the Two-Stage estimator that mitigates OOD effects and improves accuracy. The effectiveness of the proposed method is verified through numerical simulations.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
Alternating optimization of decision trees, with application to learning sparse oblique trees
Learning a decision tree from data is a difficult optimization problem. The most widespread algorithm in practice, dating to the 1980s, is based on a greedy growth of the tree structure by recursively splitting nodes, and possibly pruning back the final tree. The parameters (decision function) of an internal node are approximately estimated by minimizing an impurity measure. We give an algorithm that, given an input tree (its structure and the parameter values at its nodes), produces a new tree with the same or smaller structure but new parameter values that provably lower or leave unchanged the misclassification error. This can be applied to both axis-aligned and oblique trees and our experiments show it consistently outperforms various other algorithms while being highly scalable to large datasets and trees. Further, the same algorithm can handle a sparsity penalty, so it can learn sparse oblique trees, having a structure that is a subset of the original tree and few nonzero parameters. This combines the best of axis-aligned and oblique trees: flexibility to model correlated data, low generalization error, fast inference and interpretable nodes that involve only a few features in their decision.
Path following algorithms for \ell_2 -regularized M -estimation with approximation guarantee
Many modern machine learning algorithms are formulated as regularized M-estimation problems, in which a regularization (tuning) parameter controls a trade-off between model fit to the training data and model complexity. To select the ``best'' tuning parameter value that achieves a good trade-off, an approximated solution path needs to be computed. In practice, this is often done through selecting a grid of tuning parameter values and solving the regularized problem at the selected grid points. However, given any desired level of accuracy, it is often not clear how to choose the grid points and also how accurately one should solve the regularized problems at the selected gird points, both of which can greatly impact the overall amount of computation. In the context of $\ell_2$-regularized $M$-estimation problem, we propose a novel grid point selection scheme and an adaptive stopping criterion for any given optimization algorithm that produces an approximated solution path with approximation error guarantee. Theoretically, we prove that the proposed solution path can approximate the exact solution path to arbitrary level of accuracy, while saving the overall computation as much as possible. Numerical results also corroborate with our theoretical analysis.
- Europe > Germany > Lower Saxony > Oldenburg (0.04)
- Europe > Austria (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Alternating optimization of decision trees, with application to learning sparse oblique trees
Learning a decision tree from data is a difficult optimization problem. The most widespread algorithm in practice, dating to the 1980s, is based on a greedy growth of the tree structure by recursively splitting nodes, and possibly pruning back the final tree. The parameters (decision function) of an internal node are approximately estimated by minimizing an impurity measure. We give an algorithm that, given an input tree (its structure and the parameter values at its nodes), produces a new tree with the same or smaller structure but new parameter values that provably lower or leave unchanged the misclassification error. This can be applied to both axis-aligned and oblique trees and our experiments show it consistently outperforms various other algorithms while being highly scalable to large datasets and trees. Further, the same algorithm can handle a sparsity penalty, so it can learn sparse oblique trees, having a structure that is a subset of the original tree and few nonzero parameters. This combines the best of axis-aligned and oblique trees: flexibility to model correlated data, low generalization error, fast inference and interpretable nodes that involve only a few features in their decision.
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Arizona (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Data Science (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)
Practical, Utilitarian Algorithm Configuration
Graham, Devon, Velez, Eros Rojas, Leyton-Brown, Kevin
Utilitarian algorithm configuration identifies a parameter setting for a given algorithm that maximizes a user's utility. Utility functions offer a theoretically well-grounded approach to optimizing decision-making under uncertainty and are flexible enough to capture a user's preferences over algorithm runtimes (e.g., they can describe a sharp cutoff after which a solution is no longer required, a per-hour cost for compute, or diminishing returns from algorithms that take longer to run). COUP is a recently-introduced utilitarian algorithm configuration procedure which was designed mainly to offer strong theoretical guarantees about the quality of the configuration it returns, with less attention paid to its practical performance. This paper closes that gap, bringing theoretically-grounded, utilitarian algorithm configuration to the point where it is competitive with widely used, heuristic configuration procedures that offer no performance guarantees. We present a series of improvements to COUP that improve its empirical performance without degrading its theoretical guarantees and demonstrate their benefit experimentally. Using a case study, we also illustrate ways of exploring the robustness of a given solution to the algorithm selection problem to variations in the utility function.
- North America > Canada > British Columbia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Jordan (0.04)
DeepSITH: Efficient Learning via Decomposition of What and When Across Time Scales
After enough time has elapsed, the events that were presented close in time will gradually blend together, as illustrated in the bottom panel of Figure 1. We used the parameters presented in that work for the experiment with the adding problem used here. Table 1: Parameter values used for LSTM networks. Table 2: Parameter values used for LMU networks. "Coupled Oscillatory Recurrent Neural Network (coRNN): An Accurate and (Gradient) Stable Architecture for Learning Long Time Dependencies."
- North America > United States > Virginia (0.06)
- North America > United States > Indiana (0.05)
A Appendix
We show different kinds of perturbations in our benchmarks in Fig.5. We show more image samples of unseen perturbations in Figure 1. We use "snow", "frost", "fog" (left to We also show the comparison when choosing different basis perturbations in Table. 5. The final set we used are better than other sets (e.g., V channel only, or V channel + B channel + Blur) on Notice we don't compare them on single perturbation and The resulting dataset contains approximately 10,000 images. All of them are then randomly split into training/validation/test data with an approximate ratio 20:1:2. There are several good autonomous driving datasets, but not all of them are suitable for the end-to-end learning to steer task.