Search
ComSearch: Equation Searching with Combinatorial Strategy for Solving Math Word Problems with Weak Supervision
Liu, Qianying, Guan, Wenyu, Shen, Jianhao, Cheng, Fei, Kurohashi, Sadao
Previous studies have introduced a weakly-supervised paradigm for solving math word problems requiring only the answer value annotation. While these methods search for correct value equation candidates as pseudo labels, they search among a narrow sub-space of the enormous equation space. To address this problem, we propose a novel search algorithm with combinatorial strategy \textbf{ComSearch}, which can compress the search space by excluding mathematically equivalent equations. The compression allows the searching algorithm to enumerate all possible equations and obtain high-quality data. We investigate the noise in the pseudo labels that hold wrong mathematical logic, which we refer to as the \textit{false-matching} problem, and propose a ranking model to denoise the pseudo labels. Our approach holds a flexible framework to utilize two existing supervised math word problem solvers to train pseudo labels, and both achieve state-of-the-art performance in the weak supervision task.
Multi-trip algorithm for multi-depot rural postman problem with rechargeable vehicles
Sathyamurthy, Eashwar, Herrmann, Jeffrey W., Azarm, Shapour
This paper presents a new Mixed Integer Linear Programming (MILP) formulation to find optimal solutions to the problem. The paper also proposes a new heuristic called the multi-trip algorithm for the problem whose solutions are compared against solutions of heuristics from literature and the optimal solutions obtained from the MILP formulation by testing them on both benchmark instances and real-world instances generated from road maps. Results show that the proposed heuristic was able to solve all the instances and produce better solutions than heuristics from the literature on 37 of 39 total instances. Due to the high requirement of memory and compute power, the Gurobi optimizer used for solving the MILP formulation, although it produced optimal solutions, was only able to solve benchmark instances but not real-world instances.
Faithfulness-Aware Decoding Strategies for Abstractive Summarization
Wan, David, Liu, Mengwen, McKeown, Kathleen, Dreyer, Markus, Bansal, Mohit
Despite significant progress in understanding and improving faithfulness in abstractive summarization, the question of how decoding strategies affect faithfulness is less studied. We present a systematic study of the effect of generation techniques such as beam search and nucleus sampling on faithfulness in abstractive summarization. We find a consistent trend where beam search with large beam sizes produces the most faithful summaries while nucleus sampling generates the least faithful ones. We propose two faithfulness-aware generation methods to further improve faithfulness over current generation techniques: (1) ranking candidates generated by beam search using automatic faithfulness metrics and (2) incorporating lookahead heuristics that produce a faithfulness score on the future summary. We show that both generation methods significantly improve faithfulness across two datasets as evaluated by four automatic faithfulness metrics and human evaluation. To reduce computational cost, we demonstrate a simple distillation approach that allows the model to generate faithful summaries with just greedy decoding. Our code is publicly available at https://github.com/amazon-science/faithful-summarization-generation
Rolling Horizon based Temporal Decomposition for the Offline Pickup and Delivery Problem with Time Windows
Kim, Youngseo, Edirimanna, Danushka, Wilbur, Michael, Pugliese, Philip, Laszka, Aron, Dubey, Abhishek, Samaranayake, Samitha
The offline pickup and delivery problem with time windows (PDPTW) is a classical combinatorial optimization problem in the transportation community, which has proven to be very challenging computationally. Due to the complexity of the problem, practical problem instances can be solved only via heuristics, which trade-off solution quality for computational tractability. Among the various heuristics, a common strategy is problem decomposition, that is, the reduction of a large-scale problem into a collection of smaller sub-problems, with spatial and temporal decompositions being two natural approaches. While spatial decomposition has been successful in certain settings, effective temporal decomposition has been challenging due to the difficulty of stitching together the sub-problem solutions across the decomposition boundaries. In this work, we introduce a novel temporal decomposition scheme for solving a class of PDPTWs that have narrow time windows, for which it is able to provide both fast and high-quality solutions. We utilize techniques that have been popularized recently in the context of online dial-a-ride problems along with the general idea of rolling horizon optimization. To the best of our knowledge, this is the first attempt to solve offline PDPTWs using such an approach. To show the performance and scalability of our framework, we use the optimization of paratransit services as a motivating example. We compare our results with an offline heuristic algorithm using Google OR-Tools. In smaller problem instances, the baseline approach is as competitive as our framework. However, in larger problem instances, our framework is more scalable and can provide good solutions to problem instances of varying degrees of difficulty, while the baseline algorithm often fails to find a feasible solution within comparable compute times.
Follow The Rules: Online Signal Temporal Logic Tree Search for Guided Imitation Learning in Stochastic Domains
Aloor, Jasmine Jerry, Patrikar, Jay, Kapoor, Parv, Oh, Jean, Scherer, Sebastian
Seamlessly integrating rules in Learning-from-Demonstrations (LfD) policies is a critical requirement to enable the real-world deployment of AI agents. Recently, Signal Temporal Logic (STL) has been shown to be an effective language for encoding rules as spatio-temporal constraints. This work uses Monte Carlo Tree Search (MCTS) as a means of integrating STL specification into a vanilla LfD policy to improve constraint satisfaction. We propose augmenting the MCTS heuristic with STL robustness values to bias the tree search towards branches with higher constraint satisfaction. While the domain-independent method can be applied to integrate STL rules online into any pre-trained LfD algorithm, we choose goal-conditioned Generative Adversarial Imitation Learning as the offline LfD policy. We apply the proposed method to the domain of planning trajectories for General Aviation aircraft around a non-towered airfield. Results using the simulator trained on real-world data showcase 60% improved performance over baseline LfD methods that do not use STL heuristics.
A Formal Metareasoning Model of Concurrent Planning and Execution
Elboher, Amihay, Bensoussan, Ava, Karpas, Erez, Ruml, Wheeler, Shperberg, Shahaf S., Shimony, Solomon E.
Agents that plan and act in the real world must deal with the fact that time passes as they are planning. When timing is tight, there may be insufficient time to complete the search for a plan before it is time to act. By commencing execution before search concludes, one gains time to search by making planning and execution concurrent. However, this incurs the risk of making incorrect action choices, especially if actions are irreversible. This tradeoff between opportunity and risk is the problem addressed in this paper. Our main contribution is to formally define this setting as an abstract metareasoning problem. We find that the abstract problem is intractable. However, we identify special cases that are solvable in polynomial time, develop greedy solution algorithms, and, through tests on instances derived from search problems, find several methods that achieve promising practical performance. This work lays the foundation for a principled time-aware executive that concurrently plans and executes.
Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance
Tang, Chen, Ouyang, Kai, Wang, Zhi, Zhu, Yifei, Wang, Yaowei, Ji, Wen, Zhu, Wenwu
The exponentially large discrete search space in mixed-precision quantization (MPQ) makes it hard to determine the optimal bit-width for each layer. Previous works usually resort to iterative search methods on the training set, which consume hundreds or even thousands of GPU-hours. In this study, we reveal that some unique learnable parameters in quantization, namely the scale factors in the quantizer, can serve as importance indicators of a layer, reflecting the contribution of that layer to the final accuracy at certain bit-widths. These importance indicators naturally perceive the numerical transformation during quantization-aware training, which can precisely provide quantization sensitivity metrics of layers. However, a deep network always contains hundreds of such indicators, and training them one by one would lead to an excessive time cost. To overcome this issue, we propose a joint training scheme that can obtain all indicators at once. It considerably speeds up the indicators training process by parallelizing the original sequential training processes. With these learned importance indicators, we formulate the MPQ search problem as a one-time integer linear programming (ILP) problem. That avoids the iterative search and significantly reduces search time without limiting the bit-width search space. For example, MPQ search on ResNet18 with our indicators takes only 0.06 s, which improves time efficiency exponentially compared to iterative search methods. Also, extensive experiments show our approach can achieve SOTA accuracy on ImageNet for far-ranging models with various constraints (e.g., BitOps, compress rate). Code is available on https://github.com/1hunters/LIMPQ.
Agent-based Collaborative Random Search for Hyper-parameter Tuning and Global Function Optimization
Esmaeili, Ahmad, Ghorrati, Zahra, Matson, Eric T.
Almost all Machine Learning (ML) algorithms comprise a set of hyper-parameters that control their learning process and the quality of their resulting models. The number of hidden units, the learning rate, the minibatch sizes, etc. in neural networks; the kernel parameters and regularization penalty amount in support vector machines; and maximum depth, samples split criteria, and the number of used features in decision trees are few common hyper-parameter examples that need to be configured for the corresponding learning algorithms. Assuming a specific ML algorithm and a dataset, one can build countless number of models, each with potentially different performance and/or learning speeds, by assigning different values to the algorithm's hyper-parameters. While they provide ultimate flexibility in using ML algorithms in different scenarios, they also account for most failures and tedious development procedures. Unsurprisingly, there are numerous studies and practices in the machine learning community devoted to the optimization of hyperparameter. The most straightforward yet difficult approach utilizes expert knowledge to identify potentially better candidates in hyper-parameter search spaces to evaluate and use. The availability of expert knowledge and generating reproducible results are among the primary limitation of such manual searching technique [1], particularly due to the fact that using any learning algorithm on different datasets likely requires different sets of hyper-parameter values [2].
Diffusion Models are Minimax Optimal Distribution Estimators
Oko, Kazusato, Akiyama, Shunta, Suzuki, Taiji
While efficient distribution learning is no doubt behind the groundbreaking success of diffusion modeling, its theoretical guarantees are quite limited. In this paper, we provide the first rigorous analysis on approximation and generalization abilities of diffusion modeling for well-known function spaces. The highlight of this paper is that when the true density function belongs to the Besov space and the empirical score matching loss is properly minimized, the generated data distribution achieves the nearly minimax optimal estimation rates in the total variation distance and in the Wasserstein distance of order one. Furthermore, we extend our theory to demonstrate how diffusion models adapt to low-dimensional data distributions. We expect these results advance theoretical understandings of diffusion modeling and its ability to generate verisimilar outputs.
Conflict-driven Structural Learning Towards Higher Coverage Rate in ATPG
Zhen, Hui-Ling, Wang, Naixing, Huang, Junhua, Huang, Xinyue, Yuan, Mingxuan, Huang, Yu
Due to the increasing challenges posed by the relentless rise in the design complexity of integrated circuits, Boolean Satisfiability (SAT) has emerged as a robust alternative to structural APTG techniques. However, the high cost of transforming a circuit testing problem to a Conjunctive Normal Form (CNF) limits the application of SAT in industrial ATPG scenarios, resulting in a loss of test coverage. In Order to address this problem, this paper proposes a conflict-driven structural learning (CDSL) ATPG algorithm firstly, in which the conflict-driven heuristic methods in modern SAT solver are implemented on the logic cone of fault propagation and activation directly. The proposed CDSL algorithm is composed of three parts: (1) According to the implication graph, various conflict constraints have been learned to prune search space. (2) Conflict-driven implication and justification have been applied to increase decision accuracy and solving efficiency. (3) A conflict-based diagnosis method is further proposed in the case of low coverage debug, leading to making the aborted faults testable by relaxing or modifying some constraints on primary inputs. Extensive experimental results on industrial circuits demonstrate the effectiveness and efficiency of the proposed CDSL algorithm. It is shown that compared with the SAT-based ATPG, the proposed CDSL can on average decrease $25.6\%$ aborted faults with $94.51\%$ less run time. With a two-stage computational flow, it has shown that the proposed CDSL can lead to $46.37\%$ less aborted faults than a one-stage structural algorithm, further with the $3.19\%$ improvement on fault coverage. In addition, the conflict diagnosis can lead to $8.89\%$ less aborted faults on average, and $0.271\%$ improvement in fault coverage rate.