He, Xi
Provably optimal decision trees with arbitrary splitting rules in polynomial time
He, Xi, Little, Max A.
In this paper, we introduce a generic data structure called decision trees, which integrates several well-known data structures, including binary search trees, K-D trees, binary space partition trees, and decision tree models from machine learning. We provide the first axiomatic definition of decision trees. These axioms establish a firm mathematical foundation for studying decision tree problems. We refer to decision trees that satisfy the axioms as proper decision trees. We prove that only proper decision trees can be uniquely characterized as K-permutations. Since permutations are among the most well-studied combinatorial structures, this characterization provides a fundamental basis for analyzing the combinatorial and algorithmic properties of decision trees. As a result of this advancement, we develop the first provably correct polynomial-time algorithm for solving the optimal decision tree problem. Our algorithm is derived using a formal program derivation framework, which enables step-by-step equational reasoning to construct side-effect-free programs with guaranteed correctness. The derived algorithm is correct by construction and is applicable to decision tree problems defined by any splitting rules that adhere to the axioms and any objective functions that can be specified in a given form. Examples include decision tree problems in which the splitting rules are defined by axis-parallel hyperplanes, arbitrary hyperplanes, or hypersurfaces. By extending the axioms, we can potentially address a broader range of problems. Moreover, the derived algorithm can easily accommodate various constraints, such as tree depth and leaf size, and is amenable to acceleration techniques such as the thinning method.
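To make the objective concrete, here is a minimal Python sketch for the axis-parallel case. It is emphatically not the paper's derived polynomial-time algorithm: this naive recursion is exponential in the depth bound and exists only to pin down what "optimal" means here; all names are our own.

```python
# Hypothetical sketch (not the paper's derived algorithm): brute-force search
# for a depth-limited, axis-parallel decision tree minimizing 0-1 error.
import numpy as np

def best_tree(X, y, depth):
    """Return (misclassification count, tree) for the best axis-parallel tree."""
    majority = np.bincount(y).argmax()
    leaf_err = int((y != majority).sum())
    best = (leaf_err, ("leaf", majority))            # baseline: a single leaf
    if depth == 0 or leaf_err == 0:
        return best
    for j in range(X.shape[1]):                      # every feature...
        for t in np.unique(X[:, j])[:-1]:            # ...and every threshold
            mask = X[:, j] <= t                      # both sides are non-empty
            le, lt = best_tree(X[mask], y[mask], depth - 1)
            re, rt = best_tree(X[~mask], y[~mask], depth - 1)
            if le + re < best[0]:
                best = (le + re, ("split", j, t, lt, rt))
    return best
```

Calling `best_tree(X, y, depth=2)` returns the minimum achievable misclassification count together with a nested-tuple encoding of the tree.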
ClavaDDPM: Multi-relational Data Synthesis with Cluster-guided Diffusion Models
Pang, Wei, Shafieinejad, Masoumeh, Liu, Lucy, He, Xi
Recent research in tabular data synthesis has focused on single tables, whereas real-world applications often involve complex data with tens or hundreds of interconnected tables. Previous approaches to synthesizing multi-relational (multi-table) data fall short in two key aspects: scalability to larger datasets and capturing long-range dependencies, such as correlations between attributes spread across different tables. Inspired by the success of diffusion models in tabular data modeling, we introduce Cluster Latent Variable guided Denoising Diffusion Probabilistic Models (ClavaDDPM). This novel approach leverages clustering labels as intermediaries to model relationships between tables, specifically focusing on foreign key constraints. ClavaDDPM leverages the robust generation capabilities of diffusion models while incorporating efficient algorithms to propagate the learned latent variables across tables. This enables ClavaDDPM to capture long-range dependencies effectively. Extensive evaluations on multi-table datasets of varying sizes show that ClavaDDPM significantly outperforms existing methods in capturing these long-range dependencies while remaining competitive on utility metrics for single-table data.
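The cluster-as-intermediary idea can be sketched independently of the diffusion machinery. The snippet below is a hypothetical illustration (our own names, not ClavaDDPM's API): child-table rows are clustered jointly with their parents' attributes, so the resulting label can act as the latent variable coupling per-table generators.

```python
# Hedged sketch of the cluster-latent-variable linking idea only; the diffusion
# models themselves are out of scope here. Table/column names are hypothetical.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_link_clusters(parent_rows, child_rows, child_fk, n_clusters=8):
    """Assign each child row a cluster label informed by its parent's features.

    parent_rows: (P, d_p) array, child_rows: (N, d_c) array,
    child_fk: (N,) array of parent indices (the foreign key).
    """
    joined = np.hstack([child_rows, parent_rows[child_fk]])   # join via foreign key
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(joined)
    labels = gmm.predict(joined)
    # Per-table generators can now condition on `labels`; propagating labels
    # from parent to child is what carries the long-range dependencies.
    return labels, gmm
```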
EKM: An exact, polynomial-time algorithm for the $K$-medoids problem
He, Xi, Little, Max A.
The $K$-medoids problem is a challenging combinatorial clustering task, widely used in data analysis applications. While numerous algorithms have been proposed to solve this problem, none of them obtains an exact (globally optimal) solution in polynomial time. In this paper, we present EKM: a novel algorithm for solving this problem exactly with worst-case $O\left(N^{K+1}\right)$ time complexity. EKM is developed according to recent advances in transformational programming and combinatorial generation, using formal program derivation steps. The derived algorithm is provably correct by construction. We demonstrate the effectiveness of our algorithm by comparing it against various approximate methods on numerous real-world datasets. We show that the wall-clock run time of our algorithm matches the worst-case time complexity analysis on synthetic datasets, clearly outperforming the exponential time complexity of benchmark branch-and-bound based MIP solvers. To our knowledge, this is the first practical algorithm with a rigorously proven polynomial run time for this ubiquitous problem.
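For orientation, the naive exact baseline below enumerates all $\binom{N}{K}$ candidate medoid sets, which is where the $N^{K+1}$-flavoured cost profile originates (this brute force costs roughly $O(K N^{K+1})$); EKM's formal derivation is not reproduced here, and the names are our own.

```python
# Naive exact K-medoids by exhaustive enumeration; illustrative only,
# not the EKM derivation from the paper.
from itertools import combinations
import numpy as np

def exact_kmedoids(D, K):
    """D: (N, N) pairwise distance matrix. Returns (best cost, medoid indices)."""
    N = D.shape[0]
    best_cost, best_medoids = float("inf"), None
    for medoids in combinations(range(N), K):        # C(N, K) = O(N^K) subsets
        cost = D[:, list(medoids)].min(axis=1).sum() # assign each point, O(NK)
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    return best_cost, best_medoids
```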
Dynamic programming by polymorphic semiring algebraic shortcut fusion
Little, Max A., He, Xi, Kayas, Ugur
Dynamic programming (DP) is an algorithmic design paradigm for the efficient, exact solution of otherwise intractable combinatorial problems. However, DP algorithm design is often presented in an ad-hoc manner, and it is sometimes difficult to justify algorithm correctness. To address this issue, this paper presents a rigorous algebraic formalism for systematically deriving DP algorithms, based on semiring polymorphism. We start with a specification and construct an algorithm to compute the required solution which is self-evidently correct, because it exhaustively generates and evaluates all possible solutions meeting the specification. We then derive, through the use of shortcut fusion, an implementation of this algorithm which is both efficient and correct. We also demonstrate how, with the use of semiring lifting, the specification can be augmented with combinatorial constraints, and show how these constraints can be fused with the algorithm. We furthermore demonstrate how existing DP algorithms for a given combinatorial problem can be abstracted from their original context and re-purposed. This approach can be applied to the full scope of combinatorial problems expressible in terms of semirings, including, for example: optimal probability and Viterbi decoding, probabilistic marginalization, logical inference, fuzzy sets, differentiable softmax, and relational and provenance queries. The approach, building on ideas from the existing literature on constructive algorithmics, exploits generic properties of polymorphic functions, tupling, formal sums, and algebraic simplifications arising from constraint algebras. We demonstrate the effectiveness of this formalism on example applications arising in signal processing, bioinformatics, and reliability engineering. Python software implementing these algorithms can be downloaded from: http://www.maxlittle.net/software/dppolyalg.zip.
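As a toy illustration of semiring polymorphism (in the spirit of the paper, though not its formal derivation), the same DP recurrence below is evaluated under two semirings: one counts monotone lattice paths, the other computes a cheapest path. Names and the example problem are our own.

```python
# One DP recurrence, two semirings: (plus, times, zero, one).
MIN_PLUS = (min, lambda a, b: a + b, float("inf"), 0)       # optimization semiring
COUNT    = (lambda a, b: a + b, lambda a, b: a * b, 0, 1)   # counting semiring

def grid_paths(n, m, step_weight, semiring):
    """DP over monotone lattice paths from (0, 0) to (n, m)."""
    plus, times, zero, one = semiring
    table = [[zero] * (m + 1) for _ in range(n + 1)]
    table[0][0] = one
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                table[i][j] = plus(table[i][j], times(table[i - 1][j], step_weight))
            if j > 0:
                table[i][j] = plus(table[i][j], times(table[i][j - 1], step_weight))
    return table[n][m]

# Same code, different algebra: 6 paths versus a cheapest-path cost of 4.
print(grid_paths(2, 2, 1, COUNT))     # binomial(4, 2) = 6
print(grid_paths(2, 2, 1, MIN_PLUS))  # (2 + 2) * 1 = 4
```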
An efficient, provably exact, practical algorithm for the 0-1 loss linear classification problem
He, Xi, Rahman, Waheed Ul, Little, Max A.
There has been an increasing trend to leverage machine learning (ML) for high-stakes prediction applications that deeply impact human lives. Many of these ML models are "black boxes" with highly complex, inscrutable functional forms. In high-stakes applications such as healthcare and criminal justice, black box ML predictions have incorrectly denied parole [Wexler, 2017], misclassified highly polluted air as safe to breathe [McGough, 2018], and suggested poor allocation of valuable, limited resources in medicine and energy reliability [Varshney and Alemzadeh, 2017]. In such high-stakes applications of ML, we always want the best possible prediction, and we want to know how the model makes these predictions so that we can be confident the predictions are meaningful [Rudin, 2022]. In short, the ideal model is simple enough to be easily understood (interpretable) and optimally accurate (exact), so that our interpretations of the results are faithful to what the model actually computes. Another compelling reason why simple models are preferable is that such low-complexity models usually provide better statistical generality, in the sense that a classifier fit to some training dataset will also work well on another dataset drawn from the same distribution to which we do not have access (it works well out-of-sample). The VC dimension is a key measure of the complexity of a classification model.
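The paper's algorithm is not reproduced here, but the 0-1 loss objective is easy to make concrete in the trivial one-dimensional case, where the exact minimizer over threshold classifiers can be found by direct enumeration. This is a hypothetical sketch with our own names, not the paper's method.

```python
# Exact 0-1 loss minimization for 1-D threshold ("stump") classifiers;
# illustrative only, not the paper's algorithm for general linear classifiers.
import numpy as np

def exact_01_stump(x, y):
    """x: (N,) features, y: (N,) labels in {-1, +1}. Exact 0-1 loss minimizer."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    # Baseline: the better of the two constant classifiers.
    best = (int(min((ys == 1).sum(), (ys == -1).sum())), None, 1)
    for i in range(len(xs) - 1):
        t = (xs[i] + xs[i + 1]) / 2
        for sign in (+1, -1):                 # which side gets labelled +1
            pred = np.where(xs > t, sign, -sign)
            err = int((pred != ys).sum())
            if err < best[0]:
                best = (err, t, sign)
    return best                               # (errors, threshold, orientation)
```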
The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection
Mohapatra, Shubhankar, Sasy, Sajin, He, Xi, Kamath, Gautam, Thakkar, Om
Hyperparameter optimization is a ubiquitous challenge in machine learning, and the performance of a trained model depends crucially upon effective hyperparameter selection. While a rich set of tools exists for this purpose, there are currently no practical hyperparameter selection methods under the constraint of differential privacy (DP). We study honest hyperparameter selection for differentially private machine learning, in which the process of hyperparameter tuning is accounted for in the overall privacy budget. To this end, we i) show that standard composition tools outperform more advanced techniques in many settings, ii) empirically and theoretically demonstrate an intrinsic connection between the learning rate and clipping norm hyperparameters, iii) show that adaptive optimizers like DPAdam enjoy a significant advantage in the process of honest hyperparameter tuning, and iv) draw upon novel limiting behaviour of Adam in the DP setting to design a new and more efficient optimizer.
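For readers unfamiliar with the moving parts, here is an illustrative DP-Adam-style update in NumPy: per-example gradients clipped to norm $C$, Gaussian noise calibrated to $\sigma C$, then standard Adam moment updates. This is a generic sketch of the mechanism, not the new optimizer designed in the paper.

```python
# Generic DP-Adam-style step; all defaults are illustrative, not the paper's.
import numpy as np

def dp_adam_step(params, per_example_grads, m, v, t, C=1.0, sigma=1.0,
                 lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
                 rng=np.random.default_rng(0)):
    # Clip each per-example gradient to L2 norm at most C.
    clipped = [g / max(1.0, np.linalg.norm(g) / C) for g in per_example_grads]
    g = sum(clipped) / len(clipped)
    # Gaussian noise with std sigma*C on the sum, i.e. sigma*C/B after averaging.
    g = g + rng.normal(0.0, sigma * C / len(clipped), size=g.shape)
    m = b1 * m + (1 - b1) * g                  # first moment
    v = b2 * v + (1 - b2) * g * g              # second moment
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    # The interplay the paper studies: in plain DP-SGD the effective step scales
    # with C, whereas Adam's division by sqrt(v_hat) damps that coupling.
    return params - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```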
Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy
Jahani, Majid, He, Xi, Ma, Chenxin, Mokhtari, Aryan, Mudigere, Dheevatsa, Ribeiro, Alejandro, Takáč, Martin
In this paper, we propose a Distributed Accumulated Newton Conjugate gradiEnt (DANCE) method in which the sample size gradually increases to quickly obtain a solution whose empirical loss is within a satisfactory statistical accuracy. Our proposed method is multistage: the solution of each stage serves as a warm start for the next stage, which contains more samples (including all samples from the previous stage). The multistage algorithm reduces the number of passes over the data needed to achieve the statistical accuracy of the full training set. Moreover, our algorithm is naturally easy to distribute and exhibits the strong scaling property, meaning that acceleration can be expected from using more computing nodes. We analyze iteration complexity results regarding descent direction computation, communication efficiency, and stopping criteria in the convex setting. Our numerical results illustrate that the proposed method outperforms other comparable methods on learning problems, including neural networks.
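A single-node sketch of the accumulating-sample strategy follows; the distributed machinery of DANCE is omitted, `loss` and `grad` are user-supplied callables with signature `f(w, X, y)`, and all names are hypothetical.

```python
# Accumulating-sample Newton-CG sketch: each stage warm-starts from the last,
# on a sample that strictly contains the previous stage's sample.
import numpy as np
from scipy.optimize import minimize

def accumulating_newton_cg(loss, grad, X, y, w0, n0=128):
    w, n = w0, n0
    while n <= len(X):
        Xs, ys = X[:n], y[:n]                 # stage sample (superset of prior)
        res = minimize(loss, w, args=(Xs, ys), jac=grad,
                       method="Newton-CG", options={"xtol": 1e-6})
        w = res.x                             # warm start for the next stage
        if n == len(X):
            break
        n = min(2 * n, len(X))                # grow the sample geometrically
    return w
```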
Distributed Hessian-Free Optimization for Deep Neural Network
He, Xi (Lehigh University) | Mudigere, Dheevatsa (Intel Labs, India) | Smelyanskiy, Mikhail (Intel Labs, SC) | Takáč, Martin (Lehigh University)
Training deep neural networks is a high-dimensional and highly non-convex optimization problem. In this paper, we revisit the Hessian-free optimization method for deep networks with negative curvature direction detection. We also develop its distributed variant and demonstrate superior scaling potential over SGD, which allows larger computing resources to be utilized more efficiently, thus enabling larger models and faster time to a desired solution. We show that these techniques accelerate the training process for both the standard MNIST dataset and the TIMIT speech recognition problem, demonstrating robust performance with up to an order of magnitude larger batch sizes. This increased scaling potential is illustrated with near-linear speed-up on up to 32 CPU nodes for a simple 4-layer network.
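The core of Hessian-free optimization is a conjugate-gradient solve driven by Hessian-vector products, which never forms the Hessian explicitly. The sketch below is a minimal single-machine illustration with our own names; the paper's distributed variant and negative-curvature detection are omitted.

```python
# Hessian-free core: CG solve of H p = -g using Hessian-vector products
# approximated by central differences of the gradient.
import numpy as np

def hvp(grad_fn, w, v, eps=1e-6):
    """Hessian-vector product via finite differences of the gradient."""
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

def hessian_free_step(grad_fn, w, cg_iters=50, tol=1e-8):
    g = grad_fn(w)
    p, r = np.zeros_like(w), -g.copy()        # CG for H p = -g, starting at p = 0
    d = r.copy()
    for _ in range(cg_iters):
        Hd = hvp(grad_fn, w, d)
        alpha = (r @ r) / (d @ Hd)
        p, r_new = p + alpha * d, r - alpha * Hd
        if np.linalg.norm(r_new) < tol:
            break
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return w + p                              # Newton-like update
```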