Collaborating Authors

Encoding fixed length high cardinality non-numeric columns for a ML algorithm


ML algorithms work only with numerical values. So there is a need to model a problem and its data completely in numbers. For example, to run a clustering algorithm on a road network, representing the network / graph as an adjacency matrix is one way to model it. Similarly, a tabular data with a mix of numerical and non-numerical / categorical data also needs to be transformed or encoded to a table of only numbers for a ML algorithm to work on. Columns of string values are quite common in tabular data and in this article, some ideas on how to encode them, especially ones with high cardinality and are of known lengths like IP addresses, mobile numbers etc. are discussed.

Latent Tree Models and Approximate Inference in Bayesian Networks

Journal of Artificial Intelligence Research

We propose a novel method for approximate inference in Bayesian networks (BNs). The idea is to sample data from a BN, learn a latent tree model (LTM) from the data offline, and when online, make inference with the LTM instead of the original BN. Because LTMs are tree-structured, inference takes linear time. In the meantime, they can represent complex relationship among leaf nodes and hence the approximation accuracy is often good. Empirical evidence shows that our method can achieve good approximation accuracy at low online computational cost.

Revisiting Counting Solutions for the Global Cardinality Constraint

Journal of Artificial Intelligence Research

Counting solutions for a combinatorial problem has been identified as an important concern within the Artificial Intelligence field. It is indeed very helpful when exploring the structure of the solution space. In this context, this paper revisits the computation process to count solutions for the global cardinality constraint in the context of counting-based search. It first highlights an error and then presents a way to correct the upper bound on the number of solutions for this constraint.

Fast Exact Inference for Recursive Cardinality Models Machine Learning

Cardinality potentials are a generally useful class of high order potential that affect probabilities based on how many of D binary variables are active. Maximum a posteriori (MAP) inference for cardinality potential models is well-understood, with efficient computations taking O(DlogD) time. Yet efficient marginalization and sampling have not been addressed as thoroughly in the machine learning community. We show that there exists a simple algorithm for computing marginal probabilities and drawing exact joint samples that runs in O(Dlog2 D) time, and we show how to frame the algorithm as efficient belief propagation in a low order tree-structured model that includes additional auxiliary variables. We then develop a new, more general class of models, termed Recursive Cardinality models, which take advantage of this efficiency. Finally, we show how to do efficient exact inference in models composed of a tree structure and a cardinality potential. We explore the expressive power of Recursive Cardinality models and empirically demonstrate their utility.

Maximizing Non-Monotone DR-Submodular Functions with Cardinality Constraints Artificial Intelligence

We consider the problem of maximizing a non-monotone DR-submodular function subject to a cardinality constraint. Diminishing returns (DR) submodularity is a generalization of the diminishing returns property for functions defined over the integer lattice. This generalization can be used to solve many machine learning or combinatorial optimization problems such as optimal budget allocation, revenue maximization, etc. In this work we propose the first polynomial-time approximation algorithms for non-monotone constrained maximization. We implement our algorithms for a revenue maximization problem with a real-world dataset to check their efficiency and performance.