Collaborating Authors


Objective Bayesian Nets for Integrating Consistent Datasets

Journal of Artificial Intelligence Research

This paper addresses a data integration problem: given several mutually consistent datasets each of which measures a subset of the variables of interest, how can one construct a probabilistic model that fits the data and gives reasonable answers to questions which are under-determined by the data? Here we show how to obtain a Bayesian network model which represents the unique probability function that agrees with the probability distributions measured by the datasets and otherwise has maximum entropy. We provide a general algorithm, OBN-cDS, which offers substantial efficiency savings over the standard brute-force approach to determining the maximum entropy probability function. Furthermore, we develop modifications to the general algorithm which enable further efficiency savings but which are only applicable in particular situations. We show that there are circumstances in which one can obtain the model (i) directly from the data; (ii) by solving algebraic problems; and (iii) by solving relatively simple independent optimisation problems.

Adaptive Greedy versus Non-adaptive Greedy for Influence Maximization

Journal of Artificial Intelligence Research

We consider the adaptive influence maximization problem: given a network and a budget k, iteratively select k seeds in the network to maximize the expected number of adopters. In the full-adoption feedback model, after selecting each seed, the seed-picker observes all the resulting adoptions. In the myopic feedback model, the seed-picker only observes whether each neighbor of the chosen seed adopts. Motivated by the extreme success of greedy-based algorithms/heuristics for influence maximization, we propose the concept of greedy adaptivity gap, which compares the performance of the adaptive greedy algorithm to its non-adaptive counterpart. Our first result shows that, for submodular influence maximization, the adaptive greedy algorithm can perform up to a (1 − 1/e)-fraction worse than the non-adaptive greedy algorithm, and that this ratio is tight. More specifically, on one side we provide examples where the performance of the adaptive greedy algorithm is only a (1−1/e) fraction of the performance of the non-adaptive greedy algorithm in four settings: for both feedback models and both the independent cascade model and the linear threshold model. On the other side, we prove that in any submodular cascade, the adaptive greedy algorithm always outputs a (1 − 1/e)-approximation to the expected number of adoptions in the optimal non-adaptive seed choice. Our second result shows that, for the general submodular diffusion model with full-adoption feedback, the adaptive greedy algorithm can outperform the non-adaptive greedy algorithm by an unbounded factor. Finally, we propose a risk-free variant of the adaptive greedy algorithm that always performs no worse than the non-adaptive greedy algorithm.

Detect social media fake news using graph machine learning with Amazon Neptune ML


In recent years, social media has become a common means for sharing and consuming news. However, the spread of misinformation and fake news on these platforms has posed a major challenge to the well-being of individuals and societies. Therefore, it is imperative that we develop robust and automated solutions for early detection of fake news on social media. Traditional approaches rely purely on the news content (using natural language processing) to mark information as real or fake. However, the social context in which the news is published and shared can provide additional insights into the nature of fake news on social media and improve the predictive capabilities of fake news detection tools.

Could Big Data Apps Prevent the Next Pandemic?


For programmers, algorithms and data structures are their most essential subjects--a programmer's bread and butter if you will. If you want to enter the field of programming and hit the ground running, you'll need to master the most common data structures and boost your resume with in-demand skills. Here, we'll explore the eight most important data structures every programmer should know, including what they do and where to use them. To start, let's gain a fundamental understanding of what a data structure is. Data structures are methods of storing and organizing data in a computer system so that operations can be performed upon them more efficiently.

A Group Effort

Communications of the ACM

Fifty years ago, mathematician Paul Erds posed a problem to friends at one of his regular tea parties. The trio thought they would be able to come up with a solution the same afternoon. It took 49 years for other mathematicians to provide an answer. The Erds-Faber-Lovász conjecture focused on a familiar question in mathematics, one of graph coloring. However, this was not on a conventional graph, but on another more-complex structure: a hypergraph.

Geometry of the Minimum Volume Confidence Sets Machine Learning

Computation of confidence sets is central to data science and machine learning, serving as the workhorse of A/B testing and underpinning the operation and analysis of reinforcement learning algorithms. This paper studies the geometry of the minimum-volume confidence sets for the multinomial parameter. When used in place of more standard confidence sets and intervals based on bounds and asymptotic approximation, learning algorithms can exhibit improved sample complexity. Prior work showed the minimum-volume confidence sets are the level-sets of a discontinuous function defined by an exact p-value. While the confidence sets are optimal in that they have minimum average volume, computation of membership of a single point in the set is challenging for problems of modest size. Since the confidence sets are level-sets of discontinuous functions, little is apparent about their geometry. This paper studies the geometry of the minimum volume confidence sets by enumerating and covering the continuous regions of the exact p-value function. This addresses a fundamental question in A/B testing: given two multinomial outcomes, how can one determine if their corresponding minimum volume confidence sets are disjoint? We answer this question in a restricted setting.

Efficient Spatial Representation and Routing of Deformable One-Dimensional Objects for Manipulation Artificial Intelligence

With the field of rigid-body robotics having matured in the last fifty years, routing, planning, and manipulation of deformable objects have emerged in recent years as a more untouched research area in many fields ranging from surgical robotics to industrial assembly and construction. Routing approaches for deformable objects which rely on learned implicit spatial representations (e.g., Learning-from-Demonstration methods) make them vulnerable to changes in the environment and the specific setup. On the other hand, algorithms that entirely separate the spatial representation of the deformable object from the routing and manipulation, often using a representation approach independent of planning, result in slow planning in high dimensional space. This paper proposes a novel approach to spatial representation combined with route planning that allows efficient routing of deformable one-dimensional objects (e.g., wires, cables, ropes, threads). The spatial representation is based on the geometrical decomposition of the space into convex subspaces, which allows an efficient coding of the configuration. Having such a configuration, the routing problem can be solved using a dynamic programming matching method with a quadratic time and space complexity. The proposed method couples the routing and efficient configuration for improved planning time. Our tests and experiments show the method correctly computing the next manipulation action in sub-millisecond time and accomplishing various routing and manipulation tasks.

Migrating Techniques from Search-based Multi-Agent Path Finding Solvers to SAT-based Approach

Journal of Artificial Intelligence Research

In the multi-agent path finding problem (MAPF) we are given a set of agents each with respective start and goal positions. The task is to find paths for all agents while avoiding collisions, aiming to minimize a given objective function. Many MAPF solvers were introduced in the past decade for optimizing two specific objective functions: sum-of-costs and makespan. Two prominent categories of solvers can be distinguished: search-based solvers and compilation-based solvers. Search-based solvers were developed and tested for the sum-of-costs objective, while the most prominent compilation-based solvers that are built around Boolean satisfiability (SAT) were designed for the makespan objective. Very little is known on the performance and relevance of solvers from the compilation-based approach on the sum-of-costs objective. In this paper, we start to close the gap between these cost functions in the compilation-based approach. Our main contribution is a new SAT-based MAPF solver called MDD-SAT, that is directly aimed to optimally solve the MAPF problem under the sum-of-costs objective function. Using both a lower bound on the sum-of-costs and an upper bound on the makespan, MDD-SAT is able to generate a reasonable number of Boolean variables in our SAT encoding. We then further improve the encoding by borrowing ideas from ICTS, a search-based solver. In addition, we show that concepts applicable in search-based solvers like ICTS and ICBS are applicable in the SAT-based approach as well. Specifically, we integrate independence detection, a generic technique for decomposing an MAPF instance into independent subproblems, into our SAT-based approach, and we design a relaxation of our optimal SAT-based solver that results in a bounded suboptimal SAT-based solver. Experimental evaluation on several domains shows that there are many scenarios where our SAT-based methods outperform state-of-the-art sum-of-costs search-based solvers, such as variants of the ICTS and ICBS algorithms.


AAAI Conferences

A clique in a graph is a set of vertices, each of which is adjacent to every other vertex in this set. A k-clique relaxes this requirement, requiring vertices to be within a distance k of each other, rather than directly adjacent. In theory, a maximum clique algorithm can easily be adapted to solve the maximum k-clique problem, although large sparse k-clique graphs reduce to large dense clique graphs, which can be computationally challenging. We adapt a state of the art maximum clique algorithm to show that this reduction is in fact useful in practice, and introduce a lazy global domination rule which sometimes vastly reduces the search space. We include experimental results for a range of real-world and benchmark graphs, and a detailed look at random graphs. We also use thread-parallel search to solve some harder instances.


AAAI Conferences

Search using subgoal graphs is a recent preprocessing-based path-planning algorithm that can find shortest paths on 8-neighbor grids several orders of magnitude faster than A*, while requiring little preprocessing time and memory overhead. In this paper, we first generalize the ideas behind subgoal graphs to a framework that can be specialized to different types of environments (represented as weighted directed graphs) through the choice of a reachability relation. Intuitively, a reachability relation identifies pairs of vertices for which a shortest path can be found quickly. A subgoal graph can then be constructed as an overlay graph that is guaranteed to have edges only between vertices that satisfy the reachability relation, which allows one to find shortest paths on the original graph quickly. In the context of this general framework, subgoal graphs on grids use freespace-reachability (originally called h-reachability) as the reachability relation, which holds for pairs of vertices if and only if their distance on the grid with blocked cells is equal to their distance on the grid without blocked cells (freespace assumption). We apply this framework to state lattices by using variants of freespace-reachability as the reachability relation. We provide preliminary results on (x,y,theta)-state lattices, which shows that subgoal graphs can be used to speed up path planning on state lattices as well, although the speed-up is not as significant as it is on grids.