Subspace Perspective on Canonical Correlation Analysis: Dimension Reduction and Minimax Rates

arXiv.org Machine Learning

Canonical correlation analysis (CCA) is a fundamental statistical tool for exploring the correlation structure between two sets of random variables. In this paper, motivated by the recent success of applying CCA to learn low dimensional representations of high dimensional objects, we propose to quantify the estimation loss of CCA by the excess prediction loss defined through a prediction-after-dimension-reduction framework. Such a framework suggests viewing CCA estimation as estimating the subspaces spanned by the canonical variates. Interestingly, the proposed error metrics derived from the excess prediction loss turn out to be closely related to the principal angles between the subspaces spanned by the population and sample canonical variates respectively. We characterize the non-asymptotic minimax rates under the proposed metrics, especially the dependency of the minimax rates on the key quantities including the dimensions, the condition number of the covariance matrices, the canonical correlations and the eigen-gap, with minimal assumptions on the joint covariance matrix. To the best of our knowledge, this is the first finite sample result that captures the effect of the canonical correlations on the minimax rates.


Artificial Intelligence Nanodegree: Game Playing Agent

#artificialintelligence

The purpose of this project is to design and implement a game playing agent to play a game using adversarial search methods. The goal is to create a game playing agent that consistently defeats our opponent at a game of Isolation. For this project I've designed three heuristics to create an effective edge for the game playing agent. Don't know what Isolation is? Check out this page: isolation.secdevops.ai and compete against my agent. In the game of Isolation, there are two players competing against each other.


Graph Summarization Methods and Applications: A Survey

arXiv.org Artificial Intelligence

While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Efficient computational methods for condensing and simplifying data are thus becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing interconnected data, or graphs, become popular. This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. We first broach the motivation behind, and the challenges of, graph summarization. We then categorize summarization approaches by the type of graphs taken as input and further organize each category by core methodology. Finally, we discuss applications of summarization on real-world graphs and conclude by describing some open problems in the field.


Estimating the Number of Connected Components in a Graph via Subgraph Sampling

arXiv.org Machine Learning

Learning properties of large graphs from samples has been an important problem in statistical network analysis since the early work of Goodman \cite{Goodman1949} and Frank \cite{Frank1978}. We revisit a problem formulated by Frank \cite{Frank1978} of estimating the number of connected components in a large graph based on the subgraph sampling model, in which we randomly sample a subset of the vertices and observe the induced subgraph. The key question is whether accurate estimation is achievable in the sublinear regime where only a vanishing fraction of the vertices are sampled. We show that it is impossible if the parent graph is allowed to contain high-degree vertices or long induced cycles. For the class of chordal graphs, where induced cycles of length four or above are forbidden, we characterize the optimal sample complexity within constant factors and construct linear-time estimators that provably achieve these bounds. This significantly expands the scope of previous results which have focused on unbiased estimators and special classes of graphs such as forests or cliques. Both the construction and the analysis of the proposed methodology rely on combinatorial properties of chordal graphs and identities of induced subgraph counts. They, in turn, also play a key role in proving minimax lower bounds based on construction of random instances of graphs with matching structures of small subgraphs.
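
The subgraph sampling model described in this abstract can be illustrated with a minimal Python sketch. It assumes each vertex is kept independently with probability p (one possible reading of "randomly sample a subset of the vertices"); the helper names `sample_induced_subgraph` and `count_components` and the toy graph are hypothetical. The sketch only shows what is observed under the model. It is not the estimator constructed in the paper, and the raw component count of the sample is generally not an accurate estimate of the parent graph's count.

```python
import random
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # adjacency sets, undirected


def sample_induced_subgraph(graph: Graph, p: float, rng: random.Random) -> Graph:
    """Keep each vertex independently with probability p (an assumption)
    and return the subgraph induced on the kept vertices."""
    kept = {v for v in graph if rng.random() < p}
    # An edge survives only if both of its endpoints were kept.
    return {v: graph[v] & kept for v in kept}


def count_components(graph: Graph) -> int:
    """Count connected components with an iterative depth-first search."""
    seen: Set[Hashable] = set()
    components = 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components


# Toy parent graph: a path 0-1-2, a triangle 3-4-5, and an isolated vertex 6.
parent: Graph = {
    0: {1}, 1: {0, 2}, 2: {1},
    3: {4, 5}, 4: {3, 5}, 5: {3, 4},
    6: set(),
}

sample = sample_induced_subgraph(parent, p=0.5, rng=random.Random(0))
observed_components = count_components(sample)
```

Sampling can split a component (dropping vertex 1 separates 0 from 2) or miss one entirely, which is why the naive count from the sample needs correction.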


Toward Metric Indexes for Incremental Insertion and Querying

arXiv.org Machine Learning

In this work we explore the use of metric index structures, which accelerate nearest neighbor queries, in the scenario where we need to interleave insertions and queries during deployment. This use-case is inspired by a real-life need in malware analysis triage, and is surprisingly understudied. Existing literature tends to either focus on only final query efficiency, often does not support incremental insertion, or does not support arbitrary distance metrics. We modify and improve three algorithms to support our scenario of incremental insertion and querying with arbitrary metrics, and evaluate them on multiple datasets and distance metrics while varying the value of $k$ for the desired number of nearest neighbors. In doing so we determine that our improved Vantage-Point tree of Minimum-Variance performs best for this scenario.


Search Algorithms in Artificial Intelligence - Hacker Noon

#artificialintelligence

There can be one or many solutions to a given problem, depending on the scenario, just as there can be many ways to solve that problem. Think about how you approach a problem. Let's say you need to do something straightforward, like a multiplication. Clearly there is one correct solution, but there are many algorithms to multiply, depending on the size of the input. Now, take a more complicated problem, like playing a game (imagine your favorite game: chess, poker, Call of Duty, DOTA, anything).


Introduction to Monte Carlo Tree Search - Jeff Bradberry

#artificialintelligence

The subject of game AI generally begins with so-called perfect information games. These are turn-based games where the players have no information hidden from each other and there is no element of chance in the game mechanics (such as by rolling dice or drawing cards from a shuffled deck). Tic Tac Toe, Connect 4, Checkers, Reversi, Chess, and Go are all games of this type. Because everything in this type of game is fully determined, a tree can, in theory, be constructed that contains all possible outcomes, and a value assigned corresponding to a win or a loss for one of the players. Finding the best possible play, then, is a matter of doing a search on the tree, with the method of choice at each level alternating between picking the maximum value and picking the minimum value, matching the different players' conflicting goals, as the search proceeds down the tree.
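
The alternating max/min search described above is commonly called minimax. A minimal Python sketch follows, assuming the game tree is already given as nested lists whose integer leaves are outcomes from the first player's point of view; the `Tree` type and the toy tree are illustrative assumptions, not taken from the article.

```python
from typing import List, Union

# A tree is either a leaf value or a list of child subtrees (assumed encoding).
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, maximizing: bool = True) -> int:
    """Return the value of `node` when both players play optimally."""
    if isinstance(node, int):
        return node  # leaf: the outcome is already known
    # The player to move alternates at each level of the tree.
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)


# Toy tree: the root is a maximizing node with three minimizing children.
# Leaves: +1 = win, 0 = draw, -1 = loss for the maximizing player.
toy_tree: Tree = [[1, -1], [0, 1], [-1, -1]]
best_value = minimax(toy_tree)
```

Here the three minimizing children evaluate to -1, 0, and -1, so the maximizing player can guarantee a draw at best. For real games the full tree is far too large to build, which is why the exhaustive tree is only possible in theory.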


Friends Make Tactile Rubik's Cube for Visually Impaired

U.S. News

The two bought a generic cube puzzle, since it was looser and would slide easier. They then placed different textured items on each side. One side was left smooth and the other had plastic squares. Another side had scratchy Velcro and the opposite had soft Velcro. The final two sides had squishy craft dots and hard plastic dots.


Gaussian Process bandits with adaptive discretization

arXiv.org Machine Learning

In this paper, the problem of maximizing a black-box function $f:\mathcal{X} \to \mathbb{R}$ is studied in the Bayesian framework with a Gaussian Process (GP) prior. In particular, a new algorithm for this problem is proposed, and high probability bounds on its simple and cumulative regret are established. The query point selection rule in most existing methods involves an exhaustive search over an increasingly fine sequence of uniform discretizations of $\mathcal{X}$. The proposed algorithm, in contrast, adaptively refines $\mathcal{X}$, which leads to a lower computational complexity, particularly when $\mathcal{X}$ is a subset of a high dimensional Euclidean space. In addition to the computational gains, sufficient conditions are identified under which the regret bounds of the new algorithm improve upon the known results. Finally, an extension of the algorithm to the case of contextual bandits is proposed, and high probability bounds on the contextual regret are presented.


Relaxation heuristics for the set multicover problem with generalized upper bound constraints

arXiv.org Artificial Intelligence

We consider an extension of the set covering problem (SCP) introducing (i) multicover and (ii) generalized upper bound (GUB) constraints. For the conventional SCP, the pricing method has been introduced to reduce the size of instances, and several efficient heuristic algorithms based on such reduction techniques have been developed to solve large-scale instances. However, GUB constraints often make the pricing method less effective, because they often prevent solutions from containing highly evaluated variables together. To overcome this problem, we develop heuristic algorithms to reduce the size of instances, in which new evaluation schemes of variables are introduced taking account of GUB constraints. We also develop an efficient implementation of a 2-flip neighborhood local search algorithm that reduces the number of candidates in the neighborhood without sacrificing the solution quality. In order to guide the search to visit a wide variety of good solutions, we also introduce a path relinking method that generates new solutions by combining two or more solutions obtained so far. According to computational comparison on benchmark instances, the proposed method succeeds in selecting a small number of promising variables properly and performs quite effectively even for large-scale instances having hard GUB constraints.