gower
DeepMind AI gets silver medal at International Mathematical Olympiad
DeepMind's AlphaProof AI can tackle a range of mathematical problems An AI from Google DeepMind has achieved a silver medal score at this year's International Mathematical Olympiad (IMO), the first time any AI has made it to the podium. The IMO is considered the world's most prestigious competition for young mathematicians. Correctly answering its test questions requires mathematical ability that AI systems typically lack. In January, Google DeepMind demonstrated AlphaGeometry, an AI system that could answer some IMO geometry questions as well as humans. However, this was not from a live competition, and it couldn't answer questions from other mathematical disciplines, such as number theory, algebra and combinatorics, which is necessary to win an IMO medal.
Gower's similarity coefficients with automatic weight selection
Nearest-neighbor methods have become popular in statistics and play a key role in statistical learning. Important decisions in nearest-neighbor methods concern the variables to use (when many potential candidates exist) and how to measure the dissimilarity between units. The first decision depends on the scope of the application while second depends mainly on the type of variables. Unfortunately, relatively few options permit to handle mixed-type variables, a situation frequently encountered in practical applications. The most popular dissimilarity for mixed-type variables is derived as the complement to one of the Gower's similarity coefficient. It is appealing because ranges between 0 and 1, being an average of the scaled dissimilarities calculated variable by variable, handles missing values and allows for a user-defined weighting scheme when averaging dissimilarities. The discussion on the weighting schemes is sometimes misleading since it often ignores that the unweighted "standard" setting hides an unbalanced contribution of the single variables to the overall dissimilarity. We address this drawback following the recent idea of introducing a weighting scheme that minimizes the differences in the correlation between each contributing dissimilarity and the resulting weighted Gower's dissimilarity. In particular, this note proposes different approaches for measuring the correlation depending on the type of variables. The performances of the proposed approaches are evaluated in simulation studies related to classification and imputation of missing values.
A Nystr\"om method with missing distances
Lichtenberg, Samuel, Tasissa, Abiy
We study the problem of determining the configuration of $n$ points, referred to as mobile nodes, by utilizing pairwise distances to $m$ fixed points known as anchor nodes. In the standard setting, we have information about the distances between anchors (anchor-anchor) and between anchors and mobile nodes (anchor-mobile), but the distances between mobile nodes (mobile-mobile) are not known. For this setup, the Nystr\"om method is a viable technique for estimating the positions of the mobile nodes. This study focuses on the setting where the anchor-mobile block of the distance matrix contains only partial distance information. First, we establish a relationship between the columns of the anchor-mobile block in the distance matrix and the columns of the corresponding block in the Gram matrix via a graph Laplacian. Exploiting this connection, we introduce a novel sampling model that frames the position estimation problem as low-rank recovery of an inner product matrix, given a subset of its expansion coefficients in a special non-orthogonal basis. This basis and its dual basis--the central elements of our model--are explicitly derived. Our analysis is grounded in a specific centering of the points that is unique to the Nystr\"om method. With this in mind, we extend previous work in Euclidean distance geometry by providing a general dual basis approach for points centered anywhere.
Distances with mixed type variables some modified Gower's coefficients
Nearest neighbor methods have become popular in official statistics, mainly in imputation or in statistical matching problems; they play a key role in machine learning too, where a high number of variants have been proposed. The choice of the distance function depends mainly on the type of the selected variables. Unfortunately, relatively few options permit to handle mixed type variables, a situation frequently encountered in official statistics. The most popular distance for mixed type variables is derived as the complement of the Gower's similarity coefficient; it is appealing because ranges between 0 and 1 and allows to handle missing values. Unfortunately, the unweighted standard setting the contribution of the single variables to the overall Gower's distance is unbalanced because of the different nature of the variables themselves. This article tries to address the main drawbacks that affect the overall unweighted Gower's distance by suggesting some modifications in calculating the distance on the interval and ratio scaled variables. Simple modifications try to attenuate the impact of outliers on the scaled Manhattan distance; other modifications, relying on the kernel density estimation methods attempt to reduce the unbalanced contribution of the different types of variables. The performance of the proposals is evaluated in simulations mimicking the imputation of missing values through nearest neighbor distance hotdeck method.
On the Existence of Kernel Function for Kernel-Trick of k-Means
This paper corrects the proof of the Theorem 2 from the Gower's paper \cite[page 5]{Gower:1982}. The correction is needed in order to establish the existence of the kernel function used commonly in the kernel trick e.g. for $k$-means clustering algorithm, on the grounds of distance matrix. The scope of correction is explained in section 2.
A Simple Proof From the Pattern-Matching Card Game Set Stuns Mathematicians
In a series of papers posted online in recent weeks, mathematicians have solved a problem about the pattern-matching card game Set that predates the game itself. The solution, whose simplicity has stunned mathematicians, is already leading to advances in other combinatorics problems. Invented in 1974, Set has a simple goal: to find special triples called "sets" within a deck of 81 cards. Each card displays a different design with four attributes--color (which can be red, purple or green), shape (oval, diamond or squiggle), shading (solid, striped or outlined) and number (one, two or three copies of the shape). In typical play, 12 cards are placed face-up and the players search for a set: three cards whose designs, for each attribute, are either all the same or all different.