arborescence
3df80af53dce8435cf9ad6c3e7a403fd-Paper.pdf
The Gumbel-Max trick is the basis of many relaxed gradient estimators. These estimators areeasy toimplement and lowvariance, butthegoal ofscaling them comprehensively to large combinatorial distributions is still outstanding. Working within the perturbation model framework, we introduce stochastic softmax tricks, which generalizetheGumbel-Softmax tricktocombinatorial spaces.
Maximizing Value in Challenge the Champ Tournaments
Bhaskar, Umang, Chaudhary, Juhi, Dey, Palash
A tournament is a method to decide the winner in a competition, and describes the overall sequence in which matches between the players are held. While deciding a worthy winner is the primary goal of a tournament, a close second is to maximize the value generated for the matches played, with value for a match measured either in terms of tickets sold, television viewership, advertising revenue, or other means. Tournament organizers often seed the players -- i.e., decide which matches are played -- to increase this value. We study the value maximization objective in a particular tournament format called Challenge the Champ. This is a simple tournament format where an ordering of the players is decided. The first player in this order is the initial champion. The remaining players in order challenge the current champion; if a challenger wins, she replaces the current champion. We model the outcome of a match between two players using a complete directed graph, called a strength graph, with each player represented as a vertex, and the direction of an edge indicating the winner in a match. The value-maximization objective has been recently explored for knockout tournaments when the strength graph is a directed acyclic graph (DAG). We extend the investigation to Challenge the Champ tournaments and general strength graphs. We study different representations of the value of each match, and completely characterize the computational complexity of the problem.
Generalized Naive Bayes
Kovács, Edith Alice, Ország, Anna, Pfeifer, Dániel, Benczúr, András
In this paper we introduce the so-called Generalized Naive Bayes structure as an extension of the Naive Bayes structure. We give a new greedy algorithm that finds a good fitting Generalized Naive Bayes (GNB) probability distribution. We prove that this fits the data at least as well as the probability distribution determined by the classical Naive Bayes (NB). Then, under a not very restrictive condition, we give a second algorithm for which we can prove that it finds the optimal GNB probability distribution, i.e. best fitting structure in the sense of KL divergence. Both algorithms are constructed to maximize the information content and aim to minimize redundancy. Based on these algorithms, new methods for feature selection are introduced. We discuss the similarities and differences to other related algorithms in terms of structure, methodology, and complexity. Experimental results show, that the algorithms introduced outperform the related algorithms in many cases.
Almost Sure Convergence of Dropout Algorithms for Neural Networks
Senen-Cerda, Albert, Sanders, Jaron
We investigate the convergence and convergence rate of stochastic training algorithms for Neural Networks (NNs) that have been inspired by Dropout (Hinton et al., 2012). With the goal of avoiding overfitting during training of NNs, dropout algorithms consist in practice of multiplying the weight matrices of a NN componentwise by independently drawn random matrices with $\{0, 1 \}$-valued entries during each iteration of Stochastic Gradient Descent (SGD). This paper presents a probability theoretical proof that for fully-connected NNs with differentiable, polynomially bounded activation functions, if we project the weights onto a compact set when using a dropout algorithm, then the weights of the NN converge to a unique stationary point of a projected system of Ordinary Differential Equations (ODEs). After this general convergence guarantee, we go on to investigate the convergence rate of dropout. Firstly, we obtain generic sample complexity bounds for finding $\epsilon$-stationary points of smooth nonconvex functions using SGD with dropout that explicitly depend on the dropout probability. Secondly, we obtain an upper bound on the rate of convergence of Gradient Descent (GD) on the limiting ODEs of dropout algorithms for NNs with the shape of arborescences of arbitrary depth and with linear activation functions. The latter bound shows that for an algorithm such as Dropout or Dropconnect (Wan et al., 2013), the convergence rate can be impaired exponentially by the depth of the arborescence. In contrast, we experimentally observe no such dependence for wide NNs with just a few dropout layers. We also provide a heuristic argument for this observation. Our results suggest that there is a change of scale of the effect of the dropout probability in the convergence rate that depends on the relative size of the width of the NN compared to its depth.
On graph-based reentrancy-free semantic parsing
We propose a novel graph-based approach for semantic parsing that resolves two problems observed in the literature: (1) seq2seq models fail on compositional generalization tasks; (2) previous work using phrase structure parsers cannot cover all the semantic parses observed in treebanks. We prove that both MAP inference and latent tag anchoring (required for weakly-supervised learning) are NP-hard problems. We propose two optimization algorithms based on constraint smoothing and conditional gradient to approximately solve these inference problems. Experimentally, our approach delivers state-of-the-art results on Geoquery, Scan and Clevr, both for i.i.d. splits and for splits that test for compositional generalization.
Hierarchies over Vector Space: Orienting Word and Graph Embeddings
Word and graph embeddings are widely used in deep learning applications. We present a data structure that captures inherent hierarchical properties from an unordered flat embedding space, particularly a sense of direction between pairs of entities. Inspired by the notion of \textit{distributional generality}, our algorithm constructs an arborescence (a directed rooted tree) by inserting nodes in descending order of entity power (e.g., word frequency), pointing each entity to the closest more powerful node as its parent. We evaluate the performance of the resulting tree structures on three tasks: hypernym relation discovery, least-common-ancestor (LCA) discovery among words, and Wikipedia page link recovery. We achieve average 8.98\% and 2.70\% for hypernym and LCA discovery across five languages and 62.76\% accuracy on directed Wiki-page link recovery, with both substantially above baselines. Finally, we investigate the effect of insertion order, the power/similarity trade-off and various power sources to optimize parent selection.
Computing Diverse Shortest Paths Efficiently: A Theoretical and Experimental Study
Hanaka, Tesshu, Kobayashi, Yasuaki, Kurita, Kazuhiro, Lee, See Woo, Otachi, Yota
Finding diverse solutions in combinatorial problems recently has received considerable attention (Baste et al. 2020; Fomin et al. 2020; Hanaka et al. 2021). In this paper we study the following type of problems: given an integer $k$, the problem asks for $k$ solutions such that the sum of pairwise (weighted) Hamming distances between these solutions is maximized. Such solutions are called diverse solutions. We present a polynomial-time algorithm for finding diverse shortest $st$-paths in weighted directed graphs. Moreover, we study the diverse version of other classical combinatorial problems such as diverse weighted matroid bases, diverse weighted arborescences, and diverse bipartite matchings. We show that these problems can be solved in polynomial time as well. To evaluate the practical performance of our algorithm for finding diverse shortest $st$-paths, we conduct a computational experiment with synthetic and real-world instances.The experiment shows that our algorithm successfully computes diverse solutions within reasonable computational time.
A Neural Network Perturbation Theory Based on the Born Series
Kaspschak, Bastian, Meißner, Ulf-G.
Deep Learning has become an attractive approach towards various data-based problems of theoretical physics in the past decade. Its protagonists, the deep neural networks (DNNs), are capable of making accurate predictions for data of arbitrarily high complexity. A well-known issue most DNNs share is their lack of interpretability. In order to explain their behavior and extract physical laws they have discovered during training, a suitable interpretation method has, therefore, to be applied post-hoc. Due to its simplicity and ubiquity in quantum physics, we decide to present a rather general interpretation method in the context of two-body scattering: We find a one-to-one correspondence between the $n^\text{th}$-order Born approximation and the $n^\text{th}$-order Taylor approximation of deep multilayer perceptrons (MLPs), that predict S-wave scattering lengths $a_0$ for discretized, attractive potentials of finite range. This defines a perturbation theory for MLPs similarily to Born approximations defining a perturbation theory for $a_0$. In the case of shallow potentials, lower-order approximations, that can be argued to be local interpretations of respective MLPs, reliably reproduce $a_0$. As deep MLPs are highly nested functions, the computation of higher-order partial derivatives, which is substantial for a Taylor approximation, is an effortful endeavour. By introducing quantities we refer to as propagators and vertices and that depend on the MLP's weights and biases, we establish a graph-theoretical approach towards partial derivatives and local interpretability. Similar to Feynman rules in quantum field theories, we find rules that systematically assign diagrams consisting of propagators and vertices to the corresponding order of the MLP perturbation theory.
Learning Conserved Networks from Flows
P., Satya Jayadev, Narasimhan, Shankar, Bhatt, Nirav
The network reconstruction problem is one of the challenging problems in network science. This work deals with reconstructing networks in which the flows are conserved around the nodes. These networks are referred to as conserved networks. We propose a novel concept of conservation graph for describing conserved networks. The properties of conservation graph are investigated. We develop a methodology to reconstruct conserved networks from flows by combining these graph properties with learning techniques, with polynomial time complexity. We show that exact network reconstruction is possible for radial networks. Further, we extend the methodology for reconstructing networks from noisy data. We demonstrate the proposed methods on different types of radial networks.
Rigging Nearly Acyclic Tournaments Is Fixed-Parameter Tractable
Ramanujan, M. S. (Technische Universität Wien) | Szeider, Stefan (Technische Universität Wien)
Single-elimination tournaments (or knockout tournaments) are a popular format in sports competitions that is also widely used for decision making and elections. In this paper we study the algorithmic problem of manipulating the outcome of a tournament. More specifically, we study the problem of finding a seeding of the players such that a certain player wins the resulting tournament. The problem is known to be NP-hard in general. In this paper we present an algorithm for this problem that exploits structural restrictions on the tournament. More specifically, we establish that the problem is fixed-parameter tractable when parameterized by the size of a smallest feedback arc set of the tournament (interpreting the tournament as an oriented complete graph). This is a natural parameter because most problems on tournaments (including this one) are either trivial or easily solvable on acyclic tournaments, leading to the question — what about nearly acyclic tournaments or tournaments with a small feedback arc set? Our result significantly improves upon a recent algorithm by Aziz et al. (2014) whose running time is bounded by an exponential function where the size of a smallest feedback arc set appears in the exponent and the base is the number of players.