independent distribution
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- (5 more...)
On the Independence Assumption in Neurosymbolic Learning
van Krieken, Emile, Minervini, Pasquale, Ponti, Edoardo M., Vergari, Antonio
State-of-the-art neurosymbolic learning systems use probabilistic reasoning to guide neural networks towards predictions that conform to logical constraints over symbols. Many such systems assume that the probabilities of the considered symbols are conditionally independent given the input to simplify learning and reasoning. We study and criticise this assumption, highlighting how it can hinder optimisation and prevent uncertainty quantification. We prove that loss functions bias conditionally independent neural networks to become overconfident in their predictions. As a result, they are unable to represent uncertainty over multiple valid options. Furthermore, we prove that these loss functions are difficult to optimise: they are non-convex, and their minima are usually highly disconnected. Our theoretical analysis gives the foundation for replacing the conditional independence assumption and designing more expressive neurosymbolic probabilistic models.
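The expressiveness limitation can be illustrated with two binary symbols. The sketch below (our own illustration, not the paper's experiments) shows that an independence-assuming model that matches the marginals of a uniform distribution over two valid options, (a, b) = (0, 1) or (1, 0), must leak probability mass onto the invalid assignments:

```python
from itertools import product

# Target joint: equal belief in the two valid symbol assignments.
target = {(0, 1): 0.5, (1, 0): 0.5, (0, 0): 0.0, (1, 1): 0.0}

# Marginals of the target distribution.
p_a1 = sum(p for (a, b), p in target.items() if a == 1)  # 0.5
p_b1 = sum(p for (a, b), p in target.items() if b == 1)  # 0.5

# A conditionally independent model that matches these marginals must
# factorize the joint as a product of the marginals.
indep = {(a, b): (p_a1 if a else 1 - p_a1) * (p_b1 if b else 1 - p_b1)
         for a, b in product([0, 1], repeat=2)}

# 0.25 of the mass leaks onto each of the two invalid assignments.
print(indep[(0, 0)], indep[(1, 1)])  # 0.25 0.25
```

The only way for an independent model to avoid assigning mass to invalid assignments here is to collapse onto a single valid option deterministically, which is exactly the overconfidence the abstract describes.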
- Europe > Austria > Vienna (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (5 more...)
Testing distributional assumptions of learning algorithms
Rubinfeld, Ronitt, Vasilyan, Arsen
There are many high-dimensional function classes that have fast agnostic learning algorithms when assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be confident that the data indeed satisfies such an assumption, so that one can trust the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$, such that if the distribution of examples in the data passes the tester $\mathcal{T}$, then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with a combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This qualitatively matches the best known ordinary agnostic learning algorithms for this task. In contrast, finite-sample Gaussianity testers do not exist for the $L_1$ and EMD distance measures. A key step is to show that halfspaces are well approximated by low-degree polynomials relative to distributions whose low-degree moments are close to those of a Gaussian. We also go beyond spherically symmetric distributions and give a tester-learner pair for halfspaces under the uniform distribution on $\{0,1\}^n$, with a combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$; this is achieved using polynomial approximation theory and critical-index machinery. We also show that there exist well-studied settings where $2^{\tilde{O}(\sqrt{n})}$-run-time agnostic learning algorithms are available, yet the combined run-times of tester-learner pairs must be as high as $2^{\Omega(n)}$. On that account, the design of tester-learner pairs is a research direction in its own right, independent of standard agnostic learning.
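The moment-matching idea behind the key step can be sketched informally. The toy tester below (our own illustration, with an arbitrary tolerance — not the paper's tester, whose guarantees are far stronger) accepts a one-dimensional sample only if its first four empirical moments are close to those of a standard Gaussian:

```python
import random

def gaussian_moment_tester(xs, tol=0.3):
    """Illustrative tester: accept iff the first four empirical moments of
    the sample are within tol of the N(0, 1) moments (0, 1, 0, 3)."""
    n = len(xs)
    moments = [sum(x**k for x in xs) / n for k in (1, 2, 3, 4)]
    targets = [0.0, 1.0, 0.0, 3.0]
    return all(abs(m - t) <= tol for m, t in zip(moments, targets))

random.seed(0)
gauss_sample = [random.gauss(0, 1) for _ in range(20000)]
skewed_sample = [random.expovariate(1.0) for _ in range(20000)]
print(gaussian_moment_tester(gauss_sample))   # True
print(gaussian_moment_tester(skewed_sample))  # False: mean and 4th moment are off
```

In the tester-learner framework, a pass from such a $\mathcal{T}$ is what licenses trusting the output of the paired agnostic learner $\mathcal{A}$.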
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.28)
- North America > United States > California > Alameda County > Berkeley (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- (14 more...)
A Closed Form Solution to Best Rank-1 Tensor Approximation via KL divergence Minimization
Ghalamkari, Kazu, Sugiyama, Mahito
Tensor decomposition is a fundamentally challenging problem. Even the simplest case of tensor decomposition, the rank-1 approximation in terms of the Least Squares (LS) error, is known to be NP-hard. Here, we show that, if we consider the KL divergence instead of the LS error, we can analytically derive a closed form solution for the rank-1 tensor that minimizes the KL divergence from a given positive tensor. Our key insight is to treat a positive tensor as a probability distribution and formulate the process of rank-1 approximation as a projection onto the set of rank-1 tensors. This enables us to solve rank-1 approximation by convex optimization. We empirically demonstrate that our algorithm is an order of magnitude faster than existing rank-1 approximation methods and gives a better approximation of given tensors, which supports our theoretical finding.
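A minimal sketch of such a closed form for a third-order positive tensor, under the generalized KL divergence: the minimizer is the outer product of the mode marginals rescaled by the total sum. The function names and shapes below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def rank1_kl(P):
    """Closed-form rank-1 approximation of a positive 3rd-order tensor P
    minimizing the generalized KL divergence: the outer product of the
    three mode marginals of P, divided by the total sum squared."""
    S = P.sum()
    m = P.sum(axis=(1, 2))  # mode-1 marginal
    n = P.sum(axis=(0, 2))  # mode-2 marginal
    l = P.sum(axis=(0, 1))  # mode-3 marginal
    return np.einsum('i,j,k->ijk', m, n, l) / S**2

def gen_kl(P, Q):
    """Generalized KL divergence for positive (not necessarily normalized) tensors."""
    return float((P * np.log(P / Q) - P + Q).sum())

rng = np.random.default_rng(0)
P = rng.uniform(0.1, 1.0, size=(3, 4, 5))
Q = rank1_kl(P)  # rank-1 by construction: an outer product of three vectors
```

Because the solution is analytic, no iterative optimization is needed; any other positive rank-1 tensor should have a generalized KL divergence from P at least as large as Q's.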
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Africa > Senegal > Kolda Region > Kolda (0.05)
- South America > Brazil (0.04)
- (2 more...)
Equilibrium Computation and Robust Optimization in Zero Sum Games With Submodular Structure
Wilder, Bryan (University of Southern California)
We define a class of zero-sum games with combinatorial structure, where the best response problem of one player is to maximize a submodular function. For example, this class includes security games played on networks, as well as the problem of robustly optimizing a submodular function over the worst case from a set of scenarios. The challenge in computing equilibria is that both players' strategy spaces can be exponentially large. Accordingly, previous algorithms have worst-case exponential runtime and indeed fail to scale up on practical instances. We provide a pseudopolynomial-time algorithm which obtains a guaranteed (1 - 1/e)^2-approximate mixed strategy for the maximizing player. Our algorithm only requires access to a weakened version of a best response oracle for the minimizing player which runs in polynomial time. Experimental results for network security games and a robust budget allocation problem confirm that our algorithm delivers near-optimal solutions and scales to much larger instances than was previously possible.
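The best-response subroutine at the heart of this game class is monotone submodular maximization, for which the classic greedy algorithm gives a (1 - 1/e)-approximation under a cardinality constraint. The coverage instance below is an illustrative sketch of that subroutine, not the paper's full equilibrium algorithm:

```python
def greedy_max_coverage(sets, k):
    """Greedily pick k sets to maximize the size of their union, a monotone
    submodular objective; greedy is (1 - 1/e)-approximate for this problem."""
    chosen, covered = [], set()
    for _ in range(k):
        # Pick the set with the largest marginal coverage gain.
        best = max(range(len(sets)),
                   key=lambda i: len(covered | sets[i]) if i not in chosen else -1)
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
chosen, covered = greedy_max_coverage(sets, 2)
print(chosen, sorted(covered))  # [0, 2] [1, 2, 3, 4, 5, 6]
```

In the game setting, such an approximate best-response oracle is invoked repeatedly to build the maximizing player's mixed strategy, which is where the squared (1 - 1/e)^2 factor in the guarantee comes from.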
Non-Depth-First Search against Independent Distributions on an AND-OR Tree
Suzuki and Niida (Ann. Pure Appl. Logic, 2015) showed the following results on independent distributions (IDs) on an AND-OR tree, where they took only depth-first algorithms into consideration. (1) Among IDs such that the probability of the root having value 0 is fixed as a given r with 0 < r < 1, if d is a maximizer of the cost of the best algorithm, then d is an independent and identical distribution (IID). (2) Among all IDs, if d is a maximizer of the cost of the best algorithm, then d is an IID. In the case where non-depth-first algorithms are taken into consideration, the counterparts of (1) and (2) were left open in the above work. Peng et al. (Inform. Process. Lett., 2017) extended (1) and (2) to multi-branching trees, where in (2) they put an additional hypothesis on the IDs: the probability of the root having value 0 is neither 0 nor 1. We give positive answers to the two questions of Suzuki and Niida. A key to the proof is that if an ID d achieves the equilibrium among IDs, then we can choose an algorithm of the best cost against d from among the depth-first algorithms. In addition, we extend the result of Peng et al. to the case where non-depth-first algorithms are taken into consideration.
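The cost notion for depth-first algorithms against an IID can be made concrete with a small sketch (our own illustration, not the paper's proof machinery): the expected number of leaf probes of left-to-right depth-first evaluation with short-circuiting, when every leaf is independently 0 with probability p:

```python
def df_cost(node, p):
    """Return (probability the node evaluates to 0, expected number of leaf
    probes) under left-to-right depth-first evaluation with short-circuiting,
    when each leaf is independently 0 with probability p (an IID)."""
    if node == 'leaf':
        return p, 1.0
    op, children = node  # op is 'AND' or 'OR'
    reach = 1.0  # probability evaluation continues to the next child
    cost = 0.0
    for p0, c in (df_cost(ch, p) for ch in children):
        cost += reach * c
        # AND continues past a child valued 1; OR continues past a child valued 0.
        reach *= (1 - p0) if op == 'AND' else p0
    # After the loop: reach = P(all children 1) for AND, P(all children 0) for OR.
    prob_zero = (1 - reach) if op == 'AND' else reach
    return prob_zero, cost

tree = ('AND', [('OR', ['leaf', 'leaf']), ('OR', ['leaf', 'leaf'])])
pz, ec = df_cost(tree, 0.5)
print(pz, ec)  # 0.4375 2.625
```

Maximizing such an expected cost over all IDs, and over the richer class of all IDs against all (possibly non-depth-first) algorithms, is the equilibrium question the paper resolves.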