Mathematical & Statistical Methods
Efficient and Effective Optimal Transport-Based Biclustering: Supplementary Material
[...] Z that represents some transfer of mass between elements of w and v. The proof is the same for W. Proposition 2. Suppose that the target row and column representative distributions are the same, [...] the Kantorovich OT problem and whose rank is at most min(rank(Z), rank(W)). Proof of Proposition 2. From linear algebra, we have that [...] Proof of Proposition 3. We suppose that [...] The optimal transport problem can be formulated and solved as the Earth Mover's Distance (EMD) [...] We report the biclustering performance on the synthetic datasets in Table 2. At least one of our models finds the perfect partition in all cases. The gene-expression matrices used are the Cumida Breast Cancer and Leukemia datasets. Their characteristics are shown in Table 3. Table 3: Characteristics of the gene expression datasets.
On computing and the complexity of computing higher-order $U$-statistics, exactly
Chen, Xingyu, Zhang, Ruiqi, Liu, Lin
Higher-order $U$-statistics abound in fields such as statistics, machine learning, and computer science, but are known to be highly time-consuming to compute in practice. Despite their widespread appearance, a comprehensive study of their computational complexity is surprisingly lacking. This paper aims to fill that gap by presenting several results related to the computational aspect of $U$-statistics. First, we derive a useful decomposition of an $m$-th order $U$-statistic into a linear combination of $V$-statistics with orders not exceeding $m$, which are generally more feasible to compute. Second, we explore the connection between exactly computing $V$-statistics and Einstein summation, a tool often used in computational mathematics, quantum computing, and quantum information sciences for accelerating tensor computations. Third, we provide an optimistic estimate of the time complexity for exactly computing $U$-statistics, based on the treewidth of a particular graph associated with the $U$-statistic kernel. The above ingredients lead to a new, much more runtime-efficient algorithm for exactly computing general higher-order $U$-statistics. We also wrap our new algorithm into an open-source Python package called $\texttt{u-stats}$. We demonstrate via three statistical applications that $\texttt{u-stats}$ achieves impressive runtime performance compared to existing benchmarks. This paper aspires to achieve two goals: (1) to capture the interest of researchers in statistics and related areas, so as to further advance the algorithmic development of $U$-statistics, and (2) to offer the package $\texttt{u-stats}$ as a valuable tool for practitioners, making the implementation of methods based on higher-order $U$-statistics a more delightful experience.
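The first two ideas in this abstract can be illustrated on a toy case. The sketch below is not the $\texttt{u-stats}$ package or its API. It assumes a hypothetical third-order kernel of product form, h(x, y, z) = a(x) b(y) c(z), chosen only because every $V$-statistic then factorizes into one-dimensional sums. Under that assumption, the sum over distinct index triples is obtained by inclusion-exclusion from full (unrestricted) sums of order at most 3, each computable in O(n) time instead of O(n^3). The function names are illustrative.

```python
from itertools import permutations


def u_stat_bruteforce(xs, a, b, c):
    """Average a(x_i) * b(x_j) * c(x_k) over ordered triples of distinct indices (O(n^3))."""
    n = len(xs)
    total = sum(
        a(xs[i]) * b(xs[j]) * c(xs[k])
        for i, j, k in permutations(range(n), 3)
    )
    return total / (n * (n - 1) * (n - 2))


def u_stat_via_v_stats(xs, a, b, c):
    """Same quantity from unrestricted sums (V-statistic-type terms), in O(n) time.

    Assumes the product kernel h(x, y, z) = a(x) * b(y) * c(z); a general kernel
    would need genuine tensor contractions rather than products of scalars.
    """
    n = len(xs)
    A = [a(x) for x in xs]
    B = [b(x) for x in xs]
    C = [c(x) for x in xs]
    s_a, s_b, s_c = sum(A), sum(B), sum(C)
    s_ab = sum(p * q for p, q in zip(A, B))
    s_ac = sum(p * q for p, q in zip(A, C))
    s_bc = sum(p * q for p, q in zip(B, C))
    s_abc = sum(p * q * r for p, q, r in zip(A, B, C))
    # Inclusion-exclusion over coinciding indices:
    # all triples, minus each "two indices equal" sum, plus twice the diagonal.
    distinct = s_a * s_b * s_c - s_ab * s_c - s_ac * s_b - s_bc * s_a + 2 * s_abc
    return distinct / (n * (n - 1) * (n - 2))
```

For kernels that are products of pairwise factors rather than univariate ones, the scalar products above would become tensor contractions, which is where an Einstein-summation routine and the contraction order come in.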
Universal Learning of Nonlinear Dynamics
Dogariu, Evan, Brahmbhatt, Anand, Hazan, Elad
We study the fundamental problem of learning a marginally stable unknown nonlinear dynamical system. We describe an algorithm for this problem, based on the technique of spectral filtering, which learns a mapping from past observations to the next based on a spectral representation of the system. Using techniques from online convex optimization, we prove vanishing prediction error for any nonlinear dynamical system that has finitely many marginally stable modes, with rates governed by a novel quantitative control-theoretic notion of learnability. The main technical component of our method is a new spectral filtering algorithm for linear dynamical systems, which incorporates past observations and applies to general noisy and marginally stable systems. This significantly generalizes the original spectral filtering algorithm, both by handling asymmetric dynamics and by incorporating noise correction, and is of independent interest.
Unfolded Laplacian Spectral Embedding: A Theoretically Grounded Approach to Dynamic Network Representation
Ezoe, Haruka, Matsumoto, Hiroki, Hisano, Ryohei
Dynamic relational structures play a central role in many AI tasks, but their evolving nature presents challenges for consistent and interpretable representation. A common approach is to learn time-varying node embeddings, whose effectiveness depends on satisfying key stability properties. In this paper, we propose Unfolded Laplacian Spectral Embedding, a new method that extends the Unfolded Adjacency Spectral Embedding framework to normalized Laplacians while preserving both cross-sectional and longitudinal stability. We provide a formal proof that our method satisfies these stability conditions. In addition, as a bonus of using the Laplacian matrix, we establish a new Cheeger-style inequality that connects the embeddings to the conductance of the underlying dynamic graphs. Empirical evaluations on synthetic and real-world datasets support our theoretical findings and demonstrate the strong performance of our method. These results establish a principled and stable framework for dynamic network representation grounded in spectral graph theory.
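A rough sketch of what an unfolded spectral embedding with degree normalization can look like is given below. It is not the authors' implementation: the specific normalization (row degrees on the left, column degrees on the right of the column-concatenated adjacency matrix), the square-root scaling by singular values, and the function name are all assumptions made for illustration. The abstract does not specify these details.

```python
import numpy as np


def unfolded_laplacian_embedding(adjs, d):
    """Embed T graph snapshots on the same n nodes (illustrative sketch).

    adjs: list of T (n, n) adjacency matrices; d: embedding dimension.
    Returns (anchor, dynamic) with shapes (n, d) and (T, n, d).
    """
    n = adjs[0].shape[0]
    # Unfold: place the snapshots side by side, giving an (n, n*T) matrix.
    A = np.hstack(adjs).astype(float)
    row_deg = A.sum(axis=1)
    col_deg = A.sum(axis=0)
    # Assumed normalization D_row^{-1/2} A D_col^{-1/2}; zero degrees are left unscaled.
    r = np.where(row_deg > 0, row_deg, 1.0) ** -0.5
    c = np.where(col_deg > 0, col_deg, 1.0) ** -0.5
    L = r[:, None] * A * c[None, :]
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    scale = np.sqrt(s[:d])
    anchor = U[:, :d] * scale                          # one embedding per node
    dynamic = (Vt[:d].T * scale).reshape(len(adjs), n, d)  # one per (time, node)
    return anchor, dynamic
```

In this sketch, two identical snapshots yield identical per-time embeddings, because the corresponding column blocks of the unfolded matrix coincide. That is the flavor of the longitudinal stability the abstract refers to, though the paper's formal conditions are stated there, not here.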
Differentiable Optimization of Generalized Nondecomposable Functions using Linear Programs
We propose a framework which makes it feasible to directly train deep neural networks with respect to popular families of task-specific non-decomposable performance measures such as AUC, multi-class AUC, F-measure and others. A feature of the optimization model that emerges from these tasks is that it involves solving a Linear Program (LP) during training where representations learned by upstream layers characterize the constraints or the feasible set. The constraint matrix is not only large but the constraints are also modified at each iteration. We show how adopting a set of ingenious ideas proposed by Mangasarian for 1-norm SVMs - which advocate solving LPs with a generalized Newton method - provides a simple and effective solution that can be run on the GPU. In particular, this strategy needs little unrolling, which makes it more efficient during the backward pass. Further, even when the constraint matrix is too large to fit in GPU memory (say, in large minibatch settings), we show that running the Newton method in a lower dimensional space yields accurate gradients for training, by utilizing a statistical concept called sufficient dimension reduction. While a number of specialized algorithms have been proposed for the models that we describe here, our module turns out to be applicable without any specific adjustments or relaxations. We describe each use case, study its properties and demonstrate the efficacy of the approach over alternatives which use surrogate lower bounds and, often, specialized optimization schemes. Frequently, we achieve superior computational behavior and performance improvements on common datasets used in the literature.
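To make "non-decomposable" concrete, the sketch below computes AUC in its standard pairwise (Mann-Whitney) form: the fraction of positive-negative pairs that the scores rank correctly, with ties counted as one half. This is only an illustration of the measure, not the LP formulation or the Newton-based solver described in the abstract, and the function name is hypothetical. The point is that the value depends on comparisons between examples, so it cannot be written as a sum of independent per-example losses.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of correctly ordered (positive, negative) pairs.

    labels are 0/1; ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))
```

Changing a single score can alter the outcome of many pairs at once, which is one reason such measures are usually optimized through surrogates or, as proposed here, through an optimization problem solved inside the training loop.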
A Some Concepts in Linear Algebra
In the interest of self-containedness, we provide a brief review of some concepts from linear algebra.
Addition and scalar multiplication are defined in the obvious way by $(a,b) + \lambda (c,d) := (a + \lambda c, b + \lambda d)$ for $a,c \in H$, $b,d \in \widehat{H}$ and $\lambda \in \mathbb{C}$. [...] 'size' by what is called the operator norm, denoted by $\|\cdot\|$. [...] We may then write f = [...] In this case we write R(z, ) = ( z ) [...] It is a standard exercise to show that this is independent of the choice of orthonormal basis. To streamline the argumentation let us first introduce some notation: Notation C.2. [...] Lemma A.1), we find a [...] To investigate the example of Figure 3, we label the vertices of the respective graphs as depicted in Figure 6. Such operators are positive and hence | | = [...] (similarly for r). [...] = 0. Next we note $\|Jf\|$ = [...] J and determine [...] J [...] It remains to establish (9).