Nonparametric Kernel Clustering with Bandit Feedback
Thuot, Victor, Vogt, Sebastian, Ghoshdastidar, Debarghya, Verzelen, Nicolas
Clustering with bandit feedback refers to the problem of partitioning a set of items, where the clustering algorithm can sequentially query the items to receive noisy observations. The problem is formally posed as the task of partitioning the arms of an N-armed stochastic bandit according to their underlying distributions, grouping two arms together if and only if they share the same distribution, using samples collected sequentially and adaptively. This setting has gained attention in recent years due to its applicability in recommendation systems and crowdsourcing. Existing works on clustering with bandit feedback rely on the strong assumption that the underlying distributions are sub-Gaussian. As a consequence, existing methods mainly cover settings with linearly separable clusters, which have little practical relevance. We introduce a framework of "nonparametric clustering with bandit feedback", where the underlying arm distributions are not constrained to any parametric family, making the framework applicable to active clustering of real-world datasets. We adopt a kernel-based approach, which allows us to reformulate the nonparametric problem as the task of clustering the arms according to their kernel mean embeddings in a reproducing kernel Hilbert space (RKHS). Building on this formulation, we introduce the KABC algorithm with theoretical correctness guarantees and analyze its sampling budget. We introduce a notion of signal-to-noise ratio for this problem that depends on the maximum mean discrepancy (MMD) between the arm distributions and on their variance in the RKHS. Our algorithm is adaptive to this unknown quantity: it does not require it as an input, yet achieves instance-dependent guarantees.
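As a rough illustration of the kernel reformulation (a fixed-budget sketch, not the adaptive KABC algorithm; all function names here are hypothetical), one can estimate the squared MMD between arms from batches of samples and group arms whose pairwise estimate is small:

```python
import numpy as np

def mmd2_unbiased(x, y, bw=1.0):
    """Unbiased estimate of MMD^2 between 1-D sample arrays x and y,
    using a Gaussian RBF kernel with bandwidth bw."""
    k = lambda a, b: np.exp(-np.subtract.outer(a, b) ** 2 / (2 * bw ** 2))
    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    n, m = len(x), len(y)
    # Drop diagonal terms to obtain the unbiased U-statistic.
    term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_x + term_y - 2 * kxy.mean()

def cluster_arms(samples, threshold):
    """Greedy single-linkage grouping: arms i and j end up in the same
    cluster if their estimated MMD^2 falls below the threshold."""
    n = len(samples)
    labels = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if mmd2_unbiased(samples[i], samples[j]) < threshold:
                old, new = labels[j], labels[i]
                labels = [new if l == old else l for l in labels]
    return labels

# Example: three arms, two underlying distributions.
rng = np.random.default_rng(0)
arms = [rng.normal(0, 1, 200), rng.normal(0, 1, 200), rng.exponential(1, 200)]
print(cluster_arms(arms, threshold=0.05))
```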
Learning Time-Varying Graphs from Incomplete Graph Signals
Peng, Chuansen, Shen, Xiaojing
This paper tackles the challenging problem of jointly inferring time-varying network topologies and imputing missing data from partially observed graph signals. We propose a unified non-convex optimization framework to simultaneously recover a sequence of graph Laplacian matrices while reconstructing the unobserved signal entries. Unlike conventional decoupled methods, our integrated approach facilitates a bidirectional flow of information between the graph and signal domains, yielding superior robustness, particularly in high missing-data regimes. To capture realistic network dynamics, we introduce a fused-lasso type regularizer on the sequence of Laplacians. This penalty promotes temporal smoothness by penalizing large successive changes, thereby preventing spurious variations induced by noise while still permitting gradual topological evolution. For solving the joint optimization problem, we develop an efficient Alternating Direction Method of Multipliers (ADMM) algorithm, which leverages the problem's structure to yield closed-form solutions for both the graph and signal subproblems. This design ensures scalability to large-scale networks and long time horizons. On the theoretical front, despite the inherent non-convexity, we establish a convergence guarantee, proving that the proposed ADMM scheme converges to a stationary point. Furthermore, we derive non-asymptotic statistical guarantees, providing high-probability error bounds for the graph estimator as a function of sample size, signal smoothness, and the intrinsic temporal variability of the graph. Extensive numerical experiments validate the approach, demonstrating that it significantly outperforms state-of-the-art baselines in both convergence speed and the joint accuracy of graph learning and signal recovery.
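A minimal sketch of two ingredients the abstract describes: the closed-form signal subproblem with the graph fixed, and the fused-lasso temporal penalty. The smoothness-plus-fidelity objective assumed here is a common choice, not necessarily the paper's exact formulation, and the function names are hypothetical:

```python
import numpy as np

def update_signal(L, x_obs, mask_t, alpha=1.0):
    """Closed-form per-time-step signal update with the graph fixed:
    minimize  x^T L x + alpha * sum_{i observed} (x_i - x_obs_i)^2.
    Setting the gradient to zero gives (L + alpha * D) x = alpha * D x_obs,
    with D = diag(mask). A tiny ridge keeps the system well-posed, since
    the Laplacian L is singular along the constant vector."""
    n = L.shape[0]
    D = np.diag(mask_t.astype(float))
    return np.linalg.solve(L + alpha * D + 1e-8 * np.eye(n), alpha * D @ x_obs)

def fused_penalty(laplacians):
    """Fused-lasso temporal regularizer sum_t ||L_t - L_{t-1}||_1, which
    penalizes abrupt changes while allowing gradual topological drift."""
    return sum(np.abs(laplacians[t] - laplacians[t - 1]).sum()
               for t in range(1, len(laplacians)))

# Example: impute a signal on a 4-node path graph from two observed nodes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A               # combinatorial Laplacian
x_obs = np.array([1.0, 0.0, 0.0, 2.0])       # unobserved entries are arbitrary
mask = np.array([True, False, False, True])  # only nodes 0 and 3 observed
print(update_signal(L, x_obs, mask, alpha=10.0))
```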
Sharper Generalization Bounds for Pairwise Learning: Supplementary Material
A Proof of Theorem 1
To prove Theorem 1, we need to introduce some lemmas; with these lemmas, we can give the proof of Theorem 1 on high-probability bounds of the generalization gap. The concentration inequality established in Lemma A.1 applies to a summation of random functions. According to Lemma A.3, all the assumptions of Lemma A.1 hold for these random functions, so we can apply Lemma A.1 to derive the bound. We then prove Lemma 2 on the norm of the output model A(S) and plug the resulting inequality back into (B.1). To prove Theorem 3, we introduce some further lemmas, assuming that (4.3) holds for all z, z'.
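As an illustration of the flavor of concentration at work in a Lemma A.1 of this kind (a standard Bernstein-type statement, not necessarily the lemma's exact form): for independent random variables X_1, ..., X_n with |X_i| <= M almost surely and Var(X_i) <= sigma^2,

```latex
\[
  \Pr\left[\,\Bigl|\sum_{i=1}^{n}\bigl(X_i - \mathbb{E}[X_i]\bigr)\Bigr| \ge t\,\right]
  \;\le\; 2\exp\!\left(-\frac{t^2}{2n\sigma^2 + \tfrac{2}{3}Mt}\right).
\]
```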
The Loss Surfaces of Neural Networks with General Activation Functions
Baskerville, Nicholas P., Keating, Jonathan P., Mezzadri, Francesco, Najnudel, Joseph
The loss surfaces of deep neural networks have been the subject of several studies, theoretical and experimental, over the last few years. One strand of work considers the complexity, in the sense of local optima, of high-dimensional random functions with the aim of informing how local optimisation methods may perform in such complicated settings. Prior work of Choromanska et al. (2015) established a direct link between the training loss surfaces of deep multi-layer perceptron networks and spherical multi-spin glass models under some very strong assumptions on the network and its data. In this work, we test the validity of this approach by removing the undesirable restriction to ReLU activation functions. In doing so, we chart a new path through the spin glass complexity calculations using supersymmetric methods in Random Matrix Theory which may prove useful in other contexts. Our results shed new light on both the strengths and the weaknesses of spin glass models in this context.
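Complexity calculations in this literature reduce, via Kac-Rice-type formulas, to spectral computations for GOE-type random Hessians. As a self-contained toy (not the paper's supersymmetric derivation), the empirical spectrum of a normalized GOE matrix follows Wigner's semicircle law:

```python
import numpy as np

# Sample a GOE matrix, normalized so its spectrum concentrates on [-2, 2].
n = 2000
rng = np.random.default_rng(1)
A = rng.normal(size=(n, n))
H = (A + A.T) / np.sqrt(2 * n)
eigs = np.linalg.eigvalsh(H)

# Compare the empirical eigenvalue histogram with the semicircle density
# rho(x) = sqrt(4 - x^2) / (2 * pi).
hist, edges = np.histogram(eigs, bins=50, range=(-2, 2), density=True)
centers = (edges[:-1] + edges[1:]) / 2
semicircle = np.sqrt(np.clip(4 - centers**2, 0, None)) / (2 * np.pi)
print(np.max(np.abs(hist - semicircle)))  # small for large n
```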