AITopics

2605.19152

Country: Europe (0.67)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Tsikouras, Nikos, Caramanis, Constantine, Tzamos, Christos

MaxSketch: Robust Distinct Counting in Streams via Random Projections

arXiv.org Machine LearningMay-18-2026

Estimating the number of distinct elements in a data stream is well understood when repeated elements are identical. In modern settings, however, observations are high-dimensional and noisy, so repeated instances of the same object are only approximately similar -- for example, different images of the same individual may vary significantly at the pixel level. Classical sketches such as HyperLogLog rely on consistent hash values for identical elements and break down in this regime. Recent work on robust distinct counting in general metric spaces achieves $\widetildeΘ(\sqrt{n})$ memory, which is tight in the worst case. We show that substantially improved memory guarantees are possible under geometric structure common in learned representations. We introduce MaxSketch, a simple max-linear sketch built from random Gaussian projections, and prove that it succeeds in estimating the number of distinct latent objects. Concretely, we show that under this assumption $m = \widetilde{O} (\log n / \varepsilon^2)$ random projections (and hence $\widetilde{O} (\log n/\varepsilon^2)$ memory) suffice to recover the true distinct count within a $(1+\varepsilon)$ factor. Experiments on image streams confirm that MaxSketch accurately estimates distinct counts and generalizes beyond the training regime. Our results bridge classical streaming algorithms and modern representation learning, showing how geometric structure can fundamentally reduce the complexity of distinct counting.

artificial intelligence, assumption, machine learning, (19 more...)

2605.15571

Country:

North America > United States (0.46)
Europe (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Nguyen, Minh-Toan, Barbier, Jean

Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks

arXiv.org Machine LearningMay-13-2026

We study the information-theoretic limits of learning a one-hidden-layer teacher network with hierarchical features from noisy queries, in the context of knowledge transfer to a smaller student model. We work in the high-dimensional regime where the teacher width $k$ scales linearly with the input dimension $d$ -- a setting that captures large-but-finite-width networks and has only recently become analytically tractable. Using a heuristic leave-one-out decoupling argument, validated numerically throughout, we derive asymptotically sharp characterizations of the Bayes-optimal generalization error and individual feature overlaps via a system of closed fixed-point equations. These equations reveal that feature learnability is governed by a sequence of sharp phase transitions: as data grows, teacher features become recoverable sequentially, each through a discontinuous jump in overlap. This sequential acquisition underlies a precise notion of \textit{effective width} $k_c$ -- the number of learnable features at a given data budget $n$ -- which unifies two distinct scaling regimes: a feature-learning regime in which the Bayes-optimal generalization error $\varepsilon^{\rm BO}$ scales as $ n^{1/(2β)-1}$, and a refinement regime in which it scales as $n^{-1}$, where $β>1/2$ is the exponent of the power-law feature hierarchy. Both laws collapse to the single relation $\varepsilon^{\rm BO}=Θ(k_c d/n)$. We further show empirically that a student trained with \textsc{Adam} near the effective width $k_c$ achieves these optimal scaling laws (up to a small algorithmic gap), and provide an information-theoretic account of the associated scaling in model size.

artificial intelligence, machine learning, regime, (15 more...)

2605.10395

Country: Europe (0.28)

Genre: Research Report (0.40)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Safaai, Houman, Vargas, Alessandro Marin

Dynamic Vine Copulas: Detecting and Quantifying Time-Varying Higher-Order Interactions

arXiv.org Machine LearningMay-8-2026

Time-varying dependence is often modeled with dynamic correlations or Gaussian graphical models, but multivariate systems can change through tail behavior, asymmetry, or conditional structure even when correlations are nearly stable. We introduce Dynamic Vine Copulas (DVC), a temporal vine-copula framework for estimating and diagnosing sequence-wide non-Gaussian dependence. DVC fixes a chosen vine factorization for comparability; the framework applies to C-, D-, and R-vines, and our experiments use fixed-root-order C-vines. Pair-copula states evolve through smooth parameter trajectories or temporally regularized family-switching paths. The main diagnostic is a held-out comparison between a full vine and its matched 1-truncated version, which separates flexible first-tree pairwise dependence from evidence contributed by higher-tree conditional terms. At the population level, under a correct fixed vine and the simplifying assumption, this contrast equals the higher-tree component of a vine total-correlation decomposition; in finite samples, it is a predictive diagnostic. In controlled benchmarks, DVC detects Student-t degrees-of-freedom changes, Clayton-to-Gumbel switches, and recurrent conditional-interaction episodes missed or conflated by Gaussian dynamic baselines. The higher-tree score remains near zero in pairwise-only regimes and rises during conditional-interaction regimes. On Allen Visual Behavior Neuropixels data, DVC identifies a reproducible time-indexed higher-tree signal that is positive across held-out splits and vanishes under a decorrelated null, indicating simultaneous cross-area dependence. DVC therefore provides a flexible temporal copula model and an interpretable test of whether temporal dependence changes are pairwise or conditional.

artificial intelligence, dependence, machine learning, (19 more...)

2605.03061

Genre: Research Report (0.40)

Industry:

Health & Medicine (0.46)
Banking & Finance (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Neural Information Processing SystemsApr-30-2026, 19:24:32 GMT

0004d0b59e19461ff126e3a08a814c33-AuthorFeedback.pdf

We sincerely appreciate the reviewers for their careful reading, constructive questions and suggestions. We would very1 much like further exchanges to improve our work, but the following is our best effort within the current limits.2 First, we address questions appeared at least twice. We write P1, P2 for paragraph reference, and Rx for reviewers.3 We discuss two main motivations here: lack of graph loss, and empirical failure4 of distinguishing power.

artificial intelligence, machine learning, representation, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.72)

Neural Information Processing SystemsApr-25-2026, 02:44:15 GMT

interpretation of regularization

Blue arrows indicate node feature vectors hv of the latent space, and the orange area/point indicate possible range of graph feature vector hG obtained by applying READOUT to hv. We elaborate our motivation behind orthogonal regularization (15) proposed in Section 4.2.3. The biggest motivation behind orthognoal regularization lies in understanding (8) and (12) that the node features H becomes full rank matrix with good condition number. Figure 5 visually demonstrates the geometric effect of attention-based READOUT and orthogonal regularization with two example node features h1 and h2. Only one graph feature vector hG is possible from the combination of two node features with conventional READOUT, while vectors within the range of the orange rhombus can represent the whole graph feature with attention-based READOUT. With orthogonal regularization, area of the range that the graph feature vector hG can represent become even larger, with lower possibility of null subspace within H. Accordingly, the subspace that H can span can be rich enough.

artificial intelligence, machine learning, regularization, (15 more...)

Genre: Research Report (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Neural Information Processing SystemsFeb-19-2026, 19:59:54 GMT

9a0ee0a9e7a42d2d69b8f86b3a0756b1-AuthorFeedback.pdf

inference, neuron, postdiction, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.55)

Neural Information Processing SystemsFeb-18-2026, 07:45:02 GMT

Retrospective for the Dynamic Sensorium Competition for predicting large-scale mouse primary visual cortex activity from videos

Understanding how biological visual systems process information is challenging because of the nonlinear relationship between visual input and neuronal responses. Artificial neural networks allow computational neuroscientists to create predictive models that connect biological and machine vision. Machine learning has benefited tremendously from benchmarks that compare different models on the same task under standardized conditions. However, there was no standardized benchmark to identify state-of-the-art dynamic models of the mouse visual system. To address this gap, we established the SENSORIUM 2023 Benchmark Competition with dynamic input, featuring a new large-scale dataset from the primary visual cortex of ten mice.

artificial intelligence, machine learning, visual cortex, (19 more...)

Country:

Europe > Germany > Lower Saxony > Gottingen (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
(9 more...)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Neural Information Processing SystemsFeb-14-2026, 12:28:18 GMT

d921c3c762b1522c475ac8fc0811bb0f-AuthorFeedback.pdf

We wish to thank all of the reviewers for their time and thorough reading of our paper! We appreciate the reviewer's suggestions regarding clarity. We have added the suggested summary sentence "the key We started with binary sentiment classification, but are actively working on more tasks. RNN hidden states onto the top two PCs for two different input sequences that differ only by two tokens (replacing ' The trajectories start out the same as the initial tokens are identical. We have added a footnote noting this in the main text.

artificial intelligence, linear approximation, natural language, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.37)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.37)