AITopics | sse

sse

Nested Mini-Batch K-Means

James Newling, François Fleuret

Neural Information Processing Systems | Mar-23-2026, 15:53:00 GMT

A new algorithm is proposed which accelerates the mini-batch k-means algorithm of Sculley (2010) by using the distance bounding approach of Elkan (2003). We argue that, when incorporating distance bounds into a mini-batch algorithm, already used data should preferentially be reused. To this end we propose using nested mini-batches, whereby data in a mini-batch at iteration t is automatically reused at iteration t+1. Using nested mini-batches presents two difficulties. The first is that unbalanced use of data can bias estimates, which we resolve by ensuring that each data sample contributes exactly once to centroids. The second is in choosing mini-batch sizes, which we address by balancing premature fine-tuning of centroids with redundancy-induced slow-down. Experiments show that the resulting nmbatch algorithm is very effective, often arriving within 1% of the empirical minimum 100× earlier than the standard mini-batch algorithm.
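
The two ideas in the abstract (nested batches and each sample contributing exactly once) can be illustrated with a minimal sketch. It is not the authors' nmbatch algorithm: the Elkan-style distance bounds are omitted, and the function name, the prefix-of-shuffled-data batching, and the unconditional batch doubling are assumptions made for illustration.

```python
import random

def nested_minibatch_kmeans(points, k, iters=10, b0=4, seed=0):
    """Simplified sketch: nested mini-batches, no distance bounds (assumption)."""
    rng = random.Random(seed)
    data = points[:]
    rng.shuffle(data)
    dim = len(data[0])
    centroids = [list(p) for p in data[:k]]
    sums = [[0.0] * dim for _ in range(k)]   # per-cluster coordinate sums
    counts = [0] * k                          # per-cluster sample counts
    assign = {}                               # sample index -> current cluster
    batch = min(b0, len(data))
    for _ in range(iters):
        # The batch is a prefix of the shuffled data, so the batch used at
        # iteration t is contained in the batch used at iteration t+1.
        for i in range(batch):
            p = data[i]
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            old = assign.get(i)
            if old == j:
                continue
            if old is not None:
                # Remove the stale contribution so the sample counts only once.
                counts[old] -= 1
                sums[old] = [s - a for s, a in zip(sums[old], p)]
            counts[j] += 1
            sums[j] = [s + a for s, a in zip(sums[j], p)]
            assign[i] = j
        for c in range(k):
            if counts[c] > 0:
                centroids[c] = [s / counts[c] for s in sums[c]]
        batch = min(2 * batch, len(data))     # assumed schedule: always double
    return centroids, counts
```

Because a reassigned sample is subtracted from its old cluster before being added to the new one, the counts always sum to the number of distinct samples seen so far.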

artificial intelligence, centroid, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.29)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Manipulating a Learning Defender and Ways to Counteract

Jiarui Gan, Qingyu Guo, Long Tran-Thanh, Bo An, Michael Wooldridge

Neural Information Processing Systems | Feb-14-2026, 01:06:34 GMT

Neural Information Processing Systems http://nips.cc/

attacker, attacker type, defender, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
Asia > Singapore (0.04)
(3 more...)

Genre: Research Report (0.46)

Industry:

Leisure & Entertainment > Games (0.70)
Information Technology (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

ed383ec94720d62a939bfb6bdd98f50c-Paper.pdf

Neural Information Processing Systems | Feb-11-2026, 00:16:00 GMT

best response, follower, payoff matrix, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Neural Information Processing Systems | Dec-25-2025, 05:51:26 GMT

In deep neural nets, lower level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information. We provide theoretical guarantees for our method and show its empirical effectiveness on 6 distinct tasks, from simple neural networks with one hidden layer in recommender systems, to the transformer and BERT in natural language processing. We find that when used along with widely used regularization methods such as weight decay and dropout, our proposed SSE can further reduce overfitting, which often leads to more favorable generalization results.
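
One way to picture "stochastically transitioning between embeddings" without prior information is to swap embedding indices at random before the lookup during training. The sketch below assumes a uniform replacement with probability `p`; the function name and signature are hypothetical and this is not taken from the authors' code.

```python
import random

def sse_se_indices(indices, vocab_size, p, rng):
    """Replace each embedding index by a uniformly random one with probability p.

    Assumption: uniform replacement over the whole vocabulary, applied only
    at training time, before the embedding lookup.
    """
    out = []
    for idx in indices:
        if rng.random() < p:
            out.append(rng.randrange(vocab_size))  # transition to another embedding
        else:
            out.append(idx)                        # keep the original embedding
    return out
```

With `p = 0` the batch is unchanged, so the step can be disabled at evaluation time without touching the rest of the SGD loop.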

data-driven regularization, name change, stochastic shared embedding, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)

Improved seeding strategies for k-means and k-GMM

Carrière, Guillaume, Cazals, Frédéric

arXiv.org Artificial Intelligence | Nov-4-2025

We revisit the randomized seeding techniques for k-means clustering and k-GMM (Gaussian Mixture model fitting with Expectation-Maximization), formalizing their three key ingredients: the metric used for seed sampling, the number of candidate seeds, and the metric used for seed selection. This analysis yields novel families of initialization methods exploiting a lookahead principle (conditioning the seed selection on an enhanced coherence with the final metric used to assess the algorithm) and a multipass strategy to tame the effect of randomization. Experiments show a consistent constant factor improvement over classical contenders in terms of the final metric (SSE for k-means, log-likelihood for k-GMM), at a modest overhead. In particular, for k-means, our methods improve on the recently designed multi-swap strategy, which was the first one to outperform the greedy k-means++ seeding. Our experimental analysis also sheds light on subtle properties of k-means often overlooked, including the (lack of) correlations between the SSE upon seeding and the final SSE, the variance reduction phenomena observed in iterative seeding methods, and the sensitivity of the final SSE to the pool size for greedy methods. Practically, our most effective seeding methods are strong candidates to become one of the standard techniques, if not the standard. From a theoretical perspective, our formalization of seeding opens the door to a new line of analytical approaches.
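
The three ingredients named in the abstract map directly onto the classical greedy k-means++ baseline it mentions. The sketch below shows that baseline only, not the paper's lookahead or multipass methods; the function names and the default of three candidates are assumptions.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def greedy_kmeanspp(points, k, n_candidates=3, seed=0):
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Ingredient 1: sampling metric (squared distance to the nearest center).
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            centers.append(rng.choice(points))  # all points already covered
            continue
        # Ingredient 2: number of candidate seeds drawn per step.
        candidates = rng.choices(points, weights=d2, k=n_candidates)
        # Ingredient 3: selection metric (SSE after adding the candidate).
        best = min(candidates, key=lambda c: sse(points, centers + [c]))
        centers.append(best)
    return centers
```

Setting `n_candidates=1` reduces this to plain k-means++ seeding, which makes the pool size an easy knob to vary when studying its effect on the final SSE.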

artificial intelligence, dataset, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2506.21291

Country: Europe > France (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Scaling Linear Attention with Sparse State Expansion

Pan, Yuqi, An, Yongqi, Li, Zheng, Chou, Yuhong, Zhu, Ruijie, Wang, Xiaohui, Wang, Mingxuan, Wang, Jinqiao, Li, Guoqi

arXiv.org Artificial Intelligence | Oct-2-2025

The Transformer architecture, despite its widespread success, struggles with long-context scenarios due to quadratic computation and linear memory growth. While various linear attention variants mitigate these efficiency constraints by compressing context into fixed-size states, they often degrade performance in tasks such as in-context retrieval and reasoning. To address this limitation and achieve more effective context compression, we propose two key innovations. First, we introduce a row-sparse update formulation for linear attention by conceptualizing state updating as information classification. This enables sparse state updates via softmax-based top-$k$ hard classification, thereby extending receptive fields and reducing inter-class interference. Second, we present Sparse State Expansion (SSE) within the sparse framework, which expands the contextual state into multiple partitions, effectively decoupling parameter size from state capacity while maintaining the sparse classification paradigm. Supported by efficient parallelized implementations, our design achieves effective classification and highly discriminative state representations. We extensively validate SSE in both pure linear and hybrid (SSE-H) architectures across language modeling, in-context retrieval, and mathematical reasoning benchmarks. SSE demonstrates strong retrieval performance and scales favorably with state size. Moreover, after reinforcement learning (RL) training, our 2B SSE-H model achieves state-of-the-art mathematical reasoning performance among small reasoning models, scoring 64.5 on AIME24 and 50.2 on AIME25, significantly outperforming similarly sized open-source Transformers. These results highlight SSE as a promising and efficient architecture for long-context modeling.
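
The row-sparse update described in the abstract can be pictured as follows: score the state rows, apply a softmax, and write the new value only into the top-k rows. This is a rough sketch under stated assumptions, not the paper's formulation; the function names, the use of softmax weights as update coefficients, and the absence of renormalization after the top-k cut are all illustrative choices.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def topk_sparse_update(state, key_logits, value, k):
    """Add `value` to the k highest-scoring rows of `state`, in place.

    state: list of rows (one per state slot), each a list of floats.
    key_logits: one score per row, treated as classification logits.
    Returns the indices of the rows that were updated.
    """
    probs = softmax(key_logits)
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    for i in keep:
        state[i] = [s + probs[i] * v for s, v in zip(state[i], value)]
    return keep
```

Rows outside the top-k are left untouched, which is the sense in which the update avoids interference between unrelated slots.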

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.16577

Genre: Research Report > New Finding (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

The Honest Truth About Causal Trees: Accuracy Limits for Heterogeneous Treatment Effect Estimation

Cattaneo, Matias D., Klusowski, Jason M., Yu, Ruiqi Rae

arXiv.org Machine Learning | Sep-16-2025

Recursive decision trees have emerged as a leading methodology for heterogeneous causal treatment effect estimation and inference in experimental and observational settings. These procedures are fitted using the celebrated CART (Classification And Regression Tree) algorithm [Breiman et al., 1984], or custom variants thereof, and hence are believed to be "adaptive" to high-dimensional data, sparsity, or other specific features of the underlying data generating process. Athey and Imbens [2016] proposed several "honest" causal decision tree estimators, which have become the standard in both academia and industry. We study their estimators, and variants thereof, and establish lower bounds on their estimation error. We demonstrate that these popular heterogeneous treatment effect estimators cannot achieve a polynomial-in-$n$ convergence rate under basic conditions, where $n$ denotes the sample size. Contrary to common belief, honesty does not resolve these limitations and at best delivers negligible logarithmic improvements in sample size or dimension. As a result, these commonly used estimators can exhibit poor performance in practice, and even be inconsistent in some settings. Our theoretical insights are empirically validated through simulations.

dim, estimator, log log, (14 more...)

arXiv.org Machine Learning

2509.11381

Country:

North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > Experimental Study (0.45)

Industry: Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Optimally Deceiving a Learning Leader in Stackelberg Games

Neural Information Processing Systems | Aug-22-2025, 01:03:56 GMT

Stackelberg games are a simple yet powerful model for sequential interaction among strategic agents. In such games there are two players: a leader and a follower.
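
The leader-follower interaction can be made concrete with a small sketch restricted to pure strategies: the leader commits to a row, the follower best-responds, and the leader picks the commitment that serves it best. The function name and the tie-breaking in the leader's favor are assumptions, and the general model also allows mixed commitments, which this sketch does not cover.

```python
def stackelberg_pure(leader_payoff, follower_payoff):
    """Return (leader_action, follower_action, leader_value) for pure commitments.

    Both arguments are matrices indexed [leader_action][follower_action].
    Follower ties are broken in the leader's favor (an assumption).
    """
    best = None
    for i, row in enumerate(follower_payoff):
        # Follower best response to the leader committing to action i.
        j = max(range(len(row)), key=lambda c: (row[c], leader_payoff[i][c]))
        value = leader_payoff[i][j]
        if best is None or value > best[2]:
            best = (i, j, value)
    return best
```

For leader payoffs `[[2, 4], [1, 3]]` and follower payoffs `[[1, 0], [0, 1]]`, the leader commits to row 1 and earns 3, even though row 0 gives the leader a higher payoff against every fixed follower action.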

best response, follower, payoff matrix, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Manipulating a Learning Defender and Ways to Counteract

Jiarui Gan, Qingyu Guo, Long Tran-Thanh, Bo An, Michael Wooldridge

Neural Information Processing Systems | Aug-20-2025, 01:34:33 GMT

Neural Information Processing Systems http://nips.cc/

attacker, attacker type, defender, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
Asia > Singapore (0.04)
(3 more...)

Genre: Research Report (0.46)

Industry:

Leisure & Entertainment > Games (0.70)
Information Technology (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Geometric-k-means: A Bound Free Approach to Fast and Eco-Friendly k-means

Sharma, Parichit, Stanislaw, Marcin, Kurban, Hasan, Kulekci, Oguzhan, Dalkilic, Mehmet

arXiv.org Artificial Intelligence | Aug-11-2025

This paper introduces Geometric-k-means (or Gk-means for short), a novel approach that significantly enhances the efficiency and energy economy of the widely utilized k-means algorithm, which, despite its inception over five decades ago, remains a cornerstone in machine learning applications. The essence of Gk-means lies in its active utilization of geometric principles, specifically scalar projection, to significantly accelerate the algorithm without sacrificing solution quality. This geometric strategy enables a more discerning focus on data points that are most likely to influence cluster updates, which we call high expressive data (HE). In contrast, low expressive data (LE), which does not impact the clustering outcome, is effectively bypassed, leading to considerable reductions in computational overhead. Experiments spanning synthetic, real-world, and high-dimensional datasets demonstrate that Gk-means is significantly better than traditional and state-of-the-art (SOTA) k-means variants in runtime and distance computations (DC). Moreover, Gk-means exhibits better resource efficiency, as evidenced by its reduced energy footprint, placing it as a more sustainable alternative.

artificial intelligence, gk-means, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.06353

Country: North America > United States (0.67)

Genre: Research Report > New Finding (1.00)

Industry: Energy (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
