AITopics | att

Improved Inference for CSDID Using the Cluster Jackknife

Obtaining reliable inferences with traditional difference-in-differences (DiD) methods can be difficult. Problems can arise when both outcomes and errors are serially correlated, when there are few clusters or few treated clusters, when cluster sizes vary greatly, and in various other cases. In recent years, recognition of the "staggered adoption" problem has shifted the focus away from inference towards consistent estimation of treatment effects. One of the most popular new estimators is the CSDID procedure of Callaway and Sant'Anna (2021). We find that the issues of over-rejection with few clusters and/or few treated clusters are at least as severe for CSDID as for traditional DiD methods. We also propose using a cluster jackknife for inference with CSDID, which simulations suggest greatly improves inference. We provide software packages in Stata (csdidjack) and R (didjack) to calculate cluster-jackknife standard errors easily.
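To make the cluster-jackknife idea concrete, here is a minimal Python sketch. It is not the csdidjack or didjack implementation and it does not compute the CSDID estimator. It assumes a generic scalar estimator (a hypothetical `diff_in_means` stand-in) and one common variant of the jackknife variance, centred on the full-sample estimate: re-estimate with each cluster left out in turn, then scale the sum of squared deviations by (G - 1) / G, where G is the number of clusters. The function names and the simulated data are illustrative assumptions.

```python
import numpy as np


def cluster_jackknife_se(estimator, data, cluster_ids):
    """Return (estimate, cluster-jackknife standard error) for a scalar estimator.

    estimator   : callable mapping a 2-D array of rows to a float
    data        : 2-D array, one row per observation
    cluster_ids : 1-D array giving the cluster label of each row
    """
    data = np.asarray(data)
    cluster_ids = np.asarray(cluster_ids)
    clusters = np.unique(cluster_ids)
    n_clusters = len(clusters)

    theta_hat = estimator(data)
    # Re-estimate with each cluster omitted in turn.
    leave_one_out = np.array(
        [estimator(data[cluster_ids != g]) for g in clusters]
    )
    # Assumed variant: deviations are taken around the full-sample estimate.
    variance = (n_clusters - 1) / n_clusters * np.sum(
        (leave_one_out - theta_hat) ** 2
    )
    return float(theta_hat), float(np.sqrt(variance))


def diff_in_means(rows):
    """Hypothetical stand-in estimator: treated mean minus control mean.

    Column 0 holds the outcome, column 1 a 0/1 treatment indicator.
    """
    outcome, treated = rows[:, 0], rows[:, 1]
    return outcome[treated == 1].mean() - outcome[treated == 0].mean()


# Simulated example: 6 clusters of 20 observations, 3 of them treated.
rng = np.random.default_rng(0)
cluster_ids = np.repeat(np.arange(6), 20)
treated = (cluster_ids < 3).astype(float)
outcome = 1.0 * treated + rng.normal(size=cluster_ids.size)
data = np.column_stack([outcome, treated])

estimate, se = cluster_jackknife_se(diff_in_means, data, cluster_ids)
print(estimate, se)
```

With only one treated cluster, the stand-in estimator above would be undefined whenever that cluster is dropped, so the sketch is only meaningful when at least two treated and two control clusters are present.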

artificial intelligence, att, machine learning, (17 more...)

arXiv.org Machine Learning

2602.12043

Country:

North America > United States > Indiana (0.05)
North America > United States > Wisconsin (0.04)
North America > United States > South Carolina (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)


An Optimal and Scalable Matrix Mechanism for Noisy Marginals under Convex Loss Functions

Neural Information Processing Systems | Feb-10-2026, 23:16:55 GMT

We propose ResidualPlanner, a matrix mechanism for marginals with Gaussian noise that is both optimal and scalable.

data mining, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report (0.67)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Natural Language (0.67)
(2 more...)


Leveraging Predictions in Smoothed Online Convex Optimization via Gradient-based Algorithms

Neural Information Processing Systems | Feb-9-2026, 17:05:25 GMT

Since the switching costs introduce coupling across all stages, multi-step-ahead (long-term) predictions are incorporated to improve the online performance.

artificial intelligence, machine learning, rhig, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)


Differentiable Equilibrium Computation with Decision Diagrams for Stackelberg Models of Combinatorial Congestion Games

Neural Information Processing Systems | Feb-8-2026, 14:46:28 GMT

For example, the leader aims to optimize some traffic-network parameters (e.g., road width values) so that players can spend less traveling time at equilibrium.

artificial intelligence, machine learning, uninett, (18 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)


49ca03822497d26a3943d5084ed59130-Paper.pdf

Neural Information Processing Systems | Feb-8-2026, 08:05:17 GMT

factor graph, fgg, graph, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)


When do spectral gradient updates help in deep learning?

Davis, Damek, Drusvyatskiy, Dmitriy

arXiv.org Machine Learning | Dec-5-2025

Spectral gradient methods, such as the recently popularized Muon optimizer, are a promising alternative to standard Euclidean gradient descent for training deep neural networks and transformers, but it is still unclear in which regimes they are expected to perform better. We propose a simple layerwise condition that predicts when a spectral update yields a larger decrease in the loss than a Euclidean gradient step. This condition compares, for each parameter block, the squared nuclear-to-Frobenius ratio of the gradient to the stable rank of the incoming activations. To understand when this condition may be satisfied, we first prove that post-activation matrices have low stable rank at Gaussian initialization in random feature regression, feedforward networks, and transformer blocks. In spiked random feature models we then show that, after a short burn-in, the Euclidean gradient's nuclear-to-Frobenius ratio grows with the data dimension while the stable rank of the activations remains bounded, so the predicted advantage of spectral updates scales with dimension. We validate these predictions in synthetic regression experiments and in NanoGPT-scale language model training, where we find that intermediate activations have low stable rank throughout training and the corresponding gradients maintain large nuclear-to-Frobenius ratios. Together, these results identify conditions for spectral gradient methods, such as Muon, to be effective in training deep networks and transformers.
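The two quantities compared by the layerwise condition can be computed directly from singular values. The sketch below is an illustration under stated assumptions, not the authors' code: it uses the standard definitions (stable rank as squared Frobenius norm over squared spectral norm, and the squared ratio of nuclear norm to Frobenius norm), and it assumes the direction of the comparison, inferred from the abstract, is that a spectral update is predicted to help when the gradient's squared ratio exceeds the activations' stable rank. The function names are hypothetical.

```python
import numpy as np


def stable_rank(activations):
    """Squared Frobenius norm divided by squared spectral norm."""
    s = np.linalg.svd(activations, compute_uv=False)  # sorted, largest first
    return float(np.sum(s ** 2) / s[0] ** 2)


def squared_nuclear_to_frobenius(gradient):
    """Squared nuclear norm divided by squared Frobenius norm."""
    s = np.linalg.svd(gradient, compute_uv=False)
    return float(np.sum(s) ** 2 / np.sum(s ** 2))


def spectral_update_predicted_to_help(gradient, activations):
    """Assumed reading of the condition: the gradient ratio exceeds the stable rank."""
    return squared_nuclear_to_frobenius(gradient) > stable_rank(activations)


# A gradient with a flat spectrum against rank-one activations.
flat_gradient = np.eye(4)
rank_one_activations = np.ones((4, 3))
print(squared_nuclear_to_frobenius(flat_gradient))
print(stable_rank(rank_one_activations))
print(spectral_update_predicted_to_help(flat_gradient, rank_one_activations))
```

Both quantities lie between 1 and the rank of the matrix, which is what makes them directly comparable for each parameter block.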

activation, matrix, stable rank, (16 more...)

arXiv.org Machine Learning

2512.04299

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania (0.04)
(6 more...)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Achieving Equilibrium under Utility Heterogeneity: An Agent-Attention Framework for Multi-Agent Multi-Objective Reinforcement Learning

Li, Zhuhui, Luo, Chunbo, Huang, Liming, Qi, Luyu, Min, Geyong

arXiv.org Artificial Intelligence | Nov-13-2025

Multi-agent multi-objective systems (MAMOS) have emerged as powerful frameworks for modelling complex decision-making problems across various real-world domains, such as robotic exploration, autonomous traffic management, and sensor network optimisation. MAMOS offers enhanced scalability and robustness through decentralised control and more accurately reflects inherent trade-offs between conflicting objectives. In MAMOS, each agent uses utility functions that map return vectors to scalar values. Existing MAMOS optimisation methods face challenges in handling heterogeneous objective and utility function settings, where training non-stationarity is intensified due to private utility functions and the associated policies. In this paper, we first theoretically prove that direct access to, or structured modelling of, global utility functions is necessary for the Bayesian Nash Equilibrium (BNE) under decentralised execution constraints. To access the global utility functions while preserving the decentralised execution, we propose an Agent-Attention Multi-Agent Multi-Objective Reinforcement Learning (AA-MAMORL) framework. Our approach implicitly learns a joint belief over other agents' utility functions and their associated policies during centralised training, effectively mapping global states and utilities to each agent's policy. In execution, each agent independently selects actions based on local observations and its private utility function to approximate a BNE, without relying on inter-agent communication. We conduct comprehensive experiments in both a custom-designed MAMO Particle environment and the standard MOMALand benchmark. The results demonstrate that access to global preferences and our proposed AA-MAMORL significantly improve performance and consistently outperform state-of-the-art methods.

agent, artificial intelligence, game theory, (16 more...)

arXiv.org Artificial Intelligence

2511.08926

Country: Asia (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Energy (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.94)


AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs

He, Di, Tu, Songjun, Jaiswal, Ajay, Shen, Li, Yuan, Ganzhao, Liu, Shiwei, Yin, Lu

arXiv.org Artificial Intelligence | Nov-6-2025

Weight decay is a standard regularization technique for training large language models (LLMs). While it is common to assign a uniform decay rate to every layer, this approach overlooks the structural diversity of LLMs and the varying spectral properties across modules. In this paper, we introduce AlphaDecay, a simple yet effective method that adaptively assigns different weight decay strengths to each module of an LLM. Our approach is guided by Heavy-Tailed Self-Regularization (HT-SR) theory, which analyzes the empirical spectral density (ESD) of weight correlation matrices to quantify "heavy-tailedness." Modules exhibiting more pronounced heavy-tailed ESDs, reflecting stronger feature learning, are assigned weaker decay, while modules with lighter-tailed spectra receive stronger decay. Our method leverages tailored weight decay assignments to balance the module-wise differences in spectral properties, leading to improved performance. Extensive pre-training tasks with various model sizes from 60M to 1B demonstrate that AlphaDecay achieves better perplexity and generalization than conventional uniform decay and other adaptive decay baselines. Our code is available at https://github.com/hed-ucas/AlphaDecay.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2506.14562

Country: Asia (0.28)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)



eef6aecfe050b556c6a48d9c16b15558-Paper-Conference.pdf

MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition

Improved Inference for CSDID Using the Cluster Jackknife
