Appendix
We held out a validation set from the training set, and used this validation set to select the L2 regularization hyperparameter from 45 logarithmically spaced values between 10^-6 and 10^5, applied to the sum of the per-example losses. Because the optimization problem is convex, we used the previous weights as a warm start as we increased the L2 regularization hyperparameter. We measured either top-1 or mean per-class accuracy, depending on which was suggested by the dataset creators.

A.3 Fine-tuning

In our fine-tuning experiments in Table 2, we used standard ImageNet-style data augmentation and trained for 20,000 steps with SGD with momentum of 0.9 and cosine annealing [20] without restarts. Each curve represents a different model.
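The warm-started regularization sweep described above can be sketched as follows. This is a minimal numpy illustration, not the authors' actual code: it uses proximal gradient descent on a binary logistic loss (so the update stays stable even at the largest regularization values) and the 45 log-spaced values from the text; all function names are illustrative.

```python
import numpy as np

def fit_logreg_l2(X, y, lam, w0=None, lr=0.1, steps=500):
    """Proximal gradient descent on (sum of logistic losses) + lam/2 * ||w||^2.
    `w0` allows warm-starting from a previous solution; the closed-form
    shrinkage step keeps the update stable even for very large lam."""
    n, d = X.shape
    w = np.zeros(d) if w0 is None else w0.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))  # predictions
        # forward step on the data term, backward (shrinkage) step on the L2 term
        w = (w - lr * X.T @ (p - y) / n) / (1.0 + lr * lam / n)
    return w

def l2_sweep(X_tr, y_tr, X_val, y_val, lambdas):
    """Sweep over increasing L2 strengths, warm-starting each fit from the
    previous one; return the weights with the best validation accuracy."""
    w, best_w, best_acc = None, None, -1.0
    for lam in lambdas:
        w = fit_logreg_l2(X_tr, y_tr, lam, w0=w)  # warm start
        acc = np.mean((X_val @ w > 0) == y_val)
        if acc > best_acc:
            best_acc, best_w = acc, w.copy()
    return best_w, best_acc

# 45 logarithmically spaced values between 1e-6 and 1e5, as in the text
lambdas = np.logspace(-6, 5, 45)
```

Because each fit starts from the previous optimum and the penalty only grows, each warm-started solve needs far fewer iterations than a cold start would.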
Supplemental Material: CHIP: AHawkes Process Model for Continuous-time Networkswith Scalable and Consistent Estimation
A.1 Community Detection

The spectral clustering algorithm for directed networks that we consider in this paper is shown in Algorithm A.1. It can be applied either to the weighted adjacency (count) matrix N or the unweighted adjacency matrix A, where A_ij = 1{N_ij > 0} and 1{·} denotes the indicator function of its argument. This algorithm is used for the community detection step in our proposed CHIP estimation procedure. For undirected networks, which we use for the theoretical analysis in Section 4, spectral clustering is performed by running k-means clustering on the rows of the eigenvector matrix of N or A, not the rows of the concatenated singular vector matrix.

A.2 Estimation of Hawkes process parameters

Ozaki (1979) derived the log-likelihood function for Hawkes processes with exponential kernels, which takes the form

$$\log L(\mu, \alpha, \beta) = -\mu T - \frac{\alpha}{\beta} \sum_{i=1}^{n} \left[1 - e^{-\beta(T - t_i)}\right] + \sum_{i=1}^{n} \log\left[\mu + \alpha \sum_{t_j < t_i} e^{-\beta(t_i - t_j)}\right]. \tag{A.1}$$

The three parameters µ, α, β can be estimated by maximizing (A.1) using standard numerical methods for non-linear optimization (Nocedal & Wright, 2006). We provide closed-form equations for estimating m_ab = α_ab/β_ab and µ_ab in (2).
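The directed spectral clustering step described above can be sketched in numpy as follows: compute the top-k left and right singular vectors of the adjacency matrix and run k-means on the rows of the concatenated matrix [U | V]. The `kmeans` helper here is a bare-bones Lloyd's iteration used only for illustration; it is a stand-in for a library routine, not Algorithm A.1 itself.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Bare-bones Lloyd's iteration with farthest-point initialization
    (illustrative stand-in for a proper k-means implementation)."""
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])  # farthest point from current centers
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def directed_spectral_clustering(N, k):
    """Cluster nodes of a directed (count) adjacency matrix N into k groups
    by running k-means on the rows of the concatenated top-k singular
    vector matrix [U | V]."""
    U, _, Vt = np.linalg.svd(N)
    Z = np.concatenate([U[:, :k], Vt[:k].T], axis=1)  # n x 2k node embedding
    return kmeans(Z, k)
```

Using both left and right singular vectors lets the embedding capture sending and receiving behavior separately, which is why the undirected case in Section 4 falls back to eigenvectors instead.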
- North America > United States > Ohio (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Supplementary Material
$\int \nabla_\phi q_\phi(z)\,dz = 0$. Thus, the gradient of the log-variance loss becomes equal to the gradient of the KL divergence. Therefore, for large enough D, the condition from Proposition 3 (see Eq. 19) is fulfilled and the statement follows immediately. This result is expected to extend to the multivariate cases as well. For all the experiments listed in the main text, we use the VarGrad estimator for the gradients of the logistic regression models. VarGrad achieves considerable variance reduction over the adaptive (RELAX) and non-adaptive (Controlled Reinforce) model-agnostic estimators.
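As a sketch of how such an estimator can be computed, the snippet below assumes a factorized Bernoulli variational distribution parameterized by logits and estimates the gradient of the log-variance loss as the sample covariance between the score function and the centered log-weights log q − log p. The function name and setup are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def vargrad_estimate(logits, log_p, n_samples=1000, seed=0):
    """VarGrad-style gradient estimate for a factorized Bernoulli q_phi.

    logits : parameters phi of q_phi (one logit per coordinate)
    log_p  : callable mapping samples z of shape (S, d) to log p(z), shape (S,)

    Returns the sample covariance between the centered log-weights
    w_s = log q(z_s) - log p(z_s) and the score d/dphi log q_phi(z_s).
    """
    rng = np.random.default_rng(seed)
    probs = 1.0 / (1.0 + np.exp(-logits))
    z = (rng.random((n_samples, len(logits))) < probs).astype(float)
    log_q = (z * np.log(probs) + (1 - z) * np.log1p(-probs)).sum(axis=1)
    w = log_q - log_p(z)                  # log-weights
    score = z - probs                     # d log q / d logits for Bernoulli
    w_c = w - w.mean()                    # centering removes the baseline
    return (w_c[:, None] * score).sum(axis=0) / (len(w) - 1)
```

A quick sanity check: when p coincides with q, every log-weight is zero, so the estimate vanishes exactly rather than only in expectation, which is one source of its variance reduction.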
Supplement to "Learning Individualized Treatment Rules with Many Treatments: A Supervised Clustering Approach Using Adaptive Fusion"
For parametric models, we assume the linear main effect $M_0(Z) = Z^\top \eta$, where $\eta \in \mathbb{R}^p$. For nonparametric regression, we follow [3] to divide the training data into M folds based on the assigned treatment. In addition, since $\beta_i \in \Theta_n$, we have $\beta_j \in \Theta_n$ as well. Hence, with similar derivations, we have $\|\beta_i - \beta_j\|_1 = \sqrt{2\lambda_n}$. Based on Assumption 4, only the treatments that belong to the same group therefore contribute to $\Gamma_2$.
Appendix
For vision transformers, we train linear probes on representations from individual tokens or on the representation averaged over all tokens, at the output of different transformer layers (each layer meaning a full transformer block including self-attention and MLP). Moreover, ResNets differ from ViTs in that the number of channels changes throughout the model, with fewer channels in the earlier layers. We train a linear probe on each individual token and plot the average accuracy over the test set, in percent. Here we plot the results for each token at a subset of layers in 3 models: ViT-B/32 trained with a classification token (CLS) or global average pooling (GAP), as well as a ResNet50. There are two main observations to be made.
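The per-token probing setup can be sketched as follows, assuming pre-extracted features of shape (examples, tokens, dim). For simplicity the probe here is a ridge regression on one-hot labels rather than a trained logistic-regression probe; the function name and the closed-form solver are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def per_token_probe_accuracy(feats_tr, y_tr, feats_te, y_te, l2=1e-3):
    """Train one linear probe per token position and evaluate each on the
    test set.

    feats_tr, feats_te : (n_examples, n_tokens, dim) token representations
    y_tr, y_te         : integer class labels

    Returns per-token test accuracies and their mean.
    """
    n, t, d = feats_tr.shape
    classes = np.unique(y_tr)
    Y = (y_tr[:, None] == classes[None]).astype(float)  # one-hot targets
    accs = []
    for j in range(t):
        X = feats_tr[:, j]  # features of token j across all examples
        # closed-form ridge regression: (X'X + l2 I) W = X'Y
        W = np.linalg.solve(X.T @ X + l2 * np.eye(d), X.T @ Y)
        pred = classes[np.argmax(feats_te[:, j] @ W, axis=1)]
        accs.append(np.mean(pred == y_te))
    return np.array(accs), float(np.mean(accs))
```

Running this at the output of each transformer block (or ResNet stage) yields one accuracy per token per layer, which is the quantity averaged and plotted above.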