AITopics | sigmoid 0


sigmoid 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Appendix

Neural Information Processing Systems, Feb-11-2026, 20:40:53 GMT

We held out a validation set from the training set, and used this validation set to select the L2 regularization hyperparameter, which we selected from 45 logarithmically spaced values between 10^-6 and 10^5, applied to the sum of the per-example losses. Because the optimization problem is convex, we used the previous weights as a warm start as we increased the L2 regularization hyperparameter. We measured either top-1 or mean per-class accuracy, depending on which was suggested by the dataset creators.

A.3 Fine-tuning

In our fine-tuning experiments in Table 2, we used standard ImageNet-style data augmentation and trained for 20,000 steps with SGD with momentum of 0.9 and cosine annealing [20] without restarts. Each curve represents a different model.
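The hyperparameter grid and the learning-rate schedule described in this excerpt can be sketched in a few lines. This is only an illustration: it reads the grid endpoints as 10^-6 and 10^5, the function names are hypothetical, and `base_lr=0.1` is an assumed value that the excerpt does not state.

```python
import math

def log_spaced(low_exp, high_exp, count):
    """Return `count` values evenly spaced in log10 from 10**low_exp to 10**high_exp."""
    step = (high_exp - low_exp) / (count - 1)
    return [10.0 ** (low_exp + i * step) for i in range(count)]

def cosine_annealed_lr(step, total_steps, base_lr):
    """Cosine annealing without restarts: base_lr at step 0, decaying to 0 at total_steps."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))

# 45 candidate L2 strengths, assuming the range is 1e-6 to 1e5.
l2_grid = log_spaced(-6, 5, 45)

# Learning rate halfway through a 20,000-step run (base_lr is an assumed value).
lr_mid = cosine_annealed_lr(10_000, 20_000, base_lr=0.1)
```

Sweeping `l2_grid` in increasing order is what makes the warm start mentioned in the excerpt natural: each fit can start from the weights of the previous, slightly less regularized fit.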

accuracy, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)


Supplementary Contents

Neural Information Processing Systems, Feb-9-2026, 08:24:33 GMT

The authors T admits singular value ( i, i, i) i ∈ I for some I, with 1 = 0 1 .. , i : X → R and i : Z → R, i.e. T i = i i and T i = i i. Moreo T operator as: Th = X Figure 5: Estimated rbf kernel with = .1 and 1000 samples. Then the estimator presented in Equation (4) satisfies that w.p. 1 : ‖T(ĥ − h0)‖2 ≤ O( r rs log (pn) n + r log ( 1/ ) n )

artificial intelligence, equation, machine learning, (19 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)


TwinTURBO: Semi-Supervised Fine-Tuning of Foundation Models via Mutual Information Decompositions for Downstream Task and Latent Spaces

Quétant, Guillaume, Molchanov, Pavlo, Voloshynovskiy, Slava

arXiv.org Machine Learning, Mar-10-2025

Foundation models are large-scale neural networks pre-trained on diverse data to learn general-purpose representations that can be fine-tuned for specific downstream tasks. This poses significant challenges, especially in the case of low-labelled data, a semi-supervised learning setting where only a small fraction of the data samples are labelled, while the majority remain unlabelled. While foundation models are pre-trained on large datasets in a self-supervised manner, their deployment often requires fine-tuning on new datasets with limited labelled samples and potential distribution shifts. Furthermore, the downstream tasks frequently differ from the pre-training objectives, complicating the adaptation process. Existing semi-supervised approaches, such as pseudo-labelling, rely heavily on assumptions about data distributions or task-specific tuning, limiting their generalisability. Addressing these challenges is essential to fully exploit the potential of foundation models and ensure their adaptability and scalability in diverse applications. The main contributions of this study are: A new framework for foundation model fine-tuning: We introduce a fine-tuning strategy based on mutual information decomposition.

dataset, equation, foundation model, (13 more...)

arXiv.org Machine Learning

2503.07851

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Automated Design of Linear Bounding Functions for Sigmoidal Nonlinearities in Neural Networks

König, Matthias, Zhang, Xiyue, Hoos, Holger H., Kwiatkowska, Marta, van Rijn, Jan N.

arXiv.org Artificial Intelligence, Jun-14-2024

The ubiquity of deep learning algorithms in various applications has amplified the need for assuring their robustness against small input perturbations such as those occurring in adversarial attacks. Existing complete verification techniques offer provable guarantees for all robustness queries but struggle to scale beyond small neural networks. To overcome this computational intractability, incomplete verification methods often rely on convex relaxation to over-approximate the nonlinearities in neural networks. Progress in tighter approximations has been achieved for piecewise linear functions. However, robustness verification of neural networks for general activation functions (e.g., Sigmoid, Tanh) remains under-explored and poses new challenges. Typically, these networks are verified using convex relaxation techniques, which involve computing linear upper and lower bounds of the nonlinear activation functions. In this work, we propose a novel parameter search method to improve the quality of these linear approximations. Specifically, we show that using a simple search method, carefully adapted to the given verification problem through state-of-the-art algorithm configuration techniques, improves the global lower bound by 25% on average over the current state of the art on several commonly used local robustness verification benchmarks.
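To make "linear upper and lower bounds of the nonlinear activation functions" concrete, here is a minimal sketch for the sigmoid on an interval [l, u] that lies entirely on one side of 0, where the function is purely convex (u <= 0) or purely concave (l >= 0). It is not the search method proposed in the paper: the function names are hypothetical, the tangent point is simply fixed at the midpoint, and intervals that straddle 0 are deliberately rejected.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear_bounds(l, u):
    """Return ((a_lo, b_lo), (a_up, b_up)) such that
    a_lo*x + b_lo <= sigmoid(x) <= a_up*x + b_up for all x in [l, u].

    Sketch only: requires l < u and an interval on one side of 0.
    """
    if l >= u:
        raise ValueError("need l < u")
    if l < 0.0 < u:
        raise ValueError("mixed-curvature interval not handled in this sketch")
    # Chord through the two endpoints.
    a_chord = (sigmoid(u) - sigmoid(l)) / (u - l)
    b_chord = sigmoid(l) - a_chord * l
    # Tangent at the midpoint; the tangent point is a free parameter.
    m = 0.5 * (l + u)
    s = sigmoid(m)
    a_tan = s * (1.0 - s)          # derivative of sigmoid at m
    b_tan = s - a_tan * m
    if u <= 0.0:
        # Convex region: tangent lies below the curve, chord lies above it.
        return (a_tan, b_tan), (a_chord, b_chord)
    # Concave region: chord lies below the curve, tangent lies above it.
    return (a_chord, b_chord), (a_tan, b_tan)
```

The midpoint is an arbitrary choice here. Where the tangent touches the curve changes how tight the relaxation is, which is the sort of parameter a search procedure could tune.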

activation function, neural network, tangent point, (14 more...)

arXiv.org Artificial Intelligence

2406.10154

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
Europe > Germany (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)


Machine Learning Techniques with Fairness for Prediction of Completion of Drug and Alcohol Rehabilitation

Roberts-Licklider, Karen, Trafalis, Theodore

arXiv.org Artificial Intelligence, Apr-23-2024

The aim of this study is to predict whether a person will complete a drug and alcohol rehabilitation program and the number of times a person attends. The study is based on demographic data obtained from the Substance Abuse and Mental Health Services Administration (SAMHSA), drawn from both admissions and discharge data from drug and alcohol rehabilitation centers in Oklahoma. Demographic data is highly categorical, which led to binary encoding being used and various fairness measures being utilized to mitigate bias in nine demographic variables. Kernel methods such as linear, polynomial, sigmoid, and radial basis functions were compared using support vector machines at various parameter ranges to find the optimal values. These were then compared to methods such as decision trees, random forests, and neural networks. Synthetic Minority Oversampling Technique Nominal (SMOTEN) for categorical data was used to balance the data with imputation for missing data. The nine bias variables were then intersectionalized to mitigate bias and the dual and triple interactions were integrated to use the probabilities to look at worst case ratio fairness mitigation. Disparate Impact, Statistical Parity difference, Conditional Statistical Parity Ratio, Demographic Parity, Demographic Parity Ratio, Equalized Odds, Equalized Odds Ratio, Equal Opportunity, and Equalized Opportunity Ratio were all explored in both the binary and multiclass scenarios.
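Two of the fairness measures named above have simple definitions in the binary case, sketched below under the usual conventions: statistical parity difference is the gap in positive-prediction rates between an unprivileged and a privileged group, and disparate impact is the ratio of those rates. The function names and the toy data are hypothetical and are not taken from the study.

```python
def selection_rate(predictions, groups, group):
    """Fraction of positive predictions (1) among members of `group`."""
    picked = [p for p, g in zip(predictions, groups) if g == group]
    return sum(picked) / len(picked)

def statistical_parity_difference(predictions, groups, unprivileged, privileged):
    """Rate(unprivileged) - Rate(privileged); 0 means parity."""
    return (selection_rate(predictions, groups, unprivileged)
            - selection_rate(predictions, groups, privileged))

def disparate_impact(predictions, groups, unprivileged, privileged):
    """Rate(unprivileged) / Rate(privileged); 1 means parity."""
    return (selection_rate(predictions, groups, unprivileged)
            / selection_rate(predictions, groups, privileged))

# Toy binary predictions for two hypothetical groups "a" and "b".
preds = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
spd = statistical_parity_difference(preds, groups, unprivileged="b", privileged="a")
di = disparate_impact(preds, groups, unprivileged="b", privileged="a")
```

In the toy data, group "a" receives a positive prediction three times out of four and group "b" once out of four, so both measures indicate a gap against "b".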

machine learning technique, unfair 1, unfair 10, (12 more...)

arXiv.org Artificial Intelligence

2404.15418

Country:

North America > United States > Oklahoma (0.25)
North America > Canada (0.04)

Genre: Research Report (0.70)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Kernelized Concept Erasure

Ravfogel, Shauli, Vargas, Francisco, Goldberg, Yoav, Cotterell, Ryan

arXiv.org Artificial Intelligence, Sep-6-2023

The representation space of neural models for textual data emerges in an unsupervised manner during training. Understanding how those representations encode human-interpretable concepts is a fundamental problem. One prominent approach for the identification of concepts in neural representations is searching for a linear subspace whose erasure prevents the prediction of the concept from the representations. However, while many linear erasure algorithms are tractable and interpretable, neural networks do not necessarily represent concepts in a linear manner. To identify non-linearly encoded concepts, we propose a kernelization of a linear minimax game for concept erasure. We demonstrate that it is possible to prevent specific non-linear adversaries from predicting the concept. However, the protection does not transfer to different nonlinear adversaries. Therefore, exhaustively erasing a non-linearly encoded concept remains an open problem.

kernel, poly 0, representation, (17 more...)

arXiv.org Artificial Intelligence

2201.12191

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Oceania > Australia (0.04)
(6 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Mode-wise Principal Subspace Pursuit and Matrix Spiked Covariance Model

Tang, Runshi, Yuan, Ming, Zhang, Anru R.

arXiv.org Artificial Intelligence, Jul-2-2023

In modern scientific applications, data are often observed in the form of multiple matrices or tensors that pertain to different subjects from a certain population. For instance, longitudinal gene expression data consist of a matrix of gene expression levels across time for each subject (Liu et al., 2017); MRI imaging data contain one order-3 tensor image for each patient (Zhou et al., 2013); a multilayer network can be represented by an order-3 tensor, where each layer (i.e., a matrix) represents one network (Jing et al., 2021); an m-uniform hypergraph is typically viewed as an order-m tensor, whose entries denote all hyper-edges (Zhen & Wang, 2022); atomic-resolution 4D scanning transmission electron microscopy data can be expressed as an order-3 tensor with two modes denoting scan location and the other denoting the convergent beam electron diffraction pattern (Zhang et al., 2020). Combining information from all subjects results in a high-order tensor with subject independence along one mode and some covariance structure along the other modes that represent the relationship among the measured covariates. Principal Component Analysis (PCA) is a widely accepted method for analyzing data consisting of vectors associated with individual subjects. Its primary objective is to identify a lower-dimensional subspace within the feature domain that captures the majority of data variance (Pearson, 1901).

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2307.00575

Country:

Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Overview (1.00)
Research Report (0.81)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (0.67)
Health & Medicine > Therapeutic Area > Immunology (0.67)
(4 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)


Dimension-Free Average Treatment Effect Inference with Deep Neural Networks

Du, Xinze, Fan, Yingying, Lv, Jinchi, Sun, Tianshu, Vossler, Patrick

arXiv.org Machine Learning, Dec-2-2021

This paper investigates the estimation and inference of the average treatment effect (ATE) using deep neural networks (DNNs) in the potential outcomes framework. Under some regularity conditions, the observed response can be formulated as the response of a mean regression problem with both the confounding variables and the treatment indicator as the independent variables. Using such formulation, we investigate two methods for ATE estimation and inference based on the estimated mean regression function via DNN regression using a specific network architecture. We show that both DNN estimates of ATE are consistent with dimension-free consistency rates under some assumptions on the underlying true mean regression model. Our model assumptions accommodate the potentially complicated dependence structure of the observed response on the covariates, including latent factors and nonlinear interactions between the treatment indicator and confounding variables. We also establish the asymptotic normality of our estimators based on the idea of sample splitting, ensuring precise inference and uncertainty quantification. Simulation studies and real data application justify our theoretical findings and support our DNN estimation and inference methods.
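One common way to turn an estimated mean regression function into an ATE estimate is the plug-in average of predicted outcomes under treatment minus predicted outcomes under control. The sketch below shows only that generic form; it is not claimed to be either of the two estimators studied in the paper, and `mu_hat` is a hypothetical hand-written stand-in for a fitted DNN regression.

```python
def plug_in_ate(mu, covariates):
    """Average of mu(x, 1) - mu(x, 0) over the observed covariates."""
    diffs = [mu(x, 1) - mu(x, 0) for x in covariates]
    return sum(diffs) / len(diffs)

# Hypothetical fitted regression with a treatment-covariate interaction;
# in practice this would be the trained network's prediction function.
def mu_hat(x, t):
    return 1.0 + 0.5 * x + t * (2.0 + 0.25 * x)

xs = [0.0, 1.0, 2.0, 3.0]        # toy one-dimensional confounder values
ate = plug_in_ate(mu_hat, xs)
```

Because the treatment indicator enters the regression as an ordinary input, the same fitted function is evaluated twice per observation, once with t = 1 and once with t = 0.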

estimate sigmoid 0, relu 0, sigmoid 0, (14 more...)

arXiv.org Machine Learning

2112.01574

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)


Eliminating Multicollinearity Issues in Neural Network Ensembles: Incremental, Negatively Correlated, Optimal Convex Blending

Lagari, Pola Lydia, Tsoukalas, Lefteri H., Safarkhani, Salar, Lagaris, Isaac E.

arXiv.org Artificial Intelligence, Apr-29-2021

Given a {features, target} dataset, we introduce an incremental algorithm that constructs an aggregate regressor, using an ensemble of neural networks. It is well known that ensemble methods suffer from the multicollinearity issue, which is the manifestation of redundancy arising mainly due to the common training dataset. In the present incremental approach, at each stage we optimally blend the aggregate regressor with a newly trained neural network under a convexity constraint which, if necessary, induces negative correlations. Under this framework, collinearity issues do not arise at all, rendering the method both accurate and robust.
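The blending step can be illustrated with a single coefficient beta in [0, 1] that mixes the current aggregate prediction with a new network's prediction so as to minimise squared error. This is a sketch of that one step under a squared-error assumption, with a hypothetical function name; it does not reproduce how the paper trains the new network or induces negative correlations.

```python
def optimal_convex_blend(aggregate, candidate, targets):
    """Return beta in [0, 1] minimising
    sum((beta * a + (1 - beta) * c - y) ** 2) over the data.

    The unconstrained minimiser is <y - c, a - c> / ||a - c||^2; since the
    objective is a one-dimensional convex quadratic in beta, clipping that
    value to [0, 1] gives the constrained optimum.
    """
    num = sum((y - c) * (a - c) for a, c, y in zip(aggregate, candidate, targets))
    den = sum((a - c) ** 2 for a, c in zip(aggregate, candidate))
    if den == 0.0:
        return 1.0  # identical predictions: every beta gives the same blend
    return min(1.0, max(0.0, num / den))

# Toy predictions: the targets sit exactly halfway between the two models.
agg = [1.0, 2.0, 3.0]
new = [3.0, 4.0, 5.0]
y = [2.0, 3.0, 4.0]
beta = optimal_convex_blend(agg, new, y)
blended = [beta * a + (1.0 - beta) * c for a, c in zip(agg, new)]
```

Keeping beta inside [0, 1] means the blend is always a convex combination of the two predictors, so no member can receive a negative or runaway weight.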

aggregate network, equidistant point, sigmoid 0, (11 more...)

arXiv.org Artificial Intelligence

2104.14715

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.05)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Hebbian-Descent

Melchior, Jan, Wiskott, Laurenz

arXiv.org Machine Learning, May-25-2019

In this work we propose Hebbian-descent as a biologically plausible learning rule for hetero-associative as well as auto-associative learning in single layer artificial neural networks. It can be used as a replacement for gradient descent as well as Hebbian learning, in particular in online learning, as it inherits their advantages while not suffering from their disadvantages. We discuss the drawbacks of Hebbian learning as having problems with correlated input data and not profiting from seeing training patterns several times. For gradient descent we identify the derivative of the activation function as problematic especially in online learning. Hebbian-descent addresses these problems by getting rid of the activation function's derivative and by centering, i.e. keeping the neural activities mean free, leading to a biologically plausible update rule that is provably convergent, does not suffer from the vanishing error term problem, can deal with correlated data, profits from seeing patterns several times, and enables successful online learning when centering is used. We discuss its relationship to Hebbian learning, contrastive learning, and gradient descent and show that in case of a strictly positive derivative of the activation function Hebbian-descent leads to the same update rule as gradient descent but for a different loss function. In this case Hebbian-descent inherits the convergence properties of gradient descent, but we also show empirically that it converges when the derivative of the activation function is only non-negative, such as for the step function for example. Furthermore, in case of the mean squared error loss Hebbian-descent can be understood as the difference between two Hebb-learning steps, which in case of an invertible and integrable activation function actually optimizes a generalized linear model. ...
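The difference between the two update rules can be seen on a single sigmoid unit. The sketch below follows the abstract's description (centred input, activation derivative dropped) and assumes a squared-error loss of 0.5 * (h - t)^2 for the gradient-descent step. The function names, the fixed mean vector `mu`, and the toy values are hypothetical, and only the weight update is shown.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def weight_updates(w, b, x, mu, target, lr):
    """Return (gradient_descent_step, hebbian_descent_style_step) for one sigmoid unit."""
    xc = [xi - mi for xi, mi in zip(x, mu)]            # centred input
    a = sum(wi * xi for wi, xi in zip(w, xc)) + b       # pre-activation
    h = sigmoid(a)
    err = h - target
    # Squared-error gradient step: includes sigmoid'(a) = h * (1 - h).
    gd = [-lr * xi * err * h * (1.0 - h) for xi in xc]
    # Same step with the derivative factor removed.
    hd = [-lr * xi * err for xi in xc]
    return gd, hd

gd_step, hd_step = weight_updates(
    w=[0.0, 0.0], b=0.0, x=[1.0, 2.0], mu=[0.5, 0.5], target=1.0, lr=0.1
)
```

For a sigmoid unit, the derivative-free step has the same form as the gradient of the cross-entropy loss, which is one instance of the abstract's remark that the rule matches gradient descent on a different loss. When the unit saturates, h * (1 - h) approaches 0 and the squared-error step shrinks even if the error is large, while the derivative-free step does not.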

artificial intelligence, hebbian-descent, machine learning, (16 more...)

arXiv.org Machine Learning

1905.10585

Country:

North America > Canada (0.27)
North America > United States (0.27)

Genre: Research Report > New Finding (0.67)

Industry:

Education > Educational Setting > Online (0.89)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
