AITopics

2407.12056

Country:

Europe > France (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

arXiv.org Artificial IntelligenceJul-8-2024

Learning by the F-adjoint

Boughammoura, Ahmed

A recent paper by Boughammoura (2023) describes the back-propagation algorithm in terms of an alternative formulation called the F-adjoint method. In particular, by the F-adjoint algorithm the computation of the loss gradient, with respect to each weight within the network, is straightforward and can simply be done. In this work, we develop and investigate this theoretical framework to improve some supervised learning algorithm for feed-forward neural network. Our main result is that by introducing some neural dynamical model combined by the gradient descent algorithm, we derived an equilibrium F-adjoint process which yields to some local learning rule for deep feed-forward networks setting. Experimental results on MNIST and Fashion-MNIST datasets, demonstrate that the proposed approach provide a significant improvements on the standard back-propagation training procedure.

algorithm, learning rule, neural network, (16 more...)

2407.11049

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)
Africa > Middle East > Tunisia > Monastir Governorate > Monastir (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.66)

arXiv.org Artificial IntelligenceJul-4-2024

Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning

Nguyen, Thong, Bin, Yi, Wu, Xiaobao, Dong, Xinshuai, Hu, Zhiyuan, Le, Khoi, Nguyen, Cong-Duy, Ng, See-Kiong, Tuan, Luu Anh

Data quality stands at the forefront of deciding the effectiveness of video-language representation learning. However, video-text pairs in previous data typically do not align perfectly with each other, which might lead to video-language representations that do not accurately reflect cross-modal semantics. Moreover, previous data also possess an uneven distribution of concepts, thereby hampering the downstream performance across unpopular subjects. To address these problems, we propose a contrastive objective with a subtractive angular margin to regularize cross-modal representations in their effort to reach perfect similarity. Furthermore, to adapt to the non-uniform concept distribution, we propose a multi-layer perceptron (MLP)-parameterized weighting function that maps loss values to sample weights which enable dynamic adjustment of the model's focus throughout the training. With the training guided by a small amount of unbiased meta-data and augmented by video-text data generated by large vision-language model, we improve video-language representations and achieve superior performances on commonly used video question answering and text-video retrieval datasets.

arxiv preprint arxiv, proceedings, representation, (9 more...)

2407.03788

Country:

Asia > Singapore (0.05)
North America > United States (0.04)
Europe > Hungary (0.04)
Asia > Vietnam (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Reinhardt, Eric A. F., Gleyzer, Sergei

SineKAN: Kolmogorov-Arnold Networks Using Sinusoidal Activation Functions

arXiv.org Artificial IntelligenceJul-4-2024

Recent work has established an alternative to traditional multi-layer perceptron neural networks in the form of Kolmogorov-Arnold Networks (KAN). The general KAN framework uses learnable activation functions on the edges of the computational graph followed by summation on nodes. The learnable edge activation functions in the original implementation are basis spline functions (B-Spline). Here, we present a model in which learnable grids of B-Spline activation functions can be replaced by grids of re-weighted sine functions. We show that this leads to better or comparable numerical performance to B-Spline KAN models on the MNIST benchmark, while also providing a substantial speed increase on the order of 4-9 times.

activation function, kolmogorov-arnold network, sinekan, (16 more...)

2407.04149

Country:

North America > United States > Alabama > Tuscaloosa County > Tuscaloosa (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Koenig, Benjamin C., Kim, Suyong, Deng, Sili

KAN-ODEs: Kolmogorov-Arnold Network Ordinary Differential Equations for Learning Dynamical Systems and Hidden Physics

arXiv.org Artificial IntelligenceJul-4-2024

Kolmogorov-Arnold Networks (KANs) as an alternative to Multi-layer perceptrons (MLPs) are a recent development demonstrating strong potential for data-driven modeling. This work applies KANs as the backbone of a Neural Ordinary Differential Equation framework, generalizing their use to the time-dependent and grid-sensitive cases often seen in scientific machine learning applications. The proposed KAN-ODEs retain the flexible dynamical system modeling framework of Neural ODEs while leveraging the many benefits of KANs, including faster neural scaling, stronger interpretability, and lower parameter counts when compared against MLPs. We demonstrate these benefits in three test cases: the Lotka-Volterra predator-prey model, Burgers' equation, and the Fisher-KPP PDE. We showcase the strong performance of parameter-lean KAN-ODE systems generally in reconstructing entire dynamical systems, and also in targeted applications to the inference of a source term in an otherwise known flow field. We additionally demonstrate the interpretability of KAN-ODEs via activation function visualization and symbolic regression of trained results. The successful training of KAN-ODEs and their improved performance when compared to traditional Neural ODEs implies significant potential in leveraging this novel network architecture in myriad scientific machine learning applications.

equation, kan-ode, neural ode, (12 more...)

2407.04192

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

Kostic, Vladimir R., Lounici, Karim, Pacreau, Gregoire, Novelli, Pietro, Turri, Giacomo, Pontil, Massimiliano

Neural Conditional Probability for Inference

arXiv.org Machine LearningJul-1-2024

We introduce NCP (Neural Conditional Probability), a novel operator-theoretic approach for learning conditional distributions with a particular focus on inference tasks. NCP can be used to build conditional confidence regions and extract important statistics like conditional quantiles, mean, and covariance. It offers streamlined learning through a single unconditional training phase, facilitating efficient inference without the need for retraining even when conditioning changes. By tapping into the powerful approximation capabilities of neural networks, our method efficiently handles a wide variety of complex probability distributions, effectively dealing with nonlinear relationships between input and output variables. Theoretical guarantees ensure both optimization consistency and statistical accuracy of the NCP method. Our experiments show that our approach matches or beats leading methods using a simple Multi-Layer Perceptron (MLP) with two hidden layers and GELU activations. This demonstrates that a minimalistic architecture with a theoretically grounded loss function can achieve competitive results without sacrificing performance, even in the face of more complex architectures.

confidence interval, density estimation, estimation, (17 more...)

2407.01171

Country:

North America > United States > New York (0.04)
Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.81)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

arXiv.org Machine LearningJul-1-2024

Biology-inspired joint distribution neurons based on Hierarchical Correlation Reconstruction allowing for multidirectional neural networks

Duda, Jarek

Biological neural networks seem qualitatively superior (e.g. in learning, flexibility, robustness) from current artificial like Multi-Layer Perceptron (MLP) or Kolmogorov-Arnold Network (KAN). Simultaneously, in contrast to them: have fundamentally multidirectional signal propagation~\cite{axon}, also of probability distributions e.g. for uncertainty estimation, and are believed not being able to use standard backpropagation training~\cite{backprop}. There are proposed novel artificial neurons based on HCR (Hierarchical Correlation Reconstruction) removing the above low level differences: with neurons containing local joint distribution model (of its connections), representing joint density on normalized variables as just linear combination among $(f_\mathbf{j})$ orthonormal polynomials: $\rho(\mathbf{x})=\sum_{\mathbf{j}\in B} a_\mathbf{j} f_\mathbf{j}(\mathbf{x})$ for $\mathbf{x} \in [0,1]^d$ and $B$ some chosen basis, with basis growth approaching complete description of joint distribution. By various index summations of such $(a_\mathbf{j})$ tensor as neuron parameters, we get simple formulas for e.g. conditional expected values for propagation in any direction, like $E[x|y,z]$, $E[y|x]$, which degenerate to KAN-like parametrization if restricting to pairwise dependencies. Such HCR network can also propagate probability distributions (also joint) like $\rho(y,z|x)$. It also allows for additional training approaches, like direct $(a_\mathbf{j})$ estimation, through tensor decomposition, or more biologically plausible information bottleneck training: layers directly influencing only neighbors, optimizing content to maximize information about the next layer, and minimizing about the previous to minimize the noise.

neural network, neuron, normalization, (16 more...)

2405.05097

Country:

Europe > Poland > Lesser Poland Province > Kraków (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

arXiv.org Machine LearningJun-30-2024

Structured and Balanced Multi-component and Multi-layer Neural Networks

Zhang, Shijun, Zhao, Hongkai, Zhong, Yimin, Zhou, Haomin

In this work, we propose a balanced multi-component and multi-layer neural network (MMNN) structure to approximate functions with complex features with both accuracy and efficiency in terms of degrees of freedom and computation cost. The main idea is motivated by a multi-component, each of which can be approximated effectively by a single-layer network, and multi-layer decomposition in a "divide-and-conquer" type of strategy to deal with a complex function. While an easy modification to fully connected neural networks (FCNNs) or multi-layer perceptrons (MLPs) through the introduction of balanced multi-component structures in the network, MMNNs achieve a significant reduction of training parameters, a much more efficient training process, and a much improved accuracy compared to FCNNs or MLPs. Extensive numerical experiments are presented to illustrate the effectiveness of MMNNs in approximating high oscillatory functions and its automatic adaptivity in capturing localized features.

mmnn, neural network, target function, (16 more...)

2407.00765

Country:

North America > United States > North Carolina > Durham County > Durham (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

arXiv.org Machine LearningJun-27-2024

Multi-Epoch learning with Data Augmentation for Deep Click-Through Rate Prediction

Fan, Zhongxiang, Liu, Zhaocheng, Liang, Jian, Kong, Dongying, Li, Han, Jiang, Peng, Li, Shuang, Gai, Kun

This paper investigates the one-epoch overfitting phenomenon in Click-Through Rate (CTR) models, where performance notably declines at the start of the second epoch. Despite extensive research, the efficacy of multi-epoch training over the conventional one-epoch approach remains unclear. We identify the overfitting of the embedding layer, caused by high-dimensional data sparsity, as the primary issue. To address this, we introduce a novel and simple Multi-Epoch learning with Data Augmentation (MEDA) framework, suitable for both non-continual and continual learning scenarios, which can be seamlessly integrated into existing deep CTR models and may have potential applications to handle the "forgetting or overfitting" dilemma in the retraining and the well-known catastrophic forgetting problems. MEDA minimizes overfitting by reducing the dependency of the embedding layer on subsequent training data or the Multi-Layer Perceptron (MLP) layers, and achieves data augmentation through training the MLP with varied embedding spaces. Our findings confirm that pre-trained MLP layers can adapt to new embedding spaces, enhancing performance without overfitting. This adaptability underscores the MLP layers' role in learning a matching function focused on the relative relationships among embeddings rather than their absolute positions. To our knowledge, MEDA represents the first multi-epoch training strategy tailored for deep CTR prediction models. We conduct extensive experiments on several public and business datasets, and the effectiveness of data augmentation and superiority over conventional single-epoch training are fully demonstrated. Besides, MEDA has exhibited significant benefits in a real-world online advertising system.

dataset, epoch, meda, (14 more...)

2407.01607

Country:

Asia > China > Shandong Province > Dongying (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Italy (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

Kundu, Akash, Sarkar, Aritra, Sadhu, Abhishek

KANQAS: Kolmogorov Arnold Network for Quantum Architecture Search

arXiv.org Artificial IntelligenceJun-25-2024

Quantum architecture search~(QAS) is a promising direction for optimization and automated design of quantum circuits towards quantum advantage. Recent techniques in QAS focus on machine learning-based approaches from reinforcement learning, like deep Q-network. While multi-layer perceptron-based deep Q-networks have been applied for QAS, their interpretability remains challenging due to the high number of parameters. In this work, we evaluate the practicality of KANs in quantum architecture search problems, analyzing their efficiency in terms of the probability of success, frequency of optimal solutions and their dependencies on various degrees of freedom of the network. In a noiseless scenario, the probability of success and the number of optimal quantum circuit configurations to generate the multi-qubit maximally entangled states are significantly higher than MLPs. Moreover in noisy scenarios, KAN can achieve a better fidelity in approximating maximally entangled state than MLPs, where the performance of the MLP significantly depends on the choice of activation function. Further investigation reveals that KAN requires a very small number of learnable parameters compared to MLPs, however, the average time of executing each episode for KAN is much higher.

arxiv preprint arxiv, kan, probability, (13 more...)

2406.1763

Country:

Europe > Poland (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)