AITopics

2411.16914

Country:

North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
North America > United States > California > Alameda County > Livermore (0.04)
Europe > Germany > Bavaria > Lower Franconia > Würzburg (0.04)

Genre: Research Report (0.50)

Industry:

Energy (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Abedsoltan, Amirhesam, Ma, Siyuan, Pandit, Parthe, Belkin, Mikhail

Fast training of large kernel models with delayed projections

arXiv.org Machine Learning Nov-25-2024

Classical kernel machines have historically faced significant challenges in scaling to large datasets and model sizes--a key ingredient that has driven the success of neural networks. In this paper, we present a new methodology for building kernel machines that can scale efficiently with both data size and model size. Our algorithm introduces delayed projections to Preconditioned Stochastic Gradient Descent (PSGD), allowing the training of much larger models than was previously feasible, pushing the practical limits of kernel-based learning.

They have also served as the foundation for understanding many significant phenomena in ... Despite these advantages, the scalability of kernel methods has remained a persistent challenge, particularly when applied to large datasets. ... this limitation is critical for expanding the utility of kernel-based techniques in modern machine learning applications.

... 2024) leverage the Nyström Approximation (NA) in combination with other strategies to enhance performance. ASkotch combines it with block coordinate descent, whereas Falkon combines it with the Conjugate Gradient ... However, these strategies are limited by model size due to memory ...

eigenpro 3, eigenpro 4, model size, (17 more...)

2411.16658

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > India (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Piran, Fardin Jalil, Chen, Zhiling, Imani, Mohsen, Imani, Farhad

Privacy-Preserving Federated Learning with Differentially Private Hyperdimensional Computing

arXiv.org Machine Learning Nov-24-2024

Federated Learning (FL) is essential for efficient data exchange in Internet of Things (IoT) environments, as it trains Machine Learning (ML) models locally and shares only model updates. However, FL is vulnerable to privacy threats like model inversion and membership inference attacks, which can expose sensitive training data. To address these privacy concerns, Differential Privacy (DP) mechanisms are often applied. Yet, adding DP noise to black-box ML models degrades performance, especially in dynamic IoT systems where continuous, lifelong FL accumulates excessive noise over time. To mitigate this issue, we introduce Federated HyperDimensional computing with Privacy-preserving (FedHDPrivacy), an eXplainable Artificial Intelligence (XAI) framework that combines the neuro-symbolic paradigm with DP. FedHDPrivacy carefully manages the balance between privacy and performance by theoretically tracking cumulative noise from previous rounds and adding only the necessary incremental noise to meet privacy requirements. In a real-world case study involving in-process monitoring of manufacturing machining operations, FedHDPrivacy demonstrates robust performance, outperforming standard FL frameworks, including Federated Averaging (FedAvg), Federated Stochastic Gradient Descent (FedSGD), Federated Proximal (FedProx), Federated Normalized Averaging (FedNova), and Federated Adam (FedAdam), by up to 38%. FedHDPrivacy also shows potential for future enhancements, such as multimodal data fusion.
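
The idea of adding only the incremental noise still needed can be illustrated with a minimal sketch. It assumes independent zero-mean Gaussian noise, so that variances add across rounds; the function name `incremental_noise_std` and this variance bookkeeping are illustrative assumptions, not the mechanism defined in the paper.

```python
import math

def incremental_noise_std(required_std: float, accumulated_std: float) -> float:
    """Standard deviation of the extra Gaussian noise needed this round.

    Assumption: noise terms are independent and zero-mean, so variances add:
    accumulated_std**2 + extra_std**2 == required_std**2.
    Returns 0.0 when the accumulated noise already meets the requirement.
    """
    gap = required_std ** 2 - accumulated_std ** 2
    return math.sqrt(gap) if gap > 0 else 0.0

# Hypothetical numbers: noise of std 3.0 is already present, std 5.0 is required.
extra = incremental_noise_std(5.0, 3.0)
print(extra)
```

Under these assumptions, a round that already carries enough noise receives none, which is the behaviour the abstract credits for avoiding excessive noise build-up.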

artificial intelligence, data mining, machine learning, (20 more...)

2411.0114

Country:

North America > United States > Connecticut > Tolland County > Storrs (0.14)
North America > United States > California > Orange County > Irvine (0.14)
North America > United States > Virginia (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Mohammadi, Hesameddin, Tinati, Mohammad, Tu, Stephen, Soltanolkotabi, Mahdi, Jovanović, Mihailo R.

Stability properties of gradient flow dynamics for the symmetric low-rank matrix factorization problem

arXiv.org Artificial Intelligence Nov-24-2024

The symmetric low-rank matrix factorization serves as a building block in many learning tasks, including matrix recovery and training of neural networks. However, despite a flurry of recent research, the dynamics of its training via non-convex factorized gradient-descent-type methods is not fully understood, especially in the over-parameterized regime where the fitted rank is higher than the true rank of the target matrix. To overcome this challenge, we characterize equilibrium points of the gradient flow dynamics and examine their local and global stability properties. To facilitate a precise global analysis, we introduce a nonlinear change of variables that brings the dynamics into a cascade connection of three subsystems whose structure is simpler than the structure of the original system. We demonstrate that the Schur complement to a principal eigenspace of the target matrix is governed by an autonomous system that is decoupled from the rest of the dynamics. In the over-parameterized regime, we show that this Schur complement vanishes at an $O(1/t)$ rate, thereby capturing the slow dynamics that arises from excess parameters. We utilize a Lyapunov-based approach to establish exponential convergence of the other two subsystems. By decoupling the fast and slow parts of the dynamics, we offer new insight into the shape of the trajectories associated with local search algorithms and provide a complete characterization of the equilibrium points and their global stability properties. Such an analysis via nonlinear control techniques may prove useful in several related over-parameterized problems.
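
For readers who want to experiment with the setting, the following is a minimal numerical sketch of an over-parameterized symmetric factorization. It assumes the objective $f(X) = \tfrac{1}{4}\|XX^T - M\|_F^2$, whose gradient flow is $\dot{X} = (M - XX^T)X$ for symmetric $M$; the paper's exact scaling may differ. The forward-Euler discretization, the diagonal target, the step size, and the names `gradient_flow`, `M`, and `X0` are all illustrative choices.

```python
import numpy as np

def gradient_flow(M, X0, step=0.01, n_steps=20000):
    """Forward-Euler discretization of dX/dt = (M - X X^T) X."""
    X = X0.copy()
    for _ in range(n_steps):
        X = X + step * (M - X @ X.T) @ X
    return X

# Rank-2 target fitted with a rank-3 factor: one excess direction,
# i.e. the over-parameterized regime.
M = np.diag([2.0, 1.0, 0.0, 0.0, 0.0])
rng = np.random.default_rng(0)
X0 = 0.1 * rng.standard_normal((5, 3))  # small random initialization

X = gradient_flow(M, X0)
residual = np.linalg.norm(X @ X.T - M)  # Frobenius norm of the fit error
print(f"residual after flow: {residual:.2e}")
```

Printing the residual at several values of `n_steps` gives a rough feel for how slowly the excess direction decays compared with the components that fit the target.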

artificial intelligence, diag, machine learning, (15 more...)

2411.15972

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

arXiv.org Artificial Intelligence Nov-24-2024

FedQP: Towards Accurate Federated Learning using Quadratic Programming Guided Mutation

Weng, Jiawen, Xia, Zeke, Li, Ran, Hu, Ming, Chen, Mingsong

Owing to its privacy-preserving advantages, Federated Learning (FL) is widely used in distributed machine learning systems. However, existing FL methods suffer from low inference performance caused by data heterogeneity. Specifically, due to heterogeneous data, the optimization directions of different local models vary greatly, making it difficult for the traditional FL method to obtain a generalized global model that performs well on all clients. As one of the state-of-the-art FL methods, the mutation-based FL method adopts a stochastic mutation strategy to guide the model training towards a well-generalized area (i.e., flat area in the loss landscape). Specifically, mutation allows the model to shift within the solution space, providing an opportunity to escape areas with poor generalization (i.e., sharp area). However, the stochastic mutation strategy easily results in diverse optimal directions of mutated models, which limits the performance of the existing mutation-based FL method. To achieve higher performance, this paper proposes a novel mutation-based FL approach named FedQP, utilizing a quadratic programming strategy to regulate the mutation directions wisely. By biasing the model mutation towards the direction of gradient update rather than traditional random mutation, FedQP can effectively guide the model to optimize towards a well-generalized area (i.e., flat area). Experiments on multiple well-known datasets show that our quadratic programming-guided mutation strategy effectively improves the inference accuracy of the global model in various heterogeneous data scenarios.

artificial intelligence, global model, machine learning, (15 more...)

2411.15847

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > Singapore (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Sichuan Province (0.04)

Genre: Research Report (0.64)

Industry:

Media (0.74)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Goulart, Paimon, Papalexakis, Evangelos E.

Can a Large Language Model Learn Matrix Functions In Context?

arXiv.org Artificial Intelligence Nov-23-2024

Large Language Models (LLMs) have demonstrated the ability to solve complex tasks through In-Context Learning (ICL), where models learn from a few input-output pairs without explicit fine-tuning. In this paper, we explore the capacity of LLMs to solve non-linear numerical computations, with specific emphasis on functions of the Singular Value Decomposition. Our experiments show that while LLMs perform comparably to traditional models such as Stochastic Gradient Descent (SGD) based Linear Regression and Neural Networks (NN) for simpler tasks, they outperform these models on more complex tasks, particularly in the case of top-k Singular Values. Furthermore, LLMs demonstrate strong scalability, maintaining high accuracy even as the matrix size increases. Additionally, we found that LLMs can achieve high accuracy with minimal prior examples, converging quickly and avoiding the overfitting seen in classical models. These results suggest that LLMs could provide an efficient alternative to classical methods for solving high-dimensional problems. Future work will focus on extending these findings to larger matrices and more complex matrix operations while exploring the effect of using different numerical representations in ICL.

large language model, machine learning, natural language, (20 more...)

2411.15675

Country:

North America > United States > California > Riverside County > Riverside (0.14)
North America > United States > New York (0.04)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Guo, Anxin, Vijayaraghavan, Aravindan

Agnostic Learning of Arbitrary ReLU Activation under Gaussian Marginals

arXiv.org Machine Learning Nov-22-2024

We consider the problem of learning an arbitrarily-biased ReLU activation (or neuron) over Gaussian marginals with the squared loss objective. Despite the ReLU neuron being the basic building block of modern neural networks, we still do not understand the basic algorithmic question of whether one arbitrary ReLU neuron is learnable in the non-realizable setting. In particular, all existing polynomial time algorithms only provide approximation guarantees for the better-behaved unbiased setting or restricted bias setting. Our main result is a polynomial time statistical query (SQ) algorithm that gives the first constant factor approximation for arbitrary bias. It outputs a ReLU activation that achieves a loss of $O(\mathrm{OPT}) + \varepsilon$ in time $\mathrm{poly}(d,1/\varepsilon)$, where $\mathrm{OPT}$ is the loss obtained by the optimal ReLU activation. Our algorithm presents an interesting departure from existing algorithms, which are all based on gradient descent and thus fall within the class of correlational statistical query (CSQ) algorithms. We complement our algorithmic result by showing that no polynomial time CSQ algorithm can achieve a constant factor approximation. Together, these results shed light on the intrinsic limitation of gradient descent, while identifying arguably the simplest setting (a single neuron) where there is a separation between SQ and CSQ algorithms.

algorithm, artificial intelligence, machine learning, (16 more...)

2411.14349

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Austria > Styria > Graz (0.04)
(20 more...)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

arXiv.org Machine Learning Nov-22-2024

AdamZ: An Enhanced Optimisation Method for Neural Network Training

Zaznov, Ilia, Badii, Atta, Dufour, Alfonso, Kunkel, Julian

In recent years, the field of machine learning has witnessed significant advancements, particularly in the development of optimisation algorithms that enhance the efficiency and effectiveness of training deep neural networks. Among these algorithms, the Adam optimiser has gained widespread popularity due to its adaptive learning rate capabilities, which enable more efficient convergence compared to traditional methods such as stochastic gradient descent. However, despite its advantages, Adam is not without its limitations, particularly when it comes to handling issues such as overshooting and stagnation during the training process. To address these challenges, we introduce AdamZ as an advanced variant of the Adam optimiser. AdamZ is specifically designed to dynamically adjust the learning rate in response to the characteristics of the loss function, thereby improving both convergence stability and model accuracy. This novel optimiser integrates mechanisms to detect and mitigate overshooting, where the optimiser has stepped too far in the parameter space, and stagnation, where progress has stalled despite ongoing training. By introducing hyperparameters such as overshoot and stagnation factors, thresholds, and patience levels, AdamZ provides a more responsive approach to learning rate adaptation than Adam does.
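
The abstract names the hyperparameters but not the update rule, so the following is only a hypothetical sketch of how overshoot and stagnation detection could drive a learning rate. The function `adjust_lr`, its detection criteria (loss exceeding the recent maximum, loss spread below a threshold), and all default values are assumptions made for illustration, not AdamZ's actual rule.

```python
from statistics import pstdev

def adjust_lr(lr, losses, overshoot_factor=0.5, stagnation_factor=1.2,
              stagnation_threshold=1e-3, patience=5,
              min_lr=1e-6, max_lr=1.0):
    """Return an adjusted learning rate given the loss history so far.

    Assumed rule (illustrative only):
    - overshoot: the newest loss exceeds every loss in the preceding
      `patience` steps, so the learning rate is shrunk;
    - stagnation: the last `patience` losses barely vary, so the
      learning rate is enlarged.
    """
    if len(losses) < patience + 1:
        return lr  # not enough history to decide
    current = losses[-1]
    previous_window = losses[-(patience + 1):-1]
    if current > max(previous_window):
        lr *= overshoot_factor
    elif pstdev(losses[-patience:]) < stagnation_threshold:
        lr *= stagnation_factor
    # Keep the learning rate within fixed bounds.
    return min(max(lr, min_lr), max_lr)

# Hypothetical loss histories.
after_overshoot = adjust_lr(0.1, [1.0, 0.9, 0.8, 0.7, 0.6, 2.0])
after_stagnation = adjust_lr(0.1, [0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
```

In this sketch the first call halves the learning rate after a loss spike and the second raises it after a flat stretch; in practice such a rule would be called once per step alongside the usual Adam update.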

artificial intelligence, deep learning, machine learning, (18 more...)

2411.15375

Country:

Europe > Germany > Lower Saxony > Gottingen (0.14)
Europe > United Kingdom > England > Berkshire > Reading (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

arXiv.org Machine Learning Nov-22-2024

Influence functions and regularity tangents for efficient active learning

Eaton, Frederik

In this paper we describe an efficient method for providing a regression model with a sense of curiosity about its data. In the field of machine learning, our framework for representing curiosity is called active learning, which means automatically choosing data points for which to query labels in the semisupervised setting. The methods we propose are based on computing a "regularity tangent" vector that can be calculated (with only a constant slow-down) together with the model's parameter vector during training. We then take the inner product of this tangent vector with the gradient vector of the model's loss at a given data point to obtain a measure of the influence of that point on the complexity of the model. There is only a single regularity tangent vector, of the same dimension as the parameter vector. Thus, in the proposed technique, once training is complete, evaluating our "curiosity" about a potential query data point can be done as quickly as calculating the model's loss gradient at that point. The new vector only doubles the amount of storage required by the model. We show that the quantity computed by our technique is an example of an "influence function", and that it measures the expected squared change in model complexity incurred by up-weighting a given data point. We propose a number of ways for using this quantity to choose new training data for a model in the framework of active learning.
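
The scoring step described above, an inner product between the regularity tangent and a per-point loss gradient, can be sketched as follows. How the tangent vector is computed during training is not covered here. The helper names `curiosity_score` and `most_curious`, and the choice to rank candidates by the magnitude of the score, are assumptions made for illustration.

```python
def curiosity_score(regularity_tangent, loss_gradient):
    """Inner product of the regularity tangent with a loss gradient."""
    if len(regularity_tangent) != len(loss_gradient):
        raise ValueError("vectors must have the same dimension")
    return sum(t * g for t, g in zip(regularity_tangent, loss_gradient))

def most_curious(regularity_tangent, candidate_gradients):
    """Index of the candidate whose score has the largest magnitude.

    Ranking by absolute value is an assumed selection rule.
    """
    scores = [curiosity_score(regularity_tangent, g)
              for g in candidate_gradients]
    return max(range(len(scores)), key=lambda i: abs(scores[i]))

# Hypothetical three-parameter model with three candidate query points.
tangent = [0.5, -1.0, 2.0]
gradients = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [2.0, 2.0, -1.0]]
best = most_curious(tangent, gradients)
```

Each score costs one gradient evaluation plus one dot product, which matches the abstract's claim that evaluating a candidate is about as fast as computing its loss gradient.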

artificial intelligence, machine learning, regularity tangent, (13 more...)

2411.15292

Country: North America > United States (0.14)

Genre: Research Report (0.40)

Industry:

Education (0.46)
Law > Intellectual Property & Technology Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.30)

Aarts, Gert, Hajizadeh, Ouraman, Lucini, Biagio, Park, Chanju

Dyson Brownian motion and random matrix dynamics of weight matrices during learning

arXiv.org Artificial Intelligence Nov-20-2024

During training, weight matrices in machine learning architectures are updated using stochastic gradient descent or variations thereof. In this contribution we employ concepts of random matrix theory to analyse the resulting stochastic matrix dynamics. We first demonstrate that the dynamics can generically be described using Dyson Brownian motion, leading to e.g. eigenvalue repulsion. The level of stochasticity is shown to depend on the ratio of the learning rate and the mini-batch size, explaining the empirically observed linear scaling rule. We verify this linear scaling in the restricted Boltzmann machine. Subsequently we study weight matrix dynamics in transformers (a nano-GPT), following the evolution from a Marchenko-Pastur distribution for eigenvalues at initialisation to a combination with additional structure at the end of learning.
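
The linear scaling rule mentioned above can be made concrete with a small sketch: if the level of stochasticity depends on the ratio of learning rate to mini-batch size, then keeping that ratio fixed when the batch size changes means scaling the learning rate proportionally. The function names `noise_ratio` and `rescale_learning_rate` and the example numbers are illustrative assumptions.

```python
def noise_ratio(learning_rate, batch_size):
    """Ratio of learning rate to mini-batch size."""
    return learning_rate / batch_size

def rescale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Learning rate that keeps noise_ratio fixed at the new batch size."""
    return base_lr * new_batch_size / base_batch_size

# Hypothetical example: going from batch size 32 to 128.
new_lr = rescale_learning_rate(0.1, 32, 128)
print(new_lr, noise_ratio(0.1, 32), noise_ratio(new_lr, 128))
```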

artificial intelligence, deep learning, machine learning, (18 more...)

2411.13512

Country:

Europe > United Kingdom (0.29)
Europe > Austria (0.14)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Upstream (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)