AITopics | Perceptrons

Collaborating Authors

Perceptrons

News Overviews Instructional Materials AI-Alerts Classics

BaB-ND: Long-Horizon Motion Planning with Branch-and-Bound and Neural Dynamics

Shen, Keyi, Yu, Jiangwei, Zhang, Huan, Li, Yunzhu

arXiv.org Artificial IntelligenceDec-12-2024

Neural-network-based dynamics models learned from observational data have shown strong predictive capabilities for scene dynamics in robotic manipulation tasks. However, their inherent non-linearity presents significant challenges for effective planning. Current planning methods, often dependent on extensive sampling or local gradient descent, struggle with long-horizon motion planning tasks involving complex contact events. In this paper, we present a GPU-accelerated branch-and-bound (BaB) framework for motion planning in manipulation tasks that require trajectory optimization over neural dynamics models. Our approach employs a specialized branching heuristics to divide the search space into subdomains, and applies a modified bound propagation method, inspired by the state-of-the-art neural network verifier alpha-beta-CROWN, to efficiently estimate objective bounds within these subdomains. The branching process guides planning effectively, while the bounding process strategically reduces the search space. Our framework achieves superior planning performance, generating high-quality state-action trajectories and surpassing existing methods in challenging, contact-rich manipulation tasks such as non-prehensile planar pushing with obstacles, object sorting, and rope routing in both simulated and real-world settings. Furthermore, our framework supports various neural network architectures, ranging from simple multilayer perceptrons to advanced graph neural dynamics models, and scales efficiently with different model sizes.

artificial intelligence, latexit sha1, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.09584

Country:

North America > United States > Massachusetts (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
Europe > Italy > Lazio > Rome (0.04)
Asia > South Korea > Daegu > Daegu (0.04)

Genre:

Research Report (1.00)
Workflow (0.93)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
(2 more...)

Add feedback

From Logistic Regression to the Perceptron Algorithm: Exploring Gradient Descent with Large Step Sizes

Tyurin, Alexander

arXiv.org Artificial IntelligenceDec-11-2024

We focus on the classification problem with a separable dataset, one of the most important and classical problems from machine learning. The standard approach to this task is logistic regression with gradient descent (LR+GD). Recent studies have observed that LR+GD can find a solution with arbitrarily large step sizes, defying conventional optimization theory. Our work investigates this phenomenon and makes three interconnected key observations about LR+GD with large step sizes. First, we find a remarkably simple explanation of why LR+GD with large step sizes solves the classification problem: LR+GD reduces to a batch version of the celebrated perceptron algorithm when the step size $\gamma \to \infty.$ Second, we observe that larger step sizes lead LR+GD to higher logistic losses when it tends to the perceptron algorithm, but larger step sizes also lead to faster convergence to a solution for the classification problem, meaning that logistic loss is an unreliable metric of the proximity to a solution. Surprisingly, high loss values can actually indicate faster convergence. Third, since the convergence rate in terms of loss function values of LR+GD is unreliable, we examine the iteration complexity required by LR+GD with large step sizes to solve the classification problem and prove that this complexity is suboptimal. To address this, we propose a new method, Normalized LR+GD - based on the connection between LR+GD and the perceptron algorithm - with much better theoretical guarantees.

artificial intelligence, iteration, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2412.08424

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)
North America > United States > New York (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

Add feedback

The Oracle Complexity of Simplex-based Matrix Games: Linear Separability and Nash Equilibria

Kornowski, Guy, Shamir, Ohad

arXiv.org Artificial IntelligenceDec-9-2024

We study the problem of solving matrix games of the form $\max_{\mathbf{w}\in\mathcal{W}}\min_{\mathbf{p}\in\Delta}\mathbf{p}^{\top}A\mathbf{w}$, where $A$ is some matrix and $\Delta$ is the probability simplex. This problem encapsulates canonical tasks such as finding a linear separator and computing Nash equilibria in zero-sum games. However, perhaps surprisingly, its inherent complexity (as formalized in the standard framework of oracle complexity [Nemirovski and Yudin, 1983]) is not well-understood. In this work, we first identify different oracle models which are implicitly used by prior algorithms, amounting to multiplying the matrix $A$ by a vector from either one or both sides. We then prove complexity lower bounds for algorithms under both access models, which in particular imply a separation between them. Specifically, we start by proving that algorithms for linear separability based on one-sided multiplications must require $\Omega(\gamma_A^{-2})$ iterations, where $\gamma_A$ is the margin, as matched by the Perceptron algorithm. We then prove that accelerated algorithms for this task, which utilize multiplications from both sides, must require $\tilde{\Omega}(\gamma_{A}^{-2/3})$ iterations, establishing the first oracle complexity barrier for such algorithms. Finally, by adapting our lower bound to $\ell_1$ geometry, we prove that computing an $\epsilon$-approximate Nash equilibrium requires $\tilde{\Omega}(\epsilon^{-2/5})$ iterations, which is an exponential improvement over the previously best-known lower bound due to Hadiji et al. [2024].

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2412.0699

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

Add feedback

Sequential Compression Layers for Efficient Federated Learning in Foundational Models

Mahla, Navyansh, Gupta, Sunny, Sethi, Amit

arXiv.org Artificial IntelligenceDec-9-2024

Federated Learning (FL) has gained popularity for fine-tuning large language models (LLMs) across multiple nodes, each with its own private data. While LoRA has been widely adopted for parameter-efficient federated fine-tuning, recent theoretical and empirical studies highlight its suboptimal performance in the federated learning context. In response, we propose a novel, simple, and more effective parameter-efficient fine-tuning method that does not rely on LoRA. Our approach introduces a small multi-layer perceptron (MLP) layer between two existing MLP layers--the up_proj (the FFN projection layer following the self-attention module) and down_proj--within the feed-forward network of the transformer block. This solution addresses the bottlenecks associated with LoRA in federated fine-tuning and outperforms recent LoRA-based approaches, demonstrating superior performance for both language models and vision encoders.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2412.07021

Country:

Asia > India > Maharashtra > Mumbai (0.05)
North America > United States > Virginia (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)

Genre: Research Report (0.83)

Industry:

Health & Medicine (0.47)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.88)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)

Add feedback

MDiFF: Exploiting Multimodal Score-based Diffusion Models for New Fashion Product Performance Forecasting

Avogaro, Andrea, Capogrosso, Luigi, Fummi, Franco, Cristani, Marco

arXiv.org Artificial IntelligenceDec-7-2024

The fast fashion industry suffers from significant environmental impacts due to overproduction and unsold inventory. Accurately predicting sales volumes for unreleased products could significantly improve efficiency and resource utilization. However, predicting performance for entirely new items is challenging due to the lack of historical data and rapidly changing trends, and existing deterministic models often struggle with domain shifts when encountering items outside the training data distribution. The recently proposed diffusion models address this issue using a continuous-time diffusion process. This allows us to simulate how new items are adopted, reducing the impact of domain shift challenges faced by deterministic models. As a result, in this paper, we propose MDiFF: a novel two-step multimodal diffusion modelsbased pipeline for New Fashion Product Performance Forecasting (NF-PPF). First, we use a score-based diffusion model to predict multiple future sales for different clothes over time. Then, we refine these multiple predictions with a lightweight Multi-layer Perceptron (MLP) to get the final forecast.

artificial intelligence, machine learning, prediction, (15 more...)

arXiv.org Artificial Intelligence

2412.0684

Country: Europe > Italy (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

Add feedback

Entity-based Reinforcement Learning for Autonomous Cyber Defence

Thompson, Isaac Symes, Caron, Alberto, Hicks, Chris, Mavroudis, Vasilios

arXiv.org Artificial IntelligenceDec-6-2024

A significant challenge for autonomous cyber defence is ensuring a defensive agent's ability to generalise across diverse network topologies and configurations. This capability is necessary for agents to remain effective when deployed in dynamically changing environments, such as an enterprise network where devices may frequently join and leave. Standard approaches to deep reinforcement learning, where policies are parameterised using a fixed-input multi-layer perceptron (MLP) expect fixed-size observation and action spaces. In autonomous cyber defence, this makes it hard to develop agents that generalise to environments with network topologies different from those trained on, as the number of nodes affects the natural size of the observation and action spaces. To overcome this limitation, we reframe the problem of autonomous network defence using entity-based reinforcement learning, where the observation and action space of an agent are decomposed into a collection of discrete entities. This framework enables the use of policy parameterisations specialised in compositional generalisation. We train a Transformer-based policy on the Yawning Titan cyber-security simulation environment and test its generalisation capabilities across various network topologies. We demonstrate that this approach significantly outperforms an MLP-based policy when training across fixed-size networks of varying topologies, and matches performance when training on a single network. We also demonstrate the potential for zero-shot generalisation to networks of a different size to those seen in training. These findings highlight the potential for entity-based reinforcement learning to advance the field of autonomous cyber defence by providing more generalisable policies capable of handling variations in real-world network environments.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2410.17647

Country:

North America > United States > New Jersey > Middlesex County > New Brunswick (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland > Baltimore (0.04)
(10 more...)

Genre:

Research Report (0.81)
Overview (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

Add feedback

Mixed Delay/Nondelay Embeddings Based Neuromorphic Computing with Patterned Nanomagnet Arrays

Ti, Changpeng, Hassan, Usman, Vatsavai, Sairam Sri, McCarter, Margaret, Vasdev, Aastha, An, Jincheng, Achinuq, Barat, Welp, Ulrich, Cheung, Sen-Ching, Thakkar, Ishan G, Hastings, J. Todd

arXiv.org Artificial IntelligenceDec-5-2024

Patterned nanomagnet arrays (PNAs) have been shown to exhibit a strong geometrically frustrated dipole interaction. Some PNAs have also shown emergent domain wall dynamics. Previous works have demonstrated methods to physically probe these magnetization dynamics of PNAs to realize neuromorphic reservoir systems that exhibit chaotic dynamical behavior and high-dimensional nonlinearity. These PNA reservoir systems from prior works leverage echo state properties and linear/nonlinear short-term memory of component reservoir nodes to map and preserve the dynamical information of the input time-series data into nondelay spatial embeddings. Such mappings enable these PNA reservoir systems to imitate and predict/forecast the input time series data. However, these prior PNA reservoir systems are based solely on the nondelay spatial embeddings obtained at component reservoir nodes. As a result, they require a massive number of component reservoir nodes, or a very large spatial embedding (i.e., high-dimensional spatial embedding) per reservoir node, or both, to achieve acceptable imitation and prediction accuracy. These requirements reduce the practical feasibility of such PNA reservoir systems. To address this shortcoming, we present a mixed delay/nondelay embeddings-based PNA reservoir system. Our system uses a single PNA reservoir node with the ability to obtain a mixture of delay/nondelay embeddings of the dynamical information of the time-series data applied at the input of a single PNA reservoir node. Our analysis shows that when these mixed delay/nondelay embeddings are used to train a perceptron at the output layer, our reservoir system outperforms existing PNA-based reservoir systems for the imitation of NARMA 2, NARMA 5, NARMA 7, and NARMA 10 time series data, and for the short-term and long-term prediction of the Mackey Glass time series data.

artificial intelligence, machine learning, reservoir system, (11 more...)

arXiv.org Artificial Intelligence

2412.04622

Country:

North America > United States > Kentucky > Fayette County > Lexington (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

Add feedback

Training MLPs on Graphs without Supervision

Wang, Zehong, Zhang, Zheyuan, Zhang, Chuxu, Ye, Yanfang

arXiv.org Artificial IntelligenceDec-4-2024

Graph Neural Networks (GNNs) have demonstrated their effectiveness in various graph learning tasks, yet their reliance on neighborhood aggregation during inference poses challenges for deployment in latency-sensitive applications, such as real-time financial fraud detection. To address this limitation, recent studies have proposed distilling knowledge from teacher GNNs into student Multi-Layer Perceptrons (MLPs) trained on node content, aiming to accelerate inference. However, these approaches often inadequately explore structural information when inferring unseen nodes. To this end, we introduce SimMLP, a Self-supervised framework for learning MLPs on graphs, designed to fully integrate rich structural information into MLPs. Notably, SimMLP is the first MLP-learning method that can achieve equivalence to GNNs in the optimal case. The key idea is to employ self-supervised learning to align the representations encoded by graph context-aware GNNs and neighborhood dependency-free MLPs, thereby fully integrating the structural information into MLPs. We provide a comprehensive theoretical analysis, demonstrating the equivalence between SimMLP and GNNs based on mutual information and inductive bias, highlighting SimMLP's advanced structural learning capabilities. Additionally, we conduct extensive experiments on 20 benchmark datasets, covering node classification, link prediction, and graph classification, to showcase SimMLP's superiority over state-of-the-art baselines, particularly in scenarios involving unseen nodes (e.g., inductive and cold-start node classification) where structural insights are crucial. Our codes are available at: https://github.com/Zehong-Wang/SimMLP.

dataset, information, simmlp, (14 more...)

arXiv.org Artificial Intelligence

2412.03864

Country:

Europe > Germany > Lower Saxony > Hanover (0.05)
North America > United States > Wisconsin (0.04)
North America > United States > Texas (0.04)
(4 more...)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Add feedback

Generalization Bounds and Model Complexity for Kolmogorov-Arnold Networks

Zhang, Xianyang, Zhou, Huijuan

arXiv.org Machine LearningDec-4-2024

Kolmogorov-Arnold Network (KAN) is a network structure recently proposed by Liu et al. (2024) that offers improved interpretability and a more parsimonious design in many science-oriented tasks compared to multi-layer perceptrons. This work provides a rigorous theoretical analysis of KAN by establishing generalization bounds for KAN equipped with activation functions that are either represented by linear combinations of basis functions or lying in a low-rank Reproducing Kernel Hilbert Space (RKHS). In the first case, the generalization bound accommodates various choices of basis functions in forming the activation functions in each layer of KAN and is adapted to different operator norms at each layer. For a particular choice of operator norms, the bound scales with the $l_1$ norm of the coefficient matrices and the Lipschitz constants for the activation functions, and it has no dependence on combinatorial parameters (e.g., number of nodes) outside of logarithmic factors. Moreover, our result does not require the boundedness assumption on the loss function and, hence, is applicable to a general class of regression-type loss functions. In the low-rank case, the generalization bound scales polynomially with the underlying ranks as well as the Lipschitz constants of the activation functions in each layer. These bounds are empirically investigated for KANs trained with stochastic gradient descent on simulated and real data sets. The numerical results demonstrate the practical relevance of these bounds.

activation function, arxiv preprint arxiv, complexity, (13 more...)

arXiv.org Machine Learning

2410.08026

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Latvia > Riga Municipality > Riga (0.04)
(2 more...)

Genre: Research Report > New Finding (0.86)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Add feedback

Handwriting-based Automated Assessment and Grading of Degree of Handedness: A Pilot Study

Bala, Smriti, Vishnu, Venugopalan Y., Joshi, Deepak

arXiv.org Artificial IntelligenceDec-2-2024

Hand preference and degree of handedness (DoH) are two different aspects of human behavior which are often confused to be one. DoH is a person's inherent capability of the brain; affected by nature and nurture. In this study, we used dominant and non-dominant handwriting traits to assess DoH for the first time, on 43 subjects of three categories- Unidextrous, Partially Unidextrous, and Ambidextrous. Features extracted from the segmented handwriting signals called strokes were used for DoH quantification. Davies Bouldin Index, Multilayer perceptron, and Convolutional Neural Network (CNN) were used for automated grading of DoH. The outcomes of these methods were compared with the widely used DoH assessment questionnaires from Edinburgh Inventory (EI). The CNN based automated grading outperformed other computational methods with an average classification accuracy of 95.06% under stratified 10-fold cross-validation. The leave-one-subject-out strategy on this CNN resulted in a test individual's DoH score which was converted into a 4-point score. Around 90% of the obtained scores from all the implemented computational methods were found to be in accordance with the EI scores under 95% confidence interval. Automated grading of degree of handedness using handwriting signals can provide more resolution to the Edinburgh Inventory scores. This could be used in multiple applications concerned with neuroscience, rehabilitation, physiology, psychometry, behavioral sciences, and forensics.

accuracy, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.01587

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > India (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Education > Educational Technology > Educational Software > Computer-Aided Assessment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback