AITopics

The canonical deep learning approach for learning requires computing a gradient term at each layer by back-propagating the error signal from the output towards each learnable parameter. Given the stacked structure of neural networks, where each layer builds on the representation of the layer below, this approach leads to hierarchical representations. More abstract features live on the top layers of the model, while features on lower layers are expected to be less abstract. In contrast to this, we introduce a new learning method named NoProp, which does not rely on either forward or backwards propagation. Instead, NoProp takes inspiration from diffusion and flow matching methods, where each layer independently learns to denoise a noisy target. We believe this work takes a first step towards introducing a new family of gradient-free learning methods, that does not learn hierarchical representations -- at least not in the usual sense. NoProp needs to fix the representation at each layer beforehand to a noised version of the target, learning a local denoising process that can then be exploited at inference. We demonstrate the effectiveness of our method on MNIST, CIFAR-10, and CIFAR-100 image classification benchmarks. Our results show that NoProp is a viable learning algorithm which achieves superior accuracy, is easier to use and computationally more efficient compared to other existing back-propagation-free methods. By departing from the traditional gradient based learning paradigm, NoProp alters how credit assignment is done within the network, enabling more efficient distributed learning as well as potentially impacting other characteristics of the learning process.

artificial intelligence, machine learning, representation, (18 more...)

2503.24322

Country:

Europe (1.00)
North America > United States (0.28)

Genre: Research Report > New Finding (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Anderson, Prashanti, Bakshi, Ainesh, Majid, Mahbod, Tiegel, Stefan

Sample-Optimal Private Regression in Polynomial Time

We consider the task of privately obtaining prediction error guarantees in ordinary least-squares regression problems with Gaussian covariates (with unknown covariance structure). We provide the first sample-optimal polynomial time algorithm for this task under both pure and approximate differential privacy. We show that any improvement to the sample complexity of our algorithm would violate either statistical-query or information-theoretic lower bounds. Additionally, our algorithm is robust to a small fraction of arbitrary outliers and achieves optimal error rates as a function of the fraction of outliers. In contrast, all prior efficient algorithms either incurred sample complexities with sub-optimal dimension dependence, scaling with the condition number of the covariates, or obtained a polynomially worse dependence on the privacy parameters. Our technical contributions are two-fold: first, we leverage resilience guarantees of Gaussians within the sum-of-squares framework. As a consequence, we obtain efficient sum-of-squares algorithms for regression with optimal robustness rates and sample complexity. Second, we generalize the recent robustness-to-privacy framework [HKMN23, (arXiv:2212.05015)] to account for the geometry induced by the covariance of the input samples. This framework crucially relies on the robust estimators to be sum-of-squares algorithms, and combining the two steps yields a sample-optimal private regression algorithm. We believe our techniques are of independent interest, and we demonstrate this by obtaining an efficient algorithm for covariance-aware mean estimation, with an optimal dependence on the privacy parameters.

artificial intelligence, machine learning, sample-optimal private regression, (1 more...)

2503.24321

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.53)

Solving the Best Subset Selection Problem via Suboptimal Algorithms

Singh, Vikram, Sun, Min

Best subset selection in linear regression is well known to be nonconvex and computationally challenging to solve, as the number of possible subsets grows rapidly with increasing dimensionality of the problem. As a result, finding the global optimal solution via an exact optimization method for a problem with dimensions of 1000s may take an impractical amount of CPU time. This suggests the importance of finding suboptimal procedures that can provide good approximate solutions using much less computational effort than exact methods. In this work, we introduce a new procedure and compare it with other popular suboptimal algorithms to solve the best subset selection problem. Extensive computational experiments using synthetic and real data have been performed. The results provide insights into the performance of these methods in different data settings. The new procedure is observed to be a competitive suboptimal algorithm for solving the best subset selection problem for high-dimensional data.

algorithm, artificial intelligence, machine learning, (20 more...)

2503.243

Country:

North America > United States > Alabama (0.28)
North America > United States > Oklahoma (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Choi, Young-Pil, Kalise, Dante, Peters, Andrés A.

Collisionless and Decentralized Formation Control for Strings

Multi-agent systems (MAS) have proven to be a versatile framework for studying diverse scalability problems in Science and Engineering, such as dynamic networks [35], autonomous vehicles [5], collective behaviour of humans or animals [42, 43], and many others [2, 6]. Mathematically, MAS are often modelled as large-scale dynamical systems where each agent can be considered as a subset of states, updated via interaction forces such as attraction, repulsion, alignment, etc., [27, 19] or through the optimization of a pay-off function in a control/game framework [32, 29]. In this work, we approach the study of MAS from a control viewpoint. We study a class of sparsely interconnected agents in one dimension, interacting through nonlinear couplings and a decentralized control law. The elementary building block of our approach is the celebrated Cucker-Smale model for consensus dynamics [19], which corresponds to a MAS where each agent is endowed with second-order nonlinear dynamics for velocity alignment, and where the influence of neighbouring agents decays with distance. The Cucker-Smale model and variants can represent the physical motion of agents on the real line, inspired by autonomous vehicle formations in platooning with a nearest-neighbour interaction scheme [41, 44].

agent, artificial intelligence, particle, (17 more...)

2102.13621

Country: Europe > United Kingdom (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Vödisch, Niclas, Dodel, David, Schötz, Michael

FSOCO: The Formula Student Objects in Context Dataset

This paper presents the FSOCO dataset, a collaborative dataset for vision-based cone detection systems in Formula Student Driverless competitions. It contains human annotated ground truth labels for both bounding boxes and instance-wise segmentation masks. The data buy-in philosophy of FSOCO asks student teams to contribute to the database first before being granted access ensuring continuous growth. By providing clear labeling guidelines and tools for a sophisticated raw image selection, new annotations are guaranteed to meet the desired quality. The effectiveness of the approach is shown by comparing prediction results of a network trained on FSOCO and its unregulated predecessor. The FSOCO dataset can be found at https://fsoco.github.io/fsoco-dataset/.

artificial intelligence, dataset, machine learning, (13 more...)

doi: 10.4271/12-05-01-0003

2012.07139

Country:

Genre: Research Report (0.40)

Industry: Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Data Science (0.93)
(2 more...)

Tschantz, Alexander, Koudahl, Magnus, Linander, Hampus, Da Costa, Lancelot, Heins, Conor, Beck, Jeff, Buckley, Christopher

Bayesian Predictive Coding

Predictive coding (PC) is an influential theory of information processing in the brain, providing a biologically plausible alternative to backpropagation. It is motivated in terms of Bayesian inference, as hidden states and parameters are optimised via gradient descent on variational free energy. However, implementations of PC rely on maximum \textit{a posteriori} (MAP) estimates of hidden states and maximum likelihood (ML) estimates of parameters, limiting their ability to quantify epistemic uncertainty. In this work, we investigate a Bayesian extension to PC that estimates a posterior distribution over network parameters. This approach, termed Bayesian Predictive coding (BPC), preserves the locality of PC and results in closed-form Hebbian weight updates. Compared to PC, our BPC algorithm converges in fewer epochs in the full-batch setting and remains competitive in the mini-batch setting. Additionally, we demonstrate that BPC offers uncertainty quantification comparable to existing methods in Bayesian deep learning, while also improving convergence properties. Together, these results suggest that BPC provides a biologically plausible method for Bayesian learning in the brain, as well as an attractive approach to uncertainty quantification in deep learning.

artificial intelligence, machine learning, posterior distribution, (16 more...)

2503.24016

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

ActionStudio: A Lightweight Framework for Data and Training of Large Action Models

Zhang, Jianguo, Hoang, Thai, Zhu, Ming, Liu, Zuxin, Wang, Shiyu, Awalgaonkar, Tulika, Prabhakar, Akshara, Chen, Haolin, Yao, Weiran, Liu, Zhiwei, Tan, Juntao, Niebles, Juan Carlos, Heinecke, Shelby, Wang, Huan, Savarese, Silvio, Xiong, Caiming

Action models are essential for enabling autonomous agents to perform complex tasks. However, training large action models remains challenging due to the diversity of agent environments and the complexity of agentic data. Despite growing interest, existing infrastructure provides limited support for scalable, agent-specific fine-tuning. We present ActionStudio, a lightweight and extensible data and training framework designed for large action models. ActionStudio unifies heterogeneous agent trajectories through a standardized format, supports diverse training paradigms including LoRA, full fine-tuning, and distributed setups, and integrates robust preprocessing and verification tools. We validate its effectiveness across both public and realistic industry benchmarks, demonstrating strong performance and practical scalability. We open-sourced code and data at https://github.com/SalesforceAIResearch/xLAM to facilitate research in the community.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

2503.22673

Genre: Research Report (0.64)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Egami, Shusaku, Matsushita, Kyoumoto, Ugai, Takanori, Fukuda, Ken

Comparison of Metadata Representation Models for Knowledge Graph Embeddings

Hyper-relational Knowledge Graphs (HRKGs) extend traditional KGs beyond binary relations, enabling the representation of contextual, provenance, and temporal information in domains, such as historical events, sensor data, video content, and narratives. HRKGs can be structured using several Metadata Representation Models (MRMs), including Reification (REF), Singleton Property (SGP), and RDF-star (RDR). However, the effects of different MRMs on KG Embedding (KGE) and Link Prediction (LP) models remain unclear. This study evaluates MRMs in the context of LP tasks, identifies the limitations of existing evaluation frameworks, and introduces a new task that ensures fair comparisons across MRMs. Furthermore, we propose a framework that effectively reflects the knowledge representations of the three MRMs in latent space. Experiments on two types of datasets reveal that REF performs well in simple HRKGs, whereas SGP is less effective. However, in complex HRKGs, the differences among MRMs in the LP tasks are minimal. Our findings contribute to an optimal knowledge representation strategy for HRKGs in LP tasks.

artificial intelligence, mrm, representation, (16 more...)

2503.21804

Country: Europe (0.28)

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.79)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.63)

The Mathematical Relationship Between Layer Normalization and Dynamic Activation Functions

Stollenwerk, Felix

A recent paper proposes Dynamic Tanh (DyT) as a drop-in replacement for layer normalization (LN). Although the method is empirically well-motivated and appealing from a practical point of view, it lacks a theoretical foundation. In this work, we shed light on the mathematical relationship between layer normalization and dynamic activation functions. In particular, we derive DyT from LN and show that a well-defined approximation is needed to do so. By dropping said approximation, an alternative activation function is obtained, which we call Dynamic Inverse Square Root Unit (DyISRU). DyISRU is the exact counterpart of layer normalization, and we demonstrate numerically that it indeed resembles LN more accurately than DyT does.

artificial intelligence, machine learning, normalization, (12 more...)

2503.21708

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Beyond Omakase: Designing Shared Control for Navigation Robots with Blind People

Kamikubo, Rie, Kayukawa, Seita, Kaniwa, Yuka, Wang, Allan, Kacorri, Hernisa, Takagi, Hironobu, Asakawa, Chieko

Autonomous navigation robots can increase the independence of blind people but often limit user control, following what is called in Japanese an "omakase" approach where decisions are left to the robot. This research investigates ways to enhance user control in social robot navigation, based on two studies conducted with blind participants. The first study, involving structured interviews (N=14), identified crowded spaces as key areas with significant social challenges. The second study (N=13) explored navigation tasks with an autonomous robot in these environments and identified design strategies across different modes of autonomy. Participants preferred an active role, termed the "boss" mode, where they managed crowd interactions, while the "monitor" mode helped them assess the environment, negotiate movements, and interact with the robot. These findings highlight the importance of shared control and user involvement for blind users, offering valuable insights for designing future social navigation robots.

artificial intelligence, large language model, natural language, (19 more...)

doi: 10.1145/3706598.3714112

2503.21997

Country:

North America > United States > New York (0.28)
North America > United States > Maryland (0.28)
Asia > Japan > Honshū > Kantō (0.15)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Robots in the Home (0.49)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)