AITopics

2510.13846

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

arXiv.org Machine Learning, Oct-14-2025

A Unified Framework for Lifted Training and Inversion Approaches

Wang, Xiaoyu, Valavanis, Alexandra, Mahmood, Azhir, Mang, Andreas, Benning, Martin, Repetti, Audrey

The training of deep neural networks predominantly relies on a combination of gradient-based optimisation and back-propagation for the computation of the gradient. While incredibly successful, this approach faces challenges such as vanishing or exploding gradients, difficulties with non-smooth activations, and an inherently sequential structure that limits parallelisation. Lifted training methods offer an alternative by reformulating the nested optimisation problem into a higher-dimensional, constrained optimisation problem in which the constraints are no longer enforced exactly but are relaxed into penalty terms. This chapter introduces a unified framework that encapsulates various lifted training strategies, including the Method of Auxiliary Coordinates (MAC), Fenchel Lifted Networks, and Lifted Bregman Training, and demonstrates how diverse architectures, such as Multi-Layer Perceptrons (MLPs), Residual Neural Networks (ResNets), and Proximal Neural Networks (PNNs), fit within this structure. By leveraging tools from convex optimisation, particularly Bregman distances, the framework facilitates distributed optimisation, accommodates non-differentiable proximal activations, and can improve the conditioning of the training landscape. We discuss the implementation of these methods using block-coordinate descent (BCD) strategies, including deterministic implementations enhanced by accelerated (e.g., Nesterov, heavy-ball) and adaptive (e.g., Adam) optimisation techniques, as well as implicit stochastic gradient methods (ISGM). Furthermore, we explore the application of this framework to inverse problems, detailing methodologies for both the training of specialised networks (e.g., unrolled architectures) and the stable inversion of pre-trained networks. Numerical results on standard imaging tasks validate the effectiveness and stability of the lifted Bregman approach compared to conventional training, particularly for architectures employing proximal activations.
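To make the idea of lifting concrete, the following is a minimal sketch of a quadratic-penalty lifting in the spirit of MAC for a scalar two-layer network. It is not the Bregman penalty discussed in the chapter. The function names, the penalty weight `rho`, and all numeric values are assumptions introduced purely for illustration.

```python
def relu(t):
    """Scalar ReLU activation."""
    return max(0.0, t)


def nested_loss(w1, w2, x, y):
    """Conventional objective: the layers are composed (nested)."""
    return 0.5 * (w2 * relu(w1 * x) - y) ** 2


def lifted_loss(w1, w2, z, x, y, rho=1.0):
    """Lifted objective with an auxiliary variable z for the hidden activation.

    The constraint z = relu(w1 * x) is not enforced; its violation is
    penalised quadratically with weight rho (an illustrative choice).
    """
    data_fit = 0.5 * (w2 * z - y) ** 2
    penalty = 0.5 * rho * (z - relu(w1 * x)) ** 2
    return data_fit + penalty


# Toy values, chosen only for illustration.
w1, w2, x, y = 0.5, -1.5, 2.0, 1.0

z_feasible = relu(w1 * x)  # auxiliary variable consistent with the forward pass
gap_at_feasible = lifted_loss(w1, w2, z_feasible, x, y) - nested_loss(w1, w2, x, y)

# Moving z away from the forward pass adds exactly the penalty term.
z_off = z_feasible + 0.3
extra_cost = lifted_loss(w1, w2, z_off, x, y, rho=10.0) - 0.5 * (w2 * z_off - y) ** 2
```

In this sketch, `w2` appears only in the data-fit term and `w1` only in the penalty term once `z` is fixed, so the two weight updates decouple. That separation is what makes block-coordinate schemes a natural fit for lifted objectives.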

artificial intelligence, machine learning, neural network, (18 more...)

2510.09796

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
(11 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.87)

Martínez-Ibarra, Antonio, González-Vidal, Aurora, Cánovas-Rodríguez, Adrián, Skarmeta, Antonio F.

Chlorophyll-a Mapping and Prediction in the Mar Menor Lagoon Using C2RCC-Processed Sentinel 2 Imagery

arXiv.org Artificial Intelligence, Oct-14-2025

The Mar Menor, Europe's largest coastal lagoon, located in Spain, has undergone severe eutrophication crises. Monitoring chlorophyll-a (Chl-a) is essential to anticipate harmful algal blooms and guide mitigation. Traditional in situ measurements are spatially and temporally limited. Satellite-based approaches provide a more comprehensive view, enabling scalable, long-term, and transferable monitoring. This study aims to overcome limitations of chlorophyll monitoring, often restricted to surface estimates or limited temporal coverage, by developing a reliable methodology to predict and map Chl-a across the water column of the Mar Menor. The work integrates Sentinel 2 imagery with buoy-based ground truth to create models capable of high-resolution, depth-specific monitoring, enhancing early-warning capabilities for eutrophication. Nearly a decade of Sentinel 2 images was atmospherically corrected using C2RCC processors. Buoy data were aggregated by depth (0-1 m, 1-2 m, 2-3 m, 3-4 m). Multiple ML and DL algorithms, including RF, XGBoost, CatBoost, Multilayer Perceptron Networks, and ensembles, were trained and validated using cross-validation. Systematic band-combination experiments and spatial aggregation strategies were tested to optimize prediction. Results show depth-dependent performance. At the surface, C2X-Complex with XGBoost and ensemble models achieved R² = 0.89; at 1-2 m, CatBoost and ensemble models reached R² = 0.87; at 2-3 m, TOA reflectances with KNN performed best (R² = 0.81); while at 3-4 m, RF achieved R² = 0.66. Generated maps successfully reproduced known eutrophication events (e.g., 2016 crisis, 2025 surge), confirming robustness. The study delivers an end-to-end, validated methodology for depth-specific Chl-a mapping. Its integration of multispectral band combinations, buoy calibration, and ML/DL modeling offers a transferable framework for other turbid coastal systems.

artificial intelligence, machine learning, sentinel 2, (15 more...)

2510.09736

Country:

North America > United States (0.92)
Europe (0.86)

Genre: Research Report > New Finding (0.66)

Industry:

Energy (0.94)
Information Technology (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.34)

Wang, Linfeng, Campino, Susana, Clark, Taane G., Phelan, Jody E.

Decoding Positive Selection in Mycobacterium tuberculosis with Phylogeny-Guided Graph Attention Models

arXiv.org Artificial Intelligence, Oct-13-2025

Positive selection drives the emergence of adaptive mutations in Mycobacterium tuberculosis, shaping drug resistance, transmissibility, and virulence. Phylogenetic trees capture evolutionary relationships among isolates and provide a natural framework for detecting such adaptive signals. We present a phylogeny-guided graph attention network (GAT) approach, introducing a method for converting SNP-annotated phylogenetic trees into graph structures suitable for neural network analysis. Using 500 M. tuberculosis isolates from four major lineages and 249 single-nucleotide variants (84 resistance-associated and 165 neutral) across 61 drug-resistance genes, we constructed graphs where nodes represented isolates and edges reflected phylogenetic distances. Edges between isolates separated by more than seven internal nodes were pruned to emphasise local evolutionary structure. Node features encoded SNP presence or absence, and the GAT architecture included two attention layers, a residual connection, global attention pooling, and a multilayer perceptron classifier. The model achieved an accuracy of 0.88 on a held-out test set and, when applied to 146 WHO-classified "uncertain" variants, identified 41 candidates with convergent emergence across multiple lineages, consistent with adaptive evolution. This work demonstrates the feasibility of transforming phylogenies into GNN-compatible structures and highlights attention-based models as effective tools for detecting positive selection, aiding genomic surveillance and variant prioritisation.
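The tree-to-graph conversion described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the child-to-parent dictionary, the helper names, the toy tree, and the use of the internal-node count as the edge weight are all assumptions.

```python
from itertools import combinations


def ancestors(node, parent):
    """Return the path from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def internal_nodes_between(a, b, parent):
    """Count the internal nodes on the tree path between two distinct leaves."""
    path_a = ancestors(a, parent)
    index_in_a = {n: i for i, n in enumerate(path_a)}
    for j, n in enumerate(ancestors(b, parent)):
        if n in index_in_a:
            i = index_in_a[n]
            # i nodes from a's parent up to the common ancestor (inclusive),
            # plus j - 1 nodes strictly below it on b's side.
            return i + j - 1
    raise ValueError("nodes are not in the same tree")


def build_edges(leaves, parent, max_internal=7):
    """Keep only leaf pairs separated by at most `max_internal` internal nodes."""
    edges = []
    for a, b in combinations(leaves, 2):
        k = internal_nodes_between(a, b, parent)
        if k <= max_internal:
            edges.append((a, b, k))
    return edges


# Hypothetical toy tree: root r with internal nodes n1 (leaves A, B) and n2 (leaf C).
parent = {"A": "n1", "B": "n1", "C": "n2", "n1": "r", "n2": "r"}
leaves = ["A", "B", "C"]

all_edges = build_edges(leaves, parent)                   # threshold of 7, as in the abstract
local_edges = build_edges(leaves, parent, max_internal=2)  # tighter toy threshold
```

On this toy tree, A and B are separated by one internal node and each is separated from C by three, so the tighter threshold keeps only the A-B edge while the default of seven keeps all three pairs.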

artificial intelligence, machine learning, mutation, (17 more...)

2510.08703

Country: Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Neural Information Processing Systems, Oct-10-2025, 22:37:09 GMT

KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension

Jie Yang, Wang Zeng

Recent advancements in Multimodal Large Language Models (MLLMs) have greatly improved their abilities in image understanding. However, these models often struggle with grasping pixel-level semantic details, e.g., the keypoints of an object. To bridge this gap, we introduce the novel challenge of Semantic Keypoint Comprehension, which aims to comprehend keypoints across different task scenarios, including keypoint semantic understanding, visual prompt-based keypoint detection, and textual prompt-based keypoint detection.

arxiv preprint arxiv, keypoint, zhang, (11 more...)

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.46)

Neural Information Processing Systems, Oct-10-2025, 21:44:30 GMT

Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron

The ability of a brain or a neural network to efficiently learn depends crucially on both the task structure and the learning rule.

equation, input noise, perceptron, (16 more...)

Country: North America > United States > Oregon (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.52)

Kuipers, Bart, Byrman, Freek, Uyterlinde, Daniel, García-Castellanos, Alejandro

Symmetry-Aware Fully-Amortized Optimization with Scale Equivariant Graph Metanetworks

arXiv.org Artificial Intelligence, Oct-10-2025

Amortized optimization accelerates the solution of related optimization problems by learning mappings that exploit shared structure across problem instances. We explore the use of Scale Equivariant Graph Metanetworks (ScaleGMNs) for this purpose. By operating directly in weight space, ScaleGMNs enable single-shot fine-tuning of existing models, reducing the need for iterative optimization. We demonstrate the effectiveness of this approach empirically and provide a theoretical result: the gauge freedom induced by scaling symmetries is strictly smaller in convolutional neural networks than in multi-layer perceptrons. This insight helps explain the performance differences observed between architectures in both our work and that of Kalogeropoulos et al. (2024). Overall, our findings underscore the potential of symmetry-aware metanetworks as a powerful approach for efficient and generalizable neural network optimization. Open-source code: https://github.com/daniuyter/scalegmn_amortization
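The scaling symmetry referred to above can be illustrated for a ReLU multi-layer perceptron: multiplying one hidden unit's incoming weights and bias by c > 0 and dividing its outgoing weights by c leaves the network function unchanged. The sketch below demonstrates only this general property and is not ScaleGMN itself; the function names and all weight values are assumptions chosen for illustration.

```python
def relu(t):
    """Scalar ReLU activation."""
    return max(0.0, t)


def mlp(x, W1, b1, W2, b2):
    """Two-layer perceptron with a ReLU hidden layer, using plain lists."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]


def rescale_hidden_unit(W1, b1, W2, unit, c):
    """Scale one hidden unit's inputs by c > 0 and its outputs by 1 / c."""
    if c <= 0:
        raise ValueError("the ReLU scaling symmetry requires c > 0")
    W1s = [[w * c for w in row] if k == unit else list(row)
           for k, row in enumerate(W1)]
    b1s = [b * c if k == unit else b for k, b in enumerate(b1)]
    W2s = [[w / c if k == unit else w for k, w in enumerate(row)]
           for row in W2]
    return W1s, b1s, W2s


# Toy weights, chosen only for illustration.
W1 = [[0.5, -1.0], [2.0, 0.25]]
b1 = [0.1, -0.3]
W2 = [[1.5, -0.75]]
b2 = [0.2]
x = [1.0, 2.0]

W1s, b1s, W2s = rescale_hidden_unit(W1, b1, W2, unit=1, c=4.0)
original = mlp(x, W1, b1, W2, b2)
rescaled = mlp(x, W1s, b1s, W2s, b2)
max_diff = max(abs(p - q) for p, q in zip(original, rescaled))
```

The two parameter sets differ, yet `max_diff` is zero up to floating-point rounding, which is the kind of redundancy in weight space that a symmetry-aware metanetwork is designed to respect.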

artificial intelligence, machine learning, representation, (18 more...)

2510.083

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

arXiv.org Artificial Intelligence, Oct-10-2025

Towards Reliable LLM-based Robot Planning via Combined Uncertainty Estimation

Yin, Shiyuan, Bai, Chenjia, Zhang, Zihao, Jin, Junwei, Zhang, Xinxin, Zhang, Chi, Li, Xuelong

Large language models (LLMs) demonstrate advanced reasoning abilities, enabling robots to understand natural language instructions and generate high-level plans with appropriate grounding. However, LLM hallucinations present a significant challenge, often leading to overconfident yet potentially misaligned or unsafe plans. While researchers have explored uncertainty estimation to improve the reliability of LLM-based planning, existing studies have not sufficiently differentiated between epistemic and intrinsic uncertainty, limiting the effectiveness of uncertainty estimation. In this paper, we present Combined Uncertainty estimation for Reliable Embodied planning (CURE), which decomposes the uncertainty into epistemic and intrinsic uncertainty, each estimated separately. Furthermore, epistemic uncertainty is subdivided into task clarity and task familiarity for more accurate evaluation. The overall uncertainty assessments are obtained using random network distillation and multi-layer perceptron regression heads driven by LLM features. We validated our approach in two distinct experimental settings: kitchen manipulation and tabletop rearrangement experiments. The results show that, compared to existing methods, our approach yields uncertainty estimates that are more closely aligned with the actual execution outcomes.

large language model, machine learning, uncertainty estimation, (18 more...)

2510.08044

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Neural Information Processing Systems, Oct-9-2025, 23:39:04 GMT

Linearly Decomposing and Recomposing Vision Transformers for Diverse-Scale Models

Shuxia Lin

Vision Transformers (ViTs) are widely used in a variety of applications, but they usually have a fixed architecture that may not match the varying computational resources of different deployment environments.

const, learngene, vit model, (13 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.67)
Education (0.46)
Government (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Neural Information Processing Systems, Oct-9-2025, 01:52:17 GMT

Supplementary Material A Proof of identification (3)

We state it here for clarity and completeness. The data generating mechanism for $(X, A, Z, W, U)$ is summarized in Table 1, and the setups of varying parameters in each scenario are summarized in Table 2.

Table 1: Data generating mechanism and setup for fixed parameters across scenarios.

[...] $+\ \omega W$, (20) where the first equality is due to Assumption 1. Furthermore, note that $E[h(W, 1, X) \mid X, Z, U] = E[h(W, 1, X) \mid X, U] = E[Y \mid X, A = 1, U] = E[Y \mid X, A = 1, Z, U] = [...]$, where the first and third equalities are due to Assumption 1, the second equality follows from Theorem 1 of Miao et al. (2018a) under Assumptions 2 and 3, and the last equality is by (19). [...] $+\ \omega W$, where the second equality is due to Assumption 1, the third equality is due to Theorem 2.2 of Cui et al. (2023) under Assumptions 4 and 5, and the last equality is due to (20).

Step (i) The method we adopt is neural maximum moment restriction (NMMR), which employs a multilayer perceptron (MLP) to estimate the confounding bridges (Kompa et al., 2022).

artificial intelligence, equality, machine learning, (19 more...)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)