Goto

Collaborating Authors

 residual learning


Multimodal Residual Learning for Visual QA

Neural Information Processing Systems

Deep neural networks continue to advance the state-of-the-art of image recognition tasks with various methods. However, applications of these methods to multimodality remain limited. We present Multimodal Residual Networks (MRN) for the multimodal residual learning of visual question-answering, which extends the idea of the deep residual learning. Unlike the deep residual learning, MRN effectively learns the joint representation from visual and language information. The main idea is to use element-wise multiplication for the joint residual mappings exploiting the residual learning of the attentional models in recent studies. Various alternative models introduced by multimodality are explored based on our study. We achieve the state-of-the-art results on the Visual QA dataset for both Open-Ended and Multiple-Choice tasks. Moreover, we introduce a novel method to visualize the attention effect of the joint representations for each learning block using back-propagation algorithm, even though the visual features are collapsed without spatial information.


Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices

Neural Information Processing Systems

In real-world machine learning applications, data subsets correspond to especially critical outcomes: vulnerable cyclist detections are safety-critical in an autonomous driving task, and question sentences might be important to a dialogue agent's language understanding for product purposes. While machine learning models can achieve quality performance on coarse-grained metrics like F1-score and overall accuracy, they may underperform on these critical subsets---we define these as slices, the key abstraction in our approach. To address slice-level performance, practitioners often train separate expert models on slice subsets or use multi-task hard parameter sharing. We propose Slice-based Learning, a new programming model in which the slicing function (SF), a programmer abstraction, is used to specify additional model capacity for each slice. Any model can leverage SFs to learn slice-specific representations, which are combined with an attention mechanism to make slice-aware predictions. We show that our approach improves over baselines in terms of computational complexity and slice-specific performance by up to 19.0 points, and overall performance by up to 4.6 F1 points on applications spanning natural language understanding and computer vision benchmarks as well as production-scale industrial systems.


Multimodal Residual Learning for Visual QA

Neural Information Processing Systems

Deep neural networks continue to advance the state-of-the-art of image recognition tasks with various methods. However, applications of these methods to multimodality remain limited. We present Multimodal Residual Networks (MRN) for the multimodal residual learning of visual question-answering, which extends the idea of the deep residual learning. Unlike the deep residual learning, MRN effectively learns the joint representation from visual and language information. The main idea is to use element-wise multiplication for the joint residual mappings exploiting the residual learning of the attentional models in recent studies. Various alternative models introduced by multimodality are explored based on our study. We achieve the state-of-the-art results on the Visual QA dataset for both Open-Ended and Multiple-Choice tasks. Moreover, we introduce a novel method to visualize the attention effect of the joint representations for each learning block using back-propagation algorithm, even though the visual features are collapsed without spatial information.



ResNet: Enabling Deep Convolutional Neural Networks through Residual Learning

Liu, Xingyu, Goh, Kun Ming

arXiv.org Artificial Intelligence

Abstract--Convolutional Neural Networks (CNNs) have rev-olutionised computer vision, but training very deep networks has been challenging due to the vanishing gradient problem. This paper explores Residual Networks (ResNet), introduced by He et al. (2015), which overcome this limitation by using skip connections. ResNet enables the training of networks with hundreds of layers by allowing gradients to flow directly through shortcut connections that bypass intermediate layers. In our implementation on the CIF AR-10 dataset, ResNet-18 achieves 89.9% accuracy compared to 84.1% for a traditional deep CNN of similar depth, while also converging faster and training more stably. Deep Convolutional Neural Networks (CNNs) have become the foundation of modern computer vision, powering applications from image classification to object detection.


Residual Correction Models for AC Optimal Power Flow Using DC Optimal Power Flow Solutions

Za'ter, Muhy Eddin, Hodge, Bri-Mathias, Baker, Kyri

arXiv.org Artificial Intelligence

Solving the nonlinear AC optimal power flow (AC OPF) problem remains a major computational bottleneck for real-time grid operations. In this paper, we propose a residual learning paradigm that uses fast DC optimal power flow (DC OPF) solutions as a baseline, and learns only the nonlinear corrections required to provide the full AC-OPF solution. The method utilizes a topology-aware Graph Neural Network with local attention and two-level DC feature integration, trained using a physics-informed loss that enforces AC power-flow feasibility and operational limits. Evaluations on OPFData for 57-, 118-, and 2000-bus systems show around 25% lower MSE, up to 3X reduction in feasibility error, and up to 13X runtime speedup compared to conventional AC OPF solvers. The model maintains accuracy under N-1 contingencies and scales efficiently to large networks. These results demonstrate that residual learning is a practical and scalable bridge between linear approximations and AC-feasible OPF, enabling near real-time operational decision making.


ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning

Zhao, Siheng, Ze, Yanjie, Wang, Yue, Liu, C. Karen, Abbeel, Pieter, Shi, Guanya, Duan, Rocky

arXiv.org Artificial Intelligence

Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks. While recent advances in general motion tracking (GMT) have enabled humanoids to reproduce diverse human motions, these policies lack the precision and object awareness required for loco-manipulation. To this end, we introduce ResMimic, a two-stage residual learning framework for precise and expressive humanoid control from human motion data. First, a GMT policy, trained on large-scale human-only motion, serves as a task-agnostic base for generating human-like whole-body movements. An efficient but precise residual policy is then learned to refine the GMT outputs to improve locomotion and incorporate object interaction. To further facilitate efficient training, we design (i) a point-cloud-based object tracking reward for smoother optimization, (ii) a contact reward that encourages accurate humanoid body-object interactions, and (iii) a curriculum-based virtual object controller to stabilize early training. We evaluate ResMimic in both simulation and on a real Unitree G1 humanoid. Results show substantial gains in task success, training efficiency, and robustness over strong baselines. Videos are available at https://resmimic.github.io/ .


Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices

Neural Information Processing Systems

In real-world machine learning applications, data subsets correspond to especially critical outcomes: vulnerable cyclist detections are safety-critical in an autonomous driving task, and "question" sentences might be important to a dialogue agent's language understanding for product purposes. While machine learning models can achieve quality performance on coarse-grained metrics like F1-score and overall accuracy, they may underperform on these critical subsets---we define these as slices, the key abstraction in our approach. To address slice-level performance, practitioners often train separate "expert" models on slice subsets or use multi-task hard parameter sharing. We propose Slice-based Learning, a new programming model in which the slicing function (SF), a programmer abstraction, is used to specify additional model capacity for each slice. Any model can leverage SFs to learn slice-specific representations, which are combined with an attention mechanism to make slice-aware predictions.


Predictive Kinematic Coordinate Control for Aerial Manipulators based on Modified Kinematics Learning

Li, Zhengzhen, Shen, Jiahao, Ji, Mengyu, Cao, Huazi, Zhao, Shiyu

arXiv.org Artificial Intelligence

High-precision manipulation has always been a developmental goal for aerial manipulators. This paper investigates the kinematic coordinate control issue in aerial manipulators. We propose a predictive kinematic coordinate control method, which includes a learning-based modified kinematic model and a model predictive control (MPC) scheme based on weight allocation. Compared to existing methods, our proposed approach offers several attractive features. First, the kinematic model incorporates closed-loop dynamics characteristics and online residual learning. Compared to methods that do not consider closed-loop dynamics and residuals, our proposed method has improved accuracy by 59.6$\%$. Second, a MPC scheme that considers weight allocation has been proposed, which can coordinate the motion strategies of quadcopters and manipulators. Compared to methods that do not consider weight allocation, the proposed method can meet the requirements of more tasks. The proposed approach is verified through complex trajectory tracking and moving target tracking experiments. The results validate the effectiveness of the proposed method.


Reviews: Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices

Neural Information Processing Systems

Originality: - The authors claim this paradigm towards specifying ML models is novel. It is somewhat difficult for me to assess the originality of this work as it's not exactly my area, but I am inclined to agree that their approach seems new and interesting. Quality: - Section 3.3: It's quite generous to call these "key properties" of the model, as really they refer to the results of this particular instantiation of slice-based learning on this toy dataset. It's definitely nice to see that the approach works on a toy dataset, but I would strongly consider reframing this section. The latter two are adequately addressed in the paper and experiments, but the noise aspect was not really addressed.