AITopics | Ren, Bin

Collaborating Authors

Ren, Bin

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ROLO-SLAM: Rotation-Optimized LiDAR-Only SLAM in Uneven Terrain with Ground Vehicle

Wang, Yinchuan, Ren, Bin, Zhang, Xiang, Wang, Pengyu, Wang, Chaoqun, Song, Rui, Li, Yibin, Meng, Max Q. -H.

arXiv.org Artificial IntelligenceJan-3-2025

LiDAR-based SLAM is recognized as one effective method to offer localization guidance in rough environments. However, off-the-shelf LiDAR-based SLAM methods suffer from significant pose estimation drifts, particularly components relevant to the vertical direction, when passing to uneven terrains. This deficiency typically leads to a conspicuously distorted global map. In this article, a LiDAR-based SLAM method is presented to improve the accuracy of pose estimations for ground vehicles in rough terrains, which is termed Rotation-Optimized LiDAR-Only (ROLO) SLAM. The method exploits a forward location prediction to coarsely eliminate the location difference of consecutive scans, thereby enabling separate and accurate determination of the location and orientation at the front-end. Furthermore, we adopt a parallel-capable spatial voxelization for correspondence-matching. We develop a spherical alignment-guided rotation registration within each voxel to estimate the rotation of vehicle. By incorporating geometric alignment, we introduce the motion constraint into the optimization formulation to enhance the rapid and effective estimation of LiDAR's translation. Subsequently, we extract several keyframes to construct the submap and exploit an alignment from the current scan to the submap for precise pose estimation. Meanwhile, a global-scale factor graph is established to aid in the reduction of cumulative errors. In various scenes, diverse experiments have been conducted to evaluate our method. The results demonstrate that ROLO-SLAM excels in pose estimation of ground vehicles and outperforms existing state-of-the-art LiDAR SLAM frameworks.

artificial intelligence, dataset, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1002/rob.22505

2501.02166

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology (0.68)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile

Niu, Wei, Sanim, Md Musfiqur Rahman, Shu, Zhihao, Guan, Jiexiong, Shen, Xipeng, Yin, Miao, Agrawal, Gagan, Ren, Bin

arXiv.org Artificial IntelligenceApr-21-2024

This work is motivated by recent developments in Deep Neural Networks, particularly the Transformer architectures underlying applications such as ChatGPT, and the need for performing inference on mobile devices. Focusing on emerging transformers (specifically the ones with computationally efficient Swin-like architectures) and large models (e.g., Stable Diffusion and LLMs) based on transformers, we observe that layout transformations between the computational operators cause a significant slowdown in these applications. This paper presents SmartMem, a comprehensive framework for eliminating most layout transformations, with the idea that multiple operators can use the same tensor layout through careful choice of layout and implementation of operations. Our approach is based on classifying the operators into four groups, and considering combinations of producer-consumer edges between the operators. We develop a set of methods for searching such layouts. Another component of our work is developing efficient memory layouts for 2.5 dimensional memory commonly seen in mobile devices. Our experimental results show that SmartMem outperforms 5 state-of-the-art DNN execution frameworks on mobile devices across 18 varied neural networks, including CNNs, Transformers with both local and global attention, as well as LLMs. In particular, compared to DNNFusion, SmartMem achieves an average speedup of 2.8$\times$, and outperforms TVM and MNN with speedups of 6.9$\times$ and 7.9$\times$, respectively, on average.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3620666.3651384

2404.13528

Country:

North America > United States > Georgia > Clarke County > Athens (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.83)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Socially Pertinent Robots in Gerontological Healthcare

Alameda-Pineda, Xavier, Addlesee, Angus, García, Daniel Hernández, Reinke, Chris, Arias, Soraya, Arrigoni, Federica, Auternaud, Alex, Blavette, Lauriane, Beyan, Cigdem, Camara, Luis Gomez, Cohen, Ohad, Conti, Alessandro, Dacunha, Sébastien, Dondrup, Christian, Ellinson, Yoav, Ferro, Francesco, Gannot, Sharon, Gras, Florian, Gunson, Nancie, Horaud, Radu, D'Incà, Moreno, Kimouche, Imad, Lemaignan, Séverin, Lemon, Oliver, Liotard, Cyril, Marchionni, Luca, Moradi, Mordehay, Pajdla, Tomas, Pino, Maribel, Polic, Michal, Py, Matthieu, Rado, Ariel, Ren, Bin, Ricci, Elisa, Rigaud, Anne-Sophie, Rota, Paolo, Romeo, Marta, Sebe, Nicu, Sieińska, Weronika, Tandeitnik, Pinchas, Tonini, Francesco, Turro, Nicolas, Wintz, Timothée, Yu, Yanchao

arXiv.org Artificial IntelligenceApr-11-2024

Despite the many recent achievements in developing and deploying social robotics, there are still many underexplored environments and applications for which systematic evaluation of such systems by end-users is necessary. While several robotic platforms have been used in gerontological healthcare, the question of whether or not a social interactive robot with multi-modal conversational capabilities will be useful and accepted in real-life facilities is yet to be answered. This paper is an attempt to partially answer this question, via two waves of experiments with patients and companions in a day-care gerontological facility in Paris with a full-sized humanoid robot endowed with social and conversational interaction capabilities. The software architecture, developed during the H2020 SPRING project, together with the experimental protocol, allowed us to evaluate the acceptability (AES) and usability (SUS) with more than 60 end-users. Overall, the users are receptive to this technology, especially when the robot perception and action skills are robust to environmental clutter and flexible to handle a plethora of different interactions.

large language model, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2404.0756

Country:

North America > Canada > Ontario (0.14)
Europe > France > Île-de-France > Paris > Paris (0.14)
Asia > Middle East > Israel (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.46)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.93)
(4 more...)

Add feedback

SoD$^2$: Statically Optimizing Dynamic Deep Neural Network

Niu, Wei, Agrawal, Gagan, Ren, Bin

arXiv.org Artificial IntelligenceFeb-29-2024

Though many compilation and runtime systems have been developed for DNNs in recent years, the focus has largely been on static DNNs. Dynamic DNNs, where tensor shapes and sizes and even the set of operators used are dependent upon the input and/or execution, are becoming common. This paper presents SoD$^2$, a comprehensive framework for optimizing Dynamic DNNs. The basis of our approach is a classification of common operators that form DNNs, and the use of this classification towards a Rank and Dimension Propagation (RDP) method. This framework statically determines the shapes of operators as known constants, symbolic constants, or operations on these. Next, using RDP we enable a series of optimizations, like fused code generation, execution (order) planning, and even runtime memory allocation plan generation. By evaluating the framework on 10 emerging Dynamic DNNs and comparing it against several existing systems, we demonstrate both reductions in execution latency and memory requirements, with RDP-enabled key optimizations responsible for much of the gains. Our evaluation results show that SoD$^2$ runs up to $3.9\times$ faster than these systems while saving up to $88\%$ peak memory consumption.

artificial intelligence, machine learning, sod 2, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3617232.3624869

2403.00176

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Key-Graph Transformer for Image Restoration

Ren, Bin, Li, Yawei, Liang, Jingyun, Ranjan, Rakesh, Liu, Mengyuan, Cucchiara, Rita, Van Gool, Luc, Sebe, Nicu

arXiv.org Artificial IntelligenceFeb-4-2024

While it is crucial to capture global information for effective image restoration (IR), integrating such cues into transformer-based methods becomes computationally expensive, especially with high input resolution. Furthermore, the self-attention mechanism in transformers is prone to considering unnecessary global cues from unrelated objects or regions, introducing computational inefficiencies. In response to these challenges, we introduce the Key-Graph Transformer (KGT) in this paper. Specifically, KGT views patch features as graph nodes. The proposed Key-Graph Constructor efficiently forms a sparse yet representative Key-Graph by selectively connecting essential nodes instead of all the nodes. Then the proposed Key-Graph Attention is conducted under the guidance of the Key-Graph only among selected nodes with linear computational complexity within each window. Extensive experiments across 6 IR tasks confirm the proposed KGT's state-of-the-art performance, showcasing advancements both quantitatively and qualitatively.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2402.02634

Country: Asia > South Korea (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation

Wang, Ti, Liu, Mengyuan, Liu, Hong, Ren, Bin, You, Yingxuan, Li, Wenhao, Sebe, Nicu, Li, Xia

arXiv.org Artificial IntelligenceFeb-3-2024

Although data-driven methods have achieved success in 3D human pose estimation, they often suffer from domain gaps and exhibit limited generalization. In contrast, optimization-based methods excel in fine-tuning for specific cases but are generally inferior to data-driven methods in overall performance. We observe that previous optimization-based methods commonly rely on projection constraint, which only ensures alignment in 2D space, potentially leading to the overfitting problem. To address this, we propose an Uncertainty-Aware testing-time Optimization (UAO) framework, which keeps the prior information of pre-trained model and alleviates the overfitting problem using the uncertainty of joints. Specifically, during the training phase, we design an effective 2D-to-3D network for estimating the corresponding 3D pose while quantifying the uncertainty of each 3D joint. For optimization during testing, the proposed optimization framework freezes the pre-trained model and optimizes only a latent state. Projection loss is then employed to ensure the generated poses are well aligned in 2D space for high-quality optimization. Furthermore, we utilize the uncertainty of each joint to determine how much each joint is allowed for optimization. The effectiveness and superiority of the proposed framework are validated through extensive experiments on two challenging datasets: Human3.6M and MPI-INF-3DHP. Notably, our approach outperforms the previous best result by a large margin of 4.5% on Human3.6M. Our source code will be open-sourced.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2402.02339

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Revisiting Recommendation Loss Functions through Contrastive Learning (Technical Report)

Li, Dong, Jin, Ruoming, Ren, Bin

arXiv.org Artificial IntelligenceDec-13-2023

Inspired by the success of contrastive learning, we systematically examine recommendation losses, including listwise (softmax), pairwise (BPR), and pointwise (MSE and CCL) losses. In this endeavor, we introduce InfoNCE+, an optimized generalization of InfoNCE with balance coefficients, and highlight its performance advantages, particularly when aligned with our new decoupled contrastive loss, MINE+. We also leverage debiased InfoNCE to debias pointwise recommendation loss (CCL) as Debiased CCL. Interestingly, our analysis reveals that linear models like iALS and EASE are inherently debiased. Empirical results demonstrates the effectiveness of MINE+ and Debiased-CCL.

artificial intelligence, loss function, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2312.0852

Country:

Europe > France (0.28)
North America > United States > New York (0.15)
Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

RoMo-HER: Robust Model-based Hindsight Experience Replay

Huang, Yuming, Ren, Bin

arXiv.org Artificial IntelligenceJun-28-2023

Sparse rewards are one of the factors leading to low sample efficiency in multi-goal reinforcement learning (RL). Based on Hindsight Experience Replay (HER), model-based relabeling methods have been proposed to relabel goals using virtual trajectories obtained by interacting with the trained model, which can effectively enhance the sample efficiency in accurately modelable sparse-reward environments. However, they are ineffective in robot manipulation environment. In our paper, we design a robust framework called Robust Model-based Hindsight Experience Replay (RoMo-HER) which can effectively utilize the dynamical model in robot manipulation environments to enhance the sample efficiency. RoMo-HER is built upon a dynamics model and a novel goal relabeling technique called Foresight relabeling (FR), which selects the prediction starting state with a specific strategy, predicts the future trajectory of the starting state, and then relabels the goal using the dynamics model and the latest policy to train the agent. Experimental results show that RoMo-HER has higher sample efficiency than HER and Model-based Hindsight Experience Replay in several simulated robot manipulation environments. Furthermore, we integrate RoMo-HER and Relay Hindsight Experience Replay (RHER), which currently exhibits the highest sampling efficiency in most benchmark environments, resulting in a novel approach called Robust Model-based Relay Hindsight Experience Replay (RoMo-RHER). Our experimental results demonstrate that RoMo-RHER achieves higher sample efficiency over RHER, outperforming RHER by 25% and 26% in FetchPush-v1 and FetchPickandPlace-v1, respectively.

machine learning, reinforcement learning, trajectory, (16 more...)

arXiv.org Artificial Intelligence

2306.16061

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre:

Research Report > New Finding (0.54)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.46)

Add feedback

Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

Li, Gen, Ji, Jie, Qin, Minghai, Niu, Wei, Ren, Bin, Afghah, Fatemeh, Guo, Linke, Ma, Xiaolong

arXiv.org Artificial IntelligenceJun-18-2023

As deep convolutional neural networks (DNNs) are widely used in various fields of computer vision, leveraging the overfitting ability of the DNN to achieve video resolution upscaling has become a new trend in the modern video delivery system. By dividing videos into chunks and overfitting each chunk with a super-resolution model, the server encodes videos before transmitting them to the clients, thus achieving better video quality and transmission efficiency. However, a large number of chunks are expected to ensure good overfitting quality, which substantially increases the storage and consumes more bandwidth resources for data transmission. On the other hand, decreasing the number of chunks through training optimization techniques usually requires high model capacity, which significantly slows down execution speed. To reconcile such, we propose a novel method for high-quality and efficient video resolution upscaling tasks, which leverages the spatial-temporal information to accurately divide video into chunks, thus keeping the number of chunks as well as the model size to minimum. Additionally, we advance our method into a single overfitting model by a data-aware joint training technique, which further reduces the storage requirement with negligible quality drop. We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality. Compared with the state-of-the-art, our method achieves 28 fps streaming speed with 41.6 PSNR, which is 14$\times$ faster and 2.29 dB better in the live video resolution upscaling tasks. Code available in https://github.com/coulsonlee/STDO-CVPR2023.git

artificial intelligence, machine learning, video, (18 more...)

arXiv.org Artificial Intelligence

2303.08331

Country: Asia > Middle East > Israel (0.14)

Genre:

Research Report > New Finding (0.66)
Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search

Zhan, Zheng, Gong, Yifan, Zhao, Pu, Yuan, Geng, Niu, Wei, Wu, Yushu, Zhang, Tianyun, Jayaweera, Malith, Kaeli, David, Ren, Bin, Lin, Xue, Wang, Yanzhi

arXiv.org Artificial IntelligenceFeb-14-2023

Though recent years have witnessed remarkable progress in single image super-resolution (SISR) tasks with the prosperous development of deep neural networks (DNNs), the deep learning methods are confronted with the computation and memory consumption issues in practice, especially for resource-limited platforms such as mobile devices. To overcome the challenge and facilitate the real-time deployment of SISR tasks on mobile, we combine neural architecture search with pruning search and propose an automatic search framework that derives sparse super-resolution (SR) models with high image quality while satisfying the real-time inference requirement. To decrease the search cost, we leverage the weight sharing strategy by introducing a supernet and decouple the search problem into three stages, including supernet construction, compiler-aware architecture and pruning search, and compiler-aware pruning ratio search. With the proposed framework, we are the first to achieve real-time SR inference (with only tens of milliseconds per frame) for implementing 720p resolution with competitive image quality (in terms of PSNR and SSIM) on mobile platforms (Samsung Galaxy S20).

artificial intelligence, machine learning, pruning, (19 more...)

arXiv.org Artificial Intelligence

2108.0891

Genre: Research Report (0.50)

Industry:

Information Technology (0.66)
Semiconductors & Electronics (0.48)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback