AITopics | Wang, Chengjie

Collaborating Authors

Wang, Chengjie

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

IM-IAD: Industrial Image Anomaly Detection Benchmark in Manufacturing

Xie, Guoyang, Wang, Jinbao, Liu, Jiaqi, Lyu, Jiayi, Liu, Yong, Wang, Chengjie, Zheng, Feng, Jin, Yaochu

arXiv.org Artificial IntelligenceJan-27-2024

Image anomaly detection (IAD) is an emerging and vital computer vision task in industrial manufacturing (IM). Recently, many advanced algorithms have been reported, but their performance deviates considerably with various IM settings. We realize that the lack of a uniform IM benchmark is hindering the development and usage of IAD methods in real-world applications. In addition, it is difficult for researchers to analyze IAD algorithms without a uniform benchmark. To solve this problem, we propose a uniform IM benchmark, for the first time, to assess how well these algorithms perform, which includes various levels of supervision (unsupervised versus fully supervised), learning paradigms (few-shot, continual and noisy label), and efficiency (memory usage and inference speed). Then, we construct a comprehensive image anomaly detection benchmark (IM-IAD), which includes 19 algorithms on seven major datasets with a uniform setting. Extensive experiments (17,017 total) on IM-IAD provide in-depth insights into IAD algorithm redesign or selection. Moreover, the proposed IM-IAD benchmark challenges existing algorithms and suggests future research directions. To foster reproducibility and accessibility, the source code of IM-IAD is uploaded on the website, https://github.com/M-3LAB/IM-IAD.

artificial intelligence, data mining, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2301.13359

Country:

Asia > China (0.95)
Europe > United Kingdom > England (0.14)

Genre: Research Report (1.00)

Industry: Education (0.93)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Transferable Decoding with Visual Entities for Zero-Shot Image Captioning

Fei, Junjie, Wang, Teng, Zhang, Jinrui, He, Zhenyu, Wang, Chengjie, Zheng, Feng

arXiv.org Artificial IntelligenceJul-31-2023

Image-to-text generation aims to describe images using natural language. Recently, zero-shot image captioning based on pre-trained vision-language models (VLMs) and large language models (LLMs) has made significant progress. However, we have observed and empirically demonstrated that these methods are susceptible to modality bias induced by LLMs and tend to generate descriptions containing objects (entities) that do not actually exist in the image but frequently appear during training (i.e., object hallucination). In this paper, we propose ViECap, a transferable decoding model that leverages entity-aware decoding to generate descriptions in both seen and unseen scenarios. ViECap incorporates entity-aware hard prompts to guide LLMs' attention toward the visual entities present in the image, enabling coherent caption generation across diverse scenes. With entity-aware hard prompts, ViECap is capable of maintaining performance when transferring from in-domain to out-of-domain scenarios. Extensive experiments demonstrate that ViECap sets a new state-of-the-art cross-domain (transferable) captioning and performs competitively in-domain captioning compared to previous VLMs-based zero-shot methods. Our code is available at: https://github.com/FeiElysia/ViECap

artificial intelligence, natural language, viecap, (17 more...)

arXiv.org Artificial Intelligence

2307.16525

Country:

Asia > China (0.67)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

An Extrinsic Calibration Method between LiDAR and GNSS/INS for Autonomous Driving

Yan, Guohang, Pi, Jiahao, Wang, Chengjie, Cai, Xinyu, Li, Yikang

arXiv.org Artificial IntelligenceFeb-28-2023

Accurate and reliable sensor calibration is critical for fusing LiDAR and inertial measurements in autonomous driving. This paper proposes a novel three-stage extrinsic calibration method between LiDAR and GNSS/INS for autonomous driving. The first stage can quickly calibrate the extrinsic parameters between the sensors through point cloud surface features so that the extrinsic can be narrowed from a large initial error to a small error range in little time. The second stage can further calibrate the extrinsic parameters based on LiDAR-mapping space occupancy while removing motion distortion. In the final stage, the z-axis errors caused by the plane motion of the autonomous vehicle are corrected, and an accurate extrinsic parameter is finally obtained. Specifically, This method utilizes the planar features in the environment, making it possible to quickly carry out calibration. Experimental results on real-world data sets demonstrate the reliability and accuracy of our method. The codes are open-sourced on the Github website. The code link is https://github.com/OpenCalib/LiDAR2INS.

artificial intelligence, calibration, point cloud, (14 more...)

arXiv.org Artificial Intelligence

2209.07694

Country: Asia (0.28)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (0.92)
Information Technology > Robotics & Automation (0.92)
Automobiles & Trucks (0.92)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)

Add feedback

How to Reduce Change Detection to Semantic Segmentation

Wang, Guo-Hua, Gao, Bin-Bin, Wang, Chengjie

arXiv.org Artificial IntelligenceFeb-14-2023

Change detection (CD) aims to identify changes that occur in an image pair taken different times. Prior methods devise specific networks from scratch to predict change masks in pixel-level, and struggle with general segmentation problems. In this paper, we propose a new paradigm that reduces CD to semantic segmentation which means tailoring an existing and powerful semantic segmentation network to solve CD. This new paradigm conveniently enjoys the mainstream semantic segmentation techniques to deal with general segmentation problems in CD. Hence we can concentrate on studying how to detect changes. We propose a novel and importance insight that different change types exist in CD and they should be learned separately. Based on it, we devise a module named MTF to extract the change information and fuse temporal features. MTF enjoys high interpretability and reveals the essential characteristic of CD. And most segmentation networks can be adapted to solve the CD problems with our MTF module. Finally, we propose C-3PO, a network to detect changes at pixel-level. C-3PO achieves state-of-the-art performance without bells and whistles. It is simple but effective and can be considered as a new baseline in this field. Our code is at https://github.com/DoctorKey/C-3PO.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2206.07557

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Rethinking Dimensionality Reduction in Grid-based 3D Object Detection

Huang, Dihe, Chen, Ying, Ding, Yikang, Liao, Jinli, Liu, Jianlin, Wu, Kai, Nie, Qiang, Liu, Yong, Wang, Chengjie, Li, Zhiheng

arXiv.org Artificial IntelligenceJan-27-2023

Bird's eye view (BEV) is widely adopted by most of the current point cloud detectors due to the applicability of well-explored 2D detection techniques. However, existing methods obtain BEV features by simply collapsing voxel or point features along the height dimension, which causes the heavy loss of 3D spatial information. To alleviate the information loss, we propose a novel point cloud detection network based on a Multi-level feature dimensionality reduction strategy, called MDRNet. In MDRNet, the Spatial-aware Dimensionality Reduction (SDR) is designed to dynamically focus on the valuable parts of the object during voxel-to-BEV feature transformation. Furthermore, the Multi-level Spatial Residuals (MSR) is proposed to fuse the multi-level spatial information in the BEV feature maps. Extensive experiments on nuScenes show that the proposed method outperforms the state-of-the-art methods. The code will be available upon publication.

artificial intelligence, detection, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2209.09464

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.84)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.69)

Add feedback

SCNet: Enhancing Few-Shot Semantic Segmentation by Self-Contrastive Background Prototypes

Chen, Jiacheng, Gao, Bin-Bin, Lu, Zongqing, Xue, Jing-Hao, Wang, Chengjie, Liao, Qingmin

arXiv.org Artificial IntelligenceApr-28-2021

Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples in support images. Most of advanced solutions exploit a metric learning framework that performs segmentation through matching each pixel to a learned foreground prototype. However, this framework suffers from biased classification due to incomplete construction of sample pairs with the foreground prototype only. To address this issue, in this paper, we introduce a complementary self-contrastive task into few-shot semantic segmentation. Our new model is able to associate the pixels in a region with the prototype of this region, no matter they are in the foreground or background. To this end, we generate self-contrastive background prototypes directly from the query image, with which we enable the construction of complete sample pairs and thus a complementary and auxiliary segmentation task to achieve the training of a better segmentation model. Extensive experiments on PASCAL-5$^i$ and COCO-20$^i$ demonstrate clearly the superiority of our proposal. At no expense of inference efficiency, our model achieves state-of-the results in both 1-shot and 5-shot settings for few-shot semantic segmentation.

artificial intelligence, machine learning, prototype, (17 more...)

arXiv.org Artificial Intelligence

2104.09216

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback

Learning Dynamic Alignment via Meta-filter for Few-shot Learning

Xu, Chengming, Liu, Chen, Zhang, Li, Wang, Chengjie, Li, Jilin, Huang, Feiyue, Xue, Xiangyang, Fu, Yanwei

arXiv.org Artificial IntelligenceMar-24-2021

Few-shot learning (FSL), which aims to recognise new classes by adapting the learned knowledge with extremely limited few-shot (support) examples, remains an important open problem in computer vision. Most of the existing methods for feature alignment in few-shot learning only consider image-level or spatial-level alignment while omitting the channel disparity. Our insight is that these methods would lead to poor adaptation with redundant matching, and leveraging channel-wise adjustment is the key to well adapting the learned knowledge to new classes. Therefore, in this paper, we propose to learn a dynamic alignment, which can effectively highlight both query regions and channels according to different local support information. Specifically, this is achieved by first dynamically sampling the neighbourhood of the feature position conditioned on the input few shot, based on which we further predict a both position-dependent and channel-dependent Dynamic Meta-filter. The filter is used to align the query feature with position-specific and channel-specific knowledge. Moreover, we adopt Neural Ordinary Differential Equation (ODE) to enable a more accurate control of the alignment. In such a sense our model is able to better capture fine-grained semantic context of the few-shot example and thus facilitates dynamical knowledge adaptation for few-shot learning. The resulting framework establishes the new state-of-the-arts on major few-shot visual recognition benchmarks, including miniImageNet and tieredImageNet.

alignment, deep learning, neural network, (18 more...)

arXiv.org Artificial Intelligence

2103.13582

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Learning Salient Boundary Feature for Anchor-free Temporal Action Localization

Lin, Chuming, Xu, Chengming, Luo, Donghao, Wang, Yabiao, Tai, Ying, Wang, Chengjie, Li, Jilin, Huang, Feiyue, Fu, Yanwei

arXiv.org Artificial IntelligenceMar-24-2021

Temporal action localization is an important yet challenging task in video understanding. Typically, such a task aims at inferring both the action category and localization of the start and end frame for each action instance in a long, untrimmed video.While most current models achieve good results by using pre-defined anchors and numerous actionness, such methods could be bothered with both large number of outputs and heavy tuning of locations and sizes corresponding to different anchors. Instead, anchor-free methods is lighter, getting rid of redundant hyper-parameters, but gains few attention. In this paper, we propose the first purely anchor-free temporal localization method, which is both efficient and effective. Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module to gather more valuable boundary features for each proposal with a novel boundary pooling, and (iii) several consistency constraints to make sure our model can find the accurate boundary given arbitrary proposals. Extensive experiments show that our method beats all anchor-based and actionness-guided methods with a remarkable margin on THUMOS14, achieving state-of-the-art results, and comparable ones on ActivityNet v1.3. Code is available at https://github.com/TencentYoutuResearch/ActionDetection-AFSD.

artificial intelligence, machine learning, proposal, (17 more...)

arXiv.org Artificial Intelligence

2103.13137

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback