Cao, Guanqun
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages
Muhammad, Shamsuddeen Hassan, Ousidhoum, Nedjma, Abdulmumin, Idris, Wahle, Jan Philip, Ruas, Terry, Beloucif, Meriem, de Kock, Christine, Surange, Nirmal, Teodorescu, Daniela, Ahmad, Ibrahim Said, Adelani, David Ifeoluwa, Aji, Alham Fikri, Ali, Felermino D. M. A., Alimova, Ilseyar, Araujo, Vladimir, Babakov, Nikolay, Baes, Naomi, Bucur, Ana-Maria, Bukula, Andiswa, Cao, Guanqun, Cardenas, Rodrigo Tufino, Chevi, Rendi, Chukwuneke, Chiamaka Ijeoma, Ciobotaru, Alexandra, Dementieva, Daryna, Gadanya, Murja Sani, Geislinger, Robert, Gipp, Bela, Hourrane, Oumaima, Ignat, Oana, Lawan, Falalu Ibrahim, Mabuya, Rooweither, Mahendra, Rahmad, Marivate, Vukosi, Piper, Andrew, Panchenko, Alexander, Ferreira, Charles Henrique Porto, Protasov, Vitaly, Rutunda, Samuel, Shrivastava, Manish, Udrea, Aura Cristina, Wanzare, Lilian Diana Awuor, Wu, Sophie, Wunderlich, Florian Valentin, Zhafran, Hanif Muhammad, Zhang, Tianhui, Zhou, Yi, Mohammad, Saif M.
People worldwide use language in subtle and complex ways to express emotions. While emotion recognition -- an umbrella term for several NLP tasks -- significantly impacts applications in NLP and other fields, most work in the area focuses on high-resource languages. This has led to major disparities in research and proposed solutions, especially for low-resource languages, which suffer from a lack of high-quality datasets. In this paper, we present BRIGHTER -- a collection of multi-labeled emotion-annotated datasets in 28 different languages. BRIGHTER covers predominantly low-resource languages from Africa, Asia, Eastern Europe, and Latin America, with instances from various domains annotated by fluent speakers. We describe the data collection and annotation processes and the challenges of building these datasets. We then report experimental results for monolingual and cross-lingual multi-label emotion identification, as well as intensity-level emotion recognition. We investigate results with and without using LLMs and analyse the large variability in performance across languages and text domains. We show that the BRIGHTER datasets are a step towards bridging the gap in text-based emotion recognition and discuss their impact and utility.
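Multi-label emotion identification of this kind is typically scored with macro-averaged F1 over the label set. A minimal, self-contained sketch of that metric (not the paper's evaluation code; the emotion inventory below is illustrative and may differ from BRIGHTER's actual label set):

```python
import numpy as np

# Hypothetical emotion inventory for illustration only.
EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise", "disgust"]

def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over binary indicator matrices of shape
    (n_samples, n_labels): compute F1 per label, then average."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    f1s = []
    for j in range(y_true.shape[1]):
        tp = np.sum(y_true[:, j] & y_pred[:, j])
        fp = np.sum(~y_true[:, j] & y_pred[:, j])
        fn = np.sum(y_true[:, j] & ~y_pred[:, j])
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1s))
```

Macro averaging weights every emotion equally, which matters when rare labels would otherwise be drowned out by frequent ones.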
Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding
Xia, Jingming, Cao, Guanqun, Ma, Guang, Luo, Yiben, Li, Qinzhao, Oyekan, John
Monocular depth estimation involves predicting depth from a single RGB image and plays a crucial role in applications such as autonomous driving, robotic navigation, and 3D reconstruction. Recent advancements in learning-based methods have significantly improved depth estimation performance. Generative models, particularly Stable Diffusion, have shown remarkable potential in recovering fine details and reconstructing missing regions through large-scale training on diverse datasets. However, models like CLIP, which rely on textual embeddings, face limitations in complex outdoor environments where rich contextual information is needed, which reduces their effectiveness in such challenging scenarios. Here, we propose a novel image-based semantic embedding that extracts contextual information directly from visual features, significantly improving depth prediction in complex environments. Evaluated on the KITTI and Waymo datasets, our method achieves performance comparable to state-of-the-art models while addressing the shortcomings of CLIP embeddings in handling outdoor scenes. By leveraging visual semantics directly, our method demonstrates enhanced robustness and adaptability in depth estimation tasks, showcasing its potential for application to other visual perception tasks.
Learn from the Past: Language-conditioned Object Rearrangement with Large Language Models
Cao, Guanqun, Mckenna, Ryan, Oyekan, John
Object rearrangement is a significant task for collaborative robots, which are directed to manipulate objects into a specified goal state. Determining the placement of objects is a major challenge that influences the efficiency of the rearrangement process. Most current methods rely heavily on pre-collected datasets to train a model for predicting the goal position and are restricted to specific instructions, which limits their broader applicability and effectiveness. In this paper, we propose a framework for language-conditioned object rearrangement based on large language models (LLMs). In particular, our approach mimics human reasoning by using past successful experiences as references to infer the desired goal position. Thanks to the strong natural language comprehension and inference abilities of LLMs, our method can generalise to various everyday objects and free-form language instructions in a zero-shot manner. Experimental results demonstrate that our method can effectively execute robotic rearrangement tasks, even those involving long sequential orders.
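Using past successful experiences as references maps naturally onto in-context prompting. A hypothetical sketch of such a prompt builder, in the spirit of the approach described above; the prompt wording, coordinate format, and function name are all assumptions, not the paper's actual code:

```python
def build_rearrangement_prompt(instruction, past_experiences):
    """Compose an LLM prompt that presents past successful placements
    as in-context examples before the new instruction.

    past_experiences: list of (instruction, placement) pairs, where a
    placement is a string like "(0.30, 0.55)" (format is illustrative).
    """
    lines = [
        "You place objects on a table. Given an instruction, "
        "output the goal position as (x, y) in table coordinates."
    ]
    for past_instr, placement in past_experiences:
        lines.append(f"Instruction: {past_instr}")
        lines.append(f"Placement: {placement}")
    # The new instruction ends the prompt; the LLM completes the placement.
    lines.append(f"Instruction: {instruction}")
    lines.append("Placement:")
    return "\n".join(lines)
```

The returned string would then be sent to an LLM, whose completion is parsed as the goal position.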
Robotic Perception of Transparent Objects: A Review
Jiang, Jiaqi, Cao, Guanqun, Deng, Jiankang, Do, Thanh-Toan, Luo, Shan
Transparent object perception is a rapidly developing research problem in artificial intelligence. The ability to perceive transparent objects enables robots to achieve higher levels of autonomy, unlocking new applications in various industries such as healthcare, services and manufacturing. Despite numerous datasets and perception methods being proposed in recent years, there is still a lack of in-depth understanding of these methods and the challenges in this field. To address this gap, this article provides a comprehensive survey of the platforms and recent advances for robotic perception of transparent objects. We highlight the main challenges and propose future directions of various transparent object perception tasks, i.e., segmentation, reconstruction, and pose estimation. We also discuss the limitations of existing datasets in diversity and complexity, and the benefits of employing multi-modal sensors, such as RGB-D cameras, thermal cameras, and polarised imaging, for transparent object perception. Furthermore, we identify perception challenges in complex and dynamic environments, as well as for objects with changeable geometries. Finally, we provide an interactive online platform to navigate each reference: \url{https://sites.google.com/view/transperception}.
Learn from Incomplete Tactile Data: Tactile Representation Learning with Masked Autoencoders
Cao, Guanqun, Jiang, Jiaqi, Bollegala, Danushka, Luo, Shan
Missing signals, caused by occluded objects or unstable sensors, are a common challenge during data collection. Such missing signals adversely affect the results obtained from the data, and this issue is observed particularly frequently in robotic tactile perception. In tactile perception, due to the limited workspace and the dynamic environment, the contact between the tactile sensor and the object is often insufficient and unstable, which causes a partial loss of signals and thus leads to incomplete tactile data. The tactile data therefore contain fewer tactile cues with low information density. In this paper, we propose a tactile representation learning method, named TacMAE, based on Masked Autoencoders to address the problem of incomplete tactile data in tactile perception. In our framework, a portion of the tactile image is masked out to simulate the missing contact region. By reconstructing the missing signals in the tactile image, the trained model achieves a high-level understanding of surface geometry and tactile properties from limited tactile cues. The experimental results on tactile texture recognition show that our proposed TacMAE can achieve a high recognition accuracy of 71.4% in zero-shot transfer and 85.8% after fine-tuning, which are 15.2% and 8.2% higher than the results without masked modeling. Extensive experiments on YCB objects demonstrate the knowledge transferability of our proposed method and its potential to improve efficiency in tactile exploration.
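The masking step (zeroing out a random subset of patches to simulate missing contact regions) can be illustrated with a short routine. This is an illustrative sketch, not the TacMAE implementation; the patch size and mask ratio below are arbitrary choices:

```python
import numpy as np

def mask_patches(image, patch=4, mask_ratio=0.5, rng=None):
    """Zero out a random subset of non-overlapping patches of a 2-D
    tactile image, mimicking missing contact regions."""
    rng = np.random.default_rng(rng)
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    rows, cols = h // patch, w // patch
    n_patches = rows * cols
    # Choose which patches to hide, without replacement.
    idx = rng.choice(n_patches, size=int(mask_ratio * n_patches), replace=False)
    mask = np.zeros(n_patches, dtype=bool)
    mask[idx] = True
    masked = image.copy()
    for k in np.flatnonzero(mask):
        r, c = divmod(k, cols)
        masked[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return masked, mask
```

An autoencoder trained to reconstruct the original image from the masked one must infer the hidden geometry from the visible cues, which is the intuition behind masked tactile representation learning.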
Multimodal Zero-Shot Learning for Tactile Texture Recognition
Cao, Guanqun, Jiang, Jiaqi, Bollegala, Danushka, Li, Min, Luo, Shan
Tactile sensing plays an irreplaceable role in robotic material recognition. It enables robots to distinguish material properties such as local geometry and texture, especially for materials like textiles. However, most tactile recognition methods can only classify known materials that have been touched and trained on with tactile data, and cannot classify unknown materials for which no tactile training data are available. To solve this problem, we propose a tactile zero-shot learning framework to recognise unknown materials when they are touched for the first time, without requiring training tactile samples. The visual modality, providing tactile cues from sight, and semantic attributes, giving high-level characteristics, are combined to bridge the gap between touched and untouched classes. A generative model is learnt to synthesise tactile features from the corresponding visual images and semantic embeddings, and a classifier can then be trained on the synthesised tactile features of untouched materials for zero-shot recognition. Extensive experiments demonstrate that our proposed multimodal generative model can achieve a high recognition accuracy of 83.06% in classifying materials that were not touched before. The robotic experiment demo and the dataset are available at https://sites.google.com/view/multimodalzsl.
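Once tactile features have been synthesised for the untouched classes, any standard classifier can be built on them. As a deliberately simplified stand-in for the trained classifier described above (not the paper's method), a nearest-centroid rule over synthesised features:

```python
import numpy as np

def zero_shot_classify(test_feats, synth_feats_by_class):
    """Assign each test tactile feature vector to the class whose
    synthesised-feature centroid is nearest (Euclidean distance).

    synth_feats_by_class: dict mapping class name -> array of shape
    (n_synth, d) of features produced by a generative model.
    """
    classes = sorted(synth_feats_by_class)
    centroids = np.stack(
        [synth_feats_by_class[c].mean(axis=0) for c in classes]
    )
    # Pairwise distances: (n_test, n_classes).
    d = np.linalg.norm(test_feats[:, None, :] - centroids[None, :, :], axis=2)
    return [classes[i] for i in d.argmin(axis=1)]
```

The key point is that the classifier never sees real tactile data for the untouched classes; everything it knows about them comes from features synthesised from vision and semantics.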
Multiclass classification for multidimensional functional data through deep neural networks
Wang, Shuoyang, Cao, Guanqun
The intrinsically infinite-dimensional features of functional observations over multidimensional domains render standard classification methods effectively inapplicable. To address this problem, we introduce a novel multiclass functional deep neural network (mfDNN) classifier as an innovative data mining and classification tool. Specifically, we consider a sparse deep neural network architecture with the rectified linear unit (ReLU) activation function and minimise the cross-entropy loss in the multiclass classification setup. This architecture allows us to employ modern computational tools in the implementation. The convergence rates of the misclassification risk functions are also derived for both fully observed and discretely observed multidimensional functional data. We demonstrate the performance of mfDNN on simulated data and several benchmark datasets from different application domains.
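The two core ingredients named above (a ReLU feed-forward network and the multiclass cross-entropy loss) can be written compactly. A minimal NumPy sketch for illustration, not the mfDNN implementation, which uses a sparse architecture with theoretically chosen depth and width:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, weights, biases):
    """Forward pass of a fully connected ReLU network.
    The final layer outputs unnormalised logits for the K classes."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)
    return x @ weights[-1] + biases[-1]

def cross_entropy(logits, labels):
    """Mean multiclass cross-entropy, computed via a numerically
    stable log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

Training then amounts to minimising `cross_entropy(forward(x, W, b), y)` over the weights, e.g. by gradient descent.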
Vis2Hap: Vision-based Haptic Rendering by Cross-modal Generation
Cao, Guanqun, Jiang, Jiaqi, Mao, Ningtao, Bollegala, Danushka, Li, Min, Luo, Shan
To assist robots in teleoperation tasks, haptic rendering, which gives human operators access to a virtual sense of touch, has been developed in recent years. Most previous haptic rendering methods rely strongly on data collected by tactile sensors. However, tactile data are not widely available for robots due to their limited reachable space and the restrictions of tactile sensors. To eliminate the need for tactile data, in this paper we propose a novel method, named Vis2Hap, to generate haptic rendering from visual inputs that can be obtained from a distance without physical interaction. We take the surface texture of objects as the key cue to be conveyed to the human operator. To this end, a generative model is designed to simulate the roughness and slipperiness of the object's surface. To embed haptic cues in Vis2Hap, we use height maps from tactile sensors and spectrograms from friction coefficients as the intermediate outputs of the generative model. Once Vis2Hap is trained, it can be used to generate height maps and spectrograms of new surface textures, from which a friction image can be obtained and displayed on a haptic display. A user study demonstrates that our proposed Vis2Hap method enables users to experience a realistic haptic feeling similar to that of physical objects. The proposed vision-based haptic rendering has the potential to enhance human operators' perception of the remote environment and facilitate robotic manipulation.
Deep Neural Network Classifier for Multi-dimensional Functional Data
Wang, Shuoyang, Cao, Guanqun, Shang, Zuofeng
We propose a new approach, called the functional deep neural network (FDNN), for classifying multi-dimensional functional data. Specifically, a deep neural network is trained on the principal components of the training data, which are then used to predict the class label of a future data function. Unlike popular functional discriminant analysis approaches, which rely on a Gaussian assumption, the proposed FDNN approach applies to general non-Gaussian multi-dimensional functional data. Moreover, when the log density ratio possesses a locally connected functional modular structure, we show that FDNN achieves minimax optimality. The superiority of our approach is demonstrated through both simulated and real-world datasets.
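The first stage of such a pipeline, projecting discretised functional observations onto their leading principal components, can be sketched in a few lines. This is an illustrative NumPy version via the SVD, not the paper's code:

```python
import numpy as np

def pc_scores(X, n_components):
    """Project discretised functional observations (rows of X, each a
    function sampled on a grid) onto their leading principal components.
    The returned scores would serve as the input to a DNN classifier."""
    Xc = X - X.mean(axis=0)            # centre the observations
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T    # scores on the top components
```

The scores compress each (in principle infinite-dimensional) observation into a short vector, which is what makes a standard feed-forward classifier applicable.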
Estimation of the Mean Function of Functional Data via Deep Neural Networks
Wang, Shuoyang, Cao, Guanqun, Shang, Zuofeng
In this work, we propose a deep neural network method to perform nonparametric regression for functional data. The proposed estimators are based on sparsely connected deep neural networks with the ReLU activation function. By properly choosing the network architecture, our estimator achieves the optimal nonparametric convergence rate in empirical norm. Under certain circumstances, such as a trigonometric polynomial kernel and a sufficiently large sampling frequency, the convergence rate is even faster than the root-$n$ rate. Through Monte Carlo simulation studies, we examine the finite-sample performance of the proposed method. Finally, the proposed method is applied to analyse positron emission tomography images of patients with Alzheimer's disease obtained from the Alzheimer's Disease Neuroimaging Initiative database.