
Collaborating Authors

 Tian, Yu


A Novel Approach to WaveNet Architecture for RF Signal Separation with Learnable Dilation and Data Augmentation

arXiv.org Artificial Intelligence

ABSTRACT: In this paper, we address the intricate issue of RF signal separation by presenting a novel adaptation of the WaveNet architecture that introduces learnable dilation parameters, significantly enhancing signal separation in dense RF spectrums. Our focused architectural refinements and innovative data augmentation strategies have markedly improved the model's ability to discern complex signal sources. This paper details our comprehensive methodology, including the refined model architecture, data preparation techniques, and the training strategy that have been pivotal to our success. The efficacy of our approach is evidenced by the substantial improvements recorded, including a 58.82% increase in SINR at a BER of 10. Notably, our model achieved first place in the challenge [1], demonstrating its superior performance and establishing a new standard for machine learning applications within the RF communications domain.

[Figure 1: Modified WaveNet with Learnable Dilation and Padding]
[Figure 1: An Illustration of the Learnable Dilation Rate]

Index Terms -- Radio Frequency Signal Separation, Machine Learning, WaveNet Architecture, Learnable Dilation, Data Augmentation

1. INTRODUCTION. Co-channel signal separation in the crowded radio-frequency (RF) spectrum is a crucial task for enabling various wireless systems to operate simultaneously.
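To make the learnable-dilation idea concrete, here is a minimal, hypothetical PyTorch sketch of one gated WaveNet-style residual block, not the authors' released model: because convolution dilations must be integers, the sketch treats the dilation rate as a continuous learnable parameter and linearly interpolates between the outputs at the two nearest integer dilations. All module names, sizes, and the interpolation trick are assumptions.

```python
# Hedged sketch (not the authors' code): a gated WaveNet-style residual block whose
# dilation rate is a learnable continuous parameter, realized by interpolating between
# the two nearest integer dilations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableDilationBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3,
                 init_dilation: float = 2.0, max_dilation: int = 64):
        super().__init__()
        self.kernel_size = kernel_size
        self.max_dilation = max_dilation
        # Inverse-softplus init so the effective dilation starts near init_dilation.
        self.log_dilation = nn.Parameter(
            torch.log(torch.expm1(torch.tensor(init_dilation - 1.0))))
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size)
        self.residual = nn.Conv1d(channels, channels, 1)
        self.skip = nn.Conv1d(channels, channels, 1)

    def _dilated(self, conv: nn.Conv1d, x: torch.Tensor, dilation: int) -> torch.Tensor:
        pad = (self.kernel_size - 1) * dilation  # causal left padding
        return F.conv1d(F.pad(x, (pad, 0)), conv.weight, conv.bias, dilation=dilation)

    def forward(self, x: torch.Tensor):
        d = 1.0 + F.softplus(self.log_dilation)          # continuous dilation >= 1
        d = torch.clamp(d, max=float(self.max_dilation))
        d_lo, d_hi = int(torch.floor(d)), int(torch.ceil(d))
        w = (d - d_lo) if d_hi > d_lo else torch.zeros_like(d)

        def mix(conv):
            lo = self._dilated(conv, x, d_lo)
            hi = self._dilated(conv, x, d_hi) if d_hi > d_lo else lo
            return (1 - w) * lo + w * hi                 # differentiable w.r.t. the dilation

        z = torch.tanh(mix(self.filter_conv)) * torch.sigmoid(mix(self.gate_conv))
        return x + self.residual(z), self.skip(z)
```

In a full model, several such blocks with independent learnable dilation rates would presumably replace WaveNet's fixed exponential dilation schedule.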


Evil Geniuses: Delving into the Safety of LLM-based Agents

arXiv.org Artificial Intelligence

Rapid advancements in large language models (LLMs) have revitalized interest in LLM-based agents, which exhibit impressive human-like behaviors and cooperative capabilities in various scenarios. However, these agents also bring some exclusive risks, stemming from the complexity of interaction environments and the usability of tools. This paper delves into the safety of LLM-based agents from three perspectives: agent quantity, role definition, and attack level. Specifically, we initially propose to employ a template-based attack strategy on LLM-based agents to probe the influence of agent quantity. In addition, to address interaction environment and role specificity issues, we introduce Evil Geniuses (EG), an effective attack method that autonomously generates prompts related to the original role to examine the impact across various role definitions and attack levels. EG leverages Red-Blue exercises, significantly improving the aggressiveness of the generated prompts and their similarity to the original roles. Our evaluations on CAMEL, MetaGPT, and ChatDev based on GPT-3.5 and GPT-4 demonstrate high success rates. Extensive evaluation and discussion reveal that these agents are less robust, prone to more harmful behaviors, and capable of generating stealthier content than LLMs, highlighting significant safety challenges and guiding future research. Our code is available at https://github.com/T1aNS1R/Evil-Geniuses.


Large Generative AI Models for Telecom: The Next Big Thing?

arXiv.org Artificial Intelligence

The evolution of generative artificial intelligence (GenAI) constitutes a turning point in reshaping the future of technology in different aspects. Wireless networks in particular, with the blooming of self-evolving networks, represent a rich field for exploiting GenAI and reaping several benefits that can fundamentally change the way wireless networks are designed and operated today. To be specific, large GenAI models are envisioned to open up a new era of autonomous wireless networks, in which multi-modal GenAI models trained over various Telecom data can be fine-tuned to perform several downstream tasks, eliminating the need for building and training dedicated AI models for each specific task and paving the way for the realization of artificial general intelligence (AGI)-empowered wireless networks. In this article, we aim to unfold the opportunities that can be reaped from integrating large GenAI models into the Telecom domain. In particular, we first highlight the applications of large GenAI models in future wireless networks, defining potential use-cases and revealing insights on the associated theoretical and practical challenges. Furthermore, we unveil how 6G can open up new opportunities through connecting multiple on-device large GenAI models, and hence pave the way to the collective intelligence paradigm. Finally, we offer a forward-looking vision of how large GenAI models will be the key to realizing self-evolving networks.


Fairness-Driven Optimization of RIS-Augmented 5G Networks for Seamless 3D UAV Connectivity Using DRL Algorithms

arXiv.org Artificial Intelligence

In this paper, we study the problem of joint active and passive beamforming for reconfigurable intelligent surface (RIS)-assisted massive multiple-input multiple-output systems towards extending wireless cellular coverage in 3D, where multiple RISs, each equipped with an array of passive elements, are deployed to assist a base station (BS) to simultaneously serve multiple unmanned aerial vehicles (UAVs) in the same time-frequency resource of 5G wireless communications. With a focus on ensuring fairness among UAVs, our objective is to maximize the minimum signal-to-interference-plus-noise ratio (SINR) at the UAVs by jointly optimizing the transmit beamforming parameters at the BS and the phase shift parameters at the RISs. We propose two novel algorithms to address this problem. The first algorithm aims to mitigate interference by calculating the BS beamforming matrix through matrix inverse operations once the phase shift parameters are determined. The second one is based on the principle that each RIS element serves only one UAV, and the phase shift parameter of this RIS element is optimally designed to compensate for the phase offset caused by propagation and fading. To obtain the optimal parameters, we utilize a state-of-the-art deep reinforcement learning algorithm, deep deterministic policy gradient (DDPG), to solve these two optimization problems. Simulation results are provided to illustrate the effectiveness of our proposed solution, and several insightful remarks are drawn.
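A minimal NumPy sketch of the two ideas, hypothetical and not the paper's code, may help: each RIS element is assigned to one UAV and its phase shift compensates the phase of its cascaded path, after which the BS beamforming matrix is obtained from the effective channel by a matrix inverse (zero-forcing style). The channel shapes, round-robin assignment, and reference-antenna alignment are all assumptions.

```python
# Hedged sketch (not the paper's code): toy single-cell RIS-assisted setup.
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 8, 32, 4              # BS antennas, RIS elements, UAVs
G = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)   # BS -> RIS
Hr = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)  # RIS -> UAVs
Hd = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)  # BS -> UAVs

# Idea 2: each RIS element is assigned to one UAV; its phase shift compensates the phase
# of its cascaded path (relative to a reference BS antenna) so that the reflected path
# adds constructively with the direct one.
assign = np.arange(N) % K                      # round-robin element-to-UAV assignment
ref_ant = 0
theta = (np.angle(Hd[assign, ref_ant])
         - np.angle(Hr[assign, np.arange(N)])
         - np.angle(G[:, ref_ant]))
Phi = np.diag(np.exp(1j * theta))              # RIS reflection matrix

# Idea 1: with the phase shifts fixed, mitigate inter-UAV interference by computing the
# BS beamforming matrix from the effective channel via a matrix inverse (zero forcing).
H_eff = Hd + Hr @ Phi @ G                      # K x M effective channel
W = H_eff.conj().T @ np.linalg.inv(H_eff @ H_eff.conj().T)   # M x K
W /= np.linalg.norm(W, axis=0, keepdims=True)  # unit-power beams

# Per-UAV SINR at unit transmit power and noise power 1e-2 (illustrative numbers only).
S = np.abs(H_eff @ W) ** 2                     # received power of beam j at UAV k
sinr = np.diag(S) / (S.sum(axis=1) - np.diag(S) + 1e-2)
print("min SINR across UAVs:", 10 * np.log10(sinr.min()), "dB")
```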


How Robust is Google's Bard to Adversarial Image Attacks?

arXiv.org Artificial Intelligence

Multimodal Large Language Models (MLLMs) that integrate text and other modalities (especially vision) have achieved unprecedented performance in various multimodal tasks. However, due to the unsolved adversarial robustness problem of vision models, MLLMs can have more severe safety and security risks when vision inputs are introduced. In this work, we study the adversarial robustness of Google's Bard, a chatbot competing with ChatGPT that recently released its multimodal capability, to better understand the vulnerabilities of commercial MLLMs. By attacking white-box surrogate vision encoders or MLLMs, the generated adversarial examples can mislead Bard into outputting wrong image descriptions with a 22% success rate based solely on transferability. We show that the adversarial examples can also attack other MLLMs, e.g., a 26% attack success rate against Bing Chat and an 86% attack success rate against ERNIE Bot. Moreover, we identify two defense mechanisms of Bard, including face detection and toxicity detection of images. We design corresponding attacks to evade these defenses, demonstrating that the current defenses of Bard are also vulnerable. We hope this work can deepen our understanding of the robustness of MLLMs and facilitate future research on defenses. Our code is available at https://github.com/thu-ml/Attack-Bard. Update: GPT-4V became available in October 2023. We further evaluate its robustness under the same set of adversarial examples, observing a 45% attack success rate.
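As a rough illustration of the white-box surrogate step (a generic, textbook PGD sketch, not the paper's released attack code), the following pushes an image's embedding away from its clean embedding under a surrogate encoder, relying purely on transferability; `surrogate`, the cosine objective, and the hyperparameters are assumptions.

```python
# Hedged sketch (not the paper's code): PGD on a white-box surrogate image encoder.
# `surrogate` is an assumed placeholder for any differentiable encoder mapping a
# (B, 3, H, W) tensor in [0, 1] to an embedding.
import torch
import torch.nn.functional as F


def pgd_feature_attack(surrogate, image, eps=16 / 255, alpha=2 / 255, steps=50):
    surrogate.eval()
    with torch.no_grad():
        clean_feat = surrogate(image)
    adv = (image.clone() + torch.empty_like(image).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        adv = adv.detach().requires_grad_(True)
        # Gradient ascent on the negated similarity drives the embeddings apart.
        loss = -F.cosine_similarity(surrogate(adv), clean_feat, dim=-1).mean()
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()
            adv = image + (adv - image).clamp(-eps, eps)   # project to the L_inf ball
            adv = adv.clamp(0, 1)
    return adv.detach()
```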


Just Noticeable Difference Modeling for Face Recognition System

arXiv.org Artificial Intelligence

High-quality face images are required to guarantee the stability and reliability of automatic face recognition (FR) systems in surveillance and security scenarios. However, a massive amount of face data is usually compressed before being analyzed due to limitations on transmission or storage. The compressed images may lose powerful identity information, resulting in performance degradation of the FR system. Herein, we make the first attempt to study the just noticeable difference (JND) for the FR system, which can be defined as the maximum distortion that the FR system cannot notice. More specifically, we establish a JND dataset including 3530 original images and 137,670 compressed images generated by advanced reference encoding/decoding software based on the Versatile Video Coding (VVC) standard (VTM-15.0). Subsequently, we develop a novel JND prediction model to directly infer JND images for the FR system. In particular, in order to maximize redundancy removal without impairing robust identity information, we employ an encoder with multiple feature extraction and attention-based feature decomposition modules to progressively decompose face features into two uncorrelated components, i.e., identity and residual features, via self-supervised learning. Then, the residual feature is fed into the decoder to generate the residual map. Finally, the predicted JND map is obtained by subtracting the residual map from the original image. Experimental results have demonstrated that the proposed model achieves higher accuracy of JND map prediction compared with state-of-the-art JND models, and is capable of saving more bits while maintaining the performance of the FR system compared with VTM-15.0.
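The overall pipeline described above can be sketched as follows; this is a toy, hypothetical model rather than the authors' network, and the layer shapes, the attention-style decomposition, and the omission of the training losses are all assumptions.

```python
# Hedged sketch (not the authors' model): encode, split into identity/residual
# components, decode the residual, and subtract it to obtain the predicted JND image.
import torch
import torch.nn as nn


class JNDPredictor(nn.Module):
    def __init__(self, feat_ch: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        # Attention-style split into two components: identity (kept) and residual
        # (perceptually removable redundancy).
        self.identity_attn = nn.Sequential(nn.Conv2d(feat_ch, feat_ch, 1), nn.Sigmoid())
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, 3, 3, padding=1),
        )

    def forward(self, img: torch.Tensor):
        feat = self.encoder(img)
        a = self.identity_attn(feat)
        identity_feat = a * feat            # identity component
        residual_feat = (1 - a) * feat      # residual component
        residual_map = self.decoder(residual_feat)
        jnd_image = img - residual_map      # predicted JND image
        return jnd_image, residual_map, identity_feat
```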


The Devil is in the Details: A Deep Dive into the Rabbit Hole of Data Filtering

arXiv.org Artificial Intelligence

The quality of pre-training data plays a critical role in the performance of foundation models. Popular foundation models often design their own recipes for data filtering, which makes it hard to analyze and compare different data filtering approaches. DataComp is a new benchmark dedicated to evaluating different methods for data filtering. This paper describes our learnings and solution from participating in the DataComp challenge. Our filtering strategy includes three stages: single-modality filtering, cross-modality filtering, and data distribution alignment. We integrate existing methods and propose new solutions, such as computing the CLIP score on horizontally flipped images to mitigate the interference of scene text, using vision and language models to retrieve training samples for target downstream tasks, rebalancing the data distribution to improve the efficiency of allocating the computational budget, etc. We slice and dice our design choices, provide in-depth analysis, and discuss open questions. Our approach outperforms the best method from the DataComp paper by over 4% on the average performance of 38 tasks and by over 2% on ImageNet.
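The flipped-image CLIP score idea can be sketched as below; this is a hypothetical illustration rather than the submission's code, and the model checkpoint and the min-aggregation over the two views are assumptions. The intuition is that a caption matching only the rendered scene text loses its advantage once the text is mirrored into unreadable glyphs.

```python
# Hedged sketch (not the DataComp submission code): score an image-caption pair with
# CLIP on both the original and the horizontally flipped image, and keep the smaller
# score so that text-driven matches are down-weighted.
import torch
from PIL import Image, ImageOps
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def flip_aware_clip_score(image: Image.Image, caption: str) -> float:
    views = [image, ImageOps.mirror(image)]                 # original + horizontal flip
    inputs = processor(text=[caption], images=views, return_tensors="pt", padding=True)
    with torch.no_grad():
        img_feat = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_feat = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    scores = (img_feat @ txt_feat.T).squeeze(-1)            # cosine similarity per view
    return scores.min().item()                              # conservative, text-robust score
```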


Multimodal Transformers for Wireless Communications: A Case Study in Beam Prediction

arXiv.org Artificial Intelligence

Wireless communications at high-frequency bands with large antenna arrays face challenges in beam management, which can potentially be improved by multimodal sensing information from cameras, LiDAR, radar, and GPS. In this paper, we present a multimodal transformer deep learning framework for sensing-assisted beam prediction. We employ a convolutional neural network to extract the features from a sequence of images, point clouds, and raw radar data sampled over time. At each convolutional layer, we use transformer encoders to learn the hidden relations between feature tokens from different modalities and time instances over the abstraction space and produce encoded vectors for the next-level feature extraction. We train the model on a combination of different modalities with supervised learning. We handle imbalanced data by utilizing focal loss and an exponential moving average. We also evaluate data processing and augmentation techniques such as image enhancement, segmentation, background filtering, multimodal data flipping, radar signal transformation, and GPS angle calibration. Experimental results show that our solution trained on image and GPS data produces the best distance-based accuracy of predicted beams at 78.44%, with effective generalization to unseen day scenarios near 73% and night scenarios over 84%. This outperforms using other modalities and data processing techniques, which demonstrates the effectiveness of transformers with feature fusion in performing radio beam prediction from images and GPS. Furthermore, our solution could be pretrained on large sequences of multimodal wireless data and then fine-tuned for multiple downstream radio network tasks.
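A toy version of the image-plus-GPS fusion described above might look like the following; it is a hypothetical sketch, not the released model, and the dimensions, the GPS embedding, the single fusion stage, and the mean-pooled classifier head are assumptions.

```python
# Hedged sketch (not the released model): per-modality features become tokens that a
# transformer encoder mixes before a beam-index classifier.
import torch
import torch.nn as nn


class MultimodalBeamPredictor(nn.Module):
    def __init__(self, num_beams: int = 64, d_model: int = 128, seq_len: int = 8):
        super().__init__()
        self.img_cnn = nn.Sequential(                      # per-frame image features
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, d_model),
        )
        self.gps_mlp = nn.Sequential(nn.Linear(2, d_model), nn.ReLU(),
                                     nn.Linear(d_model, d_model))
        self.pos = nn.Parameter(torch.zeros(2 * seq_len, d_model))   # modality/time embedding
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(d_model, num_beams)

    def forward(self, images: torch.Tensor, gps: torch.Tensor):
        # images: (B, T, 3, H, W); gps: (B, T, 2) normalized coordinates; assumes T == seq_len.
        B, T = images.shape[:2]
        img_tok = self.img_cnn(images.flatten(0, 1)).view(B, T, -1)
        gps_tok = self.gps_mlp(gps)
        tokens = torch.cat([img_tok, gps_tok], dim=1) + self.pos   # (B, 2T, d_model)
        fused = self.encoder(tokens)
        return self.head(fused.mean(dim=1))                        # beam-index logits
```

Training with a focal loss over the beam-index logits and keeping an exponential moving average of the weights, as the abstract mentions, would slot in around this forward pass.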


BoMD: Bag of Multi-label Descriptors for Noisy Chest X-ray Classification

arXiv.org Artificial Intelligence

Deep learning methods have shown outstanding classification accuracy in medical imaging problems, which is largely attributed to the availability of large-scale datasets manually annotated with clean labels. However, given the high cost of such manual annotation, new medical imaging classification problems may need to rely on machine-generated noisy labels extracted from radiology reports. Indeed, many Chest X-ray (CXR) classifiers have already been modelled from datasets with noisy labels, but their training procedure is in general not robust to noisy-label samples, leading to sub-optimal models. Furthermore, CXR datasets are mostly multi-label, so current noisy-label learning methods designed for multi-class problems cannot be easily adapted. In this paper, we propose a new method designed for noisy multi-label CXR learning that detects and smoothly re-labels samples from the dataset, which is then used to train common multi-label classifiers. The proposed method optimises a bag of multi-label descriptors (BoMD) to promote their similarity with the semantic descriptors produced by BERT models from the multi-label image annotation. Our experiments on diverse noisy multi-label training sets and clean testing sets show that our model has state-of-the-art accuracy and robustness in many CXR multi-label classification benchmarks.
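The smooth re-labelling step can be illustrated with a short, hypothetical sketch (not the BoMD implementation): image embeddings are compared against a bag of per-class descriptors aligned with BERT embeddings of the labels, and the noisy annotation is blended with the resulting soft pseudo-labels. The blending rule and shapes are assumptions.

```python
# Hedged sketch (not the BoMD code): blend noisy multi-label annotations with
# descriptor-based soft pseudo-labels.
import torch
import torch.nn.functional as F


def smooth_relabel(image_emb: torch.Tensor,      # (B, D) image embeddings
                   descriptors: torch.Tensor,    # (C, D) one learned descriptor per class
                   noisy_labels: torch.Tensor,   # (B, C) 0/1 labels mined from reports
                   alpha: float = 0.5) -> torch.Tensor:
    """Blend noisy annotations with descriptor-based pseudo-labels."""
    sim = F.cosine_similarity(image_emb.unsqueeze(1), descriptors.unsqueeze(0), dim=-1)
    pseudo = torch.sigmoid(sim / 0.1)            # soft per-class pseudo-labels
    # Samples whose pseudo-labels disagree strongly with the annotation lean more on them.
    disagreement = (pseudo - noisy_labels).abs()
    weight = alpha * disagreement
    return (1 - weight) * noisy_labels + weight * pseudo
```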


Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?

arXiv.org Artificial Intelligence

Vision-language models such as CLIP learn a generic text-image embedding from large-scale training data. A vision-language model can be adapted to a new classification task through few-shot prompt tuning. We find that such a prompt tuning process is highly robust to label noise. This motivates us to study the key reasons contributing to the robustness of the prompt tuning paradigm. We conduct extensive experiments to explore this property and find that the key factors are: 1) the fixed classname tokens provide a strong regularization to the optimization of the model, reducing gradients induced by the noisy samples; 2) the powerful pre-trained image-text embedding that is learned from diverse and generic web data provides strong prior knowledge for image classification. Further, we demonstrate that noisy zero-shot predictions from CLIP can be used to tune its own prompt, significantly enhancing prediction accuracy in the unsupervised setting. The code is available at https://github.com/CEWu/PTNL.
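The unsupervised finding can be illustrated with a deliberately simplified sketch: real CoOp-style tuning optimizes context token embeddings inside CLIP's text encoder, whereas the stand-in below optimizes a small residual on the fixed-classname text features against CLIP's own zero-shot pseudo-labels. All names and hyperparameters are assumptions, and this is not the PTNL code.

```python
# Hedged sketch (not the PTNL code): tune a prompt-like residual with CLIP's own
# zero-shot predictions as (noisy) pseudo-labels.
import torch
import torch.nn.functional as F


def tune_with_pseudo_labels(image_feats: torch.Tensor,   # (N, D) frozen CLIP image features
                            class_feats: torch.Tensor,   # (C, D) features of "a photo of a {class}"
                            steps: int = 100, lr: float = 1e-2, tau: float = 0.01):
    image_feats = F.normalize(image_feats, dim=-1)
    class_feats = F.normalize(class_feats, dim=-1)
    # Zero-shot predictions become pseudo-labels for the unlabeled images.
    pseudo = (image_feats @ class_feats.T / tau).argmax(dim=-1)

    delta = torch.zeros_like(class_feats, requires_grad=True)   # learnable prompt residual
    opt = torch.optim.SGD([delta], lr=lr)
    for _ in range(steps):
        prompts = F.normalize(class_feats + delta, dim=-1)
        logits = image_feats @ prompts.T / tau
        loss = F.cross_entropy(logits, pseudo)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return class_feats + delta.detach()     # tuned class prompts
```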