AITopics | Wu, Yu

Collaborating Authors

Wu, Yu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

EAPCR: A Universal Feature Extractor for Scientific Data without Explicit Feature Relation Patterns

Yu, Zhuohang, An, Ling, Li, Yansong, Wu, Yu, Dong, Zeyu, Liu, Zhangdi, Gao, Le, Zhang, Zhenyu, Zhou, Chichun

arXiv.org Artificial IntelligenceNov-12-2024

Conventional methods, including Decision Tree (DT)-based methods, have been effective in scientific tasks, such as non-image medical diagnostics, system anomaly detection, and inorganic catalysis efficiency prediction. However, most deep-learning techniques have struggled to surpass or even match this level of success as traditional machine-learning methods. The primary reason is that these applications involve multi-source, heterogeneous data where features lack explicit relationships. This contrasts with image data, where pixels exhibit spatial relationships; textual data, where words have sequential dependencies; and graph data, where nodes are connected through established associations. The absence of explicit Feature Relation Patterns (FRPs) presents a significant challenge for deep learning techniques in scientific applications that are not image, text, and graph-based. In this paper, we introduce EAPCR, a universal feature extractor designed for data without explicit FRPs. Tested across various scientific tasks, EAPCR consistently outperforms traditional methods and bridges the gap where deep learning models fall short. To further demonstrate its robustness, we synthesize a dataset without explicit FRPs. While Kolmogorov-Arnold Network (KAN) and feature extractors like Convolutional Neural Networks (CNNs), Graph Convolutional Networks (GCNs), and Transformers struggle, EAPCR excels, demonstrating its robustness and superior performance in scientific tasks without FRPs.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2411.08164

Country:

North America > Canada > Ontario (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Endocrinology (0.93)
Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization

Huang, Huayang, Wu, Yu, Wang, Qian

arXiv.org Artificial IntelligenceNov-6-2024

Watermarking generative content serves as a vital tool for authentication, ownership protection, and mitigation of potential misuse. Existing watermarking methods face the challenge of balancing robustness and concealment. They empirically inject a watermark that is both invisible and robust and passively achieve concealment by limiting the strength of the watermark, thus reducing the robustness. In this paper, we propose to explicitly introduce a watermark hiding process to actively achieve concealment, thus allowing the embedding of stronger watermarks. To be specific, we implant a robust watermark in an intermediate diffusion state and then guide the model to hide the watermark in the final generated image. We employ an adversarial optimization algorithm to produce the optimal hiding prompt guiding signal for each watermark. The prompt embedding is optimized to minimize artifacts in the generated image, while the watermark is optimized to achieve maximum strength. The watermark can be verified by reversing the generation process. Experiments on various diffusion models demonstrate the watermark remains verifiable even under significant image tampering and shows superior invisibility compared to other state-of-the-art robust watermarking methods.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.03862

Country: Asia > China (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

Add feedback

When to Trust Your Data: Enhancing Dyna-Style Model-Based Reinforcement Learning With Data Filter

Li, Yansong, Dong, Zeyu, Luo, Ertai, Wu, Yu, Wu, Shuo, Han, Shuo

arXiv.org Artificial IntelligenceOct-15-2024

Reinforcement learning (RL) algorithms can be divided into two classes: model-free algorithms, which are sample-inefficient, and model-based algorithms, which suffer from model bias. Dyna-style algorithms combine these two approaches by using simulated data from an estimated environmental model to accelerate model-free training. However, their efficiency is compromised when the estimated model is inaccurate. Previous works address this issue by using model ensembles or pretraining the estimated model with data collected from the real environment, increasing computational and sample complexity. To tackle this issue, we introduce an out-of-distribution (OOD) data filter that removes simulated data from the estimated model that significantly diverges from data collected in the real environment. We show theoretically that this technique enhances the quality of simulated data. With the help of the OOD data filter, the data simulated from the estimated model better mimics the data collected by interacting with the real model. This improvement is evident in the critic updates compared to using the simulated data without the OOD data filter. Our experiment integrates the data filter into the model-based policy optimization (MBPO) algorithm. The results demonstrate that our method requires fewer interactions with the real environment to achieve a higher level of optimality than MBPO, even without a model ensemble.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2410.1216

Country: North America > United States > Illinois (0.14)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

StatioCL: Contrastive Learning for Time Series via Non-Stationary and Temporal Contrast

Wu, Yu, Dang, Ting, Spathis, Dimitris, Jia, Hong, Mascolo, Cecilia

arXiv.org Artificial IntelligenceOct-13-2024

Contrastive learning (CL) has emerged as a promising approach for representation learning in time series data by embedding similar pairs closely while distancing dissimilar ones. However, existing CL methods often introduce false negative pairs (FNPs) by neglecting inherent characteristics and then randomly selecting distinct segments as dissimilar pairs, leading to erroneous representation learning, reduced model performance, and overall inefficiency. To address these issues, we systematically define and categorize FNPs in time series into semantic false negative pairs and temporal false negative pairs for the first time: the former arising from overlooking similarities in label categories, which correlates with similarities in non-stationarity and the latter from neglecting temporal proximity. Moreover, we introduce StatioCL, a novel CL framework that captures non-stationarity and temporal dependency to mitigate both FNPs and rectify the inaccuracies in learned representations. By interpreting and differentiating non-stationary states, which reflect the correlation between trends or temporal dynamics with underlying data patterns, StatioCL effectively captures the semantic characteristics and eliminates semantic FNPs. Simultaneously, StatioCL establishes fine-grained similarity levels based on temporal dependencies to capture varying temporal proximity between segments and to mitigate temporal FNPs. Evaluated on real-world benchmark time series classification datasets, StatioCL demonstrates a substantial improvement over state-of-the-art CL methods, achieving a 2.9% increase in Recall and a 19.2% reduction in FNPs. Most importantly, StatioCL also shows enhanced data efficiency and robustness against label scarcity.

artificial intelligence, machine learning, temporal reasoning, (15 more...)

arXiv.org Artificial Intelligence

2410.10048

Country:

North America > United States (0.30)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (0.54)

Add feedback

Learning To Help: Training Models to Assist Legacy Devices

Wu, Yu, Sarwate, Anand

arXiv.org Artificial IntelligenceSep-24-2024

Machine learning models implemented in hardware on physical devices may be deployed for a long time. The computational abilities of the device may be limited and become outdated with respect to newer improvements. Because of the size of ML models, offloading some computation (e.g. to an edge cloud) can help such legacy devices. We cast this problem in the framework of learning with abstention (LWA) in which the expert (edge) must be trained to assist the client (device). Prior work on LWA trains the client assuming the edge is either an oracle or a human expert. In this work, we formalize the reverse problem of training the expert for a fixed (legacy) client. As in LWA, the client uses a rejection rule to decide when to offload inference to the expert (at a cost). We find the Bayes-optimal rule, prove a generalization bound, and find a consistent surrogate loss function. Empirical results show that our framework outperforms confidence-based rejection rules.

artificial intelligence, classifier, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2409.16253

Country: North America > United States > New Jersey > Middlesex County > Piscataway (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

Zhang, Yuwei, Xia, Tong, Han, Jing, Wu, Yu, Rizos, Georgios, Liu, Yang, Mosuily, Mohammed, Chauhan, Jagmohan, Mascolo, Cecilia

arXiv.org Artificial IntelligenceJun-23-2024

Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty in collecting large labeled task-specific data for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing advantages and possibly unlock this impasse. However, given the safety-critical nature of healthcare applications, it is pivotal to also ensure openness and replicability for any proposed foundation model solution. To this end, we introduce OPERA, an OPEn Respiratory Acoustic foundation model pretraining and benchmarking system, as the first approach answering this need. We curate large-scale respiratory audio datasets ( 136K samples, 440 hours), pretrain three pioneering foundation models, and build a benchmark consisting of 19 downstream respiratory health tasks for evaluation. Our pretrained models demonstrate superior performance (against existing acoustic models pretrained with general audio on 16 out of 19 tasks) and generalizability (to unseen datasets and new respiratory audio modalities). This highlights the great promise of respiratory acoustic foundation models and encourages more studies using OPERA as an open resource to accelerate research on respiratory audio for health. The system is accessible from https://github.

data quality, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2406.16148

Country:

Europe > United Kingdom (0.28)
Europe > Portugal (0.28)
Asia > Middle East > Republic of Türkiye > Hatay Province (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
(3 more...)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Quality (0.93)

Add feedback

From Space-Time to Space-Order: Directly Planning a Temporal Planning Graph by Redefining CBS

Wu, Yu, Veerapaneni, Rishi, Li, Jiaoyang, Likhachev, Maxim

arXiv.org Artificial IntelligenceApr-23-2024

The majority of multi-agent path finding (MAPF) methods compute collision-free space-time paths which require agents to be at a specific location at a specific discretized timestep. However, executing these space-time paths directly on robotic systems is infeasible due to real-time execution differences (e.g. delays) which can lead to collisions. To combat this, current methods translate the space-time paths into a temporal plan graph (TPG) that only requires that agents observe the order in which they navigate through locations where their paths cross. However, planning space-time paths and then post-processing them into a TPG does not reduce the required agent-to-agent coordination, which is fixed once the space-time paths are computed. To that end, we propose a novel algorithm Space-Order CBS that can directly plan a TPG and explicitly minimize coordination. Our main theoretical insight is our novel perspective on viewing a TPG as a set of space-visitation order paths where agents visit locations in relative orders (e.g. 1st vs 2nd) as opposed to specific timesteps. We redefine unique conflicts and constraints for adapting CBS for space-order planning. We experimentally validate how Space-Order CBS can return TPGs which significantly reduce coordination, thus subsequently reducing the amount of agent-agent communication and leading to more robustness to delays during execution.

agent, artificial intelligence, planning & scheduling, (17 more...)

arXiv.org Artificial Intelligence

2404.15137

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

Improving Bird's Eye View Semantic Segmentation by Task Decomposition

Zhao, Tianhao, Chen, Yongcan, Wu, Yu, Liu, Tianyang, Du, Bo, Xiao, Peilun, Qiu, Shi, Yang, Hongda, Li, Guozhen, Yang, Yi, Lin, Yutian

arXiv.org Artificial IntelligenceApr-2-2024

Semantic segmentation in bird's eye view (BEV) plays a crucial role in autonomous driving. Previous methods usually follow an end-to-end pipeline, directly predicting the BEV segmentation map from monocular RGB inputs. However, the challenge arises when the RGB inputs and BEV targets from distinct perspectives, making the direct point-to-point predicting hard to optimize. In this paper, we decompose the original BEV segmentation task into two stages, namely BEV map reconstruction and RGB-BEV feature alignment. In the first stage, we train a BEV autoencoder to reconstruct the BEV segmentation maps given corrupted noisy latent representation, which urges the decoder to learn fundamental knowledge of typical BEV patterns. The second stage involves mapping RGB input images into the BEV latent space of the first stage, directly optimizing the correlations between the two views at the feature level. Our approach simplifies the complexity of combining perception and generation into distinct steps, equipping the model to handle intricate and challenging scenes effectively. Besides, we propose to transform the BEV segmentation map from the Cartesian to the polar coordinate system to establish the column-wise correspondence between RGB images and BEV maps. Moreover, our method requires neither multi-scale features nor camera intrinsic parameters for depth estimation and saves computational overhead. Extensive experiments on nuScenes and Argoverse show the effectiveness and efficiency of our method. Code is available at https://github.com/happytianhao/TaDe.

artificial intelligence, image understanding, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2404.01925

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Road (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots (0.89)
Information Technology > Sensing and Signal Processing > Image Processing (0.88)
(2 more...)

Add feedback

A Survey on Trustworthy Edge Intelligence: From Security and Reliability To Transparency and Sustainability

Wang, Xiaojie, Wang, Beibei, Wu, Yu, Ning, Zhaolong, Guo, Song, Yu, Fei Richard

arXiv.org Artificial IntelligenceJan-25-2024

Edge Intelligence (EI) integrates Edge Computing (EC) and Artificial Intelligence (AI) to push the capabilities of AI to the network edge for real-time, efficient and secure intelligent decision-making and computation. However, EI faces various challenges due to resource constraints, heterogeneous network environments, and diverse service requirements of different applications, which together affect the trustworthiness of EI in the eyes of stakeholders. This survey comprehensively summarizes the characteristics, architecture, technologies, and solutions of trustworthy EI. Specifically, we first emphasize the need for trustworthy EI in the context of the trend toward large models. We then provide an initial definition of trustworthy EI, explore its key characteristics and give a multi-layered architecture for trustworthy EI. Then, we summarize several important issues that hinder the achievement of trustworthy EI. Subsequently, we present enabling technologies for trustworthy EI systems and provide an in-depth literature review of the state-of-the-art solutions for realizing the trustworthiness of EI. Finally, we discuss the corresponding research challenges and open issues.

machine learning, real time system, reinforcement learning, (24 more...)

arXiv.org Artificial Intelligence

2310.17944

Country:

Asia > China > Hong Kong (0.14)
North America > Canada > Ontario > National Capital Region > Ottawa (0.14)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.87)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(5 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(7 more...)

Add feedback

WavMark: Watermarking for Audio Generation

Chen, Guangyu, Wu, Yu, Liu, Shujie, Liu, Tao, Du, Xiaoyong, Wei, Furu

arXiv.org Artificial IntelligenceJan-7-2024

Recent breakthroughs in zero-shot voice synthesis have enabled imitating a speaker's voice using just a few seconds of recording while maintaining a high level of realism. Alongside its potential benefits, this powerful technology introduces notable risks, including voice fraud and speaker impersonation. Unlike the conventional approach of solely relying on passive methods for detecting synthetic data, watermarking presents a proactive and robust defence mechanism against these looming risks. This paper introduces an innovative audio watermarking framework that encodes up to 32 bits of watermark within a mere 1-second audio snippet. The watermark is imperceptible to human senses and exhibits strong resilience against various attacks. It can serve as an effective identifier for synthesized voices and holds potential for broader applications in audio copyright protection. Moreover, this framework boasts high flexibility, allowing for the combination of multiple watermark segments to achieve heightened robustness and expanded capacity. Utilizing 10 to 20-second audio as the host, our approach demonstrates an average Bit Error Rate (BER) of 0.48\% across ten common attacks, a remarkable reduction of over 2800\% in BER compared to the state-of-the-art watermarking tool. See https://aka.ms/wavmark for demos of our work.

artificial intelligence, machine learning, robustness, (19 more...)

arXiv.org Artificial Intelligence

2308.1277

Country:

Europe (0.14)
Asia (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Speech (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback