AITopics | Shen, Yanyan

Collaborating Authors

Shen, Yanyan

Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation

Fang, Jingzhi, Shen, Yanyan, Wang, Yue, Chen, Lei

arXiv.org Artificial Intelligence | Mar-21-2025

As large language models (LLMs) have shown great success in many tasks, they are used in a wide range of applications. While much work has focused on the efficiency of single-LLM applications (e.g., offloading, request scheduling, parallelism strategy selection), multi-LLM applications have received less attention, particularly in offline inference scenarios. In this work, we aim to improve the offline end-to-end inference efficiency of multi-LLM applications in a single-node multi-GPU environment. The problem involves two key decisions: (1) determining which LLMs to run concurrently at each point in time (we may not run all the models at the same time), and (2) selecting a parallelism strategy for each LLM. This problem is NP-hard. Naive solutions may not work well because the time a model needs to complete a set of requests depends on both the request workload and the selected parallelism strategy, and these solutions lack an accurate model of that running time. Since LLM output lengths are unknown before running, we estimate the model running time with a sampling-then-simulation method: it first estimates the output lengths by sampling from an empirical cumulative distribution function obtained in advance from a large dataset, and then simulates the LLM inference process accordingly. Based on the simulation, we estimate the per-iteration latencies and sum them to obtain the total latency. We propose a greedy method to optimize the scheduling of the application's LLMs across the GPUs. We then propose a framework, SamuLLM, which contains two phases: planning, which calls the greedy method for an application, and running, which runs the application and dynamically adjusts the model scheduling based on runtime information. Experiments on 3 applications and a mixed application show that SamuLLM achieves 1.0-2.4× end-to-end speedups compared to the competitors.
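
A minimal sketch of the sampling-then-simulation idea, under loudly stated assumptions: the helper names `sample_output_lengths` and `simulate_latency`, the first-come-first-served continuous-batching loop, and the linear per-iteration cost (`base + per_request * batch_size`, prefill ignored) are hypothetical stand-ins, not SamuLLM's actual cost model. It shows how output lengths drawn from an empirical CDF drive an iteration-level simulation whose per-iteration latencies are summed into a total.

```python
import bisect
import random
from collections import deque


def sample_output_lengths(ecdf_lengths, ecdf_probs, n, rng):
    """Draw n output lengths by inverse-transform sampling from an empirical CDF.

    ecdf_lengths: sorted output lengths observed in an offline dataset.
    ecdf_probs:   matching cumulative probabilities P(L <= length), ending at 1.0.
    """
    lengths = []
    for _ in range(n):
        u = rng.random()  # uniform draw in [0, 1)
        # Index of the first length whose cumulative probability is >= u.
        lengths.append(ecdf_lengths[bisect.bisect_left(ecdf_probs, u)])
    return lengths


def simulate_latency(output_lengths, max_batch, base=0.02, per_request=0.001):
    """Simulate iteration-level (continuous) batching for one model.

    Hypothetical cost model: one decoding iteration takes
    base + per_request * batch_size seconds; prefill is ignored and every
    output length is assumed to be >= 1. Returns (total_latency, num_iterations).
    """
    waiting = deque(output_lengths)
    running = []  # remaining tokens for each request currently in the batch
    total, iterations = 0.0, 0
    while waiting or running:
        # Fill free batch slots with waiting requests (first come, first served).
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        total += base + per_request * len(running)
        iterations += 1
        # Each running request emits one token; finished requests leave the batch.
        running = [r - 1 for r in running if r > 1]
    return total, iterations


if __name__ == "__main__":
    rng = random.Random(0)
    lengths = sample_output_lengths([64, 256, 1024], [0.5, 0.9, 1.0], n=100, rng=rng)
    print(simulate_latency(lengths, max_batch=32))
```

Averaging the simulated total over several independent samples gives a rough running-time estimate for one model and one batch configuration, the kind of quantity a greedy scheduler could compare across candidate placements; the profiling SamuLLM uses to fit its real latency model is not reproduced here.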

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2503.16893

Country:

North America > United States (0.14)
North America > Canada (0.14)
Europe > Austria (0.14)
Asia > China (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Proactive Model Adaptation Against Concept Drift for Online Time Series Forecasting

Zhao, Lifan, Shen, Yanyan

arXiv.org Machine Learning | Dec-16-2024

Time series forecasting always faces the challenge of concept drift, where data distributions evolve over time and forecast model performance declines. Existing solutions are based on online learning: they continually organize recent time series observations into new training samples and update model parameters according to the forecasting feedback on recent data. However, they overlook a critical issue: the ground-truth future values of each sample only become available after the forecast horizon has elapsed. This delay creates a temporal gap between the training samples and the test sample. Our empirical analysis reveals that the gap can introduce concept drift, causing forecast models to adapt to outdated concepts. In this paper, we present Proceed, a novel proactive model adaptation framework for online time series forecasting. Proceed first estimates the concept drift between the recently used training samples and the current test sample. It then employs an adaptation generator to efficiently translate the estimated drift into parameter adjustments, proactively adapting the model to the test sample. To enhance the framework's generalization capability, Proceed is trained on diverse synthetic concept drifts. Extensive experiments on five real-world datasets across various forecast models demonstrate that Proceed brings larger performance improvements than state-of-the-art online learning methods, significantly improving forecast models' resilience against concept drift. Code is available at https://github.com/SJTU-DMTai/OnlineTSF.
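
The temporal gap caused by delayed ground truth can be made concrete with a small index calculation. This is a sketch under stated assumptions: `delayed_feedback_windows` and `naive_drift` are hypothetical helpers, windows are inclusive index pairs, and the mean-shift "drift" is a crude stand-in for illustration only; Proceed's own drift estimator and adaptation generator are not reproduced.

```python
def delayed_feedback_windows(t, lookback, horizon):
    """Index windows (inclusive) usable for online training at time t.

    Observations x[0..t] are known. The test sample uses x[t-lookback+1..t]
    to forecast x[t+1..t+horizon]. A training sample whose input ends at e
    needs labels x[e+1..e+horizon], so the newest usable one ends at e = t - horizon.
    """
    e = t - horizon
    if e - lookback + 1 < 0:
        return None  # not enough history yet
    return {
        "train_input": (e - lookback + 1, e),
        "train_labels": (e + 1, e + horizon),
        "test_input": (t - lookback + 1, t),
        "gap": t - e,  # always equals the forecast horizon
    }


def naive_drift(series, windows):
    """Crude stand-in for drift estimation: shift in window means."""
    def window_mean(span):
        lo, hi = span
        values = series[lo:hi + 1]
        return sum(values) / len(values)

    return window_mean(windows["test_input"]) - window_mean(windows["train_input"])
```

For example, with a 24-step lookback and a 12-step horizon at `t = 99`, the newest fully labeled training input ends at step 87, twelve steps behind the test input; any change in the data during those steps is invisible to a purely online update.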

concept drift, data mining, machine learning

arXiv.org Machine Learning

2412.08435

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Education (0.56)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

When UAV Meets Federated Learning: Latency Minimization via Joint Trajectory Design and Resource Allocation

Zhang, Xuhui, Liu, Wenchao, Ren, Jinke, Xing, Huijun, Gui, Gui, Shen, Yanyan, Cui, Shuguang

arXiv.org Artificial Intelligence | Dec-10-2024

Federated learning (FL) has emerged as a pivotal solution for training machine learning models over wireless networks, particularly for Internet of Things (IoT) devices with limited computation resources. Despite its benefits, the efficiency of FL is often restricted by the communication quality between IoT devices and the central server. To address this issue, we deploy an unmanned aerial vehicle (UAV) as a mobile FL server to enhance the FL training process. By leveraging the UAV's maneuverability, we establish robust line-of-sight connections with IoT devices, significantly improving communication capacity. To improve overall training efficiency, we formulate a latency minimization problem by jointly optimizing the bandwidth allocation, computing frequencies, and transmit power for both the UAV and the IoT devices, as well as the UAV's trajectory. We then develop an efficient alternating optimization algorithm to solve it. Furthermore, we analyze the convergence and computational complexity of the proposed algorithm. Finally, numerical results demonstrate that our proposed scheme not only outperforms existing benchmark schemes in terms of latency but also achieves training efficiency that closely approximates the ideal scenario.
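
To show why the UAV's position and the resource variables interact, here is a sketch of a per-round latency objective under assumptions that are not taken from the paper: a free-space line-of-sight channel gain `beta0 / d^2`, Shannon-rate uploads, synchronous rounds bounded by the slowest device, and UAV-side computation and downlink ignored. The `Device` fields and `round_latency` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Device:
    x: float                  # ground position (m)
    y: float
    cycles_per_sample: float  # CPU cycles to process one training sample
    num_samples: int          # local samples per round
    cpu_freq: float           # computing frequency (cycles/s)
    tx_power: float           # transmit power (W)
    bandwidth: float          # allocated uplink bandwidth (Hz)


def round_latency(devices, uav_xy, altitude, model_bits, beta0=1e-4, noise_psd=1e-17):
    """Latency of one synchronous FL round: slowest device's compute + upload time.

    Assumes a line-of-sight channel gain beta0 / d^2 and Shannon-rate uploads;
    UAV-side aggregation and downlink broadcast are ignored.
    """
    per_device = []
    for dev in devices:
        t_cmp = dev.cycles_per_sample * dev.num_samples / dev.cpu_freq
        dist_sq = (uav_xy[0] - dev.x) ** 2 + (uav_xy[1] - dev.y) ** 2 + altitude ** 2
        snr = dev.tx_power * (beta0 / dist_sq) / (noise_psd * dev.bandwidth)
        rate = dev.bandwidth * math.log2(1 + snr)  # bits/s
        per_device.append(t_cmp + model_bits / rate)
    return max(per_device)
```

In an alternating scheme of the kind the abstract describes, one would hold the trajectory fixed while optimizing bandwidth, frequencies, and power against an objective like this, then fix those and move the UAV; the sketch only evaluates the objective and does not perform either optimization step.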

artificial intelligence, iot device, machine learning

arXiv.org Artificial Intelligence

2412.07428

Country: Asia > China (0.69)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.67)
Information Technology > Robotics & Automation (0.48)
Aerospace & Defense > Aircraft (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

NeurDB: An AI-powered Autonomous Data System

Ooi, Beng Chin, Cai, Shaofeng, Chen, Gang, Shen, Yanyan, Tan, Kian-Lee, Wu, Yuncheng, Xiao, Xiaokui, Xing, Naili, Yue, Cong, Zeng, Lingze, Zhang, Meihui, Zhao, Zhanhao

arXiv.org Artificial Intelligence | Jul-4-2024

In the wake of rapid advancements in artificial intelligence (AI), we stand on the brink of a transformative leap in data systems. The imminent fusion of AI and DB (AIxDB) promises a new generation of data systems that will relieve the burden on end-users across all industry sectors by featuring AI-enhanced functionalities, such as personalized and automated in-database AI-powered analytics and self-driving capabilities for improved system performance. In this paper, we explore the evolution of data systems with a focus on deepening the fusion of AI and DB. We present NeurDB, an AI-powered autonomous data system designed to fully embrace AI design in each major system component and to provide in-database AI-powered analytics. We outline the conceptual and architectural overview of NeurDB, discuss its design choices and key components, and report its current development and future plans.
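
The general idea of in-database analytics, running model inference inside the query rather than exporting data to a separate ML pipeline, can be illustrated with Python's built-in `sqlite3` module. This is not NeurDB's interface: the table, the `predict_risk` function, and the fixed logistic-regression weights are hypothetical, chosen only to show a model being invoked per row from SQL.

```python
import math
import sqlite3

# Hypothetical "pre-trained" logistic-regression parameters (illustrative only).
WEIGHTS = (1.0, -1.0)
BIAS = 0.0


def predict_risk(feature_a, feature_b):
    """Score one row with the in-memory model; SQLite calls this per row."""
    z = BIAS + WEIGHTS[0] * feature_a + WEIGHTS[1] * feature_b
    return 1.0 / (1.0 + math.exp(-z))


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, a REAL, b REAL)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 2.0, 0.0), (2, 0.0, 3.0), (3, 1.0, 1.0)],
    )
    # Expose the model as a SQL function so inference runs inside the query.
    conn.create_function("predict_risk", 2, predict_risk)
    rows = conn.execute(
        "SELECT id, predict_risk(a, b) AS score FROM customers "
        "WHERE predict_risk(a, b) > 0.5 ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```

Here the filter and the prediction share one query plan, which is the convenience in-database analytics aims for; a system like NeurDB goes much further (model management, AI-driven optimization of system components), none of which this toy attempts.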

data mining, information retrieval, machine learning

arXiv.org Artificial Intelligence

2405.03924

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition

Dong, Yihang, Chen, Xuhang, Shen, Yanyan, Ng, Michael Kwok-Po, Qian, Tao, Wang, Shuqiang

arXiv.org Artificial Intelligence | May-28-2024

Emotion recognition based on electroencephalography (EEG) has gained significant attention and diversified development in fields such as neural signal processing and affective computing. However, individuals' unique brain anatomy leads to non-negligible natural differences in EEG signals across subjects, posing challenges for cross-subject emotion recognition. While recent studies have attempted to address these issues, they still face limitations in practical effectiveness and model framework unity. Current methods often struggle to capture the complex spatial-temporal dynamics of EEG signals and fail to effectively integrate multimodal information, resulting in suboptimal performance and limited generalizability across subjects. To overcome these limitations, we develop a pre-trained-model-based Multimodal Mood Reader for cross-subject emotion recognition that uses masked brain signal modeling and an interlinked spatial-temporal attention mechanism. The model learns universal latent representations of EEG signals through pre-training on a large-scale dataset, and employs the interlinked spatial-temporal attention mechanism to process differential entropy (DE) features extracted from EEG data. Subsequently, a multi-level fusion layer integrates the discriminative features, maximizing the advantages of features across different dimensions and modalities. Extensive experiments on public datasets show that Mood Reader outperforms state-of-the-art methods in cross-subject emotion recognition tasks. Additionally, we dissect the model from an attention perspective, providing a qualitative analysis of emotion-related brain areas and offering insights for affective research in neural signal processing.
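
Differential entropy features are commonly computed per frequency band under a Gaussian assumption, where h = 0.5 ln(2πeσ²) and σ² is the variance of the band-filtered segment. The sketch below assumes band-pass filtering has already been done upstream; the band names and helper functions are hypothetical, and the exact segmenting and band settings used by Mood Reader are not reproduced.

```python
import math
import statistics


def differential_entropy(segment):
    """DE of a band-pass-filtered EEG segment, assuming it is Gaussian:
    h = 0.5 * ln(2 * pi * e * sigma^2)."""
    var = statistics.pvariance(segment)
    if var <= 0:
        raise ValueError("segment must have non-zero variance")
    return 0.5 * math.log(2 * math.pi * math.e * var)


def de_features(band_segments):
    """Map each frequency band (e.g. 'alpha') to its DE value for one channel."""
    return {band: differential_entropy(seg) for band, seg in band_segments.items()}
```

Because DE depends only on the log-variance, doubling a band's amplitude raises its DE by exactly ln 2, which makes the feature a compact summary of band power per channel.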

artificial intelligence, emotion recognition, machine learning

arXiv.org Artificial Intelligence

2405.19373

Country: Asia > China (0.46)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)

Rethinking Channel Dependence for Multivariate Time Series Forecasting: Learning from Leading Indicators

Zhao, Lifan, Shen, Yanyan

arXiv.org Artificial Intelligence | Jan-30-2024

Recently, channel-independent methods have achieved state-of-the-art performance in multivariate time series (MTS) forecasting. Despite reducing overfitting risks, these methods miss potential opportunities to exploit channel dependence for accurate predictions. We argue that there exist locally stationary lead-lag relationships between variates, i.e., some lagged variates may follow the leading indicators within a short time period. Exploiting such channel dependence is beneficial because leading indicators offer advance information that can reduce the forecasting difficulty of the lagged variates. In this paper, we propose a new method named LIFT that first efficiently estimates leading indicators and their leading steps at each time step and then judiciously allows the lagged variates to use the advance information from the leading indicators. LIFT acts as a plug-in that can be seamlessly combined with arbitrary time series forecasting methods. Extensive experiments on six real-world datasets demonstrate that LIFT improves state-of-the-art methods by 5.5% in average forecasting performance. Multivariate time series (MTS) forecasting, one of the most popular research topics, is a fundamental task in various domains such as weather, traffic, and finance. An MTS consists of multiple channels (a.k.a. variates). Many MTS forecasting studies argue that each channel depends on other channels.
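
Estimating a leading indicator and its leading step can be sketched as a brute-force search over channels and lags for the strongest lagged Pearson correlation. This swapped-in estimator is an assumption for illustration; it is not LIFT's efficient estimation procedure, and `find_leading_indicator` and `max_lag` are hypothetical names.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def find_leading_indicator(series, target, max_lag):
    """Return (channel, lag, corr) maximising |corr(x_c[t - lag], y[t])|.

    series: list of channels, each a list of equal length.
    Only lags 1..max_lag are tried, so the chosen channel leads the target.
    """
    y = series[target]
    length = len(y)
    best = (None, 0, 0.0)
    for c, x in enumerate(series):
        if c == target:
            continue
        for lag in range(1, max_lag + 1):
            # Align x[t - lag] with y[t] for t = lag .. length - 1.
            r = pearson(x[:length - lag], y[lag:])
            if abs(r) > abs(best[2]):
                best = (c, lag, r)
    return best
```

If channel `c` leads the target by `lag` steps, its values already observed at time t hint at the target's values up to `lag` steps ahead; that is the advance information the abstract describes the lagged variates borrowing.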

artificial intelligence, machine learning, variate

arXiv.org Artificial Intelligence

2401.17548

Country: North America > United States > California (0.14)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.34)

Industry:

Energy (0.93)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.85)

Towards Fine-Grained Explainability for Heterogeneous Graph Neural Network

Li, Tong, Deng, Jiale, Shen, Yanyan, Qiu, Luyu, Huang, Yongxiang, Cao, Caleb Chen

arXiv.org Artificial Intelligence | Dec-23-2023

Recently, heterogeneous graph neural networks (HGNs) have become one of the standard paradigms for modeling rich semantics of heterogeneous graphs in various application domains such as e-commerce, finance, and healthcare (Lv et al. 2021; Wang et al. 2022). In parallel with the proliferation of HGNs, understanding the reasons behind the predictions from HGNs is urgently demanded in order to build trust and confidence in the models for both users and stakeholders. For example, a customer would be satisfied if an HGN-based recommender system accompanies recommended items with explanations; a bank manager may want ...

Their goal is to learn or search for optimal graph objects that maximize mutual information with the predictions. While such explanations answer the question "what is salient to the prediction", they fail to unveil "how the salient objects affect the prediction". In particular, there may exist multiple paths in the graph to propagate the information of the salient objects to the target object and affect its prediction. Without distinguishing these different influential paths, the answer to the "how" question remains unclear, which could compromise the utility of the explanation. This issue becomes more prominent when it comes to explaining HGNs due to the complex semantics of heterogeneous graphs.
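
The point that several paths can carry a salient object's information to the target is easy to see by enumerating them. The sketch below is a generic depth-first enumeration of simple, relation-typed paths up to a hop limit; it only lists candidate paths and does not score their influence, and the user/item/brand graph and function names are hypothetical, not the paper's explanation method.

```python
from collections import defaultdict


def build_graph(edges):
    """edges: (node, relation, node) triples; messages may flow both ways."""
    graph = defaultdict(list)
    for u, rel, v in edges:
        graph[u].append((rel, v))
        graph[v].append((rel, u))
    return graph


def propagation_paths(graph, source, target, max_hops):
    """All simple paths from a salient node to the target with <= max_hops edges,
    returned as alternating (node, relation, node, ...) tuples."""
    paths = []

    def dfs(node, path, visited):
        if node == target:
            paths.append(tuple(path))
            return
        if (len(path) - 1) // 2 >= max_hops:  # edges used so far
            return
        for rel, nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                path.extend((rel, nxt))
                dfs(nxt, path, visited)
                del path[-2:]  # backtrack relation and node
                visited.remove(nxt)

    dfs(source, [source], {source})
    return paths
```

In a small graph where a user bought two items sharing a brand with the target item, this returns two distinct three-hop paths; an explanation that only marks the user as salient cannot say which of them actually drove the prediction.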

artificial intelligence, explanation, machine learning

arXiv.org Artificial Intelligence

2312.15237

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.86)

SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation

Li, Jia, Shen, Yanyan, Chen, Lei, NG, Charles Wang Wai

arXiv.org Artificial Intelligence | Nov-26-2023

Acquiring an accurate spatial distribution of rainfall is an important task in hydrological analysis and natural disaster early warning. However, it is impossible to install rain gauges everywhere. Spatial interpolation is a common way to infer the rainfall distribution from available rain gauge data. However, existing works rely on unrealistic pre-settings to capture spatial correlations, which limits their performance in real scenarios. To tackle this issue, we propose SSIN, a novel data-driven self-supervised learning framework for rainfall spatial interpolation that mines latent spatial patterns from historical observation data. Inspired by the Cloze task and BERT, we fully consider the characteristics of spatial interpolation and design the SpaFormer model, based on the Transformer architecture, as the core of SSIN. Our main idea is that by constructing rich self-supervision signals via random masking, SpaFormer can learn informative embeddings for raw data and then adaptively model spatial correlations based on the rainfall spatial context. Extensive experiments on two real-world rain gauge datasets show that our method outperforms state-of-the-art solutions. In addition, we take traffic spatial interpolation as another use case to further explore our method's performance; SpaFormer achieves the best performance on a large real-world traffic dataset, which further confirms the effectiveness and generality of our method.
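
The self-supervision signal from random masking can be sketched as follows: hide a random subset of gauges in one snapshot, keep the rest as inputs, and treat the hidden readings as prediction targets. As a placeholder predictor the sketch uses inverse distance weighting (IDW), a classic hand-crafted interpolator of the kind whose fixed spatial assumptions SSIN aims to replace; it is not SpaFormer, and `make_masked_sample`, `idw_predict`, and `power=2.0` are hypothetical choices.

```python
import math
import random


def make_masked_sample(readings, mask_ratio, rng):
    """readings: dict gauge_id -> rainfall. Randomly hide a fraction of gauges;
    visible ones become inputs, hidden ones become self-supervised targets."""
    ids = sorted(readings)
    k = max(1, round(mask_ratio * len(ids)))
    masked = set(rng.sample(ids, k))
    inputs = {g: readings[g] for g in ids if g not in masked}
    targets = {g: readings[g] for g in masked}
    return inputs, targets


def idw_predict(coords, inputs, query, power=2.0):
    """Placeholder interpolator: inverse-distance-weighted mean of visible gauges."""
    num = den = 0.0
    for g, value in inputs.items():
        d = math.dist(coords[g], coords[query])
        if d == 0:
            return value  # query coincides with a visible gauge
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den
```

Every historical snapshot can yield many such masked samples, so a learned model trained to fill in the hidden gauges gets abundant supervision without any extra labeling.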

artificial intelligence, machine learning, spatial interpolation

arXiv.org Artificial Intelligence

doi: 10.1145/3589321

2311.1553

Country:

Asia > China (0.30)
North America > United States (0.28)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Methods for Acquiring and Incorporating Knowledge into Stock Price Prediction: A Survey

Wang, Liping, Li, Jiawei, Zhao, Lifan, Kou, Zhizhuo, Wang, Xiaohan, Zhu, Xinyi, Wang, Hao, Shen, Yanyan, Chen, Lei

arXiv.org Artificial Intelligence | Aug-9-2023

Predicting stock prices presents a challenging research problem due to the inherent volatility and non-linear nature of the stock market. In recent years, knowledge-enhanced stock price prediction methods have shown groundbreaking results by utilizing external knowledge to understand the stock market. Despite the importance of these methods, there is a scarcity of scholarly works that systematically synthesize previous studies from the perspective of external knowledge types. Specifically, the external knowledge can be modeled in different data structures, which we group into non-graph-based formats and graph-based formats: 1) non-graph-based knowledge captures contextual information and multimedia descriptions specifically associated with an individual stock; 2) graph-based knowledge captures interconnected and interdependent information in the stock market. This survey paper aims to provide a systematic and comprehensive description of methods for acquiring external knowledge from various unstructured data sources and then incorporating it into stock price prediction models. We also explore fusion methods for combining external knowledge with historical price features. Moreover, this paper includes a compilation of relevant datasets and delves into potential future research directions in this domain.
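
One simple way graph-based knowledge can be fused with historical price features is to concatenate each stock's own features with an aggregate of its related stocks' features. The sketch below uses one unweighted mean-aggregation step over an undirected relation list; the tickers, features, and `fuse_graph_knowledge` are hypothetical, and the surveyed methods typically use learned (e.g. attention- or GNN-based) fusion instead.

```python
def fuse_graph_knowledge(price_feats, relations):
    """Concatenate each stock's own price features with the mean features of
    its related stocks (one unweighted neighbourhood-aggregation step).

    price_feats: dict ticker -> list of floats (same length for every ticker).
    relations:   iterable of (ticker_a, ticker_b) pairs, treated as undirected.
    """
    neighbours = {t: [] for t in price_feats}
    for a, b in relations:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dim = len(next(iter(price_feats.values())))
    fused = {}
    for ticker, feats in price_feats.items():
        nbrs = neighbours[ticker]
        if nbrs:
            agg = [sum(price_feats[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
        else:
            agg = [0.0] * dim  # isolated stock: no relational signal
        fused[ticker] = list(feats) + agg
    return fused
```

The fused vectors would then feed a downstream predictor; the relation list could come from industry membership, supply chains, or a knowledge graph, all of which the survey groups under graph-based knowledge.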

data mining, knowledge management, machine learning

arXiv.org Artificial Intelligence

2308.04947

Country:

North America > United States (0.67)
Asia > China > Guangdong Province (0.15)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)

DoubleAdapt: A Meta-learning Approach to Incremental Learning for Stock Trend Forecasting

Zhao, Lifan, Kong, Shuming, Shen, Yanyan

arXiv.org Artificial Intelligence | Jul-17-2023

Stock trend forecasting is a fundamental task of quantitative investment where precise predictions of price trends are indispensable. In an online setting, stock data arrive continuously over time. It is practical and efficient to incrementally update the forecast model with the latest data, which may reveal new patterns that recur in the future stock market. However, incremental learning for stock trend forecasting remains under-explored due to the challenge of distribution shifts (a.k.a. concept drifts). As the stock market evolves dynamically, the distribution of future data can differ slightly or significantly from that of the incremental data, hindering the effectiveness of incremental updates. To address this challenge, we propose DoubleAdapt, an end-to-end framework with two adapters that can effectively adapt the data and the model to mitigate the effects of distribution shifts. Our key insight is to automatically learn how to adapt stock data into a locally stationary distribution in favor of profitable updates. With data adaptation as a complement, we can confidently adapt the model parameters under mitigated distribution shifts. We cast each incremental learning task as a meta-learning task and automatically optimize the adapters for desirable data adaptation and parameter initialization. Experiments on real-world stock datasets demonstrate that DoubleAdapt achieves state-of-the-art predictive performance and shows considerable efficiency.
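
The incremental learning protocol can be sketched as a rolling sequence of tasks, each pairing the latest block of incremental data with the block that follows it for testing. The sketch assumes a fixed-length `interval` of days per block and uses an adapter-free trivial model (the mean of the incremental labels) purely to expose the protocol; `make_incremental_tasks` and `naive_incremental_errors` are hypothetical names, and DoubleAdapt's meta-learned data and model adapters are not implemented.

```python
def make_incremental_tasks(num_days, interval):
    """Roll over a day-indexed stream: each task pairs the latest `interval`
    days of incremental data with the following `interval` days for testing."""
    tasks = []
    start = 0
    while start + 2 * interval <= num_days:
        train = range(start, start + interval)
        test = range(start + interval, start + 2 * interval)
        tasks.append((train, test))
        start += interval
    return tasks


def naive_incremental_errors(labels, interval):
    """Adapter-free baseline: 'update' a trivial model (mean of the incremental
    labels) on each task and report its MSE on the next interval."""
    errors = []
    for train, test in make_incremental_tasks(len(labels), interval):
        pred = sum(labels[i] for i in train) / len(train)
        errors.append(sum((labels[i] - pred) ** 2 for i in test) / len(test))
    return errors
```

With labels `[0, 0, 0, 1, 1, 1, 1, 1, 1, 5]` and `interval=3`, the first task's error is 1.0 because the test block's distribution has shifted away from the incremental block, while the second task's error is 0.0; a meta-learned data adapter and model adapter are meant to reduce exactly that first kind of error.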

artificial intelligence, distribution shift, machine learning

arXiv.org Artificial Intelligence

2306.09862

Country: North America > United States (0.70)

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
