Not enough data to create a plot.
Try a different view from the menu above.
Wang, Dan
Mitigating Data Scarcity in Time Series Analysis: A Foundation Model with Series-Symbol Data Generation
Wang, Wenxuan, Wu, Kai, Li, Yujian Betterest, Wang, Dan, Zhang, Xiaoyu, Liu, Jing
Foundation models for time series analysis (TSA) have attracted significant attention. However, challenges such as data scarcity and data imbalance continue to hinder their development. To address this, we consider modeling complex systems through symbolic expressions that serve as semantic descriptors of time series. Building on this concept, we introduce a series-symbol (S2) dual-modulity data generation mechanism, enabling the unrestricted creation of high-quality time series data paired with corresponding symbolic representations. Leveraging the S2 dataset, we develop SymTime, a pre-trained foundation model for TSA. SymTime demonstrates competitive performance across five major TSA tasks when fine-tuned with downstream task, rivaling foundation models pre-trained on real-world datasets. This approach underscores the potential of dual-modality data generation and pretraining mechanisms in overcoming data scarcity and enhancing task performance.
Self-adaptive vision-language model for 3D segmentation of pulmonary artery and vein
Guo, Xiaotong, Yang, Deqian, Wang, Dan, Zhao, Haochen, Li, Yuan, Sui, Zhilin, Zhou, Tao, Zhang, Lijun, Meng, Yanda
Accurate segmentation of pulmonary structures iscrucial in clinical diagnosis, disease study, and treatment planning. Significant progress has been made in deep learning-based segmentation techniques, but most require much labeled data for training. Consequently, developing precise segmentation methods that demand fewer labeled datasets is paramount in medical image analysis. The emergence of pre-trained vision-language foundation models, such as CLIP, recently opened the door for universal computer vision tasks. Exploiting the generalization ability of these pre-trained foundation models on downstream tasks, such as segmentation, leads to unexpected performance with a relatively small amount of labeled data. However, exploring these models for pulmonary artery-vein segmentation is still limited. This paper proposes a novel framework called Language-guided self-adaptive Cross-Attention Fusion Framework. Our method adopts pre-trained CLIP as a strong feature extractor for generating the segmentation of 3D CT scans, while adaptively aggregating the cross-modality of text and image representations. We propose a s pecially designed adapter module to fine-tune pre-trained CLIP with a self-adaptive learning strategy to effectively fuse the two modalities of embeddings. We extensively validate our method on a local dataset, which is the largest pulmonary artery-vein CT dataset to date and consists of 718 labeled data in total. The experiments show that our method outperformed other state-of-the-art methods by a large margin. Our data and code will be made publicly available upon acceptance.
Enabling Time-series Foundation Model for Building Energy Forecasting via Contrastive Curriculum Learning
Liang, Rui, Deng, Yang, Xie, Donghua, He, Fang, Wang, Dan
Advances in time-series forecasting are driving a shift from conventional machine learning models to foundation models (FMs) that are trained with generalized knowledge. However, existing FMs still perform poorly in the energy fields, such as building energy forecasting (BEF). This paper studies the adaptation of FM to BEF tasks. We demonstrate the shortcomings of fine-tuning FM straightforwardly from both the perspectives of FM and the data. To overcome these limitations, we propose a new \textit{contrastive curriculum learning}-based training method. Our method optimizes the ordering of training data in the context of TSFM adaptation. Experiments show that our method can improve the zero/few-shot performance by 14.6\% compared to the existing FMs. Our code and new TSFM will be available at
MMM-RS: A Multi-modal, Multi-GSD, Multi-scene Remote Sensing Dataset and Benchmark for Text-to-Image Generation
Luo, Jialin, Wang, Yuanzhi, Gu, Ziqi, Qiu, Yide, Yao, Shuaizhen, Wang, Fuyun, Xu, Chunyan, Zhang, Wenhua, Wang, Dan, Cui, Zhen
Recently, the diffusion-based generative paradigm has achieved impressive general image generation capabilities with text prompts due to its accurate distribution modeling and stable training process. However, generating diverse remote sensing (RS) images that are tremendously different from general images in terms of scale and perspective remains a formidable challenge due to the lack of a comprehensive remote sensing image generation dataset with various modalities, ground sample distances (GSD), and scenes. In this paper, we propose a Multi-modal, Multi-GSD, Multi-scene Remote Sensing (MMM-RS) dataset and benchmark for text-to-image generation in diverse remote sensing scenarios. Specifically, we first collect nine publicly available RS datasets and conduct standardization for all samples. To bridge RS images to textual semantic information, we utilize a large-scale pretrained vision-language model to automatically output text prompts and perform hand-crafted rectification, resulting in information-rich text-image pairs (including multi-modal images). In particular, we design some methods to obtain the images with different GSD and various environments (e.g., low-light, foggy) in a single sample. With extensive manual screening and refining annotations, we ultimately obtain a MMM-RS dataset that comprises approximately 2.1 million text-image pairs. Extensive experimental results verify that our proposed MMM-RS dataset allows off-the-shelf diffusion models to generate diverse RS images across various modalities, scenes, weather conditions, and GSD. The dataset is available at https://github.com/ljl5261/MMM-RS.
SAMEdge: An Edge-cloud Video Analytics Architecture for the Segment Anything Model
Lu, Rui, Shi, Siping, Liu, Yanting, Wang, Dan
As artificial intelligence continues to evolve, it is increasingly capable of handling a wide range of video analytics tasks with merely one large model. One of the key foundation technologies is the Segment Anything Model (SAM), which allows the video analytics tasks to be determined on the fly according to the input prompts from the user. However, achieving real-time response in video analytics applications is crucial for user experiences due to the limited communication and computation resources on the edge, especially with SAM, where users may continuously interact by adding or adjusting prompts. In this paper, we propose SAMEdge, a novel edge-cloud computing architecture designed to support SAM computations for edge users. SAMEdge integrates new modules on the edge and the cloud to maximize analytics accuracy under visual prompts and image prompts input with latency constraints. It addresses resource challenges associated with prompt encoding and image encoding by offering a visual prompt transformation algorithm for visual prompts and efficient workload partitioning for image encoding. SAMEdge is implemented by extending the open-source SAM project from Meta AI. We demonstrate the practical application of SAMEdge through a case study on a Visual Tour Guide application. Our evaluation indicates that SAMEdge significantly enhances the accuracy of the video analytics application under distinct network bandwidths across various prompts.
Coarse-To-Fine Tensor Trains for Compact Visual Representations
Loeschcke, Sebastian, Wang, Dan, Leth-Espensen, Christian, Belongie, Serge, Kastoryano, Michael J., Benaim, Sagie
The ability to learn compact, high-quality, and easy-to-optimize representations for visual data is paramount to many applications such as novel view synthesis and 3D reconstruction. Recent work has shown substantial success in using tensor networks to design such compact and high-quality representations. However, the ability to optimize tensor-based representations, and in particular, the highly compact tensor train representation, is still lacking. This has prevented practitioners from deploying the full potential of tensor networks for visual data. To this end, we propose 'Prolongation Upsampling Tensor Train (PuTT)', a novel method for learning tensor train representations in a coarse-to-fine manner. Our method involves the prolonging or `upsampling' of a learned tensor train representation, creating a sequence of 'coarse-to-fine' tensor trains that are incrementally refined. We evaluate our representation along three axes: (1). compression, (2). denoising capability, and (3). image completion capability. To assess these axes, we consider the tasks of image fitting, 3D fitting, and novel view synthesis, where our method shows an improved performance compared to state-of-the-art tensor-based methods. For full results see our project webpage: https://sebulo.github.io/PuTT_website/
Privacy Protectability: An Information-theoretical Approach
Shi, Siping, Zhang, Bihai, Wang, Dan
Recently, inference privacy has attracted increasing attention. The inference privacy concern arises most notably in the widely deployed edge-cloud video analytics systems, where the cloud needs the videos captured from the edge. The video data can contain sensitive information and subject to attack when they are transmitted to the cloud for inference. Many privacy protection schemes have been proposed. Yet, the performance of a scheme needs to be determined by experiments or inferred by analyzing the specific case. In this paper, we propose a new metric, \textit{privacy protectability}, to characterize to what degree a video stream can be protected given a certain video analytics task. Such a metric has strong operational meaning. For example, low protectability means that it may be necessary to set up an overall secure environment. We can also evaluate a privacy protection scheme, e.g., assume it obfuscates the video data, what level of protection this scheme has achieved after obfuscation. Our definition of privacy protectability is rooted in information theory and we develop efficient algorithms to estimate the metric. We use experiments on real data to validate that our metric is consistent with empirical measurements on how well a video stream can be protected for a video analytics task.
On-edge Multi-task Transfer Learning: Model and Practice with Data-driven Task Allocation
Zheng, Zimu, Chen, Qiong, Hu, Chuang, Wang, Dan, Liu, Fangming
On edge devices, data scarcity occurs as a common problem where transfer learning serves as a widely-suggested remedy. Nevertheless, transfer learning imposes a heavy computation burden to resource-constrained edge devices. Existing task allocation works usually assume all submitted tasks are equally important, leading to inefficient resource allocation at a task level when directly applied in Multi-task Transfer Learning (MTL). To address these issues, we first reveal that it is crucial to measure the impact of tasks on overall decision performance improvement and quantify \emph{task importance}. We then show that task allocation with task importance for MTL (TATIM) is a variant of the NP-complete Knapsack problem, where the complicated computation to solve this problem needs to be conducted repeatedly under varying contexts. To solve TATIM with high computational efficiency, we propose a Data-driven Cooperative Task Allocation (DCTA) approach. Finally, we evaluate the performance of DCTA by not only a trace-driven simulation, but also a new comprehensive real-world AIOps case study that bridges model and practice via a new architecture and main components design within the AIOps system. Extensive experiments show that our DCTA reduces 3.24 times of processing time, and saves 48.4\% energy consumption compared with the state-of-the-art when solving TATIM.
Application of Deep Neural Networks to assess corporate Credit Rating
Golbayani, Parisa, Wang, Dan, Florescu, Ionut
Recent literature implements machine learning techniques to assess corporate credit rating based on financial statement reports. In this work, we analyze the performance of four neural network architectures (MLP, CNN, CNN2D, LSTM) in predicting corporate credit rating as issued by Standard and Poor's. We analyze companies from the energy, financial and healthcare sectors in US. The goal of the analysis is to improve application of machine learning algorithms to credit assessment. To this end, we focus on three questions. First, we investigate if the algorithms perform better when using a selected subset of features, or if it is better to allow the algorithms to select features themselves. Second, is the temporal aspect inherent in financial data important for the results obtained by a machine learning algorithm? Third, is there a particular neural network architecture that consistently outperforms others with respect to input features, sectors and holdout set? We create several case studies to answer these questions and analyze the results using ANOVA and multiple comparison testing procedure.
Mobile APP User Attribute Prediction by Heterogeneous Information Network Modeling
Zhang, Hekai, Gong, Jibing, Teng, Zhiyong, Wang, Dan, Wang, Hongfei, Du, Linfeng, Bhuiyan, Zakirul Alam
User-based attribute information, such as age and gender, is usually considered as user privacy information. It is difficult for enterprises to obtain user-based privacy attribute information. However, user-based privacy attribute information has a wide range of applications in personalized services, user behavior analysis and other aspects. this paper advances the HetPathMine model and puts forward TPathMine model. With applying the number of clicks of attributes under each node to express the user's emotional preference information, optimizations of the solution of meta-path weight are also presented. Based on meta-path in heterogeneous information networks, the new model integrates all relationships among objects into isomorphic relationships of classified objects. Matrix is used to realize the knowledge dissemination of category knowledge among isomorphic objects. The experimental results show that: (1) the prediction of user attributes based on heterogeneous information networks can achieve higher accuracy than traditional machine learning classification methods; (2) TPathMine model based on the number of clicks is more accurate in classifying users of different age groups, and the weight of each meta-path is consistent with human intuition or the real world situation.