Robult: Leveraging Redundancy and Modality Specific Features for Robust Multimodal Learning
Nguyen, Duy A., Kamboj, Abhi, Do, Minh N.
Addressing missing modalities and limited labeled data is crucial for advancing robust multimodal learning. We propose Robult, a scalable framework designed to mitigate these challenges by preserving modality-specific information and leveraging redundancy through a novel information-theoretic approach. Robult optimizes two core objectives: (1) a soft Positive-Unlabeled (PU) contrastive loss that maximizes task-relevant feature alignment while effectively utilizing limited labeled data in semi-supervised settings, and (2) a latent reconstruction loss that ensures unique modality-specific information is retained. These strategies, embedded within a modular design, enhance performance across various downstream tasks and ensure resilience to incomplete modalities during inference. Experimental results across diverse datasets validate that Robult achieves superior performance over existing approaches in both semi-supervised learning and missing modality contexts. Furthermore, its lightweight design promotes scalability and seamless integration with existing architectures, making it suitable for real-world multimodal applications.
How Realistic Is Your Synthetic Data? Constraining Deep Generative Models for Tabular Data
Stoian, Mihaela Cătălina, Dyrmishi, Salijona, Cordy, Maxime, Lukasiewicz, Thomas, Giunchiglia, Eleonora
Deep Generative Models (DGMs) have been shown to be powerful tools for generating tabular data, as they have been increasingly able to capture the complex distributions that characterize them. However, to generate realistic synthetic data, it is often not enough to have a good approximation of their distribution, as it also requires compliance with constraints that encode essential background knowledge on the problem at hand. In this paper, we address this limitation and show how DGMs for tabular data can be transformed into Constrained Deep Generative Models (C-DGMs), whose generated samples are guaranteed to be compliant with the given constraints. This is achieved by automatically parsing the constraints and transforming them into a Constraint Layer (CL) seamlessly integrated with the DGM. Our extensive experimental analysis with various DGMs and tasks reveals that standard DGMs often violate constraints, some exceeding $95\%$ non-compliance, while their corresponding C-DGMs are never non-compliant. Then, we quantitatively demonstrate that, at training time, C-DGMs are able to exploit the background knowledge expressed by the constraints to outperform their standard counterparts with up to $6.5\%$ improvement in utility and detection. Further, we show how our CL does not necessarily need to be integrated at training time, as it can be also used as a guardrail at inference time, still producing some improvements in the overall performance of the models. Finally, we show that our CL does not hinder the sample generation time of the models.
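To make the idea of an inference-time guardrail concrete, the following is a minimal sketch and not the authors' Constraint Layer. It assumes the background knowledge consists only of ordering constraints of the form "feature A must not exceed feature B", listed so that each feature is repaired after the features it depends on. The feature names (`min_price`, `avg_price`, `max_price`) and the helpers `violates` and `repair` are hypothetical.

```python
# Hypothetical ordering constraints: (lower_feature, upper_feature) means
# sample[lower_feature] <= sample[upper_feature] must hold.
# They are listed in dependency order, so one pass is enough to repair a sample.
CONSTRAINTS = [("min_price", "avg_price"), ("avg_price", "max_price")]


def violates(sample, constraints=CONSTRAINTS):
    """Return True if the sample breaks at least one ordering constraint."""
    return any(sample[lo] > sample[hi] for lo, hi in constraints)


def repair(sample, constraints=CONSTRAINTS):
    """Return a copy of the sample that satisfies every constraint.

    Each violated constraint is fixed by raising the upper feature to the
    value of the lower one; features are otherwise left untouched.
    """
    fixed = dict(sample)
    for lo, hi in constraints:
        if fixed[lo] > fixed[hi]:
            fixed[hi] = fixed[lo]
    return fixed


if __name__ == "__main__":
    # Toy samples standing in for the output of an unconstrained generator.
    generated = [
        {"min_price": 3.0, "avg_price": 2.0, "max_price": 1.0},
        {"min_price": 1.0, "avg_price": 2.0, "max_price": 3.0},
    ]
    repaired = [repair(s) for s in generated]
    print(sum(violates(s) for s in generated), "violations before repair")
    print(sum(violates(s) for s in repaired), "violations after repair")
```

Because the repair step only touches samples that violate a constraint, compliant samples pass through unchanged, which is the property one would want from a guardrail applied after generation.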
Adaptive Dependency Learning Graph Neural Networks
Sriramulu, Abishek, Fourrier, Nicolas, Bergmeir, Christoph
Graph Neural Networks (GNNs) have recently gained popularity in the forecasting domain due to their ability to model complex spatial and temporal patterns in tasks such as traffic forecasting and region-based demand forecasting. Most of these methods require a predefined graph as input, whereas in real-life multivariate time series problems, a well-predefined dependency graph rarely exists. This requirement makes it harder for GNNs to be utilised widely for multivariate forecasting problems in other domains such as retail or energy. In this paper, we propose a hybrid approach combining neural networks and statistical structure learning models to self-learn the dependencies and construct a dynamically changing dependency graph from multivariate data, aiming to enable the use of GNNs for multivariate forecasting even when a well-defined graph does not exist. The statistical structure modeling in conjunction with neural networks provides a well-principled and efficient approach by bringing in causal semantics to determine dependencies among the series. Finally, we demonstrate significantly improved performance using our proposed approach on real-world benchmark datasets without a pre-defined dependency graph.
Homological Neural Networks: A Sparse Architecture for Multivariate Complexity
Wang, Yuanrong, Briola, Antonio, Aste, Tomaso
The rapid progress of Artificial Intelligence research came with the development of increasingly complex deep learning models, leading to growing challenges in terms of computational complexity, energy efficiency and interpretability. In this study, we apply advanced network-based information filtering techniques to design a novel deep neural network unit characterized by a sparse higher-order graphical architecture built over the homological structure of underlying data. We demonstrate its effectiveness in two application domains which are traditionally challenging for deep learning: tabular data and time series regression problems. Results demonstrate the advantages of this novel design, which can match or surpass the results of state-of-the-art machine learning and deep learning models using only a fraction of the parameters.
Grasping Core Rules of Time Series through Pure Models
Liu, Gedi, Jiang, Yifeng, Ouyang, Yi, Zhong, Keyang, Wang, Yang
Time series forecasting, like many other machine learning fields, has undergone a transition from statistical methods to deep learning. Although accuracy on a number of publicly available datasets appears to increase as models are updated, each update typically multiplies the model scale several times in exchange for only a slight difference in accuracy. Through our experiments, we point out a different line of thinking: time series, especially long-term forecasting, may differ from other fields. It is not necessary to use extensive and complex models to grasp all aspects of time series; instead, pure models can be used to grasp the core rules of time series changes. With this simple but effective idea, we created PureTS, a network with three pure linear layers that achieved state-of-the-art results in 80% of the long sequence prediction tasks while being nearly the lightest model and having the fastest running speed. On this basis, we discuss the potential of pure linear layers in both phenomena and essence. The ability to understand the core law contributes to the high precision of long-distance prediction, and reasonable fluctuation prevents the model from distorting the curve in multi-step prediction as mainstream deep learning models do; we summarize this as a pure linear neural network that avoids over-fluctuating. Finally, we suggest fundamental design standards for lightweight long-step time series tasks: the input and output should have the same dimension where possible, and the structure should avoid fragmentation and complex operations.
Disentanglement and Generalization Under Correlation Shifts
Funke, Christina M., Vicol, Paul, Wang, Kuan-Chieh, Kümmerer, Matthias, Zemel, Richard, Bethge, Matthias
Correlations between factors of variation are prevalent in real-world data. Machine learning algorithms may benefit from exploiting such correlations, as they can increase predictive performance on noisy data. However, often such correlations are not robust (e.g., they may change between domains, datasets, or applications) and we wish to avoid exploiting them. Disentanglement methods aim to learn representations which capture different factors of variation in latent subspaces. A common approach involves minimizing the mutual information between latent subspaces, such that each encodes a single underlying attribute. However, this fails when attributes are correlated. We solve this problem by enforcing independence between subspaces conditioned on the available attributes, which allows us to remove only dependencies that are not due to the correlation structure present in the training data. We achieve this via an adversarial approach to minimize the conditional mutual information (CMI) between subspaces with respect to categorical variables. We first show theoretically that CMI minimization is a good objective for robust disentanglement on linear problems with Gaussian data. We then apply our method on real-world datasets based on MNIST and CelebA, and show that it yields models that are disentangled and robust under correlation shift, including in weakly supervised settings.
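The difference between mutual information and conditional mutual information can be illustrated on discrete toy data. The sketch below is a plug-in (count-based) estimator, not the adversarial estimator described in the abstract; the function name and the toy samples are assumptions made for illustration. It shows a case where two variables are strongly dependent marginally, yet independent once the shared attribute is conditioned on.

```python
import math
from collections import Counter


def conditional_mutual_information(samples):
    """Plug-in estimate of I(X; Y | Z) in nats from a list of (x, y, z) triples.

    Uses I(X;Y|Z) = sum_{x,y,z} p(x,y,z) * log( p(x,y,z) p(z) / (p(x,z) p(y,z)) ),
    with all probabilities replaced by empirical frequencies.
    """
    n = len(samples)
    count_xyz = Counter(samples)
    count_xz = Counter((x, z) for x, _, z in samples)
    count_yz = Counter((y, z) for _, y, z in samples)
    count_z = Counter(z for _, _, z in samples)
    total = 0.0
    for (x, y, z), c in count_xyz.items():
        # The factors of n cancel inside the logarithm.
        ratio = c * count_z[z] / (count_xz[(x, z)] * count_yz[(y, z)])
        total += (c / n) * math.log(ratio)
    return total


if __name__ == "__main__":
    # Toy data: x and y are both copies of the attribute z.
    data = [(0, 0, 0), (1, 1, 1)] * 50
    # Conditioning on a constant recovers the ordinary mutual information I(X; Y).
    marginal = conditional_mutual_information([(x, y, 0) for x, y, _ in data])
    conditional = conditional_mutual_information(data)
    print("I(X;Y)   =", marginal)     # dependence induced by the shared attribute
    print("I(X;Y|Z) =", conditional)  # no dependence left once z is known
```

In this toy case, driving the unconditional mutual information to zero would require destroying information about the attribute itself, whereas the conditional quantity is already zero.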
Dialysis adequacy predictions using a machine learning method - Scientific Reports
Dialysis adequacy is an important survival indicator in patients with chronic hemodialysis. However, there are inconveniences and disadvantages to measuring dialysis adequacy by blood samples. This study used machine learning models to predict dialysis adequacy in chronic hemodialysis patients using repeatedly measured data during hemodialysis. This study included 1333 hemodialysis sessions corresponding to the monthly examination dates of 61 patients. Patient demographics and clinical parameters were continuously measured from the hemodialysis machine; 240 measurements were collected from each hemodialysis session. Machine learning models (random forest and extreme gradient boosting [XGBoost]) and deep learning models (convolutional neural network and gated recurrent unit) were compared with multivariable linear regression models. The mean absolute percentage error (MAPE), root mean square error (RMSE), and Spearman's rank correlation coefficient (Corr) for each model using fivefold cross-validation were calculated as performance measurements. The XGBoost model had the best performance among all methods (MAPE = 2.500; RMSE = 2.906; Corr = 0.873). The deep learning models with convolutional neural network (MAPE = 2.835; RMSE = 3.125; Corr = 0.833) and gated recurrent unit (MAPE = 2.974; RMSE = 3.230; Corr = 0.824) had similar performances. The linear regression models had the lowest performance (MAPE = 3.284; RMSE = 3.586; Corr = 0.770) compared with other models. Machine learning methods can accurately infer hemodialysis adequacy using continuously measured data from hemodialysis machines.
Deep Learning for Two-Sided Matching
Ravindranath, Sai Srivatsa, Feng, Zhe, Li, Shira, Ma, Jonathan, Kominers, Scott D., Parkes, David C.
Two-sided matching markets, such as Uber, Airbnb, stock markets, and dating apps, play a significant role in today's world. As a result, there is tremendous and rising interest in designing better mechanisms for two-sided matching. The seminal work of Gale and Shapley [14] introduced a simple mechanism for stable matching in two-sided markets, deferred acceptance (DA), which has since been applied in doctor-hospital matching [25], school choice [3, 22, 2], and the matching of cadets to their branches of military service [30, 29]. DA is stable, i.e., no pair of agents mutually prefer each other to their DA partners. On the other hand, DA is not strategy-proof (SP); that is, under fully general preferences, it is always possible that some agent can misreport her preferences to obtain a better matching than she would receive under the DA mechanism.
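The deferred-acceptance mechanism and the stability property described above can be sketched in a few lines. This is the textbook proposer-proposing version with strict preference lists, written for illustration; the function names and the toy agents are assumptions, and nothing here reflects the learned mechanisms studied in the paper.

```python
def deferred_acceptance(proposer_prefs, receiver_prefs):
    """Proposer-proposing deferred acceptance. Returns {proposer: receiver}.

    Each argument maps an agent to a list of acceptable partners on the
    other side, ordered from most to least preferred.
    """
    rank = {r: {p: i for i, p in enumerate(prefs)} for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of next receiver to try
    engaged = {}                                   # receiver -> tentatively held proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        prefs = proposer_prefs[p]
        if next_choice[p] >= len(prefs):
            continue  # p has exhausted the list and stays unmatched
        r = prefs[next_choice[p]]
        next_choice[p] += 1
        if p not in rank[r]:
            free.append(p)  # r finds p unacceptable; p tries the next receiver
            continue
        current = engaged.get(r)
        if current is None:
            engaged[r] = p
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p        # r trades up and releases the previous proposer
            free.append(current)
        else:
            free.append(p)        # r rejects p
    return {p: r for r, p in engaged.items()}


def is_stable(matching, proposer_prefs, receiver_prefs):
    """Return True if no proposer-receiver pair mutually prefer each other."""
    partner_of = {r: p for p, r in matching.items()}
    for p, prefs in proposer_prefs.items():
        for r in prefs:
            if r == matching.get(p):
                break  # remaining receivers are less preferred than p's match
            r_prefs = receiver_prefs[r]
            if p not in r_prefs:
                continue
            current = partner_of.get(r)
            if current is None or r_prefs.index(p) < r_prefs.index(current):
                return False  # (p, r) is a blocking pair
    return True


if __name__ == "__main__":
    proposers = {"a": ["x", "y", "z"], "b": ["x", "z", "y"], "c": ["y", "x", "z"]}
    receivers = {"x": ["b", "a", "c"], "y": ["a", "c", "b"], "z": ["a", "b", "c"]}
    match = deferred_acceptance(proposers, receivers)
    print(match, is_stable(match, proposers, receivers))
```

The order in which free proposers are processed does not change the outcome: the algorithm always returns the proposer-optimal stable matching, which is why the sketch can use a simple stack.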
Multi-Task Time Series Forecasting With Shared Attention
Chen, Zekai, E, Jiaze, Zhang, Xiao, Sheng, Hao, Cheng, Xiuzheng
Time series forecasting is a key component in many industrial and business decision processes, and recurrent neural network (RNN) based models have achieved impressive progress on various time series forecasting tasks. However, most of the existing methods focus on single-task forecasting problems by learning separately based on limited supervised objectives, which often suffer from insufficient training instances. As the Transformer architecture and other attention-based models have demonstrated their great capability for capturing long-term dependencies, we propose two self-attention based sharing schemes for multi-task time series forecasting which can train jointly across multiple tasks. We augment a sequence of paralleled Transformer encoders with an external public multi-head attention function, which is updated by all data of all tasks. Experiments on a number of real-world multi-task time series forecasting tasks show that our proposed architectures can not only outperform the state-of-the-art single-task forecasting baselines but also outperform the RNN-based multi-task forecasting method.
Parallel Extraction of Long-term Trends and Short-term Fluctuation Framework for Multivariate Time Series Forecasting
Xu, Haoyan, Duan, Ziheng, Huang, Yida, Feng, Jie, Ren, Anni, Zhang, Qianru, Song, Pengyu, Wang, Xiaoqian
Multivariate time series (MTS) forecasting is widely used in various fields. Reasonable prediction results can assist people in planning and decision-making, generate benefits and avoid risks. Time series normally have two characteristics: a long-term trend and short-term fluctuation. For example, stock prices will have a long-term upward trend with the market, but there may be a small decline in the short term. These two characteristics are often relatively independent of each other. However, existing prediction methods often do not distinguish between them, which reduces the accuracy of the prediction model. In this paper, an MTS forecasting framework that can capture the long-term trends and short-term fluctuations of time series in parallel is proposed. This method uses the original time series and its first difference to characterize long-term trends and short-term fluctuations. Three prediction sub-networks are constructed to predict long-term trends, short-term fluctuations and the final value to be predicted. The overall optimization objective borrows the idea of multi-task learning: the predictions of long-term trends and short-term fluctuations are made as close to their real values as possible, while the final prediction is required to approximate the value to be predicted. In this way, the proposed method uses more supervision information and can more accurately capture the changing trend of the time series, thereby improving the forecasting performance.
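The two ingredients described above, the first difference and a multi-task objective over three predictions, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the use of mean squared error, the single weighting factor `aux_weight`, and the choice of targets for the trend and fluctuation branches are all assumptions, since the abstract does not specify them.

```python
def first_difference(series):
    """Return x[t] - x[t-1] for t = 1..n-1 (one element shorter than the input)."""
    return [b - a for a, b in zip(series, series[1:])]


def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def multitask_loss(pred_final, pred_trend, pred_fluct,
                   target_value, target_trend, target_fluct,
                   aux_weight=0.5):
    """Main forecasting loss plus weighted auxiliary losses for the two branches.

    aux_weight is a hypothetical hyperparameter balancing the auxiliary
    trend and fluctuation terms against the final prediction term.
    """
    main = mse(pred_final, target_value)
    auxiliary = mse(pred_trend, target_trend) + mse(pred_fluct, target_fluct)
    return main + aux_weight * auxiliary


if __name__ == "__main__":
    series = [1.0, 3.0, 6.0, 10.0]
    print(first_difference(series))  # short-term changes between consecutive steps
    print(multitask_loss([1.0], [2.0], [0.0], [2.0], [2.0], [2.0]))
```

The auxiliary terms give the sub-networks their own supervision signals, which is the sense in which the framework "uses more supervision information" than a single forecasting loss.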