AITopics

2308.02282

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, May-24-2023

Causal Discovery with Unobserved Variables: A Proxy Variable Approach

Liu, Mingzhou; Sun, Xinwei; Qiao, Yu; Wang, Yizhou

Discovering causal relations from observational data is important. The existence of unobserved variables, such as latent confounders or mediators, can mislead causal identification. To address this issue, proximal causal discovery methods were proposed to adjust for the bias using a proxy of the unobserved variable. However, these methods presume that the data are discrete, which limits their real-world application. In this paper, we propose a proximal causal discovery method that can handle continuous variables. Our observation is that discretizing continuous variables can lead to serious errors and compromise the power of the proxy. Therefore, to use proxy variables in the continuous case, the critical point is to control the discretization error. To this end, we identify mild regularity conditions on the conditional distributions that enable us to control the discretization error to an infinitesimal level, as long as the proxy is discretized with sufficiently fine, finite bins. Based on this, we design a proxy-based hypothesis test for identifying causal relationships when unobserved variables are present. Our test is consistent, meaning that its power approaches one as the sample size grows. We demonstrate the effectiveness of our method on synthetic and real-world data.

artificial intelligence, data mining, machine learning, (18 more...)

2305.05281

Genre: Research Report (1.00)

Industry:

Health & Medicine > Consumer Health (0.68)
Health & Medicine > Health Care Providers & Services (0.68)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Musculoskeletal (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Data Science > Data Mining (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
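The key step in the proxy-variable paper above is discretizing a continuous proxy into sufficiently fine, finite bins. The paper's own binning scheme is not stated here. The sketch below therefore assumes plain equal-frequency (quantile) binning and uses the largest within-bin range as a crude stand-in for discretization error. The function names and the sample are hypothetical. The sketch only illustrates that finer bins shrink this spread; it is not the proposed hypothesis test.

```python
import bisect


def quantile_edges(values, n_bins):
    """Interior cut points splitting the sorted sample into n_bins (roughly) equal-count bins."""
    s = sorted(values)
    return [s[(len(s) * k) // n_bins] for k in range(1, n_bins)]


def discretize(values, edges):
    """Map each continuous value to a bin index in 0..len(edges)."""
    return [bisect.bisect_right(edges, v) for v in values]


def max_within_bin_spread(values, n_bins):
    """Largest (max - min) inside any non-empty bin: a crude stand-in for discretization error."""
    labels = discretize(values, quantile_edges(values, n_bins))
    bins = {}
    for v, b in zip(values, labels):
        bins.setdefault(b, []).append(v)
    return max(max(vs) - min(vs) for vs in bins.values())


if __name__ == "__main__":
    proxy = [i / 100 for i in range(100)]  # hypothetical continuous proxy sample on [0, 1)
    for k in (4, 20):
        print(k, "bins -> max within-bin spread", round(max_within_bin_spread(proxy, k), 2))
```

On this evenly spaced sample, the largest within-bin spread is 0.24 with 4 bins and drops to 0.04 with 20 bins.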

arXiv.org Artificial Intelligence, Feb-27-2023

Out-of-Distribution Representation Learning for Time Series Classification

Lu, Wang; Wang, Jindong; Sun, Xinwei; Chen, Yiqiang; Xie, Xing

Time series classification is an important real-world problem. Because time series are non-stationary, meaning that their distribution changes over time, it remains challenging to build models that generalize to unseen distributions. In this paper, we propose to view the time series classification problem from the distribution perspective. We argue that the temporal complexity is attributable to unknown latent distributions within the data. To this end, we propose DIVERSIFY to learn generalized representations for time series classification. DIVERSIFY is an iterative process: it first obtains the worst-case distribution scenario via adversarial training, then matches the distributions of the obtained sub-domains. We also present some theoretical insights. We conduct experiments on gesture recognition, speech command recognition, wearable stress and affect detection, and sensor-based human activity recognition, using a total of seven datasets in different settings. Results demonstrate that DIVERSIFY significantly outperforms other baselines. Qualitative and quantitative analyses also show that it effectively characterizes the latent distributions. Code is available at: https://github.com/microsoft/robustlearn.

artificial intelligence, deep learning, machine learning, (19 more...)

2209.07027

Country: Asia (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Machine Learning, Jul-5-2021

Causally Invariant Predictor with Shift-Robustness

Zheng, Xiangyu; Sun, Xinwei; Chen, Wei; Liu, Tie-Yan

This paper proposes an invariant causal predictor that is robust to distribution shift across domains and maximally preserves the transferable invariant information. Based on a disentangled causal factorization, we formulate the distribution shift as soft interventions in the system. Because we make no prior specification of the causal structure or the intervened variables, this formulation covers a wide range of distribution shifts. Instead of imposing regularization to constrain the invariance of the predictor, we propose to predict with the intervened conditional expectation based on the do-operator, and we prove that it is invariant across domains. More importantly, we prove that the proposed predictor is the robust predictor that minimizes the worst-case quadratic loss among the distributions of all domains. For empirical learning, we propose an intuitive and flexible estimation method based on data regeneration and present a local causal discovery procedure to guide the regeneration step. The key idea is to regenerate data such that the regenerated distribution is compatible with the intervened graph, which allows us to apply standard supervised learning methods to the regenerated data. Experimental results on both synthetic and real data demonstrate the efficacy of our predictor in improving predictive accuracy and robustness across domains.

immunology, inductive learning, predictor, (17 more...)

2107.01876

Country: Asia (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Immunology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.89)
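The Causally Invariant Predictor abstract predicts with an intervened conditional expectation defined via the do-operator. As background only, the sketch below simulates a hypothetical linear structural causal model with a confounder Z (Z → X, Z → Y, X → Y). The coefficients 2 and 3 are chosen purely for illustration. Regressing Y on X alone estimates the observational slope, which is 2 + 3 · Cov(X, Z) / Var(X) = 3.5 in this model. Adjusting for Z (backdoor adjustment) recovers the interventional slope of E[Y | do(X = x)], which is 2. This is a textbook illustration and not the paper's data-regeneration estimator.

```python
import random


def simulate(n, seed=0):
    """Draw n samples from a hypothetical linear SCM: Z -> X, Z -> Y, X -> Y."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [zi + rng.gauss(0.0, 1.0) for zi in z]
    y = [2.0 * xi + 3.0 * zi + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]
    return x, z, y


def slope_y_on_x(x, y):
    """OLS slope of y on x (with intercept): the observational, confounded slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_adjusted_for_z(x, z, y):
    """Coefficient on x when regressing y on (x, z): the backdoor-adjusted effect of do(X)."""
    n = len(x)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    xc = [a - mx for a in x]
    zc = [c - mz for c in z]
    yc = [b - my for b in y]
    sxx = sum(a * a for a in xc)
    szz = sum(c * c for c in zc)
    sxz = sum(a * c for a, c in zip(xc, zc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    szy = sum(c * b for c, b in zip(zc, yc))
    # Solve [[sxx, sxz], [sxz, szz]] @ [bx, bz] = [sxy, szy] for bx by Cramer's rule
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz * sxz)


if __name__ == "__main__":
    x, z, y = simulate(20000)
    print("observational slope:", round(slope_y_on_x(x, y), 2))  # expected near 3.5
    print("interventional slope:", round(slope_adjusted_for_z(x, z, y), 2))  # expected near 2
```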

arXiv.org Machine Learning, Nov-4-2020

Latent Causal Invariant Model

Sun, Xinwei; Wu, Botong; Liu, Chang; Zheng, Xiangyu; Chen, Wei; Qin, Tao; Liu, Tie-Yan

Current supervised learning methods can learn spurious correlations during the data-fitting process, which raises issues of interpretability, out-of-distribution (OOD) generalization, and robustness. To avoid spurious correlation, we propose a Latent Causal Invariance Model (LaCIM), which pursues causal prediction. Specifically, to model the underlying causal factors, we introduce latent variables that are separated into (a) output-causative factors and (b) others that are spuriously correlated with the output via confounders. We further assume that the generating mechanisms from the latent space to the observed data are causally invariant. We establish the identifiability of such invariance, in particular the disentanglement of output-causative factors from the others, as a theoretical guarantee for precise inference and for avoiding spurious correlation. We propose a variational-Bayesian method for estimation and optimize over the latent space for prediction. The utility of our approach is verified by improved interpretability, predictive power in various OOD scenarios (including healthcare), and robustness in security applications.

arxiv preprint arxiv, neural network, neurology, (19 more...)

2011.02203

Country:

North America > United States (0.14)
North America > Canada (0.14)
Europe > Switzerland (0.14)
Europe > Italy (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.68)
Information Technology > Security & Privacy (0.45)
Health & Medicine > Diagnostic Medicine (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence, Nov-3-2020

Learning Causal Semantic Representation for Out-of-Distribution Prediction

Liu, Chang; Sun, Xinwei; Wang, Jindong; Li, Tao; Qin, Tao; Chen, Wei; Liu, Tie-Yan

Conventional supervised learning methods, especially deep ones, are found to be sensitive to out-of-distribution (OOD) examples. This is largely because the learned representation mixes the semantic factor with the variation factor due to their domain-specific correlation, while only the semantic factor causes the output. To address the problem, we propose a Causal Semantic Generative model (CSG), based on causality, that models the two factors separately. We learn it on a single training domain for prediction in a test domain, either without unsupervised test-domain data (OOD generalization) or with it (domain adaptation). We prove that CSG identifies the semantic factor on the training domain, and the invariance principle of causality subsequently guarantees the boundedness of the OOD generalization error and the success of adaptation. By leveraging the graphical structure of CSG, we design learning methods for both effective learning and easy prediction. Empirical studies demonstrate that our methods improve test accuracy for OOD generalization and domain adaptation.

adaptation, deep learning, neural network, (19 more...)

2011.01681

Country:

North America > United States > California (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Machine Learning, Jul-4-2020

DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths

Fu, Yanwei; Liu, Chen; Li, Donghao; Sun, Xinwei; Zeng, Jinshan; Yao, Yuan

Over-parameterization is ubiquitous nowadays in training neural networks. It benefits both optimization, by helping to reach global optima, and generalization, by reducing prediction error. However, compressive networks are desired in many real-world applications, and direct training of small networks may be trapped in local optima. In this paper, instead of pruning or distilling over-parameterized models into compressive ones, we propose a new approach based on differential inclusions of inverse scale spaces. Specifically, it generates a family of models, from simple to complex, by coupling a pair of parameters. This coupling simultaneously trains the over-parameterized deep model and learns structural sparsity on the weights of fully connected and convolutional layers. Such a differential inclusion scheme has a simple discretization, proposed as Deep structurally splitting Linearized Bregman Iteration (DessiLBI). We establish global convergence of DessiLBI in deep learning: from any initialization, the algorithmic iterations converge to a critical point of the empirical risk. Experimental evidence shows that DessiLBI achieves comparable or even better performance than competitive optimizers in exploring the structural sparsity of several widely used backbones on benchmark datasets. Remarkably, with early stopping, DessiLBI unveils "winning tickets" in early epochs: effective sparse structures whose test accuracy is comparable to that of fully trained over-parameterized models.

deep learning, dessilbi, neural network, (16 more...)

2007.0201

Country:

Asia (0.67)
Europe (0.67)
North America > United States > California > Los Angeles County > Long Beach (0.14)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

arXiv.org Machine Learning, Oct-13-2019

iSplit LBI: Individualized Partial Ranking with Ties via Split LBI

Xu, Qianqian; Sun, Xinwei; Yang, Zhiyong; Cao, Xiaochun; Huang, Qingming; Yao, Yuan

Due to the inherent uncertainty of data, the problem of predicting partial rankings from pairwise comparison data with ties has attracted increasing interest in recent years. However, in real-world scenarios, different individuals often hold distinct preferences, so it can be misleading to look only at a global partial ranking while ignoring personal diversity. In this paper, instead of learning a global ranking that agrees with the consensus, we pursue tie-aware partial ranking from an individualized perspective. In particular, we formulate a unified framework that can be used not only for individualized partial ranking prediction but also for abnormal user selection. This is realized by a variable-splitting-based algorithm called iSplit LBI. Specifically, our algorithm generates a sequence of estimates along a regularization path, where both the hyperparameters and the model parameters are updated. At each step of the path, the parameters can be decomposed into three orthogonal parts, namely abnormal signals, personalized signals, and random noise. The abnormal signals serve abnormal user selection, while the abnormal and personalized signals together are mainly responsible for individual partial ranking prediction. Extensive experiments on simulated and real-world datasets demonstrate that our new approach significantly outperforms state-of-the-art alternatives. The code is available at https://github.com/qianqianxu010/NeurIPS2019-iSplitLBI.

crowdsourcing, international conference, social media, (18 more...)

1910.05905

Country: Asia > China (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.69)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

arXiv.org Machine Learning, May-22-2019

Parsimonious Deep Learning: A Differential Inclusion Approach with Global Convergence

Fu, Yanwei; Liu, Chen; Li, Donghao; Sun, Xinwei; Zeng, Jinshan; Yao, Yuan

Over-parameterization is ubiquitous nowadays in training neural networks. It benefits both optimization, by helping to reach global optima, and generalization, by reducing prediction error. However, compressive networks are desired in many real-world applications, and direct training of small networks may be trapped in local optima. In this paper, instead of pruning or distilling an over-parameterized model into compressive ones, we propose a parsimonious learning approach based on differential inclusions of inverse scale spaces. It generates a family of models, from simple to complex, and explores the model space with better efficiency and interpretability than stochastic gradient descent. It enjoys a simple discretization, the Split Linearized Bregman Iterations, with provable global convergence: from any initialization, the algorithmic iterations converge to a critical point of the empirical risk. One may exploit the proposed method to grow the complexity of neural networks progressively. Numerical experiments on MNIST, CIFAR-10/100, and ImageNet show that the method is promising for training large-scale models with favorable interpretability.

deep learning, neural network, splitlbi, (15 more...)

1905.09449

Country: North America > United States > California > Los Angeles County > Long Beach (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
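The Split Linearized Bregman Iterations named in the Parsimonious Deep Learning abstract can be illustrated on the simplest case, sparse linear regression. The sketch below is a minimal pure-Python version. It assumes squared loss, an identity splitting operator (D = I), and hypothetical parameter values kappa, alpha and nu, with alpha kept small relative to nu / kappa for stability. It is not the DessiLBI or $S^{2}$-LBI training procedure for deep networks. Here beta is the dense estimate and gamma is the sparse one. Coordinates of gamma become nonzero one after another along the path, which is the "simple to complex" family of models.

```python
def shrink(v, lam=1.0):
    """Soft-thresholding: sign(v) * max(|v| - lam, 0)."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def split_lbi(X, y, kappa=10.0, alpha=0.01, nu=1.0, iters=2000):
    """Split LBI for sparse linear regression with D = I (illustrative sketch).

    Loss: L(beta, gamma) = ||y - X beta||^2 / (2n) + ||beta - gamma||^2 / (2 nu).
    Returns the final beta, the sparse gamma, and the order in which
    coordinates of gamma first became nonzero along the path.
    """
    n, p = len(X), len(X[0])
    beta, gamma, z = [0.0] * p, [0.0] * p, [0.0] * p
    entry_order = []
    for _ in range(iters):
        # Residual r = y - X beta
        r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Gradient of L with respect to beta
        grad_beta = [
            -sum(X[i][j] * r[i] for i in range(n)) / n + (beta[j] - gamma[j]) / nu
            for j in range(p)
        ]
        # z accumulates -dL/dgamma = (beta - gamma) / nu at the current iterate
        z = [z[j] + alpha * (beta[j] - gamma[j]) / nu for j in range(p)]
        # Gradient step on beta, scaled by kappa * alpha
        beta = [beta[j] - kappa * alpha * grad_beta[j] for j in range(p)]
        # Sparse variable: gamma = kappa * shrink(z)
        gamma = [kappa * shrink(zj) for zj in z]
        for j in range(p):
            if gamma[j] != 0.0 and j not in entry_order:
                entry_order.append(j)
    return beta, gamma, entry_order


if __name__ == "__main__":
    X = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # hypothetical identity design
    beta, gamma, order = split_lbi(X, [3.0, 0.0, 0.0, -2.0])
    print("entry order:", order)
    print("gamma:", [round(g, 3) for g in gamma])
```

On this toy design, coordinate 0 (the larger signal) enters the path before coordinate 3, and the two zero coordinates never enter. After 2,000 steps, beta is close to the least-squares solution [3, 0, 0, -2].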

arXiv.org Machine Learning, Apr-24-2019

$S^{2}$-LBI: Stochastic Split Linearized Bregman Iterations for Parsimonious Deep Learning

Fu, Yanwei; Li, Donghao; Sun, Xinwei; Zhang, Shun; Wang, Yizhou; Yao, Yuan

This paper proposes a novel Stochastic Split Linearized Bregman Iteration ($S^{2}$-LBI) algorithm to train deep networks efficiently. $S^{2}$-LBI introduces an iterative regularization path with structural sparsity. Our $S^{2}$-LBI combines the computational efficiency of LBI with model selection consistency in learning the structural sparsity. The computed solution path intrinsically enables us to enlarge or simplify a network. Theoretically, this ability comes from the dynamical properties of our $S^{2}$-LBI algorithm. Experimental results on the MNIST and CIFAR-10 datasets validate $S^{2}$-LBI. For example, on MNIST we can either boost a network with only 1.5K parameters (one convolutional layer with 5 filters and one FC layer) to 98.40% recognition accuracy, or prune 82.5% of the parameters of LeNet-5 while still achieving 98.47% recognition accuracy. We also have learning results on ImageNet, which will be added in the next version of this report.

algorithm, deep learning, neural network, (16 more...)

1904.10873

Country: Asia > China (0.47)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)