Bayesian Learning
Finite sample properties of parametric MMD estimation: robustness to misspecification and dependence
Chérief-Abdellatif, Badr-Eddine, Alquier, Pierre
One of the main challenges in statistics is the design of a universal estimation procedure. Given data, a universal procedure is an algorithm that provides an estimator of the generating distribution which is simultaneously statistically consistent when the true distribution belongs to the model, and robust otherwise. Typically, a universal estimator is consistent for any model, with minimaxoptimal or fast rates of convergence and is robust to small departures from the model assumptions [Bickel, 1976] such as sparse instead of dense effects or non-Gaussian errors in high dimensional linear regression. Unfortunately, most statistical procedures are based upon strong assumptions on the model or on the corresponding parameter set, and very famous estimation methods such as maximum likelihood estimation (MLE), method of moments or Bayesian posterior inference may fail even on simple problems when such assumptions do not hold. For instance, even though MLE is consistent and asymptotically normal with optimal rates of convergence in parametric estimation under suitable regularity assumptions [Le Cam, 1970, Van der Vaart, 1990] and in nonparametric estimation under entropy conditions, this method behaves poorly in case of misspecification when the true generating distribution of the data does not belong to the chosen model. Let us investigate a simple example presented in [Birgé, 2006] that illustrates the non-universal characteristic of MLE. We observe a collection of n independent and identically distributed (i.i.d) random variables X
A method of supervised learning from conflicting data with hidden contexts
Zhang, Tianren, Jiang, Yizhou, Chen, Feng
Conventional supervised learning assumes a stable input-output relationship. However, this assumption fails in open-ended training settings where the input-output relationship depends on hidden contexts. In this work, we formulate a more general supervised learning problem in which training data is drawn from multiple unobservable domains, each potentially exhibiting distinct input-output maps. This inherent conflict in data renders standard empirical risk minimization training ineffective. To address this challenge, we propose a method LEAF that introduces an allocation function, which learns to assign conflicting data to different predictive models. We establish a connection between LEAF and a variant of the Expectation-Maximization algorithm, allowing us to derive an analytical expression for the allocation function. Finally, we provide a theoretical analysis of LEAF and empirically validate its effectiveness on both synthetic and real-world tasks involving conflicting data.
A Hybrid Transformer Model for Fake News Detection: Leveraging Bayesian Optimization and Bidirectional Recurrent Unit
Huang, Tianyi, Xu, Zeqiu, Yu, Peiyang, Yi, Jingyuan, Xu, Xiaochuan
In this paper, we propose an optimized Transformer model that integrates Bayesian algorithms with a Bidirectional Gated Recurrent Unit (BiGRU), and apply it to fake news classification for the first time. First, we employ the TF-IDF method to extract features from news texts and transform them into numeric representations to facilitate subsequent machine learning tasks. Two sets of experiments are then conducted for fake news detection and classification: one using a Transformer model optimized only with BiGRU, and the other incorporating Bayesian algorithms into the BiGRU-based Transformer. Experimental results show that the BiGRU-optimized Transformer achieves 100% accuracy on the training set and 99.67% on the test set, while the addition of the Bayesian algorithm maintains 100% accuracy on the training set and slightly improves test-set accuracy to 99.73%. This indicates that the Bayesian algorithm boosts model accuracy by 0.06%, further enhancing the detection capability for fake news. Moreover, the proposed algorithm converges rapidly at around the 10th training epoch with accuracy nearing 100%, demonstrating both its effectiveness and its fast classification ability. Overall, the optimized Transformer model, enhanced by the Bayesian algorithm and BiGRU, exhibits excellent continuous learning and detection performance, offering a robust technical means to combat the spread of fake news in the current era of information overload.
Neural Spatiotemporal Point Processes: Trends and Challenges
Mukherjee, Sumantrak, Elhamdi, Mouad, Mohler, George, Selby, David A., Xie, Yao, Vollmer, Sebastian, Grossmann, Gerrit
Spatiotemporal point processes (STPPs) are probabilistic models for events occurring in continuous space and time. Real-world event data often exhibit intricate dependencies and heterogeneous dynamics. By incorporating modern deep learning techniques, STPPs can model these complexities more effectively than traditional approaches. Consequently, the fusion of neural methods with STPPs has become an active and rapidly evolving research area. In this review, we categorize existing approaches, unify key design choices, and explain the challenges of working with this data modality. We further highlight emerging trends and diverse application domains. Finally, we identify open challenges and gaps in the literature.
Optimal Algorithms in Linear Regression under Covariate Shift: On the Importance of Precondition
Liu, Yuanshi, Zhang, Haihan, Chen, Qian, Fang, Cong
A common pursuit in modern statistical learning is to attain satisfactory generalization out of the source data distribution (OOD). In theory, the challenge remains unsolved even under the canonical setting of covariate shift for the linear model. This paper studies the foundational (high-dimensional) linear regression where the ground truth variables are confined to an ellipse-shape constraint and addresses two fundamental questions in this regime: (i) given the target covariate matrix, what is the min-max \emph{optimal} algorithm under covariate shift? (ii) for what kinds of target classes, the commonly-used SGD-type algorithms achieve optimality? Our analysis starts with establishing a tight lower generalization bound via a Bayesian Cramer-Rao inequality. For (i), we prove that the optimal estimator can be simply a certain linear transformation of the best estimator for the source distribution. Given the source and target matrices, we show that the transformation can be efficiently computed via a convex program. The min-max optimal analysis for SGD leverages the idea that we recognize both the accumulated updates of the applied algorithms and the ideal transformation as preconditions on the learning variables. We provide sufficient conditions when SGD with its acceleration variants attain optimality.
Advancing machine fault diagnosis: A detailed examination of convolutional neural networks
Vashishtha, Govind, Chauhan, Sumika, Sehri, Mert, Hebda-Sobkowicz, Justyna, Zimroz, Radoslaw, Dumond, Patrick, Kumar, Rajesh
The growing complexity of machinery and the increasing demand for operational efficiency and safety have driven the development of advanced fault diagnosis techniques. Among these, convolutional neural networks (CNNs) have emerged as a powerful tool, offering robust and accurate fault detection and classification capabilities. This comprehensive review delves into the application of CNNs in machine fault diagnosis, covering its theoretical foundation, architectural variations, and practical implementations. The strengths and limitations of CNNs are analyzed in this domain, discussing their effectiveness in handling various fault types, data complexities, and operational environments. Furthermore, we explore the evolving landscape of CNN-based fault diagnosis, examining recent advancements in data augmentation, transfer learning, and hybrid architectures. Finally, we highlight future research directions and potential challenges to further enhance the application of CNNs for reliable and proactive machine fault diagnosis.
2D Integrated Bayesian Tomography of Plasma Electron Density Profile for HL-3 Based on Gaussian Process
Wang, Cong, Yang, Renjie, Li, Dong, Yang, Zongyu, Wang, Zhijun, Wei, Yixiong, Li, Jing
This paper introduces an integrated Bayesian model that combines line integral measurements and point values using Gaussian Process (GP). The proposed method leverages Gaussian Process Regression (GPR) to incorporate point values into 2D profiles and employs coordinate mapping to integrate magnetic flux information for 2D inversion. The average relative error of the reconstructed profile, using the integrated Bayesian tomography model with normalized magnetic flux, is as low as 3.60*10^(-4). Additionally, sensitivity tests were conducted on the number of grids, the standard deviation of synthetic diagnostic data, and noise levels, laying a solid foundation for the application of the model to experimental data. This work not only achieves accurate 2D inversion using the integrated Bayesian model but also provides a robust framework for decoupling pressure information from equilibrium reconstruction, thus making it possible to optimize equilibrium reconstruction using inversion results.
Learning in Markets with Heterogeneous Agents: Dynamics and Survival of Bayesian vs. No-Regret Learners
Easley, David, Kolumbus, Yoav, Tardos, Eva
We analyze the performance of heterogeneous learning agents in asset markets with stochastic payoffs. Our agents aim to maximize the expected growth rate of their wealth but have different theories on how to learn this best. We focus on comparing Bayesian and no-regret learners in market dynamics. Bayesian learners with a prior over a finite set of models that assign positive prior probability to the correct model have posterior probabilities that converge exponentially to the correct model. Consequently, they survive even in the presence of agents who invest according to the correct model of the stochastic process. Bayesians with a continuum prior converge to the correct model at a rate of $O((\log T)/T)$. Online learning theory provides no-regret algorithms for maximizing the log of wealth in this setting, achieving a worst-case regret bound of $O(\log T)$ without assuming a steady underlying stochastic process but comparing to the best fixed investment rule. This regret, as we observe, is of the same order of magnitude as that of a Bayesian learner with a continuum prior. However, we show that even such low regret may not be sufficient for survival in asset markets: an agent can have regret as low as $O(\log T)$, but still vanish in market dynamics when competing against agents who invest according to the correct model or even against a perfect Bayesian with a finite prior. On the other hand, we show that Bayesian learning is fragile, while no-regret learning requires less knowledge of the environment and is therefore more robust. Any no-regret learner will drive out of the market an imperfect Bayesian whose finite prior or update rule has even small errors. We formally establish the relationship between notions of survival, vanishing, and market domination studied in economics and the framework of regret minimization, thus bridging these theories.
Data Sensor Fusion In Digital Twin Technology For Enhanced Capabilities In A Home Environment
Momoh, Benjamin, Yahaya, Salisu
This paper investigates the integration of data sensor fusion in digital twin technology to bolster home environment capabilities, particularly in the context of challenges brought on by the coronavirus pandemic and its economic effects. The study underscores the crucial role of digital transformation in not just adapting to, but also mitigating disruptions during the fourth industrial revolution. Using the Wit Motion sensor, data was collected for activities such as walking, working, sitting, and lying, with sensors measuring accelerometers, gyroscopes, and magnetometers. The research integrates Cyber-physical systems, IoT, AI, and robotics to fortify digital twin capabilities. The paper compares sensor fusion methods, including feature-level fusion, decision-level fusion, and Kalman filter fusion, alongside machine learning models like SVM, GBoost, and Random Forest to assess model effectiveness. Results show that sensor fusion significantly improves the accuracy and reliability of these models, as it compensates for individual sensor weaknesses, particularly with magnetometers. Despite higher accuracy in ideal conditions, integrating data from multiple sensors ensures more consistent and reliable results in real-world settings, thereby establishing a robust system that can be confidently applied in practical scenarios.