Recently, denoising diffusion probabilistic models and generative score matching have shown high potential in modelling complex data distributions while stochastic calculus has provided a unified point of view on these techniques allowing for flexible inference schemes. In this paper we introduce Grad-TTS, a novel text-to-speech model with score-based decoder producing mel-spectrograms by gradually transforming noise predicted by encoder and aligned with text input by means of Monotonic Alignment Search. The framework of stochastic differential equations helps us to generalize conventional diffusion probabilistic models to the case of reconstructing data from noise with different parameters and allows to make this reconstruction flexible by explicitly controlling trade-off between sound quality and inference speed. Subjective human evaluation shows that Grad-TTS is competitive with state-of-the-art text-to-speech approaches in terms of Mean Opinion Score. We will make the code publicly available shortly.
Probabilistic graphical modeling (PGM) provides a framework for formulating an interpretable generative process of data and expressing uncertainty about unknowns, but it lacks flexibility. Deep learning (DL) is an alternative framework for learning from data that has achieved great empirical success in recent years. DL offers great flexibility, but it lacks the interpretability and calibration of PGM. This thesis develops deep probabilistic graphical modeling (DPGM.) DPGM consists in leveraging DL to make PGM more flexible. DPGM brings about new methods for learning from data that exhibit the advantages of both PGM and DL. We use DL within PGM to build flexible models endowed with an interpretable latent structure. One model class we develop extends exponential family PCA using neural networks to improve predictive performance while enforcing the interpretability of the latent factors. Another model class we introduce enables accounting for long-term dependencies when modeling sequential data, which is a challenge when using purely DL or PGM approaches. Finally, DPGM successfully solves several outstanding problems of probabilistic topic models, a widely used family of models in PGM. DPGM also brings about new algorithms for learning with complex data. We develop reweighted expectation maximization, an algorithm that unifies several existing maximum likelihood-based algorithms for learning models parameterized by neural networks. This unifying view is made possible using expectation maximization, a canonical inference algorithm in PGM. We also develop entropy-regularized adversarial learning, a learning paradigm that deviates from the traditional maximum likelihood approach used in PGM. From the DL perspective, entropy-regularized adversarial learning provides a solution to the long-standing mode collapse problem of generative adversarial networks, a widely used DL approach.
Stochastic optimal control problems have a long tradition in applied probability, with the questions addressed being of high relevance in a multitude of fields. Even though theoretical solutions are well understood in many scenarios, their practicability suffers from the assumption of known dynamics of the underlying stochastic process, raising the statistical challenge of developing purely data-driven strategies. For the mathematically separated classes of continuous diffusion processes and L\'evy processes, we show that developing efficient strategies for related singular stochastic control problems can essentially be reduced to finding rate-optimal estimators with respect to the sup-norm risk of objects associated to the invariant distribution of ergodic processes which determine the theoretical solution of the control problem. From a statistical perspective, we exploit the exponential $\beta$-mixing property as the common factor of both scenarios to drive the convergence analysis, indicating that relying on general stability properties of Markov processes is a sufficiently powerful and flexible approach to treat complex applications requiring statistical methods. We show moreover that in the L\'evy case $-$ even though per se jump processes are more difficult to handle both in statistics and control theory $-$ a fully data-driven strategy with regret of significantly better order than in the diffusion case can be constructed.
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
This includes missions related to environment exploration and monitoring in which an UAV is tasked with producing a map for a quantity of interest (e.g., pollutant concentration, terrain elevation, or vegetation growth) by collecting measurements at various locations across a region of interest (e.g., a reservoir, a city, or a crop) [10, 13, 17, 23, 40]. The data collected by the UAV can be used to construct a statistical model for the quantity of interest, which in turn can be used for analysis and policy making. Of course, the statistical model is only as good as the measurements made by the UAV. Therefore, the question of data collection (i.e., how, when, and where to make measurements) is of paramount importance, especially from the standpoint of detecting anomalies in the environment. Path-planning algorithms for environment exploration come in two flavors. Approaches in which the UAV decides on its next move one step at a time are referred to as myopic [24, 42]. Myopic algorithms are suitable for most situations but lack a mechanism for anticipation, which may be problematic in cases where path-planning decisions may have negative longterm consequences (e.g., the UAV gets stuck because of maneuverability constraints).
Building operations represent a significant percentage of the total primary energy consumed in most countries due to the proliferation of Heating, Ventilation and Air-Conditioning (HVAC) installations in response to the growing demand for improved thermal comfort. Reducing the associated energy consumption while maintaining comfortable conditions in buildings are conflicting objectives and represent a typical optimization problem that requires intelligent system design. Over the last decade, different methodologies based on the Artificial Intelligence (AI) techniques have been deployed to find the sweet spot between energy use in HVAC systems and suitable indoor comfort levels to the occupants. This paper performs a comprehensive and an in-depth systematic review of AI-based techniques used for building control systems by assessing the outputs of these techniques, and their implementations in the reviewed works, as well as investigating their abilities to improve the energy-efficiency, while maintaining thermal comfort conditions. This enables a holistic view of (1) the complexities of delivering thermal comfort to users inside buildings in an energy-efficient way, and (2) the associated bibliographic material to assist researchers and experts in the field in tackling such a challenge. Among the 20 AI tools developed for both energy consumption and comfort control, functions such as identification and recognition patterns, optimization, predictive control. Based on the findings of this work, the application of AI technology in building control is a promising area of research and still an ongoing, i.e., the performance of AI-based control is not yet completely satisfactory. This is mainly due in part to the fact that these algorithms usually need a large amount of high-quality real-world data, which is lacking in the building or, more precisely, the energy sector.
Artificial Intelligence is one of the fastest growing technologies of the 21st century and accompanies us in our daily lives when interacting with technical applications. However, reliance on such technical systems is crucial for their widespread applicability and acceptance. The societal tools to express reliance are usually formalized by lawful regulations, i.e., standards, norms, accreditations, and certificates. Therefore, the T\"UV AUSTRIA Group in cooperation with the Institute for Machine Learning at the Johannes Kepler University Linz, proposes a certification process and an audit catalog for Machine Learning applications. We are convinced that our approach can serve as the foundation for the certification of applications that use Machine Learning and Deep Learning, the techniques that drive the current revolution in Artificial Intelligence. While certain high-risk areas, such as fully autonomous robots in workspaces shared with humans, are still some time away from certification, we aim to cover low-risk applications with our certification procedure. Our holistic approach attempts to analyze Machine Learning applications from multiple perspectives to evaluate and verify the aspects of secure software development, functional requirements, data quality, data protection, and ethics. Inspired by existing work, we introduce four criticality levels to map the criticality of a Machine Learning application regarding the impact of its decisions on people, environment, and organizations. Currently, the audit catalog can be applied to low-risk applications within the scope of supervised learning as commonly encountered in industry. Guided by field experience, scientific developments, and market demands, the audit catalog will be extended and modified accordingly.
Markov networks are popular models for discrete multivariate systems where the dependence structure of the variables is specified by an undirected graph. To allow for more expressive dependence structures, several generalizations of Markov networks have been proposed. Here we consider the class of contextual Markov networks which takes into account possible context-specific independences among pairs of variables. Structure learning of contextual Markov networks is very challenging due to the extremely large number of possible structures. One of the main challenges has been to design a score, by which a structure can be assessed in terms of model fit related to complexity, without assuming chordality. Here we introduce the marginal pseudo-likelihood as an analytically tractable criterion for general contextual Markov networks. Our criterion is shown to yield a consistent structure estimator. Experiments demonstrate the favorable properties of our method in terms of predictive accuracy of the inferred models.
Model-based planning and prospection are widely studied in both cognitive neuroscience and artificial intelligence (AI), but from different perspectives - and with different desiderata in mind (biological realism versus scalability) that are difficult to reconcile. Here, we introduce a novel method to plan in large POMDPs - Active Tree Search - that combines the normative character and biological realism of a leading planning theory in neuroscience (Active Inference) and the scalability of Monte-Carlo methods in AI. This unification is beneficial for both approaches. On the one hand, using Monte-Carlo planning permits scaling up the biologically grounded approach of Active Inference to large-scale problems. On the other hand, the theory of Active Inference provides a principled solution to the balance of exploration and exploitation, which is often addressed heuristically in Monte-Carlo methods. Our simulations show that Active Tree Search successfully navigates binary trees that are challenging for sampling-based methods, problems that require adaptive exploration, and the large POMDP problem Rocksample. Furthermore, we illustrate how Active Tree Search can be used to simulate neurophysiological responses (e.g., in the hippocampus and prefrontal cortex) of humans and other animals that contain large planning problems. These simulations show that Active Tree Search is a principled realisation of neuroscientific and AI theories of planning, which offers both biological realism and scalability.
Online algorithms for detecting changepoints, or abrupt shifts in the behavior of a time series, are often deployed with limited resources, e.g., to edge computing settings such as mobile phones or industrial sensors. In these scenarios it may be beneficial to trade the cost of collecting an environmental measurement against the quality or "fidelity" of this measurement and how the measurement affects changepoint estimation. For instance, one might decide between inertial measurements or GPS to determine changepoints for motion. A Bayesian approach to changepoint detection is particularly appealing because we can represent our posterior uncertainty about changepoints and make active, cost-sensitive decisions about data fidelity to reduce this posterior uncertainty. Moreover, the total cost could be dramatically lowered through active fidelity switching, while remaining robust to changes in data distribution. We propose a multi-fidelity approach that makes cost-sensitive decisions about which data fidelity to collect based on maximizing information gain with respect to changepoints. We evaluate this framework on synthetic, video, and audio data and show that this information-based approach results in accurate predictions while reducing total cost.