Victoria
Artificial intelligence could predict El Niño up to 18 months in advance
The dreaded El Niño strikes the globe every 2 to 7 years. As warm waters in the tropical Pacific Ocean shift eastward and trade winds weaken, the weather pattern ripples through the atmosphere, causing drought in southern Africa, wildfires in South America, and flooding on North America's Pacific coast. Climate scientists have struggled to predict El Niño events more than 1 year in advance, but artificial intelligence (AI) can now extend forecasts to 18 months, according to a new study. The work could help people in threatened regions better prepare for droughts and floods, for example by choosing which crops to plant, says William Hsieh, a retired climate scientist in Victoria, Canada, who worked on early El Niño forecasts but who was not involved in the current study. Longer forecasts could have "large economic benefits," he says.
Machine Learning for Stochastic Parameterization: Generative Adversarial Networks in the Lorenz '96 Model
Gagne, David John II, Christensen, Hannah M., Subramanian, Aneesh C., Monahan, Adam H.
Stochastic parameterizations account for uncertainty in the representation of unresolved sub-grid processes by sampling from the distribution of possible sub-grid forcings. Some existing stochastic parameterizations utilize data-driven approaches to characterize uncertainty, but these approaches require significant structural assumptions that can limit their scalability. Machine learning models, including neural networks, are able to represent a wide range of distributions and build optimized mappings between a large number of inputs and sub-grid forcings. Recent research on machine learning parameterizations has focused only on deterministic parameterizations. In this study, we develop a stochastic parameterization using the generative adversarial network (GAN) machine learning framework. The GAN stochastic parameterization is trained and evaluated on output from the Lorenz '96 model, which is a common baseline model for evaluating both parameterization and data assimilation techniques. We evaluate different ways of characterizing the input noise for the model and perform model runs with the GAN parameterization at weather and climate timescales. Some of the GAN configurations perform better than a baseline bespoke parameterization at both timescales, and the networks closely reproduce the spatio-temporal correlations and regimes of the Lorenz '96 system. We also find that in general those models which produce skillful forecasts are also associated with the best climate simulations.
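For readers unfamiliar with the test bed, the following is a minimal sketch of the standard single-scale Lorenz '96 equations with a pluggable sub-grid forcing term. The abstract does not give the model equations or the GAN architecture, so everything here is an assumption for illustration: the function names, the default parameters, and in particular `toy_stochastic_forcing`, a hypothetical linear-plus-Gaussian-noise stand-in that is not the GAN parameterization.

```python
import random


def l96_tendency(x, forcing, u=None):
    """dX_k/dt = -X_{k-1} (X_{k-2} - X_{k+1}) - X_k + F - U_k, with cyclic indices."""
    n = len(x)
    if u is None:
        u = [0.0] * n  # no sub-grid forcing
    return [
        -x[(k - 1) % n] * (x[(k - 2) % n] - x[(k + 1) % n]) - x[k] + forcing - u[k]
        for k in range(n)
    ]


def rk4_step(x, dt, forcing, u=None):
    """One fourth-order Runge-Kutta step; u is held fixed over the step."""
    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]

    k1 = l96_tendency(x, forcing, u)
    k2 = l96_tendency(shifted(x, k1, 0.5 * dt), forcing, u)
    k3 = l96_tendency(shifted(x, k2, 0.5 * dt), forcing, u)
    k4 = l96_tendency(shifted(x, k3, dt), forcing, u)
    return [
        xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def toy_stochastic_forcing(x, rng, slope=0.3, noise_std=0.5):
    """Hypothetical stand-in for a stochastic parameterization: one random
    draw from an assumed distribution of sub-grid forcings given X."""
    return [slope * xk + rng.gauss(0.0, noise_std) for xk in x]


def run(n_vars=8, forcing=20.0, dt=0.005, n_steps=200, seed=0):
    """Integrate the toy model, resampling the sub-grid forcing each step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
    for _ in range(n_steps):
        u = toy_stochastic_forcing(x, rng)
        x = rk4_step(x, dt, forcing, u)
    return x
```

Calling `run()` returns the final state of one realization; changing `seed` gives a different member of an ensemble, which is the sense in which such a parameterization samples from possible sub-grid forcings.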
Orometric Methods in Bounded Metric Data
Stubbemann, Maximilian, Hanika, Tom, Stumme, Gerd
A large amount of the data held in knowledge graphs (KGs) is actually metric. For example, the Wikidata KG contains a plenitude of metric facts about geographic entities like cities, chemical compounds or celestial objects. In this paper, we propose a novel approach that transfers orometric (topographic) measures to bounded metric spaces. While these methods were originally designed to identify relevant mountain peaks on the surface of the earth, we demonstrate how to use them for metric data sets in general, notably for metric sets of items enclosed in knowledge graphs. Based on this, we present a method for identifying outstanding items using the transferred valuation functions 'isolation' and 'prominence'. Building on this, we envision an item recommendation process. To demonstrate the relevance of the novel valuations for such processes, we use item sets from the Wikidata knowledge graph. We then evaluate the usefulness of 'isolation' and 'prominence' empirically in a supervised machine learning setting. In particular, we find structurally relevant items in the geographic population distributions of Germany and France.
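As an illustration of the 'isolation' valuation, here is a minimal sketch that assumes the usual topographic definition: the isolation of an item is the distance to the nearest item with a strictly greater height. The abstract does not spell out the paper's exact definitions for bounded metric spaces, so the treatment of the highest item (infinite isolation) and the function and variable names are assumptions; 'prominence' is not sketched here.

```python
import math


def isolation(items, distance, height):
    """Map each item to the distance to its nearest strictly higher item.

    items:    iterable of hashable item identifiers
    distance: function (a, b) -> non-negative float, assumed to be a metric
    height:   dict mapping item -> numeric valuation (e.g. population)
    """
    items = list(items)
    result = {}
    for item in items:
        higher = [other for other in items if height[other] > height[item]]
        # Assumption: an item with no higher neighbour gets infinite isolation.
        result[item] = min(
            (distance(item, other) for other in higher), default=math.inf
        )
    return result


# Made-up example: three places on a line, with made-up "heights".
position = {"a": 0.0, "b": 1.0, "c": 5.0}
height = {"a": 10.0, "b": 3.0, "c": 7.0}
iso = isolation(position, lambda p, q: abs(position[p] - position[q]), height)
```

In this toy data set, "c" is lower than "a" but far from it, so it receives a larger isolation than "b", which sits right next to the highest item.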
Learning Patterns of Assonance for Authorship Attribution of Historical Texts
Ivanov, Lubomir (Iona College)
This paper deals with extracting and learning patterns of assonance as a stylistic feature for author attribution of historical texts. We describe an assonance extraction algorithm, and consider results from an extensive set of machine learning experiments, based on a historical corpus of 18th century American and British texts. The results are compared with those obtained from the use of other prosodic and traditional stylistic features.
Deep Autotuner: A Data-Driven Approach to Natural-Sounding Pitch Correction for Singing Voice in Karaoke Performances
Wager, Sanna, Tzanetakis, George, Wang, Cheng-i, Guo, Lijiang, Sivaraman, Aswin, Kim, Minje
We describe a machine-learning approach to pitch correcting a solo singing performance in a karaoke setting, where the solo voice and accompaniment are on separate tracks. The proposed approach addresses the situation where no musical score exists for either the vocals or the accompaniment: it predicts the amount of correction from the relationship between the spectral contents of the vocal and accompaniment tracks. Hence, the pitch shift in cents suggested by the model can be used to make the voice sound in tune with the accompaniment. This approach differs from commercially used automatic pitch correction systems, where notes in the vocal tracks are shifted to be centered around notes in a user-defined score or mapped to the closest pitch among the twelve equal-tempered scale degrees. We train the model using a dataset of 4,702 amateur karaoke performances selected for good intonation. We present a Convolutional Gated Recurrent Unit (CGRU) model to accomplish this task. This method can be extended into unsupervised pitch correction of a vocal performance, popularly referred to as autotuning.
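To make the unit concrete, the sketch below shows how a pitch shift in cents relates to frequency, together with the score-free baseline the abstract contrasts against (snapping to the closest equal-tempered pitch). It does not reproduce the CGRU model. The reference tuning of A4 = 440 Hz and all function names are assumptions, not taken from the paper.

```python
import math

A4_HZ = 440.0  # assumed reference tuning


def cents_between(f_from, f_to):
    """Interval from f_from to f_to in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_to / f_from)


def shift_by_cents(freq, cents):
    """Frequency obtained by shifting freq by the given number of cents."""
    return freq * 2.0 ** (cents / 1200.0)


def cents_to_nearest_semitone(freq):
    """Correction, in cents, that moves freq onto the closest
    twelve-tone equal-tempered pitch relative to A4_HZ."""
    semitones = 12.0 * math.log2(freq / A4_HZ)
    return 100.0 * (round(semitones) - semitones)
```

A model that outputs a correction `c` in cents for a sung frequency `f` would be applied as `shift_by_cents(f, c)`; the baseline instead uses `c = cents_to_nearest_semitone(f)`, which ignores the accompaniment entirely.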
Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies
Xu, Yan, Xing, Li, Su, Jessica, Zhang, Xuekui, Qiu, Weiliang
Genome-wide association studies (GWASs) aim to detect genetic risk factors for complex human diseases by identifying disease-associated single-nucleotide polymorphisms (SNPs). The traditional SNP-wise approach, combined with multiple testing adjustment, is over-conservative and lacks power in many GWASs. In this article, we propose a model-based clustering method that transforms the challenging high-dimension-small-sample-size problem into a low-dimension-large-sample-size problem and borrows information across SNPs by grouping SNPs into three clusters. We pre-specify the patterns of clusters by minor allele frequencies of SNPs between cases and controls, and enforce the patterns with prior distributions. In the simulation studies, our proposed model outperforms the traditional SNP-wise approach, showing better control of the false discovery rate (FDR) and higher sensitivity. We re-analyzed two real studies to identify SNPs associated with severe bortezomib-induced peripheral neuropathy (BiPN) in patients with multiple myeloma (MM). The original analysis in the literature failed to identify SNPs after FDR adjustment. Our proposed method not only detected the reported SNPs after FDR adjustment but also discovered a novel BiPN-associated SNP, rs4351714, that has been reported to be related to MM in another study.
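For context on the SNP-wise baseline, the following sketch shows one common form of FDR adjustment, the Benjamini-Hochberg step-up procedure. The abstract does not name the adjustment used, so the choice of procedure is an assumption, as are the function name and the example p-values (which are made up). The clustering model itself is not sketched.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Made-up per-SNP p-values, for illustration only.
example = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], alpha=0.05)
```

Because the per-test threshold shrinks as the number of tests `m` grows, a study with very many SNPs and few samples can end up rejecting nothing, which is the loss of power the abstract refers to.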
Machine Learning and Deep Learning Algorithms for Bearing Fault Diagnostics - A Comprehensive Review
Zhang, Shen, Zhang, Shibo, Wang, Bingnan, Habetler, Thomas G.
In this survey paper, we systematically summarize the current literature on studies that apply machine learning (ML) and data mining techniques to bearing fault diagnostics. Conventional ML methods, including artificial neural networks (ANN), principal component analysis (PCA), support vector machines (SVM), etc., have been successfully applied to detecting and categorizing bearing faults over the last decade, while the application of deep learning (DL) methods has sparked great interest in both industry and academia in the last five years. In this paper, we first review the conventional ML methods, before taking a deep dive into the latest developments in DL algorithms for bearing fault applications. Specifically, the superiority of the DL-based methods over the conventional ML methods is analyzed in terms of metrics directly related to fault feature extraction and classifier performance; the new functionalities offered by DL techniques that could not be accomplished before are also summarized. In addition, to obtain a more intuitive insight, a comparative study is performed on the classifier performance and accuracy for a number of papers utilizing the open source Case Western Reserve University (CWRU) bearing data set. Finally, based on the nature of the time-series 1-D data obtained from sensors monitoring the bearing conditions, recommendations and suggestions are provided for applying DL algorithms to bearing fault diagnostics based on specific applications, as well as future research directions to further improve their performance.
Iterative Refinement for $\ell_p$-norm Regression
Adil, Deeksha, Kyng, Rasmus, Peng, Richard, Sachdeva, Sushant
We give improved algorithms for the $\ell_{p}$-regression problem, $\min_{x} \|x\|_{p}$ such that $A x=b,$ for all $p \in (1,2) \cup (2,\infty).$ Our algorithms obtain a high accuracy solution in $\tilde{O}_{p}(m^{\frac{|p-2|}{2p + |p-2|}}) \le \tilde{O}_{p}(m^{\frac{1}{3}})$ iterations, where each iteration requires solving an $m \times m$ linear system, $m$ being the dimension of the ambient space. By maintaining an approximate inverse of the linear systems that we solve in each iteration, we give algorithms for solving $\ell_{p}$-regression to $1 / \text{poly}(n)$ accuracy that run in time $\tilde{O}_p(m^{\max\{\omega, 7/3\}}),$ where $\omega$ is the matrix multiplication constant. For the current best value of $\omega > 2.37$, we can thus solve $\ell_{p}$ regression as fast as $\ell_{2}$ regression, for all constant $p$ bounded away from $1.$ Our algorithms can be combined with fast graph Laplacian linear equation solvers to give minimum $\ell_{p}$-norm flow / voltage solutions to $1 / \text{poly}(n)$ accuracy on an undirected graph with $m$ edges in $\tilde{O}_{p}(m^{1 + \frac{|p-2|}{2p + |p-2|}}) \le \tilde{O}_{p}(m^{\frac{4}{3}})$ time. For sparse graphs and for matrices with similar dimensions, our iteration counts and running times improve on the $p$-norm regression algorithm by [Bubeck-Cohen-Lee-Li STOC'18] and general-purpose convex optimization algorithms. At the core of our algorithms is an iterative refinement scheme for $\ell_{p}$-norms, using the smoothed $\ell_{p}$-norms introduced in the work of Bubeck et al. Given an initial solution, we construct a problem that seeks to minimize a quadratically-smoothed $\ell_{p}$ norm over a subspace, such that a crude solution to this problem allows us to improve the initial solution by a constant factor, leading to algorithms with fast convergence.
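The inequality between the two iteration bounds stated in this abstract can be checked directly. The short derivation below splits on the sign of $p-2$; the case split is an illustration added here, not part of the abstract.

$$\text{For } p > 2:\quad \frac{|p-2|}{2p + |p-2|} = \frac{p-2}{3p-2}, \qquad \frac{1}{3} - \frac{p-2}{3p-2} = \frac{4}{3(3p-2)} > 0.$$

$$\text{For } 1 < p < 2:\quad \frac{|p-2|}{2p + |p-2|} = \frac{2-p}{p+2}, \qquad \frac{1}{3} - \frac{2-p}{p+2} = \frac{4(p-1)}{3(p+2)} > 0.$$

In both cases the exponent is strictly below $\frac{1}{3}$, so $m^{\frac{|p-2|}{2p + |p-2|}} \le m^{\frac{1}{3}}$ for $m \ge 1$; adding $1$ to the exponent gives the corresponding $m^{\frac{4}{3}}$ bound for the graph setting.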
Adversarial Examples: Opportunities and Challenges
Zhang, Jiliang, Jiang, Xiaoxiong
With the advent of the era of artificial intelligence (AI), deep neural networks (DNNs) have shown huge superiority over humans in image recognition, speech processing, autonomous vehicles and medical diagnosis. However, recent studies indicate that DNNs are vulnerable to adversarial examples (AEs), which are designed by attackers to fool deep learning models. Unlike real examples, AEs can hardly be distinguished by human eyes, but they mislead the model into predicting incorrect outputs and therefore threaten security-critical deep learning applications. In recent years, the generation of and defense against AEs have become a research hotspot in the field of AI security. This article reviews the latest research progress on AEs. First, we introduce the concept, cause, characteristics and evaluation metrics of AEs, then survey the state-of-the-art AE generation methods and discuss their advantages and disadvantages. After that, we review the existing defenses and discuss their limitations. Finally, future research opportunities and challenges for AEs are outlined. In the era of AI, DNNs have shown great advantages in autonomous vehicles, robotics, network security, image/speech recognition and natural language processing (NLP). For example, in 2017, XiaoDu, an intelligent robot with superior face recognition ability developed by Baidu, defeated a representative from the human "strongest brain" team with a score of 3:2 [1]. On October 19th, 2017, Google's DeepMind team released AlphaGo Zero, which shocked the world. Compared with the previous AlphaGo, AlphaGo Zero relies on reinforcement learning without any prior knowledge to develop its Go skills and finally beats every human competitor [2]. In the United States, AI research has received huge support from the government, for example through the Federal Research Fund.
In October 2016, the United States issued Preparing for the Future of Artificial Intelligence and the National Artificial Intelligence Research and Development Strategic Plan, which raised AI to the national strategic level and formulated ambitious blueprints [3], [4]. In the same year, AI was written into the nineteenth National Congress report, which pushed the development of AI industries to a new height and filled the gap in the top-level strategy of AI development [5].
J. Zhang and X. Jiang are with the College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China (email: zhangjiliang@hnu.edu.cn). This work is supported by the National Natural Science Foundation of China.
Governing autonomous vehicles: emerging responses for safety, liability, privacy, cybersecurity, and industry risks
Taeihagh, Araz, Lim, Hazel Si Min
The benefits of autonomous vehicles (AVs) are widely acknowledged, but there are concerns about the extent of these benefits and about AV risks and unintended consequences. In this article, we first examine AVs and different categories of the technological risks associated with them. We then explore strategies that can be adopted to address these risks, and explore emerging responses by governments for addressing AV risks. Our analyses reveal that, thus far, governments have in most instances avoided stringent measures in order to promote AV developments, and the majority of responses are non-binding and focus on creating councils or working groups to better explore AV implications. The US has been active in introducing legislation to address issues related to privacy and cybersecurity. The UK and Germany, in particular, have enacted laws to address liability issues; other countries mostly acknowledge these issues, but have yet to implement specific strategies. To address privacy and cybersecurity risks, strategies ranging from the introduction or amendment of non-AV-specific legislation to the creation of working groups have been adopted. Much less attention has been paid to issues such as environmental and employment risks, although a few governments have begun programmes to retrain workers who might be negatively affected.