Dynamic clustering of time series data
Sartório, Victhor S., Fonseca, Thaís C. O.
We propose a new method for clustering multivariate time-series data based on Dynamic Linear Models. Whereas usual time-series clustering methods obtain static membership parameters, our proposal allows each time series to change its cluster membership dynamically over time. In this context, a mixture model is assumed for the time series, and a flexible Dirichlet evolution for the mixture weights allows for smooth membership changes over time. Posterior estimates and predictions can be obtained through Gibbs sampling, but a more efficient method for obtaining point estimates is presented, based on Stochastic Expectation-Maximization and Gradient Descent. Finally, two applications illustrate the usefulness of our proposed model for both univariate and multivariate time series: World Bank indicators for the renewable energy consumption of EU nations and the famous Gapminder dataset containing life expectancy and GDP per capita for various countries.
OPFython: A Python-Inspired Optimum-Path Forest Classifier
de Rosa, Gustavo Henrique, Papa, João Paulo, Falcão, Alexandre Xavier
Machine learning techniques have been paramount throughout the last years, being applied in a wide range of tasks, such as classification, object recognition, person identification, and image segmentation, among others. Nevertheless, conventional classification algorithms, e.g., Logistic Regression, Decision Trees, and Bayesian classifiers, might lack complexity and diversity, making them unsuitable for real-world data. A recent graph-inspired classifier, known as the Optimum-Path Forest, has proven to be a state-of-the-art technique, comparable to Support Vector Machines and even surpassing them in some tasks. In this paper, we propose a Python-based Optimum-Path Forest framework, denoted as OPFython, where all of its functions and classes are based upon the original C language implementation. Additionally, as OPFython is a Python-based library, it provides a friendlier environment and a faster prototyping workspace than the C language.
A random forest based approach for predicting spreads in the primary catastrophe bond market
Makariou, Despoina, Barrieu, Pauline, Chen, Yining
We introduce a random forest approach to predict spreads in the primary catastrophe bond market. We investigate whether all information provided to investors in the offering circular prior to a new issuance is equally important in predicting its spread. The whole population of non-life catastrophe bonds issued from December 2009 to May 2018 is used. The random forest shows impressive predictive power on unseen primary catastrophe bond data, explaining 93% of the total variability. For comparison, linear regression, our benchmark model, has inferior predictive performance, explaining only 47% of the total variability. All details provided in the offering circular are predictive of spread, but to varying degrees. The stability of the results is studied. Using random forests can speed up investment decisions in the catastrophe bond industry.
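The "share of total variability explained" quoted above is conventionally the coefficient of determination, R². The abstract does not name the metric, so the following is only a minimal sketch under that assumption. The function name `r_squared` is hypothetical and no data from the paper is used.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is the usual reading of "explains X% of the total
    variability"; the abstract itself does not state the formula.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variability around the mean of the targets.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("targets are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot
```

Under this reading, a model that always predicts the mean of the targets scores 0 and a perfect model scores 1, so 0.93 against 0.47 compares how much of the spread variation each model leaves unexplained on held-out bonds.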
CLCNet: Deep learning-based Noise Reduction for Hearing Aids using Complex Linear Coding
Schröter, Hendrik, Rosenkranz, Tobias, Escalante-B., Alberto N., Aubreville, Marc, Maier, Andreas
Noise reduction is an important part of modern hearing aids and is included in most commercially available devices. Deep learning-based state-of-the-art algorithms, however, either do not consider real-time and frequency resolution constraints or result in poor quality under very noisy conditions. To improve monaural speech enhancement in noisy environments, we propose CLCNet, a framework based on complex-valued linear coding. First, we define complex linear coding (CLC), motivated by linear predictive coding (LPC), that is applied in the complex frequency domain. Second, we propose a framework that incorporates complex spectrogram input and coefficient output. Third, we define a parametric normalization for complex-valued spectrograms that complies with low-latency and on-line processing. Our CLCNet was evaluated on a mixture of the EUROM database and a real-world noise dataset recorded with hearing aids and compared to traditional real-valued Wiener-Filter gains.
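The abstract does not spell out the CLC formula. As a rough sketch, assume it is a per-frequency linear combination of the current and previous complex spectrogram frames, weighted by complex coefficients such as those a network might output. The function name `clc_output`, the zero-padding before the first frame, and the list layout are all illustrative assumptions.

```python
def clc_output(spectrogram, coeffs, t):
    """Combine the last len(coeffs) frames at time index t, per frequency bin.

    spectrogram[t][f] is a complex STFT value; coeffs[i][f] is the complex
    weight applied to frame t - i. Frames before t = 0 are treated as zero.
    This layout is an assumption made for illustration only.
    """
    n_bins = len(spectrogram[t])
    out = [0j] * n_bins
    for i, frame_coeffs in enumerate(coeffs):
        if t - i < 0:
            break  # no earlier frames available
        past_frame = spectrogram[t - i]
        for f in range(n_bins):
            # Complex multiplication changes both magnitude and phase,
            # unlike a real-valued gain, which only scales the magnitude.
            out[f] += frame_coeffs[f] * past_frame[f]
    return out
```

With a single real coefficient per bin, the same function reduces to a real-valued gain mask, the kind of baseline the abstract compares against.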
Investment In Skills Is Key For Success In The Age Of AI - The Adecco Group
New research from the Global Talent Competitiveness Index 2020 confirms that to succeed in the age of AI, more investment is needed in skills development and lifelong learning. While the emerging markets lag far behind the talent-rich nations, the gap can be bridged with the right set of policies. The currency of the AI-driven economy is talent. But while it is true that talent is in high demand, it is also in short supply. This rings especially true for the economies that fail to attract and build their own talented workforces.
Beyond Artificial Intelligence: Providing Insights to Your Customers
Providing your client with insights, briefly defined as short texts of analytically processed information, is a valuable addition to the services provided by virtually any company. Unfortunately, as engineers, or technicians in general, our training does not address in detail the techniques for writing insights. This short text seeks to serve as a basic guide for future analysts. I introduce the concept of insight and provide advice for the creation of concise and short intelligence pieces. As a senior data analyst, I must do precisely what Ray Dalio, finance magnate, mentions in his December 2019 conversation with Lex Fridman in his podcast "Artificial Intelligence" when asked what role machine learning will play in making decisions and in the analysis: TSC.ai (where I work as a Senior Data Analyst) is a technology company that uses artificial intelligence to provide, precisely, intelligence to our customers.
Is NeurIPS Getting Too Big?
NeurIPS 2019, the latest incarnation of the Neural Information Processing Systems conference, wrapped up just over a week ago. Multiple great blog posts have already summarized various talks and key trends, so the goal of this piece is more humble: to reflect on the experience of attending the conference, and in particular whether its vast size is harmful to its purpose as a research conference. Thirteen thousand attendees, 1,428 accepted papers, and 57 workshops vast. This is 9 minutes condensed down to 15 seconds, and this is not even close to all the attendees! Is that a Rolling Stones concert?
COKE: Communication-Censored Kernel Learning for Decentralized Non-parametric Learning
Xu, Ping, Wang, Yue, Chen, Xiang, Tian, Zhi
This paper studies the decentralized optimization and learning problem where multiple interconnected agents aim to learn an optimal decision function defined over a reproducing kernel Hilbert (RKH) space by jointly minimizing a global objective function, with access to locally observed data only. As a non-parametric approach, kernel learning faces a major challenge in distributed implementation: the decision variables of local objective functions are data-dependent with different sizes and thus cannot be optimized under the decentralized consensus framework without any raw data exchange among agents. To circumvent this major challenge and preserve data privacy, we leverage the random feature (RF) approximation approach to map the large-volume data represented in the RKH space into a smaller RF space, which facilitates the same-size parameter exchange and enables distributed agents to reach consensus on the function decided by the parameters in the RF space. For fast convergent implementation, we design an iterative algorithm for Decentralized Kernel Learning via Alternating direction method of multipliers (DKLA). Further, we develop a COmmunication-censored KErnel learning (COKE) algorithm to reduce the communication load in DKLA. To do so, we apply a communication-censoring strategy, which prevents an agent from transmitting at every iteration unless its local updates are deemed informative. Theoretical results in terms of linear convergence guarantee and generalization performance analysis of DKLA and COKE are provided. Comprehensive tests with both synthetic and real datasets are conducted to verify the communication efficiency and learning effectiveness of COKE.
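The communication-censoring idea described above can be sketched as a simple rule: an agent transmits its parameters only when they have moved far enough from what it last sent. The decaying threshold `alpha * rho ** k` and the names `censor_threshold` and `should_transmit` are assumptions made for illustration; the abstract does not specify how "informative" is measured.

```python
import math


def censor_threshold(k, alpha=1.0, rho=0.9):
    """Hypothetical decaying threshold: tighter as iterations progress."""
    return alpha * rho ** k


def should_transmit(current, last_sent, k, alpha=1.0, rho=0.9):
    """Return True when the local update is deemed informative.

    current and last_sent are same-length parameter vectors (lists of
    floats), which is what the random-feature mapping makes possible.
    """
    # Euclidean distance between the new parameters and the last broadcast.
    change = math.sqrt(sum((c - s) ** 2 for c, s in zip(current, last_sent)))
    return change >= censor_threshold(k, alpha, rho)
```

In a loop, an agent would keep `last_sent` unchanged whenever `should_transmit` returns False, and its neighbours would continue to use the stale copy. That skipped transmission is where the communication savings would come from in this sketch.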
Reinforcement Learning-based Autoscaling of Workflows in the Cloud: A Survey
Garí, Yisel, Monge, David A., Pacini, Elina, Mateos, Cristian, Garino, Carlos García
Reinforcement Learning (RL) has demonstrated great potential for automatically solving decision-making problems in complex, uncertain environments. Basically, RL proposes a computational approach that allows learning through interaction in an environment of stochastic behavior, with agents taking actions to maximize some cumulative short-term and long-term rewards. Some of the most impressive results have been shown in Game Theory, where agents exhibited super-human performance in games like Go or Starcraft 2, which led to its adoption in many other domains, including Cloud Computing. In particular, workflow autoscaling exploits Cloud elasticity to optimize the execution of workflows according to given optimization criteria. This is a decision-making problem in which it is necessary to establish when and how to scale computational resources up or down, and how to assign them to the upcoming processing workload. Such actions have to be taken considering some optimization criteria in the Cloud, a dynamic and uncertain environment. Motivated by this, many works apply RL to the autoscaling problem in the Cloud. In this work we exhaustively survey those proposals from major venues and uniformly compare them based on a set of proposed taxonomies. We also discuss open problems and provide a perspective on future research in the area.
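To make the interaction loop concrete, here is a minimal sketch of one tabular Q-learning update applied to a scaling decision. Q-learning is only one generic RL method, chosen for illustration and not attributed to any surveyed work. The state labels, the action names in `ACTIONS`, and the reward value are hypothetical.

```python
from collections import defaultdict

# Hypothetical action set for an autoscaler.
ACTIONS = ("scale_up", "scale_down", "hold")


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning step and return the new Q(state, action).

    q maps (state, action) pairs to estimated long-term reward.
    alpha is the learning rate; gamma discounts future rewards.
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q_table = defaultdict(float)  # unseen pairs start at 0.0
```

A reward here might combine execution time and resource cost, which corresponds to the "optimization criteria" mentioned above; how to define it is a design decision left open in this sketch.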
Interpreting Cloud Computer Vision Pain-Points: A Mining Study of Stack Overflow
Cummaudo, Alex, Vasa, Rajesh, Barnett, Scott, Grundy, John, Abdelrazek, Mohamed
Intelligent services are becoming increasingly pervasive; application developers want to leverage the latest advances in areas such as computer vision to provide new services and products to users, and large technology firms enable this via RESTful APIs. While such APIs promise easy-to-integrate, on-demand machine intelligence, their current design, documentation and developer interface hide much of the underlying machine learning techniques that power them. Such APIs look and feel like conventional APIs but abstract away data-driven probabilistic behaviour - the implications of a developer treating these APIs in the same way as other, traditional cloud services, such as cloud storage, are of concern. The objective of this study is to determine the various pain-points developers face when implementing systems that rely on the most mature of these intelligent services, specifically those that provide computer vision. We use Stack Overflow to mine indications of the frustrations that developers appear to face when using computer vision services, classifying their questions against two recent classification taxonomies (documentation-related and general questions). We find that, unlike mature fields like mobile development, there is a contrast in the types of questions asked by developers. These indicate a shallow understanding of the underlying technology that empowers such systems. We discuss several implications of these findings via the lens of learning taxonomies to suggest how the software engineering community can improve these services and comment on the nature by which developers use them.