 South America


Facebook rolled out a chatbot to advise employees on how to answer questions about its controversies

Daily Mail - Science & tech

Consultancy firm Cambridge Analytica had offices in London, New York and Washington, as well as in Brazil and Malaysia. The company boasted it can 'find your voters and move them to action' through data-driven campaigns and a team that includes data scientists and behavioural psychologists. In 2013, Cambridge professor Aleksandr Kogan used his app, This Is Your Digital Life, to ask 270,000 Facebook users questions about their personalities. By answering them, the users granted Kogan access not only to their own profiles but also to those of their friends. He subsequently sold that information to Cambridge Analytica for $51 million.


The intriguing role of module criticality in the generalization of deep networks

arXiv.org Machine Learning

We study the phenomenon that some modules of deep neural networks (DNNs) are more critical than others, meaning that rewinding their parameter values back to initialization, while keeping other modules fixed at the trained parameters, results in a large drop in the network's performance. Our analysis reveals interesting properties of the loss landscape, which leads us to propose a complexity measure, called module criticality, based on the shape of the valleys that connect the initial and final values of the module parameters. We formulate how generalization relates to module criticality, and show that this measure is able to explain the superior generalization performance of some architectures over others, whereas earlier measures fail to do so.

1 Introduction

Neural networks have had tremendous practical impact in various domains, revolutionizing many tasks in computer vision, speech and natural language processing. However, many aspects of their design and analysis remain mysterious to this date. One of the most important questions is "what makes an architecture work better than others given a specific task?" Extensive research in this area has led to many potential explanations of why some types of architectures perform better; however, we lack a unified view that provides a complete and satisfactory answer. To attain a unified view of the superiority of one architecture over another in terms of generalization performance, we need a measure that effectively captures this. Analyzing the generalization behavior of neural networks has been an active area of research since Baum and Haussler (1989). Many generalization bounds and complexity measures have been proposed so far. Bartlett (1998) emphasized the norm of the weights in predicting the generalization error.


Direct Mappings between RDF and Property Graph Databases

arXiv.org Artificial Intelligence

RDF [21] and graph databases [27] are two approaches to data management that are based on modeling, storing and querying graph-like data. The database systems based on these models are gaining relevance in industry due to their use in various application domains where complex data analytics is required [2]. RDF triplestores and graph database systems are tightly connected, as both are based on graph data models. RDF databases are based on the RDF data model [21], their standard query language is SPARQL [15], and RDF Schema [8] allows one to describe classes of resources and properties (i.e. the data schema). On the other hand, most graph databases are based on the Property Graph (PG) data model, for which there is neither a standard query language nor a standard notion of property graph schema [25]. Therefore, RDF and PG database systems are dissimilar in data model, schema constraints and query language.


Extreme Learning Machine design for dealing with unrepresentative features

arXiv.org Machine Learning

Extreme Learning Machines (ELMs) have become a popular tool in the field of Artificial Intelligence due to their very high training speed and generalization capabilities. Another advantage is that they have a single hyper-parameter that must be tuned: the number of hidden nodes. Most traditional approaches dictate that this parameter should be chosen smaller than the number of available training samples in order to avoid over-fitting. In fact, it has been proved that choosing the number of hidden nodes equal to the number of training samples yields a perfect training classification with probability 1 (w.r.t. the random parameter initialization). In this article we argue that, in spite of this, in some cases it may be beneficial to choose a much larger number of hidden nodes, depending on certain properties of the data. We explain why this happens and show some examples to illustrate how the model behaves. In addition, we present a pruning algorithm to cope with the additional computational burden associated with the enlarged ELM. Experimental results using electroencephalography (EEG) signals show an improvement in performance with respect to traditional ELM approaches, while diminishing the extra computing time associated with the use of large architectures.
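
The claim about hidden nodes and perfect training classification can be illustrated with a minimal sketch of a basic ELM. This is not the article's code: the tanh activation, the random toy data, and the names `train_elm`, `predict_elm` and `elm_train_accuracy` are all assumptions chosen for illustration. The hidden layer is random and never trained; only the output weights are fitted, by least squares.

```python
import numpy as np

def train_elm(X, y, n_hidden, rng):
    """Fit a basic ELM: random hidden layer, least-squares output weights."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.standard_normal(n_hidden)                # random biases, never trained
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)     # output weights via least squares
    return W, b, beta

def predict_elm(X, params):
    """Predict labels in {-1, +1} from the sign of the ELM output."""
    W, b, beta = params
    return np.sign(np.tanh(X @ W + b) @ beta)

def elm_train_accuracy(X, y, n_hidden, rng):
    """Train on (X, y) and report accuracy on the same training set."""
    params = train_elm(X, y, n_hidden, rng)
    return float(np.mean(predict_elm(X, params) == y))

# Toy data (assumption): 30 samples, 10 features, random labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 10))
y = rng.choice([-1.0, 1.0], size=30)

for n_hidden in (5, 30):
    print(n_hidden, elm_train_accuracy(X, y, n_hidden, rng))
```

With as many hidden nodes as training samples, the hidden activation matrix is square and almost surely invertible, so the least-squares fit can reproduce even random labels; with far fewer nodes it generally cannot.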


Binarized Canonical Polyadic Decomposition for Knowledge Graph Completion

arXiv.org Machine Learning

Methods based on vector embeddings of knowledge graphs have been actively pursued as a promising approach to knowledge graph completion. However, embedding models generate storage-inefficient representations, particularly when the number of entities and relations, and the dimensionality of the real-valued embedding vectors are large. We present a binarized CANDECOMP/PARAFAC (CP) decomposition algorithm, which we refer to as B-CP, where real-valued parameters are replaced by binary values to reduce model size. Moreover, we show that a fast score computation technique can be developed with bitwise operations. We prove that B-CP is fully expressive by deriving a bound on the size of its embeddings. Experimental results on several benchmark datasets demonstrate that the proposed method successfully reduces model size by more than an order of magnitude while maintaining task performance at the same level as the real-valued CP model.
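
To make the bitwise idea concrete, here is a minimal sketch of how a CP-style score over binary embeddings could be computed with XOR and a bit count. It assumes embeddings with entries in {+1, -1} and a bit encoding in which a set bit means -1; the abstract does not spell out the authors' exact scheme, so the encoding and the function names (`pack_signs`, `score_bitwise`, `score_naive`) are illustrative assumptions.

```python
import random

def pack_signs(signs):
    """Pack a list of +1/-1 values into an int; bit i is set when signs[i] == -1."""
    bits = 0
    for i, s in enumerate(signs):
        if s == -1:
            bits |= 1 << i
    return bits

def score_bitwise(h_bits, r_bits, t_bits, dim):
    """CP score sum_i h_i * r_i * t_i for +1/-1 vectors, using XOR and popcount.

    A product h_i * r_i * t_i is -1 exactly when an odd number of the three
    factors is -1, i.e. when the XOR of the three bits is 1.
    """
    negatives = bin(h_bits ^ r_bits ^ t_bits).count("1")
    return dim - 2 * negatives

def score_naive(h, r, t):
    """Reference implementation with ordinary multiplications."""
    return sum(a * b * c for a, b, c in zip(h, r, t))

# Compare both implementations on random sign vectors (toy data, an assumption).
random.seed(0)
dim = 64
h, r, t = ([random.choice([1, -1]) for _ in range(dim)] for _ in range(3))
print(score_naive(h, r, t),
      score_bitwise(pack_signs(h), pack_signs(r), pack_signs(t), dim))
```

Each embedding then needs one bit per dimension instead of one float, and the score reduces to a few integer operations.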


Enhancing Stratospheric Weather Analyses and Forecasts by Deploying Sensors from a Weather Balloon

arXiv.org Machine Learning

The ability to analyze and forecast stratospheric weather conditions is fundamental to addressing climate change. However, our capacity to collect data in the stratosphere is limited by sparsely deployed weather balloons. We propose a framework to collect stratospheric data by releasing a contrail of tiny sensor devices as a weather balloon ascends. The key machine learning challenges are determining when and how to deploy a finite collection of sensors to produce a useful data set. We decide when to release sensors by modeling the deviation of a forecast from actual stratospheric conditions as a Gaussian process. We then implement a novel hardware system that is capable of optimally releasing sensors from a rising weather balloon. We show that this data engineering framework is effective through real weather balloon flights, as well as simulations.


Exact asymptotics for phase retrieval and compressed sensing with random generative priors

arXiv.org Machine Learning

We consider the problem of compressed sensing and of (real-valued) phase retrieval with a random measurement matrix. We derive sharp asymptotics for the information-theoretically optimal performance and for the best known polynomial algorithm for an ensemble of generative priors consisting of fully connected deep neural networks with random weight matrices and arbitrary activations. We compare the performance to sparse separable priors and conclude that generative priors might be advantageous in terms of algorithmic performance. In particular, while sparsity does not allow compressive phase retrieval to be performed efficiently close to its information-theoretic limit, we find that under the random generative prior compressed phase retrieval becomes tractable.


A probability theoretic approach to drifting data in continuous time domains

arXiv.org Machine Learning

December 5, 2019

Abstract: The notion of drift refers to the phenomenon that the distribution underlying the observed data changes over time. Although many attempts have been made to deal with drift, formal notions of drift are application-dependent and formulated at various degrees of abstraction and mathematical coherence. In this contribution, we provide a probability-theoretic framework that allows a formalization of drift in continuous time and subsumes popular notions of drift. It gives rise to a new characterization of drift in terms of stochastic dependency between data and time. This particularly intuitive formalization enables us to design a new, efficient drift detection method. Further, it induces a technique for decomposing observed data into a drifting and a non-drifting part.

Keywords: Online learning, learning theory, stochastic processes, learning with drift, continuous time models, drift decomposition

1 INTRODUCTION

One fundamental assumption in classical machine learning is that observed data are i.i.d. Yet this assumption is often violated as soon as machine learning faces real-world problems: models are subject to seasonal changes, changed demands of individual customers, ageing of sensors, etc. In such settings, lifelong model adaptation rather than classical batch learning is required for optimum performance. Since drift, i.e. the fact that data is no longer identically distributed, is a major issue in many real-world applications of machine learning, many attempts have been made to deal with this setting (Ditzler et al., 2015). Depending on the domain of data and application, the presence of drift is modelled in different ways. As an example, covariate shift refers to the situation of training and test set having different marginal distributions (Gretton et al., 2009). Learning for data streams extends this setting to an unlimited (but usually countable) stream of observed data, mostly in supervised learning scenarios (Gama et al., 2014). Learning technologies for such situations often rely on windowing techniques, and adapt the model based on the characteristics of the data in an observed time window. Active methods explicitly detect drift, usually referring to drift of the classification error, and trigger model adaptation this way, while passive methods continuously adjust the model (Ditzler et al., 2015).
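
The idea of drift as a dependency between data and time can be illustrated with a generic permutation test: if the values are independent of their position in the stream, shuffling them should not change a statistic that compares early and late observations. The sketch below is not the paper's detection method; the split into two halves, the mean-difference statistic, the toy data and the name `drift_p_value` are all assumptions made for illustration.

```python
import numpy as np

def drift_p_value(values, n_perm=999, seed=0):
    """Permutation p-value for dependence between a 1-D stream and time.

    The statistic is the absolute difference between the mean of the first
    half and the mean of the second half of the stream. Under no drift the
    values are exchangeable, so random shuffles give the null distribution.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    half = len(values) // 2

    def stat(v):
        return abs(v[:half].mean() - v[half:].mean())

    observed = stat(values)
    hits = sum(stat(rng.permutation(values)) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

# Toy streams (assumption): one stationary, one whose mean shifts halfway through.
data_rng = np.random.default_rng(42)
stable = data_rng.normal(0.0, 1.0, size=200)
drifting = np.concatenate([data_rng.normal(0.0, 1.0, size=100),
                           data_rng.normal(2.0, 1.0, size=100)])

print("stable  :", drift_p_value(stable))
print("drifting:", drift_p_value(drifting))
```

A small p-value for the drifting stream indicates that the data depend on time; an active method in the sense described above would trigger model adaptation at that point.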


Machine Learning & Python Tryolabs Blog

#artificialintelligence

From 11 to 15 November 2019, the most important event in the history of artificial intelligence (AI) in Latin America took place in Montevideo, our hometown. Khipu.ai was the first of hopefully many events to come, in which top researchers from all over the world, from both academia and industry, came to the region to share knowledge and promote much-needed diversity in the field, pushing Latin America forward. As proud supporters of Khipu, we at Tryolabs got to attend the awesome talks and trainings provided during the event, and contributed our own talks and a booth featuring four demos of pose estimation models running in real time on different embedded devices.


New AI Biotech Company Zenerchi Announces Partnership with Pop Life Global

#artificialintelligence

China-based Pop Life will work with the developers of Zenerchi to advance consumer and business applications of its physiological simulation and visualization AI platform. This new realm of technology will empower a host of wellness and medical uses that span entertainment, education and exhibition phases. The Zenerchi technology platform has the potential to propel each of these sectors into a new realm of visualization, leveraging and converging the most advanced capabilities of simulation, including GUI/UX, AI, VR, AR and 3D technology. These outcomes provide new usages and an entirely new paradigm of technology. Beginning October 2020, the partnership plans to unveil its first public-facing technology implementation by producing a series of immersive edutainment exhibition experiences throughout the world.