Representation learning has been shown to play an important role in the unprecedented success of machine learning models in numerous tasks, such as machine translation, face recognition, and recommendation. Most existing representation learning approaches require a large number of consistent and noise-free labels. However, for reasons such as budget constraints and privacy concerns, labels are very limited in many real-world scenarios. Directly applying standard representation learning approaches to small labeled data sets easily leads to over-fitting and sub-optimal solutions. Even worse, in some domains such as education, the limited labels are usually annotated by multiple workers with diverse expertise, which introduces noise and inconsistency in such crowdsourcing settings. In this paper, we propose a novel framework that aims to learn effective representations from limited data with crowdsourced labels. Specifically, we design a grouping-based deep neural network to learn embeddings from a limited number of training samples and present a Bayesian confidence estimator to capture the inconsistency among crowdsourced labels. Furthermore, to expedite the training process, we develop a hard example selection procedure that adaptively picks training examples that are misclassified by the model. Extensive experiments conducted on three real-world data sets demonstrate the superiority of our framework in learning representations from limited data with crowdsourced labels, compared with various state-of-the-art baselines. In addition, we provide a comprehensive analysis of each of the main components of our proposed framework and report the promising results it achieved in our production environment.
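To make the idea of a Bayesian confidence estimate over crowdsourced labels concrete, the following is a minimal sketch under a Beta-Bernoulli assumption. The abstract does not specify the estimator, so the function name `label_confidence`, the uniform Beta(1, 1) prior, and the vote lists are all illustrative assumptions rather than the framework's actual design.

```python
def label_confidence(votes, prior_alpha=1.0, prior_beta=1.0):
    """Posterior mean of P(label = 1) under a Beta(prior_alpha, prior_beta) prior.

    `votes` is a list of 0/1 labels given by different crowd workers.
    The Beta-Bernoulli model is an assumption made for this sketch only.
    """
    positives = sum(votes)
    return (prior_alpha + positives) / (prior_alpha + prior_beta + len(votes))


# Five workers, four of whom vote "positive": confidence is pulled toward 1.
mostly_agree = label_confidence([1, 1, 1, 1, 0])
# Two workers who disagree: the estimate stays at the prior mean of 0.5.
split = label_confidence([1, 0])
# A single vote moves the estimate only part of the way toward 1.
single = label_confidence([1])
```

Under this assumption, inconsistent votes and small numbers of votes both keep the confidence close to the prior, which is one simple way such an estimator could down-weight unreliable labels during training.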
The expressive power of Bayesian kernel-based methods has made them an important tool across many facets of artificial intelligence and useful in a plethora of modern application domains, providing both power and interpretability via uncertainty analysis. This article introduces and discusses two methods that straddle the areas of probabilistic Bayesian schemes and kernel methods for regression: Gaussian Processes and Relevance Vector Machines. Our focus is on developing a common framework with which to view these methods, via intermediate methods such as a probabilistic version of the well-known kernel ridge regression, on drawing connections among them via dual formulations, and on discussing their application in the context of major tasks: regression, smoothing, interpolation, and filtering. Overall, we provide an understanding of the mathematical concepts behind these models, summarize and discuss in depth different interpretations, and highlight the relationship to other methods, such as linear kernel smoothers, Kalman filtering, and Fourier approximations. Throughout, we provide numerous figures to promote understanding, and we make numerous recommendations to practitioners. Benefits and drawbacks of the different techniques are highlighted. To our knowledge, this is the most in-depth study of its kind to date focused on these two methods, and it will be relevant to theoretical understanding and to practitioners throughout the domains of data science, signal processing, machine learning, and artificial intelligence in general.
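The link between kernel ridge regression and Gaussian Processes can be illustrated with a small sketch. With a ridge penalty equal to the observation noise variance, the kernel ridge predictor and the Gaussian Process posterior mean are the same expression, k_*^T (K + noise_var I)^{-1} y; the Gaussian Process additionally yields a predictive variance. The kernel choice, length scale, noise level, data, and helper names below are assumptions picked for illustration, not values taken from the article.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel on scalars."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_predict(xs, ys, x_star, noise_var=1e-2):
    """Return (mean, variance) of the latent function at x_star."""
    gram = [[rbf(a, b) + (noise_var if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(a, x_star) for a in xs]
    alpha = solve(gram, ys)      # (K + noise_var * I)^{-1} y
    v = solve(gram, k_star)      # (K + noise_var * I)^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))  # also the kernel ridge prediction
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var


xs = [-1.0, 0.0, 1.0]
ys = [math.sin(x) for x in xs]
mean_near, var_near = gp_predict(xs, ys, 0.0)   # at a training input
mean_far, var_far = gp_predict(xs, ys, 10.0)    # far from all training inputs
```

Near the training inputs the predictive variance falls below the noise level, while far from the data it returns to the prior variance of the kernel; this is the uncertainty information that plain kernel ridge regression does not provide.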
Lately there has been a lot of discussion about why deep learning algorithms perform better than we would theoretically expect. To gain insight into this question, it helps to improve our understanding of how learning works. We explore the core problem of generalization and show that the long-accepted Occam's razor and parsimony principles are insufficient to ground learning. Instead, we derive and demonstrate a set of relativistic principles that yield clearer insight into the nature and dynamics of learning. We show that concepts of simplicity are fundamentally contingent, that all learning operates relative to an initial guess, and that generalization cannot be measured or strongly inferred, but that it can be expected given enough observation. Using these principles, we reconstruct our understanding in terms of distributed learning systems whose components inherit beliefs and update them. We then apply this perspective to elucidate the nature of some real-world inductive processes, including deep learning.
The software outlined in this paper, AitiaExplorer, is an exploratory causal analysis tool which uses unsupervised learning for feature selection in order to expedite causal discovery. In this paper the problem space of causality is briefly described and an overview of related research is provided. A problem statement and requirements for the software are outlined. The key requirements in the implementation, the key design decisions and the actual implementation of AitiaExplorer are discussed. Finally, this implementation is evaluated in terms of the problem statement and requirements outlined earlier. It is found that AitiaExplorer meets these requirements and is a useful exploratory causal analysis tool that automatically selects subsets of important features from a dataset and creates causal graph candidates for review based on these features. The software is available at https://github.com/corvideon/aitiaexplorer
Many systems are naturally modeled as Markov Decision Processes (MDPs), combining probabilities and strategic actions. Given a model of a system as an MDP and some logical specification of system behavior, the goal of synthesis is to find a policy that maximizes the probability of achieving this behavior. A popular choice for defining behaviors is Linear Temporal Logic (LTL). Policy synthesis on MDPs for properties specified in LTL has been well studied. LTL, however, is defined over infinite traces, while many properties of interest are inherently finite. Linear Temporal Logic over finite traces (LTLf) has been used to express such properties, but no tools exist to solve policy synthesis for MDP behaviors given finite-trace properties. We present two algorithms for solving this synthesis problem: the first via reduction of LTLf to LTL and the second using native tools for LTLf. We compare the scalability of these two approaches for synthesis and show that the native approach offers better scalability compared to existing automaton generation tools for LTL.
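As a minimal illustration of the synthesis objective, the sketch below computes a policy that maximizes the probability of eventually reaching a goal state in a small MDP, using value iteration. It assumes the temporal property has already been reduced to reachability (for example, by taking a product of the MDP with an automaton for the formula); that reduction, the two algorithms of the paper, and any LTLf tooling are not shown. The toy model, its state and action names, and the function `max_reach_probability` are hypothetical.

```python
def max_reach_probability(mdp, goal, sweeps=1000, tol=1e-12):
    """Value iteration for the maximal probability of eventually reaching `goal`.

    `mdp[state][action]` is a list of (probability, next_state) pairs.
    States with no actions are treated as absorbing. Returns (values, policy).
    """
    values = {s: (1.0 if s in goal else 0.0) for s in mdp}
    for _ in range(sweeps):
        delta = 0.0
        for s in mdp:
            if s in goal or not mdp[s]:
                continue
            best = max(sum(p * values[t] for p, t in outcomes)
                       for outcomes in mdp[s].values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {}
    for s in mdp:
        if s in goal or not mdp[s]:
            continue
        policy[s] = max(
            mdp[s], key=lambda a: sum(p * values[t] for p, t in mdp[s][a]))
    return values, policy


# Hypothetical toy model: "risky" succeeds or fails immediately, while "safe"
# has a smaller chance of failure per attempt and may return to the start.
toy = {
    "start": {
        "risky": [(0.5, "goal"), (0.5, "trap")],
        "safe": [(0.3, "goal"), (0.6, "start"), (0.1, "trap")],
    },
    "goal": {},
    "trap": {},
}
values, policy = max_reach_probability(toy, goal={"goal"})
```

In this toy model, repeating "safe" satisfies v = 0.3 + 0.6 v, so it reaches the goal with probability 0.75, which beats the 0.5 offered by "risky".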
Making decisions freely presupposes that there is some indeterminacy in the environment and in the decision making engine. The former is reflected in the behavioral changes due to communicating: few changes indicate rigid environments; productive changes manifest a moderate indeterminacy; but a large communicating effort with few productive changes characterizes a chaotic environment. Hence, communicating, effective decision making, and productive behavioral changes are related. The entropy measures the indeterminacy of the environment, and there is an entropy range in which communicating supports effective decision making. This conjecture is referred to here as the Potential Productivity of Decisions. The computing engine that is causal to decision making should also have some indeterminacy. However, computations performed by standard Turing Machines are predetermined. To overcome this limitation, an entropic mode of computing, called here Relational-Indeterminate, is presented. Its implementation in a table format has been used to model an associative memory. The present theory and experiment suggest the Entropy Trade-off: there is an entropy range in which computing is effective, but if the entropy is too low computations are too rigid, and if it is too high computations are infeasible. The entropy trade-off of computing engines corresponds to the potential productivity of decisions of the environment. The theory is framed with reference to an Interaction-Oriented Cognitive Architecture. Memory, perception, action, and thought involve a level of indeterminacy, and decision making may be free to that degree. The overall theory supports an ecological view of rationality. The entropy of the brain has been measured in neuroscience studies, and the present theory supports the view that the brain is an entropic machine. The paper concludes with a number of predictions that may be tested empirically.
Causal effect identification considers whether an interventional probability distribution can be uniquely determined from a passively observed distribution in a given causal structure. If the generating system induces context-specific independence (CSI) relations, the existing identification procedures and criteria based on do-calculus are inherently incomplete. We show that deciding causal effect non-identifiability is NP-hard in the presence of CSIs. Motivated by this, we design a calculus and an automated search procedure for identifying causal effects in the presence of CSIs. The approach is provably sound and includes standard do-calculus as a special case. With this approach we can obtain identifying formulas that were previously unobtainable, and we demonstrate that a small number of CSI relations may be sufficient to turn a previously non-identifiable instance into an identifiable one.
Predicting undesirable events during the execution of a business process instance provides the process participants with an opportunity to intervene and keep the process aligned with its goals. Few approaches for tackling this challenge consider a multi-perspective view, where the flow perspective of the process is combined with its surrounding context. Given the many sources of data in today's world, context can vary widely and have various meanings. This paper addresses the issue of context being cause or effect of the next event and its impact on next event prediction. We leverage previous work on probabilistic models to develop a Dynamic Bayesian Network technique. Probabilistic models are considered comprehensible and they allow the end-user and his or her understanding of the domain to be involved in the prediction. Our technique models context attributes that have either a cause or effect relationship towards the event. We evaluate our technique with two real-life data sets and benchmark it with other techniques from the field of predictive process monitoring. The results show that our solution achieves superior prediction results if context information is correctly introduced into the model.
Supply and demand are two fundamental concepts for sellers and customers. Predicting demand accurately is critical for organizations to be able to make plans. In this paper, we propose a new approach for demand prediction on an e-commerce web site. The proposed model differs from earlier models in several ways. The e-commerce web site for which the model is implemented operates a marketplace model, in which many sellers sell the same product at the same time at different prices. Demand prediction for such a model should consider the prices of the same product offered by competing sellers, along with the features of these sellers. In this study, we first applied different regression algorithms to a specific set of products from one department of a company that is one of the most popular online e-commerce companies in Turkey. Then we used stacked generalization, also known as stacking ensemble learning, to predict demand. Finally, all the approaches are evaluated on a real-world data set obtained from the e-commerce company. The experimental results show that some of the machine learning methods produce results almost as good as those of the stacked generalization method.
Baker, Antoine, Biazzo, Indaco, Braunstein, Alfredo, Catania, Giovanni, Dall'Asta, Luca, Ingrosso, Alessandro, Krzakala, Florent, Mazza, Fabio, Mézard, Marc, Muntoni, Anna Paola, Refinetti, Maria, Mannelli, Stefano Sarao, Zdeborová, Lenka
Contact tracing is an essential tool for mitigating the impact of a pandemic such as COVID-19. Digital devices can play an important role in achieving efficient and scalable contact tracing in real time. While a lot of attention has been paid to analyzing the privacy and ethical risks of the associated mobile applications, so far much less research has been devoted to optimizing their performance and assessing their impact on the mitigation of the epidemic. We develop Bayesian inference methods to estimate the risk that an individual is infected. This inference is based on the list of the individual's recent contacts and their own risk levels, as well as personal information such as results of tests or presence of syndromes. We propose to use probabilistic risk estimation in order to optimize testing and quarantining strategies for the control of an epidemic. Our results show that in some range of epidemic spreading (typically when the manual tracing of all contacts of infected people becomes practically impossible, but before the fraction of infected people reaches the scale where a lockdown becomes unavoidable), this inference of individuals at risk could be an efficient way to mitigate the epidemic. Our approaches translate into fully distributed algorithms that only require communication between individuals who have recently been in contact. Such communication may be encrypted and anonymized and is thus compatible with privacy-preserving standards. We conclude that probabilistic risk estimation is capable of enhancing the performance of digital contact tracing and should be considered in the mobile applications currently under development. Identifying, calling, testing, and if needed quarantining the recent contacts of an individual who has just tested positive is the standard route for limiting the transmission of a highly contagious virus.
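The kind of probabilistic risk estimation described above can be sketched in a deliberately simplified form: a prior infection risk is computed from the risk levels of recent contacts, and Bayes' rule then updates it with a test result. This is not the authors' distributed inference algorithm. The independence assumption across contacts, the transmission probability, the test sensitivity and specificity, and both function names are hypothetical choices made for illustration.

```python
def prior_risk_from_contacts(contact_risks, transmission_prob):
    """Probability of having been infected by at least one recent contact.

    Assumes contacts act independently: contact j is infectious with
    probability contact_risks[j] and, if so, transmits with `transmission_prob`.
    """
    escape = 1.0
    for risk in contact_risks:
        escape *= 1.0 - transmission_prob * risk
    return 1.0 - escape


def update_with_test(prior, positive, sensitivity=0.8, specificity=0.98):
    """Bayes' rule for the infection probability after one test result.

    The default sensitivity and specificity are illustrative values only.
    """
    if positive:
        like_infected, like_healthy = sensitivity, 1.0 - specificity
    else:
        like_infected, like_healthy = 1.0 - sensitivity, specificity
    evidence = like_infected * prior + like_healthy * (1.0 - prior)
    return like_infected * prior / evidence


# Three recent contacts with different risk levels (hypothetical numbers).
prior = prior_risk_from_contacts([0.9, 0.1, 0.05], transmission_prob=0.2)
after_negative = update_with_test(prior, positive=False)
after_positive = update_with_test(prior, positive=True)
```

Ranking individuals by such a risk score, rather than treating all contacts of a confirmed case alike, is the sort of decision rule that could guide who is tested or quarantined first.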