Information Technology: Overviews


Beginners Guide to Naive Bayes Algorithm in Python

#artificialintelligence

Naive Bayes is a classification algorithm based on Bayes' theorem. Before explaining Naive Bayes, we should first discuss Bayes' theorem, which is used to find the probability of a hypothesis given some evidence. Using Bayes' theorem, we can find the probability of A given that B has occurred, where A is the hypothesis and B is the evidence.
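As a minimal sketch of the theorem described above, P(A | B) = P(B | A) P(A) / P(B), the following Python snippet computes the posterior for a single binary hypothesis. The function name `posterior` and the spam-filter numbers are hypothetical, chosen only for illustration; they do not come from the article.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A | B) for a binary hypothesis A and evidence B."""
    # P(B) by the law of total probability over A and not-A.
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1.0 - prior_a))
    # Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).
    return likelihood_b_given_a * prior_a / evidence


# Hypothetical numbers: 20% of emails are spam (A); the word "offer" (B)
# appears in 60% of spam emails and in 5% of non-spam emails.
p_spam_given_offer = posterior(0.2, 0.6, 0.05)
print(round(p_spam_given_offer, 2))  # prints 0.75
```

Naive Bayes applies the same rule to several pieces of evidence at once, under the simplifying assumption that they are conditionally independent given the class.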


Phylodynamics for cell biologists

Science

Advances in experimental approaches for single-cell analysis allow in situ sequencing, genomic barcoding, and mapping of cell lineages within tissues and organisms. Large amounts of data have thus accumulated and present an analytical challenge. Stadler et al. recognized the need for conceptual and computational approaches to fully exploit these technological advances for the understanding of normal and disease states. The authors review ideas taken from phylodynamics of infectious disease and show how similar tree-building techniques can be applied to monitoring changes in somatic cell lineages for applications ranging from development and differentiation to cancer biology.

Science, this issue p. eaah6266

### BACKGROUND

The birth, death, and diversification of individuals are events that drive biological processes across all scales. This is true whether the individuals in question represent nucleic acids, cells, whole organisms, populations, or species. The ancestral relationships of individuals can be visualized as branching trees or phylogenies, which are long-established representations in the fields of evolution, ecology, and epidemiology. Molecular phylogenetics is the discipline concerned with the reconstruction of such trees from gene or genome sequence data. The shape and size of such phylogenies depend on the past birth and death processes that generated them, and in phylodynamics, mathematical models are used to infer and quantify the dynamical behavior of biological populations from ancestral relationships. New technological advances in genetics and cell biology have led to a growing body of data about the molecular state and ancestry of individual cells in multicellular organisms. Ideas from phylogenetics and phylodynamics are being applied to these data to investigate many questions in tissue formation and tumorigenesis.

### ADVANCES

Trees offer a valuable framework for tracing cell division and change through time, beginning with individual ancestral stem cells or fertilized eggs and resulting in complex tissues, tumors, or whole organisms (see the figure). They also provide the basis for computational and statistical methods with which to analyze data from cell biology. Our Review explains how “tree-thinking” and phylodynamics can be beneficial to the interpretation of empirical data pertaining to the individual cells of multicellular organisms. We summarize some recent research questions in developmental and cancer biology and briefly introduce the new technologies that allow us to observe the spatiotemporal histories of cell division and change. We provide an overview of the various and sometimes confusing ways in which graphical models, based on or represented by trees, have been applied in cell biology. To provide conceptual clarity, we outline four distinct graphical representations of the history of cell division and differentiation in multicellular organisms. We highlight that cells from an organism cannot be always treated as statistically independent observations but instead are often correlated because of phylogenetic history, and we explain how this can cause difficulties when attempting to infer dynamical behavior from experimental single-cell data. We introduce simple ecological null models for cell populations and illustrate some potential pitfalls in hypothesis testing and the need for quantitative phylodynamic models that explicitly incorporate the dependencies caused by shared ancestry.

### OUTLOOK

We expect the rapid growth in the number of cell-level phylogenies to continue, a trend enhanced by ongoing technological advances in cell lineage tracing, genomic barcoding, and in situ sequencing. In particular, we anticipate the generation of exciting datasets that combine phenotypic measurements for individual cells (such as through transcriptome sequencing) with high-resolution reconstructions of the ancestry of the sampled cells. These developments will offer new ways to study developmental, oncogenic, and immunological processes but will require new and appropriate conceptual and computational tools. We discuss how models from phylogenetics and phylodynamics will benefit the interpretation of the data sets generated in the foreseeable future and will aid the development of statistical tests that exploit, and are robust to, cell shared ancestry. We hope that our discussion will initiate the integration of cell-level phylodynamic approaches into experimental and theoretical studies of development, cancer, and immunology. We sketch out some of the theoretical advances that will be required to analyze complex spatiotemporal cell dynamics and encourage explorations of these new directions. Powerful new statistical and computational tools are essential if we are to exploit fully the wealth of new experimental data being generated in cell biology.

Figure. Multicellular organisms develop from a single fertilized egg. The division, apoptosis, and differentiation of cells can be displayed in a development tree, with the fertilized egg being the root of the tree. The development of any particular tissue within an organism can be traced as a subtree of the full developmental tree. Subtrees that represent cancer tumors or B cell clones may exhibit rapid growth and genetic change. Here, we illustrate the developmental tree of a human and expand the subtree representing haematopoiesis (blood formation) in the bone marrow. Stem cells in the bone marrow differentiate, giving rise to the numerous blood cell types in humans. The structure of the tree that underlies haematopoiesis and the formation of all tissues is unclear. Phylogenetic and phylodynamic tools can help to describe and statistically explore questions about this cell differentiation process.

Multicellular organisms are composed of cells connected by ancestry and descent from progenitor cells. The dynamics of cell birth, death, and inheritance within an organism give rise to the fundamental processes of development, differentiation, and cancer. Technical advances in molecular biology now allow us to study cellular composition, ancestry, and evolution at the resolution of individual cells within an organism or tissue. Here, we take a phylogenetic and phylodynamic approach to single-cell biology. We explain how “tree thinking” is important to the interpretation of the growing body of cell-level data and how ecological null models can benefit statistical hypothesis testing. Experimental progress in cell biology should be accompanied by theoretical developments if we are to exploit fully the dynamical information in single-cell data.


Optimization: A notorious road to Structured Inefficiency and transition to Combinatorial…

#artificialintelligence

The title of this article reads like an oxymoron: it puts optimization and inefficiency in the same context. Yet it is accurate given current trends and practices in the logistics industry. In this article, we discuss how the current practice of optimization contributes significant inefficiencies to organizations, and how big firms (like Amazon, Shopify, and Uber) are taking advantage of advances in combinatorial/mathematical optimization to identify new opportunities and win the competition over thin margins by creating a little wiggle room for profit. A few months ago, I was discussing with a friend the idea of a mathematical formulation for a kitchen efficient enough to make 1,500 entirely different recipes (not just vegetables, sauces, and cheese on bread or a bun) with fewer than 100 ingredients in inventory and minimal kitchen appliances.


Model-free and Bayesian Ensembling Model-based Deep Reinforcement Learning for Particle Accelerator Control Demonstrated on the FERMI FEL

arXiv.org Artificial Intelligence

Reinforcement learning holds tremendous promise in accelerator controls. The primary goal of this paper is to show how this approach can be utilised on an operational level on accelerator physics problems. Despite the success of model-free reinforcement learning in several domains, sample efficiency is still a bottleneck, which might be addressed by model-based methods. We compare a well-suited purely model-based approach with model-free reinforcement learning, both applied to intensity optimisation on the FERMI FEL system. We find that the model-based approach demonstrates higher representational power and sample efficiency, while the asymptotic performance of the model-free method is slightly superior. The model-based algorithm is implemented in a DYNA style using an uncertainty-aware model, and the model-free algorithm is based on tailored deep Q-learning. In both cases, the algorithms were implemented in a way that increases robustness to the noise that is omnipresent in accelerator control problems. Code is released at https://github.com/MathPhysSim/FERMI_RL_Paper.


Are we Forgetting about Compositional Optimisers in Bayesian Optimisation?

arXiv.org Machine Learning

Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark. Given the generality of the acquisition function maximisation subroutine, we posit that the adoption of compositional optimisers has the potential to yield performance improvements across all domains in which Bayesian optimisation is currently being applied.
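To make the acquisition-maximisation subroutine concrete, here is a minimal Python sketch using Expected Improvement (EI), one popular acquisition function, with a plain grid search standing in for the optimiser. It does not implement the compositional forms or optimisers the abstract refers to. The surrogate `toy_posterior` and the helper `maximise_on_grid` are hypothetical stand-ins for a fitted model and a real optimiser.

```python
import math


def expected_improvement(mu, sigma, best):
    """Closed-form EI (maximisation) for a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty left: improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


def toy_posterior(x):
    """Hypothetical surrogate: posterior mean and standard deviation at x."""
    mu = -(x - 0.3) ** 2
    sigma = 0.05 + 0.5 * abs(x - 0.3)
    return mu, sigma


def maximise_on_grid(best, n=101):
    """Crude acquisition maximiser: evaluate EI on an even grid over [0, 1]."""
    grid = [i / (n - 1) for i in range(n)]
    return max(grid, key=lambda x: expected_improvement(*toy_posterior(x), best))


# Next point to evaluate, given a hypothetical best observed value of 0.0.
x_next = maximise_on_grid(best=0.0)
```

A grid only scales to very low dimensions. Because acquisition functions tend to be non-convex, practical implementations typically rely on multi-start gradient-based or evolutionary optimisers, which is the design space the study compares.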


DenseHMM: Learning Hidden Markov Models by Learning Dense Representations

arXiv.org Machine Learning

We propose DenseHMM, a modification of Hidden Markov Models (HMMs) that allows dense representations of both the hidden states and the observables to be learned. Compared to the standard HMM, transition probabilities are not atomic but composed of these representations via kernelization. Our approach enables constraint-free and gradient-based optimization. We propose two optimization schemes that make use of this: a modification of the Baum-Welch algorithm and a direct co-occurrence optimization. The latter is highly scalable and, empirically, comes without loss of performance compared to standard HMMs. We show that the non-linearity of the kernelization is crucial for the expressiveness of the representations. The properties of the DenseHMM, such as learned co-occurrences and log-likelihoods, are studied empirically on synthetic and biomedical datasets.
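The idea of composing transition probabilities from dense representations can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: it assumes each hidden state has a hypothetical "outgoing" and "incoming" embedding, and that the kernelization is a softmax over their dot products. The abstract does not spell out the exact kernel.

```python
import math


def softmax(scores):
    """Numerically stable softmax: maps real scores to a probability vector."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def transition_matrix(outgoing, incoming):
    """Row i holds P(next state = j | current state = i).

    Assumed kernel: softmax over the dot products u_i . z_j.
    """
    return [
        softmax([sum(a * b for a, b in zip(u, z)) for z in incoming])
        for u in outgoing
    ]


# Hypothetical 2-dimensional embeddings for three hidden states.
outgoing = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
incoming = [[2.0, -1.0], [0.0, 0.5], [-1.0, 1.0]]
A = transition_matrix(outgoing, incoming)
```

Because the softmax normalises each row, the embeddings can take any real values and every row of `A` is still a valid probability distribution, which illustrates why such a parameterisation permits constraint-free, gradient-based optimization.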


Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey

arXiv.org Artificial Intelligence

Building autonomous machines that can explore open-ended environments, discover possible interactions and autonomously build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autonomous and intrinsically motivated learning agents that can generate, select and learn to solve their own problems. In recent years, we have seen a convergence of developmental approaches, and developmental robotics in particular, with deep reinforcement learning (RL) methods, forming the new domain of developmental machine learning. Within this new domain, we review here a set of methods where deep RL algorithms are trained to tackle the developmental robotics problem of the autonomous acquisition of open-ended repertoires of skills. Intrinsically motivated goal-conditioned RL algorithms train agents to learn to represent, generate and pursue their own goals. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions, which results in new challenges compared to traditional RL algorithms designed to tackle pre-defined sets of goals using external reward signals. This paper proposes a typology of these methods at the intersection of deep RL and developmental approaches, surveys recent approaches and discusses future avenues.


Continual Lifelong Learning in Natural Language Processing: A Survey

arXiv.org Artificial Intelligence

Continual learning (CL) aims to enable information systems to learn from a continuous data stream across time. However, it is difficult for existing deep learning architectures to learn a new task without largely forgetting previously acquired knowledge. Furthermore, CL is particularly challenging for language learning, as natural language is ambiguous: it is discrete, compositional, and its meaning is context-dependent. In this work, we look at the problem of CL through the lens of various NLP tasks. Our survey discusses major challenges in CL and current methods applied in neural network models. We also provide a critical review of the existing CL evaluation methods and datasets in NLP.


Quality-Diversity Optimization: a novel branch of stochastic optimization

arXiv.org Machine Learning

Traditional optimization algorithms search for a single global optimum that maximizes (or minimizes) the objective function. Multimodal optimization algorithms search for the highest peaks in the search space, of which there can be more than one. Quality-Diversity algorithms are a recent addition to the evolutionary computation toolbox that do not only search for a single set of local optima, but instead try to illuminate the search space. In effect, they provide a holistic view of how high-performing solutions are distributed throughout a search space. The main differences from multimodal optimization algorithms are that (1) Quality-Diversity typically works in the behavioral space (or feature space), and not in the genotypic (or parameter) space, and (2) Quality-Diversity attempts to fill the whole behavior space, even if the niche is not a peak in the fitness landscape. In this chapter, we provide a gentle introduction to Quality-Diversity optimization and discuss the main representative algorithms and the main current topics under consideration in the community. Throughout the chapter, we also discuss several successful applications of Quality-Diversity algorithms, including deep learning, robotics, and reinforcement learning.
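The notion of filling the behavior space can be sketched with a small archive-based loop in the style of MAP-Elites, a widely used Quality-Diversity algorithm that the abstract itself does not name. The toy fitness function, the choice of the first coordinate as behavior descriptor, and all parameter values below are assumptions made for illustration.

```python
import random

N_CELLS = 10  # number of niches along the one-dimensional behavior axis


def fitness(x):
    """Toy objective: higher is better, with a single peak at the origin."""
    return -(x[0] ** 2 + x[1] ** 2)


def cell_of(x):
    """Behavior descriptor: x[0] in [-1, 1], discretised into N_CELLS niches."""
    return min(int((x[0] + 1.0) / 2.0 * N_CELLS), N_CELLS - 1)


def mutate(x, rng, step=0.1):
    """Gaussian perturbation, clipped to the search bounds [-1, 1]."""
    return [min(1.0, max(-1.0, v + rng.gauss(0.0, step))) for v in x]


def map_elites(iterations=2000, seed=0):
    rng = random.Random(seed)
    archive = {}  # niche index -> (fitness, genotype) of the best solution seen
    for i in range(iterations):
        if i < 50 or not archive:
            # Bootstrap the archive with random genotypes.
            candidate = [rng.uniform(-1.0, 1.0) for _ in range(2)]
        else:
            # Select a random elite and perturb it.
            parent = rng.choice(list(archive.values()))[1]
            candidate = mutate(parent, rng)
        cell, fit = cell_of(candidate), fitness(candidate)
        # Keep the candidate if its niche is empty or it beats the incumbent.
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, candidate)
    return archive


archive = map_elites()
```

The archive keeps one elite per niche, so niches far from the fitness peak at the origin are retained alongside the best-performing ones. This mirrors point (2) above: the goal is coverage of the behavior space, not only the peaks.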


Multilingual Evidence Retrieval and Fact Verification to Combat Global Disinformation: The Power of Polyglotism

arXiv.org Artificial Intelligence

This article investigates multilingual evidence retrieval and claim verification as a step to combat global disinformation, a first effort of this kind, to the best of our knowledge. A 400-example mixed-language English-Romanian dataset is created for cross-lingual transfer learning evaluation. We make code, datasets, and trained models available upon publication.