It is somewhat surprising that among all the high-flying buzzwords of machine learning, we don't hear much about the one phrase that fuses some of the core concepts of statistical learning, information theory, and natural philosophy into a single three-word combination. Moreover, it is not just an obscure and pedantic phrase meant for machine learning (ML) Ph.D.s and theoreticians. It has a precise and easily accessible meaning for anyone interested in exploring it, and a practical payoff for practitioners of ML and data science. I am talking about Minimum Description Length. Let's peel the layers off and see how useful it is… We start (not chronologically) with Reverend Thomas Bayes, who, by the way, never published his idea about how to do statistical inference but was later immortalized by the eponymous theorem.

Tenenbaum, Joshua B., Griffiths, Thomas L.

People routinely make sophisticated causal inferences unconsciously, effortlessly, and from very little data, often from just one or a few observations. We argue that these inferences can be explained as Bayesian computations over a hypothesis space of causal graphical models, shaped by strong top-down prior knowledge in the form of intuitive theories.

Bayesian probability theory is one of the most successful frameworks to model reasoning under uncertainty. Its defining property is the interpretation of probabilities as degrees of belief in propositions about the state of the world relative to an inquiring subject. This essay examines the notion of subjectivity by drawing parallels between Lacanian theory and Bayesian probability theory, and concludes that the latter must be enriched with causal interventions to model agency. The central contribution of this work is an abstract model of the subject that accommodates causal interventions in a measure-theoretic formalisation. This formalisation is obtained through a game-theoretic Ansatz based on modelling the inside and outside of the subject as an extensive-form game with imperfect information between two players. Finally, I illustrate the expressiveness of this model with an example of causal induction.

In this paper, we extend the multivalued mapping in the DS theory to a probabilistic one that uses conditional probabilities to express the uncertain associations. In addition, Dempster's rule is used to combine belief updates rather than absolute beliefs to obtain results consistent with Bayes' theorem. The combined belief intervals form probability bounds under two conditional independence assumptions. Our model can be applied to expert systems that contain sets of mutually exclusive and exhaustive hypotheses, which may or may not form hierarchies.

I. INTRODUCTION

Evidence in an expert system is sometimes associated with a group of mutually exclusive hypotheses but says nothing about its constituents. For example, a symptom in CADIAG-2/RHEUMA (Adlassnig, 1985a; Adlassnig, 1985b) may be supportive evidence for rheumatoid arthritis, which consists of two mutually exclusive subclasses: seropositive rheumatoid arthritis and seronegative rheumatoid arthritis. The symptom, however, carries no information for differentiating between the two subclasses. Therefore, the representation of ignorance is important for the aggregation of evidence bearing on hypothesis groups.
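To make the rheumatoid arthritis example concrete, the following is a minimal sketch of how standard Dempster-Shafer theory represents evidence that bears on a hypothesis group without distinguishing its members. It implements the classical Dempster's rule on absolute mass functions, not the probabilistic extension on belief updates proposed in the paper. The frame of discernment, the `other` hypothesis, the second piece of evidence (`lab`), and all numeric masses are illustrative assumptions, not values from the source.

```python
from itertools import product

# Hypothetical frame of discernment (assumption for illustration).
SEROPOS = "seropositive_RA"
SERONEG = "seronegative_RA"
OTHER = "other"
FRAME = frozenset({SEROPOS, SERONEG, OTHER})
RA = frozenset({SEROPOS, SERONEG})  # the hypothesis group


def combine(m1, m2):
    """Classical Dempster's rule for two mass functions keyed by frozenset."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("evidence is totally conflicting")
    # Renormalise by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass not contradicting the hypothesis."""
    return sum(w for s, w in m.items() if s & hypothesis)


# A symptom supports the RA group as a whole (assumed mass 0.7);
# the remainder stays on the full frame to represent ignorance.
symptom = {RA: 0.7, FRAME: 0.3}

# A second, hypothetical finding that points at one subclass.
lab = {frozenset({SEROPOS}): 0.6, FRAME: 0.4}

combined = combine(symptom, lab)

for name, m in [("symptom only", symptom), ("symptom + lab", combined)]:
    for label, h in [("RA group", RA), ("seropositive", frozenset({SEROPOS}))]:
        print(f"{name:14s} {label:13s} "
              f"[{belief(m, h):.2f}, {plausibility(m, h):.2f}]")
```

With the symptom alone, the group receives positive belief while each subclass has belief zero and a wide belief interval, which is exactly the ignorance the passage describes. Only the second finding narrows the interval for the seropositive subclass.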