Rule induction methods are classified into two categories: induction of deterministic rules and induction of probabilistic ones (Michalski 1986; Pawlak 1991; Tsumoto and Tanaka 1996). While deterministic rules are supported by positive examples, probabilistic ones are supported by a large number of positive examples and a small number of negative ones. That is, both kinds of rules positively select one decision if a case satisfies their conditional parts. However, domain experts use not only positive reasoning but also negative reasoning, since a domain is not always deterministic. For example, when a patient does not have a headache, migraine should not be suspected: negative reasoning plays an important role in cutting the search space of a differential diagnosis (Tsumoto and Tanaka 1996).¹ Therefore, negative rules should be induced from databases in order to induce rules which will be easier for domain experts to

¹ The essential point is that if extracted patterns do not reflect experts' reasoning process, domain experts have difficulty interpreting them. Without interpretation by domain experts, a discovery procedure cannot proceed, which also means that interaction between human experts and computers is indispensable to computer-assisted discovery.

Betancourt, Brenda, Zanella, Giacomo, Miller, Jeffrey W., Wallach, Hanna, Zaidi, Abbas, Steorts, Rebecca C.

Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman-Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some applications, this assumption is inappropriate. For example, when performing entity resolution, the size of each cluster should be unrelated to the size of the data set, and each cluster should contain a negligible fraction of the total number of data points. These applications require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new class of models that can exhibit this property. We compare models within this class to two commonly used clustering models using four entity-resolution data sets.
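The linear-growth behaviour described above can be seen in a small simulation. The sketch below samples cluster sizes from a Chinese restaurant process, the partition distribution underlying Dirichlet process mixture models, and prints the fraction of points in the largest cluster for several data set sizes. The function names, the concentration parameter `alpha = 1.0`, and the seed are illustrative assumptions, not taken from the paper.

```python
import random


def crp_cluster_sizes(n, alpha, rng):
    """Sample cluster sizes for n points from a Chinese restaurant process.

    Point i+1 starts a new cluster with probability alpha / (i + alpha) and
    joins an existing cluster with probability proportional to its size.
    """
    sizes = []
    for i in range(n):  # i points have already been assigned
        u = rng.random() * (i + alpha)
        if u < alpha:
            sizes.append(1)  # open a new cluster
            continue
        u -= alpha
        for k, s in enumerate(sizes):
            if u < s:
                sizes[k] += 1
                break
            u -= s
        else:
            sizes[-1] += 1  # guard against floating-point round-off
    return sizes


def largest_cluster_fraction(n, alpha=1.0, seed=0):
    """Fraction of the n points that fall in the largest sampled cluster."""
    sizes = crp_cluster_sizes(n, alpha, random.Random(seed))
    return max(sizes) / n


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, largest_cluster_fraction(n))
```

Under this kind of model the largest cluster typically keeps a non-negligible share of the data as `n` grows, which is the behaviour that is unsuitable for entity resolution.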

van der Hoeven, Dirk, van Erven, Tim, Kotłowski, Wojciech

A standard introduction to online learning might place Online Gradient Descent at its center and then proceed to develop generalizations and extensions like Online Mirror Descent and second-order methods. Here we explore the alternative approach of putting exponential weights (EW) first. We show that many standard methods and their regret bounds then follow as a special case by plugging in suitable surrogate losses and playing the EW posterior mean. For instance, we easily recover Online Gradient Descent by using EW with a Gaussian prior on linearized losses, and, more generally, all instances of Online Mirror Descent based on regular Bregman divergences also correspond to EW with a prior that depends on the mirror map. Furthermore, appropriate quadratic surrogate losses naturally give rise to Online Gradient Descent for strongly convex losses and to Online Newton Step. We further interpret several recent adaptive methods (iProd, Squint, and a variation of Coin Betting for experts) as a series of closely related reductions to exp-concave surrogate losses that are then handled by Exponential Weights. Finally, a benefit of our EW interpretation is that it opens up the possibility of sampling from the EW posterior distribution instead of playing the mean. As already observed by Bubeck and Eldan, this recovers the best-known rate in Online Bandit Linear Optimization.
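The claim that a Gaussian prior on linearized losses recovers Online Gradient Descent can be checked numerically in one dimension. With prior N(w1, sigma^2) and EW learning rate eta, the EW density after t rounds is proportional to exp(-(w - w1)^2 / (2 sigma^2) - eta * sum_s g_s * w), which is again Gaussian with mean w1 - eta * sigma^2 * sum_s g_s, that is, the unconstrained gradient descent iterate with step size eta * sigma^2. The sketch below is an illustration under these assumptions: the function names, the squared-loss example, and the grid-based integration are hypothetical choices, not the paper's construction.

```python
import math


def ogd(grad, w1, step, T):
    """Unconstrained Online Gradient Descent: w_{t+1} = w_t - step * g_t."""
    ws = [w1]
    for t in range(T):
        ws.append(ws[-1] - step * grad(t, ws[-1]))
    return ws


def ew_gaussian_prior_means(grad, w1, sigma, eta, T, half_width=10.0, n=4001):
    """EW on linearized losses g_t * w with an N(w1, sigma^2) prior.

    Plays the posterior mean each round; the mean is computed by numerical
    integration on a uniform grid, not from the closed form.
    """
    grid = [w1 - half_width + 2.0 * half_width * i / (n - 1) for i in range(n)]
    log_w = [-(w - w1) ** 2 / (2.0 * sigma ** 2) for w in grid]  # log prior
    means = []
    for t in range(T + 1):
        m = max(log_w)
        weights = [math.exp(lw - m) for lw in log_w]  # stabilized weights
        z = sum(weights)
        mean = sum(w * p for w, p in zip(grid, weights)) / z
        means.append(mean)
        if t < T:
            g = grad(t, mean)  # gradient at the point actually played
            log_w = [lw - eta * g * w for lw, w in zip(log_w, grid)]
    return means


# Hypothetical example: squared losses f_t(w) = (w - y_t)^2.
targets = [1.0, -0.5, 2.0, 0.3, 1.2]


def grad(t, w):
    return 2.0 * (w - targets[t])


sigma, eta = 1.0, 0.1
ogd_iterates = ogd(grad, 0.0, eta * sigma ** 2, len(targets))
ew_means = ew_gaussian_prior_means(grad, 0.0, sigma, eta, len(targets))

if __name__ == "__main__":
    for a, b in zip(ogd_iterates, ew_means):
        print(a, b)
```

The two sequences should agree up to numerical integration error. Only the unconstrained case is covered here; constrained domains and other mirror maps would need a different prior.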

In part, the critics of AI are driven by the knowledge that 'white collar jobs' are the ones that are now under threat. Business leaders are frequently confronted by notions of job-killing automation and headlines that are variations on the theme "Robots Will Steal Our Jobs." Elon Musk, CEO of Tesla, Silicon Valley figurehead, and champion of technology-driven innovation, even goes a step further by suggesting AI is a fundamental threat to human civilisation. The robot on the assembly line is now a familiar image. AI in middle management is new.

Zanella, Giacomo, Betancourt, Brenda, Wallach, Hanna, Miller, Jeffrey, Zaidi, Abbas, Steorts, Rebecca C.

Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman-Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some applications, this assumption is inappropriate. For example, when performing entity resolution, the size of each cluster should be unrelated to the size of the data set, and each cluster should contain a negligible fraction of the total number of data points. These applications require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new class of models that can exhibit this property. We compare models within this class to two commonly used clustering models using four entity-resolution data sets.