Education
Efficient Loss-Based Decoding On Graphs For Extreme Classification
Evron, Itay, Moroshko, Edward, Crammer, Koby
In extreme classification problems, learning algorithms are required to map instances to labels from an extremely large label set. We build on a recent extreme classification framework with logarithmic time and space, and on a general approach for error-correcting output coding (ECOC), and introduce a flexible and efficient approach accompanied by bounds. Our framework employs output codes induced by graphs, and offers a tradeoff between accuracy and model size. We show how to find the sweet spot of this tradeoff using only the training data. Our experimental study demonstrates the validity of our assumptions and claims, and shows the superiority of our method compared with state-of-the-art algorithms.
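Loss-based decoding, the general ECOC approach the abstract builds on, assigns an instance to the label whose codeword incurs the smallest total loss against the binary learners' real-valued outputs. A minimal sketch of that rule follows, assuming an explicit ±1 code matrix, the exponential loss, and made-up learner scores (all hypothetical). It enumerates every codeword, so it illustrates only the decoding rule, not the paper's graph-induced codes, whose purpose is to avoid a search that is linear in the number of labels.

```python
import numpy as np

def exp_loss(margins):
    # Exponential loss L(z) = exp(-z), applied elementwise to margins z = M[k, l] * f_l(x)
    return np.exp(-margins)

def loss_based_decode(code_matrix, scores, loss=exp_loss):
    """Return (k, losses) where k minimizes sum_l loss(M[k, l] * f_l(x)).

    code_matrix: (K, L) array with entries in {-1, +1}, one codeword per label.
    scores: (L,) array of real-valued outputs f_l(x) of the L binary learners.
    """
    margins = code_matrix * scores          # broadcasts the scores across all K rows
    total_loss = loss(margins).sum(axis=1)  # one total loss per label
    return int(np.argmin(total_loss)), total_loss

# Hypothetical 4-label code with 3 binary learners (rows are codewords).
M = np.array([
    [+1, +1, +1],
    [+1, -1, -1],
    [-1, +1, -1],
    [-1, -1, +1],
])
f_x = np.array([0.9, -1.2, -0.4])  # made-up learner outputs for one instance
label, losses = loss_based_decode(M, f_x)
print(label, losses)
```

Swapping in another margin loss, such as the hinge loss `lambda z: np.maximum(0.0, 1.0 - z)`, only changes the `loss` argument. Here label 1 wins because its codeword agrees in sign with all three scores.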
Automated versus do-it-yourself methods for causal inference: Lessons learned from a data analysis competition
Dorie, Vincent, Hill, Jennifer, Shalit, Uri, Scott, Marc, Cervone, Dan
Statisticians have made great progress in creating methods that reduce our reliance on parametric assumptions. However, this explosion in research has resulted in a breadth of inferential strategies that both create opportunities for more reliable inference and complicate the choices that an applied researcher has to make and defend. Relatedly, researchers advocating for new methods typically compare their method to at best two or three other causal inference strategies and test using simulations that may or may not be designed to equally tease out flaws in all the competing methods. The causal inference data analysis challenge, "Is Your SATT Where It's At?", launched as part of the 2016 Atlantic Causal Inference Conference, sought to make progress with respect to both of these issues. The researchers creating the data testing grounds were distinct from the researchers submitting methods whose efficacy would be evaluated. Results from 30 competitors across the two versions of the competition (black-box algorithms and do-it-yourself analyses) are presented along with post-hoc analyses that reveal information about the characteristics of causal inference strategies and settings that affect performance. The most consistent conclusion was that methods that flexibly model the response surface perform better overall than methods that fail to do so. Finally, new methods are proposed that combine features of several of the top-performing submitted methods.
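SATT, the estimand in the challenge's title, is the sample average treatment effect on the treated: the mean of Y(1) - Y(0) over the treated units. One way to read "flexibly model the response surface" is sketched below: fit an outcome model on the control units, use it to impute each treated unit's untreated outcome, and average the differences. This is a toy, noise-free simulation on assumed data, and the polynomial degree is a hypothetical stand-in for model flexibility; it is not any competitor's method.

```python
import numpy as np

def satt_response_surface(x, y, z, degree):
    """Estimate SATT by fitting E[Y | X, Z=0] on controls and imputing Y(0) for treated units.

    x: (n,) covariate, y: (n,) observed outcomes, z: (n,) boolean treatment indicator.
    degree: polynomial degree of the outcome model (1 = linear, higher = more flexible).
    """
    controls, treated = ~z, z
    # Design matrix with columns 1, x, ..., x^degree
    X_c = np.vander(x[controls], degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X_c, y[controls], rcond=None)
    # Impute each treated unit's untreated outcome from the fitted control surface
    y0_hat = np.vander(x[treated], degree + 1, increasing=True) @ coef
    return float(np.mean(y[treated] - y0_hat))

# Hypothetical simulation: treatment is concentrated at large x, where Y(0) is nonlinear.
n = 401
x = np.linspace(-2.0, 2.0, n)
z = (x > 0.5) & (np.arange(n) % 2 == 1)  # treated: odd-indexed units with x > 0.5
y0 = x**3 - x                            # untreated response surface (cubic)
tau = 2.0                                # constant effect, so the true SATT is 2.0
y = np.where(z, y0 + tau, y0)

satt_linear = satt_response_surface(x, y, z, degree=1)
satt_cubic = satt_response_surface(x, y, z, degree=3)
print(f"linear: {satt_linear:.3f}  cubic: {satt_cubic:.3f}  truth: {tau}")
```

Because the simulated untreated surface is exactly cubic, the degree-3 fit recovers the SATT almost exactly, while the misspecified linear fit is biased in the region where the treated units sit. With real, noisy data the true surface is unknown, so this only illustrates the direction of the competition's finding.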
Generalization Properties of Doubly Stochastic Learning Algorithms
Lin, Junhong, Rosasco, Lorenzo
Doubly stochastic learning algorithms are scalable kernel methods that perform very well in practice. However, their generalization properties are not well understood and their analysis is challenging since the corresponding learning sequence may not be in the hypothesis space induced by the kernel. In this paper, we provide an in-depth theoretical analysis for different variants of doubly stochastic learning algorithms within the setting of nonparametric regression in a reproducing kernel Hilbert space and considering the square loss. In particular, we derive convergence results on the generalization error for the studied algorithms either with or without an explicit penalty term. To the best of our knowledge, the derived results for the unregularized variants are the first of this kind, while the results for the regularized variants improve those in the literature. The novelties in our proof are a sample error bound that requires controlling the trace norm of a cumulative operator, and a refined analysis for bounding the initial error.
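The two sources of randomness in a doubly stochastic algorithm are a sampled training point and a sampled random feature of the kernel at each step. A minimal sketch for the square loss follows, assuming the random-Fourier-feature formulation of Dai et al. (2014) for a Gaussian kernel and a user-supplied step-size schedule; `lam = 0` corresponds to the unregularized variant and `lam > 0` to an explicit penalty. The function and parameter names are hypothetical, and this is not necessarily the exact variant analyzed in the paper.

```python
import numpy as np

def doubly_stochastic_regression(X, y, n_iters, step, lam=0.0, sigma=1.0, seed=0):
    """Doubly stochastic functional gradient descent for kernel least squares.

    Each iteration samples one training point AND one random Fourier feature of the
    Gaussian kernel exp(-||x - x'||^2 / (2 sigma^2)). lam=0 gives the unregularized variant.
    Returns (omegas, phases, alphas) defining f(x) = sum_i alphas[i] * phi_i(x).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Second source of randomness: one random feature per iteration, drawn up front here
    omegas = rng.normal(scale=1.0 / sigma, size=(n_iters, d))  # feature frequencies
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_iters)       # feature offsets
    alphas = np.zeros(n_iters)
    for t in range(n_iters):
        i = rng.integers(n)  # first source of randomness: a training point
        # Evaluate the current iterate f_t at x_i using the t features added so far
        feats = np.sqrt(2.0) * np.cos(omegas[:t] @ X[i] + phases[:t])
        f_xi = feats @ alphas[:t]
        alphas[:t] *= 1.0 - step(t) * lam  # shrinkage from the explicit penalty term
        # Square-loss gradient (f_t(x_i) - y_i) times the new feature evaluated at x_i
        phi_t = np.sqrt(2.0) * np.cos(omegas[t] @ X[i] + phases[t])
        alphas[t] = -step(t) * (f_xi - y[i]) * phi_t
    return omegas, phases, alphas

def predict(model, X):
    """Evaluate f(x) = sum_i alphas[i] * sqrt(2) * cos(omega_i . x + b_i) for each row of X."""
    omegas, phases, alphas = model
    return np.sqrt(2.0) * np.cos(X @ omegas.T + phases) @ alphas

# Hypothetical 1-D regression problem
data_rng = np.random.default_rng(1)
X = data_rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0])
model = doubly_stochastic_regression(X, y, n_iters=3000, step=lambda t: 0.3 / np.sqrt(t + 1))
mse = float(np.mean((predict(model, X) - y) ** 2))
print(mse)
```

Storing every feature costs O(t) memory and O(t) time per step; implementations often regenerate each feature from its random seed instead of storing it.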
Penalizing Unfairness in Binary Classification
Bechavod, Yahav, Ligett, Katrina
We present a new approach for mitigating unfairness in learned classifiers. In particular, we focus on binary classification tasks over individuals from two populations, where, as our criterion for fairness, we wish to achieve similar false positive rates in both populations, and similar false negative rates in both populations. As a proof of concept, we implement our approach and empirically evaluate its ability to achieve both fairness and accuracy, using datasets from the fields of criminal risk assessment, credit, lending, and college admissions.
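The fairness criterion above compares false positive and false negative rates across the two populations. The sketch below computes those per-group rates and their total gap and, as one hypothetical way to penalize unfairness during training, adds a differentiable "soft" version of the gap (expected rates computed from predicted probabilities) to the log loss with weight `lam`. The function names, the example arrays, and this particular relaxation are assumptions, not the paper's exact formulation.

```python
import numpy as np

def group_error_rates(y_true, y_pred, group):
    """Return {g: (FPR, FNR)} for binary labels/predictions in {0, 1}.

    Each group needs at least one true negative and one true positive.
    """
    rates = {}
    for g in np.unique(group):
        yt, yp = y_true[group == g], y_pred[group == g]
        fpr = np.mean(yp[yt == 0] == 1)  # false positives among true negatives
        fnr = np.mean(yp[yt == 1] == 0)  # false negatives among true positives
        rates[g.item()] = (float(fpr), float(fnr))
    return rates

def fairness_gap(y_true, y_pred, group):
    """|FPR_0 - FPR_1| + |FNR_0 - FNR_1| for two populations labelled 0 and 1."""
    r = group_error_rates(y_true, y_pred, group)
    return abs(r[0][0] - r[1][0]) + abs(r[0][1] - r[1][1])

def soft_penalized_log_loss(p, y_true, group, lam):
    """Log loss plus lam * (|soft FPR gap| + |soft FNR gap|), with p = P(y=1 | x)."""
    eps = 1e-12
    log_loss = -np.mean(y_true * np.log(p + eps) + (1 - y_true) * np.log(1 - p + eps))
    soft = {}
    for g in (0, 1):
        m = group == g
        soft[g] = (np.mean(p[m & (y_true == 0)]),      # expected FPR in group g
                   np.mean(1 - p[m & (y_true == 1)]))  # expected FNR in group g
    gap = abs(soft[0][0] - soft[1][0]) + abs(soft[0][1] - soft[1][1])
    return float(log_loss + lam * gap)

# Hypothetical labels, hard predictions, and population membership
y_true = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(group_error_rates(y_true, y_pred, group))
print(fairness_gap(y_true, y_pred, group))
```

In this toy example the two groups have equal overall accuracy, yet one group's errors are all false positives and the other's all false negatives, which is exactly the disparity the criterion is designed to penalize.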
Google Teaching Machine Learning and AI For FREE - Techzim
Google is now offering an introductory course on Artificial Intelligence (AI) and Machine Learning (ML) for free on its new Learn With AI site. Google hopes the site will be a hub of information for AI and ML, and it is intended to be a resource for everyone from beginners to advanced researchers. Google says the site "will be a place where one can learn about core ML concepts, develop and hone your Machine learning skills, and apply ML to real-world problems." Google recognizes that many people are not familiar with these concepts, including those in fields such as app development where AI and ML matter. The site contains a free crash course that was initially designed for Google employees.
The great rush to data sciences in India - FactorDaily
It's 9 am on a February morning and the mercury is just inching past 20 degrees Celsius in Bengaluru. The workday is already two hours old in the metropolis's densely laid-out eastern suburb of Marathahalli. A student batch of both unemployed and working software professionals at Robotek Minds, a tech training institute, has just finished its data science class. Data science is the new buzzword in the tech industry, and the code jocks in the Marathahalli class have a singular focus: a job or a leg-up at one of the shiny information technology campuses dotting the city and housing the world's leading tech corporations. That job, they hope, will be a passport to a comfortable salary that will grow in long strides in the years ahead as the use of data in the world economy explodes.
How to Be Friends with Statistics and Machine Learning - Vinod Sharma's Blog
Do statistics and ML walk together like partners? Is machine learning just a polished and shined version of statistics? There are many questions like these. At least in my mind, these questions still come up even today, which is why I struggle with the two, keeping them in different areas of my mind. Is machine learning a computerised or glorified version of statistics? As a matter of fact, "NO" (in my personal opinion). To my understanding, they complement each other and work like partners.
Teaching computers to guide science: Machine learning method sees forests and trees: 'Iterative Random Forests' will deliver powerful scientific insights, researchers say
In a paper published recently in the Proceedings of the National Academy of Sciences (PNAS), the researchers describe a technique called "iterative Random Forests," which they say could have a transformative effect on any area of science or engineering with complex systems, including biology, precision medicine, materials science, environmental science, and manufacturing, to name a few. "Take a human cell, for example. There are 10^170 possible molecular interactions in a single cell. That creates considerable computing challenges in searching for relationships," said Ben Brown, head of Berkeley Lab's Molecular Ecosystems Biology Department. "Our method enables the identification of interactions of high order at the same computational cost as main effects -- even when those interactions are local with weak marginal effects."
Deploying AI to production: 12 tips from the trenches - SC5
I'm Max, and I work on applied AI here at SC5. As a consultancy, we at SC5 are expected to provide our clients with services that are not only well-designed and functional, but also capable of scaling and withstanding production load. An application isn't much good unless it works in the real world. Machine learning is, in many ways, a completely different beast from "traditional" software engineering. Machine learning solutions also need to be deployed to production to be of any use, and with that comes a special set of considerations.
Top Chatbot Business Use Cases You Might Not Know (Part 1) – Chatbot Pack
Although the term "chatbot" is still a buzzword to many people, this revolutionary technology is already considered "the next big thing" because of the massive benefits it can bring to companies across various industries. In this article, we will discuss several chatbot use cases to support our claim that businesses should embrace chatbots. According to research conducted by Chatbots Journal, a leading chatbot community, e-commerce will benefit more from chatbots than any other industry. Successful commerce relies heavily on B2C (business-to-consumer) interaction. Chatbots enrich the relationship between online shops and customers with "conversational commerce", making B2C interaction quicker and friendlier than a mere business transaction.