Why AI Is Still Waiting For Its Ethics Transplant Backchannel

#artificialintelligence

But most of them are lightweight--full of platitudes about "public-private partnerships" and bromides about putting people first. They don't acknowledge the knotty nature of the social dilemmas AI creates, or how tough it will be to untangle them. It takes an unblinking look at a tech industry racing to reshape society along AI lines without any guarantee of reliable and fair results. Scott Rosenberg is an editor at Backchannel.


8 Habits of Highly Effective Data Scientists IoT For All

@machinelearnbot

I'm fortunate to have met with some of the pioneers of data science and machine learning early on in my career. Their thoughts shaped my interest in the field and their habits formed my daily routine. The most frequent question I'm asked is some form of, 'How do I build a machine learning or data science career?' It starts with forming some important habits. Here's what works for me and what I've seen build exceptional data scientists.


What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)

arXiv.org Artificial Intelligence

Context: Topic modeling finds human-readable structures in unstructured textual data. A widely used topic modeler is Latent Dirichlet Allocation (LDA). When run on different datasets, LDA suffers from "order effects", i.e., different topics are generated if the order of training data is shuffled. Such order effects introduce a systematic error for any study. This error can relate to misleading results; specifically, inaccurate topic descriptions and a reduction in the efficacy of text mining classification results. Objective: To provide a method in which distributions generated by LDA are more stable and can be used for further analysis. Method: We use LDADE, a search-based software engineering tool that tunes LDA's parameters using DE (Differential Evolution). LDADE is evaluated on data from a programmer information exchange site (Stackoverflow), title and abstract text of thousands of Software Engineering (SE) papers, and software defect reports from NASA. Results were collected across different implementations of LDA (Python+Scikit-Learn, Scala+Spark); across different platforms (Linux, Macintosh); and for different kinds of LDAs (VEM, or using Gibbs sampling). Results were scored via topic stability and text mining classification accuracy. Results: In all treatments: (i) standard LDA exhibits very large topic instability; (ii) LDADE's tunings dramatically reduce cluster instability; (iii) LDADE also leads to improved performance for supervised as well as unsupervised learning. Conclusion: Due to topic instability, using standard LDA with its "off-the-shelf" settings should now be deprecated. Also, in future, we should require SE papers that use LDA to test and (if needed) mitigate LDA topic instability. Finally, LDADE is a candidate technology for effectively and efficiently reducing that instability.
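
To make the "order effects" idea concrete, here is a minimal sketch of one way topic stability across shuffled runs could be measured. It is not the scoring used by LDADE: the Jaccard-based score, the function names (`jaccard`, `topic_stability`), and the example terms are all assumptions chosen for illustration. It needs at least two runs.

```python
from statistics import median


def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def topic_stability(runs):
    """Median best-match Jaccard score of the first run's topics in every other run.

    `runs` is a list of runs; each run is a list of topics; each topic is a
    list of its top terms. A score of 1.0 means every topic of the first run
    reappears unchanged in all other runs.
    """
    reference, others = runs[0], runs[1:]
    scores = []
    for topic in reference:
        for other in others:
            # Best overlap this topic achieves with any topic of the other run.
            scores.append(max(jaccard(topic, candidate) for candidate in other))
    return median(scores)


# Hypothetical top terms from two runs on shuffled copies of the same corpus.
stable_runs = [
    [["bug", "crash", "error"], ["test", "unit", "mock"]],
    [["test", "unit", "mock"], ["bug", "crash", "error"]],
]
unstable_runs = [
    [["bug", "crash", "error"], ["test", "unit", "mock"]],
    [["bug", "unit", "build"], ["test", "crash", "deploy"]],
]

print(topic_stability(stable_runs))
print(topic_stability(unstable_runs))
```

A parameter tuner such as differential evolution could, in principle, use a score of this kind as the objective to maximize; how LDADE actually does so is not shown here.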


Off-policy evaluation for slate recommendation

arXiv.org Artificial Intelligence

This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context---a common scenario in web search, ads, and recommendation. We build on techniques from combinatorial bandits to introduce a new practical estimator that uses logged data to estimate a policy's performance. A thorough empirical evaluation on real-world data reveals that our estimator is accurate in a variety of settings, including as a subroutine in a learning-to-rank task, where it achieves competitive performance. We derive conditions under which our estimator is unbiased---these conditions are weaker than prior heuristics for slate evaluation---and experimentally demonstrate a smaller bias than parametric approaches, even when these conditions are violated. Finally, our theory and experiments also show exponential savings in the amount of required data compared with general unbiased estimators.
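
For orientation, the following is a minimal sketch of inverse propensity scoring (IPS), a standard general-purpose unbiased estimator of the kind the abstract compares against. It is not the estimator the paper introduces. The log format, the function names, and the example data are assumptions made for illustration.

```python
def ips_estimate(logs, target_policy):
    """Inverse propensity scoring estimate of a target policy's mean reward.

    Each log entry is (context, slate, reward, propensity), where propensity
    is the probability with which the logging policy showed that slate.
    """
    total = 0.0
    for context, slate, reward, propensity in logs:
        # Only entries where the target policy would show the exact same slate count.
        if target_policy(context) == slate:
            total += reward / propensity
    return total / len(logs)


# Hypothetical logs from a logging policy that picks one of two slates uniformly.
logs = [
    ("query_a", ("doc1", "doc2"), 1.0, 0.5),
    ("query_a", ("doc2", "doc1"), 0.0, 0.5),
    ("query_a", ("doc1", "doc2"), 1.0, 0.5),
    ("query_a", ("doc2", "doc1"), 0.0, 0.5),
]


def doc1_first(context):
    """Hypothetical target policy: always rank doc1 above doc2."""
    return ("doc1", "doc2")


def doc2_first(context):
    """Hypothetical target policy: always rank doc2 above doc1."""
    return ("doc2", "doc1")


print(ips_estimate(logs, doc1_first))
print(ips_estimate(logs, doc2_first))
```

Because this estimator only uses log entries whose whole slate matches the target policy's slate, matches become rare as the number of possible slates grows. That is one way to see why general unbiased estimators need so much logged data in the slate setting.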


A Deep Reinforcement Learning Chatbot

arXiv.org Machine Learning

We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.


When to Categorize Continuous Predictor in a Regression Model?

@machinelearnbot

Researchers in many fields routinely categorize continuous predictor variables, and they tend to be the same researchers who rely mostly on ANOVA. They often do it through a median split: values above the median are coded as high and values below it as low. The way out of this dilemma is to be able to decide whether to treat an independent variable as categorical or continuous. Knowing when categorization is appropriate, and understanding how it affects the interpretation of the parameters, helps data analysts find real results they might otherwise miss. The general linear model itself does not care whether the predictor you use is continuous or categorical. But you, as a data analyst, determine what information you get from the analysis through how you code the predictor.
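
As a small illustration of how the coding of a predictor changes what the regression parameter means, the sketch below fits the same outcome twice: once on the continuous predictor and once on its median split. The data and the helper `ols_slope` are made up for this example.

```python
from statistics import mean, median


def ols_slope(x, y):
    """Slope of the least-squares line of y on a single predictor x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: the outcome rises by 2 for every unit of the predictor.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, 12]

# Continuous coding: the slope is the change in y per one-unit change in x.
continuous_slope = ols_slope(x, y)

# Median split: 1 for values above the median, 0 otherwise.
cut = median(x)
high = [1 if xi > cut else 0 for xi in x]

# With a 0/1 predictor, the slope is the difference between the two group means.
split_slope = ols_slope(high, y)

print(continuous_slope, split_slope)
```

The model is fitted the same way in both cases, but the first coefficient describes a per-unit trend while the second only compares the "high" group with the "low" group, discarding the variation within each group.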


News Article / Advertising Week - New York [ Sep 25 - 29 2017 ]

#artificialintelligence

Jordan Bitterman, CMO of IBM Watson Content & IoT Platform, explores in this seminar the power and promise of the new cognitive era and how it will enable marketers to make better decisions, with more confidence and less risk. While AI is expected to create 15 million new jobs over the next 10 years, experts also anticipate 25 million jobs will be replaced by automation in that time period. Peter Spande, CRO of Business Insider, and four other panelists take part in a thoughtful debate about the risks and rewards of the technology set to transform our lives, for better or worse. Right now, Artificial Intelligence only has the equivalent of a couple of hundred brain neurons, compared with the 100 billion in our brains. See an insightful discussion led by Zach Seward, SVP of Product and Executive Editor at Quartz, about the future of A.I. for the advertising industry.


Scaling Text with the Class Affinity Model

arXiv.org Machine Learning

Probabilistic methods for classifying text form a rich tradition in machine learning and natural language processing. For many important problems, however, class prediction is uninteresting because the class is known, and instead the focus shifts to estimating latent quantities related to the text, such as affect or ideology. We focus on one such problem of interest, estimating the ideological positions of 55 Irish legislators in the 1991 Dáil confidence vote. To solve the Dáil scaling problem and others like it, we develop a text modeling framework that allows actors to take latent positions on a "gray" spectrum between "black" and "white" polar opposites. We are able to validate results from this model by measuring the influences exhibited by individual words, and we are able to quantify the uncertainty in the scaling estimates by using a sentence-level block bootstrap. Applying our method to the Dáil debate, we are able to scale the legislators between extreme pro-government and pro-opposition in a way that reveals nuances in their speeches not captured by their votes or party affiliations.


Why open source should drive AI development in Life Sciences - OpenText Blogs

#artificialintelligence

We stand on the verge of a revolution in Life Sciences. Artificial Intelligence (AI) has the power to change everything. It can handle the vast amounts of data being created, continuously learn even as it's exposed to more data and deliver actionable insights for better decision-making. There is little doubt that the next few years are going to bring some incredible developments to the sector. How best do we get there?


Number of AI roles in Britain up 485% since 2014, Indeed reveals - Recruitment International

#artificialintelligence

The number of jobs in artificial intelligence (AI) in the UK has risen dramatically in the last three years, according to Indeed. Since 2014, the number of available AI roles in Britain has increased by 485%, representing a significant spike in demand for employees with the appropriate skills for the job. Yet Indeed's data also reveals there are more than twice as many AI jobs available as there are suitable applicants, with a ratio of 2.3 roles available per candidate searching in the last quarter. Interest in AI roles has risen more steadily, by 178% over the past three and a half years, not enough to match the fivefold surge in postings. The popularity of software in innovations including smart home devices and customer service chatbots demonstrates how the industry is developing at pace.