#artificialintelligence
AI Will Take Your Jobs
I've been spending a lot of time on Quora, where you can ask real questions of amateur experts all around the world. I'm a big believer that we are on the cusp of an age of automation. Algorithms already influence how we use the web and access content online, and that is just the beginning. I'm not an expert, but I do think about the future a lot. As an amateur futurist, it's my responsibility to point out the obvious to people who might not be thinking about technology and its impact on society 24/7 the way I do.
Course – Cognitive technologies: The real opportunities for business
Artificial intelligence (AI) may sound like science fiction, but it is real, and becoming increasingly important to companies in every sector. The field of artificial intelligence has produced a wide variety of "cognitive technologies" that simulate human reasoning and perceptual skills, giving businesses entirely new capabilities and enabling organizations to break prevailing tradeoffs between speed, cost, and quality. Aimed at a general business audience, this course demystifies artificial intelligence, provides an overview of a wide range of cognitive technologies, and offers a framework to help you understand their business implications. Some experts have called artificial intelligence "more important than anything since the industrial revolution." That makes this course essential for professionals working in business, operations, strategy, IT, and other disciplines.
MetaMind Competes with IBM Watson Analytics and Microsoft Azure Machine Learning
Last month I wrote an article describing the interfaces and capabilities of Microsoft and IBM's new cloud data science products. I observed that Azure ML presents a user-friendly drag-and-drop data mining app for businesses, while Watson Analytics focuses on natural language queries but is still too nascent for practical use. A web search for "IBM Watson Analytics" alone turns up 730,000 documents. Amid the deluge of coverage on both services, one could lose sight of the many upstart companies offering cloud machine learning services. However, new product categories are typically pioneered by startups.
Microsoft Azure Machine Learning Algorithm Cheat Sheet
Azure Machine Learning Studio comes with a large number of machine learning algorithms that you can use to build your predictive analytics solutions. These algorithms fall into the general machine learning categories of regression, classification, clustering, and anomaly detection, and each one is designed to address a different type of machine learning problem. The question is: how do you quickly figure out which machine learning algorithm to choose for your specific solution? The Microsoft Azure Machine Learning Algorithm Cheat Sheet is designed to help you sift through the available machine learning algorithms and choose the appropriate one to use for your predictive analytics solution. The cheat sheet asks you questions about both the nature of your data and the problem you're working to address, and then suggests an algorithm for you to try.
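The actual cheat sheet is a flowchart, but its first few questions boil down to a simple decision procedure. Here is a toy sketch of that logic (the function name, parameters, and the specific algorithm suggestions are my own illustrative choices, not Microsoft's):

```python
def suggest_algorithm(labeled, target_type=None, rare_positives=False):
    """Toy decision helper mirroring the cheat sheet's first questions.

    labeled:        do your training examples carry known answers?
    target_type:    "number" or "category", when labeled is True.
    rare_positives: is the event you're hunting for very rare (fraud, faults)?
    """
    if not labeled:
        # No labels: you can only group similar records together.
        return "clustering (e.g. K-means)"
    if rare_positives:
        # Too few positive examples to train a normal classifier.
        return "anomaly detection (e.g. one-class SVM)"
    if target_type == "number":
        # Predicting a continuous value.
        return "regression (e.g. linear regression)"
    # Predicting a discrete label.
    return "classification (e.g. two-class logistic regression)"

print(suggest_algorithm(labeled=True, target_type="number"))
```

The real cheat sheet goes further, differentiating within each category by training speed, accuracy, and linearity assumptions, but the data-shape questions above are the entry point.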
R Users Will Now Inevitably Become Bayesians
There are several reasons why everyone isn't using Bayesian methods for regression modeling. One reason is that Bayesian modeling requires more thought: you need pesky things like priors, and you can't assume that if a procedure runs without throwing an error the answers are valid. A second reason is that MCMC sampling -- the bedrock of practical Bayesian modeling -- can be slow compared to closed-form or MLE procedures. A third reason is that existing Bayesian solutions have either been highly specialized (and thus inflexible), or have required knowing how to use a generalized tool like BUGS, JAGS, or Stan. This third barrier has recently been broken in the R world by not one but two packages: brms and rstanarm.
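brms and rstanarm are R packages, so no R code here, but the MCMC machinery they wrap (via Stan) is language-agnostic and worth seeing in miniature. Below is a toy random-walk Metropolis sampler, in Python, for the slope of a one-parameter regression, assuming known unit noise and a Normal(0, 10) prior; everything here (function names, step size, priors) is my own illustrative choice, not what Stan actually does:

```python
import math
import random

def metropolis_slope(xs, ys, n_steps=20000, step=0.2, seed=0):
    """Random-walk Metropolis for the slope b in y = b*x + Normal(0, 1) noise,
    under a Normal(0, 10) prior on b. A toy stand-in for what MCMC engines
    like Stan do at scale, with many parameters and smarter proposals."""
    rng = random.Random(seed)

    def log_post(b):
        lp = -b * b / (2 * 10.0 ** 2)                             # log prior
        lp += sum(-(y - b * x) ** 2 / 2 for x, y in zip(xs, ys))  # log likelihood
        return lp

    b, lp_b, samples = 0.0, log_post(0.0), []
    for _ in range(n_steps):
        prop = b + rng.gauss(0, step)
        lp_prop = log_post(prop)
        # Metropolis rule: accept with probability min(1, posterior ratio).
        if math.log(rng.random()) < lp_prop - lp_b:
            b, lp_b = prop, lp_prop
        samples.append(b)
    return samples[n_steps // 2:]       # drop the first half as burn-in

# Synthetic data with a true slope of 2.
data_rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2 * x + data_rng.gauss(0, 1) for x in xs]
draws = metropolis_slope(xs, ys)
posterior_mean = sum(draws) / len(draws)
```

The "slow" complaint in the paragraph above is visible even here: every proposal costs a full pass over the data, which is exactly why specialized samplers and compiled backends matter in practice.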
Machine Learning Algorithm Identifies Tweets Sent Under the Influence of Alcohol
An interesting article was posted recently in MIT Technology Review. What kind of metrics would help detect such tweets? Which algorithm would you use? I am thinking about tweet indexing. The same NLP (natural language processing) technique could be applied to email messages and other texts produced by users, maybe even to detect whether a piece of code was written while the programmer was drunk. This would require a training set to train the algorithm; but no matter which ML algorithm you use, you will need to work with a training set anyway.
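To make the training-set point concrete, here is a minimal bag-of-words naive Bayes classifier sketch. The labels, example tweets, and function names are entirely made up for illustration; a real system would need thousands of labeled tweets and far richer features (time of day, typos, emoji) than raw word counts:

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (text, label) pairs -- the training set.
    Returns per-label word counts."""
    counts = {"drunk": Counter(), "sober": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def classify(text, counts):
    """Multinomial naive Bayes with add-one smoothing, uniform priors."""
    vocab = set()
    for c in counts.values():
        vocab.update(c)
    best, best_lp = None, -math.inf
    for label, c in counts.items():
        total = sum(c.values())
        lp = sum(math.log((c[w] + 1) / (total + len(vocab)))
                 for w in text.lower().split())
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# A (hypothetical, hand-labeled) training set.
examples = [
    ("soooo wasted lol party", "drunk"),
    ("another beer lol", "drunk"),
    ("quarterly meeting at nine", "sober"),
    ("finished the report before the deadline", "sober"),
]
counts = train(examples)
print(classify("lol party tonight", counts))
```

The same skeleton transfers directly to the email and source-code ideas above: only the training set and the tokenization change, not the algorithm.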
The machine learning problem of the next decade
A few months ago, my company, CrowdFlower, ran a machine learning competition on Kaggle. It perfectly highlighted the biggest opportunity (and challenge) with machine learning: What do you do with an 80% accurate algorithm? We uploaded data collected on our platform and Kaggle sent it out to over 1,000 data scientists, who competed to see who could build the best search model. The simplest approach gave a baseline accuracy of 32%. By the next morning, one team already had a 53% accurate model.
Lei Liu is dreaming big at HP Labs
When HP Labs research scientist Lei Liu was a child in XianYang, China, he read a newspaper article detailing how HP originated in a garage in Palo Alto. "That inspired me," he recalls. "Silicon Valley was clearly somewhere where you could have a dream, incubate it, and see it come true." Today, Lei is living that dream as a member of HP's Print and 3D Lab. After studying for his B.S. and M.S. in computer science at the Beijing University of Posts and Telecommunications, he moved to Michigan State University where he received his Ph.D. in Computer Science and Engineering, focusing on data mining and machine learning.
What are effective preprocessing methods for reducing data set size (e.g., removing records) without losing information for machine learning problems?
Sometimes the simplest methods are best... Random sampling is easy to understand, hard to screw up, and unlikely to introduce bias into your process. Building a training pipeline using a random sample (without replacement) of your dataset is a good way to work faster. Once you have a pipeline you're satisfied with, you can then run it again over your entire dataset to estimate the gain in performance from using the entire dataset. If your training pipeline is robust, your results should not change too much, and although your performance might rise, it will tend to do so very slowly as you add more data. The basic intuition here is that the strongest signals in your data will show up even with relatively small samples of the data, almost by definition (if they didn't, they wouldn't be strong!).
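The sample-then-rerun workflow above can be sketched in a few lines. The dataset, the 90%-reliable "strong signal", and the trivial scoring rule are all invented for illustration; the point is only that the signal measured on a 5% sample barely differs from the full-data figure:

```python
import random

def subsample(rows, fraction, seed=0):
    """Uniform random sample without replacement."""
    k = max(1, int(len(rows) * fraction))
    return random.Random(seed).sample(rows, k)

# Hypothetical dataset: (feature, label) pairs where the label follows
# the sign of the feature 90% of the time -- a "strong signal".
rng = random.Random(42)
data = []
for _ in range(10000):
    x = rng.gauss(0, 1)
    noisy = rng.random() >= 0.9
    data.append((x, int((x > 0) != noisy)))

def accuracy(rows):
    """Score the obvious rule: predict the label from the feature's sign."""
    return sum((x > 0) == bool(y) for x, y in rows) / len(rows)

small = subsample(data, 0.05)   # fast iteration on a 5% sample
# The strong signal shows up at both scales:
print(round(accuracy(small), 2), round(accuracy(data), 2))
```

In a real pipeline, `accuracy` would be replaced by training and evaluating your actual model, but the comparison between the sampled and full runs works the same way.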
Sour grapes at Facebook over Google's AI victory
Just a few months ago, the social network thought that its AI experts were on the cusp of a breakthrough, making a computer that could play Go faster than any previous machine. Then Google came along and blew them out of the water, revealing first that it had built a Go computer capable of defeating a professional human player, and then going on to beat Lee Sedol, the greatest player of the last decade, 4-1 over the course of a week. Facebook has already tried to steal Google's thunder once, with Mark Zuckerberg releasing a coincidentally timed statement on the company's Go progress just one day before Google announced its victory over the European champion Fan Hui (and one day after Google had already revealed to the press that the victory had occurred). Zuckerberg himself has been more conciliatory this time round, posting a message of congratulations after AlphaGo's third victory in a row: "Congrats to the Google DeepMind team on this historic milestone in AI research – a third straight victory over Go grandmaster Lee Sedol. We live in exciting times."