Data Mining: Overviews


A Guide to Solving Social Problems with Machine Learning

#artificialintelligence

You sit down to watch a movie and ask Netflix for help. ("Zoolander 2?") The Netflix recommendation algorithm predicts what movie you'd like by mining data on millions of previous movie-watchers using sophisticated machine learning tools. And then the next day you go to work and every one of your agencies will make hiring decisions with little idea of which candidates would be good workers; community college students will be largely left to their own devices to decide which courses are too hard or too easy for them; and your social service system will implement a reactive rather than preventive approach to homelessness because they don't believe it's possible to forecast which families will wind up on the streets. You'd love to move your city's use of predictive analytics into the 21st century, or at least into the 20th century. You just hired a pair of 24-year-old computer programmers to run your data science team. But should they be the ones to decide which problems are amenable to these tools? Or to decide what success looks like?


Machine Learning: The New AI (The MIT Press Essential Knowledge Series): Ethem Alpaydin: 9780262529518: Amazon.com: Books

#artificialintelligence

This book is an introductory overview of Alpaydin's detailed text on ML. The text itself has gotten mostly mixed or bad reviews because much of its math and many of its algorithms are notated without detailed explanation. This volume, however, is an intro for the general reader and doesn't go into math, algorithms in detail, trees, Bayesian logic, or even pseudocode; it is more an up-to-date overview of the field as it exists at this writing. Alpaydin's expensive text, by the way, is also available in a very inexpensive Asian edition here on Amazon if you want to brave that difficult book without a lot of investment (Introduction to Machine Learning, 3rd Edition). The present volume is sort of an "ML for Dummies," only updated for the current craze with big data management. There is a lot of history and background that an experienced ML person will find too basic, but as a high school intro or an intro for the generally interested reader it is excellent.


2019 - The Year AI Will Move Into The Mainstream

#artificialintelligence

Business leaders have traditionally had a somewhat complicated relationship with technology. Many of them instinctively know that its deployment could be transformative for their business, although they lack the deep knowledge required to fully understand how and when to invest. Recent research from Fujitsu suggests that levels of uncertainty around the way businesses should plan for imminent technology-driven change are so high that business leaders around the world favour a co-ordinated, global approach led by intergovernmental bodies and governments. While I do not think these levels of uncertainty and doubt are going to disappear, I do believe that when it comes to AI, data analytics and data science, 2019 will be the year when we see a sharp increase in their use by organisations of all sizes. Central to the rise of data analytics are open source tools, which I believe are doing more to democratise the field of data science than anything else.


Understanding the Potential of Artificial Intelligence

#artificialintelligence

In 2008, Daniel Hulme started Satalia, a company that uses data science, machine learning, and optimization (making the best use of resources) to build customized platforms that solve tough logistics problems involving products, services, and people. Lately, Hulme has spent a good portion of his time explaining the ins and outs of artificial intelligence to other CEOs. He sees a big information gap at the top of most companies -- yet this is where technology investment decisions are made. Misunderstanding AI, Hulme believes, can mean both overestimating its value and underestimating its impact. Satalia's work is a leading example of what AI is currently good at. Not coincidentally, it is also the commercialization of Hulme's research at University College London (UCL), where he is the director of the business analytics master's degree program. Satalia's clients are household names in the U.K.; they include Tesco, DFS, and the British Broadcasting Corporation. The increasingly competitive market for AI expertise is both a blessing and a curse for Satalia.


An Overview of Business Problems and Data Science Solutions -- Part 2

#artificialintelligence

There is an important distinction in data mining: the difference between mining the data to find patterns and build models, and using the results of that mining. The results of data mining in turn inform the data mining process itself. The Cross-Industry Standard Process for Data Mining, known as CRISP-DM, is an open standard process model that describes common approaches used by data mining experts. It is the most widely used analytics model and breaks the process of data mining into six major phases.
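As a rough illustration of how results feed back into the process, the six CRISP-DM phases (business understanding, data understanding, data preparation, modeling, evaluation, deployment) can be sketched as a small state machine. The `next_phase` helper and its `results_acceptable` flag are hypothetical names invented for this sketch, and the single loop from evaluation back to business understanding is a simplification; CRISP-DM allows other backward moves as well.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(current, results_acceptable=True):
    """Return the phase that follows `current`.

    Hypothetical rule for illustration only: an unsatisfactory evaluation
    sends the project back to business understanding; otherwise the phases
    run in order and deployment is the final state.
    """
    if current is Phase.EVALUATION and not results_acceptable:
        return Phase.BUSINESS_UNDERSTANDING
    if current is Phase.DEPLOYMENT:
        return Phase.DEPLOYMENT
    return Phase(current.value + 1)
```

Calling `next_phase(Phase.EVALUATION, results_acceptable=False)` models the case where mining results send the team back to rethink the business question instead of moving on to deployment.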


Quantum Machine Learning: A look at myths, realities, and future projections

#artificialintelligence

Despite recent advances and press regarding the field, quantum computing is still veiled in mystery and myth, even within the field of data science and technology. Even those within the field of quantum computing and quantum machine learning are still learning the potential for progress and the stark limitations of current systems. However, quantum computing has arrived, if only in its infancy, and many major companies are pouring money into related R&D efforts. D-Wave's system has been commercially available for a couple of years already (albeit at a price tag of $10 million), and other systems have been opened for research purposes and commercial partnerships with quantum machine learning companies. Quantum computing hardware theoretically can take on several different forms, each of which is suited to a different type of machine learning problem.


DataCamp's Data Science And Machine Learning Programs: A Review

#artificialintelligence

One of my favorite places to learn data science is an under-the-radar educational website, DataCamp. DataCamp doesn't get nearly the attention that some of the larger, better-funded online coding schools get, but I often find myself on one of their tutorials whenever I'm learning something new related to statistics or machine learning. Over the past few months, I've dedicated at least a few hours a week to learning the underpinnings of automation and, where I find something interesting, to blogging about my experience. Unlike almost every other school or tutorial I've encountered, DataCamp has a delightfully distinct and powerful approach to education: every single piece of instruction is paired with a simple example and an interactive tutorial. There are no long lectures; there are no complicated diagrams.


Top September Stories: Essential Math for Data Science: Why and How; Machine Learning Cheat Sheets

#artificialintelligence

Here are the most popular posts in KDnuggets in September, based on the number of unique page views (UPV) and social share counts from Facebook, Twitter, and AddThis.
Most Shareable (Viral) Blogs: among the top blogs, here are the 5 with the highest ratio of shares to unique views, which suggests that people who read them really liked them.
You Aren't So Smart: Cognitive Biases are Making Sure of It, by Matthew Mayo
A Winning Game Plan For Building Your Data Science Team, by William Schmarzo
What on earth is data science?, by Cassie Kozyrkov
Everything You Need to Know About AutoML and Neural Architecture Search, by George Seif
The Data Science of "Someone Like You" or Sentiment Analysis of Adele's Songs, by Preetish Panda
The remaining top posts:
How many data scientists are there and is there a shortage?, by Gregory Piatetsky
Neural Networks and Deep Learning: A Textbook, by Charu Aggarwal
5 Resources to Inspire Your Next Data Science Project, by Conor Dewey
Hadoop for Beginners, by Aafreen Dabhoiwala
6 Steps To Write Any Machine Learning Algorithm From Scratch: Perceptron Case Study, by John Sullivan
Deep Learning for NLP: An Overview of Recent Trends, by Elvis Saravia (*)
Ultimate Guide to Getting Started with TensorFlow, by Brian Zhang (*)
Essential Math for Data Science: 'Why' and 'How', by Tirthajyoti Sarkar
Journey to Machine Learning - 100 Days of ML Code, by Avik Jain


Machine learning and AI – ensuring fairness in smart cities

#artificialintelligence

Digital technologies and AI offer a new wave of opportunities to turn data into actionable insights – creating a balance between social, environmental, and economic opportunities. In 2018, it's safe to say that the Internet, the World Wide Web, and the myriad of technologies derived from their development are all here to stay. With the ceaseless amalgamation of these various innovations, engineers are creating a cyber-physical world where pervasively interconnected objects, things, and processes can potentially unlock a breadth of unprecedented opportunities. However, I should point out that encapsulating the entire medley of possibilities afforded by these technologies is a considerable endeavour requiring a far longer and more comprehensive overview – perhaps in the form of a book, or three – than this article can offer in isolation. More specifically, I'll be focusing on the potential for us to optimally – and transparently – manage and operate city-wide infrastructure.


Towards Differentially Private Truth Discovery for Crowd Sensing Systems

arXiv.org Artificial Intelligence

Crowd sensing has become increasingly popular due to the ubiquitous use of mobile devices. However, the quality of such human-generated sensory data varies significantly among users. To better utilize sensory data, the problem of truth discovery, whose goal is to estimate user quality and infer reliable aggregated results through quality-aware data aggregation, has emerged as a hot topic. Although existing truth discovery approaches can provide reliable aggregated results, they fail to protect the private information of individual users. Moreover, crowd sensing systems typically involve a large number of participants, making solutions based on encryption or secure multi-party computation difficult to deploy. To address these challenges, we propose an efficient privacy-preserving truth discovery mechanism with theoretical guarantees of both utility and privacy. The key idea of the proposed mechanism is to perturb the data from each user independently and then conduct weighted aggregation over the users' perturbed data. The proposed approach is able to assign user weights based on information quality, and thus the aggregated results will not deviate much from the true results even when large noise is added. We adapt the definition of local differential privacy to this privacy-preserving task and demonstrate that the proposed mechanism can satisfy local differential privacy while preserving high aggregation accuracy. We formally quantify the trade-off between utility and privacy and further verify the claim through experiments on both synthetic data and a real-world crowd sensing system.
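The two steps named in the abstract, per-user perturbation followed by quality-aware weighted aggregation, can be illustrated with a minimal Python sketch. This is not the paper's mechanism: it swaps in plain Laplace noise for the local perturbation and a generic log-ratio weight update of the kind used in iterative truth discovery. The function names, the `value_range` bound, and the per-reading `epsilon` are all assumptions made for illustration.

```python
import math
import random


def perturb(readings, epsilon, value_range, rng):
    """Add Laplace noise to one user's readings, on the user's own device.

    Assumption: every reading lies in an interval of width `value_range`,
    so noise of scale value_range / epsilon is applied per reading.
    """
    scale = value_range / epsilon
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return [x + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
            for x in readings]


def truth_discovery(reports, iterations=10):
    """Alternate between estimating truths and re-weighting users.

    reports[u][j] is user u's (possibly perturbed) reading for object j.
    """
    n_objects = len(reports[0])
    weights = [1.0] * len(reports)
    truths = [0.0] * n_objects
    for _ in range(iterations):
        total_w = sum(weights)
        # Truth estimate: weighted mean of the reports for each object.
        truths = [sum(w * r[j] for w, r in zip(weights, reports)) / total_w
                  for j in range(n_objects)]
        # Squared distance of each user's reports from the current truths.
        dists = [sum((r[j] - truths[j]) ** 2 for j in range(n_objects)) + 1e-12
                 for r in reports]
        total_d = sum(dists)
        # Users closer to the consensus receive larger weights.
        weights = [-math.log(d / total_d) + 1e-12 for d in dists]
    return truths, weights
```

In use, each participant would call `perturb` locally, for example with `rng = random.Random()`, and send only the noisy list to the server, which then runs `truth_discovery` on the collected reports. Because noisier or less careful users end up farther from the consensus, their weights shrink, which is the intuition behind the abstract's claim that aggregation stays accurate even under large noise.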