Here are the most popular posts on KDnuggets in September, based on the number of unique page views (UPV) and social share counts from Facebook, Twitter, and AddThis.

Most Shareable (Viral) Blogs

Among the top blogs, here are the 5 blogs with the highest ratio of shares to unique views, which suggests that people who read them really liked them.

You Aren't So Smart: Cognitive Biases are Making Sure of It, by Matthew Mayo
A Winning Game Plan For Building Your Data Science Team, by William Schmarzo
What on earth is data science?, by Cassie Kozyrkov
Everything You Need to Know About AutoML and Neural Architecture Search, by George Seif
The Data Science of "Someone Like You" or Sentiment Analysis of Adele's Songs, by Preetish Panda

How many data scientists are there and is there a shortage?, by Gregory Piatetsky
Neural Networks and Deep Learning: A Textbook, by Charu Aggarwal
5 Resources to Inspire Your Next Data Science Project, by Conor Dewey
Hadoop for Beginners, by Aafreen Dabhoiwala
6 Steps To Write Any Machine Learning Algorithm From Scratch: Perceptron Case Study, by John Sullivan
Deep Learning for NLP: An Overview of Recent Trends, by Elvis Saravia (*)
Ultimate Guide to Getting Started with TensorFlow, by Brian Zhang (*)
How many data scientists are there and is there a shortage?, by Gregory Piatetsky
Essential Math for Data Science: 'Why' and 'How', by Tirthajyoti Sarkar
Journey to Machine Learning - 100 Days of ML Code, by Avik Jain
You Aren't So Smart: Cognitive Biases are Making Sure of It, by Matthew Mayo
Neural Networks and Deep Learning: A Textbook, by Charu Aggarwal (*)
You Aren't So Smart: Cognitive Biases are Making Sure of It, by Matthew Mayo
How many data scientists are there and is there a shortage?, by Gregory Piatetsky
You Aren't So Smart: Cognitive Biases are Making Sure of It, by Matthew Mayo
A Winning Game Plan For Building Your Data Science Team, by William Schmarzo
What on earth is data science?, by Cassie Kozyrkov
Everything You Need to Know About AutoML and Neural Architecture Search, by George Seif
The Data Science of "Someone Like You" or Sentiment Analysis of Adele's Songs, by Preetish Panda
You Aren't So Smart: Cognitive Biases are Making Sure of It, by Matthew Mayo
What on earth is data science?, by Cassie Kozyrkov
Digital technologies and AI offer a new wave of opportunities to turn data into actionable insights – creating a balance between social, environmental, and economic opportunities. In 2018, it's safe to say that the Internet, the World Wide Web, and the myriad of technologies derived from their development are all here to stay. With the ceaseless amalgamation of these various innovations, engineers are creating a cyber-physical world where pervasively interconnected objects, things, and processes can potentially unlock a breadth of unprecedented opportunities. However, I should point out that encapsulating the entire medley of possibilities afforded by these technologies is a considerable endeavour requiring a far longer and more comprehensive overview – perhaps in the form of a book, or three – than this article can offer in isolation. More specifically, I'll be focusing on the potential for us to optimally – and transparently – manage and operate city-wide infrastructure.
Crowd sensing is becoming increasingly popular due to the ubiquitous use of mobile devices. However, the quality of such human-generated sensory data varies significantly among users. To better utilize sensory data, the problem of truth discovery, whose goal is to estimate user quality and infer reliable aggregated results through quality-aware data aggregation, has emerged as a hot topic. Although existing truth discovery approaches can provide reliable aggregated results, they fail to protect the private information of individual users. Moreover, crowd sensing systems typically involve a large number of participants, making solutions based on encryption or secure multi-party computation difficult to deploy. To address these challenges, in this paper, we propose an efficient privacy-preserving truth discovery mechanism with theoretical guarantees of both utility and privacy. The key idea of the proposed mechanism is to perturb the data from each user independently and then conduct weighted aggregation over the users' perturbed data. The proposed approach is able to assign user weights based on information quality, and thus the aggregated results will not deviate much from the true results even when large noise is added. We adapt the definition of local differential privacy to this privacy-preserving task and demonstrate that the proposed mechanism satisfies local differential privacy while preserving high aggregation accuracy. We formally quantify the trade-off between utility and privacy and further verify the claim by experiments on both synthetic data and a real-world crowd sensing system.
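The two-stage idea described above (local perturbation, then quality-aware weighted aggregation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the Gaussian noise, the log-ratio weight update (a CRH-style rule chosen as a stand-in), and the function names `perturb` and `truth_discovery` are hypothetical and are not taken from the paper. The sketch makes no claim about the privacy level achieved.

```python
import math
import random


def perturb(values, noise_std, rng):
    """Run on each user's side: add independent zero-mean Gaussian noise.

    The noise distribution is an assumption; the abstract does not specify it.
    """
    return [v + rng.gauss(0.0, noise_std) for v in values]


def truth_discovery(reports, iterations=10):
    """Iterative weighted aggregation over (perturbed) user reports.

    reports[u][j] is user u's reading for object j.
    Returns (estimated truths, user weights).
    The log-ratio weight update is an assumed, CRH-style rule.
    """
    n_users = len(reports)
    if n_users < 2:
        raise ValueError("need at least two users")
    n_objects = len(reports[0])
    weights = [1.0 / n_users] * n_users
    truths = [0.0] * n_objects
    for _ in range(iterations):
        # Truth update: weighted mean of all reports for each object.
        total_w = sum(weights)
        truths = [
            sum(weights[u] * reports[u][j] for u in range(n_users)) / total_w
            for j in range(n_objects)
        ]
        # Weight update: users far from the current truths get smaller weights.
        errors = [
            sum((reports[u][j] - truths[j]) ** 2 for j in range(n_objects)) + 1e-12
            for u in range(n_users)
        ]
        total_err = sum(errors)
        weights = [math.log(total_err / e) for e in errors]
    return truths, weights


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical raw readings: two careful users and one unreliable user.
    raw = [[10.1, 19.9, 30.1], [9.9, 20.1, 29.9], [15.0, 25.0, 35.0]]
    noisy = [perturb(r, noise_std=0.5, rng=rng) for r in raw]
    truths, weights = truth_discovery(noisy)
    print("estimated truths:", truths)
    print("user weights:", weights)
```

Because the weights are re-estimated from the perturbed reports themselves, a user whose reports sit far from the consensus contributes little to the final estimate, which is the intuition behind the claim that the aggregate stays close to the truth under added noise.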
GIS is a virtual world, one represented by points, lines, polygons and graphs. Processing these datasets has been a challenge ever since GIS was established as a field. Processing huge volumes of data has been a long-standing problem not only in the traditional Information Technology (IT) sector but also in the geospatial domain. However, recent developments in both hardware and software infrastructure have made it possible to process huge datasets. This has given a big push and a new direction to industries that were held back by slow data processing capabilities.
Artificial Intelligence and Machine Learning are among the hottest job areas in the industry right now. For instance, did you know that more than 50,000 positions related to Data and Analytics are currently vacant in India? We are excited to release a comprehensive report together with Great Learning on how AI, ML and Big Data are changing the world around us. The report also aims to provide an overview of the kinds of career opportunities available in these fields right now, and the different roles we might see in the future. Our aim in creating this report is to give our Data Science community context on the changes happening at a macro level, and on how they can best prepare for them.
Detecting anomalous activity in human mobility data has a number of applications, including road hazard sensing, telematics-based insurance, and fraud detection in taxi services and ride sharing. In this paper we address two challenges that arise in the study of anomalous human trajectories: 1) a lack of ground truth data on what defines an anomaly and 2) the dependence of existing methods on significant pre-processing and feature engineering. While generative adversarial networks seem like a natural fit for addressing these challenges, we find that existing GAN-based anomaly detection algorithms perform poorly due to their inability to handle multimodal patterns. To address this, we introduce an infinite Gaussian mixture model coupled with (bi-directional) generative adversarial networks, IGMM-GAN, that is able to generate synthetic, yet realistic, human mobility data and simultaneously facilitates multimodal anomaly detection. Through estimation of a generative probability density on the space of human trajectories, we are able to generate realistic synthetic datasets that can be used to benchmark existing anomaly detection methods. The estimated multimodal density also allows for a natural definition of an outlier, which we use for detecting anomalous trajectories. We illustrate our methodology and its improvement over existing GAN anomaly detection on several human mobility datasets, along with MNIST.
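To make the density-based definition of an outlier concrete, here is a minimal sketch that scores a one-dimensional feature under a fixed two-component Gaussian mixture and flags low-density points. This is not the IGMM-GAN: the mixture parameters are set by hand rather than learned, the feature (a trip duration in minutes) is hypothetical, and the threshold and function names are assumptions for illustration only.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_log_density(x, components):
    """Log density of x under a mixture.

    components is a list of (weight, mean, std) tuples whose weights sum to 1.
    A tiny constant guards against log(0) far out in the tails.
    """
    density = sum(w * gaussian_pdf(x, m, s) for w, m, s in components)
    return math.log(density + 1e-300)


def is_anomalous(x, components, threshold):
    """Flag x as an outlier when its log density falls below the threshold."""
    return mixture_log_density(x, components) < threshold


if __name__ == "__main__":
    # Hypothetical modes: short trips around 5 minutes, long trips around 30.
    components = [(0.5, 5.0, 1.0), (0.5, 30.0, 2.0)]
    for duration in (5.0, 17.0, 30.0):
        score = mixture_log_density(duration, components)
        print(duration, round(score, 2), is_anomalous(duration, components, -10.0))
```

A duration of 17 lies close to the overall mean of the two modes, so a single-mode model would treat it as typical, while the mixture assigns it a very low density. That gap is the kind of multimodal pattern the abstract refers to.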
Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure (randomized or not) on an outcome of interest. Even studies that are seemingly non-causal (e.g. prediction or prevalence estimation) have causal elements, such as differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the pattern or association observed in the data. In this work, we review the "Causal Roadmap", a formal framework to augment our traditional statistical analyses in an effort to answer the causal questions driving our research. Specific steps of the Roadmap include clearly stating the scientific question, defining the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to translate the causal parameter into a statistical estimand, implementing statistical estimators, including parametric and semi-parametric methods, and interpreting our findings. Throughout, we focus on the effect of an exposure occurring at a single time point and provide extensions to more advanced settings.
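As an illustration of the estimation step, the sketch below computes a simple plug-in (standardization, or G-computation) estimate of the statistical estimand E_W[E(Y | A=1, W) - E(Y | A=0, W)] and compares it with the unadjusted difference in means. The choice of estimator, the toy records, and the assumption that a single binary covariate W captures all confounding are ours, made only for illustration; the function names are hypothetical.

```python
from collections import defaultdict


def standardized_effect(records):
    """Plug-in estimate of E_W[E(Y|A=1,W) - E(Y|A=0,W)].

    records is a list of (w, a, y) tuples with a binary exposure a.
    Assumes w blocks all confounding and that every stratum of w
    contains both exposed and unexposed records (positivity).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    stratum_sizes = defaultdict(int)
    for w, a, y in records:
        sums[(w, a)] += y
        counts[(w, a)] += 1
        stratum_sizes[w] += 1
    n = len(records)
    effect = 0.0
    for w, n_w in stratum_sizes.items():
        if counts[(w, 0)] == 0 or counts[(w, 1)] == 0:
            raise ValueError(f"positivity violated in stratum W={w!r}")
        mean_exposed = sums[(w, 1)] / counts[(w, 1)]
        mean_unexposed = sums[(w, 0)] / counts[(w, 0)]
        # Weight each stratum-specific contrast by the stratum's share of the data.
        effect += (n_w / n) * (mean_exposed - mean_unexposed)
    return effect


def naive_difference(records):
    """Unadjusted difference in mean outcome between exposed and unexposed."""
    exposed = [y for _, a, y in records if a == 1]
    unexposed = [y for _, a, y in records if a == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


# Toy (w, a, y) records: W=1 makes both exposure and the outcome more likely.
RECORDS = (
    [(0, 0, 0)] * 3 + [(0, 0, 1)] + [(0, 1, 0), (0, 1, 1)]
    + [(1, 0, 1), (1, 0, 0)] + [(1, 1, 1)] * 3 + [(1, 1, 0)]
)

if __name__ == "__main__":
    print("adjusted:", standardized_effect(RECORDS))
    print("naive:", naive_difference(RECORDS))
```

In this toy data the unadjusted contrast is larger than the standardized one because W is associated with both exposure and outcome. Whether the adjusted number can be read causally depends entirely on the assumptions assessed in the earlier Roadmap steps.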
Sessions are subject to change.

Day & Time: Wednesday, 11:00 AM – 11:45 AM, Grand Oaks A&B
Presenter: Jim Breen, Doug Foster
Need to get your team trained on InterSystems products quickly? Attend this session to learn how you can get your employees up to speed and add value to your company – fast! Hear how other InterSystems clients have created successful teams using Learning Services content as one piece of the puzzle, and how you can too!
Takeaway: InterSystems Learning Services can help me quickly onboard new employees and grow the skill sets of existing employees.

Day & Time: Monday, 2:00 PM – 2:45 PM, Grand Oaks E&F; Tuesday, 2:00 PM – 2:45 PM, Grand Oaks C&D
Presenter: Andreas Dieckow
This session provides an overview of what it takes to move an existing Caché or Ensemble application to InterSystems IRIS Data Platform. You will learn that migration is not urgent (unless you want to take advantage of new features in InterSystems IRIS) but that it is often less complex than you might expect.
This is an eclectic collection of interesting blog posts, software announcements and data applications I've noted over the past month or so. ONNX Model Zoo is now available, providing a library of pre-trained, state-of-the-art deep learning models in the ONNX format. In the 2018 IEEE Spectrum Top Programming Language rankings, Python takes the top spot and R ranks #7. Julia 1.0 has been released, marking the stabilization of the scientific computing language and promising forward compatibility. Google announces Cloud AutoML, a beta service to train vision, text categorization, or language translation models from provided data.
The Dragonfly Machine Learning Engine (MLE) provides the machine learning and data science capabilities included within OPNids, and is available as part of it. Data science and machine learning promise to counteract the dynamic threat environment created by growing network traffic and increasing threat actor sophistication. This post provides an overview of the MLE itself, explains why data science and cybersecurity go together, and offers some insight into using the MLE as part of the OPNids system. The Dragonfly MLE provides a powerful framework for deploying anomaly detection algorithms, threat intelligence lookups, and machine learning predictions within a network security infrastructure.