Data Science


Big Data Science with the BD2K-LINCS Data Coordination and Integration Center Coursera

@machinelearnbot

About this course: The Library of Integrative Network-based Cellular Signatures (LINCS) is an NIH Common Fund program. The idea is to perturb different types of human cells with many different types of perturbations such as: drugs and other small molecules; genetic manipulations such as knockdown or overexpression of single genes; manipulation of the extracellular microenvironment conditions, for example, growing cells on different surfaces, and more. Most importantly, the course covers computational methods including: data clustering, gene-set enrichment analysis, interactive data visualization, and supervised learning. Finally, we introduce crowdsourcing/citizen-science projects where students can work together in teams to extract expression signatures from public databases and then query such collections of signatures against LINCS data for predicting small molecules as potential therapeutics.


AI washing muddies the artificial intelligence products market

@machinelearnbot

More than 1,000 vendors with applications and platforms describe themselves as artificial intelligence products vendors, or say they employ AI in their products, according to research firm Gartner. When a technology is labelled AI, the vendor must provide information that makes it clear how AI is used as a differentiator and what problems it solves that can't be solved by other technologies, explained Jim Hare, a research VP at Gartner who focuses on analytics and data science. Companies that want to answer a specific question or solve a specific problem should use business analytics tools. Over 50% of respondents to Gartner's 2017 AI development strategies survey said the lack of necessary staff skills was the top challenge to AI adoption.


Understanding overfitting: an inaccurate meme in supervised learning

#artificialintelligence

A kind of urban legend or meme seems to be circulating in data science and allied fields, with the following statement: applying cross-validation prevents overfitting, and good out-of-sample performance (low generalisation error on unseen data) indicates that a model is not overfit. Aim: In this post, we give an intuition for why model validation, as an approximation of the generalization error of a model fit, and detection of overfitting cannot both be resolved on a single model. Let's use the following functional form, from the classic text of Bishop, but with added Gaussian noise: $$ f(x) = \sin(2\pi x) + \mathcal{N}(0, 0.1). $$ We generate a large enough set, 100 points, to avoid the sample size issue discussed in Bishop's book; see Figure 2. Overtraining is not overfitting: Overtraining means that a model's performance degrades while learning model parameters against an objective variable that affects how the model is built; for example, an objective variable can be the training data size or the iteration cycle in a neural network.
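
The setup described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the post's own code: the function names, the 70/30 hold-out split, the polynomial model family, the uniform sampling of x on [0, 1], and the reading of 0.1 as the noise standard deviation are all assumptions. It shows that a single model's hold-out error is only a number; judging overfitting requires comparing several models of different complexity.

```python
import numpy as np


def make_data(n=100, noise_scale=0.1, seed=0):
    """Draw n points from sin(2*pi*x) plus Gaussian noise.

    Assumption: 0.1 is treated as the standard deviation of the noise,
    and x is sampled uniformly on [0, 1].
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_scale, size=n)
    return x, y


def holdout_rmse(x, y, degree, n_train=70):
    """Fit a polynomial on the first n_train points, score on the rest."""
    coeffs = np.polyfit(x[:n_train], y[:n_train], degree)
    pred = np.polyval(coeffs, x[n_train:])
    return float(np.sqrt(np.mean((pred - y[n_train:]) ** 2)))


x, y = make_data()

# Hold-out error for several model complexities (hypothetical choice of degrees).
errors = {degree: holdout_rmse(x, y, degree) for degree in (1, 3, 9)}

for degree, rmse in errors.items():
    print(f"degree={degree}: hold-out RMSE={rmse:.3f}")
```

Looking at one entry of `errors` in isolation estimates that model's generalization error, but it says nothing about whether a simpler or more complex model would have done as well; only the comparison across degrees speaks to overfitting.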


The R Programming Environment Coursera

@machinelearnbot

About this course: This course provides a rigorous introduction to the R programming language, with a particular focus on using R for software development in a data science setting. Whether you are part of a data science team or working individually within a community of developers, this course will give you the knowledge of R needed to make useful contributions in those settings. We cover basic R concepts and language fundamentals, key concepts like tidy data and related "tidyverse" tools, processing and manipulation of complex and large datasets, handling textual data, and basic data science tasks. Upon completing this course, learners will have fluency at the R console and will be able to create tidy datasets from a wide range of possible data sources.


Top 10 Essential Books for the Data Enthusiast

@machinelearnbot

The true data enthusiast has a lot to read about: big data, machine learning, data science, data mining, etc. There are a lot of lists available of the top books in particular categories related to data. In fact, KDnuggets has previously, and rather recently, put together such lists on data mining, databases & big data, statistics, AI & machine learning, and neural networks. This inclusive list of essential books for the data enthusiast (or practitioner) recommends a top paid and free resource in each of 10 categories.


Amazon Macie automates cloud data protection with machine learning

#artificialintelligence

This year, it's Amazon Macie, a security service designed to automatically discover and protect sensitive data stored in AWS. As organizations move more of their data to Amazon's various cloud offerings, security teams have the unenviable task of continuously tracking the data to identify, classify and protect sensitive pieces of information such as personally identifiable information (PII), personal health information (PHI), regulatory documents, API keys, secret key material and intellectual property. Because Amazon Macie recognizes PII, organizations can use the Macie dashboard to show compliance with GDPR requirements around encryption and pseudonymization of data. In contrast, Microsoft has integrated management tools in its Azure platform and Google offers many security features by default in Google Cloud Platform.


IBM's Watson is Becoming a Crime Fighter - Learn How It is Helping the Financial Industry

#artificialintelligence

IBM's newest cognitive computing offering is Financial Crimes Insight with Watson, which is designed to help banks spot financial crimes such as money laundering. The mission of this latest incarnation of Watson, the brainchild of the company's newly formed Watson Financial Services division, is to "[help] organizations efficiently manage financial investigation efforts through streamlined research and analysis of unstructured and structured data." This new suite of Watson products is aimed at helping financial institutions manage their regulatory and fiduciary obligations. For example, in addition to the Financial Crimes Insight with Watson product, IBM is also offering Watson Regulatory Compliance, which focuses on assisting financial institutions in understanding and addressing constantly changing regulatory requirements.


What Types of Questions Can Data Science Answer?

@machinelearnbot

As you may have gathered, the families of two-class classification, multi-class classification, anomaly detection, and regression are all closely related. Entirely different sets of data science questions belong in the extended algorithm families of unsupervised and reinforcement learning. Another family of unsupervised learning algorithms is called dimensionality reduction techniques. These are called reinforcement learning (RL) algorithms.


Education in a Digital Age – Hacker Noon

#artificialintelligence

For a start, everyone is going to need a much better theoretical understanding of the technologies surrounding computers, communication networks, artificial intelligence and big data. Dynamic analysis of complex situations and the ability to communicate solutions, in presentations or in video form, will be key. The ability to work in a team, constantly adapting to new situations and working patterns, becomes crucial. Partly, this reflects my own preference for biology metaphors for understanding recent changes in the business world.


The use of AI in politics is not going away anytime soon

#artificialintelligence

The next level will be using artificial intelligence in election campaigns and political life. This highly sophisticated micro-targeting operation relied on big data and machine learning to influence people's emotions. Typically disguised as ordinary human accounts, bots spread misinformation and contribute to an acrimonious political climate on sites like Twitter and Facebook. For example, if a person is interested in environmental policy, an AI targeting tool could be used to help them find out what each party has to say about the environment.