Inductive Learning

Feature Engineering in SQL and Python: A Hybrid Approach - KDnuggets


I knew SQL long before learning about Pandas, and I was intrigued by the way Pandas faithfully emulates SQL. Stereotypically, SQL is for analysts, who crunch data into informative reports, whereas Python is for data scientists, who use data to build (and overfit) models. Although the two are almost functionally equivalent, I'd argue both tools are essential for a data scientist to work efficiently. From my experience with Pandas, I noticed several recurring problems, and those problems were naturally solved once I began feature engineering directly in SQL. If you know a little bit of SQL, it's time to put it to good use.
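As a minimal sketch of the idea (not from the original article), here is aggregate feature engineering done directly in SQL, using Python's built-in sqlite3 with a hypothetical transactions table:

```python
import sqlite3

# Hypothetical per-user transaction data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (2, 5.0), (2, 7.0), (2, 9.0)],
)

# Aggregate features (count, mean, max per user) computed in SQL,
# equivalent to a Pandas groupby/agg.
rows = conn.execute("""
    SELECT user_id,
           COUNT(*)    AS n_txn,
           AVG(amount) AS avg_amount,
           MAX(amount) AS max_amount
    FROM transactions
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

for user_id, n_txn, avg_amount, max_amount in rows:
    print(user_id, n_txn, avg_amount, max_amount)
```

Pushing the aggregation into the database means the feature table arrives in Python already reduced, instead of pulling raw rows and grouping in Pandas.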

Google Brain's SimCLRv2 Achieves New SOTA in Semi-Supervised Learning


Following the February release of its contrastive learning framework SimCLR, the same team of Google Brain researchers guided by Turing Award laureate Dr. Geoffrey Hinton has presented SimCLRv2, an upgraded approach that boosts the SOTA results by 21.6 percent. The updated framework takes the "unsupervised pretrain, supervised fine-tune" paradigm popular in natural language processing and applies it to image recognition. Unlabelled data is learned in a task-agnostic way in the pretraining phase, which means the model has no prior classification knowledge. The researchers find that using a deep and wide neural network can be more label-efficient and greatly improve accuracy. Unlike SimCLR, whose largest model is ResNet-50, SimCLRv2's largest model is a 152-layer ResNet that is three times wider in its channels and uses selective kernels.
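SimCLR-style frameworks pretrain with a contrastive (NT-Xent) objective over pairs of augmented views. A minimal NumPy sketch of that loss, assuming the two views of each sample sit at adjacent indices, might look like:

```python
import numpy as np

def nt_xent_loss(z, tau=0.5):
    """NT-Xent over 2N embeddings where z[2k] and z[2k+1] are two
    augmented views of the same sample."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine sim via dot product
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = len(z)
    pos = np.arange(n) ^ 1                            # partner index: 0<->1, 2<->3, ...
    # log-softmax of the positive pair against all other pairs
    log_prob = sim[np.arange(n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))   # stand-in embeddings, not a trained encoder
loss = nt_xent_loss(z)
print(float(loss))
```

This is only the loss term; the actual framework wraps it around a ResNet encoder and a projection head, neither of which is shown here.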

Seth Meyers blasts Trump as a 'one-man super spreader' of coronavirus


"At what point can we say Trump is actively putting people in harm's way?" asked Late Night host Seth Meyers on Thursday. "He holds indoor rallies, refuses to wear a mask, and wants to cut back on testing." While numerous other countries have successfully suppressed their coronavirus outbreaks, the pandemic has reached a new peak in the U.S. The country recorded 39,327 new cases on Thursday, beating the previous single-day record set on Wednesday. Meanwhile, Trump's administration intends to stop funding coronavirus testing sites at the end of June. "That's like a pilot turning off the seatbelt sign after they graze a mountain," said Meyers. "'Don't worry folks, we just nicked one of the Rockies.'"

Self Supervised Representation Learning in NLP


While computer vision has made amazing progress on self-supervised learning only in the last few years, self-supervised learning has been a first-class citizen in NLP research for quite a while. Language models have existed since the '90s, even before the phrase "self-supervised learning" was coined. The Word2Vec paper from 2013 popularized this paradigm, and the field has rapidly progressed, applying these self-supervised methods to many problems. At the core of these methods lies a framing called the "pretext task," which allows us to use the data itself to generate labels and then apply supervised methods to solve unsupervised problems. Pretext tasks are also referred to as "auxiliary tasks" or "pre-training tasks." The representations learned by performing such a task can be used as a starting point for our downstream supervised tasks.
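As a concrete (toy) example of a pretext task, masked-token prediction turns raw text into labeled training pairs with no human annotation, since the hidden tokens themselves are the labels:

```python
import random

def masked_lm_examples(tokens, mask_rate=0.15, seed=0):
    """Turn a raw token sequence into a (masked input, labels) pair:
    the data supervises itself."""
    rng = random.Random(seed)
    inputs, labels = list(tokens), []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            inputs[i] = "[MASK]"
            labels.append((i, tok))   # position + original token are the targets
    return inputs, labels

sentence = "self supervised learning builds labels from the data itself".split()
inputs, labels = masked_lm_examples(sentence, mask_rate=0.3)
print(inputs)
print(labels)
```

A model trained to fill in the `[MASK]` positions never needs hand-labeled data, which is exactly what makes the paradigm scale.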

Get More Out of Your Annotated Medical Images with Self-Supervised Learning


Data scarcity is a perennial problem when applying deep learning (DL) to medical imaging. In vision tasks related to natural images, DL practitioners often have access to astoundingly large annotated data sets on which they can train. However, due to privacy concerns and the expense of creating them, access to large annotated data sets is rare in medical imaging. The natural follow-up question is: How can practitioners in the field of medical imaging best use DL given limited data? In this article, I'll discuss one approach to stretch the use of available data, called self-supervised learning.
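One common image pretext task is rotation prediction. This toy NumPy sketch (with stand-in arrays rather than real medical scans) shows how labels fall out of the data for free:

```python
import numpy as np

def rotation_pretext(images, seed=0):
    """Generate self-supervised (image, label) pairs by rotating each image
    by 0/90/180/270 degrees; the rotation index is the free label."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for img in images:
        k = int(rng.integers(0, 4))
        xs.append(np.rot90(img, k))
        ys.append(k)
    return np.stack(xs), np.array(ys)

# Tiny stand-in "scans": two 4x4 arrays.
scans = np.arange(2 * 4 * 4).reshape(2, 4, 4).astype(float)
x, y = rotation_pretext(scans)
print(x.shape, y)
```

A network pretrained to predict `y` from `x` learns anatomy-relevant features from unlabeled scans, which can then be fine-tuned on the small annotated set.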

Florida Coronavirus Cases Set Record; Positive Tests Also Up

U.S. News

Gov. Ron DeSantis said last week that the upward trend in confirmed cases mostly reflects increased testing, combined with spikes in some agricultural communities. However, the number of tests conducted daily peaked three weeks ago, and the percentage of positive tests is now over 6%, more than double the 2.3% rate in late May.

Does Machine Learning Really Work?

AI Magazine

Does machine learning really work? Over the past decade, machine learning has evolved from a field of laboratory demonstrations to a field of significant commercial value. Machine-learning algorithms have now learned to detect credit card fraud by mining data on past transactions, learned to steer vehicles driving autonomously on public highways at 70 miles an hour, and learned the reading interests of many individuals to assemble personally customized electronic newspapers. A new computational theory of learning is beginning to shed light on fundamental issues, such as the trade-off among the number of training examples available, the number of hypotheses considered, and the likely accuracy of the learned hypothesis. Newer research is beginning to explore issues such as long-term learning of new representations, the integration of Bayesian inference and induction, and life-long cumulative learning.

[Links of the Day] 12/05/2020 : Learning From Unlabeled Data, Fast Dataset Classifier, Azure Bad Rollout guardian


Thang presents a novel method for learning from unlabeled data, specifically semi-supervised learning methods. These methods were used to train Google's Meena chatbot model. Like Snorkel, this is used for quickly building classifiers on datasets that would otherwise be extremely time-consuming (and expensive) to label by hand for training purposes. Gandalf: an Azure machine learning system trained to catch bad rollout deployments. The aim of this system is to catch bad deployments before they can have ripple effects across the whole system.
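Snorkel-style weak supervision combines noisy heuristic labeling functions into a single label per example. An illustrative sketch (the heuristics below are invented for this example, and this is a simple majority vote rather than Snorkel's actual generative model):

```python
def weak_label(texts, labeling_functions):
    """Combine noisy heuristics by majority vote; -1 means abstain."""
    labels = []
    for t in texts:
        votes = [lf(t) for lf in labeling_functions]
        votes = [v for v in votes if v != -1]   # drop abstentions
        labels.append(max(set(votes), key=votes.count) if votes else -1)
    return labels

# Hypothetical spam-vs-ham heuristics: 1 = spam, 0 = ham, -1 = abstain.
lfs = [
    lambda t: 1 if "refund" in t else -1,
    lambda t: 1 if "winner" in t else -1,
    lambda t: 0 if "meeting" in t else -1,
]
print(weak_label(["you are a winner, claim refund", "meeting at noon"], lfs))
# → [1, 0]
```

The appeal is that writing a dozen cheap heuristics is far faster than hand-labeling thousands of examples, even though each heuristic alone is unreliable.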

Data Science Tropes: Cowboys and Sirens


In this figure, the leaf nodes are the scores generated by the model, as inferred from the training examples residing in those leaf nodes. In this decision tree, a $123.56 foreign transaction is scored at a risk level of 200, while a $123.57 transaction falls into a different leaf with a score 700 points away. For a one-penny difference in amount, the gradient is then 70,000 score points/dollar, an output variance wildly disproportionate to the 1¢ difference in inputs. This example vividly illustrates that with the data science (and business) world's increasing emphasis on the explainability, reliability, and consistency of models used for decisioning, decision trees simply lose a lot of their palatability.
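The arithmetic behind that discontinuity is simple; here the second leaf's score of 900 is an assumed value, chosen to be consistent with the quoted 70,000-points-per-dollar gradient:

```python
# Hypothetical leaf scores illustrating the decision-tree discontinuity:
# two transactions one penny apart land in different leaves.
score_low, score_high = 200, 900        # assumed scores for $123.56 vs $123.57
amount_low, amount_high = 123.56, 123.57

gradient = (score_high - score_low) / (amount_high - amount_low)
print(round(gradient))   # score points per dollar
```

A smooth model (e.g., a logistic regression on amount) would give nearby inputs nearby scores; the tree's axis-aligned splits are what create this cliff.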

Towards Knowledgeable Supervised Lifelong Learning Systems

Journal of Artificial Intelligence Research

Learning a sequence of tasks is a long-standing challenge in machine learning. This setting applies to learning systems that observe examples of a range of tasks at different points in time. A learning system should become more knowledgeable as more related tasks are learned. Although the problem of learning sequentially was acknowledged for the first time decades ago, research in this area has been rather limited. Research in transfer learning, multitask learning, metalearning and deep learning has studied some challenges of these kinds of systems. Recent research in lifelong machine learning and continual learning has revived interest in this problem. We propose Proficiente, a full framework for long-term learning systems. Proficiente relies on knowledge transferred between hypotheses learned with Support Vector Machines. The first component of the framework is focused on transferring forward selectively from a set of existing hypotheses or functions representing knowledge acquired during previous tasks to a new target task. A second component of Proficiente is focused on transferring backward, a novel ability of long-term learning systems that aims to exploit knowledge derived from recent tasks to encourage refinement of existing knowledge. We propose a method that transfers selectively from a task learned recently to existing hypotheses representing previous tasks. The method encourages retention of existing knowledge while refining it. We analyse the theoretical properties of the proposed framework. Proficiente is accompanied by an agnostic metric that can be used to determine whether a long-term learning system is becoming more knowledgeable. We evaluate Proficiente on both synthetic and real-world datasets, and demonstrate scenarios where knowledgeable supervised learning systems can be achieved by means of transfer.
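As a loose illustration of selective forward transfer (a toy NumPy sketch, not Proficiente's actual SVM-based method), one can score a bank of stored linear hypotheses on a few labeled target examples and reuse the best-fitting one as a starting point:

```python
import numpy as np

def select_source(target_x, target_y, hypotheses):
    """Selective forward transfer, illustrative only: pick the stored
    linear hypothesis that best fits a handful of target examples."""
    def acc(w):
        pred = (target_x @ w > 0).astype(int)
        return (pred == target_y).mean()
    scores = [acc(w) for w in hypotheses]
    return int(np.argmax(scores)), max(scores)

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = (x @ true_w > 0).astype(int)

# A "bank" of hypotheses from previous tasks; the second is close to the target.
bank = [rng.normal(size=3), true_w + 0.1 * rng.normal(size=3), rng.normal(size=3)]
best, score = select_source(x, y, bank)
print(best, score)
```

The real framework does considerably more (selective backward transfer, theoretical guarantees, a knowledgeability metric), but the core intuition of choosing which prior knowledge to reuse is the same.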