

Identifying Sponsored Content in News Sites With Machine Learning


Researchers from the Netherlands have developed a new machine learning method that can distinguish sponsored or otherwise paid content within news platforms with an accuracy of more than 90%, in response to growing interest from advertisers in 'native' advertising formats that are difficult to distinguish from 'real' journalistic output. The new paper, titled Distinguishing Commercial from Editorial Content in News, comes from researchers at Leiden University. The authors observe that although more serious publications, which can more easily dictate terms to advertisers, will make a reasonable effort to distinguish 'partner content' from the general run of news and analysis, the standards are slowly but inexorably shifting towards increased integration between an outlet's editorial and commercial teams, which they consider an alarming and negative trend. 'The ability to disguise content, willingly or unwillingly, and the probability that advertorials are not recognized as such even if properly labelled is significant. Marketers call it native [advertising] for a reason.'

Mobile Price Classification - Projects Based Learning


Bob has started his own mobile company. He wants to give a tough fight to big companies such as Apple and Samsung. He does not know how to estimate the price of the mobiles his company creates. In this competitive mobile phone market, you cannot simply assume things. To solve this problem, he collects sales data for mobile phones from various companies.

"Artificial Intelligence" Science-Research, November 2021 -- summary from OSTI GOV, DOE Pages…


The report records the DOE Town Halls held during 2019 at Argonne National Laboratory, Oak Ridge National Laboratory, Lawrence Berkeley National Laboratory, and in Washington, DC. The AI for Science town hall discussions concentrated on recording the transformational uses of AI that rely on HPC and/or data analysis, leveraging data sets from HPC simulations or from instruments and user facilities, and addressing scientific challenges unique to DOE user facilities and the agency's broad basic and applied scientific research enterprise. Artificial intelligence and machine learning systems have the potential to influence the future design and implementation of cybersecurity systems for the power grid. Artificial intelligence is the study of intelligent agents as demonstrated by machines. Commonly used supervised learning strategies include deep learning and other machine learning methods that require less data than deep learning, e.g., support vector machines and random forests.

Machine Learning Project Predict Will it Rain Tomorrow in Australia - Projects Based Learning


In this project we will be working with a data set that indicates whether it rained the next day in Australia (Yes or No). This column is Yes if the rainfall for that day was 1mm or more. We will try to create a model that predicts this value from the available data. Welcome to this project on predicting whether it will rain tomorrow in Australia, using Apache Spark Machine Learning on the Databricks platform community edition server, which allows you to execute your Spark code free of cost on their server just by registering with an email ID. In this project, we explore Apache Spark and Machine Learning on the Databricks platform.

Top 20 machine learning interview questions


Machine learning (ML) is the process of training a computer program to build a statistical model based on data. It automatically learns programs from data. Machine learning is one of the by-products of artificial intelligence (AI). Nowadays, almost 80% of enterprises have already adopted machine learning and artificial intelligence and have gained enormous financial advantages from them. So, let us quickly look into these top 20 interview questions with answers, which may help you to crack your interview.

Glass Identification - Projects Based Learning


From USA Forensic Science Service; 6 types of glass, defined in terms of their oxide content. The study of the classification of types of glass was motivated by criminological investigation. At the scene of the crime, the glass left can be used as evidence…if it is correctly identified! Convert String data to Numeric format so we can process the data in the Apache Spark ML Library. Welcome to this project on predicting the type of glass, using Apache Spark Machine Learning on the Databricks platform community edition server, which allows you to execute your Spark code free of cost on their server just by registering with an email ID.

A Survey on Cost Types, Interaction Schemes, and Annotator Performance Models in Selection Algorithms for Active Learning in Classification

Pool-based active learning (AL) aims to optimize the annotation process (i.e., labeling) as the acquisition of annotations is often time-consuming and therefore expensive. For this purpose, an AL strategy queries annotations intelligently from annotators to train a high-performance classification model at a low annotation cost. Traditional AL strategies operate in an idealized framework. They assume a single, omniscient annotator who never gets tired and charges uniformly regardless of query difficulty. However, in real-world applications, we often face human annotators, e.g., crowd or in-house workers, who make annotation mistakes and can be reluctant to respond if tired or faced with complex queries. Recently, a wide range of novel AL strategies has been proposed to address these issues. They differ in at least one of the following three central aspects from traditional AL: (1) They explicitly consider (multiple) human annotators whose performances can be affected by various factors, such as missing expertise. (2) They generalize the interaction with human annotators by considering different query and annotation types, such as asking an annotator for feedback on an inferred classification rule. (3) They take more complex cost schemes regarding annotations and misclassifications into account. This survey provides an overview of these AL strategies and refers to them as real-world AL. Therefore, we introduce a general real-world AL strategy as part of a learning cycle and use its elements, e.g., the query and annotator selection algorithm, to categorize about 60 real-world AL strategies. Finally, we outline possible directions for future research in the field of AL.
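To make the learning cycle described in this abstract concrete, here is a minimal sketch in Python. It is not taken from the survey: the nearest-centroid model, the margin-based uncertainty sampling rule, the "most accurate affordable annotator" rule, and all names (`fit_centroids`, `select_query`, `run_active_learning`, the `accuracy` and `cost` fields) are illustrative assumptions. It only shows how a query selection step, an annotator selection step, annotation noise, and a cost budget can fit together.

```python
import random

def fit_centroids(labeled):
    """Return the mean feature value per class from (x, y) pairs (1-D toy model)."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y in labeled:
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in (0, 1) if counts[c]}

def margin(x, centroids):
    """A small margin means the model is unsure which class x belongs to."""
    if len(centroids) < 2:
        return 0.0  # no usable model yet: every sample counts as maximally uncertain
    return abs(abs(x - centroids[0]) - abs(x - centroids[1]))

def select_query(pool, centroids):
    """Query selection (assumed rule): pick the most uncertain pool sample."""
    return min(pool, key=lambda x: margin(x, centroids))

def run_active_learning(pool, oracle, annotators, budget, seed=0):
    """Query labels until the pool is empty or no annotator is affordable.

    annotators: list of dicts with hypothetical keys 'accuracy' and 'cost'.
    oracle: function giving the true label; annotators may flip it.
    """
    rng = random.Random(seed)
    pool = list(pool)
    labeled, spent = [], 0.0
    while pool:
        affordable = [a for a in annotators if a["cost"] <= budget - spent]
        if not affordable:
            break
        # Annotator selection (assumed rule): most accurate one we can still pay.
        annotator = max(affordable, key=lambda a: a["accuracy"])
        x = select_query(pool, fit_centroids(labeled))
        pool.remove(x)
        y = oracle(x)
        if rng.random() > annotator["accuracy"]:
            y = 1 - y  # simulated annotation mistake
        labeled.append((x, y))
        spent += annotator["cost"]
    return labeled, spent

annotators = [
    {"name": "expert", "accuracy": 1.0, "cost": 3.0},
    {"name": "crowd", "accuracy": 0.7, "cost": 1.0},
]
pool = [i / 10 for i in range(10)]
labeled, spent = run_active_learning(pool, lambda x: int(x > 0.5), annotators, budget=10.0)
print(len(labeled), "labels acquired for a total cost of", spent)
```

A traditional strategy in the sense of the abstract corresponds to a single annotator with accuracy 1.0 and uniform cost; the real-world strategies the survey categorizes replace the two assumed selection rules with more elaborate ones.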

Learned Benchmarks for Subseasonal Forecasting

We develop a subseasonal forecasting toolkit of simple learned benchmark models that outperform both operational practice and state-of-the-art machine learning and deep learning methods. Our new models include (a) Climatology++, an adaptive alternative to climatology that, for precipitation, is 9% more accurate and 250% more skillful than the United States operational Climate Forecasting System (CFSv2); (b) CFSv2++, a learned CFSv2 correction that improves temperature and precipitation accuracy by 7-8% and skill by 50-275%; and (c) Persistence++, an augmented persistence model that combines CFSv2 forecasts with lagged measurements to improve temperature and precipitation accuracy by 6-9% and skill by 40-130%. Across the contiguous U.S., our Climatology++, CFSv2++, and Persistence++ toolkit consistently outperforms standard meteorological baselines, state-of-the-art machine and deep learning methods, and the European Centre for Medium-Range Weather Forecasts ensemble. Overall, we find that augmenting traditional forecasting approaches with learned enhancements yields an effective and computationally inexpensive strategy for building the next generation of subseasonal forecasting benchmarks.

Personalized Online Machine Learning

In this work, we introduce the Personalized Online Super Learner (POSL) -- an online ensembling algorithm for streaming data whose optimization procedure accommodates varying degrees of personalization. Namely, POSL optimizes predictions with respect to baseline covariates, so personalization can vary from completely individualized (i.e., optimization with respect to baseline covariate subject ID) to many individuals (i.e., optimization with respect to common baseline covariates). As an online algorithm, POSL learns in real-time. POSL can leverage a diversity of candidate algorithms, including online algorithms with different training and update times, fixed algorithms that are never updated during the procedure, pooled algorithms that learn from many individuals' time-series, and individualized algorithms that learn from within a single time-series. POSL's ensembling of this hybrid of base learning strategies depends on the amount of data collected, the stationarity of the time-series, and the mutual characteristics of a group of time-series. In essence, POSL decides whether to learn across samples, through time, or both, based on the underlying (unknown) structure in the data. For a wide range of simulations that reflect realistic forecasting scenarios, and in a medical data application, we examine the performance of POSL relative to other current ensembling and online learning methods. We show that POSL is able to provide reliable predictions for time-series data and adjust to changing data-generating environments. We further cultivate POSL's practicality by extending it to settings where time-series enter/exit dynamically over chronological time.
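The idea of choosing among candidate learners online, with performance tracked per individual or pooled across individuals, can be sketched as follows. This is a simplified stand-in and not the POSL algorithm itself: it uses a plain "pick the candidate with the lowest cumulative squared error so far" rule, and the class name `OnlineSelector`, the helper `run_stream`, and the two toy candidates are assumptions made for illustration. The `key` argument plays the role of the baseline covariate: using a subject ID gives fully individualized selection, while a shared key pools loss across many series.

```python
from collections import defaultdict

class OnlineSelector:
    """Tracks cumulative squared error of each candidate, separately per key."""

    def __init__(self, candidates):
        # candidates: name -> function(history) -> one-step-ahead prediction
        self.candidates = candidates
        self.loss = defaultdict(lambda: {name: 0.0 for name in candidates})

    def predict(self, key, history):
        """Return (name, prediction) of the candidate with the lowest loss so far."""
        losses = self.loss[key]
        best = min(losses, key=losses.get)
        return best, self.candidates[best](history)

    def update(self, key, history, y):
        """Score every candidate on the newly observed outcome y."""
        for name, predict in self.candidates.items():
            self.loss[key][name] += (predict(history) - y) ** 2

def run_stream(selector, key, series):
    """Feed one series through the selector; return the candidate chosen at each step."""
    history = [series[0]]
    chosen = []
    for y in series[1:]:
        name, _ = selector.predict(key, history)
        chosen.append(name)
        selector.update(key, history, y)
        history.append(y)
    return chosen

candidates = {
    "last": lambda h: h[-1],           # persistence: repeat the last observation
    "mean": lambda h: sum(h) / len(h), # running mean of the series so far
}
selector = OnlineSelector(candidates)
trend_choices = run_stream(selector, "subject_a", [1, 2, 3, 4, 5])
noisy_choices = run_stream(selector, "subject_b", [0, 2, 0, 2, 0])
print(trend_choices[-1], noisy_choices[-1])
```

On the trending series the persistence candidate keeps the lower loss, while on the oscillating series the running mean takes over after a few observations, which mirrors the abstract's point that the preferred strategy depends on the structure of each time-series.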

Deep Quantile Regression for Uncertainty Estimation in Unsupervised and Supervised Lesion Detection

Despite impressive state-of-the-art performance on a wide variety of machine learning tasks in multiple applications, deep learning methods can produce over-confident predictions, particularly with limited training data. Therefore, quantifying uncertainty is particularly important in critical applications such as anomaly or lesion detection and clinical diagnosis, where a realistic assessment of uncertainty is essential in determining surgical margins, disease status and appropriate treatment. In this work, we focus on using quantile regression to estimate aleatoric uncertainty and use it for estimating uncertainty in both supervised and unsupervised lesion detection problems. In the unsupervised settings, we apply quantile regression to a lesion detection task using Variational AutoEncoder (VAE). The VAE models the output as a conditionally independent Gaussian characterized by means and variances for each output dimension. Unfortunately, joint optimization of both mean and variance in the VAE leads to the well-known problem of shrinkage or underestimation of variance. We describe an alternative VAE model, Quantile-Regression VAE (QR-VAE), that avoids this variance shrinkage problem by estimating conditional quantiles for the given input image. Using the estimated quantiles, we compute the conditional mean and variance for input images under the conditionally Gaussian model. We then compute reconstruction probability using this model as a principled approach to outlier or anomaly detection applications. In the supervised setting, we develop binary quantile regression (BQR) for the supervised lesion segmentation task. BQR segmentation can capture uncertainty in label boundaries. We show how quantile regression can be used to characterize expert disagreement in the location of lesion boundaries.
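Two building blocks mentioned in this abstract can be illustrated in a few lines of Python: the quantile (pinball) loss that quantile regression minimizes, and the recovery of a Gaussian mean and variance from two estimated quantiles. This is a sketch under stated assumptions rather than the QR-VAE implementation: the function names and the quantile levels 0.15 and 0.5 are chosen here for illustration. For a Gaussian, the quantile at level tau is mu + sigma * z_tau, where z_tau is the standard normal quantile, so two quantiles determine mu and sigma.

```python
from statistics import NormalDist

def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss of one prediction at quantile level tau in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return max(tau * diff, (tau - 1) * diff)

def gaussian_from_quantiles(q_low, q_high, tau_low=0.15, tau_high=0.5):
    """Recover (mean, variance) of a Gaussian from two of its quantiles.

    Solves q = mu + sigma * z for the two (q, z) pairs, where z is the
    standard normal quantile at the corresponding level.
    """
    z_low = NormalDist().inv_cdf(tau_low)
    z_high = NormalDist().inv_cdf(tau_high)
    sigma = (q_high - q_low) / (z_high - z_low)
    mu = q_low - sigma * z_low
    return mu, sigma ** 2

# Round trip: take quantiles from a known Gaussian and recover its parameters.
reference = NormalDist(mu=2.0, sigma=3.0)
q15, q50 = reference.inv_cdf(0.15), reference.inv_cdf(0.5)
mean, variance = gaussian_from_quantiles(q15, q50)
print(mean, variance)
```

Because the quantiles are estimated with separate pinball losses rather than by jointly optimizing a mean and a variance, the variance obtained this way does not come from the joint likelihood objective that the abstract identifies as the source of variance shrinkage.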