Throughout the research world, artificial intelligence is increasingly being used to scan complex scientific literature faster than humans alone can manage. At Utrecht University, Prof. Rens van de Schoot and his team are part of an international research community now applying that technology to COVID-19 publications. In an edited email exchange with Diane M. Fresquez of Science Business, van de Schoot talks about his work and his search for collaborators (have you got coding talent?) – initially while under lockdown with his three children, aged six and under, who played quietly (or not so quietly) underfoot. Q. Tell us about your COVID-19 project. With COVID-19 research literature growing rapidly, and an urgent need to find cures and treatments, it is essential that data collection is done in real time.
The International Conference on Learning Representations (ICLR) and the Consultative Group on International Agricultural Research (CGIAR) jointly ran a challenge in which over 800 data scientists worldwide competed to detect crop diseases from close-up photographs. The objective was to build a machine learning algorithm that correctly classifies whether a wheat plant is healthy, has stem rust, or has leaf rust. Wheat rust is a devastating plant disease that reduces yields, threatens farmers' livelihoods, and undermines food security across Africa. The disease is difficult to monitor at large scale, which makes it hard to control and eradicate. An accurate image recognition model that can detect wheat rust from any image would enable a crowd-sourced approach to monitoring crops. The imagery data came from a variety of sources.
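The three-way classification task can be sketched with a minimal nearest-centroid baseline over flattened image features. This is only an illustrative toy, not any competitor's actual solution: the class names follow the challenge, but the tiny random arrays stand in for real wheat photographs.

```python
import numpy as np

CLASSES = ["healthy_wheat", "stem_rust", "leaf_rust"]  # challenge labels

def nearest_centroid_fit(X, y, n_classes):
    """Compute one mean feature vector (centroid) per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def nearest_centroid_predict(centroids, X):
    """Assign each sample to the class with the closest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# Synthetic stand-ins for flattened image features (real entries would
# resize each photo and flatten its pixels into one long vector).
rng = np.random.default_rng(0)
X_train = np.concatenate([rng.normal(loc=c, size=(20, 16)) for c in range(3)])
y_train = np.repeat(np.arange(3), 20)

centroids = nearest_centroid_fit(X_train, y_train, 3)
preds = nearest_centroid_predict(centroids, X_train)
print((preds == y_train).mean())  # training accuracy of the toy baseline
```

A real entry would replace the centroid rule with a convolutional network, but the fit/predict structure is the same.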
With the ever-growing computational power of mobile devices, mobile visual search has undergone an evolution in techniques and applications. A significant trend is low bit rate visual search, in which compact visual descriptors are extracted directly on the mobile device and delivered as queries, rather than raw images, to reduce query transmission latency. In this article, we introduce our work on low bit rate mobile landmark search, in which a compact yet discriminative landmark image descriptor is extracted using location context such as GPS, crowd-sourced hotspot WLAN, and cell tower locations. The compactness originates from a bag-of-words image representation, learned offline from geotagged photos on photo sharing websites such as Flickr and Panoramio. The learning process segments the landmark photo collection into discrete geographical regions using a Gaussian mixture model, then boosts a ranking-sensitive vocabulary within each region, with entropy-based descriptor compactness feedback refining both phases iteratively.
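The first stage of the offline learning — segmenting geotagged photos into geographical regions with a Gaussian mixture model — can be sketched with a plain EM loop. This is a simplified illustration under assumed conditions (spherical covariances, synthetic GPS coordinates around two imaginary landmarks), not the paper's full pipeline:

```python
import numpy as np

def gmm_em(X, k, iters=100):
    """Fit a spherical Gaussian mixture to 2-D geotags with plain EM."""
    n, d = X.shape
    # Deterministic init: two points at the latitude extremes, to avoid
    # starting both components inside the same geographic cluster.
    mu = X[[X[:, 0].argmin(), X[:, 0].argmax()]][:k]
    var = np.full(k, 1.0)            # spherical variance per component
    pi = np.full(k, 1.0 / k)         # mixing weights
    for _ in range(iters):
        # E-step: responsibility of each region for each photo
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logp = -0.5 * sq / var - 0.5 * d * np.log(2 * np.pi * var) + np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update region means, variances, and weights
        nk = r.sum(axis=0)
        mu = (r.T @ X) / nk[:, None]
        var = np.array([(r[:, j] * ((X - mu[j]) ** 2).sum(axis=1)).sum()
                        / (d * nk[j]) for j in range(k)])
        pi = nk / n
    return mu, r.argmax(axis=1)

# Synthetic (lat, lon) coordinates around two hypothetical landmark regions.
rng = np.random.default_rng(1)
geo = np.concatenate([rng.normal([52.37, 4.90], 0.01, (100, 2)),
                      rng.normal([48.86, 2.35], 0.01, (100, 2))])
means, region = gmm_em(geo, k=2)
```

Each photo's `region` assignment would then feed the per-region vocabulary boosting described above.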
FIDE CM Kingscrusher goes over a game featuring an imprisoned bishop: Highly Evolved Leela vs Mighty Stockfish, TCEC Season 17 Rd 34. Play turn-style chess at http://bit.ly/chessworld. FIDE CM Kingscrusher goes over amazing games of chess every day, with a recent focus on chess champions such as Magnus Carlsen, and even games by neural networks, which are opening up new concepts for how chess could be played more effectively. The game qualities Kingscrusher looks for are generally amazing games with some awesome or astonishing features to them. Many brilliant games are played every year in chess, and this channel helps to find and explain them in a clear way. There are classic games, crushing and dynamic games, and exceptionally elegant games.
Alfredo joined Element AI as a Research Engineer in the AI for Good lab in London, working on applications that support NGOs and non-profits. He is one of the primary co-authors of the first technical report produced in partnership with Amnesty International, a large-scale study of online abuse against women on Twitter based on crowd-sourced data. He has been a machine learning mentor at NASA's Frontier Development Lab, helping teams apply AI to scientific space problems. More recently, he led joint research with Mila Montreal on multi-frame super-resolution, which was recognized by the European Space Agency for top performance on the PROBA-V Super-Resolution challenge. His research interests lie in computer vision for satellite imagery, probabilistic modeling, and AI for social good.
Pairwise comparison data arise in many domains with subjective assessment experiments, for example in image and video quality assessment. In these experiments observers are asked to express a preference between two conditions. However, many pairwise comparison protocols require a large number of comparisons to infer accurate scores, which may be unfeasible when each comparison is time-consuming (e.g. videos) or expensive (e.g. medical imaging). This motivates the use of an active sampling algorithm that chooses only the most informative pairs for comparison. In this paper we propose ASAP, an active sampling algorithm based on approximate message passing and expected information gain maximization. Unlike most existing methods, which rely on partial updates of the posterior distribution, we are able to perform full updates and thereby substantially improve the accuracy of the inferred scores. The algorithm relies on three techniques for reducing computational cost: inference based on approximate message passing, selective evaluations of the information gain, and selecting pairs in a batch that forms a minimum spanning tree of the inverse of information gain. We demonstrate, with real and synthetic data, that ASAP offers the highest accuracy of inferred scores compared to the existing methods. We also provide an open-source GPU implementation of ASAP for large-scale experiments.
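The core idea of active pair selection can be sketched as follows. This toy version is not ASAP itself: it keeps Gaussian score estimates under a Thurstone-style model and uses outcome entropy (a pair whose result is closest to a coin flip) as a crude proxy for expected information gain, with no message passing and the noise variance of 2 an arbitrary assumption:

```python
import math
from itertools import combinations

def outcome_prob(mu_i, mu_j, v_i, v_j):
    """P(condition i preferred over j) under a Thurstone-style model.
    The constant 2 is an assumed observation-noise variance."""
    z = (mu_i - mu_j) / math.sqrt(2 + v_i + v_j)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def pick_pair(mu, var):
    """Choose the most informative pair: the one whose outcome entropy is
    highest, i.e. the comparison we can least predict."""
    best, best_u = None, -1.0
    for i, j in combinations(range(len(mu)), 2):
        p = outcome_prob(mu[i], mu[j], var[i], var[j])
        u = -(p * math.log(p) + (1 - p) * math.log(1 - p))  # outcome entropy
        if u > best_u:
            best, best_u = (i, j), u
    return best

mu = [0.0, 0.1, 1.5, 3.0]   # current score means for four conditions
var = [1.0, 1.0, 1.0, 1.0]  # current score variances
print(pick_pair(mu, var))   # selects the near-tied pair (0, 1)
```

ASAP replaces this brute-force entropy scan with approximate message passing, selective information-gain evaluation, and minimum-spanning-tree batching, which is what makes the full-posterior updates tractable at scale.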
Social media, especially Twitter, is increasingly used for research with predictive analytics. In social media studies, natural language processing (NLP) techniques are used in conjunction with expert-based, manual, and qualitative analyses. However, social media data are unstructured and must undergo complex manipulation before research use. Manual annotation, in which multiple expert raters must reach consensus on every item, is the most resource- and time-consuming process, but it is essential for creating the gold-standard datasets used to train NLP-based machine learning classifiers. To reduce the burden of manual annotation while maintaining its reliability, we devised a crowdsourcing pipeline combined with active learning strategies. We demonstrated its effectiveness through a case study that identifies job loss events from individual tweets. We used the Amazon Mechanical Turk platform to recruit annotators from the Internet and designed a number of quality control measures to ensure annotation accuracy. We evaluated four active learning strategies (least confident, entropy, vote entropy, and Kullback-Leibler divergence), which aim to reduce the number of tweets needed to reach a desired automated classification performance. Results show that crowdsourcing is useful for creating high-quality annotations, and that active learning helps reduce the number of required tweets, although there was no substantial difference among the strategies tested.
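The four acquisition functions named above are standard and can be sketched directly; this is a generic illustration (the two-row `p` matrix is invented data, not the paper's tweets):

```python
import numpy as np

def least_confident(p):
    """1 - max class probability; higher means more uncertain."""
    return 1.0 - p.max(axis=1)

def entropy(p, eps=1e-12):
    """Shannon entropy of the predicted class distribution."""
    return -(p * np.log(p + eps)).sum(axis=1)

def vote_entropy(votes, n_classes):
    """Entropy of hard label votes cast by a committee of classifiers.
    votes: (n_samples, n_members) integer label matrix."""
    counts = np.stack([(votes == c).sum(axis=1) for c in range(n_classes)],
                      axis=1)
    return entropy(counts / votes.shape[1])

def kl_disagreement(p_members, eps=1e-12):
    """Mean KL divergence of each member's distribution from the consensus.
    p_members: (n_members, n_samples, n_classes)."""
    consensus = p_members.mean(axis=0)
    kl = (p_members * np.log((p_members + eps) / (consensus + eps))).sum(axis=2)
    return kl.mean(axis=0)

# Two hypothetical tweets scored for {job_loss, not_job_loss}: the first is
# ambiguous and should rank as more uncertain than the second.
p = np.array([[0.55, 0.45],
              [0.95, 0.05]])
print(least_confident(p), entropy(p))
```

In each round, the tweets with the highest scores under the chosen measure are sent to the crowd for annotation, then added to the training set.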
Data is the bedrock of all machine learning systems. As such, working with the right data collection company is critical to solving a supervised machine learning problem. If you don't have a particular goal or project in mind, there is a wealth of open data available on the web to practice with. However, if you're looking to tackle a specific problem, chances are you'll need to collect data yourself or work with a company that can collect data for you. Many data collection companies provide crowdsourcing services to help individuals and corporations gather data at scale.
Research on supervised learning algorithms implicitly assumes that training data are labeled by domain experts, or at least by semi-professional labelers reached through crowdsourcing services such as Amazon Mechanical Turk. With the advent of the Internet, data have become abundant, and a large number of machine-learning-based systems are now trained on user-generated data, using user-supplied categorical data as true labels. However, little work has been done on supervised learning with user-defined labels, where users are not necessarily experts and may be motivated to provide incorrect labels to improve their own utility from the system. In this article, we propose two types of classes in user-defined labels, subjective and objective, showing that objective classes are as reliable as if they were labeled by domain experts, whereas subjective classes are subject to bias and manipulation by users. We define this as the subjective class issue and provide a framework for detecting subjective labels in a dataset without querying an oracle. Using this framework, data mining practitioners can detect a subjective class at an early stage of a project and avoid wasting precious time and resources trying to handle a subjective class problem with traditional machine learning techniques.
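One rough proxy for such detection — not the paper's actual framework — is to flag classes whose redundant user labels show low agreement: an objective class ("cat") attracts consistent labels, while a subjective one ("funny") does not. The data and threshold below are hypothetical:

```python
from collections import Counter, defaultdict

def flag_subjective_classes(items, threshold=0.8):
    """items: dict item_id -> list of user-provided labels for that item.
    A class is flagged subjective when, across items whose majority label
    is that class, the mean fraction of users agreeing with the majority
    falls below `threshold`."""
    agree = defaultdict(list)
    for labels in items.values():
        majority, n_major = Counter(labels).most_common(1)[0]
        agree[majority].append(n_major / len(labels))
    return {cls: sum(a) / len(a) < threshold for cls, a in agree.items()}

# Hypothetical labels: "cat" is objective, "funny" is in the eye of the
# beholder and collects near-split votes.
items = {
    "img1": ["cat", "cat", "cat", "cat"],
    "img2": ["cat", "cat", "cat", "dog"],
    "img3": ["funny", "boring", "funny", "boring", "funny"],
    "img4": ["funny", "boring", "boring", "funny", "funny"],
}
print(flag_subjective_classes(items))  # flags "funny", not "cat"
```

Classes flagged this way would warrant closer inspection before being treated as ground truth.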
Crowdsourcing has become a popular paradigm for labeling large datasets. However, it has given rise to the computational task of aggregating the crowdsourced labels provided by a collection of unreliable annotators. We approach this problem by transforming it into a standard inference problem in graphical models, and applying approximate variational methods, including belief propagation (BP) and mean field (MF). We show that our BP algorithm generalizes both majority voting and a recent algorithm by Karger et al., while our MF method is closely related to a commonly used EM algorithm. In both cases, we find that the performance of the algorithms critically depends on the choice of a prior distribution on the workers' reliability; by choosing the prior properly, both BP and MF (and EM) perform surprisingly well on both simulated and real-world datasets, competitive with state-of-the-art algorithms based on more complicated modeling assumptions.
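The two baselines the paper builds on can be sketched concretely. The EM below uses a simple one-coin worker model (each worker correct with some probability), a simplified cousin of the MF/EM methods discussed — not the paper's BP algorithm — and the crowd is simulated:

```python
import numpy as np

def majority_vote(L):
    """L: (n_items, n_workers) matrix of binary labels in {0, 1}."""
    return (L.mean(axis=1) > 0.5).astype(int)

def one_coin_em(L, iters=20):
    """EM for a one-coin worker model: worker j is correct with prob q[j],
    items have a uniform prior over the two labels."""
    mu = L.mean(axis=1)  # soft estimate of P(true label = 1) per item
    for _ in range(iters):
        # M-step: worker reliabilities from the current soft labels
        q = (mu[:, None] * L + (1 - mu[:, None]) * (1 - L)).mean(axis=0)
        q = q.clip(1e-6, 1 - 1e-6)
        # E-step: posterior log-odds, weighting each worker by log-odds of
        # being correct, so reliable workers count for more
        w = np.log(q / (1 - q))
        s = ((2 * L - 1) * w).sum(axis=1)
        mu = 1 / (1 + np.exp(-s))
    return (mu > 0.5).astype(int), q

# Simulated crowd: 2 reliable workers (90% correct), 3 near-random (55%).
rng = np.random.default_rng(0)
truth = rng.integers(0, 2, 200)
acc = np.array([0.9, 0.9, 0.55, 0.55, 0.55])
correct = rng.random((200, 5)) < acc
L = np.where(correct, truth[:, None], 1 - truth[:, None])
est, q = one_coin_em(L)
print((est == truth).mean(), (majority_vote(L) == truth).mean())
```

The reliability estimates `q` play the role of the worker-reliability variables whose prior, as the abstract notes, the full BP/MF methods are sensitive to.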