Overview
Fruit Ripeness Classification: a Survey
Rizzo, Matteo, Marcuzzo, Matteo, Zangari, Alessandro, Gasparetto, Andrea, Albarelli, Andrea
Fruit is a key crop in worldwide agriculture, feeding millions of people. The standard supply chain of fruit products involves quality checks to guarantee freshness, taste, and, most of all, safety. An important factor that determines fruit quality is its stage of ripening. This is usually classified manually by field experts, making it a labor-intensive and error-prone process. Thus, there is a growing need for automation in fruit ripeness classification. Many automatic methods have been proposed that employ a variety of feature descriptors for the food item to be graded. Machine learning and deep learning techniques dominate the top-performing methods. Furthermore, deep learning can operate on raw data and thus relieve users from having to compute complex engineered features, which are often crop-specific. In this survey, we review the latest methods proposed in the literature to automate fruit ripeness classification, highlighting the most common feature descriptors they operate on.
Artificial Intelligence for Dementia Research Methods Optimization
Bucholc, Magda, James, Charlotte, Khleifat, Ahmad Al, Badhwar, AmanPreet, Clarke, Natasha, Dehsarvi, Amir, Madan, Christopher R., Marzi, Sarah J., Shand, Cameron, Schilder, Brian M., Tamburin, Stefano, Tantiangco, Hanz M., Lourida, Ilianna, Llewellyn, David J., Ranson, Janice M.
Introduction: Machine learning (ML) has been extremely successful in identifying key features from high-dimensional datasets and executing complicated tasks with human expert levels of accuracy or greater. Methods: We summarize and critically evaluate current applications of ML in dementia research and highlight directions for future research. Results: We present an overview of ML algorithms most frequently used in dementia research and highlight future opportunities for the use of ML in clinical practice, experimental medicine, and clinical trials. We discuss issues of reproducibility, replicability and interpretability and how these impact the clinical applicability of dementia research. Finally, we give examples of how state-of-the-art methods, such as transfer learning, multi-task learning, and reinforcement learning, may be applied to overcome these issues and aid the translation of research to clinical practice in the future. Discussion: ML-based models hold great promise to advance our understanding of the underlying causes and pathological mechanisms of dementia.
Autonomous Reflectance Transformation Imaging by a Team of Unmanned Aerial Vehicles
Krátký, Vít, Petráček, Pavel, Spurný, Vojtěch, Saska, Martin
This letter presents a Reflectance Transformation Imaging (RTI) technique realized by multi-rotor Unmanned Aerial Vehicles (UAVs), with a focus on deployment in difficult-to-access buildings. RTI is a computational photographic method that captures the surface shape and color of a subject and enables its interactive re-lighting from any direction in a software viewer, revealing details that are not visible to the naked eye. The input of RTI is a set of images captured by a static camera, each one under illumination from a different known direction. We present an innovative approach that uses two multi-rotor UAVs to perform this scanning procedure in locations that are hardly accessible or even inaccessible to people. The proposed system is designed for safe deployment in real-world scenarios in historical buildings of priceless value.
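The RTI input described above (several images from a fixed camera, each with a known light direction) can be illustrated with a small fitting sketch. The abstract does not say which reflectance model the authors use; the code below assumes the classic biquadratic Polynomial Texture Map model, and the function names `fit_ptm` and `relight` are hypothetical. It is a minimal sketch, not the paper's pipeline.

```python
import numpy as np


def fit_ptm(images, light_dirs):
    """Fit per-pixel biquadratic PTM coefficients by least squares.

    images:     array (N, H, W), one grayscale image per light direction.
    light_dirs: array (N, 2) of (lu, lv), the light direction projected
                onto the image plane.
    Returns an array (6, H, W) of coefficients per pixel.
    """
    images = np.asarray(images, dtype=float)
    lu, lv = np.asarray(light_dirs, dtype=float).T
    # One row per image: [lu^2, lv^2, lu*lv, lu, lv, 1]
    basis = np.stack([lu**2, lv**2, lu * lv, lu, lv, np.ones_like(lu)], axis=1)
    n, h, w = images.shape
    # Solve basis @ coeffs = intensities for all pixels at once.
    coeffs, *_ = np.linalg.lstsq(basis, images.reshape(n, h * w), rcond=None)
    return coeffs.reshape(6, h, w)


def relight(coeffs, lu, lv):
    """Evaluate the fitted model under a new light direction (lu, lv)."""
    basis = np.array([lu**2, lv**2, lu * lv, lu, lv, 1.0])
    return np.tensordot(basis, coeffs, axes=1)  # shape (H, W)
```

At least six images with sufficiently varied light directions are needed for the per-pixel system to be well determined; more images make the fit more robust to noise.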
End-to-End Speech Recognition: A Survey
Prabhavalkar, Rohit, Hori, Takaaki, Sainath, Tara N., Schlüter, Ralf, Watanabe, Shinji
Within the classical approach, deep learning has been introduced to acoustic and language modeling. In acoustic modeling, deep learning replaced Gaussian mixture distributions (hybrid HMM [3], [4]) or augmented the acoustic feature set (nonlinear discriminant/tandem approach [5], [6]). In language modeling, deep learning replaced count-based approaches [7], [8], [9]. However, when introducing deep learning, the classical ASR architecture was not yet touched. Classical state-of-the-art ASR systems today are composed of many separate components and knowledge sources, especially speech signal preprocessing, methods for robustness w.r.t. [...]
[...] components (models, knowledge sources) of an ASR system before coming to a decision. This is in line with Bayes' decision rule, which exactly requires a single global decision integrating all available knowledge sources.
c) Joint Training: In terms of model training, E2E suggests estimating all parameters of all components of a model jointly using a single objective function that is consistent with the task at hand, which in case of ASR means minimizing the expected word error rate.
d) Training Data: Joint training of an integrated model implies using a single kind of training data, which in case [...]
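The word error rate named above as the ASR task metric is conventionally computed as the word-level edit distance between a reference transcript and a hypothesis, divided by the reference length. The sketch below computes it for a single sentence pair; the function name is hypothetical, whitespace tokenization is an assumption, and it does not implement the expectation over hypotheses that a training objective would need.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.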
MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images
Hayat, Nasir, Geras, Krzysztof J., Shamout, Farah E.
Multi-modal fusion approaches aim to integrate information from different data sources. Unlike natural datasets, such as in audio-visual applications, where samples consist of "paired" modalities, data in healthcare is often collected asynchronously. Hence, requiring the presence of all modalities for a given sample is not realistic for clinical tasks and significantly limits the size of the dataset during training. In this paper, we propose MedFuse, a conceptually simple yet promising LSTM-based fusion module that can accommodate uni-modal as well as multi-modal input. We evaluate the fusion method and introduce new benchmark results for in-hospital mortality prediction and phenotype classification, using clinical time-series data in the MIMIC-IV dataset and corresponding chest X-ray images in MIMIC-CXR. Compared to more complex multi-modal fusion strategies, MedFuse provides a performance improvement by a large margin on the fully paired test set. It also remains robust across the partially paired test set containing samples with missing chest X-ray images. We release our code for reproducibility and to enable the evaluation of competing models in the future.
TextWorldExpress: Simulating Text Games at One Million Steps Per Second
Jansen, Peter A., Côté, Marc-Alexandre
Text-based games offer a challenging test bed to evaluate virtual agents at language understanding, multi-step problem-solving, and common-sense reasoning. However, speed is a major limitation of current text-based games, which cap at 300 steps per second, mainly due to the use of legacy tooling. In this work we present TextWorldExpress, a high-performance simulator that includes implementations of three common text game benchmarks and increases simulation throughput by approximately three orders of magnitude, reaching over one million steps per second on common desktop hardware. This significantly reduces experiment runtime, enabling billion-step-scale experiments in about one day.
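The throughput figures quoted above can be put side by side with a little arithmetic. The sketch below counts only raw simulator time for a hypothetical one-billion-step run at the two stated rates; it does not model agent inference or training, which presumably accounts for much of the "about one day" figure in the abstract.

```python
SECONDS_PER_DAY = 86_400


def runtime_seconds(total_steps: int, steps_per_second: float) -> float:
    """Raw simulation time, ignoring any agent or training cost."""
    return total_steps / steps_per_second


total_steps = 1_000_000_000  # hypothetical billion-step experiment

legacy_seconds = runtime_seconds(total_steps, 300)       # stated legacy cap
fast_seconds = runtime_seconds(total_steps, 1_000_000)   # stated new rate

legacy_days = legacy_seconds / SECONDS_PER_DAY
fast_minutes = fast_seconds / 60
speedup = legacy_seconds / fast_seconds

print(f"legacy: {legacy_days:.1f} days, fast: {fast_minutes:.1f} minutes, "
      f"speedup: {speedup:.0f}x")
```

Under these assumptions the simulator itself stops being the bottleneck: the legacy rate needs over a month of pure simulation, while the faster rate needs well under an hour.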
Sensors
Weeds are one of the most important factors affecting agricultural production. The waste and pollution of the farmland ecological environment caused by full-coverage chemical herbicide spraying are becoming increasingly evident. With the continuous improvement in agricultural production levels, accurately distinguishing crops from weeds and spraying only the weeds has become important. However, precise spraying depends on accurately identifying and locating weeds and crops. In recent years, some scholars have used various computer vision methods to achieve this purpose. This review covers two families of approaches to weed detection: traditional image-processing methods and deep learning-based methods. It provides an overview of recent weed detection methods, analyzes the advantages and disadvantages of existing methods, and introduces several related plant leaf datasets, weed datasets, and weeding machinery. Lastly, the problems and difficulties of existing weed detection methods are analyzed, and future research trends are discussed.
Brief Review -- LiT: Zero-Shot Transfer with Locked-image text Tuning
The proposed model significantly outperforms the previous state-of-the-art methods at ImageNet zero-shot classification, improving on CLIP and ALIGN by 8.3% and 8.1%, respectively. With a pre-trained image model, the proposed setup converges significantly faster than the standard from-scratch setups reported in the literature. LiT provides a way to reuse the already pre-trained models in the literature. Locking the image tower almost always works best, and using a pre-trained image tower helps significantly across the board, whereas using a pre-trained text tower only marginally improves performance, and locking the text tower does not work well.
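To make the two-tower setup concrete, the sketch below shows a symmetric softmax contrastive loss over matched image-text pairs and a zero-shot prediction step that picks the class whose text embedding is closest to each image embedding. It assumes the embeddings are already computed; with a locked image tower, the image embeddings would be treated as constants and only the text tower would receive gradient updates. The function names and the temperature value are illustrative assumptions, not taken from the review.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric softmax contrastive loss; row i of each input is a matched pair.

    temperature=0.07 is an assumed illustrative value.
    """
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature

    def cross_entropy_on_diagonal(scores):
        scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


def zero_shot_predict(image_emb, class_text_emb):
    """Return, per image, the index of the most similar class text embedding."""
    similarities = l2_normalize(image_emb) @ l2_normalize(class_text_emb).T
    return similarities.argmax(axis=1)
```

The loss here is only evaluated, not optimized; in an autodiff framework one would stop gradients from flowing into the image tower to reproduce the locked configuration.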
A Scalable Space-efficient In-database Interpretability Framework for Embedding-based Semantic SQL Queries
Kudva, Prabhakar, Bordawekar, Rajesh, Nitsure, Apoorva
AI-Powered database (AI-DB) is a novel relational database system that uses a self-supervised neural network, database embedding, to enable semantic SQL queries on relational tables. In this paper, we describe an architecture and implementation of in-database interpretability infrastructure designed to provide simple, transparent, and relatable insights into ranked results of semantic SQL queries supported by AI-DB. We introduce a new co-occurrence based interpretability approach to capture relationships between relational entities and describe a space-efficient probabilistic Sketch implementation to store and process co-occurrence counts. Our approach provides both query-agnostic (global) and query-specific (local) interpretabilities. Experimental evaluation demonstrates that our in-database probabilistic approach provides the same interpretability quality as the precise space-inefficient approach, while providing scalable and space-efficient runtime behavior (up to 8X space savings), without any user intervention.
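The idea of storing co-occurrence counts in a probabilistic sketch can be illustrated as follows. The abstract does not name the sketch structure used, so this example assumes a Count-Min sketch, a common choice for approximate counting in fixed memory; the class, the `pair_key` helper, and the entity strings are all hypothetical.

```python
import hashlib


class CountMinSketch:
    """Approximate counter using depth x width integer cells.

    Estimates never undercount; hash collisions can only inflate them.
    """

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # A different salt per row gives one independent-looking hash per row.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key: str) -> int:
        return min(
            self.table[row][self._index(key, row)] for row in range(self.depth)
        )


def pair_key(a: str, b: str) -> str:
    """Order-independent key for a co-occurring pair of entities."""
    return "\x1f".join(sorted((a, b)))


sketch = CountMinSketch()
for _ in range(3):
    sketch.add(pair_key("customer:alice", "product:bananas"))
sketch.add(pair_key("product:bananas", "customer:bob"))
```

Memory use is fixed by `width` and `depth` regardless of how many distinct pairs are inserted, which is the trade-off against an exact dictionary of counts.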
The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms
Vemula, Anirudh, Song, Yuda, Singh, Aarti, Bagnell, J. Andrew, Choudhury, Sanjiban
We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and policy computation. Our "lazy" method leverages a novel unified objective, Performance Difference via Advantage in Model, to capture the performance difference between the learned policy and expert policy under the true dynamics. This objective demonstrates that optimizing the expected policy advantage in the learned model under an exploration distribution is sufficient for policy computation, resulting in a significant boost in computational efficiency compared to traditional planning methods. Additionally, the unified objective uses a value moment matching term for model fitting, which is aligned with the model's usage during policy computation. We present two no-regret algorithms to optimize the proposed objective, and demonstrate their statistical and computational gains compared to existing MBRL methods through simulated benchmarks.