Overview
Vision-and-Language Pretraining
Nguyen, Thong, Nguyen, Cong-Duy, Wu, Xiaobao, Ng, See-Kiong, Luu, Anh Tuan
With the burgeoning amount of data of image-text pairs and diversity of Vision-and-Language (V\&L) tasks, scholars have introduced an abundance of deep learning models in this research domain. Furthermore, in recent years, transfer learning has also shown tremendous success in Computer Vision for tasks such as Image Classification, Object Detection, etc., and in Natural Language Processing for Question Answering, Machine Translation, etc. Inheriting the spirit of Transfer Learning, research works in V\&L have devised multiple pretraining techniques on large-scale datasets in order to enhance the performance of downstream tasks. The aim of this article is to provide a comprehensive revision of contemporary V\&L pretraining models. In particular, we categorize and delineate pretraining approaches, along with the summary of state-of-the-art vision-and-language pretrained models. Moreover, a list of training datasets and downstream tasks is supplied to further polish the perspective into V\&L pretraining. Lastly, we decided to take a further step to discuss numerous directions for future research.
Map Point Selection for Visual SLAM
Mรผller, Christiaan J., van Daalen, Cornรฉ E.
Simultaneous localisation and mapping (SLAM) play a vital role in autonomous robotics. Robotic platforms are often resource-constrained, and this limitation motivates resource-efficient SLAM implementations. While sparse visual SLAM algorithms offer good accuracy for modest hardware requirements, even these more scalable sparse approaches face limitations when applied to large-scale and long-term scenarios. A contributing factor is that the point clouds resulting from SLAM are inefficient to use and contain significant redundancy. This paper proposes the use of subset selection algorithms to reduce the map produced by sparse visual SLAM algorithms. Information-theoretic techniques have been applied to simpler related problems before, but they do not scale if applied to the full visual SLAM problem. This paper proposes a number of novel information\hyp{}theoretic utility functions for map point selection and optimises these functions using greedy algorithms. The reduced maps are evaluated using practical data alongside an existing visual SLAM implementation (ORB-SLAM 2). Approximate selection techniques proposed in this paper achieve trajectory accuracy comparable to an offline baseline while being suitable for online use. These techniques enable the practical reduction of maps for visual SLAM with competitive trajectory accuracy. Results also demonstrate that SLAM front-end performance can significantly impact the performance of map point selection. This shows the importance of testing map point selection with a front-end implementation. To exploit this, this paper proposes an approach that includes a model of the front-end in the utility function when additional information is available. This approach outperforms alternatives on applicable datasets and highlights future research directions.
Recent Developments in Recommender Systems: A Survey
Li, Yang, Liu, Kangbo, Satapathy, Ranjan, Wang, Suhang, Cambria, Erik
In this technical survey, we comprehensively summarize the latest advancements in the field of recommender systems. The objective of this study is to provide an overview of the current state-of-the-art in the field and highlight the latest trends in the development of recommender systems. The study starts with a comprehensive summary of the main taxonomy of recommender systems, including personalized and group recommender systems, and then delves into the category of knowledge-based recommender systems. In addition, the survey analyzes the robustness, data bias, and fairness issues in recommender systems, summarizing the evaluation metrics used to assess the performance of these systems. Finally, the study provides insights into the latest trends in the development of recommender systems and highlights the new directions for future research in the field.
Inertial Navigation Meets Deep Learning: A Survey of Current Trends and Future Directions
Inertial sensing is used in many applications and platforms, ranging from day-to-day devices such as smartphones to very complex ones such as autonomous vehicles. In recent years, the development of machine learning and deep learning techniques has increased significantly in the field of inertial sensing. This is due to the development of efficient computing hardware and the accessibility of publicly available sensor data. These data-driven approaches are used to empower model-based navigation and sensor fusion algorithms. This paper provides an in-depth review of those deep learning methods. We examine separately, each vehicle operation domain including land, air, and sea. Each domain is divided into pure inertial advances and improvements based on filter parameters learning. In addition, we review deep learning approaches for calibrating and denoising inertial sensors. Throughout the paper, we discuss these trends and future directions. We also provide statistics on the commonly used approaches to illustrate their efficiency and stimulate further research in deep learning embedded in inertial navigation and fusion.
AmicroN: A Framework for Generating Annotations for Human Activity Recognition with Granular Micro-Activities
Chatterjee, Soumyajit, Mitra, Bivas, Chakraborty, Sandip
Efficient human activity recognition (HAR) using sensor data needs a significant volume of annotated data. The growing volume of unlabelled sensor data has challenged conventional practices for gathering HAR annotations with human-in-the-loop approaches, often leading to the collection of shallower annotations. These shallower annotations ignore the fine-grained micro-activities that constitute any complex activities of daily living (ADL). Understanding this, we, in this paper, first analyze this lack of granular annotations from available pre-annotated datasets to understand the practical inconsistencies and also perform a detailed survey to look into the human perception surrounding annotations. Drawing motivations from these, we next develop the framework AmicroN that can automatically generate micro-activity annotations using locomotive signatures and the available coarse-grain macro-activity labels. In the backend, AmicroN applies change-point detection followed by zero-shot learning with activity embeddings to identify the unseen micro-activities in an unsupervised manner. Rigorous evaluation on publicly available datasets shows that AmicroN can accurately generate micro-activity annotations with a median F1-score of >0.75. Additionally, we also show that AmicroN can be used in a plug-and-play manner with Large Language Models (LLMs) to obtain the micro-activity labels, thus making it more practical for realistic applications.
Semi-automated extraction of research topics and trends from NCI funding in radiological sciences from 2000-2020
Nguyen, Mark, Beidler, Peter, Tsai, Joseph, Anderson, August, Chen, Daniel, Kinahan, Paul, Kang, John
Investigators, funders, and the public desire knowledge on topics and trends in publicly funded research but current efforts in manual categorization are limited in scale and understanding. We developed a semi-automated approach to extract and name research topics, and applied this to \$1.9B of NCI funding over 21 years in the radiological sciences to determine micro- and macro-scale research topics and funding trends. Our method relies on sequential clustering of existing biomedical-based word embeddings, naming using subject matter experts, and visualization to discover trends at a macroscopic scale above individual topics. We present results using 15 and 60 cluster topics, where we found that 2D projection of grant embeddings reveals two dominant axes: physics-biology and therapeutic-diagnostic. For our dataset, we found that funding for therapeutics- and physics-based research have outpaced diagnostics- and biology-based research, respectively. We hope these results may (1) give insight to funders on the appropriateness of their funding allocation, (2) assist investigators in contextualizing their work and explore neighboring research domains, and (3) allow the public to review where their tax dollars are being allocated.
Towards Explainable Evaluation Metrics for Machine Translation
Leiter, Christoph, Lertvittayakumjorn, Piyawat, Fomicheva, Marina, Zhao, Wei, Gao, Yang, Eger, Steffen
Unlike classical lexical overlap metrics such as BLEU, most current evaluation metrics for machine translation (for example, COMET or BERTScore) are based on black-box large language models. They often achieve strong correlations with human judgments, but recent research indicates that the lower-quality classical metrics remain dominant, one of the potential reasons being that their decision processes are more transparent. To foster more widespread acceptance of novel high-quality metrics, explainability thus becomes crucial. In this concept paper, we identify key properties as well as key goals of explainable machine translation metrics and provide a comprehensive synthesis of recent techniques, relating them to our established goals and properties. In this context, we also discuss the latest state-of-the-art approaches to explainable metrics based on generative models such as ChatGPT and GPT4. Finally, we contribute a vision of next-generation approaches, including natural language explanations. We hope that our work can help catalyze and guide future research on explainable evaluation metrics and, mediately, also contribute to better and more transparent machine translation systems.
Don't Treat the Symptom, Find the Cause! Efficient Artificial-Intelligence Methods for (Interactive) Debugging
In the modern world, we are permanently using, leveraging, interacting with, and relying upon systems of ever higher sophistication, ranging from our cars, recommender systems in e-commerce, and networks when we go online, to integrated circuits when using our PCs and smartphones, the power grid to ensure our energy supply, security-critical software when accessing our bank accounts, and spreadsheets for financial planning and decision making. The complexity of these systems coupled with our high dependency on them implies both a non-negligible likelihood of system failures, and a high potential that such failures have significant negative effects on our everyday life. For that reason, it is a vital requirement to keep the harm of emerging failures to a minimum, which means minimizing the system downtime as well as the cost of system repair. This is where model-based diagnosis comes into play. Model-based diagnosis is a principled, domain-independent approach that can be generally applied to troubleshoot systems of a wide variety of types, including all the ones mentioned above, and many more. It exploits and orchestrates i.a. techniques for knowledge representation, automated reasoning, heuristic problem solving, intelligent search, optimization, stochastics, statistics, decision making under uncertainty, machine learning, as well as calculus, combinatorics and set theory to detect, localize, and fix faults in abnormally behaving systems. In this thesis, we will give an introduction to the topic of model-based diagnosis, point out the major challenges in the field, and discuss a selection of approaches from our research addressing these issues.
Natural Language Processing in Electronic Health Records in Relation to Healthcare Decision-making: A Systematic Review
Hossain, Elias, Rana, Rajib, Higgins, Niall, Soar, Jeffrey, Barua, Prabal Datta, Pisani, Anthony R., D, Ph., Turner}, Kathryn
Background: Natural Language Processing (NLP) is widely used to extract clinical insights from Electronic Health Records (EHRs). However, the lack of annotated data, automated tools, and other challenges hinder the full utilisation of NLP for EHRs. Various Machine Learning (ML), Deep Learning (DL) and NLP techniques are studied and compared to understand the limitations and opportunities in this space comprehensively. Methodology: After screening 261 articles from 11 databases, we included 127 papers for full-text review covering seven categories of articles: 1) medical note classification, 2) clinical entity recognition, 3) text summarisation, 4) deep learning (DL) and transfer learning architecture, 5) information extraction, 6) Medical language translation and 7) other NLP applications. This study follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Result and Discussion: EHR was the most commonly used data type among the selected articles, and the datasets were primarily unstructured. Various ML and DL methods were used, with prediction or classification being the most common application of ML or DL. The most common use cases were: the International Classification of Diseases, Ninth Revision (ICD-9) classification, clinical note analysis, and named entity recognition (NER) for clinical descriptions and research on psychiatric disorders. Conclusion: We find that the adopted ML models were not adequately assessed. In addition, the data imbalance problem is quite important, yet we must find techniques to address this underlining problem. Future studies should address key limitations in studies, primarily identifying Lupus Nephritis, Suicide Attempts, perinatal self-harmed and ICD-9 classification.
Conditional Generators for Limit Order Book Environments: Explainability, Challenges, and Robustness
Coletta, Andrea, Jerome, Joseph, Savani, Rahul, Vyetrenko, Svitlana
LOBs [22] are a fundamental market mechanism, which are used across a significant proportion of financial markets, including all major stock and derivatives exchanges. The benefits of having robust and realistic simulators for these markets are numerous. For example, they would allow the study of markets under different assumptions, and the investigation of AI techniques for training trading strategies. In a LOB market, matched orders result in trades and unmatched orders are stored in the two parts of the LOB, a collection of buy orders called bids (the bid book), and a collection of sell orders called asks (the ask book). Typically, each side of the LOB will contains hundreds of individual orders, and a real market would be updated at micro-second time resolution, driven by a wide range of market participants and facilitated by "high-frequency" market makers [45]. The development of AI-based automated trading strategies for LOB markets has been a growth area in recent years, both within academia and industry, spurred on in part by developments in deep learning and reinforcement learning. Two typical LOB trading problems that have been investigated are market making, where the goal is to provide liquidity to the market by being continually willing to buy and sell an asset (see, e.g., Spooner et al. [50], Jerome et al. [28], Gasperov and Kostanjcar 1