Overview
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Zhang, Chaoning, Zhang, Chenshuang, Song, Junha, Yi, John Seon Keun, Zhang, Kang, Kweon, In So
As the title of MAE \cite{he2022masked} puts it, masked autoencoders are scalable vision learners, which suggests that self-supervised learning (SSL) in vision might follow a trajectory similar to that in NLP. Specifically, generative pretext tasks based on masked prediction (e.g., BERT) have become the de facto standard SSL practice in NLP. By contrast, early attempts at generative methods in vision were overshadowed by their discriminative counterparts (such as contrastive learning); however, the success of masked image modeling has revived the masked autoencoder (often termed denoising autoencoder in the past). As a milestone in bridging the gap with BERT in NLP, the masked autoencoder has attracted unprecedented attention for SSL in vision and beyond. This work conducts a comprehensive survey of masked autoencoders to shed light on a promising direction of SSL. As the first to review SSL with masked autoencoders, this work focuses on their application in vision by discussing historical developments, recent progress, and implications for diverse applications.
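To make the masked-prediction idea concrete, here is a minimal NumPy sketch of random patch masking with a reconstruction loss computed only on the hidden patches. It is an illustration under stated assumptions, not the survey's or MAE's implementation: the function names, the 75% mask ratio, the 14x14 grid of flattened 16x16x3 patches, and the all-zeros stand-in for a decoder output are all hypothetical choices made for this example.

```python
import numpy as np


def random_patch_mask(num_patches, mask_ratio, rng):
    """Split patch indices into visible and masked sets, uniformly at random."""
    num_masked = int(round(num_patches * mask_ratio))
    perm = rng.permutation(num_patches)
    masked_idx = np.sort(perm[:num_masked])
    visible_idx = np.sort(perm[num_masked:])
    return visible_idx, masked_idx


def masked_mse(pred, target, masked_idx):
    """Mean squared error restricted to the masked patches."""
    diff = pred[masked_idx] - target[masked_idx]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)

# Assumed toy input: 196 patches (a 14x14 grid), each flattened to 768 values.
patches = rng.normal(size=(196, 768))

# Assumed mask ratio of 0.75; only the visible patches would reach an encoder.
visible_idx, masked_idx = random_patch_mask(len(patches), 0.75, rng)
encoder_input = patches[visible_idx]

# Stand-in for a decoder output (a real model would predict pixel values).
pred = np.zeros_like(patches)
loss = masked_mse(pred, patches, masked_idx)
```

The encoder sees only `encoder_input`, and the loss ignores visible patches, which is what turns reconstruction into a prediction task rather than a copy task.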
Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond
Feder, Amir, Keith, Katherine A., Manzoor, Emaad, Pryzant, Reid, Sridhar, Dhanya, Wood-Doughty, Zach, Eisenstein, Jacob, Grimmer, Justin, Reichart, Roi, Roberts, Margaret E., Stewart, Brandon M., Veitch, Victor, Yang, Diyi
A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the challenges and opportunities in the application of causal inference to the textual domain, with its unique properties. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects with text, encompassing settings where text is used as an outcome, treatment, or to address confounding. In addition, we explore potential uses of causal inference to improve the robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the NLP community.
VLP: A Survey on Vision-Language Pre-training
Chen, Feilong, Zhang, Duzhen, Han, Minglun, Chen, Xiuyi, Shi, Jing, Xu, Shuang, Xu, Bo
In the past few years, the emergence of pre-trained models has brought uni-modal fields such as computer vision (CV) and natural language processing (NLP) into a new era. A substantial body of work has shown that such models benefit downstream uni-modal tasks and avoid the need to train a new model from scratch. So can such pre-trained models be applied to multi-modal tasks? Researchers have explored this problem and made significant progress. This paper surveys recent advances and new frontiers in vision-language pre-training (VLP), including image-text and video-text pre-training. To give readers a better overall grasp of VLP, we first review its recent advances from five aspects: feature extraction, model architecture, pre-training objectives, pre-training datasets, and downstream tasks. Then, we summarize the specific VLP models in detail. Finally, we discuss the new frontiers in VLP. To the best of our knowledge, this is the first survey focused on VLP. We hope that this survey can shed light on future research in the VLP field.
June 2022: "Top 40" New CRAN Packages
One hundred eighty-nine new packages made it to CRAN in June. Here are my “Top 40” selections in eleven categories: Computational Methods, Data, Ecology, Genomics, Machine Learning, Mathematics, Medicine, Statistics, Time Series, Utilities, and Visualizations.
Computational Methods
itp v1.2.0: Implements the interpolate, truncate, project root-finding algorithm developed by Oliveira & Takahashi (2021). The vignette provides an overview.
QR v0.1.3: Provides a function to perform QR factorization without pivoting on a real or complex matrix. It is based on LAPACK. See the vignette.
qsplines v1.0.0: Provides functions to create quaternion splines. See Barry & Goldman (1988) and Kochanek & Bartels (1984) for the details and look here for an example.
VMDecomp v1.0.1: Implements the variational mode decomposition and two-dimensional variational mode decomposition algorithms. See Dragomiretskiy & Zosso (2014) for background and the vignette for examples.
Data
cmhc v0.2.0: Implements a wrapper around the Canadian Mortgage and Housing Corporation web interface and enables programmatic and reproducible access to a wide variety of housing data. See the vignette for examples.
EDIutils v1.0.1: Implements a client for the Environmental Data Initiative repository REST API and provides access to ecological data and metadata. There are five short vignettes: Evaluate & upload, Citation Metrics, Download Metrics, Search and access, and Tests.
globaltrends v0.0.12: Provides functions to access global search volumes from the Google Trends portal. This working paper outlines the package's methodological foundations and potential applications. See the vignette to get started.
kaigiroku v0.5: Allows users to search and download data from the API for Japanese Diet proceedings. Look here for examples.
NasdaqDataLink v1.0.0: Provides functions to interact directly with the Nasdaq Data Link API and obtain data in a number of formats. Look here for API documentation and here for package information.
stortingscrape v0.1.1: Provides functions for retrieving data from the Norwegian Parliament through the Norwegian Parliament API. See the vignette for an introduction.
Ecology
PointedSDMs v1.0.6: Provides tools to build integrated species distribution models, including tools for spatial cross-validation and plotting. See Isaac et al. (2020) for an introduction to the methods. There is a Setophaga Example and an example for the Solitary Tinamou.
restoptr v1.0.1: Implements a flexible framework for ecological restoration planning that aims to identify priority areas for restoration efforts using optimization algorithms described in Justeau-Allaire et al. (2021). See the vignette to get started.
Genomics
scapGNN v0.1.1: Implements a single cell active pathway analysis tool based on the graph neural network algorithm described in Scarselli et al. (2009) and Kipf & Welling (2017). This may be used to construct a gene-cell association network, infer pathway activity scores from data of different single cell modalities, and more. See the vignette for an overview and examples.
SRTsim v0.99.2: Implements an independent, reproducible, and flexible Spatially Resolved Transcriptomics simulation framework that can be used to facilitate the development of analytical methods and for a wide variety of SRT-specific analyses. See the vignette.
xQTLbiolinks v1.1.1: Implements tools to query, download, and visualize molecular quantitative trait locus and gene expression data from public resources through the GTEx API. There is a Quick Start Guide and vignettes on Colocalization, Specificity, and Visualization.
Machine Learning
agua v0.0.1: Enables users to specify h2o as an engine for several tidymodels modeling methods. See README for examples.
MagmaClustR v1.0.0: Implements two main algorithms, called Magma (Leroy et al. (2022)) and MagmaClust (Leroy et al. (2020)), using a multi-task Gaussian process (GP) model to perform predictions for supervised learning problems. See README for examples.
openai v0.1.0: Provides a wrapper for OpenAI API endpoints including engines, completions, edits, files, fine-tunes, embeddings and legacy searches, classifications, and answers endpoints. See README to get started.
sketching v0.1.0: Provides functions to construct sketches of data via random subspace embeddings. See Lee & Ng (2022) for the theory and the vignette for examples.
webmorphR v0.1.1: Provides functions to create reproducible image stimuli, specialised for face images with psychomorph or webmorph templates. See README to get started.
Mathematics
GeneralizedWendland v0.5-2: Implements the fully parameterized generalized Wendland covariance function for use in Gaussian process models, as well as multiple methods for approximating it via covariance interpolation. The available methods are linear interpolation, polynomial interpolation, and cubic spline interpolation. See Bevilacqua et al. (2022) and the vignette for examples.
jacobi v2.0.0: Evaluates Jacobi theta functions and related functions including the Weierstrass elliptic function, the Weierstrass sigma function, the Weierstrass zeta function, the Klein j-function, the Dedekind eta function, the lambda modular function, Jacobi elliptic functions, Neville theta functions, and the Eisenstein series for real and complex variables. Look here for some images.
Medicine
clinicalsignificance v1.0.0: Implements the clinical significance algorithm proposed by Jacobson et al. (1984) to determine if an intervention has a meaningful practical effect. There is a Getting Started Guide and vignettes on Cutoffs and Plots.
PlatformDesign v1.0.1: Provides functions to calculate design parameters for an optimal two-period, multi-arm platform design allowing pre-planned deferred arms to be added during the trial. See Dunnett (1955) for background and the vignette for some theory and examples.
Statistics
bayesassurance v0.1.0: Provides functions to compute Bayesian assurance under various settings characterized by different assumptions and objectives, including precision-based conditions, credible intervals, and goal functions. See Pan & Banerjee (2021) for the theory. There are vignettes for using closed form solutions, the conjugate linear model, and precision based conditions.
DSSP v0.1.1: Provides functions to draw samples from the direct sampling spatial prior model as described in White, Sun, & Speckman (2019). See the vignette for examples.
edibble v0.1.0: Implements a system to facilitate designing comparative experiments using the grammar of experimental designs. See the edibble-book for documentation.
mixgb v0.1.0: Implements a method for multiple imputation using XGBoost, bootstrapping and predictive mean matching as described in Deng and Lumley (2021). There is an Introduction and a vignette on Imputing new data with a saved imputer.
outerbase v0.1.0: Implements a new method for high-dimensional regression using outer product models. See Plumlee (2014) and Plumlee et al. (2021) for background. There is a Getting started guide, a Base walkthrough, and vignettes on Learning from data and Speeding up inference.
PFIM v5.0: Provides functions to evaluate or optimize designs for nonlinear mixed effects models using the Fisher Information matrix. See Mentré, Mallet & Baccar (1997) and Retout et al. (2007) for background and the vignettes Design evaluation and optimization (01), Design evaluation and optimization (02), and Library of models for examples.
VirtualPop v1.0.2: Provides functions to generate lifespans and fertility histories in continuous time using individual-level state transition (multi-state) models and data. See the vignettes on Simulation of life histories, Sampling from waiting time distributions, Simulation of individual fertility careers, and Validation.
Time Series
kssa v0.0.1: Implements the known sub-sequence algorithm described in Benavides et al. (2022), which helps to automatically identify and validate the best method for missing data imputation in a time series. Look here for examples.
ts2net v0.1.0: Implements methods to transform time series into networks, a technique which may be useful for complex systems modeling, time series data mining, or time series analysis using networks. For an introduction to the topic and descriptions of the methods see Mitchell (2006), Silva & Zhao (2016), and Silva et al. (2021). See README to get started.
Utilities
cppcheckR: Allows users to run Cppcheck on C/C++ files as an R command or an RStudio addin. See README.
gtExtras v0.4.1: Provides additional functions for creating tables with gt. See README for examples.
Visualization
ggpie v0.2.2: Provides functions for creating pie, donut and rose pie plots with ggplot2. See the vignette.
ggtrace v0.2.0: Provides ggplot2 geoms that allow groups of data points to be outlined or highlighted for emphasis. See the vignettes Trace lines and Trace points.
Morphoscape v1.0.0: Implements adaptive landscape methods first described by Polly et al. (2016) for the integration, analysis and visualization of biological trait data on a phenotypic morphospace, which is typically defined by shape metrics. See the vignette.
r3js v0.0.1: Provides R and JavaScript functions to allow WebGL-based 3D plotting using the three.js library. See the vignettes: Getting Started, Creating a plot from scratch, and Grouping plot elements.
rgl2gltf v1.0.0: Provides functions to work with glTF files, which are used to describe 3D models. See the vignette for examples.
shapviz v0.2.0: Provides functions to visualize SHapley Additive exPlanations (SHAP), such as waterfall plots, force plots, various types of importance plots, and dependence plots. See Lundberg & Lee (2017) for background and the vignette for examples.
Learning Disentangled Representations in the Imaging Domain
Liu, Xiao, Sanchez, Pedro, Thermos, Spyridon, O'Neil, Alison Q., Tsaftaris, Sotirios A.
Disentangled representation learning has been proposed as an approach to learning general representations even in the absence of, or with limited, supervision. A good general representation can be fine-tuned for new target tasks using modest amounts of data, or used directly in unseen domains achieving remarkable performance in the corresponding task. This alleviation of the data and annotation requirements offers tantalising prospects for applications in computer vision and healthcare. In this tutorial paper, we motivate the need for disentangled representations, revisit key concepts, and describe practical building blocks and criteria for learning such representations. We survey applications in medical imaging emphasising choices made in exemplar key works, and then discuss links to computer vision applications. We conclude by presenting limitations, challenges, and opportunities.
"Do you follow me?": A Survey of Recent Approaches in Dialogue State Tracking
Jacqmin, Léo, Rojas-Barahona, Lina M., Favre, Benoit
While communicating with a user, a task-oriented dialogue system has to track the user's needs at each turn according to the conversation history. This process, called dialogue state tracking (DST), is crucial because it directly informs the downstream dialogue policy. DST has received a lot of interest in recent years, with the text-to-text paradigm emerging as the favored approach. In this review paper, we first present the task and its associated datasets. Then, considering a large number of recent publications, we identify highlights and advances of research in 2021-2022. Although neural approaches have enabled significant progress, we argue that some critical aspects of dialogue systems, such as generalizability, are still underexplored. To motivate future studies, we propose several research avenues.
When Machine Learning Meets Privacy: A Survey and Outlook: ACM Computing Surveys: Vol 54, No 2
Newly emerged machine learning methods (e.g., deep learning) have become a strong driving force revolutionizing a wide range of industries, such as smart healthcare, financial technology, and surveillance systems. Meanwhile, privacy has emerged as a major concern in this machine learning-based artificial intelligence era. It is important to note that the problem of privacy preservation in the context of machine learning is quite different from that in traditional data privacy protection, as machine learning can act as both friend and foe. Currently, work on privacy preservation and machine learning is still in its infancy, as most existing solutions focus only on privacy problems during the machine learning process. Therefore, a comprehensive study of privacy preservation problems and machine learning is required. This article surveys the state of the art in privacy issues and solutions for machine learning.
Federated Learning for IoUT: Concepts, Applications, Challenges and Opportunities
Victor, Nancy, C, Rajeswari., Alazab, Mamoun, Bhattacharya, Sweta, Magnusson, Sindri, Maddikunta, Praveen Kumar Reddy, Ramana, Kadiyala, Gadekallu, Thippa Reddy
The Internet of Underwater Things (IoUT) has gained rapid momentum over the past decade, with applications ranging from environmental monitoring and exploration to defence. Traditional IoUT systems use machine learning (ML) approaches that cater to the needs of reliability, efficiency and timeliness. However, an extensive review of the various studies conducted highlights the significance of data privacy and security in IoUT frameworks as a predominant factor in achieving desired outcomes in mission-critical applications. Federated learning (FL) is a secure, decentralized framework and a recent development in machine learning that will help address the challenges faced by conventional ML approaches in IoUT. This paper presents an overview of the various applications of FL in IoUT, its challenges and open issues, and indicates directions for future research.
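As a rough illustration of what "decentralized" means here, the sketch below shows federated averaging, one common FL aggregation scheme, on a toy linear regression problem. The abstract does not name any specific algorithm, so the choice of federated averaging, the three simulated clients, the learning rate, and all function names are assumptions made for this example only.

```python
import numpy as np


def local_update(weights, features, targets, lr=0.1, steps=5):
    """Run a few gradient steps of least-squares regression on one client's data."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2.0 * features.T @ (features @ w - targets) / len(targets)
        w -= lr * grad
    return w


def federated_average(client_weights, client_sizes):
    """Average client models, weighting each by its number of local samples."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()


rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Hypothetical clients (e.g., three sensor nodes) holding private, noise-free data.
clients = []
for n in (30, 50, 20):
    x = rng.normal(size=(n, 2))
    clients.append((x, x @ true_w))

global_w = np.zeros(2)
for _ in range(20):  # communication rounds
    updates = [local_update(global_w, x, y) for x, y in clients]
    global_w = federated_average(updates, [len(y) for _, y in clients])
```

Only model weights leave each client in this loop; the raw `(x, y)` data never does, which is the property the abstract points to when it contrasts FL with conventional ML.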
Unsupervised Frequent Pattern Mining for CEP
Complex Event Processing (CEP) is a set of methods that allow efficient knowledge extraction from massive data streams using complex and highly descriptive patterns. Numerous applications, such as online finance, healthcare monitoring and fraud detection, use CEP technologies to capture critical alerts, potential threats, or vital notifications in real time. As of today, in many fields, patterns are manually defined by human experts. However, desired patterns often contain convoluted relations that are difficult for humans to detect, and human expertise is scarce in many domains. We present REDEEMER (REinforcement baseD cEp pattErn MinER), a novel reinforcement and active learning approach aimed at mining CEP patterns that allow expansion of the knowledge extracted while reducing the human effort required. This approach includes a novel policy gradient method for vast multivariate spaces and a new way to combine reinforcement and active learning for CEP rule learning while minimizing the number of labels needed for training. REDEEMER aims to enable CEP integration in domains that could not utilize it before. To the best of our knowledge, REDEEMER is the first system that suggests new CEP rules that were not observed beforehand, and is the first method aimed at increasing pattern knowledge in fields where experts do not possess sufficient information required for CEP tools. Our experiments on diverse datasets demonstrate that REDEEMER is able to extend pattern knowledge while outperforming several state-of-the-art reinforcement learning methods for pattern mining.
Graph Neural Networks to Predict Sports Outcomes
Xenopoulos, Peter, Silva, Claudio
Predicting outcomes in sports is important for teams, leagues, bettors, media, and fans. Given the growing amount of player tracking data, sports analytics models are increasingly utilizing spatially-derived features built upon player tracking data. However, player-specific information, such as location, cannot readily be included as features themselves, since common modeling techniques rely on vector input. Accordingly, spatially-derived features are commonly constructed in relation to anchor objects, such as the distance to a ball or goal, through global feature aggregations, or via role-assignment schemes, where players are designated a distinct role in the game. In doing so, we sacrifice inter-player and local relationships in favor of global ones. To address this issue, we introduce a sport-agnostic graph-based representation of game states. We then use our proposed graph representation as input to graph neural networks to predict sports outcomes. Our approach preserves permutation invariance and allows for flexible player interaction weights. We demonstrate how our method provides statistically significant improvements over the state of the art for prediction tasks in both American football and esports, reducing test set loss by 9% and 20%, respectively. Additionally, we show how our model can be used to answer "what if" questions in sports and to visualize relationships between players.
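The permutation invariance mentioned above can be illustrated with a small NumPy sketch: players are nodes, pairwise interaction weights form an adjacency matrix, and one round of weighted neighbor aggregation followed by mean pooling yields a game-state embedding that does not depend on player ordering. This is not the authors' model; the inverse-distance edge weights, the single layer, the dimensions, and the function name are hypothetical choices for illustration.

```python
import numpy as np


def graph_readout(node_feats, adj, weight):
    """One message-passing step followed by order-independent mean pooling."""
    messages = adj @ node_feats                  # weighted sum over all players
    hidden = np.maximum(messages @ weight, 0.0)  # linear map + ReLU
    return hidden.mean(axis=0)                   # pool over players


rng = np.random.default_rng(0)
num_players, feat_dim, hidden_dim = 10, 4, 8

node_feats = rng.normal(size=(num_players, feat_dim))

# Assumed interaction weights: closer players interact more strongly.
positions = rng.uniform(0.0, 100.0, size=(num_players, 2))
dist = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
adj = 1.0 / (1.0 + dist)

weight = rng.normal(size=(feat_dim, hidden_dim))

# Relabel the players: rows of the features and rows/columns of the adjacency
# are permuted consistently.
perm = rng.permutation(num_players)
out_original = graph_readout(node_feats, adj, weight)
out_permuted = graph_readout(node_feats[perm], adj[np.ix_(perm, perm)], weight)

same = np.allclose(out_original, out_permuted)
```

A plain feature vector that concatenates player coordinates would change under the same relabeling, which is the limitation of vector-input models that the abstract describes.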