
A Appendix for OPERA

Contents:
A.1 Datasets Overview
A.2 Implementation Details
A.3 Pretraining Results

Neural Information Processing Systems

A.1 Datasets Overview

We use 11 datasets in our benchmark; their statistics are summarized in Table 1 and Table 2 in the main paper. Each dataset comprises an audio set and accompanying metadata. All audio data are anonymized, and the metadata contain no personally identifiable information or offensive content. The COVID-19 Sounds dataset consists of 53,449 audio samples (over 552 hours in total) crowd-sourced from 36,116 participants through the COVID-19 Sounds app. The dataset is comprehensive in terms of demographics and spectrum of health conditions, and it provides participants' self-reported COVID-19 testing status, with 2,106 samples tested positive. It covers three modalities: breathing, cough, and voice recordings; only the breathing and cough modalities are used in this paper. The data were crowdsourced through the COVID-19 Sounds project, approved by the Ethics Committee of the Department of Computer Science and Technology at the University of Cambridge, and informed consent was obtained from all participants.
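To make the modality selection above concrete, here is a minimal sketch of filtering a dataset's metadata down to the breathing and cough recordings used in this paper. The file name and column names (participant_id, modality, covid_status) are hypothetical placeholders, not the actual COVID-19 Sounds schema.

```python
# Minimal sketch: select only the breathing and cough modalities from a
# metadata table. All file and column names below are hypothetical.
import pandas as pd

metadata = pd.read_csv("covid19_sounds_metadata.csv")  # hypothetical path

# Keep only the breathing and cough recordings; voice is excluded here.
used = metadata[metadata["modality"].isin(["breathing", "cough"])]

# Summary counts of the kind reported above.
print("samples:", len(used))
print("participants:", used["participant_id"].nunique())
print("positive samples:", (used["covid_status"] == "positive").sum())
```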


Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

Neural Information Processing Systems

Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet it remains under-explored. The main obstacle for these applications is the difficulty of collecting large labeled, task-specific datasets for model development. Generalizable respiratory acoustic foundation models pretrained on unlabeled data offer appealing advantages and could break this impasse. However, given the safety-critical nature of healthcare applications, it is pivotal to also ensure openness and replicability for any proposed foundation model solution. To this end, we introduce OPERA, an OPEn Respiratory Acoustic foundation model pretraining and benchmarking system, as the first approach to answer this need. We curate large-scale respiratory audio datasets (about 136K samples, over 400 hours), pretrain three pioneering generalizable acoustic models, and build a benchmark of 19 downstream respiratory health tasks for evaluation. Our pretrained models demonstrate superior performance (outperforming existing acoustic models pretrained with general audio on 16 out of 19 tasks) and generalizability (to unseen datasets and new respiratory audio modalities). This highlights the great promise of respiratory acoustic foundation models and encourages more studies to use OPERA as an open resource to accelerate research on respiratory audio for health. The OPERA website can be found at opera-benchmark.github.io.
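As an illustration of how such a benchmark is typically used, the sketch below evaluates a pretrained acoustic encoder with a linear probe on a downstream binary health-classification task. The `encode` function is a hypothetical stand-in for any pretrained model that maps a recording to a fixed-length embedding; the data and labels are synthetic placeholders, not an OPERA task.

```python
# Minimal sketch of a linear-probe evaluation of a pretrained acoustic
# encoder on a downstream respiratory health task (all data synthetic).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def encode(waveforms: np.ndarray) -> np.ndarray:
    """Placeholder encoder: one fixed-length embedding per recording."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(waveforms), 512))

# Hypothetical downstream task: one binary health label per recording.
X_train, y_train = encode(np.zeros((100, 16000))), np.tile([0, 1], 50)
X_test, y_test = encode(np.zeros((40, 16000))), np.tile([0, 1], 20)

# The encoder stays frozen; only a linear classifier is fit on embeddings.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUROC:", roc_auc_score(y_test, probe.predict_proba(X_test)[:, 1]))
```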




A Surprisingly Simple Approach to Generalized Few-Shot Semantic Segmentation

Neural Information Processing Systems

The goal of generalized few-shot semantic segmentation (GFSS) is to recognize novel-class objects through training with a few annotated examples and a base-class model that has learned the base classes. Unlike classic few-shot semantic segmentation, GFSS aims to classify pixels into both base and novel classes, making it a more practical setting. Current GFSS methods rely on several techniques, such as combinations of customized modules, carefully designed loss functions, meta-learning, and transductive learning. However, we found that a simple rule combined with standard supervised learning substantially improves GFSS performance. In this paper, we propose a simple yet effective method for GFSS that does not use the techniques mentioned above. We also theoretically show that our method perfectly maintains the segmentation performance of the base-class model over most of the base classes. Through numerical experiments, we demonstrate the effectiveness of our method: it improves novel-class segmentation performance in the 1-shot scenario by 6.1% on the PASCAL-5i benchmark.
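The following is a generic sketch of the GFSS setting rather than the exact rule proposed in the paper: the base-class model is frozen, a small novel-class head is trained with standard supervised learning on the few annotated examples, and the two sets of per-pixel logits are combined at inference time. The backbone, heads, and tensor shapes are all hypothetical placeholders.

```python
# Generic GFSS-style sketch (not the paper's method): freeze the base-class
# model, train only a novel-class head, and concatenate per-pixel logits.
import torch
import torch.nn as nn

n_base, n_novel, feat_dim = 15, 5, 256

backbone = nn.Conv2d(3, feat_dim, kernel_size=3, padding=1)  # frozen feature extractor (placeholder)
base_head = nn.Conv2d(feat_dim, n_base, kernel_size=1)       # frozen base-class classifier
novel_head = nn.Conv2d(feat_dim, n_novel, kernel_size=1)     # trained on the few novel-class examples

for p in list(backbone.parameters()) + list(base_head.parameters()):
    p.requires_grad = False  # base-class behaviour is left untouched

image = torch.randn(1, 3, 64, 64)
features = backbone(image)
logits = torch.cat([base_head(features), novel_head(features)], dim=1)
prediction = logits.argmax(dim=1)  # per-pixel label over base + novel classes
```

Because the base-class parameters never change, predictions restricted to base classes are identical to those of the original model, which is the intuition behind preserving base-class performance.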



Response to Reviewer 1: A common bias is that meta-learning should tackle transfer learning or few-shot learning; the goal of our paper is to improve general supervised learning performance via meta-learning

Neural Information Processing Systems

Table 1: Updated results for regression.

As pointed out by ICLR 2019 AnonReviewer3 of the MAXL paper, "Moreover, since the method is not a meta-learning [...]". To facilitate experiments, we resize images to 64x64 resolution. For regression results, we provide kNN results in Table 1, which are [...] ImageNet. We hope our response can address most of your concerns and sincerely hope you can reconsider your score.

Response to Reviewer 2: In fact, we did not observe optimization difficulties when training all variables together due [...]. Besides, our model is not sensitive to the choice of datasets.
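For context on the kNN baseline mentioned above, here is a minimal sketch under assumed details: images are resized to 64x64, flattened, and passed to scikit-learn's KNeighborsRegressor. The data, regression targets, train/test split, and choice of k are placeholders, not the settings used in the rebuttal.

```python
# Minimal sketch of a kNN regression baseline on resized images
# (synthetic data and arbitrary hyperparameters).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
images = rng.random((200, 64, 64, 3))   # assumed already resized to 64x64
targets = rng.random(200)               # hypothetical regression targets

X = images.reshape(len(images), -1)     # flatten each image into a vector
knn = KNeighborsRegressor(n_neighbors=5).fit(X[:150], targets[:150])
print("test MSE:", np.mean((knn.predict(X[150:]) - targets[150:]) ** 2))
```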