Performance Analysis
End-to-End Intelligent Framework for Rockfall Detection
Zoumpekas, Thanasis, Puig, Anna, Salamó, Maria, García-Sellés, David, Nuñez, Laura Blanco, Guinau, Marta
Rockfall detection is a crucial procedure in the field of geology, which helps to reduce the associated risks. Currently, geologists identify rockfall events almost manually utilizing point cloud and imagery data obtained from different caption devices such as Terrestrial Laser Scanner or digital cameras. Multi-temporal comparison of the point clouds obtained with these techniques requires a tedious visual inspection to identify rockfall events which implies inaccuracies that depend on several factors such as human expertise and the sensibility of the sensors. This paper addresses this issue and provides an intelligent framework for rockfall event detection for any individual working in the intersection of the geology domain and decision support systems. The development of such an analysis framework poses significant research challenges and justifies intensive experimental analysis. In particular, we propose an intelligent system that utilizes multiple machine learning algorithms to detect rockfall clusters of point cloud data. Due to the extremely imbalanced nature of the problem, a plethora of state-of-the-art resampling techniques accompanied by multiple models and feature selection procedures are being investigated. Various machine learning pipeline combinations have been benchmarked and compared applying well-known metrics to be incorporated into our system. Specifically, we developed statistical and machine learning techniques and applied them to analyze point cloud data extracted from Terrestrial Laser Scanner in two distinct case studies, involving different geological contexts: the basaltic cliff of Castellfollit de la Roca and the conglomerate Montserrat Massif, both located in Spain. Our experimental data suggest that some of the above-mentioned machine learning pipelines can be utilized to detect rockfall incidents on mountain walls, with experimentally proven accuracy.
Technical Challenges for Training Fair Neural Networks
Cherepanova, Valeriia, Nanda, Vedant, Goldblum, Micah, Dickerson, John P., Goldstein, Tom
As machine learning algorithms have been widely deployed across applications, many concerns have been raised over the fairness of their predictions, especially in high stakes settings (such as facial recognition and medical imaging). To respond to these concerns, the community has proposed and formalized various notions of fairness as well as methods for rectifying unfair behavior. While fairness constraints have been studied extensively for classical models, the effectiveness of methods for imposing fairness on deep neural networks is unclear. In this paper, we observe that these large models overfit to fairness objectives, and produce a range of unintended and undesirable consequences. We conduct our experiments on both facial recognition and automated medical diagnosis datasets using state-of-the-art architectures.
Markov model with machine learning integration for fraud detection in health insurance
Gupta, Rohan Yashraj, Mudigonda, Satya Sai, Baruah, Pallav Kumar, Kandala, Phani Krishna
Fraud has led to a huge addition of expenses in health insurance sector in India. The work is aimed to provide methods applied to health insurance fraud detection. The work presents two approaches - a markov model and an improved markov model using gradient boosting method in health insurance claims. The dataset 382,587 claims of which 38,082 claims are fraudulent. The markov based model gave the accuracy of 94.07% with F1-score at 0.6683. However, the improved markov model performed much better in comparison with the accuracy of 97.10% and F1-score of 0.8546. It was observed that the improved markov model gave much lower false positives compared to markov model.
Causal Discovery of a River Network from its Extremes
Tran, Ngoc Mai, Buck, Johannes, Klüppelberg, Claudia
Causal inference for extremes has only be considered during the past few years. That observations of climate extremes such as floods, hurricanes, and droughts, but also man-made catastrophes like industry fire, terrorist attacks, or crashes of financial markets have been in the focus of research is convincingly documented in the journal Extremes. On the other hand, it is a fundamental problem to assess causality of risks. Often rare events are interconnected; for example, floods disseminate through a river network, and credit markets might fail due to some endogenous systemic risk propagation. Hence, it is necessary to not only understand dependencies between rare events, but also their causal structure.
Quadric hypersurface intersection for manifold learning in feature space
Pavutnitskiy, Fedor, Ivanov, Sergei O., Abramov, Evgeny, Borovitskiy, Viacheslav, Klochkov, Artem, Vialov, Viktor, Zaikovskii, Anatolii, Petiushko, Aleksandr
The knowledge that data lies close to a particular submanifold of the ambient Euclidean space may be useful in a number of ways. For instance, one may want to automatically mark any point far away from the submanifold as an outlier, or to use its geodesic distance to measure similarity between points. Classical problems for manifold learning are often posed in a very high dimension, e.g. for spaces of images or spaces of representations of words. Today, with deep representation learning on the rise in areas such as computer vision and natural language processing, many problems of this kind may be transformed into problems of moderately high dimension, typically of the order of hundreds. Motivated by this, we propose a manifold learning technique suitable for moderately high dimension and large datasets. The manifold is learned from the training data in the form of an intersection of quadric hypersurfaces -- simple but expressive objects. At test time, this manifold can be used to introduce an outlier score for arbitrary new points and to improve a given similarity metric by incorporating learned geometric structure into it.
Fairness-Aware Learning from Corrupted Data
Konstantinov, Nikola, Lampert, Christoph H.
Addressing fairness concerns about machine learning models is a crucial step towards their long-term adoption in real-world automated systems. While many approaches have been developed for training fair models from data, little is known about the effects of data corruption on these methods. In this work we consider fairness-aware learning under arbitrary data manipulations. We show that an adversary can force any learner to return a biased classifier, with or without degrading accuracy, and that the strength of this bias increases for learning problems with underrepresented protected groups in the data. We also provide upper bounds that match these hardness results up to constant factors, by proving that two natural learning algorithms achieve order-optimal guarantees in terms of both accuracy and fairness under adversarial data manipulations.
CASA-Based Speaker Identification Using Cascaded GMM-CNN Classifier in Noisy and Emotional Talking Conditions
Nassif, Ali Bou, Shahin, Ismail, Hamsa, Shibani, Nemmour, Nawel, Hirose, Keikichi
This work aims at intensifying text-independent speaker identification performance in real application situations such as noisy and emotional talking conditions. This is achieved by incorporating two different modules: a Computational Auditory Scene Analysis CASA based pre-processing module for noise reduction and cascaded Gaussian Mixture Model Convolutional Neural Network GMM-CNN classifier for speaker identification followed by emotion recognition. This research proposes and evaluates a novel algorithm to improve the accuracy of speaker identification in emotional and highly-noise susceptible conditions. Experiments demonstrate that the proposed model yields promising results in comparison with other classifiers when Speech Under Simulated and Actual Stress SUSAS database, Emirati Speech Database ESD, the Ryerson Audio-Visual Database of Emotional Speech and Song RAVDESS database and the Fluent Speech Commands database are used in a noisy environment.
UniToPatho, a labeled histopathological dataset for colorectal polyps classification and adenoma dysplasia grading
Barbano, Carlo Alberto, Perlo, Daniele, Tartaglione, Enzo, Fiandrotti, Attilio, Bertero, Luca, Cassoni, Paola, Grangetto, Marco
Histopathological characterization of colorectal polyps allows to tailor patients' management and follow up with the ultimate aim of avoiding or promptly detecting an invasive carcinoma. Colorectal polyps characterization relies on the histological analysis of tissue samples to determine the polyps malignancy and dysplasia grade. Deep neural networks achieve outstanding accuracy in medical patterns recognition, however they require large sets of annotated training images. We introduce UniToPatho, an annotated dataset of 9536 hematoxylin and eosin (H&E) stained patches extracted from 292 whole-slide images, meant for training deep neural networks for colorectal polyps classification and adenomas grading. We present our dataset and provide insights on how to tackle the problem of automatic colorectal polyps characterization.
Causal Inference for Time series Analysis: Problems, Methods and Evaluation
Moraffah, Raha, Sheth, Paras, Karami, Mansooreh, Bhattacharya, Anchit, Wang, Qianru, Tahir, Anique, Raglin, Adrienne, Liu, Huan
Time series data is a collection of chronological observations which is generated by several domains such as medical and financial fields. Over the years, different tasks such as classification, forecasting, and clustering have been proposed to analyze this type of data. Time series data has been also used to study the effect of interventions over time. Moreover, in many fields of science, learning the causal structure of dynamic systems and time series data is considered an interesting task which plays an important role in scientific discoveries. Estimating the effect of an intervention and identifying the causal relations from the data can be performed via causal inference. Existing surveys on time series discuss traditional tasks such as classification and forecasting or explain the details of the approaches proposed to solve a specific task. In this paper, we focus on two causal inference tasks, i.e., treatment effect estimation and causal discovery for time series data, and provide a comprehensive review of the approaches in each task. Furthermore, we curate a list of commonly used evaluation metrics and datasets for each task and provide in-depth insight. These metrics and datasets can serve as benchmarks for research in the field.
Scale Normalized Image Pyramids with AutoFocus for Object Detection
Singh, Bharat, Najibi, Mahyar, Sharma, Abhishek, Davis, Larry S.
We present an efficient foveal framework to perform object detection. A scale normalized image pyramid (SNIP) is generated that, like human vision, only attends to objects within a fixed size range at different scales. Such a restriction of objects' size during training affords better learning of object-sensitive filters, and therefore, results in better accuracy. However, the use of an image pyramid increases the computational cost. Hence, we propose an efficient spatial sub-sampling scheme which only operates on fixed-size sub-regions likely to contain objects (as object locations are known during training). The resulting approach, referred to as Scale Normalized Image Pyramid with Efficient Resampling or SNIPER, yields up to 3 times speed-up during training. Unfortunately, as object locations are unknown during inference, the entire image pyramid still needs processing. To this end, we adopt a coarse-to-fine approach, and predict the locations and extent of object-like regions which will be processed in successive scales of the image pyramid. Intuitively, it's akin to our active human-vision that first skims over the field-of-view to spot interesting regions for further processing and only recognizes objects at the right resolution. The resulting algorithm is referred to as AutoFocus and results in a 2.5-5 times speed-up during inference when used with SNIP.