Tensor Completion for Weakly-dependent Data on Graph for Metro Passenger Flow Prediction
Li, Ziyue, Sergin, Nurettin Dorukhan, Yan, Hao, Zhang, Chen, Tsung, Fugee
Low-rank tensor decomposition and completion have attracted significant interest from academia given the ubiquity of tensor data. However, low rank is a global property, which may not hold when the data exhibit complex and weak dependencies given specific graph structures. One application that motivates this study is spatiotemporal data analysis. As shown in the preliminary study, weak dependencies can worsen low-rank tensor completion performance. In this paper, we propose a novel low-rank CANDECOMP/PARAFAC (CP) tensor decomposition and completion framework that introduces an $L_{1}$-norm penalty and a Graph Laplacian penalty to model weak dependencies on graphs. We further propose an optimization algorithm based on Block Coordinate Descent for efficient estimation. A case study on metro passenger flow data in Hong Kong demonstrates improved performance over regular tensor completion methods.
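As a rough illustration of what a Graph Laplacian penalty measures (a generic sketch, not the authors' implementation): for a factor matrix $A$ whose rows are indexed by graph nodes (for example metro stations, which is an assumption about how the graph is built) and a symmetric nonnegative weight matrix $W$, the penalty $\mathrm{tr}(A^\top L A)$ with $L = D - W$ equals $\tfrac{1}{2}\sum_{i,j} w_{ij}\lVert a_i - a_j\rVert^2$, so it is small when neighbouring nodes have similar factor rows. The function names, toy graph, and factor values below are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def laplacian(w: Matrix) -> Matrix:
    """Combinatorial graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(w)
    return [
        [(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
        for i in range(n)
    ]


def trace_penalty(a: Matrix, w: Matrix) -> float:
    """tr(A^T L A), computed as the sum over factor columns r of a_r^T L a_r."""
    lap = laplacian(w)
    n, rank = len(a), len(a[0])
    total = 0.0
    for r in range(rank):
        col = [a[i][r] for i in range(n)]
        total += sum(col[i] * lap[i][j] * col[j] for i in range(n) for j in range(n))
    return total


def pairwise_penalty(a: Matrix, w: Matrix) -> float:
    """Equivalent form: 1/2 * sum_{i,j} w_ij * ||a_i - a_j||^2."""
    n = len(a)
    return 0.5 * sum(
        w[i][j] * sum((x - y) ** 2 for x, y in zip(a[i], a[j]))
        for i in range(n)
        for j in range(n)
    )


# Hypothetical 3-node path graph 0 - 1 - 2 and a rank-2 factor matrix.
W = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
A = [[1.0, 0.0], [1.0, 1.0], [3.0, 1.0]]
print(trace_penalty(A, W), pairwise_penalty(A, W))  # -> 5.0 5.0
```

Both forms agree; the pairwise form makes the smoothing intuition explicit, while the trace form is the one that usually appears in an objective alongside the CP reconstruction error and any $L_1$ term.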
The Use of Machine Learning and Big Five Personality Taxonomy to Predict Construction Workers' Safety Behaviour
Gao, Yifan, Gonzalez, Vicente A., Yiu, Tak Wing, Cabrera-Guerrerod, Guillermo
Research has found that many occupational accidents are foreseeable: viewed retrospectively, they result from people's unsafe behaviour. Predicting workers' safety behaviour would provide prior insight into each worker's behavioural tendencies, inform the design of management practices before accidents occur, and help reduce injury rates. In recent years, researchers have found that people have stable predispositions to engage in certain safety behaviour patterns, which vary among individuals as a function of personality traits. In this study, an innovative forecasting model that employs machine learning algorithms is developed to estimate construction workers' behavioural tendencies based on the Big Five personality taxonomy. The data-driven nature of machine learning enabled a reliable estimate of the personality-safety behaviour relationship, which allowed this study to offer the novel insight that the relationship between construction workers' personality traits and safety behaviour may be nonlinear. The developed model achieves satisfactory accuracy in explaining and predicting workers' safety behaviour. This finding provides empirical evidence that personality traits are effective predictors of people's safety behaviour at work. The study may also have practical implications: the model, which showed good prediction accuracy, can help identify vulnerable workers who are more prone to unsafe behaviour and is therefore potentially useful for decision making and safety management on construction sites.
Bitopological Duality for Algebras of Fitting's Logic and Natural Duality Extension
Das, Litan Kumar, Ray, Kumar Sankar
In this paper, we investigate a bitopological duality for algebras of Fitting's multi-valued logic. We also extend the natural duality theory for $\mathbb{ISP}_{I}(L)$ by developing a duality for $\mathbb{ISP}(L)$, where $L$ is a finite algebra whose underlying lattice is bounded distributive. Keywords: Bitopology, Fitting's logic, Natural duality theory. 1 Introduction. Stone's pioneering work in the mid-1930s [19] on the dual equivalence between the category of Boolean algebras and homomorphisms and the category of Stone spaces (compact zero-dimensional Hausdorff spaces) and continuous maps is considered the origin of duality theory. In 1937, Stone extended this work to the category of bounded distributive lattices [12]. In 1970, Priestley [18] developed another duality for the category of bounded distributive lattices using ordered Stone spaces (known as Priestley spaces), which overcame difficulties in Stone's work [12].
$\Sigma$-net: Ensembled Iterative Deep Neural Networks for Accelerated Parallel MR Image Reconstruction
Schlemper, Jo, Qin, Chen, Duan, Jinming, Summers, Ronald M., Hammernik, Kerstin
We explore an ensembled $\Sigma$-net for fast parallel MR imaging, comprising parallel coil networks, which perform implicit coil weighting, and sensitivity networks, which use explicit sensitivity maps. The networks in $\Sigma$-net are trained in a supervised manner with content and GAN losses, using various forms of data consistency: proximal mappings, gradient descent, and variable splitting. A semi-supervised fine-tuning scheme allows us to adapt to the k-space data at test time; although it generates the visually most textured and sharpest images, it decreases the quantitative metrics. For this challenge, we focused on robust and high SSIM scores, which we achieved by ensembling all models into a $\Sigma$-net.
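To make the gradient-descent flavour of data consistency concrete, here is a minimal single-coil sketch, not $\Sigma$-net's implementation. It assumes a 1-D signal, an orthonormal DFT as the forward operator $F$, and a binary undersampling mask $M$, and takes one gradient step on $\tfrac{1}{2}\lVert M(Fx - y)\rVert_2^2$. The function names and the step size `lam` are hypothetical; real parallel MR reconstruction additionally involves multiple coils and, for sensitivity networks, coil sensitivity maps.

```python
import cmath
from typing import List, Sequence

Vec = List[complex]


def dft(x: Sequence[complex]) -> Vec:
    """Orthonormal 1-D DFT (naive O(n^2)); its inverse equals its adjoint."""
    n = len(x)
    scale = 1 / n ** 0.5
    return [
        scale * sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(k: Sequence[complex]) -> Vec:
    """Inverse (= adjoint) of the orthonormal DFT above."""
    n = len(k)
    scale = 1 / n ** 0.5
    return [
        scale * sum(k[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n))
        for t in range(n)
    ]


def dc_gradient_step(
    x: Sequence[complex], y: Sequence[complex], mask: Sequence[bool], lam: float
) -> Vec:
    """One gradient step on 0.5 * ||M (F x - y)||^2: x - lam * F^H M (F x - y).

    Entries of y where mask is False are ignored (they were not measured).
    """
    kx = dft(x)
    residual = [kx[i] - y[i] if mask[i] else 0j for i in range(len(kx))]
    grad = idft(residual)
    return [xi - lam * gi for xi, gi in zip(x, grad)]


# Usage: start from a zero image and enforce consistency with the sampled k-space.
x_true = [1 + 0j, 2 + 0j, 0j, -1 + 0j]
y = dft(x_true)                      # hypothetical fully sampled measurements
mask = [True, False, True, False]    # keep every other k-space sample
x1 = dc_gradient_step([0j] * 4, y, mask, lam=1.0)
```

With an orthonormal $F$ and `lam = 1.0`, the step replaces the sampled k-space entries with the measurements and leaves the unsampled ones untouched, which coincides with hard data-consistency projection; smaller steps blend the current estimate with the measurements instead.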
Identifying Mislabeled Instances in Classification Datasets
Müller, Nicolas Michael, Markert, Karla
A key requirement for supervised machine learning is labeled training data, which is created by annotating unlabeled data with the appropriate class. Because in many cases this cannot be done by machines, labeling must be performed by human domain experts. This process is expensive in both time and money, and it is prone to errors. Moreover, manually reviewing an entire labeled dataset is often prohibitively costly, so many real-world datasets contain mislabeled instances. To address this issue, we present a non-parametric end-to-end pipeline to find mislabeled instances in numerical, image, and natural language datasets. We evaluate our system quantitatively by adding a small amount of label noise to 29 datasets, and show that it finds mislabeled instances with an average precision of more than 0.84 when reviewing the top 1% of its recommendations. We then apply our system to publicly available datasets and find mislabeled instances in CIFAR-100, Fashion-MNIST, and others. Finally, we publish the code and a usable implementation of our approach.
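The evaluation above reports precision when reviewing the top 1% of recommendations. A minimal sketch of such a metric, assuming (not confirmed by the abstract) that the pipeline assigns each instance a suspicion score and that the injected noise positions are known: rank by score, take the top fraction, and measure the share that is truly mislabeled. Averaging this value across datasets would give a figure comparable in spirit to the one reported. The function name and toy data are hypothetical.

```python
from typing import Sequence


def precision_at_top_fraction(
    scores: Sequence[float], is_mislabeled: Sequence[bool], fraction: float = 0.01
) -> float:
    """Share of truly mislabeled instances among the top `fraction` by suspicion score."""
    n = len(scores)
    k = max(1, round(fraction * n))  # always review at least one instance
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:k] if is_mislabeled[i]) / k


# Toy example: 10 instances, 2 with injected label noise, reviewing the top 30%.
scores = [0.95, 0.10, 0.80, 0.30, 0.70, 0.05, 0.20, 0.15, 0.40, 0.25]
flags = [True, False, False, False, True, False, False, False, False, False]
print(precision_at_top_fraction(scores, flags, fraction=0.3))
```

Here the top three scores belong to instances 0, 2, and 4, of which two are flagged, so the precision is 2/3.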
Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling
Wan, Yu, Yang, Baosong, Wong, Derek F., Chao, Lidia S., Du, Haihua, Ao, Ben C. H.
As a special machine translation task, dialect translation has two main characteristics: 1) the lack of a parallel training corpus; and 2) similar grammar on both sides of the translation. In this paper, we investigate how to exploit the commonality and diversity between dialects to build unsupervised translation models that access only monolingual data. Specifically, we leverage pivot-private embedding, layer coordination, and parameter sharing to model the commonality and diversity between source and target at the lexical, syntactic, and semantic levels. To examine the effectiveness of the proposed models, we collect a 20-million monolingual corpus for each of Mandarin and Cantonese, which are, respectively, the official language and the most widely used dialect in China. Experimental results reveal that our methods outperform rule-based conversion between simplified and traditional Chinese, as well as conventional unsupervised translation models, by over 12 BLEU points.
Callisto: Entropy-based test generation and data quality assessment for Machine Learning Systems
Udeshi, Sakshi, Jiang, Xingbin, Chattopadhyay, Sudipta
Machine Learning (ML) has seen massive progress in the last decade; as a result, there is a pressing need to validate ML-based systems. To this end, we propose, design, and evaluate CALLISTO, a novel test generation and data quality assessment framework. To the best of our knowledge, CALLISTO is the first black-box framework to leverage the uncertainty in predictions to systematically generate new test cases for ML classifiers. Our evaluation of CALLISTO on four real-world datasets reveals thousands of errors. We also show that leveraging prediction uncertainty can increase the number of erroneous test cases by up to a factor of 20 compared with testing that uses no such knowledge. CALLISTO can also detect low-quality data, which may include mislabelled instances, in the datasets. We conduct and present an extensive user study to validate CALLISTO's results in identifying low-quality data from four state-of-the-art real-world datasets.
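The abstract states that CALLISTO uses prediction uncertainty (its title names entropy) to guide test generation, but it does not give the procedure. As a hedged illustration of the underlying quantity only, the sketch below scores each input by the Shannon entropy of the classifier's predicted class probabilities and returns the most uncertain ones. It assumes the black-box classifier exposes class probabilities; `prediction_entropy` and `most_uncertain` are hypothetical names, and how CALLISTO turns such scores into new test cases is not shown.

```python
import math
from typing import List, Sequence


def prediction_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def most_uncertain(batch: Sequence[Sequence[float]], top_n: int) -> List[int]:
    """Indices of the top_n inputs whose predictions have the highest entropy."""
    return sorted(
        range(len(batch)), key=lambda i: prediction_entropy(batch[i]), reverse=True
    )[:top_n]


batch = [
    [0.98, 0.01, 0.01],  # confident prediction -> low entropy (about 0.16 bits)
    [0.34, 0.33, 0.33],  # near-uniform prediction -> high entropy (about 1.58 bits)
    [0.70, 0.20, 0.10],  # in between (about 1.16 bits)
]
print(most_uncertain(batch, 2))  # -> [1, 2]
```

Inputs near the classifier's decision boundaries receive high entropy, which is why uncertainty is a natural signal for prioritising where to look for erroneous predictions.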
Unwanted Advances in Higher Education: Uncovering Sexual Harassment Experiences in Academia with Text Mining
Karami, Amir, White, Cynthia Nicole, Ford, Kayla, Swan, Suzanne, Spinel, Melek Yildiz
Sexual harassment in academia is often a hidden problem because victims are usually reluctant to report their experiences. Recently, a web survey was developed to provide an opportunity to share thousands of sexual harassment experiences in academia. Using an efficient approach, this study collected and investigated more than 2,000 sexual harassment experiences to better understand these unwanted advances in higher education. This paper uses text mining to uncover hidden topics and explore their weight across three variables: harasser gender, institution type, and victim's field of study. We mapped the topics onto five themes drawn from the sexual harassment literature and found that more than 50% of the topics were assigned to the unwanted sexual attention theme. Fourteen percent of the topics fell in the gender harassment theme, in which insulting, sexist, or degrading comments or behavior were directed towards women. Five percent of the topics involved sexual coercion (a benefit is offered in exchange for sexual favors), 5% involved sex discrimination, and 7% discussed retaliation against the victim for reporting the harassment or simply for not complying with the harasser. The findings highlight the power differential between faculty and students, and the toll on students when professors abuse their power. While some topics differed by institution type, there were no differences in topics by harasser gender or field of study. This research can help researchers further investigate this paper's dataset and help policymakers improve existing policies to create a safe and supportive environment in academia.