Accuracy
TextDefense: Adversarial Text Detection based on Word Importance Entropy
Shen, Lujia, Zhang, Xuhong, Ji, Shouling, Pu, Yuwen, Ge, Chunpeng, Yang, Xing, Feng, Yanghe
Natural language processing (NLP) models are now widely used in various scenarios. However, NLP models, like all deep models, are vulnerable to adversarially generated text. Numerous works have sought to mitigate this vulnerability to adversarial attacks. Nevertheless, existing works offer no comprehensive defense: each either targets a specific attack category or suffers from limitations such as computation overhead or a lack of resistance to adaptive attacks. In this paper, we exhaustively investigate the adversarial attack algorithms in NLP, and our empirical studies have discovered that the attack algorithms mainly disrupt the importance distribution of words in a text. A well-trained model can distinguish subtle importance distribution differences between clean and adversarial texts. Based on this intuition, we propose TextDefense, a new adversarial example detection framework that utilizes the target model's capability to defend against adversarial attacks while requiring no prior knowledge. TextDefense differs from previous approaches in that it utilizes the target model for detection and is thus attack-type agnostic. Our extensive experiments show that TextDefense can be applied to different architectures, datasets, and attack methods and outperforms existing methods. We also discover that the leading factor influencing the performance of TextDefense is the target model's generalizability. By analyzing the property of the target model and the property of the adversarial example, we provide our insights into the adversarial attacks in NLP and the principles of our defense method.
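As a rough illustration of the "word importance entropy" named in the title, the sketch below computes the Shannon entropy of a normalized word-importance distribution. The abstract does not say how importance scores are obtained or how the entropy is turned into a detection decision, so the function name `importance_entropy`, the use of absolute scores, and the example score lists are all assumptions made for illustration only.

```python
import math


def importance_entropy(scores):
    """Shannon entropy (in nats) of normalized absolute importance scores.

    `scores` is a hypothetical list with one importance value per word,
    e.g. the change in the target model's confidence when that word is
    removed. How such scores are derived is an assumption here.
    """
    weights = [abs(s) for s in scores]
    total = sum(weights)
    if total == 0:
        # No word carries any importance: treat the distribution as uniform.
        n = len(weights)
        return math.log(n) if n > 0 else 0.0
    probs = [w / total for w in weights]
    # Terms with p == 0 contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)


# Importance spread evenly over four words versus concentrated on one word.
flat = importance_entropy([0.25, 0.25, 0.25, 0.25])
peaked = importance_entropy([0.97, 0.01, 0.01, 0.01])
```

An evenly spread distribution gives the maximum entropy for its length, while a distribution dominated by one word gives a much lower value. A detector along these lines would compare such a statistic between clean and suspect inputs; the direction and threshold of that comparison are not stated in the abstract.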
SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection
Barbalau, Antonio, Ionescu, Radu Tudor, Georgescu, Mariana-Iuliana, Dueholm, Jacob, Ramachandra, Bharathkumar, Nasrollahi, Kamal, Khan, Fahad Shahbaz, Moeslund, Thomas B., Shah, Mubarak
A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in literature. Due to its highly accurate results, the method attracted the attention of many researchers. In this work, we revisit the self-supervised multi-task learning framework, proposing several updates to the original method. First, we study various detection methods, e.g. based on detecting high-motion regions using optical flow or background subtraction, since we believe the currently used pre-trained YOLOv3 is suboptimal, e.g. objects in motion or objects from unknown classes are never detected. Second, we modernize the 3D convolutional backbone by introducing multi-head self-attention modules, inspired by the recent success of vision transformers. As such, we alternatively introduce both 2D and 3D convolutional vision transformer (CvT) blocks. Third, in our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps through knowledge distillation, solving jigsaw puzzles, estimating body pose through knowledge distillation, predicting masked regions (inpainting), and adversarial learning with pseudo-anomalies. We conduct experiments to assess the performance impact of the introduced changes. Upon finding more promising configurations of the framework, dubbed SSMTL++v1 and SSMTL++v2, we extend our preliminary experiments to more data sets, demonstrating that our performance gains are consistent across all data sets. In most cases, our results on Avenue, ShanghaiTech and UBnormal raise the state-of-the-art performance bar to a new level.
Fair Enough: Standardizing Evaluation and Model Selection for Fairness Research in NLP
Han, Xudong, Baldwin, Timothy, Cohn, Trevor
Modern NLP systems exhibit a range of biases, which a growing literature on model debiasing attempts to correct. However, current progress is hampered by a plurality of definitions of bias and means of quantification, and by an often vague relation between debiasing algorithms and theoretical measures of bias. This paper seeks to clarify the current situation and plot a course for meaningful progress in fair learning, with two key contributions: (1) making clear inter-relations among the current gamut of methods, and their relation to fairness theory; and (2) addressing the practical problem of model selection, which involves a trade-off between fairness and accuracy and has led to systemic issues in fairness research. Putting them together, we make several recommendations to help shape future work.
Emotion Detection From Social Media Posts
Rahman, Md Mahbubur, Shova, Shaila
Over the last few years, social media has evolved into a medium for expressing personal views, emotions, and even business and political proposals, recommendations, and advertisements. In this research, we address the topic of identifying emotions from text data obtained from social media posts, such as tweets. We have deployed different traditional machine learning techniques such as Support Vector Machines (SVM), Naive Bayes, Decision Trees, and Random Forest, as well as deep neural network models such as LSTM, CNN, GRU, BiLSTM, and BiGRU, to classify these tweets into four emotion categories (Fear, Anger, Joy, and Sadness). Furthermore, we have constructed a BiLSTM and BiGRU ensemble model. The evaluation result shows that the deep neural network models (BiGRU, to be specific) produce the most promising results compared to traditional machine learning models, with an 87.53% accuracy rate. The ensemble model performs even better (87.66%), although the difference is not significant. This result will aid in the development of a decision-making tool that visualizes emotional fluctuations.
TPE-Net: Track Point Extraction and Association Network for Rail Path Proposal Generation
Kang, Jungwon, Ghorbanalivakili, Mohammadjavad, Sohn, Gunho, Beach, David, Marin, Veronica
One essential feature of an autonomous train is minimizing collision risks with third-party objects. To estimate the risk, the control system must identify topological information of all the rail routes ahead on which the train can possibly move, especially within merging or diverging rails. This way, the train can figure out the status of potential obstacles with respect to its route and hence, make a timely decision. Numerous studies have successfully extracted all rail tracks as a whole within forward-looking images without considering element instances. Still, some image-based methods have employed hard-coded prior knowledge of railway geometry on 3D data to associate left-right rails and generate rail route instances. However, we propose a rail path extraction pipeline in which left-right rail pixels of each rail route instance are extracted and associated through a fully convolutional encoder-decoder architecture called TPE-Net. Two different regression branches for TPE-Net are proposed to regress the locations of center points of each rail route, along with their corresponding left-right pixels. Extracted rail pixels are then spatially clustered to generate topological information of all the possible train routes (ego-paths), discarding non-ego-path ones. Experimental results on a challenging, publicly released benchmark show true-positive-pixel level average precision and recall of 0.9207 and 0.8721, respectively, at about 12 frames per second. Even though our evaluation results are not higher than the SOTA, the proposed regression pipeline performs remarkably in extracting the correspondences by looking once at the image. It generates strong rail route hypotheses without reliance on camera parameters, 3D data, and geometrical constraints.
Bootstrapping Multilingual Semantic Parsers using Large Language Models
Awasthi, Abhijeet, Gupta, Nitish, Samanta, Bidisha, Dave, Shachi, Sarawagi, Sunita, Talukdar, Partha
Despite cross-lingual generalization demonstrated by pre-trained multilingual models, the translate-train paradigm of transferring English datasets across multiple languages remains a key mechanism for training task-specific multilingual models. However, for many low-resource languages, the availability of a reliable translation service entails significant amounts of costly human-annotated translation pairs. Further, translation services may continue to be brittle due to domain mismatch between task-specific input text and general-purpose text used for training translation models. For multilingual semantic parsing, we demonstrate the effectiveness and flexibility offered by large language models (LLMs) for translating English datasets into several languages via few-shot prompting. Through extensive comparisons on two public datasets, MTOP and MASSIVE, spanning 50 languages and several domains, we show that our method of translating data using LLMs outperforms a strong translate-train baseline on 41 out of 50 languages. We study the key design choices that enable more effective multilingual data translation via prompted LLMs.
Evaluation of Data Augmentation and Loss Functions in Semantic Image Segmentation for Drilling Tool Wear Detection
Schlager, Elke, Windisch, Andreas, Hanna, Lukas, Klünsner, Thomas, Hagendorfer, Elias Jan, Teppernegg, Tamara
Tool wear monitoring is crucial for quality control and cost reduction in manufacturing processes, of which drilling applications are one example. In this paper, we present a U-Net based semantic image segmentation pipeline, deployed on microscopy images of cutting inserts, for the purpose of wear detection. The wear area is differentiated into two types, resulting in a multiclass classification problem. Joining the two wear types in one general wear class, on the other hand, allows the problem to be formulated as a binary classification task. Apart from the comparison of the binary and multiclass problems, different loss functions, i.e., Cross Entropy, Focal Cross Entropy, and a loss based on the Intersection over Union (IoU), are also investigated. Furthermore, models are trained on image tiles of different sizes, and augmentation techniques of varying intensities are deployed. We find that the best performing models are binary models, trained on data with moderate augmentation and an IoU-based loss function.
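To make the idea of an IoU-based loss concrete, here is a minimal sketch of a common "soft" IoU (Jaccard) loss for the binary case. The abstract does not give the exact formulation used in the paper, so this particular form, the function name `soft_iou_loss`, and the smoothing constant `eps` are assumptions; predictions and targets are passed as flat lists of per-pixel values in [0, 1].

```python
def soft_iou_loss(pred, target, eps=1e-7):
    """Soft IoU loss: 1 - |P ∩ T| / |P ∪ T| with probabilistic set sizes.

    pred   -- predicted wear probabilities per pixel, each in [0, 1]
    target -- ground-truth labels per pixel, each 0 or 1
    eps    -- small constant that avoids division by zero on empty masks
    """
    # Soft intersection: pixels where prediction and target agree on "wear".
    intersection = sum(p * t for p, t in zip(pred, target))
    # Soft union: |P| + |T| - |P ∩ T|, accumulated per pixel.
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (intersection + eps) / (union + eps)


perfect = soft_iou_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])
half = soft_iou_loss([1.0, 1.0, 0.0, 0.0], [1, 0, 0, 0])
disjoint = soft_iou_loss([1.0, 0.0], [0, 1])
```

A perfect prediction yields a loss near 0 and a fully disjoint prediction a loss near 1. Unlike per-pixel cross entropy, the value depends on the overlap ratio rather than on the number of background pixels, which is one common motivation for such losses when the wear region is small.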
CCDN: Checkerboard Corner Detection Network for Robust Camera Calibration
Chen, Ben, Xiong, Caihua, Zhang, Qi
To improve the robustness of checkerboard corner detection on poor-quality images, such as those affected by lens distortion, extreme poses, and noise, we propose a novel detection algorithm that maintains high accuracy across multiple scenarios without any prior knowledge of the checkerboard pattern. The whole algorithm includes a checkerboard corner detection network and some post-processing techniques. The network model is a fully convolutional network with improvements to the loss function and learning rate, which can deal with images of arbitrary size and produce correspondingly-sized output with a corner score on each pixel by efficient inference and learning. In addition, to remove false positives, we employ three post-processing techniques: a threshold related to the maximum response, non-maximum suppression, and clustering. Evaluations on two different datasets show its superior robustness, accuracy and wide applicability in quantitative comparisons with state-of-the-art methods such as MATE, ChESS, ROCHADE and OCamCalib.
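The first two post-processing steps mentioned above (a threshold tied to the maximum response, followed by non-maximum suppression) can be sketched as follows. This is an illustrative simplification, not the paper's implementation: the function name `detect_corners`, the parameters `rel_threshold` and `radius`, and the toy score map are all assumptions, and the clustering step is omitted.

```python
def detect_corners(score_map, rel_threshold=0.5, radius=1):
    """Return (row, col) peaks of a 2D corner score map.

    A pixel is kept if its score is at least rel_threshold times the
    maximum score in the map and no pixel within `radius` (square
    window) has a strictly higher score. Equal-valued neighbours are
    therefore all kept.
    """
    rows = len(score_map)
    cols = len(score_map[0]) if rows else 0
    if rows == 0 or cols == 0:
        return []
    max_score = max(max(row) for row in score_map)
    threshold = rel_threshold * max_score
    corners = []
    for r in range(rows):
        for c in range(cols):
            s = score_map[r][c]
            # Step 1: threshold relative to the maximum response.
            if s <= 0 or s < threshold:
                continue
            # Step 2: non-maximum suppression in a local window.
            is_peak = True
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    if score_map[rr][cc] > s:
                        is_peak = False
            if is_peak:
                corners.append((r, c))
    return corners


# Toy score map with two true peaks and one weaker response next to a peak.
toy_map = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.6, 0.0],
    [0.0, 0.2, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.8],
]
corners = detect_corners(toy_map)
```

In the toy map, the 0.6 response passes the relative threshold but is suppressed because it sits next to the stronger 0.9 peak, leaving only the two local maxima.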
Artificial Intelligence System for Detection and Screening of Cardiac Abnormalities using Electrocardiogram Images
Zhang, Deyun, Geng, Shijia, Zhou, Yang, Xu, Weilun, Wei, Guodong, Wang, Kai, Yu, Jie, Zhu, Qiang, Li, Yongkui, Zhao, Yonghong, Chen, Xingyue, Zhang, Rui, Fu, Zhaoji, Zhou, Rongbo, E, Yanqi, Fan, Sumei, Zhao, Qinghao, Cheng, Chuandong, Peng, Nan, Zhang, Liang, Zheng, Linlin, Chu, Jianjun, Xu, Hongbin, Tan, Chen, Liu, Jian, Tao, Huayue, Liu, Tong, Chen, Kangyin, Jiang, Chenyang, Liu, Xingpeng, Hong, Shenda
Artificial intelligence (AI) systems have achieved expert-level performance in electrocardiogram (ECG) signal analysis. However, in underdeveloped countries or regions where the healthcare information system is imperfect, only paper ECGs can be provided. Analysis of real-world ECG images (photos or scans of paper ECGs) remains challenging due to complex environments or interference. In this study, we present an AI system developed to detect and screen cardiac abnormalities (CAs) from real-world ECG images. The system was evaluated on a large dataset of 52,357 patients from multiple regions and populations across the world. On the detection task, the AI system obtained an area under the receiver operating characteristic curve (AUC) of 0.996 (hold-out test), 0.994 (external test 1), 0.984 (external test 2), and 0.979 (external test 3), respectively. Meanwhile, the detection results of the AI system showed a strong correlation with the diagnosis of cardiologists (cardiologist 1 (R=0.794, p<1e-3), cardiologist 2 (R=0.812, p<1e-3)). On the screening task, the AI system achieved AUCs of 0.894 (hold-out test) and 0.850 (external test). The screening performance of the AI system was better than that of the cardiologists (AI system (0.846) vs. cardiologist 1 (0.520) vs. cardiologist 2 (0.480)). Our study demonstrates the feasibility of an accurate, objective, easy-to-use, fast, and low-cost AI system for CA detection and screening. The system has the potential to be used by healthcare professionals, caregivers, and general users to assess CAs based on real-world ECG images.
Sequential Strategic Screening
Cohen, Lee, Sharifi-Malvajerdi, Saeed, Stangl, Kevin, Vakilian, Ali, Ziani, Juba
We initiate the study of strategic behavior in screening processes with multiple classifiers. We focus on two contrasting settings: a conjunctive setting in which an individual must satisfy all classifiers simultaneously, and a sequential setting in which an individual must satisfy the classifiers one at a time to succeed. In other words, we introduce the combination of strategic classification with screening processes. We show that sequential screening pipelines exhibit new and surprising behavior where individuals can exploit the sequential ordering of the tests to zig-zag between classifiers without having to simultaneously satisfy all of them. We demonstrate that an individual can obtain a positive outcome using a limited manipulation budget even when far from the intersection of the positive regions of every classifier. Finally, we consider a learner whose goal is to design a sequential screening process that is robust to such manipulations, and provide a construction for the learner that optimizes a natural objective.
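The zig-zag effect can be illustrated with a small two-dimensional example. The sketch below compares the cost of reaching the intersection of two half-plane classifiers (conjunctive setting) with the cost of satisfying them one after the other (sequential setting). The cost model (total Euclidean distance moved), the greedy "project onto the next classifier" strategy, the specific classifiers, and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
import math


def margin(w, b, x):
    """Signed score w.x + b; the classifier accepts x when this is >= 0."""
    return w[0] * x[0] + w[1] * x[1] + b


def project(w, b, x):
    """Closest point to x that the classifier (w, b) accepts."""
    m = margin(w, b, x)
    if m >= 0:
        return x
    n2 = w[0] ** 2 + w[1] ** 2
    return (x[0] - m * w[0] / n2, x[1] - m * w[1] / n2)


def sequential_path(x, classifiers):
    """Greedy zig-zag: move just far enough to pass each test in turn."""
    cost, pos = 0.0, x
    for w, b in classifiers:
        nxt = project(w, b, pos)
        cost += math.dist(pos, nxt)
        pos = nxt
    return cost, pos


def conjunctive_cost(x, c1, c2, tol=1e-9):
    """Distance from x to the intersection of two half-planes in 2D."""
    (w1, b1), (w2, b2) = c1, c2
    candidates = [x, project(w1, b1, x), project(w2, b2, x)]
    det = w1[0] * w2[1] - w1[1] * w2[0]
    if abs(det) > tol:
        # Point where both decision boundaries cross (Cramer's rule).
        candidates.append(((-b1 * w2[1] + w1[1] * b2) / det,
                           (-w1[0] * b2 + b1 * w2[0]) / det))
    feasible = [z for z in candidates
                if margin(w1, b1, z) >= -tol and margin(w2, b2, z) >= -tol]
    return min(math.dist(x, z) for z in feasible)


# Two nearly opposite half-planes whose positive regions meet far to the left.
s = 0.1
h1 = ((-s, 1.0), -1.0)   # accepts points with  x2 - s*x1 >= 1
h2 = ((-s, -1.0), -1.0)  # accepts points with -x2 - s*x1 >= 1
start = (0.0, 0.0)

seq_cost, final_pos = sequential_path(start, [h1, h2])
conj_cost = conjunctive_cost(start, h1, h2)
```

Under these assumptions the sequential path costs roughly 2.97, while reaching the intersection costs 10. The final position passes the second classifier but no longer satisfies the first, which is exactly the kind of behavior a robust sequential screening design would need to guard against.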