Goto

Collaborating Authors

 Aneja, Nagender


Predictive linguistic cues for fake news: a societal artificial intelligence problem

arXiv.org Artificial Intelligence

Media news are making a large part of public opinion and, therefore, must not be fake. News on web sites, blogs, and social media must be analyzed before being published. In this paper, we present linguistic characteristics of media news items to differentiate between fake news and real news using machine learning algorithms. Neural fake news generation, headlines created by machines, semantic incongruities in text and image captions generated by machine are other types of fake news problems. These problems use neural networks which mainly control distributional features rather than evidence. We propose applying correlation between features set and class, and correlation among the features to compute correlation attribute evaluation metric and covariance metric to compute variance of attributes over the news items. Features unique, negative, positive, and cardinal numbers with high values on the metrics are observed to provide a high area under the curve (AUC) and F1-score.


NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

arXiv.org Artificial Intelligence

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (\url{https://github.com/GEM-benchmark/NL-Augmenter}).


Network Traffic Analysis based IoT Device Identification

arXiv.org Artificial Intelligence

Device identification is the process of identifying a device on Internet without using its assigned network or other credentials. The sharp rise of usage in Internet of Things (IoT) devices has imposed new challenges in device identification due to a wide variety of devices, protocols and control interfaces. In a network, conventional IoT devices identify each other by utilizing IP or MAC addresses, which are prone to spoofing. Moreover, IoT devices are low power devices with minimal embedded security solution. To mitigate the issue in IoT devices, fingerprint (DFP) for device identification can be used. DFP identifies a device by using implicit identifiers, such as network traffic (or packets), radio signal, which a device used for its communication over the network. These identifiers are closely related to the device hardware and software features. In this paper, we exploit TCP/IP packet header features to create a device fingerprint utilizing device originated network packets. We present a set of three metrics which separate some features from a packet which contribute actively for device identification. To evaluate our approach, we used publicly accessible two datasets. We observed the accuracy of device genre classification 99.37% and 83.35% of accuracy in the identification of an individual device from IoT Sentinel dataset. However, using UNSW dataset device type identification accuracy reached up to 97.78%.


Neural Machine Translation model for University Email Application

arXiv.org Artificial Intelligence

Machine translation has many applications such as news translation, email translation, official letter translation etc. Commercial translators, e.g. Google Translation lags in regional vocabulary and are unable to learn the bilingual text in the source and target languages within the input. In this paper, a regional vocabulary-based application-oriented Neural Machine Translation (NMT) model is proposed over the data set of emails used at the University for communication over a period of three years. A state-of-the-art Sequence-to-Sequence Neural Network for ML -> EN and EN -> ML translations is compared with Google Translate using Gated Recurrent Unit Recurrent Neural Network machine translation model with attention decoder. The low BLEU score of Google Translation in comparison to our model indicates that the application based regional models are better. The low BLEU score of EN -> ML of our model and Google Translation indicates that the Malay Language has complex language features corresponding to English.


Semi-Supervised Learning for Cancer Detection of Lymph Node Metastases

arXiv.org Artificial Intelligence

Pathologists find tedious to examine the status of the sentinel lymph node on a large number of pathological scans. The examination process of such lymph node which encompasses metastasized cancer cells is histopathologically organized. However, the task of finding metastatic tissues is gradual which is often challenging. In this work, we present our deep convolutional neural network based model validated on PatchCamelyon (PCam) benchmark dataset for fundamental machine learning research in histopathology diagnosis. We find that our proposed model trained with a semi-supervised learning approach by using pseudo labels on PCam-level significantly leads to better performances to strong CNN baseline on the AUC metric.