 Performance Analysis


Improving Fake News Detection of Influential Domain via Domain- and Instance-Level Transfer

arXiv.org Artificial Intelligence

Both real and fake news in various domains, such as politics, health, and entertainment are spread via online social media every day, necessitating fake news detection for multiple domains. Among them, fake news in specific domains like politics and health has more serious potential negative impacts on the real world (e.g., the infodemic led by COVID-19 misinformation). Previous studies focus on multi-domain fake news detection, by equally mining and modeling the correlation between domains. However, these multi-domain methods suffer from a seesaw problem: the performance of some domains is often improved at the cost of hurting the performance of other domains, which could lead to an unsatisfying performance in specific domains. To address this issue, we propose a Domain- and Instance-level Transfer Framework for Fake News Detection (DITFEND), which could improve the performance of specific target domains. To transfer coarse-grained domain-level knowledge, we train a general model with data of all domains from the meta-learning perspective. To transfer fine-grained instance-level knowledge and adapt the general model to a target domain, we train a language model on the target domain to evaluate the transferability of each data instance in source domains and re-weigh each instance's contribution. Offline experiments on two datasets demonstrate the effectiveness of DITFEND. Online experiments show that DITFEND brings additional improvements over the base models in a real-world scenario.


Quantifying Social Biases Using Templates is Unreliable

arXiv.org Artificial Intelligence

Recently, there has been an increase in efforts to understand how large language models (LLMs) propagate and amplify social biases. Several works have utilized templates for fairness evaluation, which allow researchers to quantify social biases in the absence of test sets with protected attribute labels. While template evaluation can be a convenient and helpful diagnostic tool to understand model deficiencies, it often uses a simplistic and limited set of templates. In this paper, we study whether bias measurements are sensitive to the choice of templates used for benchmarking. Specifically, we investigate the instability of bias measurements by manually modifying templates proposed in previous works in a semantically-preserving manner and measuring bias across these modifications. We find that bias values and resulting conclusions vary considerably across template modifications on four tasks, ranging from an 81% reduction (NLI) to a 162% increase (MLM) in (task-specific) bias measurements. Our results indicate that quantifying fairness in LLMs, as done in current practice, can be brittle and needs to be approached with more care and caution.


Inspection-L: Self-Supervised GNN Node Embeddings for Money Laundering Detection in Bitcoin

arXiv.org Artificial Intelligence

Criminals have become increasingly experienced in using cryptocurrencies, such as Bitcoin, for money laundering. The use of cryptocurrencies can hide criminal identities and transfer hundreds of millions of dollars of dirty funds through their criminal digital wallets. However, this is considered a paradox because cryptocurrencies are goldmines for open-source intelligence, giving law enforcement agencies more power when conducting forensic analyses. This paper proposes Inspection-L, a graph neural network (GNN) framework based on self-supervised Deep Graph Infomax (DGI) and a Graph Isomorphism Network (GIN), combined with a supervised learning algorithm, namely Random Forest (RF), to detect illicit transactions for anti-money laundering (AML). To the best of our knowledge, our proposal is the first to apply self-supervised GNNs to the problem of AML in Bitcoin. The proposed method was evaluated on the Elliptic dataset, and the results show that it outperforms the state of the art in terms of key classification metrics, which demonstrates the potential of self-supervised GNNs in the detection of illicit cryptocurrency transactions.


FDA Publishes Updated List With 521 Authorized AI/ML Enabled Devices

#artificialintelligence

Since 1995, the FDA has authorized more than 500 AI/ML-enabled medical devices via 510(k) clearance, granted De Novo request, or approved PMA. This week the FDA published an updated list with 178 new devices that were authorized through July 2022. According to the FDA, the list is based on publicly available information and is not a comprehensive resource of FDA-approved AI/ML-enabled medical devices. In today's DeepTech newsletter I'm sharing a high-level analysis of the 521 devices on the list, charts to visualize the data, and a summary of milestones.


Modeling Dependent Structure for Utterances in ASR Evaluation

arXiv.org Artificial Intelligence

The bootstrap resampling method has been popular for performing significance analysis on word error rate (WER) in automatic speech recognition (ASR) evaluation. To deal with dependent speech data, the blockwise bootstrap approach has also been introduced. By dividing utterances into uncorrelated blocks, this approach resamples these blocks instead of the original data. However, it is typically nontrivial to uncover the dependency structure among utterances and identify the blocks, which might lead to subjective conclusions in statistical testing. In this paper, we present graphical-lasso-based methods to explicitly model such dependency and estimate uncorrelated blocks of utterances in a rigorous way, after which the blockwise bootstrap is applied on top of the inferred blocks. We show that the resulting variance estimator of WER in ASR evaluation is statistically consistent under mild conditions. We also demonstrate the validity of the proposed approach on the LibriSpeech dataset.
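
As a rough illustration of the blockwise bootstrap described in this abstract, the sketch below resamples whole blocks of utterances (with replacement) and recomputes WER on each resample. It assumes the blocks are already known; the graphical lasso step that infers them is not shown. The function names, the `(errors, reference_words)` representation, and the toy numbers are all hypothetical.

```python
import random


def wer(utterances):
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(e for e, _ in utterances)
    words = sum(n for _, n in utterances)
    return errors / words


def blockwise_bootstrap_wer(blocks, n_resamples=1000, seed=0):
    """Approximate 95% percentile interval for WER, resampling whole blocks.

    `blocks` is a list of blocks; each block is a list of
    (errors, reference_words) pairs, one per utterance. The blocks are
    assumed to be (approximately) uncorrelated with each other.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Draw as many blocks as there are, with replacement.
        sample = [rng.choice(blocks) for _ in blocks]
        utterances = [u for block in sample for u in block]
        estimates.append(wer(utterances))
    estimates.sort()
    lo = estimates[n_resamples * 25 // 1000]
    hi = estimates[n_resamples * 975 // 1000 - 1]
    return lo, hi


# Hypothetical toy data: four blocks of dependent utterances.
blocks = [
    [(1, 10), (2, 12)],
    [(0, 8)],
    [(3, 9), (1, 11), (0, 7)],
    [(2, 10)],
]
point = wer([u for block in blocks for u in block])
lo, hi = blockwise_bootstrap_wer(blocks)
```

Resampling blocks rather than individual utterances keeps dependent utterances together, which is the property the blockwise approach relies on.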


Performances of Symmetric Loss for Private Data from Exponential Mechanism

arXiv.org Artificial Intelligence

This study explores the robustness of learning with symmetric loss on private data. Specifically, we apply the exponential mechanism (EM) to private labels. First, we theoretically revisit the properties of EM when it is used for private learning with symmetric loss. Then, we propose numerical guidance on privacy budgets corresponding to different data scales and utility guarantees. Further, we conduct experiments on the CIFAR-10 dataset to present the traits of symmetric loss. Because EM is a more generic differential privacy (DP) technique, its robustness has the potential to generalize and to make other DP techniques more robust.
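
For readers unfamiliar with the exponential mechanism, here is a minimal sketch of applying it to a single class label. It assumes an indicator utility (1 for the true label, 0 otherwise) with sensitivity 1, so each candidate label is sampled with probability proportional to exp(epsilon * utility / 2). The utility function and all names are assumptions for illustration and may differ from the setup used in the paper.

```python
import math
import random


def label_probabilities(true_label, num_classes, epsilon):
    """Sampling distribution of the exponential mechanism over labels.

    Assumed utility: 1.0 for the true label, 0.0 for any other label,
    which has sensitivity 1.
    """
    weights = [
        math.exp(epsilon * (1.0 if k == true_label else 0.0) / 2.0)
        for k in range(num_classes)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def privatize_label(true_label, num_classes, epsilon, rng):
    """Draw one noisy label from the distribution above."""
    probs = label_probabilities(true_label, num_classes, epsilon)
    return rng.choices(range(num_classes), weights=probs, k=1)[0]


# Ten classes, as in CIFAR-10; epsilon = 2.0 is an arbitrary example budget.
probs = label_probabilities(true_label=3, num_classes=10, epsilon=2.0)
rng = random.Random(0)
noisy = privatize_label(3, 10, 2.0, rng)
```

With a budget of zero the distribution is uniform over labels, and as the budget grows the true label is kept with increasing probability.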


Detecting Label Errors in Token Classification Data

arXiv.org Artificial Intelligence

Mislabeled examples are a common issue in real-world data, particularly for tasks like token classification where many labels must be chosen on a fine-grained basis. Here we consider the task of finding sentences that contain label errors in token classification datasets. We study 11 different straightforward methods that score tokens/sentences based on the predicted class probabilities output by a (any) token classification model (trained via any procedure). In precision-recall evaluations based on real-world label errors in entity recognition data from CoNLL-2003, we identify a simple and effective method that consistently detects those sentences containing label errors when applied with different token classification models.
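
To make the idea of scoring sentences from predicted class probabilities concrete, the sketch below shows one simple scoring rule: take the model's predicted probability of each token's given label, and score a sentence by its lowest-scoring token. This is only an illustrative example of such a method and is not claimed to be the one the paper identifies as best; the function names and toy probabilities are hypothetical.

```python
def sentence_score(pred_probs, labels):
    """Lowest predicted probability assigned to any token's given label.

    `pred_probs[t][k]` is the model's probability that token t has class k;
    `labels[t]` is the (possibly wrong) class given in the dataset.
    """
    return min(probs[label] for probs, label in zip(pred_probs, labels))


def rank_sentences(all_pred_probs, all_labels):
    """Return sentence indices ordered from most to least suspicious."""
    scores = [
        sentence_score(probs, labels)
        for probs, labels in zip(all_pred_probs, all_labels)
    ]
    ranking = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranking, scores


# Hypothetical two-class example with three sentences.
all_pred_probs = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.95, 0.05], [0.1, 0.9]],  # second token strongly disagrees with its label
    [[0.3, 0.7]],
]
all_labels = [[0, 0], [0, 0], [1]]
ranking, scores = rank_sentences(all_pred_probs, all_labels)
```

Sentences at the front of `ranking` are the ones a reviewer would inspect first for label errors.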


The fight against money laundering: Machine learning is a game changer

#artificialintelligence

The volume of money laundering and other financial crimes is growing worldwide, and the techniques used to evade their detection are becoming ever more sophisticated. This has elicited a vigorous response from banks, which, collectively, are investing billions each year to improve their defenses against financial crime (in 2020, institutions spent an estimated $214 billion on financial-crime compliance). What's more, the resulting regulatory fines related to compliance are surging year over year as regulators impose tougher penalties. But banks' traditional rule- and scenario-based approaches to fighting financial crimes have always seemed a step behind the bad guys, making the fight against money laundering an ongoing challenge for compliance, monitoring, and risk organizations. Now, there is an opportunity for banks to get out in front.


ROC and AUC for Model Evaluation

#artificialintelligence

The ROC (Receiver Operating Characteristic) curve is the most frequently used tool for evaluating binary or multi-class classification models. Unlike metrics computed from predicted classes, it is calculated from prediction scores, as is the precision-recall curve. My previous post highlighted the importance of the precision-recall curve and how to plot it for multi-class classification. To understand the ROC curve, let's quickly refresh our memory of the possible outcomes in a binary classification problem by referring to the confusion matrix. The ROC curve is a plot of the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold values. It helps to visualize how the threshold affects classifier performance.
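
The threshold sweep described above can be sketched in a few lines of plain Python. The example below lowers the threshold one distinct score at a time, records the (FPR, TPR) pair at each step, and integrates the curve with the trapezoidal rule to get the AUC. The function names and the four-sample toy data are assumptions for illustration, not code from the post.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct score threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    # Visit samples from highest to lowest score, i.e. lower the threshold.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve via the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, scores)
area = auc(points)
print(points)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(area)    # 0.75
```

Each step to the right on the curve is a negative sample scored above the current threshold, and each step up is a positive one, which is why a curve hugging the top-left corner indicates a better ranking.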


Constructing Prediction Intervals with Neural Networks: An Empirical Evaluation of Bootstrapping and Conformal Inference Methods

arXiv.org Artificial Intelligence

Artificial neural networks (ANNs) are popular tools for accomplishing many machine learning tasks, including predicting continuous outcomes. However, the general lack of confidence measures provided with ANN predictions limits their applicability. Supplementing point predictions with prediction intervals (PIs) is common for other learning algorithms, but the complex structure and training of ANNs render constructing PIs difficult. This work provides the network design choices and inferential methods for creating better-performing PIs with ANNs. A two-step experiment is executed across 11 data sets, including an image-based data set. Two distribution-free methods for constructing PIs, bootstrapping and conformal inference, are considered. The results of the first experimental step reveal that the choices inherent to building an ANN affect PI performance. Guidance is provided for optimizing PI performance with respect to each network feature and PI method. In the second step, 20 algorithms for constructing PIs, each using the principles of bootstrapping or conformal inference, are implemented to determine which provides the best performance while maintaining a reasonable computational burden. In general, this trade-off is optimized when implementing the cross-conformal method, which maintained interval coverage and efficiency with decreased computational burden.
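
As a point of reference for conformal inference, the sketch below implements the simpler split conformal variant, not the cross-conformal method the abstract favors. It assumes a model has already produced predictions on a held-out calibration set; the interval half-width is a finite-sample quantile of the absolute calibration residuals. Function names and the toy numbers are hypothetical.

```python
import math


def split_conformal_interval(cal_preds, cal_targets, new_pred, alpha=0.1):
    """Prediction interval with target coverage 1 - alpha (split conformal).

    `cal_preds` and `cal_targets` are predictions and true outcomes on a
    calibration set that was not used to fit the model.
    """
    residuals = sorted(abs(y - p) for p, y in zip(cal_preds, cal_targets))
    n = len(residuals)
    # Rank of the conformal quantile, with the usual (n + 1) correction.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return (-math.inf, math.inf)
    q = residuals[k - 1]
    return (new_pred - q, new_pred + q)


# Hypothetical calibration set: constant prediction, residuals 1 through 9.
cal_preds = [10.0] * 9
cal_targets = [11.0, 8.0, 13.0, 6.0, 15.0, 4.0, 17.0, 2.0, 19.0]
interval = split_conformal_interval(cal_preds, cal_targets, new_pred=20.0, alpha=0.2)
```

Cross-conformal methods repeat this calibration across folds so that no data is set aside purely for calibration, at the cost of fitting several models.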