AITopics | Banff

Collaborating Authors

Banff

Measuring the Mixing of Contextual Information in the Transformer

Ferrando, Javier, Gállego, Gerard I., Costa-jussà, Marta R.

arXiv.org Artificial IntelligenceOct-21-2022

The Transformer architecture aggregates input information through the self-attention mechanism, but there is no clear understanding of how this information is mixed across the entire model. Additionally, recent works have demonstrated that attention weights alone are not enough to describe the flow of information. In this paper, we consider the whole attention block -- multi-head attention, residual connection, and layer normalization -- and define a metric to measure token-to-token interactions within each layer. Then, we aggregate layer-wise interpretations to provide input attribution scores for model predictions. Experimentally, we show that our method, ALTI (Aggregation of Layer-wise Token-to-token Interactions), provides more faithful explanations and increased robustness than gradient-based methods.

computational linguistic, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2203.04212

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
Asia > China > Hong Kong (0.04)
(9 more...)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Transformer-based Entity Typing in Knowledge Graphs

Hu, Zhiwei, Gutiérrez-Basulto, Víctor, Xiang, Zhiliang, Li, Ru, Pan, Jeff Z.

arXiv.org Artificial IntelligenceOct-20-2022

We investigate the knowledge graph entity typing task which aims at inferring plausible entity types. In this paper, we propose a novel Transformer-based Entity Typing (TET) approach, effectively encoding the content of neighbors of an entity. More precisely, TET is composed of three different mechanisms: a local transformer allowing to infer missing types of an entity by independently encoding the information provided by each of its neighbors; a global transformer aggregating the information of all neighbors of an entity into a single long sequence to reason about more complex entity types; and a context transformer integrating neighbors content based on their contribution to the type inference through information exchange between neighbor pairs. Furthermore, TET uses information about class membership of types to semantically strengthen the representation of an entity. Experiments on two real-world datasets demonstrate the superior performance of TET compared to the state-of-the-art.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2210.11151

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Dominican Republic (0.04)
(21 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Add feedback

UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA

Sachdeva, Rachneet, Puerto, Haritz, Baumgärtner, Tim, Tariverdian, Sewin, Zhang, Hao, Wang, Kexin, Saadi, Hossain Shaikh, Ribeiro, Leonardo F. R., Gurevych, Iryna

arXiv.org Artificial IntelligenceOct-20-2022

Question Answering (QA) systems are increasingly deployed in applications where they support real-world decisions. However, state-of-the-art models rely on deep neural networks, which are difficult to interpret by humans. Inherently interpretable models or post hoc explainability methods can help users to comprehend how a model arrives at its prediction and, if successful, increase their trust in the system. Furthermore, researchers can leverage these insights to develop new methods that are more accurate and less biased. In this paper, we introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations. While saliency maps are useful to inspect the importance of each input token for the model's prediction, graph-based explanations from external Knowledge Graphs enable the users to verify the reasoning behind the model prediction. In addition, we provide multiple adversarial attacks to compare the robustness of QA models. With these explainability methods and adversarial attacks, we aim to ease the research on trustworthy QA models. SQuARE is available on https://square.ukp-lab.de.

machine learning, natural language, prediction, (19 more...)

arXiv.org Artificial Intelligence

2208.09316

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
(9 more...)

Genre: Research Report (0.70)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

LightEA: A Scalable, Robust, and Interpretable Entity Alignment Framework via Three-view Label Propagation

Mao, Xin, Wang, Wenting, Wu, Yuanbin, Lan, Man

arXiv.org Artificial IntelligenceOct-20-2022

Entity Alignment (EA) aims to find equivalent entity pairs between KGs, which is the core step of bridging and integrating multi-source KGs. In this paper, we argue that existing GNN-based EA methods inherit the inborn defects from their neural network lineage: weak scalability and poor interpretability. Inspired by recent studies, we reinvent the Label Propagation algorithm to effectively run on KGs and propose a non-neural EA framework -- LightEA, consisting of three efficient components: (i) Random Orthogonal Label Generation, (ii) Three-view Label Propagation, and (iii) Sparse Sinkhorn Iteration. According to the extensive experiments on public datasets, LightEA has impressive scalability, robustness, and interpretability. With a mere tenth of time consumption, LightEA achieves comparable results to state-of-the-art methods across all datasets and even surpasses them on many.

lightea, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2210.10436

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Virginia (0.04)
(16 more...)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Generative Adversarial User Privacy in Lossy Single-Server Information Retrieval

Weng, Chung-Wei, Yakimenka, Yauhen, Lin, Hsuan-Yin, Rosnes, Eirik, Kliewer, Joerg

arXiv.org Artificial IntelligenceOct-19-2022

We propose to extend the concept of private information retrieval by allowing for distortion in the retrieval process and relaxing the perfect privacy requirement at the same time. In particular, we study the trade-off between download rate, distortion, and user privacy leakage, and show that in the limit of large file sizes this trade-off can be captured via a novel information-theoretical formulation for datasets with a known distribution. Moreover, for scenarios where the statistics of the dataset is unknown, we propose a new deep learning framework by leveraging a generative adversarial network approach, which allows the user to learn efficient schemes from the data itself. We evaluate the performance of the scheme on a synthetic Gaussian dataset as well as on the MNIST, CIFAR-10, and LSUN datasets. For the MNIST, CIFAR-10, and LSUN datasets, the data-driven approach significantly outperforms a nonlearning-based scheme which combines source coding with the download of multiple files.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TIFS.2022.3203320

2012.03902

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada > Ontario > Toronto (0.14)
(30 more...)

Genre: Research Report (0.63)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

On the Perils of Cascading Robust Classifiers

Mangal, Ravi, Wang, Zifan, Zhang, Chi, Leino, Klas, Pasareanu, Corina, Fredrikson, Matt

arXiv.org Artificial IntelligenceOct-19-2022

Ensembling certifiably robust neural networks is a promising approach for improving the \emph{certified robust accuracy} of neural models. Black-box ensembles that assume only query-access to the constituent models (and their robustness certifiers) during prediction are particularly attractive due to their modular structure. Cascading ensembles are a popular instance of black-box ensembles that appear to improve certified robust accuracies in practice. However, we show that the robustness certifier used by a cascading ensemble is unsound. That is, when a cascading ensemble is certified as locally robust at an input $x$ (with respect to $\epsilon$), there can be inputs $x'$ in the $\epsilon$-ball centered at $x$, such that the cascade's prediction at $x'$ is different from $x$ and thus the ensemble is not locally robust. Our theoretical findings are accompanied by empirical results that further demonstrate this unsoundness. We present \emph{cascade attack} (CasA), an adversarial attack against cascading ensembles, and show that: (1) there exists an adversarial input for up to 88\% of the samples where the ensemble claims to be certifiably robust and accurate; and (2) the accuracy of a cascading ensemble under our attack is as low as 11\% when it claims to be certifiably robust and accurate on 97\% of the test set. Our work reveals a critical pitfall of cascading certifiably robust models by showing that the seemingly beneficial strategy of cascading can actually hurt the robustness of the resulting ensemble. Our code is available at \url{https://github.com/TristaChi/ensembleKW}.

artificial intelligence, ensemble, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2206.00278

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > Middle East > Jordan (0.04)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Government (0.48)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

Face Pasting Attack

Bunzel, Niklas, Graner, Lukas

arXiv.org Artificial IntelligenceOct-19-2022

Cujo AI and Adversa AI hosted the MLSec face recognition challenge. The goal was to attack a black box face recognition model with targeted attacks. The model returned the confidence of the target class and a stealthiness score. For an attack to be considered successful the target class has to have the highest confidence among all classes and the stealthiness has to be at least 0.5. In our approach we paste the face of a target into a source image. By utilizing position, scaling, rotation and transparency attributes we reached 3rd place. Our approach took approximately 200 queries per attack for the final highest score and about ~7.7 queries minimum for a successful attack. The code is available at https://github.com/bunni90/FacePastingAttack .

artificial intelligence, machine learning, stealthiness, (14 more...)

arXiv.org Artificial Intelligence

2210.09153

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
North America > United States > Texas > Dallas County > Dallas (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(3 more...)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Lethal Dose Conjecture on Data Poisoning

Wang, Wenxiao, Levine, Alexander, Feizi, Soheil

arXiv.org Artificial IntelligenceOct-18-2022

Data poisoning considers an adversary that distorts the training set of machine learning algorithms for malicious purposes. In this work, we bring to light one conjecture regarding the fundamentals of data poisoning, which we call the Lethal Dose Conjecture. The conjecture states: If $n$ clean training samples are needed for accurate predictions, then in a size-$N$ training set, only $\Theta(N/n)$ poisoned samples can be tolerated while ensuring accuracy. Theoretically, we verify this conjecture in multiple cases. We also offer a more general perspective of this conjecture through distribution discrimination. Deep Partition Aggregation (DPA) and its extension, Finite Aggregation (FA) are recent approaches for provable defenses against data poisoning, where they predict through the majority vote of many base models trained from different subsets of training set using a given learner. The conjecture implies that both DPA and FA are (asymptotically) optimal -- if we have the most data-efficient learner, they can turn it into one of the most robust defenses against data poisoning. This outlines a practical approach to developing stronger defenses against poisoning via finding data-efficient learners. Empirically, as a proof of concept, we show that by simply using different data augmentations for base learners, we can respectively double and triple the certified robustness of DPA on CIFAR-10 and GTSRB without sacrificing accuracy.

artificial intelligence, learner, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2208.03309

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland > Prince George's County > College Park (0.14)
(9 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Unsupervised Optimal Power Flow Using Graph Neural Networks

Owerko, Damian, Gama, Fernando, Ribeiro, Alejandro

arXiv.org Artificial IntelligenceOct-17-2022

Optimal power flow (OPF) is a critical optimization problem that allocates power to the generators in order to satisfy the demand at a minimum cost. Solving this problem exactly is computationally infeasible in the general case. In this work, we propose to leverage graph signal processing and machine learning. More specifically, we use a graph neural network to learn a nonlinear parametrization between the power demanded and the corresponding allocation. We learn the solution in an unsupervised manner, minimizing the cost directly. In order to take into account the electrical constraints of the grid, we propose a novel barrier method that is differentiable and works on initially infeasible points. We show through simulations that the use of GNNs in this unsupervised learning context leads to solutions comparable to standard solvers while being computationally efficient and avoiding constraint violations most of the time.

artificial intelligence, constraint, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2210.09277

Country:

North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Disentangled Representation Learning for RF Fingerprint Extraction under Unknown Channel Statistics

Xie, Renjie, Xu, Wei, Yu, Jiabao, Hu, Aiqun, Ng, Derrick Wing Kwan, Swindlehurst, A. Lee

arXiv.org Artificial IntelligenceOct-17-2022

Deep learning (DL) applied to a device's radio-frequency fingerprint~(RFF) has attracted significant attention in physical-layer authentication due to its extraordinary classification performance. Conventional DL-RFF techniques are trained by adopting maximum likelihood estimation~(MLE). Although their discriminability has recently been extended to unknown devices in open-set scenarios, they still tend to overfit the channel statistics embedded in the training dataset. This restricts their practical applications as it is challenging to collect sufficient training data capturing the characteristics of all possible wireless channel environments. To address this challenge, we propose a DL framework of disentangled representation~(DR) learning that first learns to factor the signals into a device-relevant component and a device-irrelevant component via adversarial learning. Then, it shuffles these two parts within a dataset for implicit data augmentation, which imposes a strong regularization on RFF extractor learning to avoid the possible overfitting of device-irrelevant channel statistics, without collecting additional data from unknown channels. Experiments validate that the proposed approach, referred to as DR-based RFF, outperforms conventional methods in terms of generalizability to unknown devices even under unknown complicated propagation environments, e.g., dispersive multipath fading channels, even though all the training data are collected in a simple environment with dominated direct line-of-sight~(LoS) propagation paths.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Artificial Intelligence

2208.02724

Country:

North America > United States > California > Orange County > Irvine (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(13 more...)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback