 Gittens, Alex


Replacing Paths with Connection-Biased Attention for Knowledge Graph Completion

arXiv.org Artificial Intelligence

Knowledge graph (KG) completion aims to identify additional facts that can be inferred from the existing facts in the KG. Recent developments in this field have explored this task in the inductive setting, where at test time one sees entities that were not present during training; the most performant models in the inductive setting have employed path encoding modules in addition to standard subgraph encoding modules. This work likewise focuses on KG completion in the inductive setting, but without the explicit use of path encodings, which can be time-consuming and introduce several hyperparameters that require costly hyperparameter optimization. Our approach uses a Transformer-based subgraph encoding module only; we introduce connection-biased attention and entity role embeddings into the subgraph encoding module to eliminate the need for an expensive and time-consuming path encoding module. Evaluations on standard inductive KG completion benchmark datasets demonstrate that our Connection-Biased Link Prediction (CBLiP) model has superior performance to models that do not use path information. Compared to models that utilize path information, CBLiP shows competitive or superior performance while being faster. Additionally, to show that the effectiveness of connection-biased attention and entity role embeddings also holds in the transductive setting, we compare CBLiP's performance on the relation prediction task in the transductive setting.
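
The abstract does not spell out the attention mechanism, but a minimal sketch of what connection-biased attention could look like is below: standard scaled dot-product attention over subgraph nodes, plus a learned additive bias indexed by how each pair of nodes is connected. The tensor shapes and the scalar-bias-per-connection-type design are assumptions for illustration, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def connection_biased_attention(q, k, v, conn_type, bias_table):
    # q, k, v: (n, d) query/key/value vectors for n subgraph nodes.
    # conn_type: (n, n) integer tensor; conn_type[i, j] encodes how nodes
    #   i and j are connected (e.g., 0 = no edge, 1 = direct edge, ...).
    # bias_table: (num_connection_types,) learnable scalar bias per type.
    d = q.size(-1)
    scores = (q @ k.T) / d ** 0.5            # standard attention logits
    scores = scores + bias_table[conn_type]  # additive connection bias
    return F.softmax(scores, dim=-1) @ v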


Aligners: Decoupling LLMs and Alignment

arXiv.org Artificial Intelligence

Large Language Models (LLMs) need to be aligned with human expectations to ensure their safety and utility in most applications. Alignment is challenging, costly, and needs to be repeated for every LLM and alignment criterion. We propose to decouple LLMs and alignment by training aligner models that can be used to align any LLM for a given criterion on an as-needed basis, thus also reducing the potential negative impacts of alignment on performance. Our recipe for training the aligner models relies solely on synthetic data generated with a (prompted) LLM and can be easily adjusted for a variety of alignment criteria. We use the same synthetic data to train inspectors, binary misalignment classification models that guide a "squad" of multiple aligners. Our empirical results demonstrate consistent improvements when applying an aligner squad to various LLMs, including chat-aligned models, across several instruction-following and red-teaming datasets.
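
As a rough illustration of the decoupled setup described above, the sketch below routes a response through a squad of (inspector, aligner) pairs: each inspector flags misalignment for one criterion, and its paired aligner rewrites the response only when flagged. The object interfaces, method names, and threshold are hypothetical, not the paper's recipe.

def align_response(prompt, response, squad, threshold=0.5):
    # squad: list of (inspector, aligner) pairs, one per alignment criterion.
    # Each inspector scores misalignment in [0, 1]; each aligner is a
    # seq2seq model that rewrites the response for its criterion.
    for inspector, aligner in squad:
        if inspector.misalignment_score(prompt, response) > threshold:
            response = aligner.rewrite(prompt, response)
    return response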


A Cross-Domain Evaluation of Approaches for Causal Knowledge Extraction

arXiv.org Artificial Intelligence

Causal knowledge extraction is the task of extracting relevant causes and effects from text by detecting the causal relation. Although this task is important for language understanding and knowledge discovery, recent works in this domain have largely focused on binary classification of a text segment as causal or non-causal. In this regard, we perform a thorough analysis of three sequence tagging models for causal knowledge extraction and compare them with a span-based approach to causality extraction. Our experiments show that embeddings from pre-trained language models (e.g., BERT) provide a significant performance boost on this task compared to previous state-of-the-art models with complex architectures. We observe that span-based models perform better than simple sequence tagging models based on BERT across all four data sets, which come from diverse domains and contain different types of cause-effect phrases.
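
For concreteness, the snippet below shows the standard BIO encoding that such sequence tagging models predict for cause (C) and effect (E) spans; the example sentence and tag set are illustrative, not taken from the evaluated data sets.

# One BIO label per token; B-/I- prefixes mark cause (C) and effect (E) spans.
tokens = ["Heavy", "rain", "caused", "severe", "flooding", "downtown"]
tags   = ["B-C",   "I-C",  "O",      "B-E",    "I-E",      "I-E"]
# A BERT-based tagger classifies each contextual token embedding into one of
# these tags; a span-based model instead scores candidate (start, end) spans
# as cause/effect, which avoids fragmenting multi-word phrases.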


Deception by Omission: Using Adversarial Missingness to Poison Causal Structure Learning

arXiv.org Artificial Intelligence

Inference of causal structures from observational data is a key component of causal machine learning; in practice, this data may be incompletely observed. Prior work has demonstrated that adversarial perturbations of completely observed training data may be used to force the learning of inaccurate structural causal models (SCMs). However, when the data can be audited for correctness (e.g., it is cryptographically signed by its source), this adversarial mechanism is invalidated. This work introduces a novel attack methodology wherein the adversary deceptively omits a portion of the true training data to bias the learned causal structures in a desired manner. Theoretically sound attack mechanisms are derived for the case of arbitrary SCMs, and a sample-efficient learning-based heuristic is given for Gaussian SCMs. Experimental validation of these approaches on real and synthetic data sets demonstrates the effectiveness of adversarial missingness attacks at deceiving popular causal structure learning algorithms.
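
A toy numpy illustration of the idea (not the paper's derived attack mechanisms): by revealing only a biased subset of truly independent samples, the adversary can make a dependence, and hence a spurious causal edge, appear in the audited data.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = rng.normal(size=5000)                   # x and y are truly independent

# Deceptive omission rule (toy): reveal only samples where x and y agree
# in sign; the hidden portion is what makes the data look uncorrelated.
keep = np.sign(x) == np.sign(y)

print(np.corrcoef(x, y)[0, 1])              # near 0: true independence
print(np.corrcoef(x[keep], y[keep])[0, 1])  # clearly positive: induced link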


Simple Disentanglement of Style and Content in Visual Representations

arXiv.org Artificial Intelligence

Learning visual representations with interpretable features, i.e., disentangled representations, remains a challenging problem. Existing methods demonstrate some success but are hard to apply to large-scale vision datasets like ImageNet. In this work, we propose a simple post-processing framework to disentangle content and style in learned representations from pre-trained vision models. We model the pre-trained features probabilistically as linearly entangled combinations of the latent content and style factors and develop a simple disentanglement algorithm based on the probabilistic model. We show that the method provably disentangles content and style features and verify its efficacy empirically. Our post-processed features yield significant domain generalization performance improvements when the distribution shift occurs due to style changes or style-related spurious correlations.
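
The probabilistic model in the abstract treats features as linear mixtures of content and style. The synthetic sketch below sets up that generative model and checks that a linear unmixing map exists; it uses the latents directly for the check, whereas the paper's algorithm must estimate the unmixing without observing them. All dimensions are arbitrary choices.

import numpy as np

rng = np.random.default_rng(1)
n, d_c, d_s, d_f = 10000, 8, 4, 32

c = rng.normal(size=(n, d_c))             # latent content factors
s = rng.normal(size=(n, d_s))             # latent style factors
A = rng.normal(size=(d_c + d_s, d_f))     # unknown linear mixing
f = np.hstack([c, s]) @ A                 # "pre-trained" features

# With latents observed (only in this synthetic check), least squares
# recovers an exact linear unmixing, confirming the model is invertible;
# the paper's algorithm estimates it without access to c and s.
B, *_ = np.linalg.lstsq(f, np.hstack([c, s]), rcond=None)
print(np.allclose(f @ B, np.hstack([c, s])))   # True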


Word Sense Induction with Knowledge Distillation from BERT

arXiv.org Artificial Intelligence

Pre-trained contextual language models are ubiquitously employed for language understanding tasks, but are unsuitable for resource-constrained systems. Noncontextual word embeddings are an efficient alternative in these settings. Such methods typically use one vector to encode multiple different meanings of a word, and incur errors due to polysemy. This paper proposes a two-stage method to distill multiple word senses from a pre-trained language model (BERT) by using attention over the senses of a word in a context and transferring this sense information to fit multi-sense embeddings in a skip-gram-like framework. We demonstrate an effective approach to training the sense disambiguation mechanism in our model with a distribution over word senses extracted from the output layer embeddings of BERT. Experiments on the contextual word similarity and sense induction tasks show that this method is superior to or competitive with state-of-the-art multi-sense embeddings on multiple benchmark data sets, and experiments with an embedding-based topic model (ETM) demonstrate the benefits of using this multi-sense embedding in a downstream application. While modern deep contextual word embeddings have dramatically improved the state of the art in natural language understanding (NLU) tasks, shallow noncontextual representations of words are a more practical solution in settings constrained by compute power or latency. In single-sense embeddings such as word2vec or GloVe, the different meanings of a word are represented by the same vector, which leads to the meaning conflation problem in the presence of polysemy.
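
A minimal sketch of the sense-attention step described above, assuming one context vector and a small bank of per-word sense vectors; in the paper this attention is trained to match sense distributions distilled from BERT's output-layer embeddings, which is not reproduced here.

import numpy as np

def sense_weighted_embedding(context_vec, sense_vecs):
    # context_vec: (d,) summary of the surrounding context.
    # sense_vecs:  (k, d) one embedding per candidate sense of the word.
    logits = sense_vecs @ context_vec     # affinity of each sense to context
    attn = np.exp(logits - logits.max())  # stable softmax over senses
    attn /= attn.sum()
    return attn @ sense_vecs              # soft, context-specific word vector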


Learning Fair Canonical Polyadic Decompositions using a Kernel Independence Criterion

arXiv.org Machine Learning

This work proposes to learn fair low-rank tensor decompositions by regularizing the Canonical Polyadic Decomposition factorization with the kernel Hilbert-Schmidt independence criterion (KHSIC). It is shown, theoretically and empirically, that a small KHSIC between a latent factor and the sensitive features guarantees approximate statistical parity. The proposed algorithm surpasses the state-of-the-art algorithm, FATR (Zhu et al., 2018), in controlling the trade-off between fairness and residual fit on synthetic and real data sets. Tensor factorizations are used in many machine learning applications including link prediction (Dunlavy et al., 2011), clustering (Shashua et al., 2006), and recommendation (Kutty et al., 2012), where they are used to find vector representations (embeddings) of entities. With the widespread use of tensor factorization, we hope that decisions made from tensor data are not only accurate but also fair.
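
For reference, a generic implementation of the biased empirical HSIC estimator with Gaussian kernels is sketched below; adding a penalty of this form between a CP factor matrix and the sensitive features is the regularization strategy the abstract describes, though the paper's exact kernels and training loop may differ.

import numpy as np

def rbf_gram(X, sigma=1.0):
    # Gaussian-kernel Gram matrix for the rows of X.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, S, sigma=1.0):
    # Biased empirical HSIC between factor rows X and sensitive features S;
    # small values indicate (approximate) statistical independence.
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K, L = rbf_gram(X, sigma), rbf_gram(S, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2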


Information Prediction using Knowledge Graphs for Contextual Malware Threat Intelligence

arXiv.org Artificial Intelligence

Large amounts of threat intelligence information about malware attacks are available in disparate, typically unstructured, formats. Knowledge graphs can capture this information and its context using RDF triples represented by entities and relations. Sparse or inaccurate threat information, however, leads to challenges such as incomplete or erroneous triples. Named entity recognition (NER) and relation extraction (RE) models used to populate the knowledge graph cannot fully guarantee accurate information retrieval, further exacerbating this problem. This paper proposes an end-to-end approach to generate a malware knowledge graph called MalKG, the first open-source automated knowledge graph for malware threat intelligence. The MalKG dataset, called MT40K, contains approximately 40,000 triples generated from 27,354 unique entities and 34 relations. We demonstrate the application of MalKG in predicting missing malware threat intelligence information in the knowledge graph. For ground truth, we manually curate a knowledge graph called MT3K, with 3,027 triples generated from 5,741 unique entities and 22 relations. For entity prediction via a state-of-the-art entity prediction model (TuckER), our approach achieves 80.4 for the hits@10 metric (which predicts the top 10 options for missing entities in the knowledge graph) and 0.75 for the MRR (mean reciprocal rank). We also propose a framework to extract thousands of entities and relations into RDF triples, both manually and automatically, at the sentence level from 1,100 malware threat intelligence reports and from the Common Vulnerabilities and Exposures (CVE) database.
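
TuckER (Balazevic et al., 2019), the entity prediction model mentioned above, scores a triple by contracting a learned core tensor with the subject, relation, and object embeddings; a minimal numpy version of that scoring function is sketched below (training and negative sampling omitted).

import numpy as np

def tucker_score(W, e_s, w_r, e_o):
    # W: (d_e, d_r, d_e) learned core tensor; e_s, e_o: (d_e,) entity
    # embeddings; w_r: (d_r,) relation embedding. Higher scores indicate
    # a more plausible (subject, relation, object) triple.
    return np.einsum('ijk,i,j,k->', W, e_s, w_r, e_o)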


MALOnt: An Ontology for Malware Threat Intelligence

arXiv.org Artificial Intelligence

Malware threat intelligence uncovers deep information about malware, threat actors, and their tactics, Indicators of Compromise (IoCs), and vulnerabilities in different platforms from scattered threat sources. This collective information can guide decision making in cyber defense applications utilized by security operations centers (SOCs). In this paper, we introduce an open-source malware ontology, MALOnt, that allows the structured extraction of information and knowledge graph generation, especially for threat intelligence. The knowledge graph that uses MALOnt is instantiated from a corpus comprising hundreds of annotated malware threat reports. The knowledge graph enables the analysis, detection, classification, and attribution of cyber threats caused by malware. We also demonstrate the annotation process using MALOnt on exemplar threat intelligence reports. A work in progress, this research is part of a larger effort towards the auto-generation of knowledge graphs (KGs) for gathering malware threat intelligence from heterogeneous online resources.
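
To make the RDF-triple instantiation concrete, the rdflib sketch below adds two MALOnt-style triples to a graph; the namespace URI and the specific property names are hypothetical stand-ins, not the ontology's actual terms.

from rdflib import Graph, Namespace

MAL = Namespace("http://example.org/malont#")   # hypothetical namespace URI

g = Graph()
# (subject, predicate, object) triples linking a malware entity to a
# vulnerability and an indicator of compromise; names are illustrative.
g.add((MAL["WannaCry"], MAL["usesVulnerability"], MAL["CVE-2017-0144"]))
g.add((MAL["WannaCry"], MAL["hasIndicator"], MAL["ioc_hash_abc123"]))
print(g.serialize(format="turtle"))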


Fast Fixed Dimension L2-Subspace Embeddings of Arbitrary Accuracy, With Application to L1 and L2 Tasks

arXiv.org Machine Learning

We give a fast oblivious $\ell_2$-embedding of $A \in \mathbb{R}^{n \times d}$ to $\tilde{A} \in \mathbb{R}^{r \times d}$ satisfying $(1-\varepsilon)\|Ax\|_2^2 \le \|\tilde{A}x\|_2^2 \le (1+\varepsilon)\|Ax\|_2^2$. Our embedding dimension $r$ equals $d$, a constant independent of the distortion $\varepsilon$. We use as a black box any $\ell_2$-embedding $\Pi^T A$ and inherit its runtime and accuracy, effectively decoupling the dimension $r$ from runtime and accuracy, allowing downstream machine learning applications to benefit from both a low dimension and high accuracy (in prior embeddings, higher accuracy means higher dimension). We give applications of our $\ell_2$-embedding to regression, PCA, and statistical leverage scores. We also give applications to $\ell_1$: (i) an oblivious $\ell_1$-embedding with dimension $d + O(d \ln^{1+\eta} d)$ and distortion $O((d \ln d)/\ln\ln d)$, with application to constructing well-conditioned bases; (ii) fast approximation of $\ell_1$ Lewis weights using our $\ell_2$-embedding to quickly approximate $\ell_2$-leverage scores.
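
One plausible reading of the construction, sketched below under assumptions: apply any black-box $\ell_2$-embedding $\Pi$ (a Gaussian sketch here), then compress $\Pi A$ to a $d \times d$ matrix via its QR factorization; since $\|Rx\|_2 = \|\Pi A x\|_2$ exactly, $R$ inherits the black box's distortion at fixed dimension $d$. This is an illustrative guess at the mechanism, not the paper's verified algorithm.

import numpy as np

def fixed_dim_l2_embedding(A, r=None, seed=0):
    # Black-box l2 sketch (here: Gaussian), then QR to compress to d x d.
    # ||R x||_2 == ||(Pi A) x||_2 for all x, so R keeps the sketch's
    # (1 +/- eps) distortion while its dimension is d, independent of eps.
    n, d = A.shape
    r = r or 4 * d                        # black-box sketch size
    rng = np.random.default_rng(seed)
    Pi = rng.normal(size=(r, n)) / np.sqrt(r)
    R = np.linalg.qr(Pi @ A, mode='r')    # R is d x d when r >= d
    return R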