
Collaborating Authors: Titov, Ivan


Cross-Modal Conceptualization in Bottleneck Models

arXiv.org Artificial Intelligence

Concept Bottleneck Models (CBMs) (Koh et al., 2020) assume that training examples (e.g., x-ray images) are annotated with high-level concepts (e.g., types of abnormalities), and perform classification by first predicting the concepts, followed by predicting the label relying on these concepts. The main difficulty in using CBMs comes from having to choose concepts that are predictive of the label and then having to label training examples with these concepts. In our approach, we adopt a more moderate assumption and instead use text descriptions (e.g., radiology reports), accompanying the images in training, to guide the induction of concepts. Our cross-modal approach treats concepts as discrete latent variables and promotes concepts that (1) are predictive of the label, and (2) can be predicted reliably from both the image and text. Through experiments conducted on datasets ranging from synthetic datasets (e.g., synthetic images with generated descriptions) to realistic medical imaging datasets, we demonstrate that cross-modal learning encourages the induction of interpretable concepts.
(Figure 1: In the XCB framework, during training we promote agreement between the text and visual models' discrete latent representations. Moreover, we introduce sparsity regularizers in the text model to encourage disentangled and human-interpretable latent representations.)
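As a rough illustration of the bottleneck structure described above, the following PyTorch sketch predicts concept logits from an image and, during training, from the paired text, classifies from the image-side concepts, and adds an agreement term between the two modalities. The module names, dimensions, and the particular agreement loss are assumptions for illustration, not the authors' XCB implementation.

```python
# Minimal sketch (PyTorch) of a concept-bottleneck classifier with a
# cross-modal agreement term, in the spirit of the setup described above.
# Names, sizes and the agreement loss are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalBottleneck(nn.Module):
    def __init__(self, img_dim=512, txt_dim=768, n_concepts=16, n_classes=5):
        super().__init__()
        self.img_to_concepts = nn.Linear(img_dim, n_concepts)   # image -> concept logits
        self.txt_to_concepts = nn.Linear(txt_dim, n_concepts)   # text  -> concept logits
        self.classifier = nn.Linear(n_concepts, n_classes)      # concepts -> label

    def forward(self, img_feat, txt_feat):
        c_img = self.img_to_concepts(img_feat)   # concept logits from the image
        c_txt = self.txt_to_concepts(txt_feat)   # concept logits from the text
        # The label is predicted from the image-side concept probabilities only,
        # so the text branch can be dropped at test time.
        logits = self.classifier(torch.sigmoid(c_img))
        return logits, c_img, c_txt

def training_loss(logits, c_img, c_txt, labels, agree_weight=1.0):
    task = F.cross_entropy(logits, labels)
    # Encourage the two modalities to agree on the concepts; a simple choice is
    # a cross-entropy between the image logits and the (detached) text probabilities.
    agree = F.binary_cross_entropy_with_logits(c_img, torch.sigmoid(c_txt).detach())
    return task + agree_weight * agree
```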


Compositional Generalization for Data-to-Text Generation

arXiv.org Artificial Intelligence

Data-to-text generation involves transforming structured data, often represented as predicate-argument tuples, into coherent textual descriptions. Despite recent advances, systems still struggle when confronted with unseen combinations of predicates, producing unfaithful descriptions (e.g., hallucinations or omissions). We refer to this issue as compositional generalization, and it motivated us to create a benchmark for assessing the performance of different approaches on this specific problem. Furthermore, we propose a novel model that addresses compositional generalization by clustering predicates into groups. Our model generates text sentence by sentence, relying on one cluster of predicates at a time. This approach significantly outperforms T5 baselines across all evaluation metrics. Notably, it achieves a 31% improvement over T5 on a metric focused on maintaining faithfulness to the input.
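To make the sentence-by-sentence idea concrete, here is a hedged sketch that groups input triples into clusters and generates one sentence per cluster with an off-the-shelf T5 checkpoint from Hugging Face. The grouping heuristic (shared subject), the linearisation format, and the "t5-small" checkpoint are placeholder assumptions, not the paper's trained model or clustering method.

```python
# Sketch of "one cluster of predicates per sentence" generation using an
# off-the-shelf T5 checkpoint. The clustering rule and the linearisation
# format are placeholder assumptions, not the paper's approach.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

triples = [
    ("Alan_Turing", "birthPlace", "London"),
    ("Alan_Turing", "field", "Computer_Science"),
    ("London", "country", "United_Kingdom"),
]

# Toy clustering rule: group triples that share a subject entity.
clusters = {}
for s, p, o in triples:
    clusters.setdefault(s, []).append((s, p, o))

sentences = []
for subject, group in clusters.items():
    # Linearise one cluster of predicates into a flat input string.
    linearised = " ; ".join(f"{s} | {p} | {o}" for s, p, o in group)
    inputs = tok("describe: " + linearised, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40)
    sentences.append(tok.decode(out[0], skip_special_tokens=True))

print(" ".join(sentences))  # one generated sentence per predicate cluster
```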


Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study

arXiv.org Artificial Intelligence

With the ever-growing presence of social media platforms comes the increased spread of harmful content and the need for robust hate speech detection systems. Such systems easily overfit to specific targets and keywords, and evaluating them without considering distribution shifts that might occur between train and test data overestimates their benefits. We challenge hate speech models via new train-test splits of existing datasets that rely on the clustering of models' hidden representations. We present two split variants (Subset-Sum-Split and Closest-Split) that, when applied to two datasets using four pretrained models, reveal how models catastrophically fail on blind spots in the latent space. This result generalises to settings where a split is developed with one model and evaluated on another. Our analysis suggests that there is no clear surface-level property of the data split that correlates with the decreased performance, which underscores that task difficulty is not always humanly interpretable. We recommend incorporating latent feature-based splits in model development and release two splits via the GenBench benchmark.
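The following sketch illustrates the general recipe of a latent feature-based split: embed examples with a pretrained model, cluster the hidden representations, and hold out whole clusters as the test set. It is a simplified illustration only; the exact Subset-Sum-Split and Closest-Split procedures differ in their details, and `latent_split` is a hypothetical helper.

```python
# Simplified illustration of a latent-feature-based split: cluster hidden
# representations and assign whole clusters to the test side, so train and
# test occupy different regions of the latent space.
import numpy as np
from sklearn.cluster import KMeans

def latent_split(hidden_states: np.ndarray, n_clusters=10, test_fraction=0.2, seed=0):
    """hidden_states: (n_examples, dim) embeddings from a pretrained model."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    labels = km.fit_predict(hidden_states)

    # Greedily add whole clusters to the test set until it is large enough.
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_clusters)
    test_idx, target = [], int(test_fraction * len(hidden_states))
    for c in order:
        members = np.where(labels == c)[0]
        if test_idx and len(test_idx) + len(members) > target:
            break
        test_idx.extend(members.tolist())
    train_idx = sorted(set(range(len(hidden_states))) - set(test_idx))
    return np.array(train_idx), np.array(test_idx)

# Usage: train, test = latent_split(embeddings); fine-tune on `train`, evaluate on `test`.
```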


Memorisation Cartography: Mapping out the Memorisation-Generalisation Continuum in Neural Machine Translation

arXiv.org Artificial Intelligence

When a neural network is trained, it quickly memorises some source-target mappings from the dataset but never learns others. Yet memorisation is not easily expressed as a binary feature that is good or bad: individual datapoints lie on a memorisation-generalisation continuum. What determines a datapoint's position on that spectrum, and how does that spectrum influence neural models' performance? We address these two questions for neural machine translation (NMT) models. We use the counterfactual memorisation metric to (1) build a resource that places 5M NMT datapoints on a memorisation-generalisation map, (2) illustrate how the datapoints' surface-level characteristics and a model's per-datum training signals are predictive of memorisation in NMT, and (3) describe the influence that subsets of that map have on NMT systems' performance.
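A rough sketch of how counterfactual memorisation can be estimated is given below: train several models on random subsets of the data and compare each datapoint's score when it was included in training versus when it was held out. The helpers `train_model` and `score` are hypothetical placeholders for an NMT training run and a per-sentence quality metric.

```python
# Sketch of estimating counterfactual memorisation per datapoint from models
# trained on random subsets. `train_model` and `score` are placeholders.
import numpy as np

def counterfactual_memorisation(data, train_model, score, n_runs=20, subset_frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    in_scores = [[] for _ in range(n)]   # scores when the datapoint was in training
    out_scores = [[] for _ in range(n)]  # scores when it was held out

    for _ in range(n_runs):
        subset = rng.random(n) < subset_frac
        model = train_model([d for d, keep in zip(data, subset) if keep])
        for i, d in enumerate(data):
            s = score(model, d)
            (in_scores if subset[i] else out_scores)[i].append(s)

    # Memorisation = expected score with the point in training minus without it.
    return np.array([np.mean(a) - np.mean(b) if a and b else np.nan
                     for a, b in zip(in_scores, out_scores)])
```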


Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training

arXiv.org Artificial Intelligence

Representational spaces learned via language modeling are fundamental to Natural Language Processing (NLP); however, there has been limited understanding of how and when during training various types of linguistic information emerge and interact. Leveraging a novel information-theoretic probing suite, which enables direct comparisons not only of task performance but also of the tasks' representational subspaces, we analyze nine tasks covering syntax, semantics and reasoning across 2M pre-training steps and five seeds. We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize. Across these phases, syntactic knowledge is acquired rapidly, after 0.5% of full training. Continued performance improvements primarily stem from the acquisition of open-domain knowledge, while semantics and reasoning tasks benefit from later boosts to long-range contextualization and higher specialization. Measuring cross-task similarity further reveals that linguistically related tasks share information throughout training, and do so more during the critical phase of learning than before or after. Our findings have implications for model interpretability, multi-task learning, and learning from limited data.
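For readers unfamiliar with probing, the minimal sketch below fits a linear classifier on frozen representations from one checkpoint to measure how much task information they expose. This is only the simplest form of the idea; the paper's suite is information-theoretic and additionally compares the tasks' representational subspaces, which this sketch does not capture.

```python
# Minimal linear-probe sketch over frozen layer representations; running it
# across checkpoints traces when a given type of linguistic information emerges.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_accuracy(reps: np.ndarray, labels: np.ndarray, seed=0) -> float:
    """reps: (n_tokens, hidden_dim) activations from one checkpoint/layer."""
    X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, test_size=0.2, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)
```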


On the Transferability of Visually Grounded PCFGs

arXiv.org Artificial Intelligence

There has been a significant surge of interest in visually grounded grammar induction in recent times. While a variety of models have been developed for the task and have demonstrated impressive performance, they have not been evaluated on text domains different from the training domain, so it is unclear whether the improvements brought by visual grounding are transferable. Our study aims to fill this gap and assess the degree of transferability. We start by extending VC-PCFG (Visually-grounded Compound PCFG; Zhao and Titov, 2020) so that it can transfer across text domains. We consider a zero-shot transfer learning setting where a model is trained on the source domain and directly applied to target domains, without any further training. Our experimental results suggest that the benefits of visual grounding transfer to text in domains similar to the training domain but fail to transfer to remote domains. Further analysis of the data and results shows that the lexicon overlap between the source and target domains is the most important factor in the transferability of VC-PCFG.
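A simple way to compute the lexicon-overlap statistic mentioned above is sketched below: the fraction of target-domain tokens whose word type also appears in the source-domain training data. The exact definition used in the paper may differ (for example, type- versus token-level counting).

```python
# Lexicon overlap between a source-domain training corpus and a target-domain
# corpus: share of target tokens covered by the source vocabulary.
from collections import Counter

def lexicon_overlap(source_sentences, target_sentences):
    src_vocab = {w for s in source_sentences for w in s.lower().split()}
    tgt_counts = Counter(w for s in target_sentences for w in s.lower().split())
    covered = sum(c for w, c in tgt_counts.items() if w in src_vocab)
    return covered / max(sum(tgt_counts.values()), 1)

# Overlap close to 1.0 is expected for a near-domain corpus and much lower for
# a remote domain, matching the transfer pattern reported above.
```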


An Empirical Study of Compound PCFGs

arXiv.org Artificial Intelligence

Compound probabilistic context-free grammars (C-PCFGs) have recently established a new state of the art for unsupervised phrase-structure grammar induction. However, due to the high space and time complexities of chart-based representation and inference, it is difficult to investigate C-PCFGs comprehensively. In this work, we rely on a fast implementation of C-PCFGs to conduct an evaluation complementary to that of Kim et al. (2019). We start by analyzing and ablating C-PCFGs on English treebanks. Our findings suggest that (1) C-PCFGs are data-efficient and can generalize to unseen sentence/constituent lengths; and (2) C-PCFGs make the best use of sentence-level information in generating preterminal rule probabilities. We further conduct a multilingual evaluation of C-PCFGs. The experimental results show that the best configurations of C-PCFGs, which are tuned on English, do not always generalize to morphology-rich languages.


Cache & Distil: Optimising API Calls to Large Language Models

arXiv.org Artificial Intelligence

Large-scale deployment of generative AI tools often depends on costly API calls to a Large Language Model (LLM) to fulfil user queries. To curtail the frequency of these calls, one can employ a smaller language model, a student, which is continuously trained on the responses of the LLM. This student gradually gains proficiency in independently handling an increasing number of user requests, a process we term neural caching. The crucial element in neural caching is a policy that decides which requests should be processed by the student alone and which should be redirected to the LLM, subsequently aiding the student's learning. In this study, we focus on classification tasks, and we consider a range of classic active learning-based selection criteria as the policy. Our experiments suggest that Margin Sampling and Query by Committee bring consistent benefits across tasks and budgets.
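As a concrete illustration of such a policy, the sketch below routes a request to the student only when its top two class probabilities are sufficiently far apart (Margin Sampling); otherwise the request goes to the LLM, whose answer can also serve as a new training example for the student. The threshold value and the routing interface are illustrative assumptions.

```python
# Margin-based routing sketch for neural caching: serve the request with the
# student when it is confident, otherwise call the LLM and use its answer both
# as the response and as a training example for the student.
import numpy as np

def route_request(student_probs: np.ndarray, margin_threshold: float = 0.2) -> str:
    """student_probs: class probabilities from the student for one request."""
    top2 = np.sort(student_probs)[-2:]
    margin = top2[1] - top2[0]
    return "student" if margin >= margin_threshold else "llm"

# Example:
# route_request(np.array([0.05, 0.90, 0.05])) -> "student"  (confident)
# route_request(np.array([0.40, 0.45, 0.15])) -> "llm"      (uncertain)
```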


Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation

arXiv.org Artificial Intelligence

Before describing our procedure for sampling deterministic FSTs, we briefly establish notation. An FST is a tuple (Q, Σ, Γ, I, F, δ), where Q is a finite set of states, Σ is the input alphabet, Γ is the output alphabet, I ⊆ Q is a set of initial states, F ⊆ Q is a set of final states, and δ ⊆ Q × (Σ ∪ {ϵ}) × (Γ ∪ {ϵ}) × Q is the set of transitions. We assume Σ = Γ and call it V, for vocabulary. For technical reasons, we exclude the three characters [, ] and \ from the vocabulary, as they are interpreted as special characters by OpenFST, which we use for constructing and representing FSTs. In addition to the shorthand for identity transitions (id), we also have shorthands for converting upper case to lower case and vice versa (lower-to-upper, upper-to-lower). We describe our procedure for generating a deterministic FST with pseudocode in Algorithm 1.
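To make the notation concrete, here is a plain-Python sketch of a deterministic FST in exactly this sense, with delta mapping a (state, input symbol) pair to an (output symbol, next state) pair, plus the id and lower-to-upper shorthands. It is only an illustration of the formalism; the authors construct and represent their FSTs with OpenFST.

```python
# Plain-Python sketch of the deterministic FST formalism defined above:
# delta maps (state, input symbol) to (output symbol, next state).
from string import ascii_lowercase, ascii_uppercase

V = set(ascii_lowercase) | set(ascii_uppercase)   # shared vocabulary (Sigma = Gamma = V)

def id_transitions(state):            # copy every symbol unchanged on state -> state
    return {(state, c): (c, state) for c in V}

def lower_to_upper(state):            # map a-z to A-Z, pass other symbols through
    d = id_transitions(state)
    d.update({(state, c): (c.upper(), state) for c in ascii_lowercase})
    return d

def apply_fst(delta, initial, finals, s):
    """Run a deterministic FST; return the output string or None if rejected."""
    state, out = initial, []
    for c in s:
        if (state, c) not in delta:
            return None
        o, state = delta[(state, c)]
        out.append(o)
    return "".join(out) if state in finals else None

# Example: a one-state FST that upper-cases its input.
delta = lower_to_upper(0)
print(apply_fst(delta, initial=0, finals={0}, s="abcXyz"))  # -> "ABCXYZ"
```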


Autoencoding Conditional Neural Processes for Representation Learning

arXiv.org Artificial Intelligence

Conditional neural processes (CNPs) are a flexible and efficient family of models that learn to learn a stochastic process from observations. In the visual domain, they have seen particular application in contextual image completion: observing pixel values at some locations to predict a distribution over values at other, unobserved locations. However, the choice of pixels in learning such a CNP is typically either random or derived from a simple statistical measure (e.g., pixel variance). Here, we turn the problem on its head and ask: which pixels would a CNP like to observe? That is, which pixels allow fitting a CNP, and do such pixels tell us something about the underlying image? Viewing the context provided to the CNP as a fixed-size latent representation, we construct an amortised variational framework, the Partial Pixel Space Variational Autoencoder (PPS-VAE), for predicting this context simultaneously with learning a CNP. We evaluate PPS-VAE on a set of vision datasets and find that not only is it possible to learn context points while also fitting CNPs, but that their spatial arrangement and values provide a strong signal for the information contained in the image, as evaluated through the lens of classification. We believe the PPS-VAE provides a promising avenue to explore for learning interpretable and effective visual representations.
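For orientation, the sketch below is a minimal conditional neural process in PyTorch: it encodes the observed (location, value) context points, averages them into a fixed-size representation, and decodes a Gaussian over values at target locations. Sizes and architecture choices are illustrative, and this is not the PPS-VAE model itself.

```python
# Minimal Conditional Neural Process sketch (PyTorch). The question posed above
# ("which pixels would a CNP like to observe?") amounts to learning x_ctx/y_ctx
# jointly with such a model rather than choosing them at random.
import torch
import torch.nn as nn

class TinyCNP(nn.Module):
    def __init__(self, x_dim=2, y_dim=1, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim + y_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden))
        self.decoder = nn.Sequential(nn.Linear(hidden + x_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 2 * y_dim))   # mean and log-variance

    def forward(self, x_ctx, y_ctx, x_tgt):
        # x_ctx: (B, Nc, x_dim), y_ctx: (B, Nc, y_dim), x_tgt: (B, Nt, x_dim)
        r = self.encoder(torch.cat([x_ctx, y_ctx], dim=-1)).mean(dim=1)   # aggregate context
        r = r.unsqueeze(1).expand(-1, x_tgt.size(1), -1)
        out = self.decoder(torch.cat([r, x_tgt], dim=-1))
        mean, log_var = out.chunk(2, dim=-1)
        return mean, log_var   # parameters of a Gaussian over target pixel values
```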