AITopics

2402.0208

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Dominican Republic (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceFeb-3-2024

Neural Scaling Laws on Graphs

Liu, Jingzhe, Mao, Haitao, Chen, Zhikai, Zhao, Tong, Shah, Neil, Tang, Jiliang

Deep graph models (e.g., graph neural networks and graph transformers) have become important techniques for leveraging knowledge across various types of graphs. Yet, the scaling properties of deep graph models have not been systematically investigated, casting doubt on the feasibility of achieving large graph models through enlarging the model and dataset sizes. In this work, we delve into neural scaling laws on graphs from both model and data perspectives. We first verify the validity of such laws on graphs, establishing formulations to describe the scaling behaviors. For model scaling, we investigate the phenomenon of scaling law collapse and identify overfitting as the potential reason. Moreover, we reveal that the model depth of deep graph models can impact the model scaling behaviors, which differ from observations in other domains such as CV and NLP. For data scaling, we suggest that the number of graphs can not effectively metric the graph data volume in scaling law since the sizes of different graphs are highly irregular. Instead, we reform the data scaling law with the number of edges as the metric to address the irregular graph sizes. We further demonstrate the reformed law offers a unified view of the data scaling behaviors for various fundamental graph tasks including node classification, link prediction, and graph classification. This work provides valuable insights into neural scaling laws on graphs, which can serve as an essential step toward large graph models.

color correspond, dataset, graph, (11 more...)

2402.02054

Country: North America > United States > Michigan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Alam, Md Mahfuz Ibn, Anastasopoulos, Antonios

A Case Study on Filtering for End-to-End Speech Translation

It is relatively easy to mine a large parallel corpus for any machine learning task, such as speech-to-text or speech-to-speech translation. Although these mined corpora are large in volume, their quality is questionable. This work shows that the simplest filtering technique can trim down these big, noisy datasets to a more manageable, clean dataset. We also show that using this clean dataset can improve the model's performance, as in the case of the multilingual-to-English Speech Translation (ST) model, where, on average, we obtain a 4.65 BLEU score improvement.

baseline, computational linguistic, unfiltered sm, (13 more...)

2402.01945

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Alam, Md Mahfuz Ibn, Ahmadi, Sina, Anastasopoulos, Antonios

A Morphologically-Aware Dictionary-based Data Augmentation Technique for Machine Translation of Under-Represented Languages

The availability of parallel texts is crucial to the performance of machine translation models. However, most of the world's languages face the predominant challenge of data scarcity. In this paper, we propose strategies to synthesize parallel data relying on morpho-syntactic information and using bilingual lexicons along with a small amount of seed parallel data. Our methodology adheres to a realistic scenario backed by the small parallel seed data. It is linguistically informed, as it aims to create augmented data that is more likely to be grammatically correct. We analyze how our synthetic data can be combined with raw parallel data and demonstrate a consistent improvement in performance in our experiments on 14 languages (28 English <-> X pairs) ranging from well- to very low-resource ones. Our method leads to improvements even when using only five seed sentences and a bilingual lexicon.

augmentation, seed sentence, translation, (15 more...)

2402.01939

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Pennsylvania (0.04)
Europe > Germany (0.04)
(11 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Distractor Generation for Multiple-Choice Questions: A Survey of Methods, Datasets, and Evaluation

Alhazmi, Elaf, Sheng, Quan Z., Zhang, Wei Emma, Zaib, Munazza, Alhazmi, Ahoud

Distractors are important in learning evaluation. This paper surveys distractor generation tasks using English multiple-choice question datasets for textual and multimodal contexts. In particular, this paper presents a thorough literature review of the recent studies on distractor generation tasks, discusses multiple choice components and their characteristics, analyzes the related datasets, and summarizes the evaluation metrics of distractor generation. Our investigation reveals that more than half of datasets are human-generated from educational sources in specific domains such as Science and English, which are largely text-based, with a lack of open domain and multimodal datasets.

dataset, distractor, proceedings, (14 more...)

2402.01512

Country:

Oceania > Australia > South Australia > Adelaide (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > Canada > Quebec > Montreal (0.04)
(6 more...)

Genre: Overview (1.00)

Industry:

Education > Educational Setting > K-12 Education (0.46)
Education > Assessment & Standards > Student Performance (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Mąka, Paweł, Semerci, Yusuf Can, Scholtes, Jan, Spanakis, Gerasimos

Sequence Shortening for Context-Aware Machine Translation

Context-aware Machine Translation aims to improve translations of sentences by incorporating surrounding sentences as context. Towards this task, two main architectures have been applied, namely single-encoder (based on concatenation) and multi-encoder models. In this study, we show that a special case of multi-encoder architecture, where the latent representation of the source sentence is cached and reused as the context in the next step, achieves higher accuracy on the contrastive datasets (where the models have to rank the correct translation among the provided sentences) and comparable BLEU and COMET scores as the single- and multi-encoder approaches. Furthermore, we investigate the application of Sequence Shortening to the cached representations. We test three pooling-based shortening techniques and introduce two novel methods - Latent Grouping and Latent Selecting, where the network learns to group tokens or selects the tokens to be cached as context. Our experiments show that the two methods achieve competitive BLEU and COMET scores and accuracies on the contrastive datasets to the other tested methods while potentially allowing for higher interpretability and reducing the growth of memory requirements with increased context size.

computational linguistic, shortening, translation, (13 more...)

2402.01416

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(16 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Mohammed, Wafaa, Niculae, Vlad

On Measuring Context Utilization in Document-Level MT Systems

Document-level translation models are usually evaluated using general metrics such as BLEU, which are not informative about the benefits of context. Current work on context-aware evaluation, such as contrastive methods, only measure translation accuracy on words that need context for disambiguation. Such measures cannot reveal whether the translation model uses the correct supporting context. We propose to complement accuracy-based evaluation with measures of context utilization. We find that perturbation-based analysis (comparing models' performance when provided with correct versus random context) is an effective measure of overall context utilization. For a finer-grained phenomenon-specific evaluation, we propose to measure how much the supporting context contributes to handling context-dependent discourse phenomena. We show that automatically-annotated supporting context gives similar conclusions to human-annotated context and can be used as alternative for cases where human annotations are not available. Finally, we highlight the importance of using discourse-rich datasets when assessing context utilization.

computational linguistic, evaluation, translation, (13 more...)

2402.01404

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
North America > Canada > Ontario > Toronto (0.04)
(13 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)

Fdez-Díaz, Miriam, Quevedo, José Ramón, Montañés, Elena

Target inductive methods for zero-shot regression

This research arises from the need to predict the amount of air pollutants in meteorological stations. Air pollution depends on the location of the stations (weather conditions and activities in the surroundings). Frequently, the surrounding information is not considered in the learning process. This information is known beforehand in the absence of unobserved weather conditions and remains constant for the same station. Considering the surrounding information as side information facilitates the generalization for predicting pollutants in new stations, leading to a zero-shot regression scenario. Available methods in zero-shot typically lean towards classification, and are not easily extensible to regression. This paper proposes two zero-shot methods for regression. The first method is a similarity based approach that learns models from features and aggregates them using side information. However, potential knowledge of the feature models may be lost in the aggregation. The second method overcomes this drawback by replacing the aggregation procedure and learning the correspondence between side information and feature-induced models, instead. Both proposals are compared with a baseline procedure using artificial datasets, UCI repository communities and crime datasets, and the pollutants. Both approaches outperform the baseline method, but the parameter learning approach manifests its superiority over the similarity based method.

dataset, information, side information, (17 more...)

doi: 10.1016/j.ins.2022.03.075

2402.01252

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Asturias (0.04)
South America > Colombia (0.04)
(8 more...)

Genre: Research Report > Experimental Study (0.46)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)

Kassem, Aly M., Saad, Sherif

Finding a Needle in the Adversarial Haystack: A Targeted Paraphrasing Approach For Uncovering Edge Cases with Minimal Distribution Distortion

Adversarial attacks against language models(LMs) are a significant concern. In particular, adversarial samples exploit the model's sensitivity to small input changes. While these changes appear insignificant on the semantics of the input sample, they result in significant decay in model performance. In this paper, we propose Targeted Paraphrasing via RL (TPRL), an approach to automatically learn a policy to generate challenging samples that most likely improve the model's performance. TPRL leverages FLAN T5, a language model, as a generator and employs a self learned policy using a proximal policy gradient to generate the adversarial examples automatically. TPRL's reward is based on the confusion induced in the classifier, preserving the original text meaning through a Mutual Implication score. We demonstrate and evaluate TPRL's effectiveness in discovering natural adversarial attacks and improving model performance through extensive experiments on four diverse NLP classification tasks via Automatic and Human evaluation. TPRL outperforms strong baselines, exhibits generalizability across classifiers and datasets, and combines the strengths of language modeling and reinforcement learning to generate diverse and influential adversarial examples.

arxiv preprint arxiv, classifier, dataset, (13 more...)

2401.11373

Country:

North America > United States > Rhode Island (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)
(10 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.93)
Information Technology > Security & Privacy (0.86)
Government > Military (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Cao, Yong, Chen, Min, Hershcovich, Daniel

Bridging Cultural Nuances in Dialogue Agents through Cultural Value Surveys

The cultural landscape of interactions with dialogue agents is a compelling yet relatively unexplored territory. It's clear that various sociocultural aspects -- from communication styles and beliefs to shared metaphors and knowledge -- profoundly impact these interactions. To delve deeper into this dynamic, we introduce cuDialog, a first-of-its-kind benchmark for dialogue generation with a cultural lens. We also develop baseline models capable of extracting cultural attributes from dialogue exchanges, with the goal of enhancing the predictive accuracy and quality of dialogue agents. To effectively co-learn cultural understanding and multi-turn dialogue predictions, we propose to incorporate cultural dimensions with dialogue encoding features. Our experimental findings highlight that incorporating cultural value surveys boosts alignment with references and cultural markers, demonstrating its considerable influence on personalization and dialogue quality. To facilitate further exploration in this exciting domain, we publish our benchmark publicly accessible at https://github.com/yongcaoplus/cuDialog.

computational linguistic, dialogue, proceedings, (13 more...)

2401.10352

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(17 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)