AITopics

2310.186

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(14 more...)

Genre: Research Report (0.64)

Industry: Law > Government & the Courts (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Hsu, Aliyah R., Cherapanamjeri, Yeshwanth, Park, Briton, Naumann, Tristan, Odisho, Anobel Y., Yu, Bin

Diagnosing Transformers: Illuminating Feature Spaces for Clinical Decision-Making

arXiv.org Artificial IntelligenceOct-28-2023

Pre-trained transformers are often fine-tuned to aid clinical decision-making using limited clinical notes. Model interpretability is crucial, especially in high-stakes domains like medicine, to establish trust and ensure safety, which requires human engagement. We introduce SUFO, a systematic framework that enhances interpretability of fine-tuned transformer feature spaces. SUFO utilizes a range of analytic and visualization techniques, including Supervised probing, Unsupervised similarity analysis, Feature dynamics, and Outlier analysis to address key questions about model trust and interpretability. We conduct a case study investigating the impact of pre-training data where we focus on real-world pathology classification tasks, and validate our findings on MedNLI. We evaluate five 110M-sized pre-trained transformer models, categorized into general-domain (BERT, TNLR), mixed-domain (BioBERT, Clinical BioBERT), and domain-specific (PubMedBERT) groups. Our SUFO analyses reveal that: (1) while PubMedBERT, the domain-specific model, contains valuable information for fine-tuning, it can overfit to minority classes when class imbalances exist. In contrast, mixed-domain models exhibit greater resistance to overfitting, suggesting potential improvements in domain-specific model robustness; (2) in-domain pre-training accelerates feature disambiguation during fine-tuning; and (3) feature spaces undergo significant sparsification during this process, enabling clinicians to identify common outlier modes among fine-tuned models as demonstrated in this paper. These findings showcase the utility of SUFO in enhancing trust and safety when using transformers in medicine, and we believe SUFO can aid practitioners in evaluating fine-tuned language models for other applications in medicine and in more critical domains.

computational linguistic, feature space, fine-tuning, (14 more...)

2305.17588

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)
(7 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Health Care Technology (0.88)
Health & Medicine > Therapeutic Area > Urology (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Zhang, Zhen, Wang, Jialu, Wang, Xin Eric

Parameter-Efficient Cross-lingual Transfer of Vision and Language Models via Translation-based Alignment

arXiv.org Artificial IntelligenceOct-28-2023

Pre-trained vision and language models such as CLIP have witnessed remarkable success in connecting images and texts with a primary focus on English texts. Despite recent efforts to extend CLIP to support other languages, disparities in performance among different languages have been observed due to uneven resource availability. Additionally, current cross-lingual transfer methods of those pre-trained models would consume excessive resources for a large number of languages. Therefore, we propose a new parameter-efficient cross-lingual transfer learning framework that utilizes a translation-based alignment method to mitigate multilingual disparities and explores parameter-efficient fine-tuning methods for parameter-efficient cross-lingual transfer. Extensive experiments on XTD and Multi30K datasets, covering 11 languages under zero-shot, few-shot, and full-dataset learning scenarios, show that our framework significantly reduces the multilingual disparities among languages and improves cross-lingual transfer results, especially in low-resource scenarios, while only keeping and fine-tuning an extremely small number of parameters compared to the full model (e.g., Our framework only requires 0.16\% additional parameters of a full-model for each language in the few-shot learning scenario). The codes are available at \url{https://github.com/eric-ai-lab/PECTVLM}. The codes are available at \url{https://github.com/eric-ai-lab/PECTVLM}.

disparity, multilingual-clip, scenario, (14 more...)

2305.0351

Country:

Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceOct-27-2023

Minimum Bayes' Risk Decoding for System Combination of Grammatical Error Correction Systems

Raina, Vyas, Gales, Mark

For sequence-to-sequence tasks it is challenging to combine individual system outputs. Further, there is also often a mismatch between the decoding criterion and the one used for assessment. Minimum Bayes' Risk (MBR) decoding can be used to combine system outputs in a manner that encourages better alignment with the final assessment criterion. This paper examines MBR decoding for Grammatical Error Correction (GEC) systems, where performance is usually evaluated in terms of edits and an associated F-score. Hence, we propose a novel MBR loss function directly linked to this form of criterion. Furthermore, an approach to expand the possible set of candidate sentences is described. This builds on a current max-voting combination scheme, as well as individual edit-level selection. Experiments on three popular GEC datasets and with state-of-the-art GEC systems demonstrate the efficacy of the proposed MBR approach. Additionally, the paper highlights how varying reward metrics within the MBR decoding framework can provide control over precision, recall, and the F-score in combined GEC systems.

computational linguistic, mbr, sequence, (12 more...)

2309.0652

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > New York > Monroe County > Rochester (0.04)
(7 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)
(3 more...)

Detecting and Mitigating Hallucinations in Multilingual Summarisation

Qiu, Yifu, Ziser, Yftah, Korhonen, Anna, Ponti, Edoardo M., Cohen, Shay B.

Hallucinations pose a significant challenge to the reliability of neural models for abstractive summarisation. While automatically generated summaries may be fluent, they often lack faithfulness to the original document. This issue becomes even more pronounced in low-resource settings, such as cross-lingual transfer. With the existing faithful metrics focusing on English, even measuring the extent of this phenomenon in cross-lingual settings is hard. To address this, we first develop a novel metric, mFACT, evaluating the faithfulness of non-English summaries, leveraging translation-based transfer from multiple English faithfulness metrics. We then propose a simple but effective method to reduce hallucinations with a cross-lingual transfer, which weighs the loss of each training example by its faithfulness score. Through extensive experiments in multiple languages, we demonstrate that mFACT is the metric that is most suited to detect hallucinations. Moreover, we find that our proposed loss weighting method drastically increases both performance and faithfulness according to both automatic and human evaluation when compared to strong baselines for cross-lingual transfer such as MAD-X. Our code and dataset are available at https://github.com/yfqiu-nlp/mfact-summ.

computational linguistic, faithfulness, hallucination, (14 more...)

2305.13632

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > France > Hauts-de-France > Pas-de-Calais > Calais (0.04)
(12 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Nair, Sathvik, Resnik, Philip

Words, Subwords, and Morphemes: What Really Matters in the Surprisal-Reading Time Relationship?

An important assumption that comes with using LLMs on psycholinguistic data has gone unverified. LLM-based predictions are based on subword tokenization, not decomposition of words into morphemes. Does that matter? We carefully test this by comparing surprisal estimates using orthographic, morphological, and BPE tokenization against reading time data. Our results replicate previous findings and provide evidence that in the aggregate, predictions using BPE tokenization do not suffer relative to morphological and orthographic segmentation. However, a finer-grained analysis points to potential issues with relying on BPE-based tokenization, as well as providing promising results involving morphologically-aware surprisal estimates and suggesting a new method for evaluating morphological prediction.

prediction, surprisal, tokenization, (15 more...)

2310.17774

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)

DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation

Zhu, Yongxin, Gao, Zhujin, Zhou, Xinyuan, Ye, Zhongyi, Xu, Linli

diffs2ut, semantic preserving diffusion model, textless direct speech-to-speech translation

While Diffusion Generative Models have achieved great success on image generation tasks, how to efficiently and effectively incorporate them into speech generation especially translation tasks remains a non-trivial problem. Specifically, due to the low information density of speech data, the transformed discrete speech unit sequence is much longer than the corresponding text transcription, posing significant challenges to existing auto-regressive models. Furthermore, it is not optimal to brutally apply discrete diffusion on the speech unit sequence while disregarding the continuous space structure, which will degrade the generation performance significantly. In this paper, we propose a novel diffusion model by applying the diffusion forward process in the \textit{continuous} speech representation space, while employing the diffusion backward process in the \textit{discrete} speech unit space. In this way, we preserve the semantic structure of the continuous speech representation space in the diffusion process and integrate the continuous and discrete diffusion models. We conduct extensive experiments on the textless direct speech-to-speech translation task, where the proposed method achieves comparable results to the computationally intensive auto-regressive baselines (500 steps on average) with significantly fewer decoding steps (50 steps).

2310.1757

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.60)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.60)

Sun, Kaiser, Williams, Adina, Hupkes, Dieuwke

The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks

NLP models have progressed drastically in recent years, according to numerous datasets proposed to evaluate performance. Questions remain, however, about how particular dataset design choices may impact the conclusions we draw about model capabilities. In this work, we investigate this question in the domain of compositional generalization. We examine the performance of six modeling approaches across 4 datasets, split according to 8 compositional splitting strategies, ranking models by 18 compositional generalization splits in total. Our results show that: i) the datasets, although all designed to evaluate compositional generalization, rank modeling approaches differently; ii) datasets generated by humans align better with each other than they with synthetic datasets, or than synthetic datasets among themselves; iii) generally, whether datasets are sampled from the same source is more predictive of the resulting model ranking than whether they maintain the same interpretation of compositionality; and iv) which lexical items are used in the data can strongly impact conclusions. Overall, our results demonstrate that much work remains to be done when it comes to assessing whether popular evaluation datasets measure what they intend to measure, and suggest that elucidating more rigorous standards for establishing the validity of evaluation sets could benefit the field.

dataset, length 0, template 0, (16 more...)

2310.17514

Country:

North America > Dominican Republic (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.70)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.46)

Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs

Zuo, Yuxin, Li, Bei, Lv, Chuanhao, Zheng, Tong, Xiao, Tong, Zhu, Jingbo

This paper presents an in-depth study of multimodal machine translation (MMT), examining the prevailing understanding that MMT systems exhibit decreased sensitivity to visual information when text inputs are complete. Instead, we attribute this phenomenon to insufficient cross-modal interaction, rather than image information redundancy. A novel approach is proposed to generate parallel Visual Question-Answering (VQA) style pairs from the source text, fostering more robust cross-modal interaction. Using Large Language Models (LLMs), we explicitly model the probing signal in MMT to convert it into VQA-style data to create the Multi30K-VQA dataset. An MMT-VQA multitask learning framework is introduced to incorporate explicit probing signals from the dataset into the MMT training process. Experimental results on two widely-used benchmarks demonstrate the effectiveness of this novel approach. Our code and data would be available at: \url{https://github.com/libeineu/MMT-VQA}.

computational linguistic, machine translation, translation, (13 more...)

2310.17133

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Europe > Belgium (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

M2C: Towards Automatic Multimodal Manga Complement

Guo, Hongcheng, Wang, Boyang, Bai, Jiaqi, Liu, Jiaheng, Yang, Jian, Li, Zhoujun

Multimodal manga analysis focuses on enhancing manga understanding with visual and textual features, which has attracted considerable attention from both natural language processing and computer vision communities. Currently, most comics are hand-drawn and prone to problems such as missing pages, text contamination, and aging, resulting in missing comic text content and seriously hindering human comprehension. In other words, the Multimodal Manga Complement (M2C) task has not been investigated, which aims to handle the aforementioned issues by providing a shared semantic space for vision and language understanding. To this end, we first propose the Multimodal Manga Complement task by establishing a new M2C benchmark dataset covering two languages. First, we design a manga argumentation method called MCoT to mine event knowledge in comics with large language models. Then, an effective baseline FVP-M$^{2}$ using fine-grained visual prompts is proposed to support manga complement. Extensive experimental results show the effectiveness of FVP-M$^{2}$ method for Multimodal Mange Complement.

complement, fvp-m 2, jiaheng liu, (12 more...)

2310.1713

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > China (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)