AITopics

2305.02437

Country:

Asia > China (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

On the Use of Metaphor Translation in Psychiatry

Wong, Lois

Providing mental healthcare to individuals with limited English proficiency (LEP) remains a pressing problem within psychiatry. Because the majority of individuals trained in providing psychiatric care are English speakers, the quality of mental healthcare given to LEP patients is significantly lower than that provided for English speakers. The provision of mental healthcare is contingent on communication and understanding between the patient and healthcare provider, much more so than in the realm of physical healthcare, and English speakers are often unable to comprehend figurative language such as metaphors used by LEP patients. Hence, figurative language translation is invaluable to providing equitable psychiatric care. Metaphor has been shown to be paramount both in identifying individuals struggling with mental problems and in helping those individuals understand and communicate their experiences. Therefore, this paper surveys the potential of machine translation for providing equitable psychiatric healthcare and highlights the need for further research on the transferability of existing machine and metaphor translation research to the domain of psychiatry.

healthcare, metaphor, translation, (13 more...)

2312.14845

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Spain (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Mukherjee, Sourabrata, Kasner, Zdeněk, Dušek, Ondřej

Balancing the Style-Content Trade-Off in Sentiment Transfer Using Polarity-Aware Denoising

Text sentiment transfer aims to flip the sentiment polarity of a sentence (positive to negative or vice versa) while preserving its sentiment-independent content. Although current models show good results at changing the sentiment, content preservation in transferred sentences is insufficient. In this paper, we present a sentiment transfer model based on polarity-aware denoising, which accurately controls the sentiment attributes in generated text, preserving the content to a great extent and helping to balance the style-content trade-off. Our proposed model is structured around two key stages in the sentiment transfer process: better representation learning using a shared encoder and sentiment-controlled generation using separate sentiment-specific decoders. Empirical results show that our method outperforms state-of-the-art baselines in terms of content preservation while staying competitive in terms of style transfer accuracy and fluency. Source code, data, and all other related details are available on GitHub.

content preservation, latent representation, sentiment, (13 more...)

2312.14708

Country: Europe > Czechia > Prague (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Language Model is a Branch Predictor for Simultaneous Machine Translation

Yin, Aoxiong, Zhong, Tianyun, Li, Haoyuan, Tang, Siliang, Zhao, Zhou

The primary objective of simultaneous machine translation (SiMT) is to minimize latency while preserving the quality of the final translation. Drawing inspiration from CPU branch prediction techniques, we propose incorporating branch prediction techniques in SiMT tasks to reduce translation latency. Specifically, we utilize a language model as a branch predictor to predict potential branch directions, namely, future source words. Subsequently, we utilize the predicted source words to decode the output in advance. When the actual source word deviates from the predicted source word, we use the real source word to decode the output again, replacing the predicted output. To further reduce computational costs, we share the parameters of the encoder and the branch predictor, and utilize a pre-trained language model for initialization. Our proposed method can be seamlessly integrated with any SiMT model. Extensive experimental results demonstrate that our approach can improve translation quality and latency at the same time. Our code is available at https://github.com/YinAoXiong/simt_branch_predictor .
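The predict, decode-ahead, and re-decode control flow described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `speculative_translate`, the bigram table standing in for the language-model branch predictor, and the word-by-word lexicon standing in for the SiMT decoder are all hypothetical names chosen for illustration, and the parameter sharing between encoder and predictor is not modelled.

```python
def speculative_translate(source_stream, predict_next, decode):
    """Return one output per arriving source word, reusing speculative work on hits."""
    prefix = []
    speculative = None  # (predicted source word, output decoded in advance)
    outputs = []
    hits = 0
    for word in source_stream:
        if speculative is not None and speculative[0] == word:
            # The prediction was right: reuse the output decoded in advance.
            output = speculative[1]
            hits += 1
        else:
            # The real word differs (or nothing was predicted): decode again.
            output = decode(prefix + [word])
        prefix.append(word)
        outputs.append(output)
        # Predict the next source word and decode ahead while waiting for it.
        guess = predict_next(prefix)
        speculative = None if guess is None else (guess, decode(prefix + [guess]))
    return outputs, hits


# Hypothetical stand-ins: a bigram table as predictor, a lexicon as decoder.
BIGRAMS = {"guten": "morgen", "morgen": "welt"}
LEXICON = {"guten": "good", "morgen": "morning", "welt": "world", "freunde": "friends"}


def toy_predict_next(prefix):
    return BIGRAMS.get(prefix[-1])


def toy_decode(words):
    return " ".join(LEXICON.get(w, w) for w in words)


source = ["guten", "morgen", "freunde"]
outputs, hits = speculative_translate(source, toy_predict_next, toy_decode)
print(outputs, hits)
```

In this toy run the second word is predicted correctly and its output is reused, while the third word is mispredicted and decoded again from the real input. The final outputs are the same as without speculation; only the point at which the decoding work happens changes.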

branch predictor, language model, translation, (8 more...)

2312.14488

Country: Asia > China (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Singh, Avi, Co-Reyes, John D., Agarwal, Rishabh, Anand, Ankesh, Patil, Piyush, Garcia, Xavier, Liu, Peter J., Harrison, James, Lee, Jaehoon, Xu, Kelvin, Parisi, Aaron, Kumar, Abhishek, Alemi, Alex, Rizkowsky, Alex, Nova, Azade, Adlam, Ben, Bohnet, Bernd, Elsayed, Gamaleldin, Sedghi, Hanie, Mordatch, Igor, Simpson, Isabelle, Gur, Izzeddin, Snoek, Jasper, Pennington, Jeffrey, Hron, Jiri, Kenealy, Kathleen, Swersky, Kevin, Mahajan, Kshiteej, Culp, Laura, Xiao, Lechao, Bileschi, Maxwell L., Constant, Noah, Novak, Roman, Liu, Rosanne, Warkentin, Tris, Qian, Yundi, Bansal, Yamini, Dyer, Ethan, Neyshabur, Behnam, Sohl-Dickstein, Jascha, Fiedel, Noah

Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.
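The three-step loop in this abstract (generate and filter, fine-tune, repeat) can be sketched in Python as follows. This is an illustrative toy and not the authors' code: `rest_em` and the `toy_*` helpers are hypothetical names, the "model" is just a table of answer weights, and "fine-tuning" simply adds weight to verified answers. The abstract does not say whether each round fine-tunes from the base model or from the previous iterate; the sketch assumes the latter.

```python
import random


def rest_em(model, prompts, sample_fn, reward_fn, fine_tune_fn,
            iterations=3, samples_per_prompt=4):
    """Run a generate -> filter -> fine-tune loop for a few iterations."""
    for _ in range(iterations):
        kept = []
        for prompt in prompts:
            for _ in range(samples_per_prompt):
                # (1) Generate a candidate solution from the current model.
                candidate = sample_fn(model, prompt)
                # Keep it only if the binary feedback marks it as correct.
                if reward_fn(prompt, candidate) == 1:
                    kept.append((prompt, candidate))
        # (2) Fine-tune on the filtered samples; (3) the loop then repeats.
        model = fine_tune_fn(model, kept)
    return model


# Hypothetical toy stand-ins: prompts are pairs of integers to be added.
rng = random.Random(0)


def toy_sample(model, prompt):
    answers = list(model[prompt])
    weights = [model[prompt][a] for a in answers]
    return rng.choices(answers, weights=weights, k=1)[0]


def toy_reward(prompt, candidate):
    a, b = prompt
    return 1 if candidate == a + b else 0


def toy_fine_tune(model, kept):
    # Return a new table with extra weight on every kept (verified) answer.
    new_model = {p: dict(w) for p, w in model.items()}
    for prompt, candidate in kept:
        new_model[prompt][candidate] += 1.0
    return new_model


prompts = [(1, 2), (3, 4)]
initial = {(a, b): {a + b - 1: 1.0, a + b: 1.0, a + b + 1: 1.0} for a, b in prompts}
trained = rest_em(initial, prompts, toy_sample, toy_reward, toy_fine_tune)
for prompt in prompts:
    print(prompt, trained[prompt])
```

Because only verified samples reach the fine-tuning step, the weights of incorrect answers never grow in this toy, while correct answers can only gain weight from round to round.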

iteration, language model, scaling self-training, (14 more...)

2312.06585

Country: Africa > Rwanda > Kigali > Kigali (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Lusito, Stefano, Ferrante, Edoardo, Maillard, Jean

Text normalization for low-resource languages: the case of Ligurian

Text normalization is a crucial technology for low-resource languages which lack rigid spelling conventions or that have undergone multiple spelling reforms. Low-resource text normalization has so far relied upon hand-crafted rules, which are perceived to be more data efficient than neural methods. In this paper we examine the case of text normalization for Ligurian, an endangered Romance language. We collect 4,394 Ligurian sentences paired with their normalized versions, as well as the first open source monolingual corpus for Ligurian. We show that, in spite of the small amounts of data available, a compact transformer-based model can be trained to achieve very low error rates by the use of backtranslation and appropriate tokenization.

computational linguistic, ligurian, normalization, (14 more...)

2206.07861

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Liguria > Genoa (0.05)
Europe > Germany > Berlin (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.48)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

arXiv.org Artificial Intelligence, Dec-21-2023

Layer-wise Representation Fusion for Compositional Generalization

Zheng, Yafang, Lin, Lei, Li, Shuangtao, Yuan, Yuxuan, Lai, Zhaohong, Liu, Shan, Fu, Biao, Chen, Yidong, Shi, Xiaodong

Existing neural models are demonstrated to struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. A key reason for failure on CG is that the syntactic and semantic representations of sequences in both the uppermost layer of the encoder and decoder are entangled. However, previous work concentrates on separating the learning of syntax and semantics instead of exploring the reasons behind the representation entanglement (RE) problem to solve it. We explain why it exists by analyzing the representation evolving mechanism from the bottom to the top of the Transformer layers. We find that the "shallow" residual connections within each layer fail to fuse previous layers' information effectively, leading to information forgetting between layers and, in turn, to the RE problem. Inspired by this, we propose LRF, a novel Layer-wise Representation Fusion framework for CG, which learns to fuse previous layers' information back into the encoding and decoding process effectively by introducing a fuse-attention module at each encoder and decoder layer. LRF achieves promising results on two realistic benchmarks, empirically demonstrating the effectiveness of our proposal.

information, proc, transformer, (12 more...)

2307.10799

Country:

Asia > China > Fujian Province > Xiamen (0.04)
Europe > Belgium (0.04)
Asia > Taiwan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.98)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)

Charles, Zachary, Mitchell, Nicole, Pillutla, Krishna, Reneer, Michael, Garrett, Zachary

Towards Federated Foundation Models: Scalable Dataset Pipelines for Group-Structured Learning

arXiv.org Artificial Intelligence, Dec-21-2023

We introduce Dataset Grouper, a library to create large-scale group-structured (e.g., federated) datasets, enabling federated learning simulation at the scale of foundation models. This library facilitates the creation of group-structured versions of existing datasets based on user-specified partitions and directly leads to a variety of useful heterogeneous datasets that can be plugged into existing software frameworks. Dataset Grouper offers three key advantages. First, it scales to settings where even a single group's dataset is too large to fit in memory. Second, it provides flexibility, both in choosing the base (non-partitioned) dataset and in defining partitions. Finally, it is framework-agnostic. We empirically demonstrate that Dataset Grouper enables large-scale federated language modeling simulations on datasets that are orders of magnitude larger than in previous work, allowing for federated training of language models with hundreds of millions, and even billions, of parameters. Our experimental results show that algorithms like FedAvg operate more as meta-learning methods than as empirical risk minimization methods at this scale, suggesting their utility in downstream personalization and task-specific adaptation. Dataset Grouper is available at https://github.com/google-research/dataset_grouper.
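The idea of building a group-structured dataset from a flat one via a user-specified partition can be illustrated with a small Python sketch. It does not use or reproduce the Dataset Grouper API: `partition_by_group` and the example records are hypothetical, and the sketch holds everything in memory, so it shows only the partitioning concept and none of the library's scaling behaviour.

```python
from collections import defaultdict


def partition_by_group(examples, get_group_id):
    """Split a flat iterable of examples into per-group lists."""
    groups = defaultdict(list)
    for example in examples:
        # The user-specified function decides which group an example joins.
        groups[get_group_id(example)].append(example)
    return dict(groups)


# Hypothetical base (non-partitioned) dataset, grouped here by source domain.
base_dataset = [
    {"domain": "news.example", "text": "first article"},
    {"domain": "blog.example", "text": "a post"},
    {"domain": "news.example", "text": "second article"},
]
by_domain = partition_by_group(base_dataset, lambda ex: ex["domain"])
for group_id, members in by_domain.items():
    print(group_id, len(members))
```

Passing a different key function (for example, one that groups by author instead of domain) yields a different partition of the same base dataset, which is the kind of flexibility the abstract describes.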

dataset, dataset grouper, learning, (15 more...)

2307.09619

Country:

North America > United States > Virginia (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Kaji, Arshad, Shah, Manan

Contextual Code Switching for Machine Translation using Language Models

arXiv.org Artificial Intelligence, Dec-20-2023

Large language models (LLMs) have exerted a considerable impact on diverse language-related tasks in recent years. Their demonstrated state-of-the-art performance is achieved through methodologies such as zero-shot or few-shot prompting. These models are trained on extensive datasets that encompass segments of the Internet and are subsequently fine-tuned for specific tasks. Notably, they exhibit proficiency in tasks such as translation, summarization, question answering, and creative writing, even in the absence of explicit training for those particular tasks. While they have shown substantial improvement on multilingual tasks, their performance on code switching, especially for machine translation, remains relatively uncharted. In this paper, we present an extensive study of code switching for machine translation, comparing multiple LLMs. Our results indicate that, although LLMs show promising results on certain tasks, models of relatively lower complexity outperform the multilingual large language models on the machine translation task. We posit that the efficacy of multilingual large language models in contextual code switching is constrained by their training methodologies. In contrast, relatively smaller models, when trained and fine-tuned on bespoke datasets, may yield superior results in comparison to the majority of multilingual models.

arxiv, language model, translation, (11 more...)

2312.13179

Country:

Asia > India > Maharashtra > Mumbai (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.79)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Artificial Intelligence, Dec-20-2023

IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages

Gala, Jay, Chitale, Pranjal A., AK, Raghavan, Gumma, Varun, Doddapaneni, Sumanth, Kumar, Aswanth, Nawale, Janki, Sujatha, Anupama, Puduppully, Ratish, Raghavan, Vivek, Kumar, Pratyush, Khapra, Mitesh M., Dabre, Raj, Kunchukuttan, Anoop

India has a rich linguistic landscape with languages from 4 major language families spoken by over a billion people. The 22 languages listed in the Constitution of India (referred to as scheduled languages) are the focus of this work. Given the linguistic diversity, high-quality and accessible Machine Translation (MT) systems are essential in a country like India. Prior to this work, there was (i) no parallel training data spanning all 22 languages, (ii) no robust benchmarks covering all these languages and containing content relevant to India, and (iii) no existing translation models which support all the 22 scheduled languages of India. In this work, we aim to address this gap by focusing on the missing pieces required for enabling wide, easy, and open access to good machine translation systems for all 22 scheduled Indian languages. We identify four key areas of improvement: curating and creating larger training datasets, creating diverse and high-quality benchmarks, training multilingual models, and releasing models with open access. Our first contribution is the release of the Bharat Parallel Corpus Collection (BPCC), the largest publicly available parallel corpora for Indic languages. BPCC contains a total of 230M bitext pairs, of which a total of 126M were newly added, including 644K manually translated sentence pairs created as part of this work. Our second contribution is the release of the first n-way parallel benchmark covering all 22 Indian languages, featuring diverse domains, Indian-origin content, and source-original test sets. Next, we present IndicTrans2, the first model to support all 22 languages, surpassing existing models on multiple existing and new benchmarks created as a part of this work. Lastly, to promote accessibility and collaboration, we release our models and associated data with permissive licenses at https://github.com/AI4Bharat/IndicTrans2.

bharat parallel corpus collection, european language resource association, indictrans2-m2m model, (12 more...)

2305.16307

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.27)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.13)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Law (1.00)
Education (1.00)
Government > Regional Government > Asia Government > India Government (0.66)
Consumer Products & Services > Travel (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)