foreign language
UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner
Large language models (LLMs) have demonstrated remarkable capabilities in textual understanding and generation, but they cannot be directly applied to cross-modal tasks without fine-tuning. This paper proposes a cross-modal in-context learning approach that empowers frozen LLMs to perform multiple audio tasks in a few-shot style without any parameter updates. Specifically, we propose a novel LLM-driven audio codec model, LLM-Codec, which transfers the audio modality into the textual space by representing audio tokens with words or sub-words from the LLM vocabulary, while maintaining high audio reconstruction quality. The key idea is to reduce the modality heterogeneity between text and audio by compressing the audio modality into the well-trained textual space of LLMs. The audio representation can thus be viewed as a new "foreign language," and LLMs can learn this new language from a few demonstrations. In experiments, we investigate the performance of the proposed approach across multiple audio understanding and generation tasks, e.g., speech emotion classification, audio classification, text-to-speech generation, and speech enhancement. Experimental results show that LLMs equipped with LLM-Codec, a combination we name UniAudio 1.5, can perform effectively in simple scenarios when prompted with only a few examples, validating our cross-modal in-context learning approach. To facilitate research on few-shot audio task learning and multi-modal LLMs, we have open-sourced the LLM-Codec model.
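A toy sketch of the core idea may help: discrete audio codec indices are rendered as existing sub-words of a frozen LLM's vocabulary, so audio can be fed to the model as ordinary text and used in a few-shot prompt. The modulo mapping and the GPT-2 tokenizer below stand in for LLM-Codec's learned alignment; they are illustrative assumptions, not the released implementation.

```python
from transformers import AutoTokenizer

# Any LLM tokenizer works for the illustration; GPT-2 is used here only
# because it is small and publicly available.
tok = AutoTokenizer.from_pretrained("gpt2")

def codes_to_text(codes: list[int]) -> str:
    # Each codebook index picks a fixed sub-word from the LLM vocabulary.
    # (LLM-Codec learns this audio-to-vocabulary alignment during training;
    # the modulo lookup is a placeholder.)
    return " ".join(tok.convert_ids_to_tokens(c % tok.vocab_size) for c in codes)

def few_shot_prompt(demos: list[tuple[list[int], str]], query: list[int]) -> str:
    # In-context learning: a few (audio tokens -> label) demonstrations,
    # then the query; the frozen LLM completes the final label.
    blocks = [f"Audio: {codes_to_text(c)}\nLabel: {y}" for c, y in demos]
    blocks.append(f"Audio: {codes_to_text(query)}\nLabel:")
    return "\n\n".join(blocks)

print(few_shot_prompt([([12, 845, 3201], "happy"), ([77, 4096, 9], "sad")],
                      [512, 33, 1024]))
```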
MotionGPT: Human Motion as a Foreign Language
Although pre-trained large language models continue to advance, building a unified model for language and other multimodal data, such as motion, remains challenging and largely unexplored. Fortunately, human motion displays a semantic coupling akin to human language and is often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that enhances the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified, versatile, and user-friendly motion-language model that handles multiple motion-relevant tasks. Specifically, we employ discrete vector quantization for human motion and convert 3D motion into motion tokens, analogous to the generation of word tokens. Building upon this motion vocabulary, we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language. Moreover, inspired by prompt learning, we pre-train MotionGPT on a mixture of motion-language data and fine-tune it on prompt-based question-and-answer tasks. Extensive experiments demonstrate that MotionGPT achieves state-of-the-art performance on multiple motion tasks, including text-driven motion generation, motion captioning, motion prediction, and motion in-betweening.
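The tokenization step the abstract describes can be sketched as a vector-quantization lookup: continuous pose features are assigned to their nearest codebook entries, yielding discrete "motion words". The shapes and codebook size below are illustrative, not MotionGPT's actual hyperparameters.

```python
import torch

class MotionVQ(torch.nn.Module):
    def __init__(self, num_codes: int = 512, dim: int = 256):
        super().__init__()
        # Learnable codebook: each row is one "motion word" embedding.
        self.codebook = torch.nn.Embedding(num_codes, dim)

    def forward(self, motion_feats: torch.Tensor) -> torch.Tensor:
        # motion_feats: (frames, dim) encoder output for one motion clip.
        # Each frame is assigned the index of its nearest codebook entry,
        # producing a token sequence a language model can consume.
        dists = torch.cdist(motion_feats, self.codebook.weight)  # (frames, num_codes)
        return dists.argmin(dim=-1)                              # (frames,)

tokens = MotionVQ()(torch.randn(120, 256))  # e.g. 120 frames -> 120 motion tokens
```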
Towards Understanding Ambiguity Resolution in Multimodal Inference of Meaning
Wang, Yufei, Kovashka, Adriana, Fernández, Loretta, Coutanche, Marc N., Wiener, Seth
We investigate a new setting for foreign language learning, where learners infer the meaning of unfamiliar words in a multimodal context of a sentence describing a paired image. We conduct studies with human participants using different image-text pairs. We analyze the features of the data (i.e., images and texts) that make it easier for participants to infer the meaning of a masked or unfamiliar word, and what language backgrounds of the participants correlate with success. We find only some intuitive features have strong correlations with participant performance, prompting the need for further investigating of predictive features for success in these tasks. We also analyze the ability of AI systems to reason about participant performance, and discover promising future directions for improving this reasoning ability.
Bhasha-Rupantarika: Algorithm-Hardware Co-design approach for Multilingual Neural Machine Translation
Lokhande, Mukul, Dewangan, Tanushree, Mansoori, Mohd Sharik, Chaudhari, Tejas, J., Akarsh, Lokhande, Damayanti, Teman, Adam, Vishvakarma, Santosh Kumar
This paper introduces Bhasha-Rupantarika, a light and efficient multilingual translation system tailored through algorithm-hardware co-design for resource-limited settings. The method investigates model deployment at 8-bit and sub-8-bit precision levels (FP8, INT8, INT4, and FP4), with experimental results indicating a 4.1x reduction in model size (FP4) and a 4.2x speedup in inference, corresponding to an increased throughput of 66 tokens/s (a 4.8x improvement). This underscores the importance of ultra-low-precision quantization for real-time deployment on IoT devices using FPGA accelerators, achieving performance in line with expectations. Our evaluation covers bidirectional translation between Indian and international languages, showcasing adaptability in low-resource linguistic contexts. The FPGA deployment demonstrated a 1.96x reduction in LUTs and a 1.65x decrease in FFs, resulting in a 2.2x throughput enhancement over OPU and a 4.6x enhancement over HPTA. Overall, the evaluation provides a viable solution based on quantization-aware translation and hardware efficiency suitable for deployable multilingual AI systems. The complete code [https://github.com/mukullokhande99/Bhasha-Rupantarika/] and dataset are publicly available for reproducibility, facilitating rapid integration and further development by researchers.
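For readers unfamiliar with the precision levels mentioned, the following is a minimal sketch of symmetric INT4 weight quantization, the kind of sub-8-bit scheme such deployments rely on; it is not the paper's actual pipeline or its FPGA kernels.

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    # Per-tensor scale so the largest magnitude maps to the INT4 limit (+7);
    # the signed 4-bit integer range is [-8, 7].
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximate float tensor for accuracy checks.
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int4(w)
print("max abs reconstruction error:", np.abs(w - dequantize(q, s)).max())
```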
As Good as a Coin Toss: Human Detection of AI-Generated Content
With only a 50-50 chance of detecting synthetic media online, users are more vulnerable than ever to being duped. Advances in generative AI technology have made it easier than ever for anyone to manufacture increasingly realistic synthetic media (colloquially known as deepfakes) at faster speeds, at larger scales, and with greater customization. This in turn has led to synthetic media increasingly being used for harmful purposes, including disinformation campaigns, nonconsensual pornography, financial fraud, child sexual abuse and exploitation, and espionage. As of today, the principal defense against deceptive synthetic media depends in large part on the human observer's perceptual detection capabilities: their ability to visually or auditorily identify AI-generated content when they encounter it. Yet the growing realism of synthetic media impedes this ability, heightening people's vulnerability to weaponized synthetic content. Moreover, people overestimate how capable they are of identifying synthetic media, further exacerbating the problem. As synthetic media continues to advance in sophistication, so too does the threat posed by its growing weaponization, from financial fraud to the production of nonconsensual intimate materials involving adults and children.
Whilter: A Whisper-based Data Filter for "In-the-Wild" Speech Corpora Using Utterance-level Multi-Task Classification
Ravenscroft, William, Close, George, Bower-Morris, Kit, Stacey, Jamie, Sityaev, Dmitry, Hong, Kris Y.
Large-scale in-the-wild speech datasets have become more prevalent in recent years due to increased interest in models that can learn useful features from unlabelled data for tasks such as speech recognition or synthesis. These datasets often contain undesirable features, such as multiple speakers, non-target languages, and music, which may impact model learning. The Whilter model is proposed as a multitask solution to identify these undesirable samples. Whilter uses a Whisper encoder with an attention-based classifier to solve five diverse classification problems at once. In addition, an annotated dataset is published for a subset of two popular in-the-wild corpora. Whilter achieves F1 scores above 85% and equal error rates of 6.5% to 7.8% for three of five subtasks, outperforming a state-of-the-art BEATs classifier on speech-specific classes, with a notable decrease in processing time compared to a combination of single-task alternatives.
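A rough sketch of the multi-task design described above: one shared encoder output is attention-pooled per subtask and classified, so five problems are solved in a single pass. Layer sizes and the two unnamed subtask labels are assumptions for illustration, not Whilter's exact architecture.

```python
import torch

class WhilterStyleHead(torch.nn.Module):
    def __init__(self, tasks: dict[str, int], dim: int = 512):
        super().__init__()
        # One attention scorer and one classifier per subtask, over a
        # shared (e.g. Whisper-encoder) feature sequence.
        self.attn = torch.nn.ModuleDict({t: torch.nn.Linear(dim, 1) for t in tasks})
        self.head = torch.nn.ModuleDict({t: torch.nn.Linear(dim, n)
                                         for t, n in tasks.items()})

    def forward(self, enc: torch.Tensor) -> dict[str, torch.Tensor]:
        # enc: (frames, dim) encoder features for one utterance.
        out = {}
        for t in self.head:
            w = torch.softmax(self.attn[t](enc), dim=0)  # per-frame weights
            out[t] = self.head[t]((w * enc).sum(dim=0))  # pooled -> logits
        return out

tasks = {"multiple_speakers": 2, "non_target_language": 2, "music": 2,
         "subtask4": 2, "subtask5": 2}  # last two names are hypothetical
logits = WhilterStyleHead(tasks)(torch.randn(300, 512))
```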
Using Large Language Models to Measure Symptom Severity in Patients At Risk for Schizophrenia
Chen, Andrew X., Horga, Guillermo, Escola, Sean
Patients who are at clinical high risk (CHR) for schizophrenia need close monitoring of their symptoms to inform appropriate treatments. The Brief Psychiatric Rating Scale (BPRS) is a validated, commonly used research tool for measuring symptoms in patients with schizophrenia and other psychotic disorders; however, it is not commonly used in clinical practice because it requires a lengthy structured interview. Here, we utilize large language models (LLMs) to predict BPRS scores from clinical interview transcripts in 409 CHR patients from the Accelerating Medicines Partnership Schizophrenia (AMP-SCZ) cohort. Despite the interviews not being specifically structured to measure the BPRS, the zero-shot performance of the LLM predictions compared to the true assessment (median concordance: 0.84, ICC: 0.73) approaches human inter- and intra-rater reliability. We further demonstrate that LLMs have substantial potential to improve and standardize the assessment of CHR patients via their accuracy in assessing the BPRS in foreign languages (median concordance: 0.88, ICC: 0.70) and their ability to integrate longitudinal information in a one-shot or few-shot learning approach.
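The abstract reports agreement via median concordance and ICC. Below is a small helper for Lin's concordance correlation coefficient, one common reading of "concordance" (an assumption on our part; the abstract does not specify the exact statistic).

```python
import numpy as np

def lins_ccc(x: np.ndarray, y: np.ndarray) -> float:
    # Lin's CCC: Pearson-style covariance penalized by mean and variance
    # mismatch between the two raters.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()            # population variances (ddof=0)
    cov = ((x - mx) * (y - my)).mean()   # matching population covariance
    return 2 * cov / (vx + vy + (mx - my) ** 2)

rater = np.array([3.0, 4.0, 2.0, 5.0, 3.0])   # clinician item scores (toy data)
model = np.array([3.0, 4.0, 3.0, 5.0, 2.0])   # LLM-predicted scores (toy data)
print(f"CCC = {lins_ccc(rater, model):.2f}")
```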
SUMART: SUMmARizing Translation from Wordy to Concise Expression
We propose SUMART, a method for summarizing and compressing verbose subtitle translations. SUMART is designed for understanding translated captions (e.g., interlingual conversations via subtitle translation, or watching movies with foreign-language audio and translated captions) and is intended for users who want a fast, big-picture understanding of conversation, audio, video content, and speech in a foreign language. During training data collection, when a speaker makes a verbose statement, SUMART employs a large language model on-site to compress the subtitles; this compressed data is then stored in a database for fine-tuning. SUMART then uses pairs of uncompressed ASR results and compressed translated results to fine-tune the translation model so that it generates more concise translations for practical use. In practical applications, SUMART uses this trained model to produce concise translation results. As a further practical application, we developed a system that supports conversations using subtitle translation in augmented reality spaces. As a pilot study, we conducted qualitative surveys using a SUMART prototype and a survey on SUMART's summarization model. We envision the most effective use case of this system to be situations in which users need to consume a lot of information quickly (e.g., speech, lectures, podcasts, and Q&A sessions at conferences).
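The data-collection step lends itself to a short sketch: a verbose ASR transcript is compressed by an LLM and stored alongside the original as a fine-tuning pair. The callable interface and the JSONL storage format below are illustrative assumptions, not the authors' exact setup.

```python
import json

def build_pair(asr_text: str, llm_compress) -> dict:
    # llm_compress: any callable mapping verbose text to a concise
    # translation; in SUMART this role is played by a large language model.
    return {"source": asr_text, "target": llm_compress(asr_text)}

def save_corpus(pairs: list[dict], path: str = "sumart_pairs.jsonl") -> None:
    # Fine-tuning corpora are commonly stored one JSON object per line.
    with open(path, "w", encoding="utf-8") as f:
        for p in pairs:
            f.write(json.dumps(p, ensure_ascii=False) + "\n")

# Toy usage with a trivial "compressor" standing in for the LLM call:
pairs = [build_pair("well, um, what I really meant to say was hello there",
                    lambda t: "hello")]
save_corpus(pairs)
```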
Hanprome: Modified Hangeul for Expression of foreign language pronunciation
Kim, Wonchan, Kim, Michelle Meehyun
Hangeul was created as a phonetic alphabet and is known to have the best 1:1 correspondence between letters and pronunciation among existing alphabets. In this paper, we examine the possibility of modifying the basic form of Hangeul and using it as a kind of phonetic symbol. The core concept of this approach is to preserve the basic form of the alphabet, modifying only the shape of a stroke rather than the letter itself. To the best of our knowledge, no previous attempt in any language has expressed pronunciations different from the original alphabet simply by changing the shapes of its strokes; this paper is probably the first in that direction.