bilingual
BIBERT-Pipe on Biomedical Nested Named Entity Linking at BioASQ 2025
Li, Chunyu, Zheng, Xindi, Liu, Siqi
Entity linking (EL) for biomedical text is typically benchmarked on English-only corpora with flat mentions, leaving the more realistic scenario of nested and multilingual mentions largely unexplored. We present our system for the BioNNE 2025 Multilingual Biomedical Nested Named Entity Linking shared task (English & Russian), which closes this gap with a lightweight pipeline that keeps the original EL model intact and modifies only three task-aligned components. (1) Two-stage retrieval-ranking: we use the same base encoder in both stages; the retrieval stage uses the original pre-trained model, while the ranking stage applies domain-specific fine-tuning. (2) Boundary cues: in the ranking stage, we wrap each mention with learnable [Ms] / [Me] tags, giving the encoder explicit, language-agnostic span boundaries for robustness to overlapping and nested mentions. (3) Dataset augmentation: we automatically expand the ranking training corpus with three complementary data sources, improving coverage without extra manual annotation. On the BioNNE 2025 leaderboard, our two-stage system, bilingual BERT (BIBERT-Pipe), ranks third in the multilingual track, demonstrating the effectiveness and competitiveness of these minimal yet principled modifications. Code is publicly available at https://github.com/Kaggle-Competitions-Code/BioNNE-L.
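The retrieve-then-rank flow and the [Ms] / [Me] boundary tags can be illustrated with a small, self-contained sketch. It is not the BIBERT-Pipe code. Character-trigram cosine similarity stands in for the pre-trained retrieval encoder, and the hypothetical `toy_scorer` (token Jaccard on the tagged span) stands in for the fine-tuned ranker. The concept IDs and names are invented for illustration. The example shows how two nested mentions over the same text get distinct tagged inputs and can link to different concepts.

```python
import math
from collections import Counter
from typing import Callable

MS, ME = "[Ms]", "[Me]"  # boundary tags named in the abstract


def mark_mention(text: str, start: int, end: int) -> str:
    """Wrap the character span text[start:end] with the boundary tags."""
    if not 0 <= start < end <= len(text):
        raise ValueError(f"invalid span ({start}, {end})")
    return f"{text[:start]}{MS} {text[start:end]} {ME}{text[end:]}"


def char_ngrams(s: str, n: int = 3) -> Counter:
    """Padded character n-gram bag; a toy stand-in for a dense encoder."""
    s = f"#{s.lower()}#"
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[g] for g, count in a.items())  # missing keys count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(mention: str, concepts: dict[str, str], k: int = 3) -> list[str]:
    """Stage 1: rank all concept names against the bare mention, keep top-k IDs."""
    q = char_ngrams(mention)
    ranked = sorted(concepts, key=lambda cid: cosine(q, char_ngrams(concepts[cid])), reverse=True)
    return ranked[:k]


def rerank(marked: str, candidates: list[str], concepts: dict[str, str],
           scorer: Callable[[str, str], float]) -> str:
    """Stage 2: score (tagged context, candidate name) pairs, return the best ID."""
    return max(candidates, key=lambda cid: scorer(marked, concepts[cid]))


def toy_scorer(marked: str, name: str) -> float:
    """Hypothetical ranker: token Jaccard between the tagged span and the name."""
    span = marked.split(MS, 1)[1].split(ME, 1)[0].strip().lower()
    a, b = set(span.split()), set(name.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def link(text: str, start: int, end: int, concepts: dict[str, str], k: int = 3) -> str:
    candidates = retrieve(text[start:end], concepts, k)
    return rerank(mark_mention(text, start, end), candidates, concepts, toy_scorer)


concepts = {"C1": "leukemia", "C2": "acute myeloid leukemia", "C3": "myeloid cell", "C4": "anemia"}
text = "acute myeloid leukemia"
print(mark_mention(text, 14, 22))   # acute myeloid [Ms] leukemia [Me]
print(link(text, 0, 22, concepts))  # outer mention -> C2
print(link(text, 14, 22, concepts)) # nested inner mention -> C1
```

Swapping `toy_scorer` for a real cross-encoder call would keep the same two-stage shape. Only the cheap first stage sees every concept, and the expensive second stage sees just `k` candidates.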
Mapping Geopolitical Bias in 11 Large Language Models: A Bilingual, Dual-Framing Analysis of U.S.-China Tensions
Guey, William, Bougault, Pierrick, de Moura, Vitor D., Zhang, Wei, Gomes, Jose O.
This study systematically analyzes geopolitical bias across 11 prominent Large Language Models (LLMs) by examining their responses to seven critical topics in U.S.-China relations. Utilizing a bilingual (English and Chinese) and dual-framing (affirmative and reverse) methodology, we generated 19,712 prompts designed to detect ideological leanings in model outputs. Responses were quantitatively assessed on a normalized scale from -2 (strongly Pro-China) to +2 (strongly Pro-U.S.) and categorized according to stance, neutrality, and refusal rates. The findings demonstrate significant and consistent ideological alignments correlated with the LLMs' geographic origins; U.S.-based models predominantly favored Pro-U.S. stances, while Chinese-origin models exhibited pronounced Pro-China biases. Notably, language and prompt framing substantially influenced model responses, with several LLMs exhibiting stance reversals based on prompt polarity or linguistic context. Additionally, we introduced comprehensive metrics to evaluate response consistency across languages and framing conditions, identifying variability and vulnerabilities in model behaviors. These results offer practical insights that can guide organizations and individuals in selecting LLMs best aligned with their operational priorities and geopolitical considerations, underscoring the importance of careful model evaluation in politically sensitive applications. Furthermore, the research highlights specific prompt structures and linguistic variations that can strategically trigger distinct responses from models, revealing methods for effectively navigating and influencing LLM outputs.
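The scoring scheme described above (a -2 to +2 stance scale, neutrality and refusal rates, and consistency across languages and framings) can be aggregated roughly as follows. This sketch rests on several assumptions that the abstract does not specify. A refusal is represented as `None`. Neutrality means a score of 0. Both rates use all prompts as the denominator. The hypothetical `condition_gap` metric is just the absolute difference in mean stance between two conditions. The study's own metrics may be defined differently.

```python
from statistics import mean
from typing import Optional

# (model, language, framing, score): score in {-2, ..., +2} on the
# Pro-China (-2) to Pro-U.S. (+2) axis, or None for a refusal (assumed encoding).
Record = tuple[str, str, str, Optional[int]]


def summarize(records: list[Record]) -> dict[str, Optional[float]]:
    """Mean stance over answered prompts; neutrality and refusal rates over
    all prompts (these denominators are an assumption)."""
    n = len(records)
    scores = [r[3] for r in records if r[3] is not None]
    return {
        "mean_stance": mean(scores) if scores else None,
        "neutral_rate": sum(1 for s in scores if s == 0) / n,
        "refusal_rate": (n - len(scores)) / n,
    }


def condition_gap(records: list[Record], field: int, a: str, b: str) -> float:
    """Hypothetical consistency metric: |mean stance under a - mean under b|,
    with field 1 = language and field 2 = framing. 0 means identical behavior.
    Raises statistics.StatisticsError if a condition has no answered prompts."""
    def mean_for(value: str) -> float:
        return mean(r[3] for r in records if r[field] == value and r[3] is not None)
    return abs(mean_for(a) - mean_for(b))


records = [
    ("model-x", "en", "affirmative", 2),
    ("model-x", "en", "reverse", 1),
    ("model-x", "zh", "affirmative", -1),
    ("model-x", "zh", "reverse", None),  # refusal
]
print(summarize(records))
print(condition_gap(records, 1, "en", "zh"))  # language gap: 2.5
```

A large language gap with a small framing gap would suggest that the language of the prompt, rather than its polarity, drives the stance.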
Chloe Joyce Ann Ropa on LinkedIn: #AUSTRIA #HIRING #Bilingual
Global Recruitment Specialist at Appen, actively hiring and providing remote part-time job opportunities across the globe.
Another #opportunity! Apply now and start earning from home! #Transcript #Data Collection in #Spain
Ready to help a leading technology provider improve the quality of day-to-day interactions with smart home speakers? We are looking for individuals who own a Google Home and/or Amazon Echo, interact with it frequently, and are willing to share recent transcripts of the voice commands they give to these smart home speakers.
Requirements:
· Must own an Amazon Echo and/or Google Home
· Ability to extract 30 days of transcripts
· Frequent interactions with smart home speakers
· Minimum of 100 transcripts available/submitted
How to apply?
Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels
Chan, Harris, Kiros, Jamie, Chan, William
A channel corresponds to a viewpoint or transformation of an underlying meaning. For example, a pair of parallel sentences in English and French expresses the same underlying meaning through two separate channels corresponding to their languages. In this work, we present the Multichannel Generative Language Model (MGLM), a generative model of the joint distribution over channels. MGLM marginalizes over all possible factorizations within and across all channels. MGLM enables flexible inference, including unconditional generation, conditional generation (where one channel is observed and the other channels are generated), and partially observed generation (where incomplete observations are spread across all the channels). We experiment with the Multi30K dataset, which contains English, French, Czech, and German, and run experiments with unconditional, conditional, and partially conditional generation. We provide qualitative samples drawn unconditionally from the generative joint distribution. We also quantitatively analyze the quality-diversity trade-offs and find that MGLM outperforms traditional bilingual discriminative models.
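The phrase "all possible factorizations within and across channels" can be made concrete with some bookkeeping. Every ordering of all token positions across all channels is one chain-rule factorization of the joint distribution, and the three inference modes differ only in which positions start out observed. The toy sketch below shows that bookkeeping and nothing more. It contains no neural model and is not MGLM's implementation. Sampling a random order per example is shown as one common way to cover many factorizations during training. The channel names and sentences are invented.

```python
import math
import random

Pos = tuple[str, int]  # (channel name, token index within that channel)


def positions(channels: dict[str, list[str]]) -> list[Pos]:
    return [(c, i) for c, toks in channels.items() for i in range(len(toks))]


def num_factorizations(channels: dict[str, list[str]]) -> int:
    """Each ordering of all positions, within and across channels, is one
    chain-rule factorization of the joint distribution."""
    return math.factorial(len(positions(channels)))


def sample_order(channels: dict[str, list[str]], rng: random.Random) -> list[Pos]:
    order = positions(channels)
    rng.shuffle(order)
    return order


def chain_rule_steps(order: list[Pos], observed: set = frozenset()) -> list[tuple[frozenset, Pos]]:
    """(context, target) pairs for one order: each unobserved position is
    predicted from the observed positions plus everything generated before it."""
    context = set(observed)
    steps = []
    for pos in order:
        if pos in context:
            continue
        steps.append((frozenset(context), pos))
        context.add(pos)
    return steps


pair = {"en": ["a", "cat"], "fr": ["un", "chat"]}
order = sample_order(pair, random.Random(0))
print(num_factorizations(pair))  # 4 positions -> 24 orderings

unconditional = chain_rule_steps(order)  # nothing observed
conditional = chain_rule_steps(order, {p for p in positions(pair) if p[0] == "en"})  # generate fr from en
partial = chain_rule_steps(order, {("en", 0), ("fr", 1)})  # observations spread across channels
```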
Artificial Intelligence Goes Bilingual--Without a Dictionary
Research groups at the University of the Basque Country (UPV) in Spain and at Facebook have separately developed unsupervised machine-learning techniques for teaching neural networks to translate between languages without requiring parallel texts. Both methods use two training strategies: back-translation and denoising. In back-translation, a sentence in one language is approximately translated into the other language and then translated back into the original, and the networks are adjusted so that subsequent round trips come closer to reproducing the original sentence. Denoising adds noise to a sentence by rearranging or removing words, and the network attempts to translate the noisy version back into the original. The UPV method translates more frequently during training, while the Facebook technique, in addition to encoding a sentence from one language into a more abstract representation before decoding it into the other language, also checks that this intermediate representation is truly abstract.
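The two training signals described above mainly differ in how they build (input, target) training pairs. The sketch below shows only that data-construction step, under stated assumptions. The noise function drops each word with probability `p_drop` and then shuffles locally by sorting on `i + U(0, k+1)`, which moves each word at most `k` positions. This is the style of noise used in published unsupervised MT work, but its parameters here are illustrative. The `translate` callable stands in for whatever current model is being trained. Neither function is the UPV or Facebook code.

```python
import random
from typing import Callable, Optional


def add_noise(tokens: list[str], p_drop: float = 0.1, k: int = 3,
              rng: Optional[random.Random] = None) -> list[str]:
    """Drop words with probability p_drop, then shuffle locally so that no
    surviving word moves more than k positions."""
    rng = rng or random.Random()
    kept = [t for t in tokens if rng.random() >= p_drop]
    # Key for position i lies in [i, i + k + 1), so any word at least k + 1
    # positions later always sorts after it.
    keys = [i + rng.random() * (k + 1) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


def denoising_pairs(sentences: list[list[str]], rng: random.Random) -> list[tuple[list[str], list[str]]]:
    """(noisy sentence, original sentence) pairs: learn to undo the noise."""
    return [(add_noise(s, rng=rng), s) for s in sentences]


def back_translation_pairs(sentences: list[list[str]],
                           translate: Callable[[list[str]], list[str]]) -> list[tuple[list[str], list[str]]]:
    """Translate monolingual language-A sentences into B with the current model;
    the (synthetic B, original A) pairs then train the B -> A direction."""
    return [(translate(s), s) for s in sentences]


# Usage with a hypothetical word-for-word dictionary as the "current model".
toy = {"the": "le", "cat": "chat", "sleeps": "dort"}
print(back_translation_pairs([["the", "cat", "sleeps"]], lambda s: [toy[w] for w in s]))
print(denoising_pairs([["the", "cat", "sleeps"]], random.Random(0)))
```

In a real system both pair sets are regenerated as the model improves, so the synthetic inputs get better over training.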