South America
Multilingual Large Language Models: A Systematic Survey
Zhu, Shaolin, Supryadi, Xu, Shaoyang, Sun, Haoran, Pan, Leiyu, Cui, Menglong, Du, Jiangcun, Jin, Renren, Branco, António, Xiong, Deyi
This paper provides a comprehensive survey of the latest research on multilingual large language models (MLLMs). MLLMs not only are able to understand and generate language across linguistic boundaries, but also represent an important advancement in artificial intelligence. We first discuss the architecture and pre-training objectives of MLLMs, highlighting the key components and methodologies that contribute to their multilingual capabilities. We then discuss the construction of multilingual pre-training and alignment datasets, underscoring the importance of data quality and diversity in enhancing MLLM performance. An important focus of this survey is on the evaluation of MLLMs. We present a detailed taxonomy and roadmap covering the assessment of MLLMs' cross-lingual knowledge, reasoning, alignment with human values, safety, interpretability and specialized applications. Specifically, we extensively discuss multilingual evaluation benchmarks and datasets, and explore the use of LLMs themselves as multilingual evaluators. To help turn MLLMs from black boxes into white boxes, we also address the interpretability of multilingual capabilities, cross-lingual transfer and language bias within these models. Finally, we provide a comprehensive review of real-world applications of MLLMs across diverse domains, including biology, medicine, computer science, mathematics and law. We showcase how these models have driven innovation and improvements in these specialized fields while also highlighting the challenges and opportunities in deploying MLLMs within diverse language communities and application scenarios. The papers covered in this survey are listed and publicly available at https://github.com/tjunlp-lab/Awesome-Multilingual-LLMs-Papers.
Generating bilingual example sentences with large language models as lexicography assistants
Merx, Raphael, Vylomova, Ekaterina, Kurniawan, Kemal
We present a study of LLMs' performance in generating and rating example sentences for bilingual dictionaries across languages with varying resource levels: French (high-resource), Indonesian (mid-resource), and Tetun (low-resource), with English as the target language. We evaluate the quality of LLM-generated examples against the GDEX (Good Dictionary EXample) criteria: typicality, informativeness, and intelligibility. Our findings reveal that while LLMs can generate reasonably good dictionary examples, their performance degrades significantly for lower-resourced languages. We also observe high variability in human preferences for example quality, reflected in low inter-annotator agreement rates. To address this, we demonstrate that in-context learning can successfully align LLMs with individual annotator preferences. Additionally, we explore the use of pre-trained language models for automated rating of examples, finding that sentence perplexity serves as a good proxy for typicality and intelligibility in higher-resourced languages. Our study also contributes a novel dataset of 600 ratings for LLM-generated sentence pairs, and provides insights into the potential of LLMs in reducing the cost of lexicographic work, particularly for low-resource languages.
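As a rough illustration of the perplexity-based rating idea in the abstract above, the sketch below computes sentence perplexity from per-token log-probabilities. The abstract does not say which language model or tokenizer was used, so the log-probabilities here are hypothetical placeholder values, not outputs of any real model.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical per-token log-probabilities for two candidate example sentences.
# In practice these would come from a pre-trained language model.
typical_sentence = [math.log(0.5), math.log(0.25), math.log(0.5)]
unusual_sentence = [math.log(0.1), math.log(0.05), math.log(0.1)]

# Lower perplexity means the model finds the sentence less surprising, which is
# the property the abstract relates to typicality and intelligibility.
ranked = sorted(
    [("typical", typical_sentence), ("unusual", unusual_sentence)],
    key=lambda item: perplexity(item[1]),
)
```

Sorting candidates by ascending perplexity is only one possible way to use the score; the abstract reports it as a good proxy in higher-resourced languages only.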
Vision-Language Model Fine-Tuning via Simple Parameter-Efficient Modification
Li, Ming, Zhong, Jike, Li, Chenxin, Li, Liuzhuozheng, Lin, Nie, Sugiyama, Masashi
Recent advances in fine-tuning Vision-Language Models (VLMs) have witnessed the success of prompt tuning and adapter tuning, while classic fine-tuning of a model's inherent parameters seems to have been overlooked. It is believed that fine-tuning the parameters of VLMs with few-shot samples corrupts the pre-trained knowledge, since fine-tuning the CLIP model even degrades performance. In this paper, we revisit this viewpoint and propose a new perspective: fine-tuning specific parameters rather than all of them can uncover the power of classic model fine-tuning on VLMs. Through our meticulous study, we propose CLIPFit, a simple yet effective method to fine-tune CLIP without introducing any overhead of extra parameters. We demonstrate that by fine-tuning only specific bias terms and normalization layers, CLIPFit can improve the performance of zero-shot CLIP by 7.27% average harmonic mean accuracy. Lastly, to understand how fine-tuning in CLIPFit affects the pre-trained models, we conducted extensive experimental analyses of changes in internal parameters and representations. We found that low-level text bias layers and the first layer normalization layer change much more than other layers. The code is available at https://github.com/minglllli/CLIPFit.
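The two ideas in the abstract above, training only a subset of parameters and reporting a harmonic mean accuracy, can be sketched as follows. This is a simplification under stated assumptions: the parameter names and the keyword filter are hypothetical, the abstract says only specific bias terms and normalization layers are tuned (not all of them, as the filter below would select), and the harmonic mean is assumed to combine two accuracies such as base-class and novel-class accuracy.

```python
def select_trainable(param_names, keywords=("bias", "ln_", "norm")):
    """Return the names of parameters that would be left trainable.

    Assumption: a parameter is selected when its name contains one of the
    keywords. Everything else would be frozen.
    """
    return [name for name in param_names if any(k in name for k in keywords)]


def harmonic_mean(acc_a, acc_b):
    """Harmonic mean of two accuracies (for example base and novel classes)."""
    if acc_a + acc_b == 0:
        return 0.0
    return 2 * acc_a * acc_b / (acc_a + acc_b)


# Hypothetical parameter names, loosely in the style of a transformer encoder.
names = [
    "text.layer0.mlp.weight",
    "text.layer0.mlp.bias",
    "visual.ln_1.weight",
    "visual.proj.weight",
]
trainable = select_trainable(names)
```

The harmonic mean penalizes a large gap between the two accuracies, which is why it is a common summary when a method must do well on both seen and unseen classes.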
Machine learning-based probabilistic forecasting of solar irradiance in Chile
Baran, Sándor, Marín, Julio C., Cuevas, Omar, Díaz, Mailiu, Szabó, Marianna, Nicolis, Orietta, Lakatos, Mária
By the end of 2023, renewable sources covered 63.4% of the total electric power demand of Chile, and in line with the global trend, photovoltaic (PV) power shows the most dynamic increase. Although Chile's Atacama Desert is considered the sunniest place on Earth, PV power production, even in this area, can be highly volatile. Successful integration of PV energy into the country's power grid requires accurate short-term PV power forecasts, which can be obtained from predictions of solar irradiance and related weather quantities. Nowadays, in weather forecasting, the state-of-the-art approach is the use of ensemble forecasts based on multiple runs of numerical weather prediction models. However, ensemble forecasts still tend to be uncalibrated or biased, thus requiring some form of post-processing. The present work investigates probabilistic forecasts of solar irradiance for Regions III and IV in Chile. To this end, 8-member short-term ensemble forecasts of solar irradiance for calendar year 2021 are generated using the Weather Research and Forecasting (WRF) model; these are then calibrated using the benchmark ensemble model output statistics (EMOS) method based on a censored Gaussian law, and its machine learning-based distributional regression network (DRN) counterpart. Furthermore, we also propose a neural network-based post-processing method resulting in improved 8-member ensemble predictions. All forecasts are evaluated against station observations for 30 locations, and the skill of post-processed predictions is compared to the raw WRF ensemble. Our case study confirms that all studied post-processing methods substantially improve both the calibration of probabilistic forecasts and the accuracy of point forecasts. Among the methods tested, the corrected ensemble exhibits the best overall performance. Additionally, the DRN model generally outperforms the corresponding EMOS approach.
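To make the censored Gaussian EMOS idea above more concrete, here is a minimal sketch of a normal predictive distribution censored at zero (irradiance cannot be negative). The link functions, with location affine in the ensemble mean and variance affine in the ensemble variance, are a common EMOS form assumed here for illustration; the coefficients `a`, `b`, `c`, `d` and the ensemble values are hypothetical, and the abstract does not give the exact parametrization used.

```python
from statistics import NormalDist, fmean, pvariance

STD_NORMAL = NormalDist()  # standard normal, used for its cdf and pdf


def censored_gaussian_params(ensemble, a, b, c, d):
    """Map an ensemble to (mu, sigma) of the underlying normal law.

    Assumed link: mu = a + b * mean, sigma^2 = c + d * variance,
    with coefficients chosen so that sigma^2 > 0.
    """
    mu = a + b * fmean(ensemble)
    sigma = (c + d * pvariance(ensemble)) ** 0.5
    return mu, sigma


def prob_zero(mu, sigma):
    """Point mass at zero: probability that the uncensored normal is <= 0."""
    return STD_NORMAL.cdf(-mu / sigma)


def censored_mean(mu, sigma):
    """Mean of max(0, X) for X ~ N(mu, sigma^2)."""
    z = mu / sigma
    return mu * STD_NORMAL.cdf(z) + sigma * STD_NORMAL.pdf(z)


# Hypothetical 8-member irradiance ensemble in W/m^2 and made-up coefficients.
members = [410.0, 395.0, 430.0, 402.0, 388.0, 415.0, 421.0, 399.0]
mu, sigma = censored_gaussian_params(members, a=5.0, b=0.95, c=100.0, d=1.2)
```

In an actual EMOS fit the coefficients would be estimated from past forecast and observation pairs by optimizing a proper scoring rule; that training step is omitted here.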
It's time for G20 to take the initiative to help build a fairer world
Our world is in a spiral of crises. While conventional threats, such as famine, drought, civil war and genocide, continue to loom over humanity in many parts of the world, the race to assume control of new phenomena that have the potential to change the world – such as novel communications and weapons technologies, artificial intelligence and cryptocurrencies – is also gaining pace and posing new threats to our collective wellbeing. Our current "rules-based international order", which was established in the aftermath of World War II to increase global cooperation, generate economic prosperity, prevent wars, and ensure stability, equality and justice, is struggling to navigate these complex challenges and falling short of preventing violations of its founding principles. A state of irregularity, which benefits only a handful of powerful countries and interest groups while spelling catastrophe for the masses, is close to becoming the new normal of the global order. Therefore, it is now not a preference but an obligation to make comprehensive reforms to the system to prevent this scenario from becoming reality.
Now the CINEMA is spying on you: Popular UK chain is secretly using AI to monitor viewers - including their seat choice and snack selection
From smartphones to air fryers, several popular gadgets have been found to 'spy' on users. Now, it seems that not even the cinema is safe. British cinema chain, Vue, is quietly using artificial intelligence (AI) to track viewer habits and boost admissions. Affecting all of its 93 cinemas in the UK and Ireland, the system identifies the best locations and times to show films in order to maximise ticket sales. So if you saw 'Gladiator II' or 'Paddington in Peru' at a Vue cinema over the weekend, there's a chance your viewing habits have been used to inform future screening times.
Underwater robot discovers a never-before-seen creature at the junction of three tectonic plates in the Pacific Ocean - as baffled viewers dub it the 'forbidden toilet scrubber'
At first glance at this creature, you'd be forgiven for mistaking it for a sparkly pair of fake eyelashes. But the creature is very much real and was discovered at the junction of three tectonic plates in the Pacific Ocean. Researchers from the Schmidt Ocean Institute spotted the animal while using an underwater robot to scour the seabed. The animal is a polychaete - a class of marine worms, more widely known as bristle worms. 'To describe this polychaete, one simply must use jazz hands -- it is the only way to capture this deep-sea worm's dazzle!' the experts said in an Instagram post about the polychaete.
TSPRank: Bridging Pairwise and Listwise Methods with a Bilinear Travelling Salesman Model
Li, Weixian Waylon, Ziser, Yftah, Xie, Yifei, Cohen, Shay B., Ma, Tiejun
Traditional Learning-To-Rank (LETOR) approaches, including pairwise methods like RankNet and LambdaMART, often fall short by solely focusing on pairwise comparisons, leading to sub-optimal global rankings. Conversely, deep learning based listwise methods, while aiming to optimise entire lists, require complex tuning and yield only marginal improvements over robust pairwise models. To overcome these limitations, we introduce Travelling Salesman Problem Rank (TSPRank), a hybrid pairwise-listwise ranking method. TSPRank reframes the ranking problem as a Travelling Salesman Problem (TSP), a well-known combinatorial optimisation challenge that has been extensively studied for its numerous solution algorithms and applications. This approach enables the modelling of pairwise relationships and leverages combinatorial optimisation to determine the listwise ranking. This approach can be directly integrated as an additional component into embeddings generated by existing backbone models to enhance ranking performance. Our extensive experiments across three backbone models on diverse tasks, including stock ranking, information retrieval, and historical events ordering, demonstrate that TSPRank significantly outperforms both pure pairwise and listwise methods. Our qualitative analysis reveals that TSPRank's main advantage over existing methods is its ability to harness global information better while ranking. TSPRank's robustness and superior performance across different domains highlight its potential as a versatile and effective LETOR solution. The code and preprocessed data are available at https://github.com/waylonli/TSPRank-KDD2025.
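The reframing of ranking as a Travelling Salesman Problem described above can be sketched on a toy scale. The code below assumes that a pairwise score `scores[i][j]` rewards placing item `i` immediately before item `j`, that this score comes from a bilinear form over item embeddings (as the title suggests), and that the ranking is the open path with the highest total score. Exhaustive search over permutations is swapped in for a real combinatorial solver and is only feasible for a handful of items; the score values are hypothetical.

```python
from itertools import permutations


def bilinear_score(x, w, y):
    """Bilinear form x^T W y between two embedding vectors (plain lists)."""
    return sum(
        x[a] * w[a][b] * y[b] for a in range(len(x)) for b in range(len(y))
    )


def best_order(scores):
    """Return the item order maximizing the sum of adjacent pairwise scores.

    Brute force over all permutations: an illustrative stand-in for a TSP
    solver, not the method's actual optimizer.
    """
    n = len(scores)

    def path_score(order):
        return sum(scores[order[k]][order[k + 1]] for k in range(n - 1))

    return max(permutations(range(n)), key=path_score)


# Hypothetical pairwise scores for three items.
scores = [
    [0, 5, 1],
    [1, 0, 1],
    [4, 1, 0],
]
order = best_order(scores)
```

Because every adjacent pair contributes to one global objective, the chosen order reflects the whole list rather than isolated comparisons, which is the hybrid pairwise-listwise behaviour the abstract describes.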
SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks
Wen, Yongyan, Li, Siyuan, Zuo, Rongchang, Yuan, Lei, Mao, Hangyu, Liu, Peng
Deep reinforcement learning (DRL) has achieved remarkable success in various research domains. However, its reliance on neural networks results in a lack of transparency, which limits its practical applications. To achieve explainability, decision trees have emerged as a popular and promising alternative to neural networks. Nonetheless, due to their limited expressiveness, traditional decision trees struggle with high-dimensional long-horizon continuous control tasks. In this paper, we propose SkillTree, a novel framework that reduces complex continuous action spaces into discrete skill spaces. Our hierarchical approach integrates a differentiable decision tree within the high-level policy to generate skill embeddings, which subsequently guide the low-level policy in executing skills. By making skill decisions explainable, we achieve skill-level explainability, enhancing the understanding of the decision-making process in complex tasks. Experimental results demonstrate that our method achieves performance comparable to skill-based neural networks in complex robotic arm control domains. Furthermore, SkillTree offers explanations at the skill level, thereby increasing the transparency of the decision-making process.
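A differentiable decision tree, as mentioned in the abstract above, replaces hard branching with a smooth gate so that gradients can flow through the routing decision. The following is a depth-one sketch under strong assumptions: a single sigmoid gate over the state, two leaves that each hold a probability distribution over discrete skills, and an output that mixes the two leaves. The abstract describes the tree as producing skill embeddings; the distribution over skills used here, and all weights and values, are hypothetical simplifications.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def soft_tree_skill_probs(state, weights, bias, left_leaf, right_leaf):
    """Mix two leaf distributions with a soft routing probability.

    p_right is the probability of taking the right branch; because it is a
    smooth function of the state, the output is differentiable in the weights.
    """
    p_right = sigmoid(sum(w * s for w, s in zip(weights, state)) + bias)
    return [
        (1.0 - p_right) * left + p_right * right
        for left, right in zip(left_leaf, right_leaf)
    ]


# Hypothetical two-dimensional state and two skills.
probs = soft_tree_skill_probs(
    state=[0.0, 0.0],
    weights=[1.5, -0.5],
    bias=0.0,
    left_leaf=[1.0, 0.0],   # left leaf always picks skill 0
    right_leaf=[0.0, 1.0],  # right leaf always picks skill 1
)
```

After training, such a gate can be read as a rule of the form "if the weighted state exceeds a threshold, prefer skill 1", which is the kind of skill-level explanation the abstract refers to.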
Large Language Model for Qualitative Research -- A Systematic Mapping Study
Barros, Cauã Ferreira, Azevedo, Bruna Borges, Neto, Valdemar Vicente Graciano, Kassab, Mohamad, Kalinowski, Marcos, Nascimento, Hugo Alexandre D. do, Bandeira, Michelle C. G. S. P.
The exponential growth of text-based data in domains such as healthcare, education, and social sciences has outpaced the capacity of traditional qualitative analysis methods, which are time-intensive and prone to subjectivity. Large Language Models (LLMs), powered by advanced generative AI, have emerged as transformative tools capable of automating and enhancing qualitative analysis. This study systematically maps the literature on the use of LLMs for qualitative research, exploring their application contexts, configurations, methodologies, and evaluation metrics. Findings reveal that LLMs are utilized across diverse fields, demonstrating the potential to automate processes traditionally requiring extensive human input. However, challenges such as reliance on prompt engineering, occasional inaccuracies, and contextual limitations remain significant barriers. This research highlights opportunities for integrating LLMs with human expertise, improving model robustness, and refining evaluation methodologies. By synthesizing trends and identifying research gaps, this study aims to guide future innovations in the application of LLMs for qualitative analysis.