South America
Bielik 11B v2 Technical Report
Ociepa, Krzysztof, Flis, Łukasz, Wróbel, Krzysztof, Gwoździej, Adrian, Kinas, Remigiusz
We present Bielik 11B v2, a state-of-the-art language model optimized for Polish text processing. Built on the Mistral 7B v0.2 architecture and scaled to 11B parameters using depth up-scaling, this model demonstrates exceptional performance across Polish language benchmarks while maintaining strong cross-lingual capabilities. We introduce two key technical innovations: Weighted Instruction Cross-Entropy Loss, which optimizes learning across diverse instruction types by assigning quality-based weights to training examples, and Adaptive Learning Rate, which dynamically adjusts the learning rate based on context length. Comprehensive evaluation across multiple benchmarks demonstrates that Bielik 11B v2 outperforms many larger models, including those with 2-6 times more parameters, and significantly surpasses other specialized Polish language models on tasks ranging from linguistic understanding to complex reasoning. The model's parameter efficiency and extensive quantization options enable deployment across various hardware configurations, advancing Polish language AI capabilities and establishing new benchmarks for resource-efficient language modeling in less-represented languages.
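The abstract describes Weighted Instruction Cross-Entropy Loss only as assigning quality-based weights to training examples. The Python sketch below shows one plausible form: a weighted average of per-example cross-entropy, normalized by the total weight. The normalization, the function names, and the idea of a weight in [0, 1] are assumptions for illustration, not the report's exact formulation.

```python
import math


def example_cross_entropy(target_token_probs):
    """Mean negative log-likelihood of the target tokens of one example."""
    return -sum(math.log(p) for p in target_token_probs) / len(target_token_probs)


def weighted_instruction_ce(batch):
    """Weighted average of per-example cross-entropy (illustrative sketch).

    `batch` is a list of (target_token_probs, weight) pairs, where `weight`
    is a hypothetical per-example quality score (e.g. in [0, 1]).
    Examples with weight 0 contribute nothing to the loss.
    """
    total_weight = sum(w for _, w in batch)
    if total_weight <= 0:
        raise ValueError("at least one example needs a positive weight")
    weighted_sum = sum(w * example_cross_entropy(probs) for probs, w in batch)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # A confident high-quality example and a poorly predicted low-quality one.
    batch = [([0.9, 0.8], 1.0), ([0.2, 0.1], 0.25)]
    print(weighted_instruction_ce(batch))
```

With equal weights this reduces to the ordinary mean cross-entropy over examples; lowering an example's weight reduces its influence on the gradient without removing it from the batch.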
Steering Large Language Models with Register Analysis for Arbitrary Style Transfer
Yang, Xinchen, Carpuat, Marine
Large Language Models (LLMs) have demonstrated strong capabilities in rewriting text across various styles. However, effectively leveraging this ability for example-based arbitrary style transfer, where an input text is rewritten to match the style of a given exemplar, remains an open challenge. A key question is how to describe the style of the exemplar to guide LLMs toward high-quality rewrites. In this work, we propose a prompting method based on register analysis to guide LLMs to perform this task. Empirical evaluations across multiple style transfer tasks show that our prompting approach enhances style transfer strength while preserving meaning more effectively than existing prompting strategies.
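As an illustration of two-step, register-based prompting, the hypothetical sketch below first asks the model to describe the exemplar's register and then feeds that description into a rewrite prompt. The three dimensions used here (field, tenor, mode) come from Halliday's systemic functional framework and are an assumption; the abstract does not specify which register features the authors' prompts use, and the prompt wording is invented for this example.

```python
# Hypothetical register dimensions (Halliday's field/tenor/mode); the paper's
# actual register features may differ.
REGISTER_DIMENSIONS = {
    "field": "What is the text about, and what activity is it part of?",
    "tenor": "Who is addressing whom, and with what formality and stance?",
    "mode": "What channel is used (written or spoken, planned or spontaneous)?",
}


def register_analysis_prompt(exemplar: str) -> str:
    """Step 1: ask the LLM to describe the exemplar's register."""
    questions = "\n".join(f"- {name}: {q}" for name, q in REGISTER_DIMENSIONS.items())
    return (
        "Analyze the register of the following exemplar text.\n"
        f"Answer each question briefly:\n{questions}\n\n"
        f"Exemplar:\n{exemplar}"
    )


def rewrite_prompt(register_description: str, source_text: str) -> str:
    """Step 2: ask the LLM to rewrite the input to match that register."""
    return (
        "Rewrite the input text so that it matches the register described below, "
        "while preserving its meaning.\n\n"
        f"Register description:\n{register_description}\n\n"
        f"Input text:\n{source_text}\n\n"
        "Rewritten text:"
    )


if __name__ == "__main__":
    print(register_analysis_prompt("Dear Sir, I write to express my concern..."))
```

The model's answer to the first prompt becomes `register_description` in the second, so the style is described explicitly rather than left implicit in the exemplar.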
MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks
Zeng, Wenqi, Sun, Yuqi, Ma, Chenxi, Tan, Weimin, Yan, Bo
Medical vision-language models (VLMs) have shown promise as clinical assistants across various medical fields. However, specialized dermatology VLMs capable of delivering professional and detailed diagnostic analysis remain underdeveloped, primarily because the text descriptions in current dermatology multimodal datasets are insufficiently specialized. To address this issue, we propose MM-Skin, the first large-scale multimodal dermatology dataset, which encompasses three imaging modalities (clinical, dermoscopic, and pathological) and nearly 10k high-quality image-text pairs collected from professional textbooks. In addition, we generate over 27k diverse, instruction-following visual question answering (VQA) samples, nine times the size of the current largest dermatology VQA dataset. Leveraging public datasets and MM-Skin, we develop SkinVL, a dermatology-specific VLM designed for precise and nuanced skin disease interpretation. Comprehensive benchmark evaluations of SkinVL on VQA, supervised fine-tuning (SFT), and zero-shot classification tasks across 8 datasets reveal its exceptional performance on skin diseases compared with both general and medical VLMs. The introduction of MM-Skin and SkinVL offers a meaningful contribution to advancing the development of clinical dermatology VLM assistants. MM-Skin is available at https://github.com/ZwQ803/MM-Skin
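The evaluation includes zero-shot classification. For contrastive VLMs, a common setup compares an image embedding with text embeddings of class prompts and picks the most similar class; the sketch below shows that setup with toy vectors. Whether SkinVL classifies this way is not stated in the abstract, and the embeddings and labels are placeholders, not model outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def zero_shot_classify(image_emb, class_text_embs):
    """Return the label whose text embedding is most similar to the image
    embedding, together with all similarity scores."""
    scores = {label: cosine(image_emb, emb) for label, emb in class_text_embs.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoder outputs.
    label, scores = zero_shot_classify(
        [1.0, 0.0], {"melanoma": [0.9, 0.1], "nevus": [0.0, 1.0]}
    )
    print(label, scores)
```

In a real system, each class embedding would come from the model's text encoder applied to a class prompt, and no training on the target labels is needed.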
V-EfficientNets: Vector-Valued Efficiently Scaled Convolutional Neural Network Models
Neto, Guilherme Vieira, Valle, Marcos Eduardo
EfficientNet models are convolutional neural networks optimized for parameter allocation by jointly balancing network width, depth, and resolution. Renowned for their exceptional accuracy, these models have become a standard for image classification tasks across diverse computer vision benchmarks. While traditional neural networks learn correlations between feature channels during training, vector-valued neural networks inherently treat multidimensional data as coherent entities, so inter-channel relationships are built into the model rather than learned. This paper introduces vector-valued EfficientNets (V-EfficientNets), a novel extension of EfficientNet designed to process arbitrary vector-valued data. The proposed models are evaluated on a medical image classification task, achieving an average accuracy of 99.46% on the ALL-IDB2 dataset for detecting acute lymphoblastic leukemia. V-EfficientNets demonstrate remarkable efficiency, significantly reducing parameters while outperforming state-of-the-art models, including the original EfficientNet. The source code is available at https://github.com/mevalle/v-nets.
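Two quantities in this abstract can be made concrete with a little arithmetic. EfficientNet's compound scaling multiplies depth, width, and resolution by α^φ, β^φ, and γ^φ, with α = 1.2, β = 1.1, γ = 1.15 as reported in the original EfficientNet paper, so FLOPs grow by about (αβ²γ²)^φ ≈ 2^φ. The parameter saving of a vector-valued layer is sketched under the assumption that each weight is a single element of a dim-dimensional algebra (as in quaternion-valued networks, where dim = 4); the V-EfficientNet layers may differ in detail, and the function names are hypothetical.

```python
# Compound-scaling coefficients reported for EfficientNet (Tan & Le, 2019).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Depth, width, and resolution multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_multiplier(phi):
    """FLOPs grow roughly with depth * width^2 * resolution^2."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


def real_dense_params(n_in, n_out, dim):
    """Weights of a real-valued layer mapping n_in vectors of size dim
    (flattened to n_in * dim channels) to n_out vectors of size dim."""
    return (n_in * dim) * (n_out * dim)


def vector_dense_params(n_in, n_out, dim):
    """Weights if each connection is one element of a dim-dimensional
    algebra (assumption): one dim-sized weight per input/output vector pair."""
    return n_in * n_out * dim


if __name__ == "__main__":
    print(compound_scale(1), flops_multiplier(1))
    print(real_dense_params(64, 128, 4), vector_dense_params(64, 128, 4))
```

For 64 input and 128 output vectors of dimension 4, the real-valued layer needs 131,072 weights and the vector-valued layer 32,768, a reduction by a factor of dim.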
Pope Leo identifies AI as main challenge in first meeting with cardinals
Pope Leo XIV has held his first meeting with the world's cardinals since his election as the head of the Catholic Church, identifying artificial intelligence (AI) as one of the most crucial issues facing humanity. Leo, the first American pope, laid out a vision of his papacy at the Vatican on Saturday, telling the cardinals who elected him that AI poses challenges to defending "human dignity, justice and labour" – a view shared with his predecessor, the late Pope Francis. Explaining his choice of name, the pontiff said he identified with the late Leo XIII, who had defended workers' rights during his 1878-1903 papacy at the dawn of the industrial age, adding that "social teaching" was now needed in response to the modern-day revolution brought by AI. The late Pope Francis, who died last month, warned that AI risked turning human relations into mere algorithms and called for an international treaty to regulate it. Francis warned the Group of Seven industrialised nations last year that AI must remain human-centric, so that decisions about when to use weapons or even less-lethal tools would not fall to machines.
Moscow and Kyiv trade accusations as Russia holds Victory Day spectacle
Russia and Ukraine have accused one another of violating a three-day ceasefire as Moscow marked Victory Day by welcoming allies to a grand military parade. Russia's President Vladimir Putin marked the 80th anniversary of victory over Nazi Germany on Friday alongside China's Xi Jinping, in an event clearly intended to bolster support for his three-year offensive against Ukraine, which he had unilaterally paused for 72 hours to mark the occasion. "Russia has been and will remain an indestructible barrier against Nazism, Russophobia and anti-Semitism," said Putin, seeking to draw parallels between World War II – or the Great Patriotic War as it is named in Russia and other parts of the former Soviet Union – and the Ukraine war. Russia maintains that its February 2022 invasion of its neighbour is a battle against a "Nazi" regime in Kyiv. Ukraine has dismissed that claim as "incomprehensible".
Russia's Putin hosts China's Xi at massive Moscow military parade on Red Square
[Photo: Chinese soldiers marching in Moscow's Red Square on Friday, May 9. (Credit: CCTV)]
Chinese President Xi Jinping was photographed standing next to Vladimir Putin on Friday as thousands of Russian troops and military vehicles rumbled through Moscow's Red Square during the country's annual Victory Day parade. The event, marking the 80th anniversary of the defeat of Nazi Germany in World War II, featured over 11,500 troops and more than 180 military vehicles, including tanks, armored infantry vehicles, and artillery used on the battlefield in Ukraine. "We are proud of their courage and determination, their spiritual force that always has brought us victory," Putin said of the Russian troops fighting in the war. Russian flag carrier Aeroflot canceled more than 100 flights to and from Moscow and delayed over 140 others on Wednesday as the military was repelling repeated Ukrainian drone attacks on the capital.
[Photo: Russian President Vladimir Putin, center right, and Chinese President Xi Jinping, center, watch the Victory Day military parade in Moscow, Russia, on Friday, May 9. (Mikhail Korytov/Photo host agency RIA Novosti via AP)]
Ukrainian authorities also reported scores of Russian strikes on Friday that killed at least two people in the Kherson and Zaporizhzhia regions and damaged buildings.
Putin hosts Victory Day parade with tight security and a short ceasefire
In the days ahead of the proposed truce, Moscow and Kyiv exchanged a barrage of strikes. Flights at airports across Russia were cancelled and some 60,000 passengers left stranded in the wake of Ukrainian drone attacks. Heavy restrictions are in place in the centre of Moscow as Russia prepares to mark the Soviet Union's victory over Nazi Germany. Russia says 27 world leaders are attending the event, with thousands of troops marching on Red Square ahead of a parade of some of Russia's latest weaponry. Brazil's Luiz Inácio Lula da Silva and Venezuelan President Nicolas Maduro are among the assembled guests, along with Serbian President Aleksandar Vucic and Robert Fico, Slovakia's prime minister, who is the only European Union leader to travel to Moscow. Ukraine's Volodymyr Zelensky had earlier warned that he could not guarantee the safety of anyone attending the event and urged heads of state not to travel to Moscow.
An Efficient GPU-based Implementation for Noise Robust Sound Source Localization
Lin, Zirui, Takigahira, Masayuki, Terakado, Naoya, Gulzar, Haris, Busto, Monikka Roslianna, Eda, Takeharu, Itoyama, Katsutoshi, Nakadai, Kazuhiro, Amano, Hideharu
Dept. of Information and Computer Science, Keio University, Kanagawa, Japan. Email: hunga@am.ics.keio.ac.jp
Abstract: Robot audition, encompassing Sound Source Localization (SSL), Sound Source Separation (SSS), and Automatic Speech Recognition (ASR), enables robots and smart devices to acquire auditory capabilities similar to human hearing. Despite the wide applicability of these capabilities, SSL requires processing multi-channel audio signals from microphone arrays with computationally intensive matrix operations, which can hinder efficient deployment on Central Processing Units (CPUs), particularly in embedded systems with limited CPU resources. This paper introduces a GPU-based implementation of SSL for robot audition, utilizing Generalized Singular Value Decomposition-based Multiple Signal Classification (GSVD-MUSIC), a noise-robust algorithm, within the HARK platform, an open-source software suite. For a 60-channel microphone array, the proposed implementation achieves significant performance improvements. On the Jetson AGX Orin, an embedded device powered by an NVIDIA GPU and ARM Cortex-A78AE v8.2 64-bit CPUs, we observe speedups of 5648.7× for GSVD calculation and 10.7× for the SSL module. On a server configured with an NVIDIA A100 GPU and AMD EPYC 7352 CPUs, the speedups are 4245.1× for GSVD calculation and 17.3× for the entire SSL module. These results make real-time processing feasible for large-scale microphone arrays and leave ample capacity for real-time execution of potential subsequent machine learning or deep learning tasks.
I. INTRODUCTION
Audition is a critical aspect of human inter-individual communication [1].
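The gap between the 5648.7× GSVD speedup and the 10.7× module speedup is what Amdahl's law predicts when only part of a pipeline is accelerated. The sketch below computes the overall speedup and inverts the law to estimate the fraction of runtime the accelerated part must have occupied. This rests on the simplifying assumption that GSVD is the only accelerated step, which the abstract does not state; the function names are illustrative.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the original runtime is
    accelerated by a factor s and the remaining (1 - p) is unchanged."""
    return 1.0 / ((1.0 - p) + p / s)


def implied_fraction(overall, s):
    """Invert Amdahl's law: the fraction p that yields `overall` speedup
    when that fraction is accelerated by s.

    From 1/overall = (1 - p) + p/s  =>  p = (1 - 1/overall) / (1 - 1/s).
    """
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)


if __name__ == "__main__":
    # Figures from the abstract, under the single-accelerated-step assumption.
    print(implied_fraction(10.7, 5648.7))   # Jetson AGX Orin
    print(implied_fraction(17.3, 4245.1))   # A100 server
```

Under that assumption, the results are roughly 0.91 for the Jetson AGX Orin and 0.94 for the A100 server; the unaccelerated remainder caps the module speedup at 1 / (1 - p), no matter how fast GSVD becomes.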
A Benchmark Dataset and a Framework for Urdu Multimodal Named Entity Recognition
Ahmad, Hussain, Zeng, Qingyang, Wan, Jing
The emergence of multimodal content, particularly text and images on social media, has positioned Multimodal Named Entity Recognition (MNER) as an increasingly important area of research within Natural Language Processing. Despite progress in high-resource languages such as English, MNER remains underexplored for low-resource languages like Urdu. The primary challenges include the scarcity of annotated multimodal datasets and the lack of standardized baselines. To address these challenges, we introduce the U-MNER framework and release the Twitter2015-Urdu dataset, a pioneering resource for Urdu MNER. Adapted from the widely used Twitter2015 dataset, it is annotated following Urdu-specific grammar rules. We establish benchmark baselines by evaluating both text-based and multimodal models on this dataset, providing comparative analyses to support future research on Urdu MNER. The U-MNER framework integrates textual and visual context using Urdu-BERT for text embeddings and ResNet for visual feature extraction, with a Cross-Modal Fusion Module to align and fuse information. Our model achieves state-of-the-art performance on the Twitter2015-Urdu dataset, laying the groundwork for further MNER research in low-resource languages.
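The Cross-Modal Fusion Module is described only as aligning and fusing textual and visual information. Below is a minimal sketch of one standard way to do that: single-head cross-attention in which each text token attends over image region features, with the attended vector added back residually. It omits the learned query, key, and value projections and assumes both modalities (for example, Urdu-BERT token embeddings and ResNet region features) have already been projected to the same dimension. It is not the U-MNER implementation, and `cross_modal_fuse` is a hypothetical name.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_fuse(text_feats, image_feats):
    """Fuse text and image features with scaled dot-product cross-attention.

    text_feats: list of n token vectors (queries), each of length d.
    image_feats: list of m region vectors (keys and values), each of length d.
    Returns n fused vectors: token + attention-weighted image context.
    """
    d = len(text_feats[0])
    fused = []
    for t in text_feats:
        # Similarity of this token to every image region.
        scores = [sum(a * b for a, b in zip(t, v)) / math.sqrt(d) for v in image_feats]
        weights = softmax(scores)
        # Attention-weighted sum of region features.
        attended = [sum(w * v[k] for w, v in zip(weights, image_feats)) for k in range(d)]
        # Residual fusion keeps the original textual signal.
        fused.append([a + b for a, b in zip(t, attended)])
    return fused


if __name__ == "__main__":
    print(cross_modal_fuse([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

Each token is pulled toward the image regions most similar to it, which is the intuition behind using visual context to disambiguate entity types in short tweets.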