Goto

Collaborating Authors

 Analamanga


Tracing Multilingual Factual Knowledge Acquisition in Pretraining

Liu, Yihong, Wang, Mingyang, Kargaran, Amir Hossein, Körner, Felicia, Nie, Ercong, Plank, Barbara, Yvon, François, Schütze, Hinrich

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are capable of recalling multilingual factual knowledge present in their pretraining data. However, most studies evaluate only the final model, leaving the development of factual recall and crosslingual consistency throughout pretraining largely unexplored. In this work, we trace how factual recall and crosslingual consistency evolve during pretraining, focusing on OLMo-7B as a case study. We find that both accuracy and consistency improve over time for most languages. We show that this improvement is primarily driven by the fact frequency in the pretraining corpus: more frequent facts are more likely to be recalled correctly, regardless of language. Yet, some low-frequency facts in non-English languages can still be correctly recalled. Our analysis reveals that these instances largely benefit from crosslingual transfer of their English counterparts -- an effect that emerges predominantly in the early stages of pretraining. We pinpoint two distinct pathways through which multilingual factual knowledge acquisition occurs: (1) frequency-driven learning, which is dominant and language-agnostic, and (2) crosslingual transfer, which is limited in scale and typically constrained to relation types involving named entities. We release our code and data to facilitate further research at https://github.com/cisnlp/multilingual-fact-tracing.


Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models

Simbeck, Katharina, Mahran, Mariam

arXiv.org Artificial Intelligence

Despite growing research on bias in large language models (LLMs), most work has focused on gender and race, with little attention to religious identity. This paper explores how religion is internally represented in LLMs and how it intersects with concepts of violence and geography. Using mechanistic interpretability and Sparse Autoencoders (SAEs) via the Neuronpedia API, we analyze latent feature activations across five models. We measure overlap between religion- and violence-related prompts and probe semantic patterns in activation contexts. While all five religions show comparable internal cohesion, Islam is more frequently linked to features associated with violent language. In contrast, geographic associations largely reflect real-world religious demographics, revealing how models embed both factual distributions and cultural stereotypes. These findings highlight the value of structural analysis in auditing not just outputs but also internal representations that shape model behavior.


Enhancing Large Language Models with Neurosymbolic Reasoning for Multilingual Tasks

Nezhad, Sina Bagheri, Agrawal, Ameeta

arXiv.org Artificial Intelligence

Large language models (LLMs) often struggle to perform multi-target reasoning in long-context scenarios where relevant information is scattered across extensive documents. To address this challenge, we introduce NeuroSymbolic Augmented Reasoning (NSAR), which combines the benefits of neural and symbolic reasoning during inference. NSAR explicitly extracts symbolic facts from text and generates executable Python code to handle complex reasoning steps. Through extensive experiments across seven languages and diverse context lengths, we demonstrate that NSAR significantly outperforms both a vanilla RAG baseline and advanced prompting strategies in accurately identifying and synthesizing multiple pieces of information. Our results highlight the effectiveness of combining explicit symbolic operations with neural inference for robust, interpretable, and scalable reasoning in multilingual settings.


Controllable Context Sensitivity and the Knob Behind It

Minder, Julian, Du, Kevin, Stoehr, Niklas, Monea, Giovanni, Wendler, Chris, West, Robert, Cotterell, Ryan

arXiv.org Artificial Intelligence

When making predictions, a language model must trade off how much it relies on its context vs. its prior knowledge. Choosing how sensitive the model is to its context is a fundamental functionality, as it enables the model to excel at tasks like retrieval-augmented generation and question-answering. In this paper, we search for a knob which controls this sensitivity, determining whether language models answer from the context or their prior knowledge. To guide this search, we design a task for controllable context sensitivity. In this task, we first feed the model a context (Paris is in England) and a question (Where is Paris?); we then instruct the model to either use its prior or contextual knowledge and evaluate whether it generates the correct answer for both intents (either France or England). When fine-tuned on this task, instruction-tuned versions of Llama-3.1, Mistral-v0.3, and Gemma-2 can solve it with high accuracy (85-95%). Analyzing these high-performing models, we narrow down which layers may be important to context sensitivity using a novel linear time algorithm. Then, in each model, we identify a 1-D subspace in a single layer that encodes whether the model follows context or prior knowledge. Interestingly, while we identify this subspace in a fine-tuned model, we find that the exact same subspace serves as an effective knob in not only that model but also non-fine-tuned instruct and base models of that model family. Finally, we show a strong correlation between a model's performance and how distinctly it separates context-agreeing from context-ignoring answers in this subspace. These results suggest a single subspace facilitates how the model chooses between context and prior knowledge, hinting at a simple fundamental mechanism that controls this behavior.


Madagascar's ancient baobab forests are being restored by communities – with a little help from AI

AIHub

The collaboration between communities and scientists aims to restore baobab forests in Madagascar to this natural state. Six of the world's eight baobab species are indigenous to Madagascar, where the distinctive trees with giant trunks have historically grown in huge forests. But these forests are threatened by slash-and-burn agriculture – 4,000 hectares of baobab forest in Madagascar are destroyed every year. Baobab trees can live for 1,000 years and one hectare of land can support eight fully grown baobab trees. But many have been left orphaned – standing alone in barren areas with no contact with the wild animals that spread their seeds, helping the baobabs to reproduce.


Employing Federated Learning for Training Autonomous HVAC Systems

Hagström, Fredrik, Garg, Vikas, Oliveira, Fabricio

arXiv.org Artificial Intelligence

Buildings account for 40 % of global energy consumption. A considerable portion of building energy consumption stems from heating, ventilation, and air conditioning (HVAC), and thus implementing smart, energy-efficient HVAC systems has the potential to significantly impact the course of climate change. In recent years, model-free reinforcement learning algorithms have been increasingly assessed for this purpose due to their ability to learn and adapt purely from experience. They have been shown to outperform classical controllers in terms of energy cost and consumption, as well as thermal comfort. However, their weakness lies in their relatively poor data efficiency, requiring long periods of training to reach acceptable policies, making them inapplicable to real-world controllers directly. Hence, common research goals are to improve the learning speed, as well as to improve their ability to generalize, in order to facilitate transfer learning to unseen building environments. In this paper, we take a federated learning approach to training the reinforcement learning controller of an HVAC system. A global control policy is learned by aggregating local policies trained on multiple data centers located in different climate zones. The goal of the policy is to simultaneously minimize energy consumption and maximize thermal comfort. The federated optimization strategy indirectly increases both the rate at which experience data is collected and the variation in the data. We demonstrate through experimental evaluation that these effects lead to a faster learning speed, as well as greater generalization capabilities in the federated policy compared to any individually trained policy.


Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym

Raponi, Elena, Carraz, Nathanael Rakotonirina, Rapin, Jérémy, Doerr, Carola, Teytaud, Olivier

arXiv.org Artificial Intelligence

The growing ubiquity of machine learning (ML) has led it to enter various areas of computer science, including black-box optimization (BBO). Recent research is particularly concerned with Bayesian optimization (BO). BO-based algorithms are popular in the ML community, as they are used for hyperparameter optimization and more generally for algorithm configuration. However, their efficiency decreases as the dimensionality of the problem and the budget of evaluations increase. Meanwhile, derivative-free optimization methods have evolved independently in the optimization community. Therefore, we urge to understand whether cross-fertilization is possible between the two communities, ML and BBO, i.e., whether algorithms that are heavily used in ML also work well in BBO and vice versa. Comparative experiments often involve rather small benchmarks and show visible problems in the experimental setup, such as poor initialization of baselines, overfitting due to problem-specific setting of hyperparameters, and low statistical significance. With this paper, we update and extend a comparative study presented by Hutter et al. in 2013. We compare BBO tools for ML with more classical heuristics, first on the well-known BBOB benchmark suite from the COCO environment and then on Direct Policy Search for OpenAI Gym, a reinforcement learning benchmark. Our results confirm that BO-based optimizers perform well on both benchmarks when budgets are limited, albeit with a higher computational cost, while they are often outperformed by algorithms from other families when the evaluation budget becomes larger. We also show that some algorithms from the BBO community perform surprisingly well on ML tasks.


Nowcasting Madagascar's real GDP using machine learning algorithms

Ramaharo, Franck, Rasolofomanana, Gerzhino

arXiv.org Artificial Intelligence

We investigate the predictive power of different machine learning algorithms to nowcast Madagascar's gross domestic product (GDP). We trained popular regression models, including linear regularized regression (Ridge, Lasso, Elastic-net), dimensionality reduction model (principal component regression), k-nearest neighbors algorithm (k-NN regression), support vector regression (linear SVR), and tree-based ensemble models (Random forest and XGBoost regressions), on 10 Malagasy quarterly macroeconomic leading indicators over the period 2007Q1--2022Q4, and we used simple econometric models as a benchmark. We measured the nowcast accuracy of each model by calculating the root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). Our findings reveal that the Ensemble Model, formed by aggregating individual predictions, consistently outperforms traditional econometric models. We conclude that machine learning models can deliver more accurate and timely nowcasts of Malagasy economic performance and provide policymakers with additional guidance for data-driven decision making.


Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data

Naggita, Keziah, LaChance, Julienne, Xiang, Alice

arXiv.org Artificial Intelligence

Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa. We report the quantity and content of available data with comparisons to population-matched nations in Europe as well as the distribution of data according to fine-grained intra-national wealth estimates. Temporal analyses are performed at two-year intervals to expose emerging data trends. Furthermore, we present findings for an ``othering'' phenomenon as evidenced by a substantial number of images from Africa being taken by non-local photographers. The results of our study suggest that further work is required to capture image data representative of African people and their environments and, ultimately, to improve the applicability of computer vision models in a global context.


Fairness in generative modeling

Zameshina, Mariia, Teytaud, Olivier, Teytaud, Fabien, Hosu, Vlad, Carraz, Nathanael, Najman, Laurent, Wagner, Markus

arXiv.org Artificial Intelligence

We design general-purpose algorithms for addressing fairness issues There are many facets to fairness. An algorithm may be considered and mode collapse in generative modeling. More precisely, to design to be fair if its results are independent of some variables, particularly fair algorithms for as many sensitive variables as possible, including for sensitive variables. Fairness [18] can be measured in terms of variables we might not be aware of, we assume no prior knowledge separation, i.e., whether the probability of a given prediction, given of sensitive variables: our algorithms use unsupervised fairness the actual value, is the same for all values of a sensitive variable.