 South-East District


Global urban visual perception varies across demographics and personalities

Quintana, Matias, Gu, Youlong, Liang, Xiucheng, Hou, Yujun, Ito, Koichi, Zhu, Yihan, Abdelrahman, Mahmoud, Biljecki, Filip

arXiv.org Artificial Intelligence

Understanding people's preferences is crucial for urban planning, yet current approaches often combine responses from multi-cultural populations, obscuring demographic differences and risking amplifying biases. We conducted a large-scale urban visual perception survey of streetscapes worldwide using street view imagery, examining how demographics -- including gender, age, income, education, race and ethnicity, and personality traits -- shape perceptions among 1,000 participants with balanced demographics from five countries and 45 nationalities. This dataset, Street Perception Evaluation Considering Socioeconomics (SPECS), reveals demographic- and personality-based differences across six traditional indicators -- safe, lively, wealthy, beautiful, boring, depressing -- and four new ones -- live nearby, walk, cycle, green. Location-based sentiments further shape these preferences. Machine learning models trained on existing global datasets tend to overestimate positive indicators and underestimate negative ones compared to human responses, underscoring the need for local context. Our study aspires to rectify the myopic treatment of street perception, which rarely considers demographics or personality traits.


SQuAI: Scientific Question-Answering with Multi-Agent Retrieval-Augmented Generation

Besrour, Ines, He, Jingbo, Schreieder, Tobias, Färber, Michael

arXiv.org Artificial Intelligence

We present SQuAI (https://squai.scads.ai/), a scalable and trustworthy multi-agent retrieval-augmented generation (RAG) framework for scientific question answering (QA) with large language models (LLMs). SQuAI addresses key limitations of existing RAG systems in the scholarly domain, where complex, open-domain questions demand accurate answers, explicit claims with citations, and retrieval across millions of scientific documents. Built on over 2.3 million full-text papers from arXiv.org, SQuAI employs four collaborative agents to decompose complex questions into sub-questions, retrieve targeted evidence via hybrid sparse-dense retrieval, and adaptively filter documents to improve contextual relevance. To ensure faithfulness and traceability, SQuAI integrates in-line citations for each generated claim and provides supporting sentences from the source documents. Our system improves faithfulness, answer relevance, and contextual relevance by up to +0.088 (12%) over a strong RAG baseline. We further release a benchmark of 1,000 scientific question-answer-evidence triplets to support reproducibility. With transparent reasoning, verifiable citations, and domain-wide scalability, SQuAI demonstrates how multi-agent RAG enables more trustworthy scientific QA with LLMs.


UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations

Mühlematter, Dominik J., Che, Lin, Hong, Ye, Raubal, Martin, Wiedemann, Nina

arXiv.org Artificial Intelligence

Forecasting urban phenomena such as housing prices and public health indicators requires the effective integration of various geospatial data. Current methods primarily utilize task-specific models, while recent foundation models for spatial representations often support only limited modalities and lack multimodal fusion capabilities. To overcome these challenges, we present UrbanFusion, a Geo-Foundation Model (GeoFM) that features Stochastic Multimodal Fusion (SMF). The framework employs modality-specific encoders to process different types of inputs, including street view imagery, remote sensing data, cartographic maps, and points of interest (POIs) data. These multimodal inputs are integrated via a Transformer-based fusion module that learns unified representations. An extensive evaluation across 41 tasks in 56 cities worldwide demonstrates UrbanFusion's strong generalization and predictive performance compared to state-of-the-art GeoAI models. Specifically, it 1) outperforms prior foundation models on location-encoding, 2) allows multimodal input during inference, and 3) generalizes well to regions unseen during training. UrbanFusion can flexibly utilize any subset of available modalities for a given location during both pretraining and inference, enabling broad applicability across diverse data availability scenarios. All source code is available at https://github.com/DominikM198/UrbanFusion.


TRAJECT-Bench: A Trajectory-Aware Benchmark for Evaluating Agentic Tool Use

He, Pengfei, Dai, Zhenwei, He, Bing, Liu, Hui, Tang, Xianfeng, Lu, Hanqing, Li, Juanhui, Ding, Jiayuan, Mukherjee, Subhabrata, Wang, Suhang, Xing, Yue, Tang, Jiliang, Dumoulin, Benoit

arXiv.org Artificial Intelligence

Large language model (LLM)-based agents increasingly rely on tool use to complete real-world tasks. While existing works evaluate the LLMs' tool use capability, they largely focus on the final answers yet overlook the detailed tool usage trajectory, i.e., whether tools are selected, parameterized, and ordered correctly. We introduce TRAJECT-Bench, a trajectory-aware benchmark to comprehensively evaluate LLMs' tool use capability through diverse tasks with fine-grained evaluation metrics. TRAJECT-Bench pairs high-fidelity, executable tools across practical domains with tasks grounded in production-style APIs, and synthesizes trajectories that vary in breadth (parallel calls) and depth (interdependent chains). Besides final accuracy, TRAJECT-Bench also reports trajectory-level diagnostics, including tool selection and argument correctness, and dependency/order satisfaction. Analyses reveal failure modes such as similar tool confusion and parameter-blind selection, and scaling behavior with tool diversity and trajectory length where the bottleneck of transiting from short to mid-length trajectories is revealed, offering actionable guidance for LLMs' tool use.


Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models

Simbeck, Katharina, Mahran, Mariam

arXiv.org Artificial Intelligence

Despite growing research on bias in large language models (LLMs), most work has focused on gender and race, with little attention to religious identity. This paper explores how religion is internally represented in LLMs and how it intersects with concepts of violence and geography. Using mechanistic interpretability and Sparse Autoencoders (SAEs) via the Neuronpedia API, we analyze latent feature activations across five models. We measure overlap between religion- and violence-related prompts and probe semantic patterns in activation contexts. While all five religions show comparable internal cohesion, Islam is more frequently linked to features associated with violent language. In contrast, geographic associations largely reflect real-world religious demographics, revealing how models embed both factual distributions and cultural stereotypes. These findings highlight the value of structural analysis in auditing not just outputs but also internal representations that shape model behavior.


Where on Earth Do Users Say They Are?: Geo-Entity Linking for Noisy Multilingual User Input

Masis, Tessa, O'Connor, Brendan

arXiv.org Artificial Intelligence

Geo-entity linking is the task of linking a location mention to the real-world geographic location. In this paper we explore the challenging task of geo-entity linking for noisy, multilingual social media data. There are few open-source multilingual geo-entity linking tools available and existing ones are often rule-based, which break easily in social media settings, or LLM-based, which are too expensive for large-scale datasets. We present a method which represents real-world locations as averaged embeddings from labeled user-input location names and allows for selective prediction via an interpretable confidence score. We show that our approach improves geo-entity linking on a global and multilingual social media dataset, and discuss progress and problems with evaluating at different geographic granularities.


Do Large Language Models Latently Perform Multi-Hop Reasoning?

Yang, Sohee, Gribovskaya, Elena, Kassner, Nora, Geva, Mor, Riedel, Sebastian

arXiv.org Artificial Intelligence

We study whether Large Language Models (LLMs) latently perform multi-hop reasoning with complex prompts such as "The mother of the singer of 'Superstition' is". We look for evidence of a latent reasoning pathway where an LLM (1) latently identifies "the singer of 'Superstition'" as Stevie Wonder, the bridge entity, and (2) uses its knowledge of Stevie Wonder's mother to complete the prompt. We analyze these two hops individually and consider their co-occurrence as indicative of latent multi-hop reasoning. For the first hop, we test if changing the prompt to indirectly mention the bridge entity instead of any other entity increases the LLM's internal recall of the bridge entity. For the second hop, we test if increasing this recall causes the LLM to better utilize what it knows about the bridge entity. We find strong evidence of latent multi-hop reasoning for the prompts of certain relation types, with the reasoning pathway used in more than 80% of the prompts. However, the utilization is highly contextual, varying across different types of prompts. Also, on average, the evidence for the second hop and the full multi-hop traversal is rather moderate and only substantial for the first hop. Moreover, we find a clear scaling trend with increasing model size for the first hop of reasoning but not for the second hop. Our experimental findings suggest potential challenges and opportunities for future development and applications of LLMs.


Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data

Naggita, Keziah, LaChance, Julienne, Xiang, Alice

arXiv.org Artificial Intelligence

Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa. We report the quantity and content of available data with comparisons to population-matched nations in Europe as well as the distribution of data according to fine-grained intra-national wealth estimates. Temporal analyses are performed at two-year intervals to expose emerging data trends. Furthermore, we present findings for an "othering" phenomenon as evidenced by a substantial number of images from Africa being taken by non-local photographers. The results of our study suggest that further work is required to capture image data representative of African people and their environments and, ultimately, to improve the applicability of computer vision models in a global context.


How to Create Dummy Data in Python

#artificialintelligence

Dummy data is randomly generated data that can be substituted for live data. Whether you are a developer, software engineer, or data scientist, you sometimes need dummy data to test what you have built, whether that is a web app, a mobile app, or a machine learning model. If you are using Python, you can use the Faker package to create dummy data of many types, for example dates, transactions, names, text, and times. Faker is a simple Python package that generates fake data of different types. It is heavily inspired by PHP Faker, Perl Faker, and Ruby Faker.


Using Big Data Analytics for Transboundary Water Management

#artificialintelligence

Southern Africa has experienced drought-flood cycles for the past decade that strain the ability of any country to properly manage water resources. This dynamic is exacerbated by human drivers such as the heavy reliance of sectors such as mining and agriculture on groundwater and surface water, as well as subsistence agriculture in rural areas along rivers. These factors have progressively depleted natural freshwater systems and contributed to an accumulation of sediment in river systems. In a region where two or more countries share many of the groundwater and surface resources, water security cuts across the socioeconomic divide and is both a rural and urban issue. For example, the City of Cape Town had to heavily ration all water uses in 2017 and 2018, as its dams were drying up.