Goto

Collaborating Authors

 South America


Can Masked Autoencoders Also Listen to Birds?

arXiv.org Artificial Intelligence

Masked Autoencoders (MAEs) learn rich semantic representations in audio classification through an efficient self-supervised reconstruction task. However, general-purpose models fail to generalize well when applied directly to fine-grained audio domains. Specifically, bird-sound classification requires distinguishing subtle inter-species differences and managing high intra-species acoustic variability, revealing the performance limitations of general-domain Audio-MAEs. This work demonstrates that bridging this domain gap domain gap requires full-pipeline adaptation, not just domain-specific pretraining data. We systematically revisit and adapt the pretraining recipe, fine-tuning methods, and frozen feature utilization to bird sounds using BirdSet, a large-scale bioacoustic dataset comparable to AudioSet. Our resulting Bird-MAE achieves new state-of-the-art results in BirdSet's multi-label classification benchmark. Additionally, we introduce the parameter-efficient prototypical probing, enhancing the utility of frozen MAE representations and closely approaching fine-tuning performance in low-resource settings. Bird-MAE's prototypical probes outperform linear probing by up to 37 percentage points in mean average precision and narrow the gap to fine-tuning across BirdSet downstream tasks. Bird-MAE also demonstrates robust few-shot capabilities with prototypical probing in our newly established few-shot benchmark on BirdSet, highlighting the potential of tailored self-supervised learning pipelines for fine-grained audio domains.


Generalisation and benign over-fitting for linear regression onto random functional covariates

arXiv.org Machine Learning

We study theoretical predictive performance of ridge and ridge-less least-squares regression when covariate vectors arise from evaluating $p$ random, means-square continuous functions over a latent metric space at $n$ random and unobserved locations, subject to additive noise. This leads us away from the standard assumption of i.i.d. data to a setting in which the $n$ covariate vectors are exchangeable but not independent in general. Under an assumption of independence across dimensions, $4$-th order moment, and other regularity conditions, we obtain probabilistic bounds on a notion of predictive excess risk adapted to our random functional covariate setting, making use of recent results of Barzilai and Shamir. We derive convergence rates in regimes where $p$ grows suitably fast relative to $n$, illustrating interplay between ingredients of the model in determining convergence behaviour and the role of additive covariate noise in benign-overfitting.


Vehicle detection from GSV imagery: Predicting travel behaviour for cycling and motorcycling using Computer Vision

arXiv.org Artificial Intelligence

Transportation influence health by shaping exposure to physical activity, air pollution and injury risk. Comparative data on cycling and motorcycling behaviours is scarce, particularly at a global scale. Street view imagery, such as Google Street View (GSV), combined with computer vision, is a valuable resource for efficiently capturing travel behaviour data. This study demonstrates a novel approach using deep learning on street view images to estimate cycling and motorcycling levels across diverse cities worldwide. We utilized data from 185 global cities. The data on mode shares of cycling and motorcycling estimated using travel surveys or censuses. We used GSV images to detect cycles and motorcycles in sampled locations, using 8000 images per city. The YOLOv4 model, fine-tuned using images from six cities, achieved a mean average precision of 89% for detecting cycles and motorcycles. A global prediction model was developed using beta regression with city-level mode shares as outcome, with log transformed explanatory variables of counts of GSV-detected images with cycles and motorcycles, while controlling for population density. We found strong correlations between GSV motorcycle counts and motorcycle mode share (0.78) and moderate correlations between GSV cycle counts and cycling mode share (0.51). Beta regression models predicted mode shares with $R^2$ values of 0.614 for cycling and 0.612 for motorcycling, achieving median absolute errors (MDAE) of 1.3% and 1.4%, respectively. Scatterplots demonstrated consistent prediction accuracy, though cities like Utrecht and Cali were outliers. The model was applied to 60 cities globally for which we didn't have recent mode share data. We provided estimates for some cities in the Middle East, Latin America and East Asia. With computer vision, GSV images capture travel modes and activity, providing insights alongside traditional data sources.


Exploring Content and Social Connections of Fake News with Explainable Text and Graph Learning

arXiv.org Artificial Intelligence

The global spread of misinformation and concerns about content trustworthiness have driven the development of automated fact-checking systems. Since false information often exploits social media dynamics such as "likes" and user networks to amplify its reach, effective solutions must go beyond content analysis to incorporate these factors. Moreover, simply labelling content as false can be ineffective or even reinforce biases such as automation and confirmation bias. This paper proposes an explainable framework that combines content, social media, and graph-based features to enhance fact-checking. It integrates a misinformation classifier with explainability techniques to deliver complete and interpretable insights supporting classification decisions. Experiments demonstrate that multimodal information improves performance over single modalities, with evaluations conducted on datasets in English, Spanish, and Portuguese. Additionally, the framework's explanations were assessed for interpretability, trustworthiness, and robustness with a novel protocol, showing that it effectively generates human-understandable justifications for its predictions. The code and experiments are available at https://github.com/MeLLL-UFF/mu2X/ .


Crossing Borders Without Crossing Boundaries: How Sociolinguistic Awareness Can Optimize User Engagement with Localized Spanish AI Models Across Hispanophone Countries

arXiv.org Artificial Intelligence

Large language models are, by definition, based on language. In an effort to underscore the critical need for regional localized models, this paper examines primary differences between variants of written Spanish across Latin America and Spain, with an in-depth sociocultural and linguistic contextualization therein. We argue that these differences effectively constitute significant gaps in the quotidian use of Spanish among dialectal groups by creating sociolinguistic dissonances, to the extent that locale-sensitive AI models would play a pivotal role in bridging these divides. In doing so, this approach informs better and more efficient localization strategies that also serve to more adequately meet inclusivity goals, while securing sustainable active daily user growth in a major low-risk investment geographic area. Therefore, implementing at least the proposed five sub variants of Spanish addresses two lines of action: to foment user trust and reliance on AI language models while also demonstrating a level of cultural, historical, and sociolinguistic awareness that reflects positively on any internationalization strategy.


Understanding Distribution Structure on Calibrated Recommendation Systems

arXiv.org Artificial Intelligence

--Traditional recommender systems aim to generate a recommendation list comprising the most relevant or similar items to the user's profile. These approaches can create recommendation lists that omit item genres from the less prominent areas of a user's profile, thereby undermining the user's experience. T o solve this problem, the calibrated recommendation system provides a guarantee of including less representative areas in the recommended list. The calibrated context works with three distributions. The first is from the user's profile, the second is from the candidate items, and the last is from the recommendation list. These distributions are G-dimensional, where G is the total number of genres in the system. This high dimensionality requires a different evaluation method, considering that traditional recommenders operate in a one-dimensional data space. In this sense, we implement fifteen models that help to understand how these distributions are structured. We evaluate the users' patterns in three datasets from the movie domain. The results indicate that the models of outlier detection provide a better understanding of the structures. The calibrated system creates recommendation lists that act similarly to traditional recommendation lists, allowing users to change their groups of preferences to the same degree. Commonly, traditional recommender systems generate recommendations with miscalibration [1]. Miscalibration means that the recommendation lists do not follow the user preferences distribution, instead suggesting items from user's dominant area of interest. It creates an overspecialized recommendation list in which the items from the less dominant area are overwhelmed. This effect puts the user in a filter bubble or an echo chamber problem [2]. For instance, when a specific area dominates the recommended list, the user likely has few other options to interact with, aside from items within that dominant area. Then, the subsequent lists are recommended, with the dominant area becoming more overspecialized. In recent years, calibrated recommendation systems have attracted attention [3]-[8] from the recommender system community to overcome this issue. This type of system demonstrates the capacity to improve several objectives, such as diversity [3], control of popularity bias [4], item coverage [5], precision [6], and the reduction of miscalibration [7]. To illustrate how calibrated recommendation works, consider a scenario: if a user's preferences distribution indicates Corresponding author is Diego Corr ˆ ea da Silva.


Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss

Neural Information Processing Systems

Nevertheless, we show that under certain minimal and realistic distributional settings, it is possible to obtain a (1+ null)-approximation with a nearly linear running time and poly (k/null) + O ( k log n) columns. Namely, we show that if the input matrix A has the form A = B + E, where B is an arbitrary rank-k matrix, and E is a matrix with i.i.d.