AITopics

2506.0358

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)
(16 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.88)

Industry: Education > Curriculum > Subject-Specific Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Hodgkinson, Liam, Wang, Zhichao, Mahoney, Michael W.

Models of Heavy-Tailed Mechanistic Universality

arXiv.org Machine LearningJun-5-2025

Recent theoretical and empirical successes in deep learning, including the celebrated neural scaling laws, are punctuated by the observation that many objects of interest tend to exhibit some form of heavy-tailed or power law behavior. In particular, the prevalence of heavy-tailed spectral densities in Jacobians, Hessians, and weight matrices has led to the introduction of the concept of heavy-tailed mechanistic universality (HT-MU). Multiple lines of empirical evidence suggest a robust correlation between heavy-tailed metrics and model performance, indicating that HT-MU may be a fundamental aspect of deep learning efficacy. Here, we propose a general family of random matrix models -- the high-temperature Marchenko-Pastur (HTMP) ensemble -- to explore attributes that give rise to heavy-tailed behavior in trained neural networks. Under this model, spectral densities with power laws on (upper and lower) tails arise through a combination of three independent factors (complex correlation structures in the data; reduced temperatures during training; and reduced eigenvector entropy), appearing as an implicit bias in the model structure, and they can be controlled with an "eigenvalue repulsion" parameter. Implications of our model on other appearances of heavy tails, including neural scaling laws, optimizer trajectories, and the five-plus-one phases of neural network training, are discussed.

artificial intelligence, machine learning, matrix, (17 more...)

2506.0347

Country:

North America > United States > California > Alameda County > Berkeley (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > Canada > Ontario > Toronto (0.14)
(3 more...)

Genre: Research Report > New Finding (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Pielok, Tobias, Bischl, Bernd, Rügamer, David

Revisiting Unbiased Implicit Variational Inference

arXiv.org Machine LearningJun-5-2025

Recent years have witnessed growing interest in semi-implicit variational inference (SIVI) methods due to their ability to rapidly generate samples from complex distributions. However, since the likelihood of these samples is non-trivial to estimate in high dimensions, current research focuses on finding effective SIVI training routines. Although unbiased implicit variational inference (UIVI) has largely been dismissed as imprecise and computationally prohibitive because of its inner MCMC loop, we revisit this method and show that UIVI's MCMC loop can be effectively replaced via importance sampling and the optimal proposal distribution can be learned stably by minimizing an expected forward Kullback-Leibler divergence without bias. Our refined approach demonstrates superior performance or parity with state-of-the-art methods on established SIVI benchmarks.

artificial intelligence, international conference, machine learning, (11 more...)

2506.03839

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Vischer, Marc Aurel, Otero, Noelia, Ma, Jackie

Spatially Resolved Meteorological and Ancillary Data in Central Europe for Rainfall Streamflow Modeling

arXiv.org Machine LearningJun-5-2025

We present a dataset for rainfall streamflow modeling that is fully spatially resolved with the aim of taking neural network-driven hydrological modeling beyond lumped catchments. To this end, we compiled data covering five river basins in central Europe: upper Danube, Elbe, Oder, Rhine, and Weser. The dataset contains meteorological forcings, as well as ancillary information on soil, rock, land cover, and orography. The data is harmonized to a regular 9km times 9km grid and contains daily values that span from October 1981 to September 2011. We also provide code to further combine our dataset with publicly available river discharge data for end-to-end rainfall streamflow modeling.

artificial intelligence, information management, machine learning, (14 more...)

2506.03819

Country:

Europe > Central Europe (0.61)
South America > Chile (0.05)
South America > Brazil (0.05)
(8 more...)

Genre:

Research Report (0.50)
Overview (0.47)

Technology:

Information Technology > Information Management (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.52)

Holtermann, Carolin, Röttger, Paul, Lauscher, Anne

Around the World in 24 Hours: Probing LLM Knowledge of Time and Place

arXiv.org Artificial IntelligenceJun-5-2025

Reasoning over time and space is essential for understanding our world. However, the abilities of language models in this area are largely unexplored as previous work has tested their abilities for logical reasoning in terms of time and space in isolation or only in simple or artificial environments. In this paper, we present the first evaluation of the ability of language models to jointly reason over time and space. To enable our analysis, we create GeoTemp, a dataset of 320k prompts covering 289 cities in 217 countries and 37 time zones. Using GeoTemp, we evaluate eight open chat models of three different model families for different combinations of temporal and geographic knowledge. We find that most models perform well on reasoning tasks involving only temporal knowledge and that overall performance improves with scale. However, performance remains constrained in tasks that require connecting temporal and geographical information. We do not find clear correlations of performance with specific geographic regions. Instead, we find a significant performance increase for location names with low model perplexity, suggesting their repeated occurrence during model training. We further demonstrate that their performance is heavily influenced by prompt formulation - a direct injection of geographical knowledge leads to performance gains, whereas, surprisingly, techniques like chain-of-thought prompting decrease performance on simpler tasks.

large language model, machine learning, natural language, (20 more...)

2506.03984

Country:

Oceania (1.00)
North America (1.00)
Asia > Middle East > Iran (0.14)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Schneider, Nicole R, Ramachandran, Nandini, O'Sullivan, Kent, Samet, Hanan

DistRAG: Towards Distance-Based Spatial Reasoning in LLMs

arXiv.org Artificial IntelligenceJun-5-2025

Many real world tasks where Large Language Models (LLMs) can be used require spatial reasoning, like Point of Interest (POI) recommendation and itinerary planning. However, on their own LLMs lack reliable spatial reasoning capabilities, especially about distances. To address this problem, we develop a novel approach, DistRAG, that enables an LLM to retrieve relevant spatial information not explicitly learned during training. Our method encodes the geodesic distances between cities and towns in a graph and retrieves a context subgraph relevant to the question. Using this technique, our method enables an LLM to answer distance-based reasoning questions that it otherwise cannot answer. Given the vast array of possible places an LLM could be asked about, DistRAG offers a flexible first step towards providing a rudimentary `world model' to complement the linguistic knowledge held in LLMs.

large language model, machine learning, natural language, (19 more...)

2506.03424

Country:

North America > United States (0.50)
Europe (0.47)
Oceania > Australia > New South Wales > Sydney (0.29)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Popular ScienceJun-4-2025, 15:18:25 GMT

Wild cockatoos are learning how to use water fountains

Breakthroughs, discoveries, and DIY tips sent every weekday. Animals constantly adapt to their environments, but keeping up with humanity's dramatic influence on the natural world poses unique challenges. While this unfortunately ends in disaster for many species, some populations are figuring out new ways to navigate urban spaces. Back in 2022, wildlife biologists confirmed that a community of wild, sulfur-crested cockatoos in Sydney, Australia had learned how to open the lids of curbside trash bins on garbage day in order to snack on locals' leftovers. But that's not all these birds can do.

artificial intelligence, cockatoo, fountain, (10 more...)

Popular Science

Country: Oceania > Australia > New South Wales > Sydney (0.25)

Genre: Research Report > New Finding (0.36)

Technology: Information Technology > Artificial Intelligence (0.56)

arXiv.org Artificial IntelligenceJun-4-2025

Enhancing Paraphrase Type Generation: The Impact of DPO and RLHF Evaluated with Human-Ranked Data

Lübbers, Christopher Lee

Paraphrasing re-expresses meaning to enhance applications like text simplification, machine translation, and question-answering. Specific paraphrase types facilitate accurate semantic analysis and robust language models. However, existing paraphrase-type generation methods often misalign with human preferences due to reliance on automated metrics and limited human-annotated training data, obscuring crucial aspects of semantic fidelity and linguistic transformations. This study addresses this gap by leveraging a human-ranked paraphrase-type dataset and integrating Direct Preference Optimization (DPO) to align model outputs directly with human judgments. DPO-based training increases paraphrase-type generation accuracy by 3 percentage points over a supervised baseline and raises human preference ratings by 7 percentage points. A newly created human-annotated dataset supports more rigorous future evaluations. Additionally, a paraphrase-type detection model achieves F1 scores of 0.91 for addition/deletion, 0.78 for same polarity substitution, and 0.70 for punctuation changes. These findings demonstrate that preference data and DPO training produce more reliable, semantically accurate paraphrases, enabling downstream applications such as improved summarization and more robust question-answering. The PTD model surpasses automated metrics and provides a more reliable framework for evaluating paraphrase quality, advancing paraphrase-type research toward richer, user-aligned language generation and establishing a stronger foundation for future evaluations grounded in human-centric criteria.

large language model, machine learning, natural language, (15 more...)

2506.02018

Country:

Europe > Germany > Lower Saxony > Gottingen (0.50)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(17 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)

Behrendt, Maike, Wagner, Stefan Sylvius, Weinmann, Carina, Bormann, Marike, Warne, Mira, Harmeling, Stefan

Natural Language Processing to Enhance Deliberation in Political Online Discussions: A Survey

arXiv.org Artificial IntelligenceJun-4-2025

Political online participation in the form of discussing political issues and exchanging opinions among citizens is gaining importance with more and more formats being held digitally. To come to a decision, a careful discussion and consideration of opinions and a civil exchange of arguments, which is defined as the act of deliberation, is desirable. The quality of discussions and participation processes in terms of their deliberativeness highly depends on the design of platforms and processes. To facilitate online communication for both participants and initiators, machine learning methods offer a lot of potential. In this work we want to showcase which issues occur in political online discussions and how machine learning can be used to counteract these issues and enhance deliberation.

artificial intelligence, machine learning, natural language, (20 more...)

2506.02533

Country:

Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.15)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
(34 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningJun-4-2025

CACTI: Leveraging Copy Masking and Contextual Information to Improve Tabular Data Imputation

Gorla, Aditya, Wang, Ryan, Liu, Zhengtong, An, Ulzee, Sankararaman, Sriram

We present CACTI, a masked autoencoding approach for imputing tabular data that leverages the structure in missingness patterns and contextual information. Our approach employs a novel median truncated copy masking training strategy that encourages the model to learn from empirical patterns of missingness while incorporating semantic relationships between features - captured by column names and text descriptions - to better represent feature dependence. These dual sources of inductive bias enable CACTI to outperform state-of-the-art methods - an average $R^2$ gain of 7.8% over the next best method (13.4%, 6.1%, and 5.3% under missing not at random, at random and completely at random, respectively) - across a diverse range of datasets and missingness conditions. Our results highlight the value of leveraging dataset-specific contextual information and missingness patterns to enhance imputation performance.

artificial intelligence, machine learning, natural language, (16 more...)

2506.02306

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Oceania > New Zealand (0.04)
North America > Canada (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)