AITopics

2503.14324

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Iceland (0.04)
Europe > Greece (0.04)
(2 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)

Theodoropoulos, George S., Patakis, Andreas, Tritsarolis, Andreas, Theodoridis, Yannis

FLP-XR: Future Location Prediction on Extreme Scale Maritime Data in Real-time

arXiv.org Artificial Intelligence, Mar-19-2025

Movements of maritime vessels are inherently complex and challenging to model due to the dynamic and often unpredictable nature of maritime operations. Even within structured maritime environments, such as shipping lanes and port approaches, where vessels adhere to navigational rules and predefined sea routes, uncovering underlying patterns is far from trivial. The necessity for accurate modeling of the mobility of maritime vessels arises from the numerous applications it serves, including risk assessment for collision avoidance, optimization of shipping routes, and efficient port management. This paper introduces FLP-XR, a model that leverages maritime mobility data to construct a robust framework that offers precise predictions while ensuring extremely fast training and inference capabilities. We demonstrate the efficiency of our approach through an extensive experimental study using three real-world AIS datasets. According to the experimental results, FLP-XR outperforms the current state-of-the-art in many cases, whereas it performs 2-3 orders of magnitude faster in terms of training and inference.

data mining, machine learning, real time system, (20 more...)

2503.13491

Country:

Europe > Greece (0.05)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Europe > Netherlands > South Holland > Rotterdam (0.04)
(4 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology (1.00)
Transportation > Marine (0.86)
Transportation > Freight & Logistics Services > Shipping (0.54)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Architecture > Real Time Systems (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Ljungbergh, William, Lilja, Adam, Tonderski, Adam, Ling, Arvid Laveno, Lindström, Carl, Verbeke, Willem, Fu, Junsheng, Petersson, Christoffer, Hammarstrand, Lars, Felsberg, Michael

GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving

arXiv.org Artificial Intelligence, Mar-19-2025

Self-supervised pre-training based on next-token prediction has enabled large language models to capture the underlying structure of text, and has led to unprecedented performance on a large array of tasks when applied at scale. Similarly, autonomous driving generates vast amounts of spatiotemporal data, alluding to the possibility of harnessing scale to learn the underlying geometric and semantic structure of the environment and its evolution over time. In this direction, we propose a geometric and semantic self-supervised pre-training method, GASP, that learns a unified representation by predicting, at any queried future point in spacetime, (1) general occupancy, capturing the evolving structure of the 3D scene; (2) ego occupancy, modeling the ego vehicle path through the environment; and (3) distilled high-level features from a vision foundation model. By modeling geometric and semantic 4D occupancy fields instead of raw sensor measurements, the model learns a structured, generalizable representation of the environment and its evolution through time. We validate GASP on multiple autonomous driving benchmarks, demonstrating significant improvements in semantic occupancy forecasting, online mapping, and ego trajectory prediction. Our results demonstrate that continuous 4D geometric and semantic occupancy prediction provides a scalable and effective pre-training paradigm for autonomous driving. For code and additional visualizations, see https://research.zenseact.com/publications/gasp/.

artificial intelligence, geometric and semantic self-supervised pre-training, machine learning, (3 more...)

2503.15672

Country:

Europe > United Kingdom > UK North Sea (0.04)
Atlantic Ocean > North Atlantic Ocean > North Sea > UK North Sea (0.04)

Genre: Research Report > New Finding (0.53)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning

Liu, Junming, Meng, Siyuan, Gao, Yanting, Mao, Song, Cai, Pinlong, Yan, Guohang, Chen, Yirong, Bian, Zilin, Shi, Botian, Wang, Ding

Multimodal reasoning in Large Language Models (LLMs) struggles with incomplete knowledge and hallucination artifacts, challenges that textual Knowledge Graphs (KGs) only partially mitigate due to their modality isolation. While Multimodal Knowledge Graphs (MMKGs) promise enhanced cross-modal understanding, their practical construction is impeded by semantic narrowness of manual text annotations and inherent noise in visual-semantic entity linkages. In this paper, we propose Vision-align-to-Language integrated Knowledge Graph (VaLiK), a novel approach for constructing MMKGs that enhances LLMs reasoning through cross-modal information supplementation. Specifically, we cascade pre-trained Vision-Language Models (VLMs) to align image features with text, transforming them into descriptions that encapsulate image-specific information. Furthermore, we developed a cross-modal similarity verification mechanism to quantify semantic consistency, effectively filtering out noise introduced during feature alignment. Even without manually annotated image captions, the refined descriptions alone suffice to construct the MMKG. Compared to conventional MMKGs construction paradigms, our approach achieves substantial storage efficiency gains while maintaining direct entity-to-image linkage capability. Experimental results on multimodal reasoning tasks demonstrate that LLMs augmented with VaLiK outperform previous state-of-the-art models. Our code is published at https://github.com/Wings-Of-Disaster/VaLiK.
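The similarity-verification step described above can be illustrated with a minimal sketch. It assumes that an image and each candidate description have already been embedded into a shared vector space, and it keeps only descriptions whose cosine similarity to the image clears a threshold. The function names, the toy vectors, and the 0.5 threshold are hypothetical and are not taken from the VaLiK code.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_descriptions(image_vec, candidates, threshold=0.5):
    """Keep the descriptions whose embedding is close enough to the image embedding.

    candidates is a list of (text, embedding) pairs; threshold is a hypothetical cutoff.
    """
    return [text for text, vec in candidates if cosine(image_vec, vec) >= threshold]


# Toy 2-D embeddings standing in for real vision-language model outputs.
image_vec = [1.0, 0.0]
candidates = [
    ("a ship at sea", [0.9, 0.1]),    # nearly parallel to the image vector
    ("a bowl of fruit", [0.0, 1.0]),  # orthogonal, treated as noise
]
kept = filter_descriptions(image_vec, candidates)
print(kept)
```

In this toy case only the first description survives the filter. A real pipeline would obtain the embeddings from a pre-trained vision-language model and tune the threshold on held-out data.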

large language model, machine learning, natural language, (19 more...)

2503.12972

Country:

North America > Mexico (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(11 more...)

Genre: Research Report > Promising Solution (0.54)

Industry: Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?

Yao, Jianzhu, Wang, Kevin, Hsieh, Ryan, Zhou, Haisu, Zou, Tianqing, Cheng, Zerui, Wang, Zhangyang, Viswanath, Pramod

Reasoning and strategic behavior in social interactions is a hallmark of intelligence. This form of reasoning is significantly more sophisticated than isolated planning or reasoning tasks in static settings (e.g., math problem solving). In this paper, we present Strategic Planning, Interaction, and Negotiation (SPIN-Bench), a new multi-domain evaluation designed to measure the intelligence of strategic planning and social reasoning. While many existing benchmarks focus on narrow planning or single-agent reasoning, SPIN-Bench combines classical PDDL tasks, competitive board games, cooperative card games, and multi-agent negotiation scenarios in one unified framework. The framework includes both a benchmark and an arena to simulate and evaluate a variety of social settings that test the reasoning and strategic behavior of AI agents. We formulate the benchmark SPIN-Bench by systematically varying action spaces, state complexity, and the number of interacting agents to simulate a variety of social settings where success depends not only on methodical, step-wise decision making but also on conceptual inference about other (adversarial or cooperative) participants. Our experiments reveal that while contemporary LLMs handle basic fact retrieval and short-range planning reasonably well, they encounter significant performance bottlenecks in tasks requiring deep multi-hop reasoning over large state spaces and socially adept coordination under uncertainty. We envision SPIN-Bench as a catalyst for future research on robust multi-agent planning, social reasoning, and human-AI teaming. Project Website: https://spinbench.github.io/

large language model, machine learning, spin-bench, (22 more...)

2503.12349

Country:

Europe > Sweden (0.14)
Europe > Denmark (0.14)
Europe > Norway (0.14)
(23 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Government (0.92)
Leisure & Entertainment > Games > Chess (0.69)
Leisure & Entertainment > Games > Computer Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Siegel, Noah Y., Heess, Nicolas, Perez-Ortiz, Maria, Camburu, Oana-Maria

Faithfulness of LLM Self-Explanations for Commonsense Tasks: Larger Is Better, and Instruction-Tuning Allows Trade-Offs but Not Pareto Dominance

As large language models (LLMs) become increasingly capable, ensuring that their self-generated explanations are faithful to their internal decision-making process is critical for safety and oversight. In this work, we conduct a comprehensive counterfactual faithfulness analysis across 62 models from 8 families, encompassing both pretrained and instruction-tuned variants and significantly extending prior studies of counterfactual tests. We introduce phi-CCT, a simplified variant of the Correlational Counterfactual Test, which avoids the need for token probabilities while explaining most of the variance of the original test. Our findings reveal clear scaling trends: larger models are consistently more faithful on our metrics. However, when comparing instruction-tuned and human-imitated explanations, we find that observed differences in faithfulness can often be attributed to explanation verbosity, leading to shifts along the true-positive/false-positive Pareto frontier. While instruction-tuning and prompting can influence this trade-off, we find limited evidence that they fundamentally expand the frontier of explanatory faithfulness beyond what is achievable with pretrained models of comparable size. Our analysis highlights the nuanced relationship between instruction-tuning, verbosity, and the faithful representation of model decision processes.
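The abstract does not define phi-CCT beyond saying that it avoids token probabilities. As a hedged illustration only, the sketch below computes the standard phi coefficient, the Pearson correlation between two binary variables, which the name suggests. Pairing "the intervention changed the model's answer" with "the explanation mentions the intervention" is an assumption made here for illustration, not a description of the paper's exact procedure.

```python
from math import sqrt


def phi_coefficient(xs, ys):
    """Phi coefficient between two equal-length sequences of binary outcomes."""
    pairs = list(zip(xs, ys))
    n11 = sum(1 for x, y in pairs if x and y)
    n10 = sum(1 for x, y in pairs if x and not y)
    n01 = sum(1 for x, y in pairs if not x and y)
    n00 = sum(1 for x, y in pairs if not x and not y)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # undefined when either variable is constant; 0.0 by convention here
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical per-example outcomes (assumed variable pairing, see lead-in):
answer_changed = [1, 1, 0, 0, 1, 0]     # did the inserted text change the answer?
mentioned_in_expl = [1, 1, 0, 0, 0, 0]  # does the explanation mention the inserted text?
score = phi_coefficient(answer_changed, mentioned_in_expl)
print(round(score, 3))
```

Because both inputs are hard 0/1 outcomes, a measure of this kind needs only the model's final answers and generated explanations, not its token-level probabilities.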

large language model, machine learning, natural language, (19 more...)

2503.13445

Country:

Asia > Middle East > Republic of Türkiye (0.06)
Europe > France (0.04)
North America > United States > New York (0.04)
(23 more...)

Genre: Research Report > New Finding (0.47)

Industry:

Retail (1.00)
Media (1.00)
Health & Medicine (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Zbinden, Robin, van Tiel, Nina, Sumbul, Gencer, Vanalli, Chiara, Kellenberger, Benjamin, Tuia, Devis

MaskSDM with Shapley values to improve flexibility, robustness, and explainability in species distribution modeling

Species Distribution Models (SDMs) play a vital role in biodiversity research, conservation planning, and ecological niche modeling by predicting species distributions based on environmental conditions. The selection of predictors is crucial, strongly impacting both model accuracy and how well the predictions reflect ecological patterns. To ensure meaningful insights, input variables must be carefully chosen to match the study objectives and the ecological requirements of the target species. However, existing SDMs, including both traditional and deep learning-based approaches, often lack key capabilities for variable selection: (i) flexibility to choose relevant predictors at inference without retraining; (ii) robustness to handle missing predictor values without compromising accuracy; and (iii) explainability to interpret and accurately quantify each predictor's contribution. To overcome these limitations, we introduce MaskSDM, a novel deep learning-based SDM that enables flexible predictor selection by employing a masked training strategy. This approach allows the model to make predictions with arbitrary subsets of input variables while remaining robust to missing data. It also provides a clearer understanding of how adding or removing a given predictor affects model performance and predictions. Additionally, MaskSDM leverages Shapley values for precise predictor contribution assessments, improving upon traditional approximations. We evaluate MaskSDM on the global sPlotOpen dataset, modeling the distributions of 12,738 plant species. Our results show that MaskSDM outperforms imputation-based methods and approximates models trained on specific subsets of variables. These findings underscore MaskSDM's potential to increase the applicability and adoption of SDMs, laying the groundwork for developing foundation models in SDMs that can be readily applied to diverse ecological applications.
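To make the Shapley-value idea concrete, here is a minimal sketch of exact Shapley attribution over subsets of predictors. It assumes a function that scores the model on any subset of input variables, which is the kind of evaluation a model accepting arbitrary predictor subsets would allow. The predictor names and the toy scoring function are hypothetical and unrelated to the sPlotOpen experiments.

```python
from itertools import combinations
from math import factorial


def shapley_values(predictors, value):
    """Exact Shapley value of each predictor.

    value(subset) must return a score for any frozenset of predictors.
    The cost grows exponentially with the number of predictors.
    """
    n = len(predictors)
    result = {}
    for p in predictors:
        others = [q for q in predictors if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


def toy_score(subset):
    """Hypothetical model score with an interaction between two predictors."""
    score = 0.0
    if "temperature" in subset:
        score += 0.3
    if "precipitation" in subset:
        score += 0.2
    if "temperature" in subset and "precipitation" in subset:
        score += 0.1  # interaction term, split equally by the Shapley value
    return score


contributions = shapley_values(["temperature", "precipitation", "soil_ph"], toy_score)
print(contributions)
```

In this toy case the contributions come out to roughly 0.35, 0.25, and 0.0, and they sum to the score of the full predictor set minus the score of the empty set, which is the efficiency property of Shapley values.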

artificial intelligence, machine learning, predictor, (18 more...)

2503.13057

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States > Virginia (0.04)
North America > United States > Maryland (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry: Energy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Awais, Ch Muhammad, Reggiannini, Marco, Moroni, Davide, Salerno, Emanuele

A Survey on SAR ship classification using Deep Learning

arXiv.org Artificial Intelligence, Mar-14-2025

Deep learning (DL) has emerged as a powerful tool for Synthetic Aperture Radar (SAR) ship classification. This survey comprehensively analyzes the diverse DL techniques employed in this domain. We identify critical trends and challenges, highlighting the importance of integrating handcrafted features, utilizing public datasets, data augmentation, fine-tuning, explainability techniques, and fostering interdisciplinary collaborations to improve DL model performance. This survey establishes a first-of-its-kind taxonomy for categorizing relevant research based on DL models, handcrafted feature use, SAR attribute utilization, and the impact of fine-tuning. We discuss the methodologies used in SAR ship classification tasks and the impact of different techniques. Finally, the survey explores potential avenues for future research, including addressing data scarcity, exploring novel DL architectures, incorporating interpretability techniques, and establishing standardized performance metrics. By addressing these challenges and leveraging advancements in DL, researchers can contribute to developing more accurate and efficient ship classification systems, ultimately enhancing maritime surveillance and related applications.

artificial intelligence, classification, machine learning, (17 more...)

2503.11906

Country:

Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
Asia > China (0.04)
Africa > Comoros > Grande Comore > Moroni (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Promising Solution (0.67)

Industry:

Transportation > Marine (1.00)
Transportation > Freight & Logistics Services > Shipping (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Mar-14-2025

AQUA-SLAM: Tightly-Coupled Underwater Acoustic-Visual-Inertial SLAM with Sensor Calibration

Xu, Shida, Zhang, Kaicheng, Wang, Sen

Underwater environments pose significant challenges for visual Simultaneous Localization and Mapping (SLAM) systems due to limited visibility, inadequate illumination, and sporadic loss of structural features in images. Addressing these challenges, this paper introduces a novel, tightly-coupled Acoustic-Visual-Inertial SLAM approach, termed AQUA-SLAM, to fuse a Doppler Velocity Log (DVL), a stereo camera, and an Inertial Measurement Unit (IMU) within a graph optimization framework. The proposed system will be made open-source for the community. ... These vehicles are indispensable for tasks such as seabed mapping, pipeline and cable inspections, biological and environmental monitoring, and the maintenance of underwater infrastructure. A key application area is the detailed visual inspection of subsea structures, including offshore wind turbine foundations, where ... Considering cameras are widely equipped on underwater robots, visual Simultaneous Localization and Mapping (SLAM) techniques emerge as natural solutions. The rapid attenuation of light energy in water severely limits the visibility of ... Moreover, underwater vision often suffers from poor lighting and blizzards of "marine snow" caused by small particles of organic matter in water, severely reducing image quality with increased motion blur and dynamic image regions. ... occasionally outside the camera's field of view, leading to intermittent loss of visual tracking. Therefore, although visual SLAM techniques have recently made tremendous progress in terrestrial settings [1], [2], [3], their performance and robustness are inevitably compromised underwater due to the complex and dynamic nature of aquatic environments. ... (IMU), known as visual-inertial SLAM (VI-SLAM) [4], [5], can alleviate some of the challenges arising from transient, ... of underwater SLAM systems, particularly against short-term visual disruptions, can be substantially enhanced [6]. However, most of the challenges for underwater vision, such as the limited visibility and the "marine snow", are long-term effects that last at least from tens of seconds to a few minutes before being mitigated. VI-SLAM also encounters ...

artificial intelligence, calibration, machine learning, (17 more...)

2503.11420

Country:

Europe > North Sea (0.04)
Atlantic Ocean > North Atlantic Ocean > North Sea (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Greece > Ionian Islands > Corfu (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Renewable > Wind (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Sensing and Signal Processing (0.93)
(2 more...)

arXiv.org Artificial Intelligence, Mar-13-2025

Information Density Principle for MLLM Benchmarks

Li, Chunyi, Li, Xiaozhe, Zhang, Zicheng, Tian, Yuan, Jia, Ziheng, Liu, Xiaohong, Min, Xiongkuo, Wang, Jia, Duan, Haodong, Chen, Kai, Zhai, Guangtao

With the emergence of Multimodal Large Language Models (MLLMs), hundreds of benchmarks have been developed to ensure the reliability of MLLMs in downstream tasks. However, the evaluation mechanism itself may not be reliable. For developers of MLLMs, questions remain about which benchmark to use and whether the test results meet their requirements. Therefore, we propose a critical principle of Information Density, which examines how much insight a benchmark can provide for the development of MLLMs. We characterize it from four key dimensions: (1) Fallacy, (2) Difficulty, (3) Redundancy, (4) Diversity. Through a comprehensive analysis of more than 10,000 samples, we measured the information density of 19 MLLM benchmarks. Experiments show that using the latest benchmarks in testing can provide more insight compared to previous ones, but there is still room for improvement in their information density. We hope this principle can promote the development and application of future MLLM benchmarks. Project page: https://github.com/lcysyzxdxc/bench4bench

benchmark, eval, zhang, (15 more...)

2503.10079

Country:

Atlantic Ocean > Mediterranean Sea (0.04)
Asia > China > Shanghai > Shanghai (0.04)
North America > Central America (0.04)
(5 more...)

Genre: Research Report (0.40)

Industry:

Transportation > Ground > Road (0.46)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)