vigor
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
GLEAM: Learning to Match and Explain in Cross-View Geo-Localization
Lu, Xudong, Zheng, Zhi, Wan, Yi, Yao, Yongxiang, Wang, Annan, Zhang, Renrui, Xia, Panwang, Wu, Qiong, Li, Qingyun, Lin, Weifeng, Zhao, Xiangyu, Ma, Peifeng, Yang, Xue, Li, Hongsheng
Cross-View Geo-Localization (CVGL) focuses on identifying correspondences between images captured from distinct perspectives of the same geographical location. However, existing CVGL approaches are typically restricted to a single view or modality, and their direct visual matching strategy lacks interpretability: they only determine whether two images correspond, without explaining the rationale behind the match. In this paper, we present GLEAM-C, a foundational CVGL model that unifies multiple views and modalities, including UAV imagery, street maps, panoramic views, and ground photographs, by aligning them exclusively with satellite imagery. Our framework enhances training efficiency through optimized implementation while achieving accuracy comparable to prior modality-specific CVGL models through a two-phase training strategy. Moreover, to address the lack of interpretability in traditional CVGL methods, we leverage the reasoning capabilities of multimodal large language models (MLLMs) to propose a new task, GLEAM-X, which combines cross-view correspondence prediction with explainable reasoning. To support this task, we construct a bilingual benchmark using GPT-4o and Doubao-1.5-Thinking-Vision-Pro to generate training and testing data. The test set is further refined through detailed human revision, enabling systematic evaluation of explainable cross-view reasoning and advancing transparency and scalability in geo-localization. Together, GLEAM-C and GLEAM-X form a comprehensive CVGL pipeline that integrates multi-modal, multi-view alignment with interpretable correspondence analysis, unifying accurate cross-view matching with explainable reasoning and advancing Geo-Localization by enabling models to better Explain And Match. Code and datasets used in this work will be made publicly accessible at https://github.com/Lucky-Lance/GLEAM.
- North America > United States (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- (3 more...)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
Predicting potato plant vigor from the seed tuber properties
Atza, Elisa, Klooster, Rob, Hofstra, Falko, van der Werff, Frank, van Doorn, Hans, Budko, Neil
The vigor of potato plants, defined as the canopy area at the end of the exponential growth stage, depends on the origin and physiological state of the seed tuber. Experiments carried out with six potato varieties in three test fields over three years show that there is a 73%-90% correlation in the vigor of the plants from the same seedlot grown in different test fields. However, these correlations are not always observed on the level of individual varieties and vanish or become negative when the seed tubers and young plants experience environmental stress. A comprehensive study of the association between the vigor and the seed tuber biochemistry has revealed that, while 50%-70% of the variation in the plant vigor is explained by the tuber data, the vigor is dominated by the potato genotype. Analysis of individual predictors, such as the abundance of a particular metabolite, indicates that the vigor enhancing properties of the seed tubers differ between genotypes. Variety-specific models show that, for some varieties, up to 30% of the vigor variation within the variety is explained by and can be predicted from the tuber biochemistry, whereas, for other varieties, the association between the tuber composition and the vigor is much weaker.
- Energy > Oil & Gas > Upstream (1.00)
- Food & Agriculture > Agriculture (0.93)
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Yan, Siming, Bai, Min, Chen, Weifeng, Zhou, Xiong, Huang, Qixing, Li, Li Erran
By combining the natural language understanding, generation capabilities, and breadth of knowledge of large language models with image perception, recent large vision language models (LVLMs) have shown unprecedented reasoning capabilities in the real world. However, the generated text often suffers from inaccurate grounding in the visual input, resulting in errors such as hallucinating nonexistent scene elements, missing significant parts of the scene, and inferring incorrect attributes and relationships between objects. To address these issues, we introduce a novel framework, ViGoR (Visual Grounding Through Fine-Grained Reward Modeling), that utilizes fine-grained reward modeling to significantly enhance the visual grounding of LVLMs over pre-trained baselines. This improvement is efficiently achieved using much cheaper human evaluations instead of full supervision, as well as automated methods. We show the effectiveness of our approach through numerous metrics on several benchmarks. Additionally, we construct a comprehensive and challenging dataset specifically designed to validate the visual grounding capabilities of LVLMs. Finally, we plan to release our human annotation comprising approximately 16,000 images and generated text pairs with fine-grained evaluations to contribute to related research in the community.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Texas > Travis County > Austin (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Inferring Versatile Behavior from Demonstrations by Matching Geometric Descriptors
Freymuth, Niklas, Schreiber, Nicolas, Becker, Philipp, Taranovic, Aleksandar, Neumann, Gerhard
Humans intuitively solve tasks in versatile ways, varying their behavior both in terms of trajectory-based planning and for individual steps. Thus, they can easily generalize and adapt to new and changing environments. Current Imitation Learning algorithms often only consider unimodal expert demonstrations and act in a state-action-based setting, making it difficult for them to imitate human behavior in the case of versatile demonstrations. Instead, we combine a mixture of movement primitives with a distribution matching objective to learn versatile behaviors that match the expert's behavior and versatility. To facilitate generalization to novel task configurations, we do not directly match the agent's and expert's trajectory distributions but rather work with concise geometric descriptors which generalize well to unseen task configurations. We empirically validate our method on various robot tasks using versatile human demonstrations and compare to imitation learning algorithms in a state-action setting as well as a trajectory-based setting. We find that the geometric descriptors greatly help in generalizing to new task configurations and that combining them with our distribution-matching objective is crucial for representing and reproducing versatile behavior.
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (2 more...)
How fast to work: Response vigor, motivation and tonic dopamine
Niv, Yael, Daw, Nathaniel D., Dayan, Peter
Reinforcement learning models have long promised to unify computational, psychological and neural accounts of appetitively conditioned behavior. However, the bulk of data on animal conditioning comes from free-operant experiments measuring how fast animals will work for reinforcement. Existing reinforcement learning (RL) models are silent about these tasks, because they lack any notion of vigor. They thus fail to address the simple observation that hungrier animals will work harder for food, as well as stranger facts such as their sometimes greater productivity even when working for irrelevant outcomes such as water. Here, we develop an RL framework for free-operant behavior, suggesting that subjects choose how vigorously to perform selected actions by optimally balancing the costs and benefits of quick responding.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > New York (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- (2 more...)