AITopics

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

a28e024ccd623ed113fb19683fa0910d-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 02:38:37 GMT

agent, caption, representation, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.98)

Utilizing a Geospatial Foundation Model for Coastline Delineation in Small Sandy Islands

Chhabra, Tishya, Bajpai, Manisha, Zesk, Walter, Tibbits, Skylar

arXiv.org Artificial Intelligence, Nov-14-2025

We present an initial evaluation of NASA and IBM's Prithvi-EO-2.0 geospatial foundation model on shoreline delineation of small sandy islands using satellite images. We curated and labeled a dataset of 225 multispectral images of two Maldivian islands, which we publicly release, and fine-tuned both the 300M and 600M parameter versions of Prithvi on training subsets ranging from 5 to 181 images. Our experiments show that even with as few as 5 training images, the models achieve high performance (F1 of 0.94, IoU of 0.79). Our results demonstrate the strong transfer learning capability of Prithvi, underscoring the potential of such models to support coastal monitoring in data-poor regions.

artificial intelligence, machine learning, prithvi-eo-2, (14 more...)

arXiv.org Artificial Intelligence

2511.10177

Country: North America > United States (0.69)

Genre: Research Report > New Finding (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Unveiling Intrinsic Text Bias in Multimodal Large Language Models through Attention Key-Space Analysis

Zheng, Xinhan, Wu, Huyu, Wang, Xueting, Jiang, Haiyun

arXiv.org Artificial Intelligence, Oct-31-2025

Multimodal large language models (MLLMs) exhibit a pronounced preference for textual inputs when processing vision-language data, limiting their ability to reason effectively from visual evidence. Unlike prior studies that attribute this text bias to external factors such as data imbalance or instruction tuning, we propose that the bias originates from the model's internal architecture. Specifically, we hypothesize that visual key vectors (Visual Keys) are out-of-distribution (OOD) relative to the text key space learned during language-only pretraining. Consequently, these visual keys receive systematically lower similarity scores during attention computation, leading to their under-utilization in the context representation. To validate this hypothesis, we extract key vectors from LLaVA and Qwen2.5-VL and analyze their distributional structures using qualitative (t-SNE) and quantitative (Jensen-Shannon divergence) methods. The results provide direct evidence that visual and textual keys occupy markedly distinct subspaces within the attention space. The inter-modal divergence is statistically significant, exceeding intra-modal variation by several orders of magnitude. These findings reveal that text bias arises from an intrinsic misalignment within the attention key space rather than solely from external data factors.

artificial intelligence, large language model, natural language, (13 more...)

arXiv.org Artificial Intelligence

2510.26721

Country: Asia > China (0.48)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)

MEENA (PersianMMMU): Multimodal-Multilingual Educational Exams for N-level Assessment

Ghahroodi, Omid, Hemmat, Arshia, Nouri, Marzia, Hosseini, Seyed Mohammad Hadi, Dastgheib, Doratossadat, Sanian, Mohammad Vali, Sahebi, Alireza, Zohrabi, Reihaneh, Rohban, Mohammad Hossein, Asgari, Ehsaneddin, Baghshah, Mahdieh Soleymani

arXiv.org Artificial Intelligence, Aug-26-2025

Recent advancements in large vision-language models (VLMs) have primarily focused on English, with limited attention given to other languages. To address this gap, we introduce MEENA (also known as PersianMMMU), the first dataset designed to evaluate Persian VLMs across scientific, reasoning, and human-level understanding tasks. Our dataset comprises approximately 7,500 Persian and 3,000 English questions, covering a wide range of topics such as reasoning, mathematics, physics, diagrams, charts, and Persian art and literature. Key features of MEENA include: (1) diverse subject coverage spanning various educational levels, from primary to upper secondary school, (2) rich metadata, including difficulty levels and descriptive answers, (3) original Persian data that preserves cultural nuances, (4) a bilingual structure to assess cross-linguistic performance, and (5) a series of diverse experiments assessing various capabilities, including overall performance, the model's ability to attend to images, and its tendency to generate hallucinations. We hope this benchmark contributes to enhancing VLM capabilities beyond English.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.1729

Country: Asia > Middle East > UAE (0.28)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > K-12 Education > Secondary School (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

a28e024ccd623ed113fb19683fa0910d-Supplemental-Conference.pdf

Neural Information Processing Systems, Aug-17-2025, 08:24:34 GMT

artificial intelligence, machine learning, representation, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.98)

Experiment on creating a neural network with weights determined by the potential of a simulated electrostatic field

Polad, Geidarov

arXiv.org Artificial Intelligence, Jul-8-2025

This paper explores the possibility of determining the weights and thresholds of a neural network using the potential -- a parameter of an electrostatic field -- without analytical calculations and without applying training algorithms. The work is based on neural network architectures employing metric recognition methods. The electrostatic field is simulated in the Builder C++ environment. In the same environment, a neural network based on metric recognition methods is constructed, with the weights of the first-layer neurons determined by the values of the potentials of the simulated electrostatic field. The effectiveness of the resulting neural network within the simulated system is evaluated using the MNIST test dataset under various initial conditions of the simulated system. The results demonstrated functional viability. The implementation of this approach shows that a neural network can obtain weight values almost instantaneously from the electrostatic field, without the need for analytical computations, lengthy training procedures, or massive training datasets.

artificial intelligence, machine learning, neural network, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.3103/S0147688222050161

2507.02933

Country: Asia (0.46)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Structured Attention Matters to Multimodal LLMs in Document Understanding

Liu, Chang, Chen, Hongkai, Cai, Yujun, Wu, Hang, Ye, Qingwen, Yang, Ming-Hsuan, Wang, Yiwei

arXiv.org Artificial Intelligence, Jun-30-2025

Document understanding remains a significant challenge for multimodal large language models (MLLMs). While previous research has primarily focused on locating evidence pages through precise multimodal queries, our work investigates a fundamental yet overlooked aspect: how input format influences document comprehension performance. Through systematic analysis, we discover that raw OCR text often impairs rather than improves MLLMs' performance, a counterintuitive finding we attribute to attention dispersion and structure loss. To further substantiate our hypothesis, we propose a novel structure-preserving approach that encodes document elements using the LaTeX paradigm, maintaining the hierarchical organization and spatial relationships critical for comprehension. Our attention analysis reveals that structured text induces structured attention patterns on both textual and visual content, directing models to focus on semantically meaningful regions while reducing attention waste. This approach significantly enhances MLLMs' document question answering performance across diverse document types without requiring architectural modifications or additional training.

arxiv preprint arxiv, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.216

Country:

Europe > Germany (0.04)
Oceania > Australia > Queensland (0.04)
North America > United States > California > Merced County > Merced (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

AI Judges in Design: Statistical Perspectives on Achieving Human Expert Equivalence With Vision-Language Models

Edwards, Kristen M., Tehranchi, Farnaz, Miller, Scarlett R., Ahmed, Faez

arXiv.org Artificial Intelligence, Apr-1-2025

The subjective evaluation of early-stage engineering designs, such as conceptual sketches, traditionally relies on human experts. However, expert evaluations are time-consuming, expensive, and sometimes inconsistent. Recent advances in vision-language models (VLMs) offer the potential to automate design assessments, but it is crucial to ensure that these AI "judges" perform on par with human experts. Yet no existing framework assesses expert equivalence. This paper introduces a rigorous statistical framework to determine whether an AI judge's ratings match those of human experts. We apply this framework in a case study evaluating four VLM-based judges on key design metrics (uniqueness, creativity, usefulness, and drawing quality). These AI judges employ various in-context learning (ICL) techniques, including uni- vs. multimodal prompts and inference-time reasoning. The same statistical framework is used to assess three trained novices for expert equivalence. Results show that the top-performing AI judge, using text- and image-based ICL with reasoning, achieves expert-level agreement for uniqueness and drawing quality and outperforms or matches trained novices across all metrics. In 6/6 runs for both uniqueness and creativity, and 5/6 runs for both drawing quality and usefulness, its agreement with experts meets or exceeds that of the majority of trained novices. These findings suggest that reasoning-supported VLMs can achieve human-expert equivalence in design evaluation. This has implications for scaling design evaluation in education and practice, and provides a general statistical framework for validating AI judges in other domains requiring subjective content evaluation.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2504.00938

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Pennsylvania > Centre County > University Park (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Applied AI (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
(2 more...)

A Thousand Words or An Image: Studying the Influence of Persona Modality in Multimodal LLMs

Broomfield, Julius, Sharma, Kartik, Kumar, Srijan

arXiv.org Artificial Intelligence, Feb-27-2025

Large language models (LLMs) have recently demonstrated remarkable advancements in embodying diverse personas, enhancing their effectiveness as conversational agents and virtual assistants. Consequently, LLMs have made significant strides in processing and integrating multimodal information. However, even though human personas can be expressed in both text and image, the extent to which the modality of a persona impacts the embodiment by the LLM remains largely unexplored. In this paper, we investigate how different modalities influence the expressiveness of personas in multimodal LLMs. To this end, we create a novel modality-parallel dataset of 40 diverse personas varying in age, gender, occupation, and location. This consists of four modalities to equivalently represent a persona: image-only, text-only, a combination of image and small text, and typographical images, where text is visually stylized to convey persona-related attributes. We then create a systematic evaluation framework with 60 questions and corresponding metrics to assess how well LLMs embody each persona across its attributes and scenarios. Comprehensive experiments on 5 multimodal LLMs show that personas represented by detailed text show more linguistic habits, while typographical images often show more consistency with the persona. Our results reveal that LLMs often overlook persona-specific details conveyed through images, highlighting underlying limitations and paving the way for future research to bridge this gap. We release the data and code at https://github.com/claws-lab/persona-modality .

modality, persona, representation, (15 more...)

arXiv.org Artificial Intelligence

2502.20504

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
South America > Brazil > São Paulo (0.04)
(23 more...)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.42)

Industry:

Banking & Finance (0.94)
Health & Medicine (0.93)
Transportation (0.68)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
