AITopics

Large Vision Language Models (LVLMs) have demonstrated remarkable abilities in understanding and reasoning about both visual and textual information. However, existing evaluation methods for LVLMs, primarily based on benchmarks like Visual Question Answering and image captioning, often fail to capture the full scope of LVLMs' capabilities. These benchmarks are limited by issues such as inadequate assessment of detailed visual perception, data contamination, and a lack of focus on multi-turn reasoning. To address these challenges, we propose LVLM-Playground, a game-based evaluation framework designed to provide a comprehensive assessment of LVLMs' cognitive and reasoning skills in structured environments. LVLM-Playground uses a set of games to evaluate LVLMs on four core tasks: Perceiving, Question Answering, Rule Following, and End-to-End Playing, with each target task designed to assess specific abilities, including visual perception, reasoning, decision-making, etc.

conference paper, game state, log 10, (16 more...)

2503.02358

Country:

Oceania > Australia > South Australia > Adelaide (0.04)
North America > United States > Virginia (0.04)
Asia > China (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Rubehn, Arne, Rzymski, Christoph, Ciucci, Luca, van Dam, Kellen Parker, Kučerová, Alžběta, Bocklage, Katja, Snee, David, Stephen, Abishek, List, Johann-Mattis

Annotating and Inferring Compositional Structures in Numeral Systems Across Languages

Numeral systems across the world's languages vary in fascinating ways, both regarding their synchronic structure and the diachronic processes that determined how they evolved into their current shape. For a proper comparison of numeral systems across different languages, however, it is important to code them in a standardized form that allows for the comparison of basic properties. Here, we present a simple but effective coding scheme for numeral annotation, along with a workflow that helps to code numeral systems in a computer-assisted manner, providing sample data for numerals from 1 to 40 in 25 typologically diverse languages. We perform a thorough analysis of the sample, focusing on the systematic comparison between the underlying and the surface morphological structure. We further experiment with automated models for morpheme segmentation, where we find allomorphy as the major reason for segmentation errors. Finally, we show that subword tokenization algorithms are not viable for discovering morphemes in low-resource scenarios.

johann-mattis list, morpheme, numeral system, (14 more...)

2503.01625

Country:

South America > Paraguay (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
(17 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Gu, Xingjian, Ericson, Barbara J.

AI Literacy in K-12 and Higher Education in the Wake of Generative AI: An Integrative Review

Accordingly, education researchers and practitioners have increasingly turned to AI literacy as an important learning objective. However, the definition of AI literacy remains vague. Researchers have used the term to describe learning interventions that differ in school contexts, learning objectives, and the types of AI technologies they use. Furthermore, research on AI literacy is shifting significantly in the wake of generative AI. Thus, it is crucial to review the field and develop a conceptual framework that captures the diverse conceptualizations of AI literacy. The concept of AI literacy and recognition of its potential significance are well-established [75, 127]. One of the pioneering works by Touretzky et al. in 2019 laid out "five big ideas" for the AI4K12 initiative: "computers perceive the world using sensors", "agents maintain models/representations of the world and use them for reasoning", "computers can learn from data", "making agents interact with humans is a substantial challenge for AI developers", and "AI applications can impact society in both positive and negative ways" [127]. This paper had a major influence on subsequent AI literacy curriculum design. The next year, another prominent work by Long and Magerko defined AI literacy as "a set

ai literacy, literacy, proceedings, (13 more...)

2503.00079

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Asia > China > Hong Kong (0.04)
(26 more...)

Genre:

Instructional Material > Course Syllabus & Notes (0.93)
Research Report > New Finding (0.68)

Industry:

Education > Educational Setting > Higher Education (1.00)
Education > Curriculum (1.00)
Education > Educational Setting > K-12 Education > Primary School (0.93)
Education > Educational Setting > K-12 Education > Secondary School (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Assistance or Disruption? Exploring and Evaluating the Design and Trade-offs of Proactive AI Programming Support

Pu, Kevin, Lazaro, Daniel, Arawjo, Ian, Xia, Haijun, Xiao, Ziang, Grossman, Tovi, Chen, Yan

AI programming tools enable powerful code generation, and recent prototypes attempt to reduce user effort with proactive AI agents, but their impact on programming workflows remains unexplored. We introduce and evaluate Codellaborator, a design probe LLM agent that initiates programming assistance based on editor activities and task context. We explored three interface variants to assess trade-offs between increasingly salient AI support: prompt-only, proactive agent, and proactive agent with presence and context (Codellaborator). In a within-subject study (N=18), we find that proactive agents increase efficiency compared to the prompt-only paradigm, but also incur workflow disruptions. However, presence indicators and interaction context support alleviated disruptions and improved users' awareness of AI processes. We underscore trade-offs of Codellaborator on user control, ownership, and code understanding, emphasizing the need to adapt proactivity to programming processes. Our research contributes to the design exploration and evaluation of proactive AI systems, presenting design implications on AI-integrated programming workflow.

ai agent, interaction, participant, (15 more...)

doi: 10.1145/3706598.3713357

2502.18658

Country:

North America > Canada > Ontario > Toronto (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.05)
(16 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Industry: Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.91)

Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model

Huang, Yaxuan, Dai, Xili, Wang, Jianan, Qi, Xianbiao, Yuan, Yixing, Yue, Xiangyu

Room layout estimation from multiple-perspective images is poorly investigated due to the complexities that emerge from multi-view geometry, which requires multi-step solutions such as camera intrinsic and extrinsic estimation, image matching, and triangulation. However, in 3D reconstruction, the advancement of recent 3D foundation models such as DUSt3R has shifted the paradigm from the traditional multi-step structure-from-motion process to an end-to-end single-step approach. To this end, we introduce Plane-DUSt3R, a novel method for multi-view room layout estimation leveraging the 3D foundation model DUSt3R. Plane-DUSt3R incorporates the DUSt3R framework and fine-tunes on a room layout dataset (Structure3D) with a modified objective to estimate structural planes. By generating uniform and parsimonious results, Plane-DUSt3R enables room layout estimation with only a single post-processing step and 2D detection results. Unlike previous methods that rely on single-perspective or panorama images, Plane-DUSt3R extends the setting to handle multiple-perspective images. Moreover, it offers a streamlined, end-to-end solution that simplifies the process and reduces error accumulation. Experimental results demonstrate that Plane-DUSt3R not only outperforms state-of-the-art methods on the synthetic dataset but also proves robust and effective on in-the-wild data with different image styles, such as cartoon. Our code is available at: https://github.com/justacar/Plane-DUSt3R

dataset, estimation, layout estimation, (11 more...)

2502.16779

Country:

Asia > China > Hong Kong (0.05)
Oceania > Australia > Western Australia > Perth (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre:

Research Report > New Finding (0.88)
Research Report > Promising Solution (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

AI Governance InternationaL Evaluation Index (AGILE Index)

Zeng, Yi, Lu, Enmeng, Guan, Xin, Huangfu, Cunqing, Ruan, Zizhe, Younas, Ammar, Sun, Kang, Tang, Xuan, Wang, Yuwei, Suo, Hongjie, Liang, Dongqi, Han, Zhengqiang, Bao, Aorigele, Guo, Xiaoyang, Wang, Jin, Xie, Jiawei, Liang, Yao

The rapid advancement of Artificial Intelligence (AI) technology is profoundly transforming human society and concurrently presenting a series of ethical, legal, and social issues. The effective governance of AI has become a crucial global concern. Since 2022, the extensive deployment of generative AI, particularly large language models, marked a new phase in AI governance. Continuous efforts are being made by the international community in actively addressing the novel challenges posed by these AI developments. As consensus on international governance continues to be established and put into action, the practical importance of conducting a global assessment of the state of AI governance is progressively coming to light. In this context, we initiated the development of the AI Governance InternationaL Evaluation Index (AGILE Index). Adhering to the design principle, "the level of governance should match the level of development," the inaugural evaluation of the AGILE Index commences with an exploration of four foundational pillars: the development level of AI, the AI governance environment, the AI governance instruments, and the AI governance effectiveness. It covers 39 indicators across 18 dimensions to comprehensively assess the AI governance level of 14 representative countries globally. The index is utilized to delve into the status of AI governance to date in 14 countries for the first batch of evaluation. The aim is to depict the current state of AI governance in these countries through data scoring, assist them in identifying their governance stage and uncovering governance issues, and ultimately offer insights for the enhancement of their AI governance systems.

ai governance, governance, indicator, (15 more...)

2502.15859

Country:

North America > United States (0.47)
Asia > Middle East > UAE (0.15)
Europe > United Kingdom (0.14)
(15 more...)

Genre: Research Report (1.00)

Industry:

Law > Statutes (0.94)
Education (0.93)
Information Technology > Security & Privacy (0.69)
Government > Regional Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Applied AI (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.66)

Ceccherini, Emma, Gallagher, Ian, Jones, Andrew, Lawson, Daniel

Unsupervised Attributed Dynamic Network Embedding with Stability Guarantees

arXiv.org Machine Learning, Mar-4-2025

While most existing network embedding techniques focus solely on the network features, nodes in real-world networks are associated with a rich set of attributes. For example, in a social network, the user's posts are significantly correlated with trust and following relationships, and it has been shown that jointly exploiting both information sources improves learning performance [Tang et al., 2013]. Network embeddings for static attributed networks include frameworks based on matrix factorisation [Yang et al., 2015], or deep learning [Gao and Huang, 2018, Tu et al., 2017, Tan et al., 2023, Sun et al., 2016, Zhang et al., 2018, Li et al., 2021]. Some existing dynamic network embeddings leverage node attributes, but their exploitation of node attributes is rather limited, as they are usually solely used to initialise the first layer [Sankar et al., 2020, Dwivedi et al., 2023, Liu et al., 2021, Xu et al., 2020b,a]. Approaches that purposefully exploit node attributes include frameworks based on matrix factorisation [Liu et al., 2020, Li et al., 2017], deep learning [Tang et al., 2022, Ahmed et al., 2024, Wei et al., 2019], or Bayesian modelling [Luodi et al., 2024]. However, to the best of our knowledge, none of these methods have stability guarantees, which ensure that if two node/time pairs "behave the same" in the network, their representation is the same up to noise. Stability allows for the comparison of embeddings over time because the embedding space has a consistent interpretation. Attributed unfolded adjacency spectral embedding (AUASE) is a framework for unsupervised dynamic attributed network embedding with stability guarantees.

auase, matrix, unsupervised attributed dynamic network embedding, (11 more...)

2503.02859

Country:

Oceania > Australia (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Telecommunications > Networks (0.34)
Information Technology > Networks (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Bowyer, Sam, Aitchison, Laurence, Ivanova, Desi R.

Position: Don't use the CLT in LLM evals with fewer than a few hundred datapoints

arXiv.org Machine Learning, Mar-4-2025

Rigorous statistical evaluations of large language models (LLMs), including valid error bars and significance testing, are essential for meaningful and reliable performance assessment. Currently, when such statistical measures are reported, they typically rely on the Central Limit Theorem (CLT). In this position paper, we argue that while CLT-based methods for uncertainty quantification are appropriate when benchmarks consist of thousands of examples, they fail to provide adequate uncertainty estimates for LLM evaluations that rely on smaller, highly specialized benchmarks. In these small-data settings, we demonstrate that CLT-based methods perform very poorly, usually dramatically underestimating uncertainty (i.e. producing error bars that are too small). We give recommendations for alternative frequentist and Bayesian methods that are both easy to implement and more appropriate in these increasingly common scenarios. We provide a simple Python library for these Bayesian methods at https://github.com/sambowyer/bayes_evals .
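The small-sample failure described in the abstract can be illustrated with a minimal sketch. It compares a CLT-based (Wald) interval for benchmark accuracy with the Wilson score interval, a standard frequentist alternative chosen here for illustration. The choice of Wilson, the function names, and the example counts are assumptions made for this sketch; this is not the API of the authors' library.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z95 = 1.959963984540054


def clt_interval(successes, n, z=Z95):
    """CLT-based (Wald) interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def wilson_interval(successes, n, z=Z95):
    """Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)
    )
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical small benchmark: 20 questions, all answered correctly.
    print("CLT:   ", clt_interval(20, 20))
    print("Wilson:", wilson_interval(20, 20))
```

With a perfect score on 20 items, the CLT-based interval collapses to zero width, which reports no uncertainty at all, while the Wilson interval still leaves room for a true accuracy well below 100%. With 19 of 20 correct, the CLT-based upper bound exceeds 1, which is not a valid accuracy.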

confidence interval, confidence level, llm eval, (12 more...)

2503.01747

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > Greenland (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > United Kingdom > England > Bristol (0.04)

Genre: Research Report > Experimental Study (0.92)

Industry: Education (1.00)

AIHub, Mar-3-2025, 12:11:46 GMT

Forthcoming machine learning and AI seminars: March 2025 edition

This post contains a list of the AI-related seminars that are scheduled to take place between 3 March and 30 April 2025. All events detailed here are free and open for anyone to attend virtually.

Pareto sensitivity, most-changing sub-fronts, and optimal knee solutions
Speaker: Luis Nunes Vicente (Lehigh University)
Organised by: Association of European Operational Research Societies
To receive the seminar link, sign up to the mailing list.

Title to be confirmed
Speaker: Maximilian Nickel (Meta AI)
Organised by: Vanderbilt University
Check the Google group for Zoom instructions.

Unsupervised Discovery of Interpretable Structure in Complex Systems
Speaker: Mark Hamilton (MIT/Microsoft)
Organised by: EPFL
Zoom link is here.

artificial intelligence, machine learning, organised, (13 more...)

Country:

North America > Canada > Alberta (0.15)
North America > United States > Minnesota (0.11)
Europe > Sweden (0.08)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.31)

arXiv.org Artificial Intelligence, Mar-3-2025

MoCFL: Mobile Cluster Federated Learning Framework for Highly Dynamic Network

Fang, Kai, Deng, Jiangtao, Dong, Chengzu, Naseem, Usman, Liu, Tongcun, Feng, Hailin, Wang, Wei

Frequent fluctuations of client nodes in highly dynamic mobile clusters can lead to significant changes in feature space distribution and data drift, posing substantial challenges to the robustness of existing federated learning (FL) strategies. To address these issues, we propose a mobile cluster federated learning framework (MoCFL). MoCFL enhances feature aggregation by introducing an affinity matrix that quantifies the similarity between local feature extractors from different clients, addressing dynamic data distribution changes caused by frequent client churn and topology changes. Additionally, MoCFL integrates historical and current feature information when training the global classifier, effectively mitigating the catastrophic forgetting problem frequently encountered in mobile scenarios. This synergistic combination ensures that MoCFL maintains high performance and stability in dynamically changing mobile environments. Experimental results on the UNSW-NB15 dataset show that MoCFL excels in dynamic environments, demonstrating superior robustness and accuracy while maintaining reasonable training costs.
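The idea of an affinity matrix over client feature extractors can be sketched as follows. The abstract does not say which similarity measure or aggregation rule MoCFL uses, so this sketch assumes cosine similarity between flattened extractor weights and a simple affinity-weighted average. All function names and the toy weight vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def affinity_matrix(extractors):
    """Pairwise similarity between the flattened feature extractors of all clients."""
    n = len(extractors)
    return [
        [cosine_similarity(extractors[i], extractors[j]) for j in range(n)]
        for i in range(n)
    ]


def affinity_weighted_average(extractors, affinity, client):
    """Aggregate extractors for one client, weighting peers by non-negative affinity."""
    # Assumption: negative similarities are clipped so dissimilar peers contribute nothing.
    weights = [max(a, 0.0) for a in affinity[client]]
    total = sum(weights)
    if total == 0.0:
        return list(extractors[client])
    dim = len(extractors[client])
    return [
        sum(w * e[k] for w, e in zip(weights, extractors)) / total
        for k in range(dim)
    ]


if __name__ == "__main__":
    # Toy example: clients 0 and 1 share an extractor, client 2 is orthogonal to both.
    clients = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    A = affinity_matrix(clients)
    print(A)
    print(affinity_weighted_average(clients, A, client=0))
```

In this toy setup the orthogonal client receives zero weight when aggregating for client 0, which is the intended effect of weighting by affinity: clients whose extractors have drifted apart influence each other less.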

accuracy, artificial intelligence, machine learning, (12 more...)

doi: 10.1145/3696410.3714515

2503.01557

Country:

Oceania > Australia > New South Wales > Sydney (0.15)
Asia > China > Zhejiang Province > Hangzhou (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)