AITopics

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > Ventura County > Thousand Oaks (0.04)
Europe > United Kingdom > Wales (0.04)
(2 more...)

Industry:

Law (1.00)
Health & Medicine (0.93)
Information Technology (0.67)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
(2 more...)

Neural Information Processing SystemsDec-27-2025, 03:04:11 GMT

Object Reprojection Error (ORE): Camera pose benchmarks from lightweight tracking annotations

Semantic modeling approaches employed on monocular video often ingest outputs from off-the-shelf SLAM/SfM pipelines, which are anecdotally observed to perform poorly or fail completely on some fraction of the videos of interest. These target videos may vary widely in complexity of scenes, activities, camera trajectory, etc. Unfortunately, such semantically-rich video data often comes with no ground-truth 3D information, and in practice it is prohibitively costly or impossible to obtain ground truth reconstructions or camera pose post-hoc. This paper proposes a novel evaluation protocol, Object Reprojection Error (ORE) to benchmark camera trajectories; ORE computes reprojection error for static objects within the video and requires only lightweight object tracklet annotations. These annotations are easy to gather on new or existing video, enabling ORE to be calculated on essentially arbitrary datasets. We show that ORE maintains high rank correlation with standard metrics based on groundtruth.

annotation, camera pose benchmark, object reprojection error, (7 more...)

Technology: Information Technology > Artificial Intelligence (1.00)

arXiv.org Artificial IntelligenceOct-23-2025

KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints

Jiang, Kailin, Jiang, Hongbo, Jiang, Ning, Gao, Zhi, Bi, Jinhe, Ren, Yuchen, Li, Bin, Du, Yuntao, Liu, Lei, Li, Qing

Large Multimodal Models encode extensive factual knowledge in their pre-trained weights. However, its knowledge remains static and limited, unable to keep pace with real-world developments, which hinders continuous knowledge acquisition. Effective knowledge injection thus becomes critical, involving two goals: knowledge adaptation (injecting new knowledge) and knowledge retention (preserving old knowledge). Existing methods often struggle to learn new knowledge and suffer from catastrophic forgetting. To address this, we propose KORE, a synergistic method of KnOwledge-oRientEd augmentations and constraints for injecting new knowledge into large multimodal models while preserving old knowledge. Unlike general text or image data augmentation, KORE automatically converts individual knowledge items into structured and comprehensive knowledge to ensure that the model accurately learns new knowledge, enabling accurate adaptation. Meanwhile, KORE stores previous knowledge in the covariance matrix of LMM's linear layer activations and initializes the adapter by projecting the original weights into the matrix's null space, defining a fine-tuning direction that minimizes interference with previous knowledge, enabling powerful retention. Extensive experiments on various LMMs, including LLaVA-v1.5-7B, LLaVA-v1.5-13B, and Qwen2.5-VL-7B, show that KORE achieves superior new knowledge injection performance and effectively mitigates catastrophic forgetting.

large language model, machine learning, natural language, (21 more...)

2510.19316

Country: Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.66)

Neural Information Processing SystemsOct-9-2025, 10:48:49 GMT

eb206443c93d07da8b1974b768d8a0d4-Paper-Datasets_and_Benchmarks.pdf

artificial intelligence, machine learning, natural language, (17 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > Ventura County > Thousand Oaks (0.04)
Europe > United Kingdom > Wales (0.04)
(2 more...)

Industry:

Law (1.00)
Health & Medicine (0.93)
Information Technology (0.67)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
(2 more...)

arXiv.org Machine LearningMay-20-2025

Transformer learns the cross-task prior and regularization for in-context learning

Lu, Fei, Yu, Yue

Transformers have shown a remarkable ability for in-context learning (ICL), making predictions based on contextual examples. However, while theoretical analyses have explored this prediction capability, the nature of the inferred context and its utility for downstream predictions remain open questions. This paper aims to address these questions by examining ICL for inverse linear regression (ILR), where context inference can be characterized by unsupervised learning of underlying weight vectors. Focusing on the challenging scenario of rank-deficient inverse problems, where context length is smaller than the number of unknowns in the weight vectors and regularization is necessary, we introduce a linear transformer to learn the inverse mapping from contextual examples to the underlying weight vector. Our findings reveal that the transformer implicitly learns both a prior distribution and an effective regularization strategy, outperforming traditional ridge regression and regularization methods. A key insight is the necessity of low task dimensionality relative to the context length for successful learning. Furthermore, we numerically verify that the error of the transformer estimator scales linearly with the noise level, the ratio of task dimension to context length, and the condition number of the input data. These results not only demonstrate the potential of transformers for solving ill-posed inverse problems, but also provide a new perspective towards understanding the knowledge extraction mechanism within transformers.

artificial intelligence, machine learning, transformer, (16 more...)

arXiv.org Machine Learning

2505.12138

Country:

North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Glaze, Sean, Inclezan, Daniela

Architecture for Simulating Behavior Mode Changes in Norm-Aware Autonomous Agents

arXiv.org Artificial IntelligenceFeb-13-2025

This paper presents an architecture for simulating the actions of a norm-aware intelligent agent whose behavior with respect to norm compliance is set, and can later be changed, by a human controller. Updating an agent's behavior mode from a norm-abiding to a riskier one may be relevant when the agent is involved in time-sensitive rescue operations, for example. We base our work on the Authorization and Obligation Policy Language AOPL designed by Gelfond and Lobo for the specification of norms. We introduce an architecture and a prototype software system that can be used to simulate an agent's plans under different behavior modes that can later be changed by the controller. We envision such software to be useful to policy makers, as they can more readily understand how agents may act in certain situations based on the agents' attitudes towards norm-compliance. Policy makers may then refine their policies if simulations show unwanted consequences.

agent, artificial intelligence, behavior mode, (14 more...)

doi: 10.4204/EPTCS.416.7

2502.09215

Country:

North America > United States > Ohio (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Government (0.54)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Neural Information Processing SystemsJan-20-2025, 01:30:53 GMT

Object Reprojection Error (ORE): Camera pose benchmarks from lightweight tracking annotations

annotation, lightweight tracking annotation, object reprojection error, (5 more...)

Technology: Information Technology > Artificial Intelligence (1.00)

arXiv.org Artificial IntelligenceJun-25-2024

Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients

Muhamed, Aashiq, Li, Oscar, Woodruff, David, Diab, Mona, Smith, Virginia

Large language model (LLM) training and finetuning are often bottlenecked by limited GPU memory. While existing projection-based optimization methods address this by projecting gradients into a lower-dimensional subspace to reduce optimizer state memory, they typically rely on dense projection matrices, which can introduce computational and memory overheads. In this work, we propose Grass (GRAdient Stuctured Sparsification), a novel approach that leverages sparse projections to transform gradients into structured sparse updates. This design not only significantly reduces memory usage for optimizer states but also minimizes gradient memory footprint, computation, and communication costs, leading to substantial throughput improvements. Extensive experiments on pretraining and finetuning tasks demonstrate that Grass achieves competitive performance to full-rank training and existing projection-based methods. Notably, Grass enables half-precision pretraining of a 13B parameter LLaMA model on a single 40GB A100 GPU--a feat infeasible for previous methods--and yields up to a $2\times$ throughput improvement on an 8-GPU system. Code can be found at https://github.com/aashiqmuhamed/GRASS .

gradient, matrix, rass, (12 more...)

2406.1766

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Virginia (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Suri, Anshuman, Zhang, Xiao, Evans, David

Do Parameters Reveal More than Loss for Membership Inference?

arXiv.org Artificial IntelligenceJun-17-2024

Membership inference attacks aim to infer whether an individual record was used to train a model, serving as a key tool for disclosure auditing. While such evaluations are useful to demonstrate risk, they are computationally expensive and often make strong assumptions about potential adversaries' access to models and training environments, and thus do not provide very tight bounds on leakage from potential attacks. We show how prior claims around black-box access being sufficient for optimal membership inference do not hold for most useful settings such as stochastic gradient descent, and that optimal membership inference indeed requires white-box access. We validate our findings with a new white-box inference attack IHA (Inverse Hessian Attack) that explicitly uses model parameters by taking advantage of computing inverse-Hessian vector products. Our results show that both audits and adversaries may be able to benefit from access to model parameters, and we advocate for further research into white-box methods for membership privacy auditing.

inference, membership inference, purchase-100, (13 more...)

2406.11544

Country:

North America > United States > Virginia (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

arXiv.org Artificial IntelligenceMay-24-2024

Persian Homograph Disambiguation: Leveraging ParsBERT for Enhanced Sentence Understanding with a Novel Word Disambiguation Dataset

Ayyoubzadeh, Seyed Moein

Homograph disambiguation, the task of distinguishing words with identical spellings but different meanings, poses a substantial challenge in natural language processing. In this study, we introduce a novel dataset tailored for Persian homograph disambiguation. Our work encompasses a thorough exploration of various embeddings, evaluated through the cosine similarity method and their efficacy in downstream tasks like classification. Our investigation entails training a diverse array of lightweight machine learning and deep learning models for phonograph disambiguation. We scrutinize the models' performance in terms of Accuracy, Recall, and F1 Score, thereby gaining insights into their respective strengths and limitations. The outcomes of our research underscore three key contributions. First, we present a newly curated Persian dataset, providing a solid foundation for future research in homograph disambiguation. Second, our comparative analysis of embeddings highlights their utility in different contexts, enriching the understanding of their capabilities. Third, by training and evaluating a spectrum of models, we extend valuable guidance for practitioners in selecting suitable strategies for homograph disambiguation tasks. In summary, our study unveils a new dataset, scrutinizes embeddings through diverse perspectives, and benchmarks various models for homograph disambiguation. These findings empower researchers and practitioners to navigate the intricate landscape of homograph-related challenges effectively.

biguation, ho ograph, isa biguation, (13 more...)

2406.00028

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Qatar (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)