AITopics

doi: 10.1109/LRA.2024.3415931

2406.14097

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Artificial IntelligenceJul-1-2024

The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models

Ma, Bolei, Wang, Xinpeng, Hu, Tiancheng, Haensch, Anna-Carolina, Hedderich, Michael A., Plank, Barbara, Kreuter, Frauke

Recent advances in Large Language Models (LLMs) have sparked wide interest in validating and comprehending the human-like cognitive-behavioral traits LLMs may have. These cognitive-behavioral traits include typically Attitudes, Opinions, Values (AOV). However, measuring AOV embedded within LLMs remains opaque, and different evaluation methods may yield different results. This has led to a lack of clarity on how different studies are related to each other and how they can be interpreted. This paper aims to bridge this gap by providing an overview of recent works on the evaluation of AOV in LLMs. Moreover, we survey related approaches in different stages of the evaluation pipeline in these works. By doing so, we address the potential and challenges with respect to understanding the model, human-AI alignment, and downstream application in social sciences. Finally, we provide practical insights into evaluation methods, model enhancement, and interdisciplinary collaboration, thereby contributing to the evolving landscape of evaluating AOV in LLMs.

computational linguistic, linguistic, preprint, (17 more...)

2406.11096

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
(22 more...)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)
Overview (1.00)

Industry:

Government (0.92)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJul-1-2024

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

Nguyen, Thong, Bin, Yi, Xiao, Junbin, Qu, Leigang, Li, Yicong, Wu, Jay Zhangjie, Nguyen, Cong-Duy, Ng, See-Kiong, Tuan, Luu Anh

Humans use multiple senses to comprehend the environment. Vision and language are two of the most vital senses since they allow us to easily communicate our thoughts and perceive the world around us. There has been a lot of interest in creating video-language understanding systems with human-like senses since a video-language pair can mimic both our linguistic medium and visual environment with temporal dynamics. In this survey, we review the key tasks of these systems and highlight the associated challenges. Based on the challenges, we summarize their methods from model architecture, model training, and data perspectives. We also conduct performance comparison among the methods, and discuss promising directions for future research.

arxiv preprint arxiv, proceedings, video, (11 more...)

2406.05615

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)
(5 more...)

Genre: Overview (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fornace, Mark, Lindsey, Michael

Column and row subset selection using nuclear scores: algorithms and theory for Nystr\"{o}m approximation, CUR decomposition, and graph Laplacian reduction

arXiv.org Machine LearningJul-1-2024

Column selection is an essential tool for structure-preserving low-rank approximation, with wide-ranging applications across many fields, such as data science, machine learning, and theoretical chemistry. In this work, we develop unified methodologies for fast, efficient, and theoretically guaranteed column selection. First we derive and implement a sparsity-exploiting deterministic algorithm applicable to tasks including kernel approximation and CUR decomposition. Next, we develop a matrix-free formalism relying on a randomization scheme satisfying guaranteed concentration bounds, applying this construction both to CUR decomposition and to the approximation of matrix functions of graph Laplacians. Importantly, the randomization is only relevant for the computation of the scores that we use for column selection, not the selection itself given these scores. For both deterministic and matrix-free algorithms, we bound the performance favorably relative to the expected performance of determinantal point process (DPP) sampling and, in select scenarios, that of exactly optimal subset selection. The general case requires new analysis of the DPP expectation. Finally, we demonstrate strong real-world performance of our algorithms on a diverse set of example approximation tasks.

algorithm, maximization, selection, (15 more...)

arXiv.org Machine Learning

2407.01698

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(6 more...)

Genre:

Overview (0.67)
Research Report (0.64)

Industry: Energy (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Software (0.92)

Tschalzev, Andrej, Nitschke, Paul, Kirchdorfer, Lukas, Lüdtke, Stefan, Bartelt, Christian, Stuckenschmidt, Heiner

Enabling Mixed Effects Neural Networks for Diverse, Clustered Data Using Monte Carlo Methods

arXiv.org Machine LearningJul-1-2024

Neural networks often assume independence among input data samples, disregarding correlations arising from inherent clustering patterns in real-world datasets (e.g., due to different sites or repeated measurements). Recently, mixed effects neural networks (MENNs) which separate cluster-specific 'random effects' from cluster-invariant 'fixed effects' have been proposed to improve generalization and interpretability for clustered data. However, existing methods only allow for approximate quantification of cluster effects and are limited to regression and binary targets with only one clustering feature. We present MC-GMENN, a novel approach employing Monte Carlo methods to train Generalized Mixed Effects Neural Networks. We empirically demonstrate that MC-GMENN outperforms existing mixed effects deep learning models in terms of generalization performance, time complexity, and quantification of inter-cluster variance. Additionally, MC-GMENN is applicable to a wide range of datasets, including multi-class classification tasks with multiple high-cardinality categorical features. For these datasets, we show that MC-GMENN outperforms conventional encoding and embedding methods, simultaneously offering a principled methodology for interpreting the effects of clustering patterns.

dataset, mc-gmenn, random effect, (15 more...)

arXiv.org Machine Learning

2407.01115

Country:

Europe > Germany (0.04)
North America > United States > Minnesota (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningJul-1-2024

Bayesian Entropy Neural Networks for Physics-Aware Prediction

Rathnakumar, Rahul, Huang, Jiayu, Yan, Hao, Liu, Yongming

This paper addresses the need for deep learning models to integrate well-defined constraints into their outputs, driven by their application in surrogate models, learning with limited data and partial information, and scenarios requiring flexible model behavior to incorporate non-data sample information. We introduce Bayesian Entropy Neural Networks (BENN), a framework grounded in Maximum Entropy (MaxEnt) principles, designed to impose constraints on Bayesian Neural Network (BNN) predictions. BENN is capable of constraining not only the predicted values but also their derivatives and variances, ensuring a more robust and reliable model output. To achieve simultaneous uncertainty quantification and constraint satisfaction, we employ the method of multipliers approach. This allows for the concurrent estimation of neural network parameters and the Lagrangian multipliers associated with the constraints. Our experiments, spanning diverse applications such as beam deflection modeling and microstructure generation, demonstrate the effectiveness of BENN. The results highlight significant improvements over traditional BNNs and showcase competitive performance relative to contemporary constrained deep learning methods.

bayesian entropy neural network, constraint, neural network, (11 more...)

arXiv.org Machine Learning

2407.01015

Country: North America > United States > Arizona (0.04)

Genre:

Research Report (0.82)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)

Zollicoffer, Geigh, Bhatta, Kshitij, Bhattarai, Manish, Romero, Phil, Negre, Christian F. A., Niklasson, Anders M. N., Adedoyin, Adetokunbo

Towards Faster Matrix Diagonalization with Graph Isomorphism Networks and the AlphaZero Framework

In this paper, we introduce innovative approaches for accelerating the Jacobi method for matrix diagonalization, specifically through the formulation of large matrix diagonalization as a Semi-Markov Decision Process and small matrix diagonalization as a Markov Decision Process. Furthermore, we examine the potential of utilizing scalable architecture between different-sized matrices. During a short training period, our method discovered a significant reduction in the number of steps required for diagonalization and exhibited efficient inference capabilities. Importantly, this approach demonstrated possible scalability to large-sized matrices, indicating its potential for wide-ranging applicability. Upon training completion, we obtain action-state probabilities and transition graphs, which depict transitions between different states. These outputs not only provide insights into the diagonalization process but also pave the way for cost savings pertinent to large-scale matrices. The advancements made in this research enhance the efficacy and scalability of matrix diagonalization, pushing for new possibilities for deployment in practical applications in scientific and engineering domains.

algorithm, matrix, rotation, (13 more...)

2407.00779

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.05)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(2 more...)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)

Zhao, Hong, Wei-Kocsis, Jin, Akhijahani, Adel Heidari, Butler-Purry, Karen L

Exploring a Physics-Informed Decision Transformer for Distribution System Restoration: Methodology and Performance Analysis

Driven by advancements in sensing and computing, deep reinforcement learning (DRL)-based methods have demonstrated significant potential in effectively tackling distribution system restoration (DSR) challenges under uncertain operational scenarios. However, the data-intensive nature of DRL poses obstacles in achieving satisfactory DSR solutions for large-scale, complex distribution systems. Inspired by the transformative impact of emerging foundation models, including large language models (LLMs), across various domains, this paper explores an innovative approach harnessing LLMs' powerful computing capabilities to address scalability challenges inherent in conventional DRL methods for solving DSR. To our knowledge, this study represents the first exploration of foundation models, including LLMs, in revolutionizing conventional DRL applications in power system operations. Our contributions are twofold: 1) introducing a novel LLM-powered Physics-Informed Decision Transformer (PIDT) framework that leverages LLMs to transform conventional DRL methods for DSR operations, and 2) conducting comparative studies to assess the performance of the proposed LLM-powered PIDT framework at its initial development stage for solving DSR problems. While our primary focus in this paper is on DSR operations, the proposed PIDT framework can be generalized to optimize sequential decision-making across various power system operations.

distribution system, dsr operation, opération, (15 more...)

2407.00808

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Texas > Brazos County > College Station (0.14)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Large Language Model Enhanced Knowledge Representation Learning: A Survey

Wang, Xin, Chen, Zirui, Wang, Haofen, U, Leong Hou, Li, Zhao, Guo, Wenbin

arxiv preprint arxiv, knowledge graph, model enhanced knowledge representation learning, (6 more...)

Large language models (LLMs) (e.g., BERT [18], LLaMA [59]) which represents a direction of ever-increasing model sizes pre-trained on larger corpora, have demonstrated powerful capabilities in solving natural language processing (NLP) tasks, including question answering [99], text generation [100] and document understanding [101]. There are no clear and static thresholds regarding the model sizes. Early LLMs (e.g., BERT, RoBERTa) adopt an encoder architecture and show capabilities in text representation learning and natural language understanding. In recent years, more focus has been given to larger encoder-decoder [102] or decoder-only [103] architectures. As the model size scales up, such LLMs have also shown reasoning ability and even more advanced emergent ability [104], exposing a strong potential for Artificial General Intelligence (AGI). This inflection point, with the arrival of LLMs, marks a paradigm shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. As a popular approach for explicit knowledge representation, KGs are now widely investigated for the combination with Transformer-based LLMs, including pretrained masked language models (PLMs) like BERT and RoBERTa, and more recent generative LLMs like the GPT series and LLaMA. Some works use LLMs to augment knowledge graph representation learning.

2407.00936

Country:

Asia > Macao (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
Europe > Switzerland (0.04)
(2 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (0.67)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

Diffusion Models and Representation Learning: A Survey

Fuest, Michael, Ma, Pingchuan, Gui, Ming, Fischer, Johannes S., Hu, Vincent Tao, Ommer, Bjorn

Diffusion Models are popular generative modeling methods in various vision tasks, attracting significant attention. They can be considered a unique instance of self-supervised learning methods due to their independence from label annotation. This survey explores the interplay between diffusion models and representation learning. It provides an overview of diffusion models' essential aspects, including mathematical foundations, popular denoising network architectures, and guidance methods. Various approaches related to diffusion models and representation learning are detailed. These include frameworks that leverage representations learned from pre-trained diffusion models for subsequent recognition tasks and methods that utilize advancements in representation and self-supervised learning to enhance diffusion models. This survey aims to offer a comprehensive overview of the taxonomy between diffusion models and representation learning, identifying key areas of existing concerns and potential exploration. Github link: https://github.com/dongzhuoyao/Diffusion-Representation-Learning-Survey-Taxonomy

arxiv, diffusion model, representation, (13 more...)

2407.00783

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(3 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(2 more...)