

Generative AI Application for Building Industry

arXiv.org Artificial Intelligence

This paper investigates the transformative potential of generative AI technologies, particularly large language models (LLMs), within the building industry. By leveraging these advanced AI tools, the study explores their application across key areas such as energy code compliance, building design optimization, and workforce training. The research highlights how LLMs can automate labor-intensive processes, significantly improving efficiency, accuracy, and safety in building practices. The paper also addresses the challenges associated with interpreting complex visual and textual data in architectural plans and regulatory codes, proposing innovative solutions to enhance AI-driven compliance checking and design processes. Additionally, the study considers the broader implications of AI integration, including the development of AI-powered tools for comprehensive code compliance across various regulatory domains and the potential for AI to revolutionize workforce training through realistic simulations. This paper provides a comprehensive analysis of the current capabilities of generative AI in the building industry while outlining future directions for research and development, aiming to pave the way for smarter, more sustainable, and responsive construction practices.


From Natural Language to SQL: Review of LLM-based Text-to-SQL Systems

arXiv.org Artificial Intelligence

Since the onset of LLMs, translating natural language queries into structured SQL commands has assumed increasing importance. Unlike previous reviews, this survey provides a comprehensive study of the evolution of LLM-based text-to-SQL systems, from early rule-based models to advanced LLM approaches, and of how LLMs have impacted the field. We discuss benchmarks, evaluation methods, and evaluation metrics. We also uniquely study the role of knowledge graph integration in improving contextual accuracy and schema linking in these systems. Current techniques fall into two categories, in-context learning and fine-tuning, which in turn lead to approaches such as zero-shot learning, few-shot learning, and data augmentation. Finally, we highlight key challenges such as computational efficiency, model robustness, and data privacy, along with perspectives on their development and potential areas of improvement for future LLM-based text-to-SQL systems.


Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning

arXiv.org Artificial Intelligence

Domain-Incremental Learning (DIL) involves the progressive adaptation of a model to new concepts across different domains. While recent advances in pre-trained models provide a solid foundation for DIL, learning new concepts often results in the catastrophic forgetting of pre-trained knowledge. Specifically, sequential model updates can overwrite both the representation and the classifier with knowledge from the latest domain. Thus, it is crucial to develop a representation and corresponding classifier that accommodate all seen domains throughout the learning process. To this end, we propose DUal ConsolidaTion (Duct) to unify and consolidate historical knowledge at both the representation and classifier levels. By merging the backbone of different stages, we create a representation space suitable for multiple domains incrementally. The merged representation serves as a balanced intermediary that captures task-specific features from all seen domains. Additionally, to address the mismatch between consolidated embeddings and the classifier, we introduce an extra classifier consolidation process. Leveraging class-wise semantic information, we estimate the classifier weights of old domains within the latest embedding space. By merging historical and estimated classifiers, we align them with the consolidated embedding space, facilitating incremental classification. Extensive experimental results on four benchmark datasets demonstrate Duct's state-of-the-art performance.


Review of blockchain application with Graph Neural Networks, Graph Convolutional Networks and Convolutional Neural Networks

arXiv.org Artificial Intelligence

This paper reviews the applications of Graph Neural Networks (GNNs), Graph Convolutional Networks (GCNs), and Convolutional Neural Networks (CNNs) in blockchain technology. As the complexity and adoption of blockchain networks continue to grow, traditional analytical methods are proving inadequate in capturing the intricate relationships and dynamic behaviors of decentralized systems. To address these limitations, deep learning models such as GNNs, GCNs, and CNNs offer robust solutions by leveraging the unique graph-based and temporal structures inherent in blockchain architectures. GNNs and GCNs, in particular, excel in modeling the relational data of blockchain nodes and transactions, making them ideal for applications such as fraud detection, transaction verification, and smart contract analysis. Meanwhile, CNNs can be adapted to analyze blockchain data when represented as structured matrices, revealing hidden temporal and spatial patterns in transaction flows. This paper explores how these models enhance the efficiency, security, and scalability of both linear blockchains and Directed Acyclic Graph (DAG)-based systems, providing a comprehensive overview of their strengths and future research directions. By integrating advanced neural network techniques, we aim to demonstrate the potential of these models in revolutionizing blockchain analytics, paving the way for more sophisticated decentralized applications and improved network performance.


Multimodal Coherent Explanation Generation of Robot Failures

arXiv.org Artificial Intelligence

The explainability of a robot's actions is crucial to its acceptance in social spaces. Explaining why a robot fails to complete a given task is particularly important for non-expert users to be aware of the robot's capabilities and limitations. So far, research on explaining robot failures has only considered generating textual explanations, even though several studies have shown the benefits of multimodal ones. However, a simple combination of multiple modalities may lead to semantic incoherence between the information across different modalities, a problem that is not well studied. An incoherent multimodal explanation can be difficult to understand, and it may even become inconsistent with what the robot and the human observe and how they reason about those observations. Such inconsistencies may lead to wrong conclusions about the robot's capabilities. In this paper, we introduce an approach to generate coherent multimodal explanations by checking the logical coherence of explanations from different modalities, followed by refinements as required. We propose a classification approach for coherence assessment, where we evaluate whether one explanation logically follows from another. Our experiments suggest that fine-tuning a neural network that was pre-trained to recognize textual entailment performs well for coherence assessment of multimodal explanations. Code & data: https://pradippramanick.github.io/coherent-explain/.


Developing Guidelines for Functionally-Grounded Evaluation of Explainable Artificial Intelligence using Tabular Data

arXiv.org Artificial Intelligence

Explainable Artificial Intelligence (XAI) techniques are used to provide transparency to complex, opaque predictive models. However, these techniques are often designed for image and text data, and it is unclear how fit-for-purpose they are when applied to tabular data. As XAI techniques are rarely evaluated in settings with tabular data, the applicability of existing evaluation criteria and methods is also unclear and needs (re-)examination. For example, some works suggest that evaluation methods may unduly influence the evaluation results when using tabular data. This lack of clarity on evaluation procedures can lead to reduced transparency and ineffective use of XAI techniques in real-world settings. In this study, we examine the literature on XAI evaluation to derive guidelines on functionally-grounded assessment of local, post hoc XAI techniques. We identify 20 evaluation criteria and associated evaluation methods, and derive guidelines on when and how each criterion should be evaluated. We also identify key research gaps to be addressed by future work. Our study contributes to the body of knowledge on XAI evaluation through in-depth examination of functionally-grounded XAI evaluation protocols, and lays the groundwork for future research on XAI evaluation.


A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms

arXiv.org Artificial Intelligence

The remarkable capabilities of large language models (LLMs) come with significant computational and memory demands. This has raised considerable challenges when deploying these models in scenarios with limited resources or high concurrency. To address these challenges, low-bit quantization has emerged as a pivotal approach for enhancing the efficiency and deployability of LLMs. Low-bit quantization reduces the bit-width of tensors, which decreases the memory footprint and computational requirements of LLMs. By compressing the weights, activations, and gradients of LLMs into low-bit integer or binary representations, quantization can significantly accelerate inference and training and reduce storage requirements while retaining acceptable accuracy. This efficiency is crucial for making advanced LLMs accessible on devices with constrained resources, thereby broadening their applicability. In this paper, we aim to provide a comprehensive overview of low-bit quantization for LLMs, encompassing the fundamental concepts, system implementations, and algorithmic approaches related to low-bit LLMs. Compared with traditional models, LLMs, as the representative paradigm of the foundation model, feature a vast number of parameters, which presents unique challenges for effective quantization. As depicted in Figure 1, Section 2 introduces the fundamentals of low-bit quantization of LLMs, including new low-bit data formats and quantization granularities specific to LLMs.
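As a rough illustration of what reducing the bit-width of a tensor means, the following is a minimal sketch of per-tensor symmetric integer quantization. It is not taken from the survey: the function names `quantize_symmetric` and `dequantize`, the per-tensor granularity, and the toy weight values are all assumptions chosen for illustration.

```python
import numpy as np


def quantize_symmetric(x: np.ndarray, bits: int = 8):
    """Per-tensor symmetric quantization (hypothetical helper, bits <= 8)."""
    qmax = 2 ** (bits - 1) - 1            # largest representable level, e.g. 127
    max_abs = float(np.max(np.abs(x)))
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest level and clip into the signed integer range.
    q = np.clip(np.round(x.astype(np.float64) / scale), -qmax, qmax)
    return q.astype(np.int8), scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map integer levels back to approximate real values."""
    return q.astype(np.float64) * scale


# Toy weights (illustrative values only).
weights = np.array([0.5, -1.0, 0.25, 0.0])
q, scale = quantize_symmetric(weights, bits=8)
restored = dequantize(q, scale)
max_error = float(np.max(np.abs(restored - weights)))
```

Each value is stored in 8 bits instead of 32 or 64, and the rounding error per element is bounded by half the scale. Finer granularities (per-channel or per-group scales) follow the same pattern with one scale per slice of the tensor.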


Modulation and Coding for NOMA and RSMA

arXiv.org Artificial Intelligence

Next-generation multiple access (NGMA) serves as an umbrella term for transmission schemes distinct from conventional orthogonal methods. A key candidate of NGMA, non-orthogonal multiple access (NOMA), emerges as a solution to enhance connectivity by allowing multiple users to share time, frequency, and space concurrently. However, NOMA faces challenges in implementation, particularly in canceling inter-user interference. In this paper, we discuss the principles behind NOMA and review conventional NOMA methods. Then, to address these challenges, we present asynchronous transmission and interference-aware modulation techniques, enabling decoding without successive interference cancellation. The goal is to design constellations that dynamically adapt to interference, minimizing bit error rates (BERs) and enhancing user throughput in the presence of inter-user, inter-carrier, and inter-cell interference. The traditional link between minimizing BER and increasing spectral efficiency is explored, with deep autoencoders for end-to-end communication emerging as a potential solution to improve BERs. Interference-aware modulation can revolutionize constellation design for non-orthogonal channels. Rate-splitting multiple access (RSMA) is another promising interference management technique in multi-user systems. In addition to addressing challenges in finite-alphabet NOMA, this paper offers new insights and provides an overview of code-domain NOMA, trellis-coded NOMA, and RSMA as key NGMA candidates. We also discuss the evolution of channel coding toward low-latency communication and examine modulation and coding schemes in 5G networks. Finally, we highlight future research directions, emphasizing their importance for realizing NOMA from concept to functional technology.


Evaluating the performance of state-of-the-art ESG domain-specific pre-trained large language models in text classification against existing models and traditional machine learning techniques

arXiv.org Artificial Intelligence

This research investigates the classification of Environmental, Social, and Governance (ESG) information within textual disclosures. The aim is to develop and evaluate binary classification models capable of accurately identifying and categorizing E-, S-, and G-related content, respectively. The motivation for this research stems from the growing importance of ESG considerations in investment decisions and corporate accountability. Accurate and efficient classification of ESG information is crucial for stakeholders to understand the impact of companies on sustainability and to make informed decisions. The research uses a quantitative approach involving data collection, data preprocessing, and the development of ESG-focused Large Language Models (LLMs) and traditional machine learning classifiers (Support Vector Machines, XGBoost). Performance evaluation guides iterative refinement until satisfactory metrics are achieved. The research compares these traditional machine learning techniques, a state-of-the-art language model (FinBERT-ESG), and fine-tuned LLMs such as Llama 2, using standard Natural Language Processing performance metrics: accuracy, precision, recall, and F1-score. A novel fine-tuning method, QLoRA, is applied to the LLMs, resulting in significant performance improvements across all ESG domains. The research also develops domain-specific fine-tuned models, such as EnvLlama 2-Qlora, SocLlama 2-Qlora, and GovLlama 2-Qlora, which demonstrate impressive results in ESG text classification.
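The four metrics named above can be computed for a binary classifier as in the minimal sketch below. The function name `binary_metrics` and the toy labels are hypothetical and are not data or code from the study; label 1 stands for, say, "environment-related" and 0 for "not environment-related".

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (illustrative only).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

With one binary model per ESG pillar, as the abstract describes, this computation would be repeated separately for the E, S, and G classifiers.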


An Overview of the Burer-Monteiro Method for Certifiable Robot Perception

arXiv.org Artificial Intelligence

This paper presents an overview of the Burer-Monteiro method (BM), a technique that has been applied to solve robot perception problems to certifiable optimality in real-time. BM is often used to solve semidefinite programming relaxations, which can be used to perform global optimization for non-convex perception problems. Specifically, BM leverages the low-rank structure of typical semidefinite programs to dramatically reduce the computational cost of performing optimization. This paper discusses BM in certifiable perception, with three main objectives: (i) to consolidate information from the literature into a unified presentation, (ii) to elucidate the role of the linear independence constraint qualification (LICQ), a concept not yet well-covered in certifiable perception literature, and (iii) to share practical considerations that are discussed among practitioners but not thoroughly covered in the literature. Our general aim is to offer a practical primer for applying BM towards certifiable perception.
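For orientation, the low-rank idea can be written out in a generic form. The notation below ($C$, $A_i$, $b_i$, $Y$, $r$) is assumed here for illustration and may differ from the paper's. A standard-form semidefinite program over symmetric $n \times n$ matrices, and its Burer-Monteiro factorization with $X = YY^\top$, read:

```latex
% Standard-form SDP (symbols are illustrative assumptions)
\min_{X \in \mathbb{S}^{n}} \ \langle C, X \rangle
\quad \text{s.t.} \quad \langle A_i, X \rangle = b_i, \ \ i = 1, \dots, m,
\qquad X \succeq 0 .

% Burer-Monteiro factorization: substitute X = Y Y^T with Y of rank at most r << n
\min_{Y \in \mathbb{R}^{n \times r}} \ \langle C, Y Y^{\top} \rangle
\quad \text{s.t.} \quad \langle A_i, Y Y^{\top} \rangle = b_i, \ \ i = 1, \dots, m .

% LICQ at a feasible Y (for symmetric A_i): the constraint gradients
% are linearly independent
\{\, 2 A_i Y \,\}_{i=1}^{m} \ \text{ linearly independent in } \mathbb{R}^{n \times r} .
```

The factorized problem has $nr$ unknowns instead of $n(n+1)/2$ and satisfies $X \succeq 0$ by construction, at the price of a non-convex objective and constraints. This is why conditions such as LICQ matter when relating its solutions back to the original SDP.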