AITopics

doi: 10.1088/1742-6596/1924/1/012003

2305.12775

Country: Europe > Germany (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry:

Information Technology (0.48)
Automobiles & Trucks (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceMay-22-2023

A Survey of Explainable Graph Neural Networks: Taxonomy and Evaluation Metrics

Li, Yiqiao, Zhou, Jianlong, Verma, Sunny, Chen, Fang

Graph neural networks (GNNs) have demonstrated a significant boost in prediction performance on graph data. At the same time, the predictions made by these models are often hard to interpret. In that regard, many efforts have been made to explain the prediction mechanisms of these models from perspectives such as GNNExplainer, XGNN and PGExplainer. Although such works present systematic frameworks to interpret GNNs, a holistic review for explainable GNNs is unavailable. In this survey, we present a comprehensive review of explainability techniques developed for GNNs. We focus on explainable graph neural networks and categorize them based on the use of explainable methods. We further provide the common performance metrics for GNNs explanations and point out several future research directions.

artificial intelligence, explainable graph neural network, machine learning, (1 more...)

2207.12599

Genre: Overview (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.80)

Discovering Causal Relations and Equations from Data

Camps-Valls, Gustau, Gerhardus, Andreas, Ninad, Urmi, Varando, Gherardo, Martius, Georg, Balaguer-Ballester, Emili, Vinuesa, Ricardo, Diaz, Emiliano, Zanna, Laure, Runge, Jakob

Physics is a field of science that has traditionally used the scientific method to answer questions about why natural phenomena occur and to make testable models that explain the phenomena. Discovering equations, laws and principles that are invariant, robust and causal explanations of the world has been fundamental in physical sciences throughout the centuries. Discoveries emerge from observing the world and, when possible, performing interventional studies in the system under study. With the advent of big data and the use of data-driven methods, causal and equation discovery fields have grown and made progress in computer science, physics, statistics, philosophy, and many applied fields. All these domains are intertwined and can be used to discover causal relations, physical laws, and equations from observational data. This paper reviews the concepts, methods, and relevant works on causal and equation discovery in the broad field of Physics and outlines the most important challenges and promising future lines of research. We also provide a taxonomy for observational causal and equation discovery, point out connections, and showcase a complete set of case studies in Earth and climate sciences, fluid dynamics and mechanics, and the neurosciences. This review demonstrates that discovering fundamental laws and causal relations by observing natural phenomena is being revolutionised with the efficient exploitation of observational data, modern machine learning algorithms and the interaction with domain knowledge. Exciting times are ahead with many challenges and opportunities to improve our understanding of complex systems.

data mining, evolutionary algorithm, machine learning, (24 more...)

2305.13341

Country:

Europe > United Kingdom > England (0.67)
North America > United States > California (0.45)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > New Finding (0.92)
Instructional Material > Course Syllabus & Notes (0.67)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (1.00)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
(10 more...)

Yenduri, Gokul, M, Ramalingam, G, Chemmalar Selvi, Y, Supriya, Srivastava, Gautam, Maddikunta, Praveen Kumar Reddy, G, Deepti Raj, Jhaveri, Rutvij H, B, Prabadevi, Wang, Weizheng, Vasilakos, Athanasios V., Gadekallu, Thippa Reddy

Generative Pre-trained Transformer: A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions

The Generative Pre-trained Transformer (GPT) represents a notable breakthrough in the domain of natural language processing, which is propelling us toward the development of machines that can understand and communicate using language in a manner that closely resembles that of humans. GPT is based on the transformer architecture, a deep neural network designed for natural language processing tasks. Due to their impressive performance on natural language processing tasks and ability to effectively converse, GPT have gained significant popularity among researchers and industrial communities, making them one of the most widely used and effective models in natural language processing and related fields, which motivated to conduct this review. This review provides a detailed overview of the GPT, including its architecture, working process, training procedures, enabling technologies, and its impact on various applications. In this review, we also explored the potential challenges and limitations of a GPT. Furthermore, we discuss potential solutions and future directions. Overall, this paper aims to provide a comprehensive understanding of GPT, enabling technologies, their impact on various applications, emerging challenges, and potential solutions.

large language model, machine learning, natural language, (18 more...)

2305.10435

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California (0.04)
Asia > China > Hong Kong (0.04)
(9 more...)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)

Industry:

Transportation (1.00)
Media > News (1.00)
Media > Film (1.00)
(24 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

A PhD Student's Perspective on Research in NLP in the Era of Very Large Language Models

Ignat, Oana, Jin, Zhijing, Abzaliev, Artem, Biester, Laura, Castro, Santiago, Deng, Naihao, Gao, Xinyi, Gunal, Aylin, He, Jacky, Kazemi, Ashkan, Khalifa, Muhammad, Koh, Namho, Lee, Andrew, Liu, Siyang, Min, Do June, Mori, Shinka, Nwatu, Joan, Perez-Rosas, Veronica, Shen, Siqi, Wang, Zekun, Wu, Winston, Mihalcea, Rada

Recent progress in large language models has enabled the deployment of many generative NLP applications. At the same time, it has also led to a misleading public discourse that ``it's all been solved.'' Not surprisingly, this has in turn made many NLP researchers -- especially those at the beginning of their career -- wonder about what NLP research area they should focus on. This document is a compilation of NLP research directions that are rich for exploration, reflecting the views of a diverse group of PhD students in an academic research lab. While we identify many research areas, many others exist; we do not cover those areas that are currently addressed by LLMs but where LLMs lag behind in performance, or those focused on LLM development. We welcome suggestions for other research directions to include: https://bit.ly/nlp-era-llm

computational linguistic, large language model, machine learning, (17 more...)

2305.12544

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(34 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.92)
Information Technology > Security & Privacy (0.92)
Media (0.68)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Hanif, Muhammad Abdullah, Shafique, Muhammad

FAQ: Mitigating the Impact of Faults in the Weight Memory of DNN Accelerators through Fault-Aware Quantization

Permanent faults induced due to imperfections in the manufacturing process of Deep Neural Network (DNN) accelerators are a major concern, as they negatively impact the manufacturing yield of the chip fabrication process. Fault-aware training is the state-of-the-art approach for mitigating such faults. However, it incurs huge retraining overheads, specifically when used for large DNNs trained on complex datasets. To address this issue, we propose a novel Fault-Aware Quantization (FAQ) technique for mitigating the effects of stuck-at permanent faults in the on-chip weight memory of DNN accelerators at a negligible overhead cost compared to fault-aware retraining while offering comparable accuracy results. We propose a lookup table-based algorithm to achieve ultra-low model conversion time. We present extensive evaluation of the proposed approach using five different DNNs, i.e., ResNet-18, VGG11, VGG16, AlexNet and MobileNetV2, and three different datasets, i.e., CIFAR-10, CIFAR-100 and ImageNet. The results demonstrate that FAQ helps in maintaining the baseline accuracy of the DNNs at low and moderate fault rates without involving costly fault-aware training. For example, for ResNet-18 trained on the CIFAR-10 dataset, at 0.04 fault rate FAQ offers (on average) an increase of 76.38% in accuracy. Similarly, for VGG11 trained on the CIFAR-10 dataset, at 0.04 fault rate FAQ offers (on average) an increase of 70.47% in accuracy. The results also show that FAQ incurs negligible overheads, i.e., less than 5% of the time required to run 1 epoch of retraining. We additionally demonstrate the efficacy of our technique when used in conjunction with fault-aware retraining and show that the use of FAQ inside fault-aware retraining enables fast accuracy recovery.

accelerator, artificial intelligence, machine learning, (17 more...)

2305.1259

Country:

Europe (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
North America > United States > New York (0.04)

Genre:

Overview (1.00)
Frequently Asked Questions (FAQ) (1.00)
Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

A Deeper (Autoregressive) Approach to Non-Convergent Discourse Parsing

Tulpan, Yoav, Tsur, Oren

Online social platforms provide a bustling arena for information-sharing and for multi-party discussions. Various frameworks for dialogic discourse parsing were developed and used for the processing of discussions and for predicting the productivity of a dialogue. However, most of these frameworks are not suitable for the analysis of contentious discussions that are commonplace in many online platforms. A novel multi-label scheme for contentious dialog parsing was recently introduced by Zakharov et al. (2021). While the schema is well developed, the computational approach they provide is both naive and inefficient, as a different model (architecture) using a different representation of the input, is trained for each of the 31 tags in the annotation scheme. Moreover, all their models assume full knowledge of label collocations and context, which is unlikely in any realistic setting. In this work, we present a unified model for Non-Convergent Discourse Parsing that does not require any additional input other than the previous dialog utterances. We fine-tuned a RoBERTa backbone, combining embeddings of the utterance, the context and the labels through GRN layers and an asymmetric loss function. Overall, our model achieves results comparable with SOTA, without using label collocation and without training a unique architecture/model for each label.

machine learning, natural language, utterance, (20 more...)

2305.1251

Country:

North America > United States > Maine (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre:

Overview (0.46)
Research Report (0.40)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.46)
Media > News (0.34)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Margatina, Katerina, Aletras, Nikolaos

On the Limitations of Simulating Active Learning

Active learning (AL) is a human-and-model-in-the-loop paradigm that iteratively selects informative unlabeled data for human annotation, aiming to improve over random sampling. However, performing AL experiments with human annotations on-the-fly is a laborious and expensive process, thus unrealistic for academic research. An easy fix to this impediment is to simulate AL, by treating an already labeled and publicly available dataset as the pool of unlabeled data. In this position paper, we first survey recent literature and highlight the challenges across all different steps within the AL loop. We further unveil neglected caveats in the experimental setup that can significantly affect the quality of AL research. We continue with an exploration of how the simulation setting can govern empirical findings, arguing that it might be one of the answers behind the ever posed question ``why do active learning algorithms sometimes fail to outperform random sampling?''. We argue that evaluating AL algorithms on available labeled datasets might provide a lower bound as to their effectiveness in real data. We believe it is essential to collectively shape the best practices for AL research, particularly as engineering advancements in LLMs push the research focus towards data-driven approaches (e.g., data efficiency, alignment, fairness). In light of this, we have developed guidelines for future work. Our aim is to draw attention to these limitations within the community, in the hope of finding ways to address them.

computational linguistic, large language model, machine learning, (18 more...)

2305.13342

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Middle East > Jordan (0.04)
(17 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.34)

Industry: Education > Curriculum > Subject-Specific Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

arXiv.org Artificial IntelligenceMay-20-2023

SEntFiN 1.0: Entity-Aware Sentiment Analysis for Financial News

Sinha, Ankur, Kedas, Satishwar, Kumar, Rishu, Malo, Pekka

Fine-grained financial sentiment analysis on news headlines is a challenging task requiring human-annotated datasets to achieve high performance. Limited studies have tried to address the sentiment extraction task in a setting where multiple entities are present in a news headline. In an effort to further research in this area, we make publicly available SEntFiN 1.0, a human-annotated dataset of 10,753 news headlines with entity-sentiment annotations, of which 2,847 headlines contain multiple entities, often with conflicting sentiments. We augment our dataset with a database of over 1,000 financial entities and their various representations in news media amounting to over 5,000 phrases. We propose a framework that enables the extraction of entity-relevant sentiments using a feature-based approach rather than an expression-based approach. For sentiment extraction, we utilize 12 different learning schemes utilizing lexicon-based and pre-trained sentence representations and five classification approaches. Our experiments indicate that lexicon-based n-gram ensembles are above par with pre-trained word embedding schemes such as GloVe. Overall, RoBERTa and finBERT (domain-specific BERT) achieve the highest average accuracy of 94.29% and F1-score of 93.27%. Further, using over 210,000 entity-sentiment predictions, we validate the economic effect of sentiments on aggregate market movements over a long duration.

machine learning, natural language, sentiment, (21 more...)

doi: 10.1002/asi.24634

2305.12257

Country:

Asia > India (0.29)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Media > News (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceMay-20-2023

Game-Theoretical Analysis of Reviewer Rewards in Peer-Review Journal Systems: Analysis and Experimental Evaluation using Deep Reinforcement Learning

Lee, Minhyeok

In this paper, we navigate the intricate domain of reviewer rewards in open-access academic publishing, leveraging the precision of mathematics and the strategic acumen of game theory. We conceptualize the prevailing voucher-based reviewer reward system as a two-player game, subsequently identifying potential shortcomings that may incline reviewers towards binary decisions. To address this issue, we propose and mathematically formalize an alternative reward system with the objective of mitigating this bias and promoting more comprehensive reviews. We engage in a detailed investigation of the properties and outcomes of both systems, employing rigorous game-theoretical analysis and deep reinforcement learning simulations. Our results underscore a noteworthy divergence between the two systems, with our proposed system demonstrating a more balanced decision distribution and enhanced stability. This research not only augments the mathematical understanding of reviewer reward systems, but it also provides valuable insights for the formulation of policies within journal review system. Our contribution to the mathematical community lies in providing a game-theoretical perspective to a real-world problem and in the application of deep reinforcement learning to simulate and understand this complex system.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2305.12088

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)