AITopics

This study investigates the consistency of feedback ratings generated by OpenAI's GPT-4, a state-of-the-art artificial intelligence language model, across multiple iterations, time spans and stylistic variations. The model rated responses to tasks within the Higher Education (HE) subject domain of macroeconomics in terms of their content and style. Statistical analysis was conducted in order to learn more about the interrater reliability, consistency of the ratings across iterations and the correlation between ratings in terms of content and style. The results revealed a high interrater reliability with ICC scores ranging between 0.94 and 0.99 for different timespans, suggesting that GPT-4 is capable of generating consistent ratings across repetitions with a clear prompt. Style and content ratings show a high correlation of 0.87. When applying a non-adequate style the average content ratings remained constant, while style ratings decreased, which indicates that the large language model (LLM) effectively distinguishes between these two criteria during evaluation. The prompt used in this study is furthermore presented and explained. Further research is necessary to assess the robustness and reliability of AI models in various use cases.

large language model, machine learning, natural language, (21 more...)

2308.02575

Country:

Europe > Germany (0.05)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.95)

Industry:

Education > Educational Technology > Educational Software (0.68)
Education > Educational Setting (0.50)
Education > Assessment & Standards > Student Performance (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning

Duan, Keyu, Liu, Qian, Chua, Tat-Seng, Yan, Shuicheng, Ooi, Wei Tsang, Xie, Qizhe, He, Junxian

Textual graphs (TGs) are graphs whose nodes correspond to text (sentences or documents), which are widely prevalent. The representation learning of TGs involves two stages: (i) unsupervised feature extraction and (ii) supervised graph representation learning. In recent years, extensive efforts have been devoted to the latter stage, where Graph Neural Networks (GNNs) have dominated. However, the former stage for most existing graph benchmarks still relies on traditional feature engineering techniques. More recently, with the rapid development of language models (LMs), researchers have focused on leveraging LMs to facilitate the learning of TGs, either by jointly training them in a computationally intensive framework (merging the two stages), or designing complex self-supervised training tasks for feature extraction (enhancing the first stage). In this work, we present SimTeG, a frustratingly Simple approach for Textual Graph learning that does not innovate in frameworks, models, and tasks. Instead, we first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task, such as node classification. We then generate node embeddings using the last hidden states of finetuned LM. These derived features can be further utilized by any GNN for training on the same task. We evaluate our approach on two fundamental graph representation learning tasks: node classification and link prediction. Through extensive experiments, we show that our approach significantly improves the performance of various GNNs on multiple graph benchmarks.

large language model, machine learning, natural language, (16 more...)

2308.02565

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Singapore (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Gaur, Vedant, Saunshi, Nikunj

Reasoning in Large Language Models Through Symbolic Math Word Problems

Large language models (LLMs) have revolutionized NLP by solving downstream tasks with little to no labeled data. Despite their versatile abilities, the larger question of their ability to reason remains ill-understood. This paper addresses reasoning in math word problems (MWPs) by studying symbolic versions of the numeric problems, since a symbolic expression is a "concise explanation" of the numeric answer. We create and use a symbolic version of the SVAMP dataset and find that GPT-3's davinci-002 model also has good zero-shot accuracy on symbolic MWPs. To evaluate the faithfulness of the model's reasoning, we go beyond accuracy and additionally evaluate the alignment between the final answer and the outputted reasoning, which correspond to numeric and symbolic answers respectively for MWPs. We explore a self-prompting approach to encourage the symbolic reasoning to align with the numeric answer, thus equipping the LLM with the ability to provide a concise and verifiable reasoning and making it more interpretable. Surprisingly, self-prompting also improves the symbolic accuracy to be higher than both the numeric and symbolic accuracies, thus providing an ensembling effect. The SVAMP_Sym dataset will be released for future research on symbolic math problems.

accuracy, large language model, machine learning, (21 more...)

2308.01906

Country:

Europe > Spain > Aragón (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Wider and Deeper LLM Networks are Fairer LLM Evaluators

Zhang, Xinghua, Yu, Bowen, Yu, Haiyang, Lv, Yangyu, Liu, Tingwen, Huang, Fei, Xu, Hongbo, Li, Yongbin

Measuring the quality of responses generated by LLMs is a challenging task, particularly when it comes to evaluating whether the response is aligned with human preference. A novel approach involves using the LLM itself to make evaluation and stabilizing the results through multiple independent evaluations, similar to a single-layer narrow LLM network. This network consists of a fixed number of neurons, with each neuron being the same LLM. In this paper, we draw upon the extensive research on deep neural networks to explore whether deeper and wider networks can lead to fairer evaluations. Specifically, inspired by the observation that different neurons in a neural network are responsible for detecting different concepts, we first adaptively generate as many neuron roles as possible for each evaluation sample. Each perspective corresponds to the role of a specific LLM neuron in the first layer. In subsequent layers, we follow the idea that higher layers in deep networks are responsible for more comprehensive features, each layer receives representations from all neurons in the previous layer, integrating the locally learned evaluation information to obtain a more comprehensive evaluation result. Interestingly, this network design resembles the process of academic paper reviewing. To validate the effectiveness of our method, we construct the largest and most diverse English evaluation benchmark LLMEval$^2$ for LLM evaluators, comprising 15 tasks, 8 abilities, and 2,553 samples. Experimental results demonstrate that a wider network (involving many reviewers) with 2 layers (one round of discussion) performs the best, improving kappa correlation coefficient from 0.28 to 0.34. We also leverage WideDeep to aid in the assessment of Chinese LLMs, which has accelerated the evaluation time by 4.6 times, resulting in a 60% cost saving. WideDeep achieves a remarkable 93% agreement level among humans.

large language model, machine learning, natural language, (17 more...)

2308.01862

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

XNLP: An Interactive Demonstration System for Universal Structured NLP

Fei, Hao, Zhang, Meishan, Zhang, Min, Chua, Tat-Seng

Structured Natural Language Processing (XNLP) is an important subset of NLP that entails understanding the underlying semantic or syntactic structure of texts, which serves as a foundational component for many downstream applications. Despite certain recent efforts to explore universal solutions for specific categories of XNLP tasks, a comprehensive and effective approach for unifying all XNLP tasks long remains underdeveloped. In the meanwhile, while XNLP demonstration systems are vital for researchers exploring various XNLP tasks, existing platforms can be limited to, e.g., supporting few XNLP tasks, lacking interactivity and universalness. To this end, we propose an advanced XNLP demonstration platform, where we propose leveraging LLM to achieve universal XNLP, with one model for all with high generalizability. Overall, our system advances in multiple aspects, including universal XNLP modeling, high performance, interpretability, scalability, and interactivity, providing a unified platform for exploring diverse XNLP tasks in the community. XNLP is online: https://xnlp.haofei.vip

large language model, machine learning, natural language, (17 more...)

2308.01846

Country:

Asia > Singapore (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Galatzer-Levy, Isaac R., McDuff, Daniel, Natarajan, Vivek, Karthikesalingam, Alan, Malgaroli, Matteo

The Capability of Large Language Models to Measure Psychiatric Functioning

The current work investigates the capability of Large language models (LLMs) that are explicitly trained on large corpuses of medical knowledge (Med-PaLM 2) to predict psychiatric functioning from patient interviews and clinical descriptions without being trained to do so. To assess this, n = 145 depression and n =115 PTSD assessments and n = 46 clinical case studies across high prevalence/high comorbidity disorders (Depressive, Anxiety, Psychotic, trauma and stress, Addictive disorders) were analyzed using prompts to extract estimated clinical scores and diagnoses. Results demonstrate that Med-PaLM 2 is capable of assessing psychiatric functioning across a range of psychiatric conditions with the strongest performance being the prediction of depression scores based on standardized assessments (Accuracy range= 0.80 - 0.84) which were statistically indistinguishable from human clinical raters t(1,144) = 1.20; p = 0.23. Results show the potential for general clinical language models to flexibly predict psychiatric risk based on free descriptions of functioning from both patients and clinicians.

disorder, large language model, machine learning, (19 more...)

2308.01834

Country: North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.98)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Jain, Ayush, Padmanaban, Manikandan, Hazra, Jagabondhu, Godbole, Shantanu, Weldemariam, Kommy

Supply chain emission estimation using large language models

Unfortunately, the world remains off track in meeting Development Goals (SDGs), especially goal 13, which focuses the Paris Agreement's target of limiting the temperature rise to on combating climate change and its impacts. To mitigate the effects 1.5 C above pre-industrial levels and reaching net-zero emissions of climate change, reducing enterprise Scope 3 (supply chain by 2050 [14], with a projected temperature rise of around 2.7 C emissions) is vital, as it accounts for more than 90% of total emission above pre-industrial levels by 2100 [22]. To achieve these targets, inventories. However, tracking Scope 3 emissions proves challenging, it is critical to engage non-state actors like enterprises, who have as data must be collected from thousands of upstream and pledged to reduce their GHG emissions, and have significant potential downstream suppliers. To address the above mentioned challenges, to drive more ambitious actions towards climate targets than we propose a first-of-a-kind framework that uses domain-adapted governments [9]. However, a lack of high-quality data and insights NLP foundation models to estimate Scope 3 emissions, by utilizing about an enterprise's operational performance can create barriers to financial transactions as a proxy for purchased goods and services.

large language model, machine learning, natural language, (18 more...)

2308.01741

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.05)
Asia > India (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry:

Banking & Finance (1.00)
Government > Regional Government > North America Government > United States Government (0.93)
Law > Environmental Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Ambient Adventures: Teaching ChatGPT on Developing Complex Stories

Chen, Zexin, Zhou, Eric, Eaton, Kenneth, Peng, Xiangyu, Riedl, Mark

Imaginative play is an area of creativity that could allow robots to engage with the world around them in a much more personified way. Imaginary play can be seen as taking real objects and locations and using them as imaginary objects and locations in virtual scenarios. We adopted the story generation capability of large language models (LLMs) to obtain the stories used for imaginary play with human-written prompts. Those generated stories will be simplified and mapped into action sequences that can guide the agent in imaginary play. To evaluate whether the agent can successfully finish the imaginary play, we also designed a text adventure game to simulate a house as the playground for the agent to interact.

large language model, machine learning, natural language, (20 more...)

2308.01734

Country: North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Bumgardner, V. K. Cody, Mullen, Aaron, Armstrong, Sam, Hickey, Caylin, Talbert, Jeff

Local Large Language Models for Complex Structured Medical Tasks

This paper introduces an approach that combines the language reasoning capabilities of large language models (LLMs) with the benefits of local training to tackle complex, domain-specific tasks. Specifically, the authors demonstrate their approach by extracting structured condition codes from pathology reports. The proposed approach utilizes local LLMs, which can be fine-tuned to respond to specific generative instructions and provide structured outputs. The authors collected a dataset of over 150k uncurated surgical pathology reports, containing gross descriptions, final diagnoses, and condition codes. They trained different model architectures, including LLaMA, BERT and LongFormer and evaluated their performance. The results show that the LLaMA-based models significantly outperform BERT-style models across all evaluated metrics, even with extremely reduced precision. The LLaMA models performed especially well with large datasets, demonstrating their ability to handle complex, multi-label tasks. Overall, this work presents an effective approach for utilizing LLMs to perform domain-specific tasks using accessible hardware, with potential applications in the medical domain, where complex data extraction and classification are required.

large language model, machine learning, natural language, (18 more...)

2308.01727

Country:

North America > United States > Kentucky > Fayette County > Lexington (0.14)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report > New Finding (0.86)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Evaluating ChatGPT text-mining of clinical records for obesity monitoring

Fins, Ivo S., Davies, Heather, Farrell, Sean, Torres, Jose R., Pinchbeck, Gina, Radford, Alan D., Noble, Peter-John

Background: Veterinary clinical narratives remain a largely untapped resource for addressing complex diseases. Here we compare the ability of a large language model (ChatGPT) and a previously developed regular expression (RegexT) to identify overweight body condition scores (BCS) in veterinary narratives. Methods: BCS values were extracted from 4,415 anonymised clinical narratives using either RegexT or by appending the narrative to a prompt sent to ChatGPT coercing the model to return the BCS information. Data were manually reviewed for comparison. Results: The precision of RegexT was higher (100%, 95% CI 94.81-100%) than the ChatGPT (89.3%; 95% CI82.75-93.64%). However, the recall of ChatGPT (100%. 95% CI 96.18-100%) was considerably higher than that of RegexT (72.6%, 95% CI 63.92-79.94%). Limitations: Subtle prompt engineering is needed to improve ChatGPT output. Conclusions: Large language models create diverse opportunities and, whilst complex, present an intuitive interface to information but require careful implementation to avoid unpredictable errors.

large language model, machine learning, natural language, (19 more...)

2308.01666

Country:

Europe > Spain > Canary Islands > Gran Canaria > Las Palmas de Gran Canaria (0.14)
Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
Europe > United Kingdom > England > Durham > Durham (0.04)
Europe > Spain > Balearic Islands > Mallorca > Palma (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Health Care Technology > Medical Record (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)