AITopics | reading level

Collaborating Authors

reading level

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Classic4Children: Adapting Chinese Literary Classics for Children with Large Language Model

Chen, Jiali, Hei, Xusen, Xue, Yuqi, Wu, Zihan, Xie, Jiayuan, Cai, Yi

arXiv.org Artificial IntelligenceFeb-3-2025

Chinese literary classics hold significant cultural and educational value, offering deep insights into morality, history, and human nature. These works often include classical Chinese and complex narratives, making them difficult for children to read. To bridge this gap, we introduce a child-friendly literary adaptation (CLA) task to adapt the Chinese literary classic into engaging and accessible text for children. However, recent large language models (LLMs) overlook children's reading preferences (\ie, vivid character portrayals, concise narrative structures, and appropriate readability), which poses challenges in CLA. In this paper, we propose a method called InstructChild, which augments the LLM with these preferences for adaptation. Specifically, we first obtain the characters' personalities and narrative structure as additional information for fine-grained instruction tuning. Then, we devise a readability metric as the reward to align the LLM with the children's reading level. Finally, a lookahead decoding strategy is applied to improve the readability of the generated text during inference. To support the evaluation of CLA task, we construct the Classic4Children dataset, which comprises both the original and child-friendly versions of the Four Great Classical Novels of Chinese literature. Experimental results show that our InstructChild significantly improves automatic and human evaluation performance.

instructchild, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2502.0109

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Generative AI Takes a Statistics Exam: A Comparison of Performance between ChatGPT3.5, ChatGPT4, and ChatGPT4o-mini

McGee, Monnie, Sadler, Bivin

arXiv.org Artificial IntelligenceJan-15-2025

Many believe that use of generative AI as a private tutor has the potential to shrink access and achievement gaps between students and schools with abundant resources versus those with fewer resources. Shrinking the gap is possible only if paid and free versions of the platforms perform with the same accuracy. In this experiment, we investigate the performance of GPT versions 3.5, 4.0, and 4o-mini on the same 16-question statistics exam given to a class of first-year graduate students. While we do not advocate using any generative AI platform to complete an exam, the use of exam questions allows us to explore aspects of ChatGPT's responses to typical questions that students might encounter in a statistics course. Results on accuracy indicate that GPT 3.5 would fail the exam, GPT4 would perform well, and GPT4o-mini would perform somewhere in between. While we acknowledge the existence of other Generative AI/LLMs, our discussion concerns only ChatGPT because it is the most widely used platform on college campuses at this time. We further investigate differences among the AI platforms in the answers for each problem using methods developed for text analytics, such as reading level evaluation and topic modeling. Results indicate that GPT3.5 and 4o-mini have characteristics that are more similar than either of them have with GPT4.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.09171

Country: North America > United States (0.93)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.69)
Instructional Material > Course Syllabus & Notes (0.67)

Industry:

Health & Medicine (1.00)
Education > Educational Setting > Higher Education (1.00)
Education > Curriculum > Subject-Specific Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback

Vernacular? I Barely Know Her: Challenges with Style Control and Stereotyping

Aich, Ankit, Liu, Tingting, Giorgi, Salvatore, Isman, Kelsey, Ungar, Lyle, Curtis, Brenda

arXiv.org Artificial IntelligenceJun-18-2024

Large Language Models (LLMs) are increasingly being used in educational and learning applications. Research has demonstrated that controlling for style, to fit the needs of the learner, fosters increased understanding, promotes inclusion, and helps with knowledge distillation. To understand the capabilities and limitations of contemporary LLMs in style control, we evaluated five state-of-the-art models: GPT-3.5, GPT-4, GPT-4o, Llama-3, and Mistral-instruct-7B across two style control tasks. We observed significant inconsistencies in the first task, with model performances averaging between 5th and 8th grade reading levels for tasks intended for first-graders, and standard deviations up to 27.6. For our second task, we observed a statistically significant improvement in performance from 0.02 to 0.26. However, we find that even without stereotypes in reference texts, LLMs Figure 1: Overall view of this paper. We find that while often generated culturally insensitive content in-context learning can control for reading level and during their tasks. We provide a thorough analysis simplicity, it cannot do the same for vernacular English.

instruction, reading level, stereotype, (17 more...)

arXiv.org Artificial Intelligence

2406.12679

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Nebraska > Lancaster County > Lincoln (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.69)
Education > Educational Setting > K-12 Education > Primary School (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ReadCtrl: Personalizing text generation with readability-controlled instruction learning

Tran, Hieu, Yao, Zonghai, Li, Lingxi, Yu, Hong

arXiv.org Artificial IntelligenceJun-13-2024

Content generation conditioning on users's readability is an important application for personalization. In an era of large language models (LLMs), readability-controlled text generation based on LLMs has become increasingly important. This paper introduces a novel methodology called "Readability-Controlled Instruction Learning (ReadCtrl)," which aims to instruction-tune LLMs to tailor users' readability levels. Unlike the traditional methods, which primarily focused on categorical readability adjustments typically classified as high, medium, and low or expert and layperson levels with limited success, ReadCtrl introduces a dynamic framework that enables LLMs to generate content at various (near continuous level) complexity levels, thereby enhancing their versatility across different applications. Our results show that the ReadCtrl-Mistral-7B models significantly outperformed strong baseline models such as GPT-4 and Claude-3, with a win rate of 52.1%:35.7% against GPT-4 in human evaluations. Furthermore, Read-Ctrl has shown significant improvements in automatic evaluations, as evidenced by better readability metrics (e.g., FOG, FKGL) and generation quality metrics (e.g., BLEU, SARI, SummaC-Factuality, UniEval-Consistency and Coherence). These results underscore Read-Ctrl's effectiveness and tenacity in producing high-quality, contextually appropriate outputs that closely align with targeted readability levels, marking a significant advancement in personalized content generation using LLMs.

evaluation, instruction, preprint arxiv, (16 more...)

arXiv.org Artificial Intelligence

2406.09205

Country:

North America > United States > Massachusetts > Middlesex County > Lowell (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > Massachusetts > Worcester County > Worcester (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Education > Educational Setting > K-12 Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Hybrid Preference Optimization: Augmenting Direct Preference Optimization with Auxiliary Objectives

Badrinath, Anirudhan, Agarwal, Prabhat, Xu, Jiajing

arXiv.org Artificial IntelligenceMay-29-2024

For aligning large language models (LLMs), prior work has leveraged reinforcement learning via human feedback (RLHF) or variations of direct preference optimization (DPO). While DPO offers a simpler framework based on maximum likelihood estimation, it compromises on the ability to tune language models to easily maximize non-differentiable and non-binary objectives according to the LLM designer's preferences (e.g., using simpler language or minimizing specific kinds of harmful content). These may neither align with user preferences nor even be able to be captured tractably by binary preference data. To leverage the simplicity and performance of DPO with the generalizability of RL, we propose a hybrid approach between DPO and RLHF. With a simple augmentation to the implicit reward decomposition of DPO, we allow for tuning LLMs to maximize a set of arbitrary auxiliary rewards using offline RL. The proposed method, Hybrid Preference Optimization (HPO), shows the ability to effectively generalize to both user preferences and auxiliary designer objectives, while preserving alignment performance across a range of challenging benchmarks and model sizes.

arxiv preprint arxiv, objective, preference optimization, (12 more...)

arXiv.org Artificial Intelligence

2405.17956

Country:

Europe > Greece > Attica > Athens (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Education > Educational Setting > K-12 Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
(2 more...)

Add feedback

Evaluating Document Simplification: On the Importance of Separately Assessing Simplicity and Meaning Preservation

Cripwell, Liam, Legrand, Joël, Gardent, Claire

arXiv.org Artificial IntelligenceApr-4-2024

Text simplification intends to make a text easier to read while preserving its core meaning. Intuitively and as shown in previous works, these two dimensions (simplification and meaning preservation) are often-times inversely correlated. An overly conservative text will fail to simplify sufficiently, whereas extreme simplification will degrade meaning preservation. Yet, popular evaluation metrics either aggregate meaning preservation and simplification into a single score (SARI, LENS), or target meaning preservation alone (BERTScore, QuestEval). Moreover, these metrics usually require a set of references and most previous work has only focused on sentence-level simplification. In this paper, we focus on the evaluation of document-level text simplification and compare existing models using distinct metrics for meaning preservation and simplification. We leverage existing metrics from similar tasks and introduce a reference-less metric variant for simplicity, showing that models are mostly biased towards either simplification or meaning preservation, seldom performing well on both dimensions. Making use of the fact that the metrics we use are all reference-less, we also investigate the performance of existing models when applied to unseen data (where reference simplifications are unavailable).

machine learning, natural language, simplification, (18 more...)

arXiv.org Artificial Intelligence

2404.03278

Country:

North America > Dominican Republic (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(19 more...)

Genre:

Research Report (0.64)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.94)

Add feedback

Health Text Simplification: An Annotated Corpus for Digestive Cancer Education and Novel Strategies for Reinforcement Learning

Rahman, Md Mushfiqur, Irbaz, Mohammad Sabik, North, Kai, Williams, Michelle S., Zampieri, Marcos, Lybarger, Kevin

arXiv.org Artificial IntelligenceJan-26-2024

Objective: The reading level of health educational materials significantly influences information understandability and accessibility, particularly for minoritized populations. Many patient educational resources surpass the reading level and complexity of widely accepted standards. There is a critical need for high-performing text simplification models in health information to enhance dissemination and literacy. This need is particularly acute in cancer education, where effective prevention and screening education can substantially reduce morbidity and mortality. Methods: We introduce Simplified Digestive Cancer (SimpleDC), a parallel corpus of cancer education materials tailored for health text simplification research. Utilizing SimpleDC alongside the existing Med-EASi corpus, we explore Large Language Model (LLM)-based simplification methods, including fine-tuning, reinforcement learning (RL), reinforcement learning with human feedback (RLHF), domain adaptation, and prompt-based approaches. Our experimentation encompasses Llama 2 and GPT-4. A novel RLHF reward function is introduced, featuring a lightweight model adept at distinguishing between original and simplified texts, thereby enhancing the model's effectiveness with unlabeled data. Results: Fine-tuned Llama 2 models demonstrated high performance across various metrics. Our innovative RLHF reward function surpassed existing RL text simplification reward functions in effectiveness. The results underscore that RL/RLHF can augment fine-tuning, facilitating model training on unlabeled text and improving performance. Additionally, these methods effectively adapt out-of-domain text simplification models to targeted domains.

computational linguistic, simplification, text simplification, (13 more...)

arXiv.org Artificial Intelligence

2401.15043

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(12 more...)

Genre:

Research Report > New Finding (1.00)
Instructional Material (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Consumer Health (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval

Gwilliam, Matthew, Cogswell, Michael, Ye, Meng, Sikka, Karan, Shrivastava, Abhinav, Divakaran, Ajay

arXiv.org Artificial IntelligenceNov-30-2023

Existing long video retrieval systems are trained and tested in the paragraph-to-video retrieval regime, where every long video is described by a single long paragraph. This neglects the richness and variety of possible valid descriptions of a video, which could be described in moment-by-moment detail, or in a single phrase summary, or anything in between. To provide a more thorough evaluation of the capabilities of long video retrieval systems, we propose a pipeline that leverages state-of-the-art large language models to carefully generate a diverse set of synthetic captions for long videos. We validate this pipeline's fidelity via rigorous human inspection. We then benchmark a representative set of video language models on these synthetic captions using a few long video datasets, showing that they struggle with the transformed data, especially the shortest captions. We also propose a lightweight fine-tuning method, where we use a contrastive loss to learn a hierarchical embedding loss based on the differing levels of information among the various captions. Our method improves performance both on the downstream paragraph-to-video retrieval task (+1.1% R@1 on ActivityNet), as well as for the various long video retrieval metrics we compute using our synthetic data (+3.6% R@1 for short descriptions on ActivityNet). For data access and other details, please refer to our project website at https://mgwillia.github.io/10k-words.

caption, simplification, video, (14 more...)

arXiv.org Artificial Intelligence

2312.00115

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Education (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)

Add feedback

ChatGPT-3.5, ChatGPT-4, Google Bard, and Microsoft Bing to Improve Health Literacy and Communication in Pediatric Populations and Beyond

Amin, Kanhai S., Mayes, Linda, Khosla, Pavan, Doshi, Rushabh

arXiv.org Artificial IntelligenceNov-16-2023

Purpose: Enhanced health literacy has been linked to better health outcomes; however, few interventions have been studied. We investigate whether large language models (LLMs) can serve as a medium to improve health literacy in children and other populations. Methods: We ran 288 conditions using 26 different prompts through ChatGPT-3.5, Microsoft Bing, and Google Bard. Given constraints imposed by rate limits, we tested a subset of 150 conditions through ChatGPT-4. The primary outcome measurements were the reading grade level (RGL) and word counts of output. Results: Across all models, output for basic prompts such as "Explain" and "What is (are)" were at, or exceeded, a 10th-grade RGL. When prompts were specified to explain conditions from the 1st to 12th RGL, we found that LLMs had varying abilities to tailor responses based on RGL. ChatGPT-3.5 provided responses that ranged from the 7th-grade to college freshmen RGL while ChatGPT-4 outputted responses from the 6th-grade to the college-senior RGL. Microsoft Bing provided responses from the 9th to 11th RGL while Google Bard provided responses from the 7th to 10th RGL. Discussion: ChatGPT-3.5 and ChatGPT-4 did better in achieving lower-grade level outputs. Meanwhile Bard and Bing tended to consistently produce an RGL that is at the high school level regardless of prompt. Additionally, Bard's hesitancy in providing certain outputs indicates a cautious approach towards health information. LLMs demonstrate promise in enhancing health communication, but future research should verify the accuracy and effectiveness of such tools in this context. Implications: LLMs face challenges in crafting outputs below a sixth-grade reading level. However, their capability to modify outputs above this threshold provides a potential mechanism to improve health literacy and communication in a pediatric population and beyond.

health literacy, llm, reading level, (11 more...)

arXiv.org Artificial Intelligence

2311.10075

Country: North America > United States > Connecticut > New Haven County > New Haven (0.05)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Education > Educational Setting > K-12 Education > Middle School (0.76)
Education > Educational Setting > K-12 Education > Secondary School (0.53)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Simplicity Level Estimate (SLE): A Learned Reference-Less Metric for Sentence Simplification

Cripwell, Liam, Legrand, Joël, Gardent, Claire

arXiv.org Artificial IntelligenceOct-12-2023

Automatic evaluation for sentence simplification remains a challenging problem. Most popular evaluation metrics require multiple high-quality references -- something not readily available for simplification -- which makes it difficult to test performance on unseen domains. Furthermore, most existing metrics conflate simplicity with correlated attributes such as fluency or meaning preservation. We propose a new learned evaluation metric (SLE) which focuses on simplicity, outperforming almost all existing metrics in terms of correlation with human judgements.

computational linguistic, simplicity, simplification, (13 more...)

arXiv.org Artificial Intelligence

2310.0817

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
North America > Dominican Republic (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(9 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback