text mining
Recommender systems and reinforcement learning for human-building interaction and context-aware support: A text mining-driven review of scientific literature
Zhang, Wenhao, Quintana, Matias, Miller, Clayton
The indoor environment significantly impacts human health and well-being; enhancing health and reducing energy consumption in these settings is a central research focus. With the advancement of Information and Communication Technology (ICT), recommendation systems and reinforcement learning (RL) have emerged as promising approaches to induce behavioral changes to improve the indoor environment and energy efficiency of buildings. This study aims to employ text mining and Natural Language Processing (NLP) techniques to thoroughly examine the connections among these approaches in the context of human-building interaction and occupant context-aware support. The study analyzed 27,595 articles from the ScienceDirect database, revealing extensive use of recommendation systems and RL for space optimization, location recommendations, and personalized control suggestions. Furthermore, this review underscores the vast potential for expanding recommender systems and RL applications in buildings and indoor environments. Fields ripe for innovation include predictive maintenance, building-related product recommendation, and optimization of environments tailored for specific needs, such as sleep and productivity enhancements based on user feedback. The study also notes the limitations of the method in capturing subtle academic nuances. Future improvements could involve integrating and fine-tuning pre-trained language models to better interpret complex texts.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)
- (7 more...)
- Overview (1.00)
- Research Report > Experimental Study (0.48)
- Research Report > New Finding (0.34)
- Information Technology (1.00)
- Health & Medicine > Consumer Health (1.00)
- Energy > Power Industry (1.00)
- (3 more...)
Applying Text Mining to Analyze Human Question Asking in Creativity Research
Wróblewska, Anna, Korbin, Marceli, Kenett, Yoed N., Dan, Daniel, Ganzha, Maria, Paprzycki, Marcin
Creativity relates to the ability to generate novel and effective ideas in the areas of interest. How are such creative ideas generated? One possible mechanism that supports creative ideation, and is gaining increased empirical attention, is asking questions. Question asking is a likely cognitive mechanism that allows defining problems, thereby facilitating creative problem solving. However, much is unknown about the exact role of questions in creativity. This work presents an attempt to apply text mining methods to measure the cognitive potential of questions, taking into account, among others, (a) question type, (b) question complexity, and (c) the content of the answer. This contribution summarizes the history of question mining as a part of creativity research, along with the natural language processing methods deemed useful in the study. In addition, a novel approach is proposed, implemented, and applied to five datasets. The experimental results are comprehensively analyzed, suggesting that natural language processing has a role to play in creativity research.
- Europe > Austria > Vienna (0.14)
- Europe > Poland > Masovia Province > Warsaw (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (10 more...)
- Health & Medicine (0.93)
- Education > Educational Setting (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
- Information Technology > Data Science > Data Mining > Text Mining (0.62)
Data-driven development of cycle prediction models for lithium metal batteries using multi modal mining
Lee, Jaewoong, Woo, Junhee, Kim, Sejin, Paulina, Cinthya, Park, Hyunmin, Kim, Hee-Tak, Park, Steve, Kim, Jihan
These authors contributed equally: J. Lee, J. Woo. Corresponding authors: Jihan Kim (Jihankim@kaist.ac.kr), Steve Park (stevepark@kaist.ac.kr), Hee-Tak Kim (heetak.kim@kaist.ac.kr). Abstract: Recent advances in data-driven research have shown great potential in understanding the intricate relationships between materials and their performances. Herein, we introduce a novel multimodal data-driven approach employing an Automatic Battery data Collector (ABC) that integrates a large language model (LLM) with an automatic graph mining tool, Material Graph Digitizer (MatGD). This platform enables state-of-the-art accuracy in extracting battery material data and cyclability performance metrics from diverse textual and graphical data sources. From the database derived through the ABC platform, we developed machine learning models that can accurately predict the capacity and stability of lithium metal batteries; these are the first models developed to achieve such predictions. Our models were also experimentally validated, confirming the practical applicability and reliability of our data-driven approach. Introduction: Lithium metal batteries (LMBs) are a promising next-generation device that can achieve high capacity using lithium metal as an anode due to its exceptionally low density (0.534 g cm [...] Therefore, these studies lack sufficient information to discern a comprehensive effect of different components on the battery performance. Additionally, previous mining research focused not on entire battery cells but rather on the characteristics of individual battery components. Moreover, these studies were limited by the small number of entities considered and did not extract quantitative information such as concentrations or ratios. Furthermore, the absence of automatic graph mining tools made it difficult to obtain performance data from graphs, such as specific capacity and cycle stability.
- Asia > South Korea > Daejeon > Daejeon (0.04)
- North America > United States (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Materials > Metals & Mining > Lithium (1.00)
- Energy > Energy Storage (1.00)
- Electrical Industrial Apparatus (1.00)
Humanity in AI: Detecting the Personality of Large Language Models
Zhan, Baohua, Huang, Yongyi, Cui, Wenyao, Zhang, Huaping, Shang, Jianyun
Questionnaires are a common method for detecting the personality of Large Language Models (LLMs). However, their reliability is often compromised by two main issues: hallucinations (where LLMs produce inaccurate or irrelevant responses) and the sensitivity of responses to the order of the presented options. To address these issues, we propose combining text mining with the questionnaire method. Text mining can extract psychological features from the LLMs' responses without being affected by the order of options. Furthermore, because this method does not rely on specific answers, it reduces the influence of hallucinations. By normalizing the scores from both methods and calculating the root mean square error, our experimental results confirm the effectiveness of this approach. To further investigate the origins of personality traits in LLMs, we conduct experiments on both pre-trained language models (PLMs), such as BERT and GPT, and conversational models (ChatLLMs), such as ChatGPT. The results show that LLMs do exhibit certain personalities; for example, ChatGPT and ChatGLM exhibit the trait of 'Conscientiousness'. Additionally, we find that the personalities of LLMs are derived from their pre-training data. The instruction data used to train ChatLLMs can enhance the generation of data containing personalities and expose their hidden personality. We compare the results with the human average personality score and find that the personalities of FLAN-T5 among PLMs and ChatGPT among ChatLLMs are the most similar to that of a human, with score differences of 0.34 and 0.22, respectively.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada (0.04)
- Asia > Singapore (0.04)
- Asia > China > Beijing > Beijing (0.04)
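The abstract above compares two scoring methods by normalizing their scores and computing a root mean square error (RMSE). A minimal sketch of that comparison follows, assuming min-max normalization onto [0, 1]; the abstract does not say which normalization was used, and the function names and trait scores below are hypothetical illustrations, not data from the paper.

```python
import math


def min_max_normalize(scores, low, high):
    """Map scores from the range [low, high] onto [0, 1].

    Assumption: min-max scaling; the paper's actual normalization is not stated.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    return [(s - low) / (high - low) for s in scores]


def rmse(a, b):
    """Root mean square error between two equally long score lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical per-trait scores (illustrative only, not from the paper).
questionnaire_scores = [3.0, 4.0, 2.0, 5.0, 1.0]  # assumed 1-5 questionnaire scale
text_mining_scores = [0.4, 0.8, 0.3, 0.9, 0.1]    # assumed to lie in 0-1 already

q_norm = min_max_normalize(questionnaire_scores, 1.0, 5.0)
t_norm = min_max_normalize(text_mining_scores, 0.0, 1.0)
print(f"RMSE between methods: {rmse(q_norm, t_norm):.3f}")
```

A lower RMSE means the two methods agree more closely once their scores are on a common scale.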
Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification
Peng, Letian, Gu, Yi, Dong, Chengyu, Wang, Zihan, Shang, Jingbo
For extremely weak-supervised text classification, pioneering research generates pseudo labels by mining texts similar to the class names from the raw corpus, which may end up with very limited or even no samples for the minority classes. Recent works have started to generate the relevant texts by prompting LLMs using the class names or definitions; however, there is a high risk that LLMs cannot generate in-distribution (i.e., similar to the corpus where the text classifier will be applied) data, leading to ungeneralizable classifiers. In this paper, we combine the advantages of these two approaches and propose to bridge the gap via a novel framework, text grafting, which aims to obtain clean and near-distribution weak supervision for minority classes. Specifically, we first use LLM-based logits to mine masked templates from the raw corpus, which have a high potential for data synthesis into the target minority class. Then, the templates are filled by state-of-the-art LLMs to synthesize near-distribution texts falling into minority classes. Text grafting shows significant improvement over direct mining or synthesis on minority classes. We also use analysis and case studies to understand the properties of text grafting.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- North America > Dominican Republic (0.04)
- (16 more...)
Exploration of Attention Mechanism-Enhanced Deep Learning Models in the Mining of Medical Textual Data
Xiao, Lingxi, Li, Muqing, Feng, Yinqiu, Wang, Meiqi, Zhu, Ziyi, Chen, Zexi
The research explores the use of a deep learning model employing an attention mechanism in medical text mining. It targets the challenge of analyzing unstructured text information within medical data, and seeks to enhance the model's capability to identify essential medical information by incorporating deep learning and attention mechanisms. This paper reviews the basic principles and typical model architecture of attention mechanisms and shows the effectiveness of their application in the tasks of disease prediction, drug side effect monitoring, and entity relationship extraction. To address the particularities of medical texts, an adaptive attention model integrating domain knowledge is proposed, and its ability to understand medical terms and process complex contexts is optimized. The experiment verifies the model's effectiveness in improving task accuracy and robustness, especially when dealing with long text. The research outlook discusses future directions: enhancing model interpretability, realizing cross-domain knowledge transfer, and adapting to low-resource scenarios. These provide a new perspective and methodological support, as well as a theoretical basis and technical reference, for the development of intelligent medical information processing and clinical decision support systems.
- North America > United States > North Carolina (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > Los Angeles County > Pasadena (0.04)
Text mining in education
Ferreira-Mello, R., Andre, M., Pinheiro, A., Costa, E., Romero, C.
The explosive growth of online education environments is generating a massive volume of data, especially in text format from forums, chats, social networks, assessments, and essays, among others. This produces exciting challenges on how to mine text data in order to find useful knowledge for educational stakeholders. Despite the increasing number of educational applications of text mining published recently, we have not found any paper surveying them. In this line, this work presents a systematic overview of the current status of the Educational Text Mining field. Our final goal is to answer three main research questions: Which are the text mining techniques most used in educational environments? Which are the most used educational resources? And which are the main applications or educational goals? Finally, we outline the conclusions and the most interesting future trends.
- South America > Brazil > Pernambuco (0.04)
- South America > Brazil > Alagoas (0.04)
- North America > United States > New York (0.04)
- (4 more...)
- Research Report (1.00)
- Overview (1.00)
- Instructional Material > Course Syllabus & Notes (0.93)
- Education > Educational Technology > Educational Software > Computer Based Training (1.00)
- Education > Educational Setting > Online (1.00)
- Education > Curriculum > Subject-Specific Education (1.00)
- Education > Assessment & Standards > Student Performance (0.93)
A New Multifractal-based Deep Learning Model for Text Mining
Wang, Zhenhua, Ren, Ming, Gao, Dong
Text mining aims to automatically and efficiently uncover and explore valuable information or patterns from noisy, irregular and unstructured texts [1-2], thereby enabling us to gain a deeper understanding of the underlying meaning and context within the text, explore the knowledge it contains, and uncover hidden insights. It can generate an informed understanding of the content and has become significant in decision-making across various sectors and industries. For example, we can understand users' preferences [3], sentiments [4], opinions [5], concerns [6] and interests [7] by mining the text generated by users, and thus infer their intentions and purposes [8-11]. Text mining also supports more sophisticated security risk management practices [12]. Additionally, text mining underpins various natural language processing applications such as knowledge graphs [13-14], question-answer dialogue systems [15-16], and recommendation systems [17-18]. Text mining mainly comprises entity recognition and text classification, which differ in form. The purpose of entity recognition is to automatically identify expected knowledge from text [19], such as defect knowledge and technical terms in technical reports [20-21].
- Information Technology (0.66)
- Energy > Oil & Gas > Upstream (0.46)
BioBERT Based SNP-traits Associations Extraction from Biomedical Literature
Dehghani, Mohammad, Bokharaeian, Behrouz, Yazdanparast, Zahra
Scientific literature contains a considerable amount of information that provides an excellent opportunity for developing text mining methods to extract biomedical relationships. An important type of information is the relationship between single nucleotide polymorphisms (SNPs) and traits. In this paper, we present a BioBERT-GRU method to identify SNP-trait associations. Based on the evaluation of our method on the SNPPhenA dataset, we conclude that this new method performs better than previous machine learning and deep learning based methods. BioBERT-GRU achieved a precision of 0.883, a recall of 0.882, and an F1-score of 0.881.
- Asia > Middle East > Iran > Tehran Province > Tehran (0.05)
- North America > United States > District of Columbia > Washington (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
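For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how precision, recall, and F1 are computed from classification counts. The counts are hypothetical and the function name is an illustration; the abstract does not state how its scores were averaged across classes, and per-class averaging can make a reported F1 differ slightly from the harmonic mean of the reported precision and recall.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true positives,
    false positives, and false negatives for a single class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one association class (not from the paper).
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```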
Machine Learning Approach for Cancer Entities Association and Classification
Jeyakodi, G., Pal, Arkadeep, Gupta, Debapratim, Sarukeswari, K., Amouda, V.
Numerous biomedical research articles are published regularly, adding knowledge to the accumulated literature on different diseases, such as cancer, neurodegenerative diseases, and hereditary diseases. Cancer is one of the leading causes of global mortality, owing to factors such as lifestyle habits, radiation exposure, viral infections, and tobacco consumption [1] [2]. These factors ultimately cause genetic changes in a cell of a tissue, which make it cancerous. Because cancer research is given top priority compared to other human diseases, an enormous number of articles have been published [3] [4] in a short period [5]. This literature can serve as a relevant source for cancer knowledge discovery in different fields of diagnostics, application of drugs, genetic association, prevention, and treatment. Automated downloading of articles and extraction of related entities will accelerate the progress of research. Natural Language Processing (NLP) helps computers communicate with humans in their language and converts unstructured data into structured data to improve the accuracy of text mining. NLP functions help interpret human query language to discover knowledge from literature without much manual effort [6]. Named Entity Recognition (NER) and text classification are mainly used for text mining [7].
- Asia > India > Puducherry (0.05)
- Oceania > New Zealand > North Island > Waikato (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (3 more...)