AITopics | keybert

A BERT-Based Summarization approach for depression detection

Gavalan, Hossein Salahshoor, Rastgoo, Mohmmad Naim, Nakisa, Bahareh

arXiv.org Artificial Intelligence | Sep-12-2024

Depression is a globally prevalent mental disorder with potentially severe repercussions if not addressed, especially in individuals with recurrent episodes. Prior research has shown that early intervention has the potential to mitigate or alleviate symptoms of depression. However, implementing such interventions in a real-world setting may pose considerable challenges. A promising strategy involves leveraging machine learning and artificial intelligence to autonomously detect depression indicators from diverse data sources. One of the most widely available and informative data sources is text, which can reveal a person's mood, thoughts, and feelings. In this context, virtual agents programmed to conduct interviews using clinically validated questionnaires, such as those found in the DAIC-WOZ dataset, offer a robust means for depression detection through linguistic analysis. Utilizing BERT-based models, which are powerful and versatile yet use fewer resources than contemporary large language models, to convert text into numerical representations significantly enhances the precision of depression diagnosis. These models adeptly capture complex semantic and syntactic nuances, improving the detection accuracy of depressive symptoms. Given the inherent limitations of these models concerning text length, our study proposes text summarization as a preprocessing technique to diminish the length and intricacies of input texts. Implementing this method within our uniquely developed framework for feature extraction and classification yielded an F1-score of 0.67 on the test set, surpassing all prior benchmarks, and 0.81 on the validation set, exceeding most previous results on the DAIC-WOZ dataset. Furthermore, we have devised a depression lexicon to assess summary quality and relevance. This lexicon constitutes a valuable asset for ongoing research in depression detection.

dataset, depression, depression detection, (16 more...)

2409.08483

Country:

Oceania > Australia (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Oceania > New Zealand (0.04)
Europe > Iceland > Capital Region > Reykjavik (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Automated Question Generation for Science Tests in Arabic Language Using NLP Techniques

Tami, Mohammad, Ashqar, Huthaifa I., Elhenawy, Mohammed

arXiv.org Artificial Intelligence | Jun-11-2024

Question generation for educational assessments is a growing field within artificial intelligence applied to education. These question-generation tools have significant importance in the educational technology domain, such as intelligent tutoring systems and dialogue-based platforms. The automatic generation of assessment questions, which entail clear-cut answers, usually relies on syntactical and semantic indications within declarative sentences, which are then transformed into questions. Recent research has explored the generation of educational assessment questions in Arabic. The reported performance has been adversely affected by inherent errors, including sentence parsing inaccuracies, named entity recognition issues, and errors stemming from rule-based question transformation. Furthermore, the complexity of lengthy Arabic sentences has contributed to these challenges. This research presents an innovative Arabic question-generation system built upon a three-stage process: keyword and key phrase extraction, question generation, and subsequent ranking. The aim is to tackle the difficulties associated with automatically generating assessment questions in the Arabic language. The proposed approach and results show a precision of 83.50%, a recall of 78.68%, and an F1 score of 80.95%, indicating the framework's high efficiency. Human evaluation further confirmed the model's efficiency, receiving an average rating of 84%.

machine learning, question answering, question generation, (20 more...)

2406.0852

Country:

Asia > Middle East > Palestine (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.55)
Education > Assessment & Standards > Student Performance (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Exploring acceptance of autonomous vehicle policies using KeyBERT and SNA: Targeting engineering students

Ha, Jinwoo, Kim, Dongsoo

arXiv.org Artificial Intelligence | Jul-18-2023

This study aims to explore user acceptance of Autonomous Vehicle (AV) policies with improved text-mining methods. Recently, South Korean policymakers have viewed Autonomous Driving Car (ADC) and Autonomous Driving Robot (ADR) as next-generation means of transportation that will reduce the cost of transporting passengers and goods. They support the construction of V2I and V2V communication infrastructures for ADC and recognize that ADR is equivalent to pedestrians to promote its deployment into sidewalks. To fill the gap where end-user acceptance of these policies is not well considered, this study applied two text-mining methods to the comments of graduate students in the fields of Industrial, Mechanical, and Electronics-Electrical-Computer. One is the Co-occurrence Network Analysis (CNA) based on TF-IWF and Dice coefficient, and the other is the Contextual Semantic Network Analysis (C-SNA) based on both KeyBERT, which extracts keywords that contextually represent the comments, and double cosine similarity. The reason for comparing these approaches is to balance interest not only in the implications for the AV policies but also in the need to apply quality text mining to this research domain. Significantly, the limitation of frequency-based text mining, which does not reflect textual context, and the trade-off of adjusting thresholds in Semantic Network Analysis (SNA) were considered. In comparing the two approaches, the C-SNA provided the information necessary to understand users' voices using fewer nodes and features than the CNA. The users who pre-emptively understood the AV policies based on their engineering literacy and the given texts revealed potential risks of the AV accident policies. This study adds suggestions to manage these risks to support the successful deployment of AVs on public roads.
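
The Dice coefficient mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common document-level form, 2 * (documents containing both terms) / (documents containing the first + documents containing the second). The function name `dice_cooccurrence` and the sample comments are hypothetical, and the paper's TF-IWF weighting and thresholds are not reproduced here.

```python
from itertools import combinations


def dice_cooccurrence(documents):
    """Return {(term_a, term_b): dice} for term pairs that co-occur.

    Each document is a list of tokens; repeated tokens within one
    document are counted once (document-level co-occurrence).
    """
    doc_sets = [set(doc) for doc in documents]

    # Number of documents that contain each term.
    doc_freq = {}
    for terms in doc_sets:
        for term in terms:
            doc_freq[term] = doc_freq.get(term, 0) + 1

    # Number of documents that contain both terms of a pair.
    pair_freq = {}
    for terms in doc_sets:
        for a, b in combinations(sorted(terms), 2):
            pair_freq[(a, b)] = pair_freq.get((a, b), 0) + 1

    return {
        (a, b): 2 * n / (doc_freq[a] + doc_freq[b])
        for (a, b), n in pair_freq.items()
    }


# Hypothetical tokenized comments, for illustration only.
comments = [
    ["road", "safety"],
    ["road", "policy"],
    ["road", "safety", "policy"],
]
edges = dice_cooccurrence(comments)
```

Pairs whose score clears a chosen threshold would become the edges of a co-occurrence network; where to set that threshold is the trade-off the abstract refers to.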

artificial intelligence, machine learning, natural language, (20 more...)

2307.09014

Country:

Asia > South Korea > Seoul > Seoul (0.05)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
(5 more...)

Genre:

Research Report > New Finding (0.93)
Questionnaire & Opinion Survey (0.93)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords

Golchin, Shahriar, Surdeanu, Mihai, Tavabi, Nazgol, Kiapour, Ata

arXiv.org Artificial Intelligence | Jul-14-2023

We propose a novel task-agnostic in-domain pre-training method that sits between generic pre-training and fine-tuning. Our approach selectively masks in-domain keywords, i.e., words that provide a compact representation of the target domain. We identify such keywords using KeyBERT (Grootendorst, 2020). We evaluate our approach using six different settings: three datasets combined with two distinct pre-trained language models (PLMs). Our results reveal that the fine-tuned PLMs adapted using our in-domain pre-training strategy outperform PLMs that used in-domain pre-training with random masking as well as those that followed the common pre-train-then-fine-tune paradigm. Further, the overhead of identifying in-domain keywords is reasonable, e.g., 7-15% of the pre-training time (for two epochs) for BERT Large (Devlin et al., 2019).
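
As a rough illustration of the idea of masking in-domain keywords rather than random tokens, the sketch below replaces whole-word keyword matches with BERT's `[MASK]` token. It is a simplification under stated assumptions: the paper operates inside masked-language-model pre-training, whereas this works on plain strings, and the function name `mask_keywords` and the keyword list are hypothetical (in the paper the keywords come from KeyBERT).

```python
import re

MASK_TOKEN = "[MASK]"  # mask token used by BERT-style models


def mask_keywords(text, keywords, mask_token=MASK_TOKEN):
    """Replace each whole-word, case-insensitive keyword match with mask_token."""
    keyword_set = {k.lower() for k in keywords}
    # Split into alternating word and non-word pieces so that spacing
    # and punctuation are preserved when the pieces are rejoined.
    pieces = re.findall(r"\w+|\W+", text)
    return "".join(
        mask_token if piece.lower() in keyword_set else piece
        for piece in pieces
    )


# Hypothetical in-domain keywords, for illustration only.
masked = mask_keywords(
    "The patient reported chest pain.", ["patient", "pain"]
)
```

Here `masked` is `"The [MASK] reported chest [MASK]."`, so the model would have to predict the domain-specific words rather than arbitrary ones.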

information retrieval, machine learning, natural language, (18 more...)

2307.0716

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Arizona > Pima County > Tucson (0.14)
North America > United States > Oregon (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.69)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.71)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Semantic Keywords And Keyphrases Extraction With KeyBERT

#artificialintelligence | Feb-14-2022, 10:06:11 GMT

It is also possible to use different embedding models for multilingual tasks if you want to work with other languages. N-gram word/expression retrieval: candidate keywords and key phrases are extracted from the same document using the n-gram approach. We get single keywords when the n-gram range is (1, 1). N-gram embedding: each of those n-grams is then embedded using the same embedding model as the one used for the original document. Cosine similarity search: among the candidate words/phrases/expressions, the ones most similar to the input document are selected using the cosine similarity metric.
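
The three steps described above (n-gram candidates, embedding, cosine similarity ranking) can be sketched in plain Python. This is an illustration of the pipeline's shape only, not KeyBERT's implementation: `toy_embed` is a bag-of-words stand-in for a BERT sentence embedding, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_candidates(tokens, ngram_range=(1, 1)):
    """Collect distinct n-grams with lengths in ngram_range, in first-seen order."""
    low, high = ngram_range
    candidates = []
    for n in range(low, high + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase not in candidates:
                candidates.append(phrase)
    return candidates


def toy_embed(text):
    """Stand-in embedding: word-count vector (an assumption, NOT a BERT embedding)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_keywords(doc, ngram_range=(1, 1), top_n=3):
    """Rank n-gram candidates by similarity to the whole document."""
    doc_vec = toy_embed(doc)
    candidates = ngram_candidates(tokenize(doc), ngram_range)
    scored = [(c, cosine(toy_embed(c), doc_vec)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


doc = "keyword extraction finds the keyword that best describes a document"
top = extract_keywords(doc, ngram_range=(1, 1), top_n=1)
```

With the count-based stand-in, the ranking simply favors frequent words; swapping `toy_embed` for a contextual embedding model is what would make the selection semantic, as the article describes.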

input document, keybert, semantic keyword and keyphrase extraction

Technology: Information Technology > Artificial Intelligence > Natural Language (0.93)

How to Extract Relevant Keywords with KeyBERT

#artificialintelligence | Jun-17-2021, 20:20:36 GMT

There are many powerful techniques that perform keyword extraction (e.g. …). However, they are mainly based on the statistical properties of the text and don't necessarily take into account the semantic aspects of the full document. KeyBERT is a minimal and easy-to-use keyword extraction technique that aims at solving this issue. It leverages the BERT language model and relies on the transformers library. So go check his repo (and clone it) if you're interested in using it.

extract relevant keyword, keybert, keyword, (8 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
