AITopics | wordcloud

wordcloud

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LLM-Generated Negative News Headlines Dataset: Creation and Benchmarking Against Real Journalism

Babalola, Olusola, Ojokoh, Bolanle, Boyinbode, Olutayo

arXiv.org Artificial Intelligence, Nov-18-2025

This research examines the potential of datasets generated by Large Language Models (LLMs) to support Natural Language Processing (NLP) tasks, aiming to overcome challenges related to data acquisition and privacy concerns associated with real-world data. Focusing on negative valence text, a critical component of sentiment analysis, we explore the use of LLM-generated synthetic news headlines as an alternative to real-world data. A specialized corpus of negative news headlines was created using tailored prompts to capture diverse negative sentiments across various societal domains. The synthetic headlines were validated by expert review and further analyzed in embedding space to assess their alignment with real-world negative news in terms of content, tone, length, and style. Key metrics such as correlation with real headlines, perplexity, coherence, and realism were evaluated. The synthetic dataset was benchmarked against two sets of real news headlines using evaluations including the Comparative Perplexity Test, Comparative Readability Test, Comparative POS Profiling, BERTScore, and Comparative Semantic Similarity. Results show that the generated headlines match real headlines, with the only marked divergence being in the proper noun score of the POS profile test.

large language model, machine learning, natural language, (18 more...)

2511.11591

Country:

Africa > Nigeria (0.28)
Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media > News (1.00)
Government (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Opinion Mining on Offshore Wind Energy for Environmental Engineering

Bittencourt, Isabele, Varde, Aparna S., Lal, Pankaj

arXiv.org Artificial Intelligence, Sep-21-2024

In this paper, we conduct sentiment analysis on social media data to study mass opinion about offshore wind energy. We adapt three machine learning models, namely TextBlob, VADER, and SentiWordNet, because each provides different functions. TextBlob provides subjectivity analysis as well as polarity classification. VADER offers cumulative sentiment scores. SentiWordNet considers sentiments with reference to context and performs classification accordingly. Techniques in NLP are harnessed to gather meaning from the textual data in social media. Data visualization tools are suitably deployed to display the overall results. This work is much in line with citizen science and smart governance via involvement of mass opinion to guide decision support. It exemplifies the role of Machine Learning and NLP here.

artificial intelligence, natural language, offshore wind energy, (16 more...)

2409.14292

Country:

Europe > Germany (0.05)
North America > United States > New Jersey > Atlantic County > Atlantic City (0.04)
Europe > United Kingdom (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Industry: Energy > Renewable > Wind (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Web scraping and text analysis in R and GGplot2 – A.Z. Andis Arietta

#artificialintelligence, Dec-29-2022, 23:21:23 GMT

I recently needed to learn text mining for a project at work. I generally learn more quickly with a real-world project. So, I turned to a topic I love: Wilderness, to see how I could apply the skills of text scrubbing and natural language processing. You can clone my Git repo for the project or follow along in the post below. The first portion of this post will cover web scraping, then text mining, and finally analysis and visualization.

text mining, western region, wilderness area, (12 more...)

Country:

North America > United States > Hawaii (0.05)
North America > Puerto Rico (0.05)
Pacific Ocean > North Pacific Ocean > Cook Inlet (0.05)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining > Web Mining (0.61)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.40)

Providing Insights for Open-Response Surveys via End-to-End Context-Aware Clustering

Esmaeilzadeh, Soheil, Williams, Brian, Shamsi, Davood, Vikingstad, Onar

arXiv.org Artificial Intelligence, Oct-8-2022

Teachers often conduct surveys in order to collect data from a predefined group of students to gain insights into topics of interest. When analyzing surveys with open-ended textual responses, it is extremely time-consuming, labor-intensive, and difficult to manually process all the responses into an insightful and comprehensive report. In the analysis step, traditionally, the teacher has to read each of the responses and decide on how to group them in order to extract insightful information. Even though it is possible to group the responses only using certain keywords, such an approach would be limited since it not only fails to account for embedded contexts but also cannot detect polysemous words or phrases and semantics that are not expressible in single words. In this work, we present a novel end-to-end context-aware framework that extracts, aggregates, and abbreviates embedded semantic patterns in open-response survey data. Our framework relies on a pre-trained natural language model in order to encode the textual data into semantic vectors. The encoded vectors then get clustered either into an optimally tuned number of groups or into a set of groups with pre-specified titles. In the former case, the clusters are then further analyzed to extract a representative set of keywords or summary sentences that serve as the labels of the clusters. In our framework, for the designated clusters, we finally provide context-aware wordclouds that demonstrate the semantically prominent keywords within each group. Honoring user privacy, we have successfully built the on-device implementation of our framework suitable for real-time analysis on mobile devices and have tested it on a synthetic dataset. Our framework reduces the costs at-scale by automating the process of extracting the most insightful information pieces from survey data.
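The branch that groups responses under pre-specified titles can be sketched in a few lines. This is a minimal sketch under loud assumptions: a bag-of-words count vector stands in for the pre-trained language model the abstract describes, and the function names (`embed`, `cosine`, `assign_to_titles`) and the sample data in the usage note are hypothetical, not taken from the paper.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in encoder (assumption): bag-of-words counts instead of a
    pre-trained language model's semantic vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_to_titles(responses, titles):
    """Put each response in the group whose title vector is most similar.

    When a response shares no words with any title, max() falls back to
    the first title in the list.
    """
    title_vecs = {t: embed(t) for t in titles}
    groups = {t: [] for t in titles}
    for response in responses:
        vec = embed(response)
        best = max(titles, key=lambda t: cosine(vec, title_vecs[t]))
        groups[best].append(response)
    return groups
```

For example, `assign_to_titles(["Too much homework every week", "Lectures are clear"], ["homework", "lectures"])` places the first response under "homework" and the second under "lectures". A real encoder would also catch paraphrases that share no surface words with the title, which this word-overlap stand-in cannot.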

artificial intelligence, machine learning, natural language, (20 more...)

doi: 10.1007/978-3-031-11644-5

2203.01294

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Santa Clara County > Cupertino (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China (0.04)

Genre:

Questionnaire & Opinion Survey (0.94)
Overview (0.68)
Instructional Material (0.67)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.95)

Resume Screening using Deep Learning on Cainvas

#artificialintelligence, Jan-10-2022, 06:20:33 GMT

Resume screening is necessary when companies receive thousands of applications for different roles and need to find suitable matches. For this project, the dataset consists of two columns, Category and Resume, where Category denotes the field (e.g., Data Science, HR, Testing). Using value_counts on Category, we can find the frequency distribution of the categories present in the dataset. During pre-processing, we remove links, hashtags, URLs, and similar tokens, as these are irrelevant to the resume's content. Further, using nltk, we also remove stopwords (e.g., words like 'are', 'the', 'or') that add no meaning to the content.
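The counting and cleaning steps described above can be sketched with the standard library alone. This is an illustrative sketch, not the post's code: `collections.Counter` stands in for pandas' `value_counts`, the small `STOPWORDS` set is a hypothetical substitute for nltk's stopword corpus, and the function names and regular expressions are assumptions.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; the post uses nltk's corpus.
STOPWORDS = {"are", "the", "or", "a", "an", "and", "in", "of", "to", "is"}


def category_counts(rows):
    """Frequency of each category, most common first.

    `rows` is a list of (category, resume_text) pairs.
    """
    return Counter(category for category, _ in rows).most_common()


def clean_resume(text):
    """Remove links, hashtags, non-letters, and stopwords from resume text."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # links and URLs
    text = re.sub(r"#\w+", " ", text)                   # hashtags
    text = re.sub(r"[^a-zA-Z\s]", " ", text)            # punctuation, digits
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return " ".join(tokens)
```

Cleaning before tokenizing keeps URL fragments from leaking into the vocabulary; the order of the three substitutions matters, because stripping punctuation first would break the link pattern.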

cainva, deep learning, résumé screening, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Predicting the Difficulty of Texts Using Machine Learning and Getting a Visual Representation of…

#artificialintelligence, Dec-29-2021, 17:28:43 GMT

Text data is ubiquitous. There is a lot of text present in different forms such as posts, books, articles, and blogs. What is more interesting is that there is a subfield of Artificial Intelligence called Natural Language Processing (NLP) that converts text into a form that can be used for machine learning. I know that sounds like a lot, but getting to know the details and the proper implementation of machine learning algorithms ensures that one learns the important tools in the process. Since newer and better libraries are being created for machine learning, it makes sense to learn some of the state-of-the-art tools that can be used for predictions. I've recently come across a challenge on Kaggle about predicting the difficulty of text.

library, mathematical vector, visual representation, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.32)

Predicting Fake News using NLP and Machine Learning

#artificialintelligence, May-14-2021, 10:25:06 GMT

The ratio of genuine to fake news shifts from 1:1 to 4:5. The median length is lower for fake articles, but they also have many outliers. The lengths appear to start from 0, which is concerning. Using .describe() shows that they actually start from 1. So I took a look at these texts and found that they are blank.
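The length check and blank-text finding can be illustrated as follows. This is a sketch under assumptions: the post works with a pandas dataframe and `.describe()`, whereas this version uses plain lists and the `statistics` module, and the function names and the (text, label) row layout are hypothetical.

```python
import statistics


def length_summary(texts):
    """Minimum, median, and maximum character length of the texts."""
    lengths = [len(t) for t in texts]
    return {
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }


def drop_blank(rows):
    """Keep only (text, label) rows whose text has non-whitespace content.

    A text such as " " has length 1, so a length filter of `> 0` would
    keep it; stripping whitespace first catches these blank entries.
    """
    return [(text, label) for text, label in rows if text.strip()]
```

This shows why a minimum length of 1 can still mean an empty article: a single space counts as one character, so the blank rows only disappear once whitespace is stripped before the check.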

expand contraction, nlp and machine learning, wordcloud, (1 more...)

Industry: Media > News (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)

Spam Email Detection Using Machine Learning

#artificialintelligence, Apr-21-2021, 20:17:08 GMT

There are 4,825 ham and 747 spam messages. This indicates that the data is imbalanced, which needs to be fixed. The top ham message is "Sorry, I'll call later", whereas the top spam message is "Please call our customer service…"; they occurred 30 and 4 times, respectively. First, let's create separate dataframes for ham and spam messages and convert each to a NumPy array and then to a list, to generate a WordCloud later. Since this is text data, it contains many unnecessary stopwords such as articles and prepositions, which need to be removed.
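The class counts and top messages above can be reproduced in spirit with a short sketch. Note the assumptions: the excerpt does not say how the imbalance is fixed, so random downsampling to the smallest class is only one possible choice, and the function names and the (label, text) row layout are hypothetical.

```python
import random
from collections import Counter


def class_counts(rows):
    """Number of messages per label; `rows` holds (label, text) pairs."""
    return Counter(label for label, _ in rows)


def top_message(rows, label):
    """Most frequent message text for a label, with its count."""
    counts = Counter(text for lab, text in rows if lab == label)
    return counts.most_common(1)[0]


def downsample(rows, seed=0):
    """Assumed fix for imbalance: sample every class down to the size
    of the smallest class, without replacement."""
    rng = random.Random(seed)
    by_label = {}
    for label, text in rows:
        by_label.setdefault(label, []).append((label, text))
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced
```

Downsampling discards majority-class data, so with 4,825 ham and 747 spam messages it would keep 747 of each; oversampling or class weights are alternatives when that loss matters.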

architecture, ham message, spam message, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The discovery of wine's structural form

#artificialintelligence, Jan-30-2021, 22:26:54 GMT

Today I will present a guided tutorial for applying Kemp & Tenenbaum's brilliant "form discovery" algorithm to a wine dataset. Ultimately, this provides a data-driven map to choose wines from, based on our tastes. If you are, like me, fond of data science, machine learning, cognition and/or a wine lover, then you might find this post interesting. Actually, if you know of ways it could be improved I'd love to hear them! First of all, like every recipe, we'll start with a list of things we need: Essentially, in their work Kemp & Tenenbaum created an algorithm which finds the best structural representation for a dataset, without any assumption or indication about this dimension.

algorithm, dataset, wine variety, (15 more...)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (1.00)

Technology:

Information Technology > Data Science (0.54)
Information Technology > Artificial Intelligence > Machine Learning (0.34)

Develop Text into WordCloud in Python

#artificialintelligence, Oct-31-2020, 23:55:55 GMT

Word clouds, or tag clouds, are graphical representations of word frequency that give greater prominence to words that appear more frequently in a source text. The larger the word in the visual, the more common the word was in the document(s). A word cloud is a data visualization technique for representing text data in which the size of each word indicates its frequency or importance. Significant textual data points can be highlighted using a word cloud. Word clouds are widely used for analyzing data from social network websites. To generate a word cloud in Python, the modules needed are matplotlib, pandas, and wordcloud.
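The core idea, that a word's size tracks its frequency, can be shown without any plotting library. This is a minimal sketch and not the wordcloud package's algorithm: the linear scaling between `min_size` and `max_size`, the function name `word_sizes`, and its parameters are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_sizes(text, min_size=10, max_size=60, top_n=5):
    """Map the `top_n` most frequent words to font sizes.

    Sizes are scaled linearly (an assumption) so that the most frequent
    word gets `max_size` and the least frequent kept word gets `min_size`.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = highest - lowest
    sizes = {}
    for word, count in counts:
        # When all kept words are equally frequent, give them the max size.
        scale = (count - lowest) / span if span else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes
```

A rendering library would then place each word at its computed size; in practice stopwords are usually removed first so that words like "the" do not dominate the cloud.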

artificial intelligence, natural language, word cloud, (7 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.98)
