AITopics

2309.0761

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
North America > United States (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.90)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Riego, Neil Christian R., Villarba, Danny Bell

Utilization of Multinomial Naive Bayes Algorithm and Term Frequency Inverse Document Frequency (TF-IDF Vectorizer) in Checking the Credibility of News Tweet in the Philippines

arXiv.org Artificial Intelligence, May-30-2023

The digitalization of news media has become both a good indicator of progress and a signal of more threats. Media disinformation, or fake news, is one of these threats, and action is needed to fight it. This paper uses ground-truth-based annotations and TF-IDF feature extraction on news articles, which then serve as the training data set for Multinomial Naive Bayes. The model has an accuracy of 99.46% in training and 88.98% in predicting unseen data. Tagging fake news as real news is a concerning point in the predictions, as indicated by the F1 score of 89.68%, and could have a negative impact. To prevent this, the authors suggest further improving the corpus collection and using ensemble machine learning to reinforce the prediction.
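To make the classifier named in this abstract concrete, here is a minimal sketch of Multinomial Naive Bayes with Laplace smoothing. It is not the authors' implementation: it trains on raw term counts rather than TF-IDF features for brevity, and the function names (`train_mnb`, `predict`), the `alpha` parameter, and the toy headlines are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs, labels, alpha=1.0):
    """Fit per-class log priors and smoothed log likelihoods.

    docs is a list of token lists; alpha is the Laplace smoothing constant.
    """
    vocab = {term for doc in docs for term in doc}
    class_sizes = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)

    model = {}
    for label, n_class_docs in class_sizes.items():
        # Smoothed denominator: all tokens in the class plus alpha per vocab term.
        total = sum(term_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_class_docs / len(docs)),
            "log_lik": {
                term: math.log((term_counts[label][term] + alpha) / total)
                for term in vocab
            },
        }
    return model

def predict(model, doc):
    """Return the label with the highest log posterior; unseen terms are skipped."""
    def score(params):
        return params["log_prior"] + sum(
            params["log_lik"][term] for term in doc if term in params["log_lik"]
        )
    return max(model, key=lambda label: score(model[label]))

# Hypothetical toy data, not taken from the paper's corpus.
train_docs = [
    "shocking miracle cure doctors hate".split(),
    "miracle pill shocking secret".split(),
    "senate passes budget bill".split(),
    "court issues ruling on budget".split(),
]
train_labels = ["fake", "fake", "real", "real"]
model = train_mnb(train_docs, train_labels)
```

With this toy model, `predict(model, "shocking secret cure".split())` returns `"fake"` and `predict(model, "senate budget ruling".split())` returns `"real"`. Working in log space avoids numerical underflow when many small probabilities are multiplied.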

artificial intelligence, dataset, machine learning, (14 more...)

2306.00018

Country:

Asia > Philippines > Luzon > National Capital Region > City of Manila (0.05)
South America > Colombia > Meta Department > Villavicencio (0.05)
Asia > Philippines > Visayas > Central Visayas > Province of Cebu > City of Lapu-Lapu (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.82)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

#artificialintelligence, Mar-21-2023, 06:10:58 GMT

Understanding TF-IDF in NLP: A Comprehensive Guide

Natural Language Processing (NLP) is an area of computer science that focuses on the interaction between human language and computers. One of the fundamental tasks of NLP is to extract relevant information from large volumes of unstructured data. In this article, we will explore one of the most popular techniques used in NLP called TF-IDF. TF-IDF is a numerical statistic that reflects the importance of a word in a document. It is commonly used in NLP to represent the relevance of a term to a document or a corpus of documents.
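As a rough illustration of the statistic described above, the sketch below computes one common TF-IDF variant: term frequency normalized by document length, multiplied by the unsmoothed inverse document frequency log(N / df). Libraries often use smoothed or otherwise adjusted formulas, so the exact weights here are an assumption; the function name `tf_idf` and the sample sentences are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(docs)
```

Because "the" occurs in every document, its inverse document frequency is log(3/3) = 0 and its weight is zero everywhere, while "mat", which occurs in only one document, gets the highest weight among the terms of the first document.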

corpus, frequency, tf-idf, (14 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

arXiv.org Artificial Intelligence, Nov-22-2022

Method for Determining the Similarity of Text Documents for the Kazakh language, Taking Into Account Synonyms: Extension to TF-IDF

Bakiyev, Bakhyt

The task of determining the similarity of text documents has received considerable attention in many areas such as Information Retrieval, Text Mining, Natural Language Processing (NLP) and Computational Linguistics. Converting text to numeric vectors is a complex task that relies on steps such as tokenization, stopword filtering, stemming, and term weighting. Term frequency-inverse document frequency (TF-IDF) is the most widely used term weighting method for facilitating the search for relevant documents. Many extensions of TF-IDF have been proposed to improve term weighting. In this paper, another extension of the TF-IDF method is proposed in which synonyms are taken into account. The effectiveness of the method is confirmed by experiments that use the Cosine, Dice and Jaccard functions to measure the similarity of text documents for the Kazakh language.
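The three similarity functions named in this abstract can be sketched as follows. This is an illustrative sketch only: it does not implement the paper's synonym-aware extension, it applies cosine to sparse weight dictionaries and Dice and Jaccard to term sets (one common formulation among several), and the function names and sample vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def dice(a, b):
    """Dice coefficient of two term collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    """Jaccard index of two term collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical weighted documents sharing one of their two terms.
doc_a = {"fast": 1.0, "car": 1.0}
doc_b = {"quick": 1.0, "car": 1.0}
```

For these two documents the cosine and Dice scores are both about 0.5 and the Jaccard score is 1/3. A plain term match treats "fast" and "quick" as unrelated, which is the kind of gap a synonym-aware weighting aims to close.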

artificial intelligence, information retrieval, natural language, (16 more...)

doi: 10.1109/SIST54437.2022.9945747

2211.12364

Country:

Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.36)

Madatov, Khabibulla, Bekchanov, Shukurla, Vičič, Jernej

Accuracy of the Uzbek stop words detection: a case study on "School corpus"

arXiv.org Artificial Intelligence, Sep-15-2022

Stop words are very important for the information retrieval and text analysis tasks of natural language processing. This work presents a method to evaluate the quality of stop word lists produced by automatic creation techniques. Although the method proposed in this paper was tested on an automatically generated list of stop words for the Uzbek language, it can be applied, with some modifications, to similar languages, either from the same family or with an agglutinative nature. Because Uzbek is an agglutinative language, automatic detection of stop words is a more complex process than in inflected languages. Moreover, we integrated our previous work on stop word detection, using the "School corpus" as an example, by investigating how to automatically analyse the detection of stop words in Uzbek texts. This work is devoted to answering whether there is a good way of evaluating available stop words for Uzbek texts, or whether it is possible to determine what part of the Uzbek sentence contains the majority of the stop words by studying the numerical characteristics of the probability of unique words. The results show acceptable accuracy of the stop word lists.

artificial intelligence, natural language, stop word, (16 more...)

2209.07053

Country:

Europe > Slovenia > Coastal-Karst > Municipality of Koper > Koper (0.05)
Asia > Uzbekistan (0.05)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)

Genre: Research Report (0.70)

Industry: Education (0.36)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)

#artificialintelligence, Aug-2-2022, 13:30:46 GMT

Theory Behind the Basics of NLP - Analytics Vidhya

This article was published as a part of the Data Science Blogathon. Natural Language Processing (NLP) can help you understand the sentiment of any text. This helps people understand the emotions in, and the type of, the text they are looking at. Negative and positive comments can be easily differentiated. NLP aims to make machines understand a text or comment the same way humans can.

corpus, frequency, vocabulary, (15 more...)

Country: Europe > Holy See > Vatican City (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

#artificialintelligence, Apr-24-2022, 14:45:25 GMT

Three Unique Architectures For Deep Learning Based Recommendation Systems

Deep learning based recommendation system architectures make use of multiple simpler approaches in order to remediate the shortcomings of any single approach to extracting, transforming and vectorizing a large corpus of data into a useful recommendation for an end user. High-level extraction architectures are useful for categorization, but lack accuracy. Low-level extraction approaches will produce committed decisions about what to recommend, but, since they lack context, their recommendations may be banal, repetitive or even recursive, creating unintelligent 'content bubbles' for the user. High-level architectures cannot 'zoom in' meaningfully, and low-level architectures cannot 'step back' to understand the bigger picture that the data is presenting. In this article we'll take a look at three unique approaches that reconcile these two needs into effective and unified frameworks suitable for recommender systems.

architecture, recommendation, recommender system, (14 more...)

Country: Asia > China (0.04)

Industry:

Transportation > Passenger (0.48)
Transportation > Ground > Road (0.48)
Automobiles & Trucks > Manufacturer (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence, Apr-7-2022, 20:13:38 GMT

Combining NLP and Machine Learning for Document Classification

Text mining is a popular way of exploring the text you have in documents and similar sources. Text mining and NLP can help you discover different patterns in the text, from uncovering words or phrases that are commonly used to identifying patterns and linkages between different texts or documents. Alongside text mining, you can use word clouds, time-series analysis and similar techniques to discover other aspects and patterns in the text. Check out my previous blog posts (post 1, post 2) on performing text mining on documents (manifestos from some of the political parties from the last two national government elections in Ireland). These two posts give you a simple indication of what is possible.

classification, dataset, frequency, (10 more...)

Industry: Government (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.52)

#artificialintelligence, Jan-4-2022, 07:20:42 GMT

Text preprocessing techniques- Twitter Data

Text files contain enormous amounts of information. Language data analysis is one of the most difficult tasks for a computer to perform, since a computer cannot understand the semantics of text. To make analysis possible, we convert text data into a machine-readable format. Text processing converts data in text format into numerical values (or vectors), so that these vectors can be given to the machine as input and analysed using algebraic principles. However, some information may be lost in this conversion.
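A minimal sketch of one such conversion, the bag-of-words count vector, is shown below. The article may use a different encoding; the function names `build_vocab` and `vectorize` and the sample sentences are hypothetical. The example also shows one form of the information loss mentioned above: word order disappears, and words outside the vocabulary are dropped.

```python
def build_vocab(docs):
    """Map each distinct token to a fixed column index (alphabetical order)."""
    terms = sorted({token for doc in docs for token in doc})
    return {term: index for index, term in enumerate(terms)}

def vectorize(doc, vocab):
    """Count how often each vocabulary term occurs in a tokenized document."""
    vector = [0] * len(vocab)
    for token in doc:
        if token in vocab:  # tokens outside the vocabulary are silently dropped
            vector[vocab[token]] += 1
    return vector

corpus = [
    "dog bites man".lower().split(),
    "man bites dog".lower().split(),
]
vocab = build_vocab(corpus)
vec_first = vectorize(corpus[0], vocab)
vec_second = vectorize(corpus[1], vocab)
```

Both sentences map to the same vector `[1, 1, 1]` even though they mean different things, because the representation keeps only counts and discards word order.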

corpus, frequency, vector, (14 more...)

Industry: Information Technology > Services (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

#artificialintelligence, Oct-30-2021, 04:30:12 GMT

Have you thought-How Computer Interacts with the Humans?

NLP stands for Natural Language Processing. It is a branch of artificial intelligence that deals with the interaction between computers and humans using natural language. NLP is the ability of a computer to understand, analyze, manipulate, and potentially generate human language. Just 21% of the available data is present in organized form in the 21st century. Millions of tweets, emails and web searches are generated daily, resulting in a huge amount of data that increases by the minute. Most of this data is in the form of text and is unstructured. Natural Language Processing plays an important role in structuring data. Sentiment analysis is the interpretation and classification of emotions as positive, negative or neutral within text data using text analysis techniques.

frequency, stop word, tokenization, (12 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.96)