AITopics | Graff, Mario

Collaborating Authors

Graff, Mario

Analysis of Systems' Performance in Natural Language Processing Competitions

Nava-Muñoz, Sergio, Graff, Mario, Escalante, Hugo Jair

arXiv.org Artificial Intelligence, Mar-7-2024

Collaborative competitions have gained popularity in the scientific and technological fields. These competitions involve defining tasks, selecting evaluation scores, and devising result verification methods. In the standard scenario, participants receive a training set and are expected to provide a solution for a held-out dataset kept by organizers. An essential challenge for organizers arises when comparing algorithms' performance, assessing multiple participants, and ranking them. Statistical tools are often used for this purpose; however, traditional statistical methods often fail to capture decisive differences between systems' performance. This manuscript describes an evaluation methodology for statistically analyzing competition results and the competitions themselves. The methodology is designed to be universally applicable; however, it is illustrated using eight natural language competitions as case studies involving classification and regression problems. The proposed methodology offers several advantages, including off-the-shelf comparisons with correction mechanisms and the inclusion of confidence intervals. Furthermore, we introduce metrics that allow organizers to assess the difficulty of competitions. Our analysis shows the potential usefulness of our methodology for effectively evaluating competition results.

artificial intelligence, competition, natural language, (18 more...)

arXiv.org Artificial Intelligence

2403.04693

Country: North America > Mexico > Puebla (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Consumer Products & Services (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)
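
The abstract above mentions confidence intervals and comparisons with correction mechanisms but does not say which procedures are used. As a minimal sketch, assuming bootstrap resampling for the intervals and a Bonferroni correction for multiple comparisons (both are illustrative choices, and every function name here is hypothetical), the idea could look like this:

```python
import random


def accuracy(gold, pred):
    """Fraction of positions where the prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def bootstrap_ci(gold, pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a system's accuracy.

    Assumption: test examples are resampled with replacement and the
    score is recomputed on each resample.
    """
    rng = random.Random(seed)
    n = len(gold)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(accuracy([gold[i] for i in idx], [pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def bonferroni(p_values, alpha=0.05):
    """Flag which comparisons stay significant after Bonferroni correction."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


if __name__ == "__main__":
    gold = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    system = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]
    print(accuracy(gold, system), bootstrap_ci(gold, system))
```

Two systems whose intervals overlap heavily are hard to rank with confidence, which is the kind of situation a ranking methodology has to handle.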

Regionalized models for Spanish language variations based on Twitter

Tellez, Eric S., Moctezuma, Daniela, Miranda, Sabino, Graff, Mario, Ruiz, Guillermo

arXiv.org Artificial Intelligence, Dec-9-2022

Spanish is one of the most widely spoken languages in the world, but it is not necessarily written and spoken in the same way in different countries. Understanding local language variations can help to improve model performance on regional tasks, both in understanding local structures and in improving the message's content. For instance, think about a machine learning engineer who automates some language classification task in a particular region, or a social scientist trying to understand a regional event with echoes on social media; both can take advantage of dialect-based language models to understand what is happening with more contextual information and hence more precision. This manuscript presents and describes a set of regionalized resources for the Spanish language built on four years of public Twitter messages geotagged in 26 Spanish-speaking countries. We introduce word embeddings based on FastText, language models based on BERT, and per-region sample corpora. We also provide a broad comparison among regions covering lexical and semantic similarities, as well as examples of using regional resources on message classification tasks.

machine learning, natural language, springer nature 2021, (20 more...)

arXiv.org Artificial Intelligence

2110.06128

Country:

North America > Mexico (1.00)
Europe (1.00)
South America > Venezuela (0.67)

Genre: Research Report (1.00)

Industry:

Information Technology > Services (0.93)
Health & Medicine (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Selection Heuristics on Semantic Genetic Programming for Classification Problems

Sánchez, Claudia N., Graff, Mario

arXiv.org Machine Learning, Jul-16-2019

In a steady-state evolution, tournament selection traditionally uses the fitness function to select the parents, and negative selection chooses an individual to be replaced with an offspring. This contribution focuses on analyzing the behavior, in terms of performance, of different heuristics when used instead of the fitness function in tournament selection. The heuristics analyzed are related to measuring the similarity of the individuals in the semantic space. In addition, the analysis includes random selection and traditional tournament selection. These selection functions were implemented on our Semantic Genetic Programming system, namely EvoDAG, which is inspired by the geometric genetic operators, and tested on 30 classification problems with a variable number of samples, variables, and classes. The results indicated that the combination of accuracy and random selection, in the negative tournament, performs best, and the difference in performance between this combination and tournament selection is statistically significant. Furthermore, we compare EvoDAG's performance using the selection heuristics against 18 classifiers that included traditional approaches as well as auto-machine-learning techniques. The results indicate that our proposal is competitive with state-of-the-art classifiers. Finally, it is worth mentioning that EvoDAG is available as open source software.

artificial intelligence, evolutionary algorithm, selection, (19 more...)

arXiv.org Machine Learning

1907.07066

Country: North America > United States > New York (0.15)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
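
The steady-state loop described in the abstract above (tournament selection for the parents, negative selection for the individual to replace) can be sketched in a few lines. This is a generic illustration, not EvoDAG's code or API; the function names and the `make_offspring` and `evaluate` callbacks are hypothetical, and fitness is assumed to be "higher is better".

```python
import random


def tournament(fitness, k, rng):
    """Index of the fittest among k individuals drawn without replacement."""
    contenders = rng.sample(range(len(fitness)), k)
    return max(contenders, key=lambda i: fitness[i])


def negative_tournament(fitness, k, rng):
    """Index of the least fit among k individuals drawn without replacement."""
    contenders = rng.sample(range(len(fitness)), k)
    return min(contenders, key=lambda i: fitness[i])


def random_negative(fitness, rng):
    """Random negative selection: any individual may be replaced."""
    return rng.randrange(len(fitness))


def steady_state_step(population, fitness, make_offspring, evaluate, k, rng):
    """One steady-state step: pick two parents, create one child, replace one victim."""
    p1 = tournament(fitness, k, rng)
    p2 = tournament(fitness, k, rng)
    child = make_offspring(population[p1], population[p2])
    victim = negative_tournament(fitness, k, rng)
    population[victim] = child
    fitness[victim] = evaluate(child)


if __name__ == "__main__":
    rng = random.Random(0)
    pop = [1.0, 2.0, 3.0, 4.0]
    score = lambda x: -abs(x - 2.5)  # toy fitness: closeness to 2.5
    fit = [score(x) for x in pop]
    steady_state_step(pop, fit, lambda a, b: (a + b) / 2, score, 2, rng)
    print(pop)
```

Swapping the `key` function in `tournament` for a similarity-based heuristic, or replacing `negative_tournament` with `random_negative`, gives the kind of variants the abstract compares.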

Feature space transformations and model selection to improve the performance of classifiers

Ortiz-Bejar, Jose, Tellez, Eric S., Graff, Mario

arXiv.org Machine Learning, Jul-14-2019

Improving the performance of classifiers is the realm of prototype selection and kernel transformations. Prototype selection has been used to reduce the space complexity of k-Nearest Neighbors classifiers and to improve their accuracy, and kernel transformations have enhanced the performance of linear classifiers by converting a non-linearly separable problem into a linear one in the transformed space. Our proposal combines, in a model selection scheme, these transformations with classic algorithms such as Naïve Bayes and k-Nearest Neighbors to produce a competitive classifier. We analyzed our approach on different classification problems and compared it to state-of-the-art classifiers. The results show that the methodology proposed is competitive, obtaining the lowest rank among the classifiers being compared.

arXiv.org Machine Learning

1907.06258

Country: North America > Mexico (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.89)
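
The abstract above names prototype selection and kernel transformations without saying which specific algorithms are combined. As a rough sketch, assuming farthest-first traversal to pick prototypes and a Gaussian (RBF) kernel to map each point onto its similarities to those prototypes (both are illustrative choices, and all names are hypothetical), a feature space transformation of this kind might be:

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Choose k prototype indices by farthest-first traversal, starting at index 0."""
    chosen = [0]
    # Squared distance from each point to its nearest chosen prototype so far.
    nearest = [sq_dist(p, points[0]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, sq_dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return chosen


def rbf_features(x, prototypes, gamma=1.0):
    """Map x to its Gaussian-kernel similarity with each prototype."""
    return [math.exp(-gamma * sq_dist(x, p)) for p in prototypes]


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0)]
    protos = [points[i] for i in farthest_first(points, 2)]
    print([rbf_features(p, protos, gamma=0.1) for p in points])
```

A classic classifier such as k-Nearest Neighbors or Naïve Bayes would then be trained on the transformed vectors, and a model selection step would choose among settings such as `k` and `gamma`.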

EvoMSA: A Multilingual Evolutionary Approach for Sentiment Analysis

Graff, Mario, Miranda-Jiménez, Sabino, Tellez, Eric S., Moctezuma, Daniela

arXiv.org Machine Learning, Nov-29-2018

Sentiment analysis (SA) is a task related to understanding people's feelings in written text; the starting point would be to identify the polarity level (positive, neutral or negative) of a given text, moving on to identify emotions or whether a text is humorous or not. This task has been the subject of several research competitions in a number of languages, e.g., English, Spanish, and Arabic, among others. In this contribution, we propose an SA system, namely EvoMSA, that unifies our participating systems in various SA competitions, making it domain independent and multilingual by processing text using only language-independent techniques. EvoMSA is a classifier, based on Genetic Programming, that works by combining the output of different text classifiers and text models to produce the final prediction. We analyze EvoMSA, with its parameters fixed, on different SA competitions to provide a global overview of its performance, and as the results show, EvoMSA is competitive, obtaining top rankings in several SA competitions. Furthermore, we performed an analysis of EvoMSA's components to measure their contribution to the performance; the idea is to make it easier for a practitioner or newcomer to implement a competitive SA classifier. Finally, it is worth mentioning that EvoMSA is available as open source software.

deep learning, evomsa, neural network, (22 more...)

arXiv.org Machine Learning

1812.02307

Country:

Europe > Spain (0.14)
Europe > Denmark (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
(4 more...)

An Automated Text Categorization Framework based on Hyperparameter Optimization

Tellez, Eric S., Moctezuma, Daniela, Miranda-Jiménez, Sabino, Graff, Mario

arXiv.org Artificial Intelligence, Sep-14-2017

A great variety of text tasks such as topic or spam identification, user profiling, and sentiment analysis can be posed as a supervised learning problem and tackled using a text classifier. A text classifier consists of several subprocesses, some of which are general enough to be applied to any supervised learning problem, whereas others are specifically designed to tackle a particular task, using complex and computationally expensive processes such as lemmatization, syntactic analysis, etc. Contrary to traditional approaches, we propose a minimalistic and wide system able to tackle text classification tasks independent of domain and language, namely microTC. It is composed of some easy-to-implement text transformations, text representations, and a supervised learning algorithm. These pieces produce a competitive classifier even in the domain of informally written text. We provide a detailed description of microTC along with an extensive experimental comparison with relevant state-of-the-art methods. microTC was compared on 30 different datasets. Regarding accuracy, microTC obtained the best performance in 20 datasets while achieving competitive results in the remaining 10. The compared datasets include several problems like topic and polarity classification, spam detection, user profiling and authorship attribution. Furthermore, it is important to state that our approach allows the usage of the technology even without knowledge of machine learning and natural language processing.

artificial intelligence, classifier, text processing, (21 more...)

arXiv.org Artificial Intelligence

1704.01975

Country: North America > United States > Colorado (0.14)

Industry:

Information Technology > Services (0.46)
Education > Focused Education > Special Education (0.44)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.89)
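
The abstract above describes a pipeline of simple text transformations, a text representation, and a supervised learner, without listing the concrete components. The following is a toy sketch of that three-stage shape and not microTC's implementation or API: lowercasing and diacritic removal as the transformation, character q-gram counts as the representation, and a nearest-centroid rule with cosine similarity as the learner are all assumptions made for illustration.

```python
import math
import unicodedata
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase, strip diacritics, and collapse whitespace (language-independent)."""
    text = unicodedata.normalize("NFD", text.lower())
    text = "".join(c for c in text if unicodedata.category(c) != "Mn")
    return " ".join(text.split())


def char_qgrams(text, q=3):
    """Bag of character q-grams over the normalized, space-padded text."""
    padded = f" {normalize(text)} "
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fit(texts, labels, q=3):
    """Sum the q-gram counts of each class into one centroid per label."""
    centroids = defaultdict(Counter)
    for text, label in zip(texts, labels):
        centroids[label].update(char_qgrams(text, q))
    return dict(centroids)


def predict(centroids, text, q=3):
    """Return the label whose centroid is most similar to the text."""
    vec = char_qgrams(text, q)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    model = fit(
        ["me encanta esta película", "que gran día", "odio este servicio", "que mal día"],
        ["pos", "pos", "neg", "neg"],
    )
    print(predict(model, "Me ENCANTA"))
```

Because nothing here depends on a tokenizer, lemmatizer, or parser, the same code runs unchanged on text in any language, which is the property the abstract emphasizes.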

A Simple Approach to Multilingual Polarity Classification in Twitter

Tellez, Eric S., Jiménez, Sabino Miranda, Graff, Mario, Moctezuma, Daniela, Suárez, Ranyart R., Siordia, Oscar S.

arXiv.org Machine Learning, Dec-15-2016

Recently, sentiment analysis has received a lot of attention due to the interest in mining opinions of social media users. Sentiment analysis consists of determining the polarity of a given text, i.e., its degree of positiveness or negativeness. Traditionally, sentiment analysis algorithms have been tailored to a specific language given the complexity of having a number of lexical variations and errors introduced by the people generating content. In this contribution, our aim is to provide a simple-to-implement and easy-to-use multilingual framework that can serve as a baseline for sentiment analysis contests and as a starting point to build new sentiment analysis systems. We compare our approach in eight different languages, three of which have important international contests, namely, SemEval (English), TASS (Spanish), and SENTIPOLC (Italian). Within the competitions, our approach reaches medium to high positions in the rankings, whereas in the remaining languages our approach outperforms the reported results.

artificial intelligence, semeval, social media, (16 more...)

arXiv.org Machine Learning

1612.0527

Country:

North America > United States > Colorado (0.14)
North America > United States > California (0.14)
Europe > Middle East > Malta (0.14)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
