Goto

Collaborating Authors

 stica


Determinação Automática de Limiar de Detecção de Ataques em Redes de Computadores Utilizando Autoencoders

arXiv.org Artificial Intelligence

Currently, digital security mechanisms like Anomaly Detection Systems using Autoencoders (AE) show great potential for bypassing problems intrinsic to the data, such as data imbalance. Because AE use a non-trivial and nonstandardized separation threshold to classify the extracted reconstruction error, the definition of this threshold directly impacts the performance of the detection process. Thus, this work proposes the automatic definition of this threshold using some machine learning algorithms. For this, three algorithms were evaluated: the K-Nearst Neighbors, the K-Means and the Support Vector Machine.


Diversidade lingu\'istica e inclus\~ao digital: desafios para uma ia brasileira

arXiv.org Artificial Intelligence

Linguistic diversity is a human attribute which, with the advance of generative AIs, is coming under threat. This paper, based on the contributions of sociolinguistics, examines the consequences of the variety selection bias imposed by technological applications and the vicious circle of preserving a variety that becomes dominant and standardized because it has linguistic documentation to feed the large language models for machine learning. Linguistic bias in chatgpt: Language models reinforce dialect discrimination. "i don't think these devices are very culturally sensitive."


Detecci\'on Autom\'atica de Patolog\'ias en Notas Cl\'inicas en Espa\~nol Combinando Modelos de Lenguaje y Ontolog\'ias M\'edicos

arXiv.org Artificial Intelligence

In this paper we present a hybrid method for the automatic detection of dermatological pathologies in medical reports. We use a large language model combined with medical ontologies to predict, given a first appointment or follow-up medical report, the pathology a person may suffer from. The results show that teaching the model to learn the type, severity and location on the body of a dermatological pathology as well as in which order it has to learn these three features significantly increases its accuracy. The article presents the demonstration of state-of-the-art results for classification of medical texts with a precision of 0.84, micro and macro F1-score of 0.82 and 0.75, and makes both the method and the dataset used available to the community.


A global AI community requires language-diverse publishing

arXiv.org Artificial Intelligence

In this provocation, we discuss the English dominance of the AI research community, arguing that the requirement for English language publishing upholds and reinforces broader regimes of extraction in AI. While large language models and machine translation have been celebrated as a way to break down barriers, we regard their use as a symptom of linguistic exclusion of scientists and potential readers. We propose alternative futures for a healthier publishing culture, organized around three themes: administering conferences in the languages of the country in which they are held, instructing peer reviewers not to adjudicate the language appropriateness of papers, and offering opportunities to publish and present in multiple languages. We welcome new translations of this piece. Please contact the authors if you would like to contribute one.


Estudo comparativo de meta-heur\'isticas para problemas de colora\c{c}\~oes de grafos

arXiv.org Artificial Intelligence

A classic graph coloring problem is to assign colors to vertices of any graph so that distinct colors are assigned to adjacent vertices. Optimal graph coloring colors a graph with a minimum number of colors, which is its chromatic number . Finding out the chromatic number is a combinatorial optimization problem proven to be computationally intractable, which implies that no algorithm that computes large instances of the problem in a reasonable time is known. F or this reason, approximate methods and metaheuristics form a set of techniques that do not guarantee optimality, but obtain good solutions in a reasonable time. This paper reports a comparative study of the Hill-Climbing, Simulated Annealing, T abu Search, and Iterated Local Search metaheuristics for the classic graph coloring problem considering its time efficiency for processing the DSJC125 and DSJC250 instances of the DIMACS benchmark.


M\'{e}todos para la Selecci\'{o}n y el Ajuste de Caracter\'{i}sticas en el Problema de la Detecci\'{o}n de Spam

arXiv.org Artificial Intelligence

The email is used daily by millions of people to communicate around the globe and it is a mission-critical application for many businesses. Over the last decade, unsolicited bulk email has become a major problem for email users. An overwhelming amount of spam is flowing into users' mailboxes daily. In 2004, an estimated 62% of all email was attributed to spam. Spam is not only frustrating for most email users, it strains the IT infrastructure of organizations and costs businesses billions of dollars in lost productivity. In recent years, spam has evolved from an annoyance into a serious security threat, and is now a prime medium for phishing of sensitive information, as well the spread of malicious software. This work presents a first approach to attack the spam problem. We propose an algorithm that will improve a classifier's results by adjusting its training set data. It improves the document's vocabulary representation by detecting good topic descriptors and discriminators.