 representation technique


HSFN: Hierarchical Selection for Fake News Detection building Heterogeneous Ensemble

arXiv.org Artificial Intelligence

Psychological biases, such as confirmation bias, make individuals particularly vulnerable to believing and spreading fake news on social media, leading to significant consequences in domains such as public health and politics. Machine learning-based fact-checking systems have been widely studied to mitigate this problem. Among them, ensemble methods are particularly effective in combining multiple classifiers to improve robustness. However, their performance heavily depends on the diversity of the constituent classifiers, and selecting genuinely diverse models remains a key challenge, especially when models tend to learn redundant patterns. In this work, we propose a novel automatic classifier selection approach that prioritizes diversity while also accounting for performance. The method first computes pairwise diversity between classifiers and applies hierarchical clustering to organize them into groups at different levels of granularity. HierarchySelect then explores these hierarchical levels to select one pool of classifiers per level, each representing a distinct intra-pool diversity. From these pools, the most diverse one is identified and selected for ensemble construction. The selection process incorporates an evaluation metric reflecting each classifier's performance to ensure the ensemble also generalises well. We conduct experiments with 40 heterogeneous classifiers across six datasets from different application domains and with varying numbers of classes. Our method is compared against the Elbow heuristic and state-of-the-art baselines. Results show that our approach achieves the highest accuracy on two of six datasets. The implementation details are available on the project's repository: https://github.com/SaraBCoutinho/HSFN.
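The selection pipeline described in this abstract (pairwise diversity, hierarchical clustering, one pool per level, most diverse pool wins) can be illustrated with a small sketch. This is not the HSFN implementation: the disagreement rate as diversity measure, average linkage, accuracy as the performance metric, and picking the most accurate classifier from each cluster are all assumptions made for illustration, as are the names `cluster_levels` and `select_pool` and the toy predictions.

```python
from itertools import combinations

def disagreement(p, q):
    """Fraction of samples on which two classifiers predict differently."""
    return sum(a != b for a, b in zip(p, q)) / len(p)

def accuracy(pred, y):
    return sum(a == b for a, b in zip(pred, y)) / len(y)

def cluster_levels(names, dist):
    """Average-linkage agglomerative clustering; returns the partition at every level."""
    clusters = [[n] for n in names]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        def link(pair):
            a, b = pair
            return sum(dist[x][y] for x in a for y in b) / (len(a) * len(b))
        # Merge the two clusters whose members disagree the least on average.
        a, b = min(combinations(clusters, 2), key=link)
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
        levels.append([list(c) for c in clusters])
    return levels

def select_pool(preds, y):
    """Build one pool per hierarchy level and return the most diverse one."""
    names = sorted(preds)
    dist = {a: {b: disagreement(preds[a], preds[b]) for b in names} for a in names}
    acc = {n: accuracy(preds[n], y) for n in names}
    best_pool, best_div = None, -1.0
    for partition in cluster_levels(names, dist):
        if len(partition) < 2:
            continue  # a single-classifier pool has no pairwise diversity
        # Assumption: each cluster is represented by its most accurate member.
        pool = sorted(max(c, key=lambda n: acc[n]) for c in partition)
        pairs = list(combinations(pool, 2))
        div = sum(dist[a][b] for a, b in pairs) / len(pairs)
        if div > best_div:
            best_pool, best_div = pool, div
    return best_pool, best_div

# Toy validation labels and predictions (hypothetical data).
y = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0]
preds = {
    "clf_a": [0, 1, 0, 1, 1, 0, 1, 0, 1, 1],
    "clf_b": [1, 1, 0, 1, 1, 0, 1, 0, 1, 1],
    "clf_c": [1, 0, 1, 1, 1, 0, 1, 0, 1, 0],
    "clf_d": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}
pool, diversity = select_pool(preds, y)
```

On this toy data the two near-duplicate pairs collapse into two clusters, and the pool with one representative from each cluster has the highest mean pairwise disagreement.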


On the Effectiveness of Log Representation for Log-based Anomaly Detection

arXiv.org Artificial Intelligence

Logs are an essential source of information for people to understand the running status of a software system. As modern software architectures and maintenance methods evolve, more research effort has been devoted to automated log analysis. In particular, machine learning (ML) has been widely used in log analysis tasks. In ML-based log analysis tasks, converting textual log data into numerical feature vectors is a critical and indispensable step. However, the impact of different log representation techniques on the performance of downstream models is not clear, which limits the ability of researchers and practitioners to choose the optimal log representation techniques for their automated log analysis workflows. Therefore, this work investigates and compares the log representation techniques commonly adopted in previous log analysis research. In particular, we select six log representation techniques and evaluate them with seven ML models and four public log datasets (i.e., HDFS, BGL, Spirit and Thunderbird) in the context of log-based anomaly detection. We also examine the impact of the log parsing process and of different feature aggregation approaches when they are employed with log representation techniques. From the experiments, we provide some heuristic guidelines for future researchers and developers to follow when designing an automated log analysis workflow. We believe our comprehensive comparison can help researchers and practitioners better understand the characteristics of different log representation techniques and guide them in selecting the most suitable ones for their ML-based log analysis workflows.
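To make the step of converting textual logs into numerical feature vectors concrete, here is a minimal sketch. The abstract does not name its six techniques, so the token-count representation and the mean aggregation over a session used below are assumptions chosen for simplicity, and the log lines are invented examples in the style of HDFS logs.

```python
import re
from collections import Counter

def tokenize(line):
    """Lowercase a log line and keep alphabetic tokens only (drops IDs and numbers)."""
    return re.findall(r"[a-z]+", line.lower())

def build_vocab(lines):
    """Sorted list of every token seen in the given log lines."""
    return sorted({tok for line in lines for tok in tokenize(line)})

def count_vector(line, vocab):
    """Represent one log line as token counts over the vocabulary."""
    counts = Counter(tokenize(line))
    return [counts[tok] for tok in vocab]

def mean_aggregate(vectors):
    """Aggregate per-line vectors into one session-level vector by averaging."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Hypothetical log lines belonging to one session.
logs = [
    "Receiving block blk_1 src 10.0.0.1",
    "Received block blk_1 of size 67108864",
    "Deleting block blk_1",
]
vocab = build_vocab(logs)
line_vectors = [count_vector(line, vocab) for line in logs]
session_vector = mean_aggregate(line_vectors)
```

The session-level vector is what a downstream anomaly detector would consume in this sketch; swapping `count_vector` or `mean_aggregate` for another function is the kind of choice the study compares.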


An Improved Baseline for Sentence-level Relation Extraction

arXiv.org Artificial Intelligence

Sentence-level relation extraction (RE) aims at identifying the relationship between two entities in a sentence. Many efforts have been devoted to this problem, yet the best-performing methods are still far from perfect. In this paper, we revisit two problems that affect the performance of existing RE models, namely entity representation and noisy or ill-defined labels. Our improved RE baseline, which incorporates entity representations with typed markers, achieves an F1 of 74.6% on TACRED, significantly outperforming previous SOTA methods. Furthermore, the new baseline achieves an F1 of 91.1% on the refined Re-TACRED dataset, demonstrating that pretrained language models (PLMs) achieve high performance on this task. We release our code to the community for future research.
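The idea of typed markers can be sketched as a preprocessing step that wraps the subject and object spans in tags carrying their entity types before the sentence is passed to a language model. The exact marker format is not given in the abstract; the `<S:TYPE>` and `<O:TYPE>` tags, the function name `add_typed_markers`, and the example sentence are illustrative assumptions.

```python
def add_typed_markers(tokens, subj_span, subj_type, obj_span, obj_type):
    """Wrap subject and object spans in typed marker tokens.

    Spans are (start, end) token indices with `end` exclusive and are
    assumed not to overlap. The tag format is hypothetical.
    """
    spans = [
        (subj_span[0], subj_span[1], f"<S:{subj_type}>", f"</S:{subj_type}>"),
        (obj_span[0], obj_span[1], f"<O:{obj_type}>", f"</O:{obj_type}>"),
    ]
    marked = list(tokens)
    # Insert from right to left so earlier indices stay valid.
    for start, end, open_tag, close_tag in sorted(spans, reverse=True):
        marked.insert(end, close_tag)
        marked.insert(start, open_tag)
    return marked

tokens = ["Bill", "Gates", "founded", "Microsoft"]
marked = add_typed_markers(tokens, (0, 2), "PERSON", (3, 4), "ORGANIZATION")
```

The marked token sequence carries both the position and the type of each entity, which is the information the entity representation is meant to expose to the model.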


Embeddings as representation for symbolic music

arXiv.org Artificial Intelligence

Particularly in Natural Language Processing (NLP), a good representation of words that captures some implicit understanding of context is important [Turian et al., 2010]. A typical representation to feed into a machine learning model is the binary one-hot vector. In this case, an array with as many positions as there are words in the vocabulary is created, and each word is represented by a copy of the array containing a one in the position corresponding to that word. For example, the sentence "I like eating bread and eating cheese" would have as vocabulary the set "I", "like", "eating", "bread", "and", "cheese". The representations of these words would therefore be 6-dimensional binary one-hot vectors such as "I" 100000, "like" 010000, "cheese" 000001. As you can imagine, this representation has no context understanding at all: since all words are completely independent, "bread" and "cheese" are as different from each other as "I" and "like", which for a human is not the case.
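The one-hot example above can be reproduced with a few lines of code. The helper names `build_vocabulary`, `one_hot`, and `dot` are illustrative; the sketch also checks the point about missing context by computing dot products between word vectors.

```python
def build_vocabulary(sentence):
    """Unique words of the sentence, in order of first appearance."""
    vocab = []
    for word in sentence.split():
        if word not in vocab:
            vocab.append(word)
    return vocab

def one_hot(word, vocab):
    """Binary vector with a single 1 at the word's vocabulary position."""
    return [1 if w == word else 0 for w in vocab]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

sentence = "I like eating bread and eating cheese"
vocab = build_vocabulary(sentence)
vectors = {word: one_hot(word, vocab) for word in vocab}

# Every pair of distinct words has dot product 0, so "bread" is no closer
# to "cheese" than "I" is to "like".
bread_cheese = dot(vectors["bread"], vectors["cheese"])
i_like = dot(vectors["I"], vectors["like"])
```

Because all distinct one-hot vectors are orthogonal, no similarity measure on them can reflect that two words are related, which is the limitation the passage describes.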


Hybrid technique for effective knowledge representation & a comparative study

arXiv.org Artificial Intelligence

Knowledge representation (KR) and an inference mechanism are the most desirable components for making a system intelligent. A system is considered intelligent if its intelligence is equivalent to that of a human being, either for a particular domain or in general. Because information is often incomplete, ambiguous, and uncertain, building an intelligent system is very difficult. The objective of this paper is to present a hybrid KR technique for making the system effective and optimistic. Effectiveness and optimism are required because the system must be able to answer with some confidence factor. This paper also compares various hybrid KR techniques with the proposed one.


An AIer's Lament

AI Magazine

It is interesting to note that there is no agreed upon definition of artificial intelligence. Why is this interesting? Because government agencies ask for it, software shops claim to provide it, popular magazines and newspapers publish articles about it, dreamers base their fantasies on it, and pragmatists criticize and denounce it. Such a state of affairs has persisted since Newell, Simon and Shaw wrote their first chess program and proclaimed that in a few years, a computer would be the world champion. Not knowing exactly what we are talking about or expecting is typical of a new field; for example, witness the chaos that centered around program verification of security related aspects of systems a few years ago. The details are too grim to recount in mixed company. However, artificial intelligence has been around for 30 years, so one might wonder why our wheels are still spinning. Below, an attempt is made to answer this question and show why, in a serious sense, artificial intelligence can never demonstrate an outright success within its own discipline. In addition, we will see why the old bromide that "as soon as we understand how to solve a problem, it's no longer artificial intelligence" is necessarily true.