AITopics | Brun, Armelle

Collaborating Authors

Brun, Armelle

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Frugal Model for Accurate Early Student Failure Prediction

Ikram, Gagaoua, Brun, Armelle, Boyer, Anne

arXiv.org Artificial IntelligenceJan-10-2025

Predicting student success or failure is vital for timely interventions and personalized support. Early failure prediction is particularly crucial, yet limited data availability in the early stages poses challenges, one of the possible solutions is to make use of additional data from other contexts, however, this might lead to overconsumption with no guarantee of better results. To address this, we propose the Frugal Early Prediction (FEP) model, a new hybrid model that selectively incorporates additional data, promoting data frugality and efficient resource utilization. Experiments conducted on a public dataset from a VLE demonstrate FEP's effectiveness in reducing data usage, a primary goal of this research.Experiments showcase a remarkable 27% reduction in data consumption, compared to a systematic use of additional data, aligning with our commitment to data frugality and offering substantial benefits to educational institutions seeking efficient data consumption. Additionally, FEP also excels in enhancing prediction accuracy. Compared to traditional approaches, FEP achieves an average accuracy gain of 7.3%. This not only highlights the practicality and efficiency of FEP but also its superiority in performance, while respecting resource constraints, providing beneficial findings for educational institutions seeking data frugality.

artificial intelligence, machine learning, prediction, (16 more...)

arXiv.org Artificial Intelligence

2502.00017

Country: Europe > France (0.29)

Genre: Research Report > New Finding (0.46)

Industry:

Education > Educational Setting > Online (0.48)
Education > Educational Technology > Educational Software (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Do Similar Entities have Similar Embeddings?

Hubert, Nicolas, Paulheim, Heiko, Brun, Armelle, Monticolo, Davy

arXiv.org Artificial IntelligenceDec-16-2023

Knowledge graph embedding models (KGEMs) developed for link prediction learn vector representations for graph entities, known as embeddings. A common tacit assumption is the KGE entity similarity assumption, which states that these KGEMs retain the graph's structure within their embedding space, i.e., position similar entities close to one another. This desirable property make KGEMs widely used in downstream tasks such as recommender systems or drug repurposing. Yet, the alignment of graph similarity with embedding space similarity has rarely been formally evaluated. Typically, KGEMs are assessed based on their sole link prediction capabilities, using ranked-based metrics such as Hits@K or Mean Rank. This paper challenges the prevailing assumption that entity similarity in the graph is inherently mirrored in the embedding space. Therefore, we conduct extensive experiments to measure the capability of KGEMs to cluster similar entities together, and investigate the nature of the underlying factors. Moreover, we study if different KGEMs expose a different notion of similarity. Datasets, pre-trained embeddings and code are available at: https://github.com/nicolas-hbt/similar-embeddings.

data mining, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2312.1037

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Data Science > Data Mining (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.67)

Add feedback

Sem@$K$: Is my knowledge graph embedding model semantic-aware?

Hubert, Nicolas, Monnin, Pierre, Brun, Armelle, Monticolo, Davy

arXiv.org Artificial IntelligenceDec-7-2023

Using knowledge graph embedding models (KGEMs) is a popular approach for predicting links in knowledge graphs (KGs). Traditionally, the performance of KGEMs for link prediction is assessed using rank-based metrics, which evaluate their ability to give high scores to ground-truth entities. However, the literature claims that the KGEM evaluation procedure would benefit from adding supplementary dimensions to assess. That is why, in this paper, we extend our previously introduced metric Sem@K that measures the capability of models to predict valid entities w.r.t. domain and range constraints. In particular, we consider a broad range of KGs and take their respective characteristics into account to propose different versions of Sem@K. We also perform an extensive study to qualify the abilities of KGEMs as measured by our metric. Our experiments show that Sem@K provides a new perspective on KGEM quality. Its joint analysis with rank-based metrics offers different conclusions on the predictive power of models. Regarding Sem@K, some KGEMs are inherently better than others, but this semantic superiority is not indicative of their performance w.r.t. rank-based metrics. In this work, we generalize conclusions about the relative performance of KGEMs w.r.t. rank-based and semantic-oriented metrics at the level of families of models. The joint analysis of the aforementioned metrics gives more insight into the peculiarities of each model. This work paves the way for a more comprehensive evaluation of KGEM adequacy for specific downstream tasks.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2301.05601

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Louisiana (0.14)
(6 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.69)

Add feedback

Schema First! Learn Versatile Knowledge Graph Embeddings by Capturing Semantics with MASCHInE

Hubert, Nicolas, Paulheim, Heiko, Monnin, Pierre, Brun, Armelle, Monticolo, Davy

arXiv.org Artificial IntelligenceOct-19-2023

Knowledge graph embedding models (KGEMs) have gained considerable traction in recent years. These models learn a vector representation of knowledge graph entities and relations, a.k.a. knowledge graph embeddings (KGEs). Learning versatile KGEs is desirable as it makes them useful for a broad range of tasks. However, KGEMs are usually trained for a specific task, which makes their embeddings task-dependent. In parallel, the widespread assumption that KGEMs actually create a semantic representation of the underlying entities and relations (e.g., project similar entities closer than dissimilar ones) has been challenged. In this work, we design heuristics for generating protographs -- small, modified versions of a KG that leverage RDF/S information. The learnt protograph-based embeddings are meant to encapsulate the semantics of a KG, and can be leveraged in learning KGEs that, in turn, also better capture semantics. Extensive experiments on various evaluation benchmarks demonstrate the soundness of this approach, which we call Modular and Agnostic SCHema-based Integration of protograph Embeddings (MASCHInE). In particular, MASCHInE helps produce more versatile KGEs that yield substantially better performance for entity clustering and node classification tasks. For link prediction, using MASCHinE substantially increases the number of semantically valid predictions with equivalent rank-based performance.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2306.03659

Country:

Europe (0.68)
North America > United States > Louisiana (0.14)
North America > United States > Oregon (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
(2 more...)

Add feedback

PyGraft: Configurable Generation of Schemas and Knowledge Graphs at Your Fingertips

Hubert, Nicolas, Monnin, Pierre, d'Aquin, Mathieu, Brun, Armelle, Monticolo, Davy

arXiv.org Artificial IntelligenceSep-7-2023

Knowledge graphs (KGs) have emerged as a prominent data representation and management paradigm. Being usually underpinned by a schema (e.g. an ontology), KGs capture not only factual information but also contextual knowledge. In some tasks, a few KGs established themselves as standard benchmarks. However, recent works outline that relying on a limited collection of datasets is not sufficient to assess the generalization capability of an approach. In some data-sensitive fields such as education or medicine, access to public datasets is even more limited. To remedy the aforementioned issues, we release PyGraft, a Python-based tool that generates highly customized, domain-agnostic schemas and knowledge graphs. The synthesized schemas encompass various RDFS and OWL constructs, while the synthesized KGs emulate the characteristics and scale of real-world KGs. Logical consistency of the generated resources is ultimately ensured by running a description logic (DL) reasoner. By providing a way of generating both a schema and KG in a single pipeline, PyGraft's aim is to empower the generation of a more diverse array of KGs for benchmarking novel approaches in areas such as graph-based machine learning (ML), or more generally KG processing. In graph-based ML in particular, this should foster a more holistic evaluation of model performance and generalization capability, thereby going beyond the limited collection of available benchmarks. PyGraft is available at: https://github.com/nicolas-hbt/pygraft.

artificial intelligence, description logic, schema and knowledge graph, (3 more...)

arXiv.org Artificial Intelligence

2309.03685

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.80)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Description Logic (0.53)

Add feedback

Treat Different Negatives Differently: Enriching Loss Functions with Domain and Range Constraints for Link Prediction

Hubert, Nicolas, Monnin, Pierre, Brun, Armelle, Monticolo, Davy

arXiv.org Artificial IntelligenceAug-8-2023

Knowledge graph embedding models (KGEMs) are used for various tasks related to knowledge graphs (KGs), including link prediction. They are trained with loss functions that are computed considering a batch of scored triples and their corresponding labels. Traditional approaches consider the label of a triple to be either true or false. However, recent works suggest that all negative triples should not be valued equally. In line with this recent assumption, we posit that negative triples that are semantically valid w.r.t. domain and range constraints might be high-quality negative triples. As such, loss functions should treat them differently from semantically invalid negative ones. To this aim, we propose semantic-driven versions for the three main loss functions for link prediction. In an extensive and controlled experimental setting, we show that the proposed loss functions systematically provide satisfying results on three public benchmark KGs underpinned with different schemas, which demonstrates both the generality and superiority of our proposed approach. In fact, the proposed loss functions do (1) lead to better MRR and Hits@10 values, (2) drive KGEMs towards better semantic awareness as measured by the Sem@K metric. This highlights that semantic information globally improves KGEMs, and thus should be incorporated into loss functions. Domains and ranges of relations being largely available in schema-defined KGs, this makes our approach both beneficial and widely usable in practice.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2303.00286

Country:

Europe (1.00)
Asia > China (0.28)
North America > United States > Louisiana (0.14)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.35)

Add feedback

Characterizing Financial Market Coverage using Artificial Intelligence

Tshimula, Jean Marie, Nkashama, D'Jeff K., Owusu, Patrick, Frappier, Marc, Tardif, Pierre-Martin, Kabanza, Froduald, Brun, Armelle, Patenaude, Jean-Marc, Wang, Shengrui, Chikhaoui, Belkacem

arXiv.org Artificial IntelligenceFeb-7-2023

This paper scrutinizes a database of over 4900 YouTube videos to characterize financial market coverage. Financial market coverage generates a large number of videos. Therefore, watching these videos to derive actionable insights could be challenging and complex. In this paper, we leverage Whisper, a speech-to-text model from OpenAI, to generate a text corpus of market coverage videos from Bloomberg and Yahoo Finance. We employ natural language processing to extract insights regarding language use from the market coverage. Moreover, we examine the prominent presence of trending topics and their evolution over time, and the impacts that some individuals and organizations have on the financial market. Our characterization highlights the dynamics of the financial market coverage and provides valuable insights reflecting broad discussions regarding recent financial events and the world economy.

artificial intelligence, natural language, text processing, (15 more...)

arXiv.org Artificial Intelligence

2302.03694

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(10 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Multi-source Data Mining for e-Learning

Daher, Julie Bu, Brun, Armelle, Boyer, Anne

arXiv.org Artificial IntelligenceSep-17-2020

Data mining is the task of discovering interesting, unexpected or valuable structures in large datasets and transforming them into an understandable structure for further use . Different approaches in the domain of data mining have been proposed, among which pattern mining is the most important one. Pattern mining mining involves extracting interesting frequent patterns from data. Pattern mining has grown to be a topic of high interest where it is used for different purposes, for example, recommendations. Some of the most common challenges in this domain include reducing the complexity of the process and avoiding the redundancy within the patterns. So far, pattern mining has mainly focused on the mining of a single data source. However, with the increase in the amount of data, in terms of volume, diversity of sources and nature of data, mining multi-source and heterogeneous data has become an emerging challenge in this domain. This challenge is the main focus of our work where we propose to mine multi-source data in order to extract interesting frequent patterns.

computer based training, educational technology, information, (21 more...)

arXiv.org Artificial Intelligence

2009.08791

Country: Europe > France (0.15)

Genre: Research Report (0.40)

Industry:

Education > Educational Setting > Online (0.53)
Education > Educational Technology > Educational Software > Computer Based Training (0.52)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

Add feedback

Lead2Gold: Towards exploiting the full potential of noisy transcriptions for speech recognition

Dufraux, Adrien, Vincent, Emmanuel, Hannun, Awni, Brun, Armelle, Douze, Matthijs

arXiv.org Artificial IntelligenceOct-16-2019

The transcriptions used to train an Automatic Speech Recognition (ASR) system may contain errors. Usually, either a quality control stage discards transcriptions with too many errors, or the noisy transcriptions are used as is. We introduce Lead2Gold, a method to train an ASR system that exploits the full potential of noisy transcriptions. Based on a noise model of transcription errors, Lead2Gold searches for better transcriptions of the training data with a beam search that takes this noise model into account. The beam search is differentiable and does not require a forced alignment step, thus the whole system is trained end-to-end. Lead2Gold can be viewed as a new loss function that can be used on top of any sequence-to-sequence deep neural network. We conduct proof-of-concept experiments on noisy transcriptions generated from letter corruptions with different noise levels. We show that Lead2Gold obtains a better ASR accuracy than a competitive baseline which does not account for the (artificially-introduced) transcription noise.

deep learning, speech recognition, transcription, (19 more...)

arXiv.org Artificial Intelligence

1910.07323

Country: Europe > France (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback