On Isotropy, Contextualization and Learning Dynamics of Contrastive-based Sentence Representation Learning
Xiao, Chenghao, Long, Yang, Moubayed, Noura Al
Incorporating contrastive learning objectives in sentence representation learning (SRL) has yielded significant improvements on many sentence-level NLP tasks. However, it is not well understood why contrastive learning works for learning sentence-level semantics. In this paper, we aim to help guide future designs of sentence representation learning methods by taking a closer look at contrastive SRL through the lens of isotropy, contextualization and learning dynamics. We interpret its successes through the geometry of the representation shifts and show that contrastive learning brings isotropy and drives high intra-sentence similarity: tokens within the same sentence converge to similar positions in the semantic space. We also find that what we formalize as "spurious contextualization" is mitigated for semantically meaningful tokens, while augmented for functional ones. We find that the embedding space is directed towards the origin during training, with more areas now better defined. We ablate these findings by observing the learning dynamics under different training temperatures, batch sizes and pooling methods.
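As a rough illustration of the intra-sentence similarity the abstract describes, the sketch below computes the mean pairwise cosine similarity of a sentence's token embeddings; the checkpoint name, the use of the last hidden layer, and the exclusion of special tokens are our assumptions rather than the paper's exact protocol.

import torch
from transformers import AutoTokenizer, AutoModel

# Assumed setup: any BERT-style encoder works the same way here.
name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

def intra_sentence_similarity(sentence: str) -> float:
    """Mean pairwise cosine similarity between the token embeddings
    of one sentence (special tokens excluded)."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]            # (seq_len, dim)
    h = torch.nn.functional.normalize(hidden[1:-1], dim=-1)   # drop [CLS]/[SEP]
    sim = h @ h.T                                             # cosine matrix
    n = h.shape[0]
    return ((sim.sum() - n) / (n * (n - 1))).item()           # off-diagonal mean

print(intra_sentence_similarity("Contrastive learning reshapes the embedding space."))

Comparing this score before and after contrastive fine-tuning would expose the convergence effect the paper reports.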
Knowing the Past to Predict the Future: Reinforcement Virtual Learning
Zhang, Peng, Huang, Yawen, Hu, Bingzhang, Wang, Shizheng, Duan, Haoran, Moubayed, Noura Al, Zheng, Yefeng, Long, Yang
Reinforcement Learning (RL)-based control systems have received considerable attention in recent decades. However, in many real-world problems, such as batch process control, the environment is uncertain, and acquiring state and reward values requires expensive interaction. In this paper, we present a cost-efficient framework in which the RL model can evolve by itself in a virtual space, using predictive models trained on historical data only. The proposed framework enables a step-by-step RL model to predict future states and select optimal actions for long-sighted decisions. The main focuses are: 1) how to balance long-sighted and short-sighted rewards with an optimal strategy; 2) how to make the virtual model's interaction with the real environment converge to a final learning policy. Under the experimental settings of the Fed-Batch Process, our method consistently outperforms existing state-of-the-art methods.
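The virtual-space idea can be made concrete with a short sketch: a predictive model learned from historical data stands in for the real environment, and the policy is rolled out and scored entirely inside it. The linear dynamics, the toy reward, and all dimensions below are placeholders, not the paper's models.

import numpy as np

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(4, 4))        # toy linear dynamics, state dim 4

def dynamics(state, action):
    """Predictive model standing in for the uncertain real environment."""
    return state @ W + 0.05 * action

def reward(state):
    return -float(np.sum(state ** 2))    # toy reward: stay near the origin

def virtual_return(policy, state, horizon=10, gamma=0.95):
    """Roll the policy out in the virtual space only; no real interaction."""
    total = 0.0
    for t in range(horizon):
        action = policy(state)
        state = dynamics(state, action)
        total += gamma ** t * reward(state)
    return total

# Score a trivial proportional controller without touching the real plant.
print(virtual_return(lambda s: -s, np.ones(4)))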
MuLD: The Multitask Long Document Benchmark
Hudson, G Thomas, Moubayed, Noura Al
The impressive progress in NLP techniques has been driven by the development of multi-task benchmarks such as GLUE and SuperGLUE. While these benchmarks focus on tasks with one or two input sentences, there has been exciting work in designing efficient techniques for processing much longer inputs. In this paper, we present MuLD: a new long document benchmark consisting only of documents over 10,000 tokens. By modifying existing NLP tasks, we create a diverse benchmark which requires models to successfully capture long-term dependencies in the text. We evaluate how existing models perform and find that our benchmark is much more challenging than their 'short document' equivalents. Furthermore, by evaluating both regular and efficient transformers, we show that models with increased context length are better able to solve the presented tasks, suggesting that future improvements in these models are vital for solving similar long document problems. We release the data and code for baselines to encourage further research on efficient NLP models.
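The benchmark's defining filter, keeping only documents over 10,000 tokens, is simple to express; the sketch below uses whitespace tokenisation as a simplification, whereas MuLD's actual token counts may come from a model tokenizer.

def over_10k_tokens(documents):
    """Yield only documents longer than 10,000 tokens."""
    for doc in documents:
        if len(doc.split()) > 10_000:   # whitespace tokens as a proxy
            yield doc

corpus = ["a short document", "word " * 15_000]
print(sum(1 for _ in over_10k_tokens(corpus)))  # -> 1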
Measuring Hidden Bias within Face Recognition via Racial Phenotypes
Yucer, Seyma, Tektas, Furkan, Moubayed, Noura Al, Breckon, Toby P.
Recent work reports disparate performance for intersectional racial groups across face recognition tasks: face verification and identification. However, the definition of those racial groups has a significant impact on the underlying findings of such racial bias analysis. Previous studies define these groups based on either demographic information (e.g. African, Asian, etc.) or skin tone (e.g. lighter or darker skin). The use of such sensitive or broad group definitions has disadvantages for bias investigation and for the design of subsequent counter-bias solutions. By contrast, this study introduces an alternative racial bias analysis methodology for face recognition based on facial phenotype attributes: the set of observable characteristics of an individual face, where a race-related facial phenotype is specific to the human face and correlated with the racial profile of the subject. We propose categorical test cases to investigate the individual influence of those attributes on bias within face recognition tasks. We compare our phenotype-based grouping methodology with previous grouping strategies and show that phenotype-based groupings uncover hidden bias without relying on potentially protected attributes or ill-defined grouping strategies. Furthermore, we contribute corresponding phenotype attribute category labels for two face recognition tasks: RFW for face verification and the VGGFace2 test set for face identification.
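The grouping methodology amounts to reporting performance per phenotype category rather than per demographic label; a minimal sketch follows, with illustrative field names that are not the paper's label schema.

from collections import defaultdict

def accuracy_by_phenotype(pairs, attribute):
    """pairs: iterable of dicts with a boolean 'correct' plus phenotype fields."""
    hits, totals = defaultdict(int), defaultdict(int)
    for p in pairs:
        cat = p[attribute]
        totals[cat] += 1
        hits[cat] += int(p["correct"])
    return {cat: hits[cat] / totals[cat] for cat in totals}

# Illustrative records only; real labels would come from the contributed sets.
pairs = [
    {"skin_type": "I-II", "correct": True},
    {"skin_type": "V-VI", "correct": False},
    {"skin_type": "V-VI", "correct": True},
]
print(accuracy_by_phenotype(pairs, "skin_type"))  # per-category accuracy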
Towards Equal Gender Representation in the Annotations of Toxic Language Detection
Excell, Elizabeth, Moubayed, Noura Al
Classifiers tend to propagate biases present in the data on which they are trained. Hence, it is important to understand how the demographic identities of the annotators of comments affect the fairness of the resulting model. In this paper, we focus on the differences in the ways men and women annotate comments for toxicity, investigating how these differences result in models that amplify the opinions of male annotators. We find that the BERT model associates toxic comments containing offensive words with male annotators, causing the model to predict 67.7% of toxic comments as having been annotated by men. We show that this disparity between gender predictions can be mitigated by removing offensive words and highly toxic comments from the training data. We then apply the learned associations between gender and language to toxic language classifiers, finding that models trained exclusively on female-annotated data perform 1.8% better than those trained solely on male-annotated data, and that training models on data with all offensive words removed reduces bias in the model by 55.5% while increasing sensitivity by 0.4%.
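The mitigation step described above, removing offensive words from the training data, can be sketched as a simple token filter; the word list below is a placeholder, as the paper's lexicon and preprocessing are more involved.

# Placeholder lexicon; the paper's offensive-word list is more extensive.
OFFENSIVE = {"offensiveword1", "offensiveword2"}

def strip_offensive(comment: str) -> str:
    """Drop offensive tokens before the comment enters the training set."""
    return " ".join(w for w in comment.split() if w.lower() not in OFFENSIVE)

print(strip_offensive("this is offensiveword1 rubbish"))  # -> "this is rubbish"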
Curvature-based Feature Selection with Application in Classifying Electronic Health Records
Zuo, Zheming, Li, Jie, Moubayed, Noura Al
Electronic Health Records (EHRs) are widely used in healthcare facilities nowadays. Due to their inherently heterogeneous, imbalanced, incomplete, and high-dimensional nature, it is challenging to employ machine learning algorithms to analyse EHRs for prediction and diagnostics within the scope of precision medicine. Dimensionality reduction is an efficient data preprocessing technique for the analysis of high-dimensional data: it reduces the number of features while improving the performance of downstream analysis, e.g. classification. In this paper, we propose an efficient curvature-based feature selection method to support more precise diagnosis. The proposed method is a filter-based feature selection method which directly uses the Menger curvature to rank all the attributes in a given data set. We evaluate the performance of our method against conventional PCA and recent methods including BPCM, GSAM, WCNN, BLS II, VIBES, 2L-MJFA, RFGA, and VAF. Our method achieves state-of-the-art performance on four benchmark healthcare data sets (CCRFDS, BCCDS, BTDS, and DRDDS), with improvements of 24.73% on BTDS, 13.93% on CCRFDS, 7.97% on BCCDS, and 3.63% on DRDDS. Our CFS source code is publicly available at https://github.com/zhemingzuo/CFS.
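The core scoring idea is the Menger curvature of three points, c(p, q, r) = 4A / (|p-q| |q-r| |p-r|), where A is the area of the triangle they span. The sketch below scores each feature by the mean curvature over consecutive (sample index, value) triples; this triple construction is our assumption, and the linked CFS repository implements the authors' exact procedure.

import numpy as np

def menger_curvature(p, q, r):
    """Menger curvature of three 2-D points: 4 * triangle area / product of side lengths."""
    area = 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))
    d = np.linalg.norm(p - q) * np.linalg.norm(q - r) * np.linalg.norm(p - r)
    return 0.0 if d == 0 else 4.0 * area / d

def curvature_scores(X):
    """Score each feature by the mean Menger curvature of consecutive
    (sample index, value) triples; X has shape (n_samples, n_features)."""
    n, m = X.shape
    scores = np.empty(m)
    for j in range(m):
        pts = np.column_stack([np.arange(n, dtype=float), X[:, j]])
        scores[j] = np.mean([menger_curvature(pts[i], pts[i + 1], pts[i + 2])
                             for i in range(n - 2)])
    return scores

X = np.random.default_rng(1).normal(size=(50, 5))
print(np.argsort(curvature_scores(X))[::-1])  # features ranked high-to-low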
On Modality Bias in the TVQA Dataset
Winterbottom, Thomas, Xiao, Sarah, McLean, Alistair, Moubayed, Noura Al
TVQA is a large-scale video question answering (video-QA) dataset based on popular TV shows. The questions were specifically designed to require "both vision and language understanding to answer". In this work, we demonstrate an inherent bias in the dataset towards the textual subtitle modality. We infer said bias both directly and indirectly, notably finding that models trained with subtitles learn, on average, to suppress the contribution of video features. Our results demonstrate that models trained on only the visual information can answer ~45% of the questions, while using only the subtitles achieves ~68%. We find that a bilinear-pooling-based joint representation of modalities damages model performance by 9%, implying a reliance on modality-specific information. We also show that TVQA fails to benefit from the RUBi modality bias reduction technique popularised in VQA. By simply improving text processing using BERT embeddings with the simple model first proposed for TVQA, we achieve state-of-the-art results (72.13%) compared to the highly complex STAGE model (70.50%). We recommend a multimodal evaluation framework that can highlight biases in models and isolate visual and textual reliant subsets of data. Using this framework, we propose subsets of TVQA that respond exclusively to either or both modalities, in order to facilitate the multimodal modelling TVQA originally intended.
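The bilinear joint representation whose cost the abstract quantifies can be sketched in a few lines; the feature dimensions below are illustrative, not TVQA's actual video and subtitle feature sizes.

import torch

# Illustrative dimensions only.
video_dim, text_dim, joint_dim = 300, 768, 256
bilinear = torch.nn.Bilinear(video_dim, text_dim, joint_dim)

v = torch.randn(8, video_dim)   # batch of video features
t = torch.randn(8, text_dim)    # batch of subtitle (e.g. BERT) features
joint = bilinear(v, t)          # fused joint representation
print(joint.shape)              # torch.Size([8, 256])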