CAESAR: An Embodied Simulator for Generating Multimodal Referring Expression Datasets
Humans naturally use verbal utterances and nonverbal gestures to refer to various objects (known as \textit{referring expressions}) in different interactional scenarios. As collecting real human interaction datasets is costly and laborious, synthetic datasets are often used to train models to unambiguously detect relationships among objects. However, existing synthetic data generation tools that provide referring expressions generally neglect nonverbal gestures. Additionally, while a few small-scale datasets contain multimodal cues (verbal and nonverbal), these datasets only capture nonverbal gestures from an exo-centric (observer) perspective. As models can use complementary information from multimodal cues to recognize referring expressions, generating multimodal data from multiple views can help to develop robust models.
Handling Numeric Expressions in Automatic Speech Recognition
Huber, Christian, Waibel, Alexander
This paper addresses the problem of correctly formatting numeric expressions in automatic speech recognition (ASR) transcripts. This is challenging since the expected transcript format depends on the context, e.g., 1945 (year) vs. 19:45 (timestamp). We compare cascaded and end-to-end approaches to recognize and format numeric expressions, such as years, timestamps, currency amounts, and quantities. For the end-to-end approach, we employed a data generation strategy using a large language model (LLM) together with a text-to-speech (TTS) model to generate adaptation data. The results on our test dataset show that while approaches based on LLMs perform well on recognizing formatted numeric expressions, adapted end-to-end models offer competitive performance with the advantage of lower latency and inference cost.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.05)
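The year-vs-timestamp ambiguity the abstract mentions can be illustrated with a toy cascaded post-processor. This is a minimal sketch, not the paper's system: the context-word patterns below are invented for illustration, and a real cascade would use a trained tagger rather than hand-written rules.

```python
import re

def format_numeric(transcript: str) -> str:
    """Toy rule-based post-processing step of a cascaded ASR pipeline:
    format four transcribed digits as a timestamp or a year depending
    on nearby context words (hypothetical cue lists)."""
    # "at/until/from 19 45" -> timestamp "19:45"
    time_cues = re.compile(r"\b(at|until|from)\s+(\d{2})\s?(\d{2})\b")
    # "in/since/year 19 45" -> year "1945"
    year_cues = re.compile(r"\b(in|since|year)\s+(\d{2})\s?(\d{2})\b")
    transcript = time_cues.sub(lambda m: f"{m.group(1)} {m.group(2)}:{m.group(3)}", transcript)
    transcript = year_cues.sub(lambda m: f"{m.group(1)} {m.group(2)}{m.group(3)}", transcript)
    return transcript
```

The same digit sequence is rendered differently depending on context, which is exactly why format-aware recognition (or adaptation data covering both readings) is needed.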
Integrating Heterogeneous Gene Expression Data through Knowledge Graphs for Improving Diabetes Prediction
Sousa, Rita T., Paulheim, Heiko
Diabetes is a worldwide health issue affecting millions of people. Machine learning methods have shown promising results in improving diabetes prediction, particularly through the analysis of diverse data types such as gene expression data. While gene expression data can provide valuable insights, challenges arise because sample sizes in expression datasets are usually limited and datasets that profile different sets of genes cannot be easily combined. This work proposes a novel approach to address these challenges by integrating multiple gene expression datasets and domain-specific knowledge using knowledge graphs, a unique tool for biomedical data integration. KG embedding methods are then employed to generate vector representations, serving as inputs for a classifier. Experiments demonstrated the efficacy of our approach, revealing improvements in diabetes prediction when integrating multiple gene expression datasets and domain-specific knowledge about protein functions and interactions.
- Europe > Germany (0.14)
- North America > United States (0.05)
- Europe > Switzerland (0.04)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.72)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.62)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.56)
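The embedding-to-classifier pipeline the abstract describes can be sketched in a few lines. Everything here is hypothetical: the entity names and random vectors stand in for embeddings that, in a real pipeline, would come from a KG embedding method (e.g., RDF2Vec or TransE) trained on a biomedical knowledge graph.

```python
import numpy as np

# Hypothetical KG entity embeddings (stand-ins for trained KG embeddings).
rng = np.random.default_rng(0)
entities = ["GENE_A", "GENE_B", "GENE_C", "PROTEIN_X"]
emb = {e: rng.normal(size=8) for e in entities}

def patient_vector(linked_entities):
    """Represent a sample as the mean of the embeddings of the KG
    entities it is linked to (genes, proteins, functions)."""
    return np.mean([emb[e] for e in linked_entities], axis=0)

# Key point: every sample maps to a fixed-length vector, regardless of
# which expression dataset (and which gene panel) it came from, so
# heterogeneous datasets can feed one downstream classifier.
x = patient_vector(["GENE_A", "PROTEIN_X"])
```

A classifier (logistic regression, gradient boosting, etc.) would then be trained on these fixed-length vectors.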
Hypergraph-Guided Disentangled Spectrum Transformer Networks for Near-Infrared Facial Expression Recognition
Luo, Bingjun, Wang, Haowen, Wang, Jinpeng, Zhu, Junjie, Zhao, Xibin, Gao, Yue
Thanks to its strong robustness to illumination variations, near-infrared (NIR) imaging can be an effective and essential complement to visible-spectrum (VIS) facial expression recognition in low-light or completely dark conditions. However, facial expression recognition (FER) from NIR images is a more challenging problem than traditional FER due to the limited data scale and the difficulty of extracting discriminative features from incomplete visible lighting content. In this paper, we make the first attempt at deep NIR facial expression recognition and propose a novel method called the near-infrared facial expression transformer (NFER-Former). Specifically, to make full use of the abundant label information in the VIS domain, we introduce a Self-Attention Orthogonal Decomposition mechanism that disentangles expression information and spectrum information from the input image, so that expression features can be extracted without interference from spectrum variation. We also propose a Hypergraph-Guided Feature Embedding method that models key facial behaviors and learns the structure of the complex correlations between them, thereby alleviating the interference of inter-class similarity. Additionally, we have constructed a large NIR-VIS facial expression dataset that includes 360 subjects to better validate the effectiveness of NFER-Former. Extensive experiments and ablation studies show that NFER-Former significantly improves the performance of NIR FER and achieves state-of-the-art results on the only two available NIR FER datasets, Oulu-CASIA and Large-HFE.
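The disentanglement idea behind the orthogonal decomposition can be sketched generically: a common way to separate two feature branches is to penalize the alignment of their feature vectors. This is a minimal, generic sketch of such a penalty, not NFER-Former's actual mechanism.

```python
import numpy as np

def orthogonality_penalty(f_expr, f_spec):
    """Squared cosine similarity between an expression-feature vector and
    a spectrum-feature vector; minimizing it pushes the two branches
    toward orthogonal (disentangled) representations."""
    cos = np.dot(f_expr, f_spec) / (np.linalg.norm(f_expr) * np.linalg.norm(f_spec))
    return float(cos ** 2)
```

When the two feature vectors are orthogonal the penalty is zero; when they are parallel (fully entangled) it is one.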
Understanding and Mitigating Annotation Bias in Facial Expression Recognition
Chen, Yunliang, Joo, Jungseock
The performance of a computer vision model depends on the size and quality of its training data. Recent studies have unveiled previously unknown composition biases in common image datasets, which lead to skewed model outputs, and have proposed methods to mitigate these biases. However, most existing works assume that human-generated annotations can be considered gold-standard and unbiased. In this paper, we reveal that this assumption can be problematic, and that special care should be taken to prevent models from learning such annotation biases. We focus on facial expression recognition and compare the label biases between lab-controlled and in-the-wild datasets. We demonstrate that many expression datasets contain significant annotation biases between genders, especially for the happy and angry expressions, and that traditional methods cannot fully mitigate such biases in trained models. To remove expression annotation bias, we propose an AU-Calibrated Facial Expression Recognition (AUC-FER) framework that utilizes facial action units (AUs) and incorporates the triplet loss into the objective function. Experimental results suggest that the proposed method is more effective in removing expression annotation bias than existing techniques.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
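The triplet loss the AUC-FER abstract incorporates has a standard form, shown below as a minimal NumPy sketch. The margin value is illustrative; the paper's actual anchors, positives, and negatives would be AU-calibrated expression embeddings.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard hinge-style triplet loss: pull the anchor toward the
    positive (same expression) and push it from the negative (different
    expression) until their squared distances differ by >= margin."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

The loss is zero once the negative is sufficiently farther from the anchor than the positive, so well-separated triplets stop contributing gradients.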
Distilling Wikipedia mathematical knowledge into neural network models
Kim, Joanne T., Larma, Mikel Landajuela, Petersen, Brenden K.
Machine learning applications to symbolic mathematics are becoming increasingly popular, yet the field lacks a centralized source of real-world symbolic expressions to use as training data. In contrast, the field of natural language processing leverages resources like Wikipedia that provide enormous amounts of real-world textual data. Adopting the philosophy of "mathematics as language," we bridge this gap by introducing a pipeline for distilling mathematical expressions embedded in Wikipedia into symbolic encodings to be used in downstream machine learning tasks. We demonstrate that a mathematical language model trained on this "corpus" of expressions can be used as a prior to improve the performance of neural-guided search for the task of symbolic regression. "The basis of all human culture is language, and mathematics is a special kind of linguistic activity."
- North America > United States > California > Alameda County > Livermore (0.05)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Energy (0.48)
- Government (0.48)
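A common symbolic encoding for such an expression corpus is a prefix (Polish-notation) token sequence over an expression tree. The sketch below shows the general idea and is not the paper's exact encoding; the tuple representation and operator names are assumptions for illustration.

```python
def to_prefix(node):
    """Serialize a nested (op, child, ...) tuple expression tree into a
    flat prefix token sequence, the form typically fed to sequence
    models for symbolic regression."""
    if isinstance(node, tuple):
        op, *children = node
        tokens = [op]
        for child in children:
            tokens += to_prefix(child)
        return tokens
    return [str(node)]

# sin(x) + x*y as a tree, then as prefix tokens
expr = ("add", ("sin", "x"), ("mul", "x", "y"))
tokens = to_prefix(expr)
```

Flat token sequences like this let a language model assign probabilities to expressions, which is what allows the Wikipedia-derived corpus to act as a prior for neural-guided search.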
KiWi: A Scalable Subspace Clustering Algorithm for Gene Expression Analysis
Griffith, Obi L., Gao, Byron J., Bilenky, Mikhail, Prichyna, Yuliya, Ester, Martin, Jones, Steven J. M.
Numerous studies have used coexpression across large expression datasets to infer functional associations between genes [1], to identify groups of related genes that are important in specific cancers or represent common tumour progression mechanisms [2], to study evolutionary change [3], for integration with other large-scale datasets [4][5][6], and for the generation of high-quality biological interaction networks [7][8][9][10]. A number of studies have also attempted to use coexpression to identify coregulation, with the hypothesis that if two or more genes are expressed at the same time and location and at similar levels, then they may be regulated by the same transcription factors and regulatory elements. This approach has shown promise, particularly in simpler model organisms such as A. thaliana and S. cerevisiae [11][12][13][14], and many groups are currently working on implementing this idea in mammalian systems. However, traditional clustering methods have not worked particularly well on large datasets for this problem. Most methods assign each gene to only one cluster, while in reality many genes likely take part in multiple processes. Also, global coexpression is measured across all conditions, whereas it is probable that most genes are only tightly coregulated under certain conditions or in certain locations. In recent years, a new field of clustering analysis termed subspace clustering (or biclustering) has gained increasing popularity in the analysis of gene expression data and other biological data [15][16][17][18][19]. In contrast to traditional clustering methods such as hierarchical clustering, subspace clustering methods do not require expression to be correlated across all conditions for genes to be assigned to the same cluster. This has several advantages for data in which biologically relevant subsets exist (e.g.
- North America > Canada (0.04)
- North America > United States > Texas (0.04)
- Research Report > New Finding (0.47)
- Research Report > Experimental Study (0.30)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.48)
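The core motivation for subspace clustering above can be shown numerically: two genes that look uncorrelated across all conditions may be tightly correlated on a subset of conditions. The toy expression matrix below is fabricated purely for illustration.

```python
import numpy as np

# Toy expression matrix: rows are genes, columns are conditions.
# Genes A and B track each other perfectly in conditions 0-2 only.
expr = np.array([
    [1.0, 2.0, 3.0, 9.0, 1.0],   # gene A
    [2.0, 4.0, 6.0, 0.5, 8.0],   # gene B
])

def corr(a, b):
    """Pearson correlation between two expression profiles."""
    return float(np.corrcoef(a, b)[0, 1])

global_r = corr(expr[0], expr[1])            # across all conditions: weak/negative
subspace_r = corr(expr[0, :3], expr[1, :3])  # conditions 0-2 only: near-perfect
```

A global clustering method would never group A and B, while a subspace (biclustering) method can assign them to the same cluster restricted to conditions 0-2.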