AITopics | datamap

Collaborating Authors

datamap

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DataMap: A Portable Application for Visualizing High-Dimensional Data

Ge, Xijin

arXiv.org Artificial IntelligenceApr-15-2025

Motivation: The visualization and analysis of high-dimensional data are essential in biomedical research. There is a need for secure, scalable, and reproducible tools to facilitate data exploration and interpretation. Results: We introduce DataMap, a browser-based application for visualization of high-dimensional data using heatmaps, principal component analysis (PCA), and t-distributed stochastic neighbor embedding (t-SNE). DataMap runs in the web browser, ensuring data privacy while eliminating the need for installation or a server. The application has an intuitive user interface for data transformation, annotation, and generation of reproducible R code. Availability and Implementation: Freely available as a GitHub page https://gexijin.github.io/datamap/. The source code can be found at https://github.com/gexijin/datamap, and can also be installed as an R package. Contact: Xijin.Ge@sdstate.ed

artificial intelligence, machine learning, visualization, (16 more...)

arXiv.org Artificial Intelligence

2504.08875

Country: North America > United States (0.16)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.96)
Information Technology > Security & Privacy (0.89)

Technology:

Information Technology > Security & Privacy (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.51)

Add feedback

Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI

Seedat, Nabeel, Imrie, Fergus, van der Schaar, Mihaela

arXiv.org Artificial IntelligenceMar-7-2024

Characterizing samples that are difficult to learn from is crucial to developing highly performant ML models. This has led to numerous Hardness Characterization Methods (HCMs) that aim to identify "hard" samples. However, there is a lack of consensus regarding the definition and evaluation of "hardness". Unfortunately, current HCMs have only been evaluated on specific types of hardness and often only qualitatively or with respect to downstream performance, overlooking the fundamental quantitative identification task. We address this gap by presenting a fine-grained taxonomy of hardness types. Additionally, we propose the Hardness Characterization Analysis Toolkit (H-CAT), which supports comprehensive and quantitative benchmarking of HCMs across the hardness taxonomy and can easily be extended to new HCMs, hardness types, and datasets. We use H-CAT to evaluate 13 different HCMs across 8 hardness types. This comprehensive evaluation encompassing over 14K setups uncovers strengths and weaknesses of different HCMs, leading to practical tips to guide HCM selection and future development. Our findings highlight the need for more comprehensive HCM evaluation, while we hope our hardness taxonomy and toolkit will advance the principled evaluation and uptake of data-centric AI methods.

allsh aum el2n, aum el2n, datamap, (14 more...)

arXiv.org Artificial Intelligence

2403.04551

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.65)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.92)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Improving Spoken Language Identification with Map-Mix

Rajaa, Shangeth, Anandan, Kriti, Dalmia, Swaraj, Gupta, Tarun, Chng, Eng Siong

arXiv.org Artificial IntelligenceFeb-16-2023

The pre-trained multi-lingual XLSR model generalizes well for language identification after fine-tuning on unseen languages. However, the performance significantly degrades when the languages are not very distinct from each other, for example, in the case of dialects. Low resource dialect classification remains a challenging problem to solve. We present a new data augmentation method that leverages model training dynamics of individual data points to improve sampling for latent mixup. The method works well in low-resource settings where generalization is paramount. Our datamaps-based mixup technique, which we call Map-Mix improves weighted F1 scores by 2% compared to the random mixup baseline and results in a significantly well-calibrated model. The code for our method is open sourced on https://github.com/skit-ai/Map-Mix.

machine learning, mixup, natural language, (20 more...)

arXiv.org Artificial Intelligence

2302.08229

Country: Asia > Singapore (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.95)
Information Technology > Artificial Intelligence > Natural Language (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Skit-S2I: An Indian Accented Speech to Intent dataset

Rajaa, Shangeth, Dalmia, Swaraj, Nethil, Kumarmanas

arXiv.org Artificial IntelligenceDec-26-2022

Conventional conversation assistants extract text transcripts from the speech signal using automatic speech recognition (ASR) and then predict intent from the transcriptions. Using end-to-end spoken language understanding (SLU), the intents of the speaker are predicted directly from the speech signal without requiring intermediate text transcripts. As a result, the model can optimize directly for intent classification and avoid cascading errors from ASR. The end-to-end SLU system also helps in reducing the latency of the intent prediction model. Although many datasets are available publicly for text-to-intent tasks, the availability of labeled speech-to-intent datasets is limited, and there are no datasets available in the Indian accent. In this paper, we release the Skit-S2I dataset, the first publicly available Indian-accented SLU dataset in the banking domain in a conversational tonality. We experiment with multiple baselines, compare different pretrained speech encoder's representations, and find that SSL pretrained representations perform slightly better than ASR pretrained representations lacking prosodic features for speech-to-intent classification. The dataset and baseline code is available at \url{https://github.com/skit-ai/speech-to-intent-dataset}

artificial intelligence, dataset, speech recognition, (18 more...)

arXiv.org Artificial Intelligence

2212.13015

Country:

North America > United States > Pennsylvania (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.41)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

Add feedback