Collaborating Authors

Barker, Jon


PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation

arXiv.org Artificial Intelligence

Different visual foundation models have distinct strengths and weaknesses, both of which can be improved through label-free heterogeneous multi-teacher knowledge distillation, producing what are termed "agglomerative models." We build on this body of work by studying the effect of the teachers' activation statistics, particularly the impact of the loss function on the resulting student model quality. We explore a standard toolkit of statistical normalization techniques to better align the teachers' differing distributions and assess their effects. We further examine the impact on downstream teacher-matching metrics, which motivates the use of Hadamard matrices. We demonstrate useful properties of these matrices, showing how they can be used for isotropic standardization, in which every dimension of a multivariate distribution is standardized using the same scale. We call this technique "PHI Standardization" (PHI-S) and empirically demonstrate that it produces the best student model across the suite of methods studied.
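To make the isotropic standardization concrete, here is a minimal NumPy sketch of the PHI-S idea (the function name and the use of scipy's hadamard helper are our own choices, and it assumes the activation dimension is a power of two so a Hadamard matrix exists; the paper's implementation may differ):

```python
import numpy as np
from scipy.linalg import hadamard  # requires the dimension to be a power of two

def phi_standardize(x):
    """Sketch of PHI Standardization (PHI-S).

    x: (n_samples, d) teacher activations, d a power of two.
    Rotates with a normalized Hadamard matrix so per-dimension
    variances are evened out, then standardizes every dimension
    with the same single scale.
    """
    d = x.shape[1]
    h = hadamard(d) / np.sqrt(d)      # orthonormal Hadamard rotation
    centered = x - x.mean(axis=0)     # zero-mean each dimension
    rotated = centered @ h.T          # spread variance across dimensions
    scale = rotated.std()             # one scalar scale for all dimensions
    return rotated / scale
```

Because the normalized Hadamard matrix is orthonormal with entries of equal magnitude, the rotation distributes variance evenly across dimensions (exactly so when the covariance is diagonal), which is why a single shared scale suffices.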


The ICASSP SP Cadenza Challenge: Music Demixing/Remixing for Hearing Aids

arXiv.org Artificial Intelligence

This paper reports on the design and results of the 2024 ICASSP SP Cadenza Challenge: Music Demixing/Remixing for Hearing Aids. The Cadenza project works to enhance the audio quality of music for people with a hearing loss. The challenge scenario was listening to stereo reproduction over loudspeakers via hearing aids. The task was to decompose pop/rock music into vocal, drums, bass, and other (VDBO) stems; rebalance the tracks with specified gains; and remix back to stereo. End-to-end approaches were also accepted. Eleven teams submitted 17 systems. Causal systems performed worse than non-causal approaches, and nine systems beat the baseline. A common approach was to fine-tune pretrained demixing models; the best approach used an ensemble of models.
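The rebalance-and-remix step can be pictured with a short sketch (the stem keys, the dB gain convention, and the peak-normalization safeguard are illustrative assumptions, not the official Cadenza tooling):

```python
import numpy as np

def rebalance_and_remix(stems, gains_db):
    """Sketch of the challenge's rebalance/remix step.

    stems: dict mapping 'vocal', 'drums', 'bass', 'other' to
           (n_samples, 2) stereo arrays from a demixing front end.
    gains_db: dict mapping the same keys to per-stem gains in dB.
    Returns the remixed stereo signal.
    """
    mix = np.zeros_like(next(iter(stems.values())), dtype=np.float64)
    for name, stem in stems.items():
        gain = 10.0 ** (gains_db[name] / 20.0)  # dB -> linear amplitude
        mix += gain * stem
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix    # avoid clipping
```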


Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory Models

arXiv.org Artificial Intelligence

Neural networks have been used successfully for non-intrusive speech intelligibility prediction. Recently, feature representations sourced from intermediate layers of pre-trained self-supervised and weakly-supervised models have been found to be particularly useful for this task. This work combines Whisper ASR decoder layer representations as neural network input features with an exemplar-based, psychologically motivated model of human memory to predict human intelligibility ratings for hearing-aid users. A substantial performance improvement over an established intrusive HASPI baseline is found, including on enhancement systems and listeners unseen in the training data, with a root mean squared error of 25.3 compared with 28.7 for the baseline.
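To make the exemplar-based memory idea concrete, here is a toy sketch of similarity-weighted prediction (the names and the exponential similarity kernel are our assumptions; the paper's model is more elaborate):

```python
import numpy as np

def exemplar_predict(query, exemplars, ratings, temperature=1.0):
    """Toy exemplar-based prediction in the spirit of psychological
    models of human memory.

    query:     (d,) feature vector, e.g. pooled ASR decoder features.
    exemplars: (n, d) stored feature vectors with known ratings.
    ratings:   (n,) intelligibility ratings for the exemplars.
    Returns a similarity-weighted rating for the query.
    """
    dists = np.linalg.norm(exemplars - query, axis=1)
    weights = np.exp(-dists / temperature)  # closer exemplars count more
    weights /= weights.sum()
    return float(weights @ ratings)
```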


Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement

arXiv.org Artificial Intelligence

Video compression is a central feature of the modern internet, powering technologies from social media to video conferencing. While video compression continues to mature, quality loss remains noticeable at many compression settings. These settings nevertheless have important applications for the efficient transmission of video over bandwidth-constrained or otherwise unstable connections. In this work, we develop a deep learning architecture that restores detail to compressed videos by leveraging the structure and motion information embedded in the video bitstream. We show that this improves restoration accuracy over prior compression-correction methods and is competitive with recent deep-learning-based video compression methods on rate-distortion while achieving higher throughput. Furthermore, we condition our model on quantization data that is readily available in the bitstream. This allows a single model to handle a variety of compression quality settings, which required an ensemble of models in prior work.
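One common way to realize such conditioning is feature-wise modulation of intermediate activations by the quantization parameter; the PyTorch sketch below illustrates the general idea (the layer sizes and the FiLM-style design are assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class QPConditionedBlock(nn.Module):
    """Sketch of conditioning a restoration block on quantization
    data (e.g. a per-frame QP value) via feature-wise modulation."""

    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        # Map the scalar QP to a per-channel scale and shift.
        self.film = nn.Linear(1, 2 * channels)

    def forward(self, feats, qp):
        # feats: (B, C, H, W); qp: (B, 1) normalized quantization parameter
        scale, shift = self.film(qp).chunk(2, dim=1)
        scale = scale[..., None, None]
        shift = shift[..., None, None]
        out = self.conv(feats)
        return torch.relu(out * (1 + scale) + shift)
```

Because the scale and shift depend on the QP, the same weights adapt their behavior across compression levels, which is what removes the need for an ensemble of per-setting models.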


The First Cadenza Signal Processing Challenge: Improving Music for Those With a Hearing Loss

arXiv.org Artificial Intelligence

The Cadenza project aims to improve the audio quality of music for people who have a hearing loss. This is being done through a series of signal processing challenges, to foster better and more inclusive technologies. The first round considers two common listening scenarios: listening to music over headphones, and listening with a hearing aid in a car. The first scenario is cast as a demixing-remixing problem, in which the music is decomposed into vocals, bass, drums, and other components. These can then be intelligently remixed in a personalized way to increase the audio quality for a person with a hearing loss. In the second scenario, music is played over car loudspeakers and must be enhanced to overcome the masking effect of the car noise. This is done by taking into account the music, the hearing ability of the listener, the hearing aid, and the speed of the car. The audio quality of submissions will be evaluated using the Hearing Aid Audio Quality Index (HAAQI) for objective assessment and by a panel of people with hearing loss for subjective evaluation.
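For the car scenario, the masking-compensation idea can be sketched as a per-band gain that lifts the music above an estimated noise floor (a toy illustration only; real challenge systems additionally model the listener's audiogram, the hearing aid, and speed-dependent noise):

```python
import numpy as np

def band_boost_for_car_noise(music_stft, noise_stft, max_boost_db=12.0):
    """Toy sketch of masking compensation for music in car noise.

    music_stft, noise_stft: (freq, time) magnitude spectrograms.
    Boosts each band just enough to sit above the noise, capped.
    """
    eps = 1e-8
    # Per-band long-term levels in dB.
    music_db = 20 * np.log10(music_stft.mean(axis=1) + eps)
    noise_db = 20 * np.log10(noise_stft.mean(axis=1) + eps)
    boost_db = np.clip(noise_db - music_db, 0.0, max_boost_db)
    gains = 10.0 ** (boost_db / 20.0)
    return music_stft * gains[:, None]
```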


The Use of Voice Source Features for Sung Speech Recognition

arXiv.org Artificial Intelligence

In this paper, we ask whether vocal source features (pitch, shimmer, jitter, etc.) can improve the performance of automatic sung speech recognition, arguing that conclusions previously drawn from spoken speech studies may not be valid in the sung speech domain. We first use a parallel singing/speaking corpus (NUS-48E) to illustrate differences in sung versus spoken voicing characteristics, including pitch range, syllable duration, vibrato, jitter, and shimmer. We then use this analysis to inform speech recognition experiments on the sung speech DSing corpus, using a state-of-the-art acoustic model and augmenting conventional features with various voice source parameters. Experiments are run with three standard (increasingly large) training sets: DSing1 (15.1 hours), DSing3 (44.7 hours), and DSing30 (149.1 hours). Pitch combined with degree of voicing produces a significant decrease in WER, from 38.1% to 36.7%, when training with DSing1; however, the smaller WER decreases observed when training with the larger, more varied DSing3 and DSing30 sets were not statistically significant. Voicing quality characteristics did not improve recognition performance, although analysis suggests that they do contribute to improved discrimination between voiced/unvoiced phoneme pairs.
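A librosa-based sketch of the helpful feature combination, appending pitch and degree of voicing to conventional features, is shown below (the paper's actual front end may differ, and the frame alignment here is simplistic):

```python
import librosa
import numpy as np

def voice_source_features(path, sr=16000):
    """Sketch: augment MFCCs with pitch (F0) and voicing probability,
    the combination found helpful on DSing1."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # (13, T)
    f0, _, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C7"), sr=sr)
    f0 = np.nan_to_num(f0)                    # unvoiced frames -> 0
    t = min(mfcc.shape[1], len(f0))           # crude frame alignment
    return np.vstack([mfcc[:, :t], f0[None, :t], voiced_prob[None, :t]])
```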


Malware Detection by Eating a Whole EXE

AAAI Conferences

In this work we introduce malware detection from raw byte sequences as a fruitful research area for the larger machine learning community. Building a neural network for this problem presents a number of interesting challenges that have not arisen in tasks such as image processing or NLP. In particular, detection from raw bytes is a sequence problem with over two million time steps, and one in which batch normalization appears to hinder the learning process. We present our initial work on a solution that has linear complexity in the sequence length and allows interpretable sub-regions of the binary to be identified. In doing so, we discuss the many challenges in building a neural network to process data at this scale, and the methods we used to work around them.
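The description above maps naturally onto a byte embedding followed by a gated convolution with a very large stride and temporal max pooling; a minimal PyTorch sketch follows (hyperparameters are illustrative, not necessarily the paper's):

```python
import torch
import torch.nn as nn

class GatedByteConvNet(nn.Module):
    """Minimal sketch: embed raw bytes and apply a gated convolution
    with a large stride, so cost grows linearly with sequence length,
    then pool over time for a single malware logit."""

    def __init__(self, embed_dim=8, channels=128, kernel=512, stride=512):
        super().__init__()
        # 256 byte values plus one reserved padding index.
        self.embed = nn.Embedding(257, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, channels, kernel, stride=stride)
        self.gate = nn.Conv1d(embed_dim, channels, kernel, stride=stride)
        self.fc = nn.Linear(channels, 1)

    def forward(self, x):
        # x: (B, L) with bytes mapped to 1..256 and 0 reserved for padding;
        # L may be on the order of two million time steps.
        z = self.embed(x).transpose(1, 2)               # (B, E, L)
        h = self.conv(z) * torch.sigmoid(self.gate(z))  # gated activation
        h = torch.max(h, dim=2).values                  # temporal max pooling
        return self.fc(h)                               # malware logit
```

The single temporal max pool is also what enables interpretability: the time step that wins the pool for each channel points back to a specific sub-region of the binary.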

