AITopics | Cherep, Manuel

Collaborating Authors

Cherep, Manuel

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Bridging the Data Provenance Gap Across Text, Speech and Video

Longpre, Shayne, Singh, Nikhil, Cherep, Manuel, Tiwary, Kushagra, Materzynska, Joanna, Brannon, William, Mahari, Robert, Dey, Manan, Hamdy, Mohammed, Saxena, Nayan, Anis, Ahmad Mustafa, Alghamdi, Emad A., Chien, Vu Minh, Obeng-Marnu, Naana, Yin, Da, Qian, Kun, Li, Yizhi, Liang, Minnie, Dinh, An, Mohanty, Shrestha, Mataciunas, Deividas, South, Tobin, Zhang, Jianguo, Lee, Ariel N., Lund, Campbell S., Klamm, Christopher, Sileo, Damien, Misra, Diganta, Shippole, Enrico, Klyman, Kevin, Miranda, Lester JV, Muennighoff, Niklas, Ye, Seonghyeon, Kim, Seungone, Gupta, Vipul, Sharma, Vivek, Zhou, Xuhui, Xiong, Caiming, Villa, Luis, Biderman, Stella, Pentland, Alex, Hooker, Sara, Kabbara, Jad

arXiv.org Artificial IntelligenceDec-18-2024

Progress in AI is driven largely by the scale and quality of training data. Despite this, there is a deficit of empirical analysis examining the attributes of well-established datasets beyond text. In this work we conduct the largest and first-of-its-kind longitudinal audit across modalities--popular text, speech, and video datasets--from their detailed sourcing trends and use restrictions to their geographical and linguistic representation. Our manual analysis covers nearly 4000 public datasets between 1990-2024, spanning 608 languages, 798 sources, 659 organizations, and 67 countries. We find that multimodal machine learning applications have overwhelmingly turned to web-crawled, synthetic, and social media platforms, such as YouTube, for their training sets, eclipsing all other sources since 2019. Secondly, tracing the chain of dataset derivations we find that while less than 33% of datasets are restrictively licensed, over 80% of the source content in widely-used text, speech, and video datasets, carry non-commercial restrictions. Finally, counter to the rising number of languages and geographies represented in public AI training datasets, our audit demonstrates measures of relative geographical and multilingual representation have failed to significantly improve their coverage since 2013. We believe the breadth of our audit enables us to empirically examine trends in data sourcing, restrictions, and Western-centricity at an ecosystem-level, and that visibility into these questions are essential to progress in responsible AI. As a contribution to ongoing improvements in dataset transparency and responsible use, we release our entire multimodal audit, allowing practitioners to trace data provenance across text, speech, and video.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.17847

Country:

Europe (1.00)
Asia > Middle East (0.92)
Africa (0.67)
(3 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area (0.67)
Information Technology > Security & Privacy (0.67)
Media > News (0.67)
(3 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Contrastive Learning from Synthetic Audio Doppelgangers

Cherep, Manuel, Singh, Nikhil

arXiv.org Artificial IntelligenceJun-9-2024

Learning robust audio representations currently demands extensive datasets of real-world sound recordings. By applying artificial transformations to these recordings, models can learn to recognize similarities despite subtle variations through techniques like contrastive learning. However, these transformations are only approximations of the true diversity found in real-world sounds, which are generated by complex interactions of physical processes, from vocal cord vibrations to the resonance of musical instruments. We propose a solution to both the data scale and transformation limitations, leveraging synthetic audio. By randomly perturbing the parameters of a sound synthesizer, we generate audio doppelg\"angers-synthetic positive pairs with causally manipulated variations in timbre, pitch, and temporal envelopes. These variations, difficult to achieve through transformations of existing audio, provide a rich source of contrastive information. Despite the shift to randomly generated synthetic data, our method produces strong representations, competitive with real data on standard audio classification benchmarks. Notably, our approach is lightweight, requires no data storage, and has only a single hyperparameter, which we extensively analyze. We offer this method as a complement to existing strategies for contrastive learning in audio, using synthesized sounds to reduce the data burden on practitioners.

artificial intelligence, machine learning, speech recognition, (14 more...)

arXiv.org Artificial Intelligence

2406.05923

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment (0.88)
Media > Music (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.47)

Add feedback

Creative Text-to-Audio Generation via Synthesizer Programming

Cherep, Manuel, Singh, Nikhil, Shand, Jessica

arXiv.org Artificial IntelligenceJun-1-2024

Neural audio synthesis methods now allow specifying ideas in natural language. However, these methods produce results that cannot be easily tweaked, as they are based on large latent spaces and up to billions of uninterpretable parameters. We propose a text-to-audio generation method that leverages a virtual modular sound synthesizer with only 78 parameters. Synthesizers have long been used by skilled sound designers for media like music and film due to their flexibility and intuitive controls. Our method, CTAG, iteratively updates a synthesizer's parameters to produce high-quality audio renderings of text prompts that can be easily inspected and tweaked. Sounds produced this way are also more abstract, capturing essential conceptual features over fine-grained acoustic details, akin to how simple sketches can vividly convey visual concepts. Our results show how CTAG produces sounds that are distinctive, perceived as artistic, and yet similarly identifiable to recent neural audio synthesis models, positioning it as a valuable and complementary tool.

creative text-to-audio generation, evolutionary algorithm, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2406.00294

Country:

Europe > Austria (0.28)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback