Large Language Model


MuLan: A Joint Embedding of Music Audio and Natural Language

arXiv.org Artificial Intelligence

Music tagging and content-based retrieval systems have traditionally been constructed using pre-defined ontologies covering a rigid set of music attributes or text queries. This paper presents MuLan: a first attempt at a new generation of acoustic models that link music audio directly to unconstrained natural language music descriptions. MuLan takes the form of a two-tower, joint audio-text embedding model trained using 44 million music recordings (370K hours) and weakly-associated, free-form text annotations. Through its compatibility with a wide range of music genres and text styles (including conventional music tags), the resulting audio-text representation subsumes existing ontologies while extending to true zero-shot functionality. We demonstrate the versatility of the MuLan embeddings with a range of experiments including transfer learning, zero-shot music tagging, language understanding in the music domain, and cross-modal retrieval applications.
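The abstract does not spell out MuLan's training objective. As a minimal sketch, assuming a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of paired audio and text embeddings, the two towers could be pulled into one shared space like this. The function names, the temperature of 0.1 and the toy vectors are illustrative assumptions, not MuLan's published configuration.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(row: List[float], idx: int) -> float:
    """Numerically stable log-softmax of row, evaluated at position idx."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - log_sum


def contrastive_loss(audio: List[Vector], text: List[Vector], temperature: float = 0.1) -> float:
    """Symmetric InfoNCE (assumed objective): audio i and text i form the positive pair,
    and every other item in the batch acts as a negative."""
    a = [l2_normalize(v) for v in audio]
    t = [l2_normalize(v) for v in text]
    n = len(a)
    # sims[i][j]: temperature-scaled cosine similarity between audio i and text j
    sims = [[dot(a[i], t[j]) / temperature for j in range(n)] for i in range(n)]
    # Audio-to-text direction: each row should peak on its diagonal entry
    audio_to_text = -sum(log_softmax_at(sims[i], i) for i in range(n)) / n
    # Text-to-audio direction: each column should peak on its diagonal entry
    cols = [[sims[i][j] for i in range(n)] for j in range(n)]
    text_to_audio = -sum(log_softmax_at(cols[j], j) for j in range(n)) / n
    return (audio_to_text + text_to_audio) / 2


if __name__ == "__main__":
    clips = [[1.0, 0.0], [0.0, 1.0]]  # toy audio-tower outputs
    print(contrastive_loss(clips, [[1.0, 0.0], [0.0, 1.0]]))  # captions matched
    print(contrastive_loss(clips, [[0.0, 1.0], [1.0, 0.0]]))  # captions swapped
```

Matched pairs give a loss near zero, while swapping the captions drives it up; minimizing this gap is what aligns the audio and text towers.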


AI And The Limits Of Language

#artificialintelligence

Jacob Browning is a postdoc in NYU's Department of Computer Science working on the philosophy of AI. Yann LeCun is a Turing Award-winning machine learning researcher and an NYU Silver professor. When a Google engineer recently declared Google's AI chatbot a person, pandemonium ensued. The chatbot, LaMDA, is a large language model (LLM) designed to predict the likely next words for whatever lines of text it is given. Since many conversations are somewhat predictable, these systems can infer how to keep a conversation going productively. LaMDA did this so impressively that the engineer, Blake Lemoine, began to wonder whether there was a ghost in the machine.


Evaluating Diverse Knowledge Sources for Online One-shot Learning of Novel Tasks

#artificialintelligence

Online autonomous agents are able to draw on a wide variety of potential sources of task knowledge; however, current approaches invariably focus on only one or two. Here we investigate the challenges and impact of exploiting diverse knowledge sources to learn, in one shot, new tasks for a simulated household mobile robot. The resulting agent, developed in the Soar cognitive architecture, uses the following sources of domain and task knowledge: interaction with the environment, task execution and planning knowledge, human natural language instruction, and responses retrieved from a large language model (GPT-3). We explore the distinct contributions of these knowledge sources and evaluate the performance of different combinations in terms of learning correct task knowledge, human workload, and computational costs. The results show that integrating all sources improves one-shot task learning overall, in terms of both computational costs and human workload.


Assembly AI offers AI-as-a-service API to ease model development

#artificialintelligence

Over the last decade, artificial intelligence (AI) technologies have increasingly relied on neural networks to perform pattern recognition, machine learning (ML) and prediction. However, with ML models that consist of billions of parameters, training becomes more complicated because the model no longer fits on a single GPU. Large language models (LLMs) such as GPT-3 and Gopher cost millions of dollars and require vast amounts of computing resources, making it challenging for cash- and resource-constrained organizations to enter the field.
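The single-GPU limit follows from simple arithmetic. A rough sketch, assuming 2 bytes per parameter for fp16 weights, the commonly cited figure of about 16 bytes per parameter for mixed-precision Adam training state (fp16 weights and gradients plus fp32 master weights and two optimizer moments), and an 80 GB accelerator. Activations and framework overhead are ignored, so real requirements are higher. The parameter counts are the published sizes of GPT-3 (175 billion) and Gopher (280 billion).

```python
import math

GPU_MEMORY_GB = 80  # assumption: a single 80 GB accelerator


def weights_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold num_params values at the given precision, in GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(num_params: float, bytes_per_param: float, gpu_gb: float = GPU_MEMORY_GB) -> int:
    """Lower bound on device count: memory only, ignoring activations and overhead."""
    return math.ceil(weights_gb(num_params, bytes_per_param) / gpu_gb)


if __name__ == "__main__":
    for name, params in [("GPT-3", 175e9), ("Gopher", 280e9)]:
        inference = weights_gb(params, 2)  # fp16 weights only
        training = weights_gb(params, 16)  # assumed mixed-precision Adam state
        print(
            f"{name}: {inference:.0f} GB of fp16 weights, ~{training:.0f} GB of training state, "
            f"at least {min_gpus(params, 16)} GPUs just to hold that state"
        )
```

Even the fp16 weights of GPT-3 alone (350 GB) exceed one 80 GB device several times over, which is why training must be sharded across many GPUs.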


La veille de la cybersécurité

#artificialintelligence

AI adoption may be steadily rising, but a closer examination shows that most enterprise companies may not be quite ready for the big time when it comes to artificial intelligence. Recent data from Palo Alto, California-based AI unicorn SambaNova Systems, for example, shows that more than two-thirds of organizations think using artificial intelligence (AI) will cut costs by automating processes and using employees more efficiently. But only 18% are rolling out large-scale, enterprise-class AI initiatives. The rest are introducing AI individually across multiple programs, rather than risking an investment in big-picture, large-scale adoption. That will widen the gap between companies that are AI leaders and innovators and those that fall behind, said Marshall Choy, senior vice president of product at SambaNova, which offers custom-built dataflow-as-a-service (and won VentureBeat's AI Innovation Award for Edge AI in 2021). Companies that are more mature in AI and able to invest in large-scale adoption will reap the rewards, he told VentureBeat, while the ones introducing AI across multiple programs will suffer from information and insight silos.


Why some AI companies are securing massive funding despite economic downturn

#artificialintelligence

Tech startups are going through tough times as a result of a slowdown in growth capital. Investment firms are advising their portfolio companies to extend their runway. Companies are suffering from valuation markdowns and resorting to layoffs to cut costs.


Disney-backed Inworld raises cash for AI-powered characters – TechCrunch

#artificialintelligence

If software is eating the world, AI isn't far behind. AI-powered text-, art- and audio-generating systems will soon make -- and already are making -- their way into the tools people use every day, from programming environments and spellcheck plugins to concept art creation platforms. The video game industry is no exception to this, and that hardly comes as a surprise. As illustrated by games like AI Dungeon, AI -- while imperfect -- can inject surprising creativity and novelty into branching narrative storytelling. Inworld AI was founded on this premise.


Diverse Title Generation for Stack Overflow Posts with Multiple Sampling Enhanced Transformer

arXiv.org Artificial Intelligence

Stack Overflow is one of the most popular programming communities, where developers can seek help with the problems they encounter. However, if inexperienced developers fail to describe their problems clearly, it is hard for them to attract sufficient attention and get the answers they need. We propose M3NSCT5, a novel approach that automatically generates multiple post titles from given code snippets. Developers may use the generated titles to find closely related posts and complete their problem descriptions. M3NSCT5 employs the CodeT5 backbone, a pre-trained Transformer model with strong language understanding and generation abilities. To alleviate the ambiguity issue, where the same code snippet could be aligned with different titles in different contexts, we propose the maximal marginal multiple nucleus sampling strategy to generate multiple high-quality and diverse title candidates at once for developers to choose from. We build a large-scale dataset of 890,000 question posts covering eight programming languages to validate the effectiveness of M3NSCT5. Automatic evaluation on the BLEU and ROUGE metrics shows that M3NSCT5 outperforms six state-of-the-art baseline models. Moreover, a human evaluation with trustworthy results also demonstrates the great potential of our approach for real-world application.
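The abstract names a maximal marginal multiple nucleus sampling strategy without detailing it. One plausible reading, sketched here as an assumption rather than the authors' algorithm, combines two standard pieces: nucleus (top-p) sampling to draw varied candidates, then greedy maximal marginal relevance (MMR) to keep a subset that balances a quality score against redundancy. The token-level Jaccard similarity, the trade-off weight `lam = 0.7`, and the toy distribution, candidates and scores are all hypothetical.

```python
import random
from typing import Dict, List, Sequence


def nucleus_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest set of most probable tokens whose mass reaches top_p, renormalized."""
    kept: Dict[str, float] = {}
    mass = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    return {tok: p / mass for tok, p in kept.items()}


def nucleus_sample(probs: Dict[str, float], top_p: float, rng: random.Random) -> str:
    """Draw one token from the renormalized nucleus (one decoding step)."""
    nucleus = nucleus_filter(probs, top_p)
    tokens = list(nucleus)
    return rng.choices(tokens, weights=[nucleus[t] for t in tokens], k=1)[0]


def jaccard(a: str, b: str) -> float:
    """Hypothetical similarity measure: overlap of lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def mmr_select(candidates: Sequence[str], quality: Sequence[float], k: int, lam: float = 0.7) -> List[str]:
    """Greedy MMR: each pick maximizes lam * quality - (1 - lam) * similarity to earlier picks."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            redundancy = max((jaccard(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lam * quality[i] - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]


if __name__ == "__main__":
    step_probs = {"error": 0.5, "exception": 0.3, "bug": 0.15, "banana": 0.05}
    print(nucleus_sample(step_probs, top_p=0.9, rng=random.Random(0)))
    titles = [
        "How to read a file in Python",
        "How to read a file in Python 3",
        "Parse JSON from a string in Python",
    ]
    print(mmr_select(titles, quality=[0.9, 0.85, 0.7], k=2))
```

Ranking by quality alone would return the two near-duplicate "read a file" titles; the redundancy penalty swaps the second one for a different phrasing.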


PEER: A Collaborative Language Model

arXiv.org Artificial Intelligence

Textual content is often the output of a collaborative writing process: We start with an initial draft, ask for suggestions, and repeatedly make changes. Agnostic of this process, today's language models are trained to generate only the final result. As a consequence, they lack several abilities crucial for collaborative writing: They are unable to update existing texts, difficult to control and incapable of verbally planning or explaining their actions. To address these shortcomings, we introduce PEER, a collaborative language model that is trained to imitate the entire writing process itself: PEER can write drafts, add suggestions, propose edits and provide explanations for its actions. Crucially, we train multiple instances of PEER able to infill various parts of the writing process, enabling the use of self-training techniques for increasing the quality, amount and diversity of training data. This unlocks PEER's full potential by making it applicable in domains for which no edit histories are available and improving its ability to follow instructions, to write useful comments, and to explain its actions. We show that PEER achieves strong performance across various domains and editing tasks.


Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers

arXiv.org Artificial Intelligence

Standard machine learning models for tagging and classifying acoustic signals cannot handle classes that were not seen during training. Zero-Shot (ZS) learning overcomes this restriction by predicting classes based on adaptable class descriptions. This study sets out to investigate the effectiveness of self-attention-based audio embedding architectures for ZS learning. To this end, we compare the very recent patchout spectrogram transformer with two classic convolutional architectures. We evaluate these three architectures on three tasks and on three different benchmark datasets: general-purpose tagging on AudioSet, environmental sound classification on ESC-50, and instrument tagging on OpenMIC. Our results show that the self-attention-based embedding methods outperform both compared convolutional architectures in all of these settings. By designing the training and test data accordingly, we observe that prediction performance suffers significantly when the 'semantic distance' between training and new test classes is large, an effect that deserves more detailed investigation.
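Zero-shot prediction works because classes are represented by embeddings of their descriptions rather than by fixed output units. A minimal sketch, assuming cosine similarity as the compatibility function between an audio embedding and class-description embeddings (the paper may use a different, learned compatibility function). The three-dimensional vectors and the 0.5 tagging threshold are hypothetical stand-ins for real encoder outputs.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def zero_shot_classify(audio_emb: Vector, class_embs: Dict[str, Vector]) -> str:
    """Single-label classification: the class whose description embedding is most similar."""
    return max(class_embs, key=lambda c: cosine(audio_emb, class_embs[c]))


def zero_shot_tag(audio_emb: Vector, class_embs: Dict[str, Vector], threshold: float = 0.5) -> List[str]:
    """Multi-label tagging: every class whose similarity clears the (assumed) threshold."""
    return sorted(c for c, e in class_embs.items() if cosine(audio_emb, e) >= threshold)


if __name__ == "__main__":
    # Hypothetical text-encoder outputs for class descriptions
    classes = {"dog bark": [1.0, 0.1, 0.0], "rain": [0.0, 1.0, 0.2], "guitar": [0.1, 0.0, 1.0]}
    clip = [0.9, 0.2, 0.1]  # hypothetical audio-encoder output
    print(zero_shot_classify(clip, classes), zero_shot_tag(clip, classes))
    # A class never seen in training is added by embedding its description only
    classes["cat meow"] = [0.8, 0.6, 0.0]
    print(zero_shot_tag(clip, classes))
```

Adding the unseen "cat meow" class requires only another description embedding; no classifier weights are retrained.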