AITopics | hard parameter

hard parameter

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing

Yan, Brian, Chang, Xuankai, Anastasopoulos, Antonios, Fujita, Yuya, Watanabe, Shinji

arXiv.org Artificial Intelligence, Sep-27-2023

Recent works in end-to-end speech-to-text translation (ST) have proposed multi-tasking methods with soft parameter sharing which leverage machine translation (MT) data via secondary encoders that map text inputs to an eventual cross-modal representation. In this work, we instead propose an ST/MT multi-tasking framework with hard parameter sharing in which all model parameters are shared cross-modally. Our method reduces the speech-text modality gap via a pre-processing stage which converts speech and text inputs into two discrete token sequences of similar length; this allows models to process both modalities indiscriminately, simply using a joint vocabulary. With experiments on MuST-C, we demonstrate that our multi-tasking framework improves attentional encoder-decoder, Connectionist Temporal Classification (CTC), transducer, and joint CTC/attention models by an average of +0.5 BLEU without any external MT data. Further, we show that this framework incorporates external MT data, yielding +0.8 BLEU, and also improves transfer learning from pre-trained textual models, yielding +1.8 BLEU.

cross-modal multi-tasking, hard parameter, speech-to-text translation

arXiv:2309.15826

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.87)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.60)

Multi-task learning in Machine Learning

#artificialintelligence, Jun-30-2021, 04:40:24 GMT

In most machine learning settings, we solve one task at a time: whatever the task, the problem is framed as using data to optimize a single objective or metric. This approach eventually hits a performance ceiling, often because of the size of the dataset or the model's limited ability to learn meaningful representations from it. Multi-task learning, by contrast, is a machine learning approach in which we learn multiple tasks simultaneously, optimizing multiple loss functions at once. Rather than training an independent model for each task, we let a single model learn all of the tasks together. In the process, the model uses the data available across the different tasks to learn generalized representations that are useful in multiple contexts.
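As a rough illustration of the idea above, the following Python sketch trains one model on two toy regression tasks by descending the sum of their losses. The shared weight vector, the per-task scalar heads, the synthetic data, and the name train_multitask are all assumptions made for this example; none of them come from the summarized article.

```python
import random


def train_multitask(tasks, dim, steps=500, lr=0.02):
    """Jointly fit several regression tasks with one shared weight vector.

    tasks maps a task name to a list of (x, y) pairs. Every task reads the
    same shared feature h = shared . x and scales it with its own head.
    Returns the shared weights, the heads, and the joint loss per step.
    """
    shared = [0.1] * dim                    # parameters used by every task
    heads = {name: 1.0 for name in tasks}   # one scalar output head per task
    history = []
    for _ in range(steps):
        grad_shared = [0.0] * dim
        grad_heads = {name: 0.0 for name in tasks}
        joint_loss = 0.0
        for name, data in tasks.items():
            for x, y in data:
                h = sum(w * xi for w, xi in zip(shared, x))
                err = heads[name] * h - y
                joint_loss += err * err / len(data)
                # Each task contributes gradient to its own head ...
                grad_heads[name] += 2 * err * h / len(data)
                # ... and to the shared weights, where the tasks interact.
                for i, xi in enumerate(x):
                    grad_shared[i] += 2 * err * heads[name] * xi / len(data)
        history.append(joint_loss)
        shared = [w - lr * g for w, g in zip(shared, grad_shared)]
        for name in heads:
            heads[name] -= lr * grad_heads[name]
    return shared, heads, history


# Hypothetical toy data: both targets depend on the same hidden direction.
rng = random.Random(0)
direction = [1.0, -2.0, 0.5]


def make_task(scale, n=50):
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in direction]
        y = scale * sum(d * xi for d, xi in zip(direction, x))
        data.append((x, y))
    return data


tasks = {"task_a": make_task(2.0), "task_b": make_task(-1.0)}
shared, heads, history = train_multitask(tasks, dim=3)
print(f"joint loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because the two targets here are built from the same direction, a single shared vector can serve both tasks; with unrelated tasks the shared part would have to compromise between them.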

artificial intelligence, machine learning, representation, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.32)

Sharp Bias-variance Tradeoffs of Hard Parameter Sharing in High-dimensional Linear Regression

Zhang, Hongyang R., Yang, Fan, Wu, Sen, Su, Weijie J., Ré, Christopher

arXiv.org Machine Learning, Oct-22-2020

Hard parameter sharing for multi-task learning is widely used in empirical research despite the fact that its generalization properties have not been well established in many cases. This paper studies its generalization properties in a fundamental setting: How does hard parameter sharing work given multiple linear regression tasks? We develop new techniques and establish a number of new results in the high-dimensional setting, where the sample size and feature dimension increase at a fixed ratio. First, we show a sharp bias-variance decomposition of hard parameter sharing, given multiple tasks with the same features. Second, we characterize the asymptotic bias-variance limit for two tasks, even when they have arbitrarily different sample size ratios and covariate shifts. We also demonstrate that these limiting estimates for the empirical loss are incredibly accurate in moderate dimensions. Finally, we explain an intriguing phenomenon where increasing one task's sample size helps another task initially by reducing variance but hurts eventually due to increasing bias. This suggests progressively adding data for optimizing hard parameter sharing, and we validate its efficiency in text classification tasks.
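For orientation, hard parameter sharing over t linear regression tasks can be written as a shared feature map followed by task-specific output layers. The notation below (X_i, Y_i, B, A_i, p, r) is assumed for illustration and may differ from the paper's own.

```latex
f(A, B) = \sum_{i=1}^{t} \left\| X_i B A_i - Y_i \right\|_2^2,
\qquad X_i \in \mathbb{R}^{n_i \times p}, \quad
B \in \mathbb{R}^{p \times r}, \quad
A_i \in \mathbb{R}^{r}
```

In this form B is the only parameter that every task touches, so it is where the tasks interact: each task i predicts with the p-dimensional vector B A_i, and all of these vectors lie in the column space of B.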

artificial intelligence, equation, machine learning, (17 more...)

arXiv:2010.1175

Country:

North America > United States > New York (0.04)
North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Government (0.46)
Semiconductors & Electronics (0.45)
Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.84)

A Brief Review of Deep Multi-task Learning and Auxiliary Task Learning

Vafaeikia, Partoo, Namdar, Khashayar, Khalvati, Farzad

arXiv.org Machine Learning, Jul-2-2020

Multi-task learning (MTL) is broadly used across various applications of machine learning and has several advantages in comparison with single-task learning. Since layers are shared between different tasks and features are not repeatedly calculated for each task, the amount of memory used is reduced and the inference speed is improved. In addition, if tasks share complementary information, they act as regularizers for each other, which improves the prediction performance of each task [1]. This has been shown in various areas such as detection and classification [2], computer vision [3, 4], depth estimation [5], natural language processing [6-8] and drug discovery [9]. The goal of this review paper is to provide an overview of various deep multi-task learning (dMTL) solutions and possible improvements in performance through efficient auxiliary task selection.

architecture, auxiliary task, multi-task learning, (16 more...)

arXiv:2007.01126

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Overview (0.90)

Industry: Health & Medicine (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Learning what to share between loosely related tasks

Ruder, Sebastian, Bingel, Joachim, Augenstein, Isabelle, Søgaard, Anders

arXiv.org Artificial Intelligence, Jan-16-2018

Multi-task learning is motivated by the observation that humans bring to bear what they know about related problems when solving new ones. Similarly, deep neural networks can profit from related tasks by sharing parameters with other networks. However, humans do not consciously decide to transfer knowledge between tasks. In Natural Language Processing (NLP), it is hard to predict if sharing will lead to improvements, particularly if tasks are only loosely related. To overcome this, we introduce Sluice Networks, a general framework for multi-task learning where trainable parameters control the amount of sharing. Our framework generalizes previous proposals in enabling sharing of all combinations of subspaces, layers, and skip connections. We perform experiments on three task pairs, and across seven different domains, using data from OntoNotes 5.0, and achieve up to 15% average error reductions over common approaches to multi-task learning. We show that a) label entropy is predictive of gains in sluice networks, confirming findings for hard parameter sharing and b) while sluice networks easily fit noise, they are robust across domains in practice.
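The notion of trainable parameters that control the amount of sharing can be pictured with a small mixing step between two task networks. The sketch below is a simplified illustration and not the authors' implementation: the function mix_task_states, the 2x2 matrix alpha, and the example vectors are hypothetical, and in a real model alpha would be learned by gradient descent along with the other weights.

```python
def mix_task_states(h_a, h_b, alpha):
    """Recombine two tasks' hidden vectors with a 2x2 mixing matrix.

    alpha[0] weights how much task A keeps of its own state and how much it
    takes from task B; alpha[1] does the same for task B.
    """
    (a_from_a, a_from_b), (b_from_a, b_from_b) = alpha
    new_a = [a_from_a * x + a_from_b * y for x, y in zip(h_a, h_b)]
    new_b = [b_from_a * x + b_from_b * y for x, y in zip(h_a, h_b)]
    return new_a, new_b


h_a = [1.0, 2.0]  # hidden state of the task A network (made-up values)
h_b = [3.0, 4.0]  # hidden state of the task B network (made-up values)

# Identity mixing: each task keeps its own state, so nothing is shared.
no_sharing = mix_task_states(h_a, h_b, [[1.0, 0.0], [0.0, 1.0]])

# Uniform mixing: both tasks see the same averaged state.
full_sharing = mix_task_states(h_a, h_b, [[0.5, 0.5], [0.5, 0.5]])

print(no_sharing)
print(full_sharing)
```

Values of alpha between these two extremes give partial sharing, which is the range a learned mixing matrix can explore.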

artificial intelligence, machine learning, proceedings, (17 more...)

arXiv:1705.08142

Country: Europe (0.46)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
