ney
Logarithmic Neyman Regret for Adaptive Estimation of the Average Treatment Effect
Neopane, Ojash, Ramdas, Aaditya, Singh, Aarti
Estimation of the Average Treatment Effect (ATE) is a core problem in causal inference with strong connections to Off-Policy Evaluation in Reinforcement Learning. This paper considers the problem of adaptively selecting the treatment allocation probability in order to improve estimation of the ATE. The majority of prior work on adaptive ATE estimation focus on asymptotic guarantees, and in turn overlooks important practical considerations such as the difficulty of learning the optimal treatment allocation as well as hyper-parameter selection. Existing non-asymptotic methods are limited by poor empirical performance and exponential scaling of the Neyman regret with respect to problem parameters. In order to address these gaps, we propose and analyze the Clipped Second Moment Tracking (ClipSMT) algorithm, a variant of an existing algorithm with strong asymptotic optimality guarantees, and provide finite sample bounds on its Neyman regret. Our analysis shows that ClipSMT achieves exponential improvements in Neyman regret on two fronts: improving the dependence on $T$ from $O(\sqrt{T})$ to $O(\log T)$, as well as reducing the exponential dependence on problem parameters to a polynomial dependence. Finally, we conclude with simulations which show the marked improvement of ClipSMT over existing approaches.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech
Lüscher, Christoph, Zeineldeen, Mohammad, Yang, Zijian, Raissi, Tina, Vieting, Peter, Le-Duc, Khai, Wang, Weiyue, Schlüter, Ralf, Ney, Hermann
Language barriers present a great challenge in our increasingly connected and global world. Especially within the medical domain, e.g. hospital or emergency room, communication difficulties and delays may lead to malpractice and non-optimal patient care. In the HYKIST project, we consider patient-physician communication, more specifically between a German-speaking physician and an Arabic- or Vietnamese-speaking patient. Currently, a doctor can call the Triaphon service to get assistance from an interpreter in order to help facilitate communication. The HYKIST goal is to support the usually non-professional bilingual interpreter with an automatic speech translation system to improve patient care and help overcome language barriers. In this work, we present our ASR system development efforts for this conversational telephone speech translation task in the medical domain for two languages pairs, data collection, various acoustic model architectures and dialect-induced difficulties.
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (5 more...)
RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition
Zhou, Wei, Beck, Eugen, Berger, Simon, Schlüter, Ralf, Ney, Hermann
Modern public ASR tools usually provide rich support for training various sequence-to-sequence (S2S) models, but rather simple support for decoding open-vocabulary scenarios only. For closed-vocabulary scenarios, public tools supporting lexical-constrained decoding are usually only for classical ASR, or do not support all S2S models. To eliminate this restriction on research possibilities such as modeling unit choice, we present RASR2 in this work, a research-oriented generic S2S decoder implemented in C++. It offers a strong flexibility/compatibility for various S2S models, language models, label units/topologies and neural network architectures. It provides efficient decoding for both open- and closed-vocabulary scenarios based on a generalized search framework with rich support for different search modes and settings. We evaluate RASR2 with a wide range of experiments on both switchboard and Librispeech corpora. Our source code is public online.
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
- North America > United States > Georgia > Chatham County > Savannah (0.04)
- Europe > Austria > Styria > Graz (0.04)
- (2 more...)
Conformer-based Hybrid ASR System for Switchboard Dataset
Zeineldeen, Mohammad, Xu, Jingjing, Lüscher, Christoph, Michel, Wilfried, Gerstenberger, Alexander, Schlüter, Ralf, Ney, Hermann
The recently proposed conformer architecture has been successfully used for end-to-end automatic speech recognition (ASR) architectures achieving state-of-the-art performance on different datasets. To our best knowledge, the impact of using conformer acoustic model for hybrid ASR is not investigated. In this paper, we present and evaluate a competitive conformer-based hybrid model training recipe. We study different training aspects and methods to improve word-error-rate as well as to increase training speed. We apply time downsampling methods for efficient training and use transposed convolutions to upsample the output sequence again. We conduct experiments on Switchboard 300h dataset and our conformer-based hybrid model achieves competitive results compared to other architectures. It generalizes very well on Hub5'01 test set and outperforms the BLSTM-based hybrid model significantly.
A study of latent monotonic attention variants
Zeyer, Albert, Schlüter, Ralf, Ney, Hermann
End-to-end models reach state-of-the-art performance for speech recognition, but global soft attention is not monotonic, which might lead to convergence problems, to instability, to bad generalisation, cannot be used for online streaming, and is also inefficient in calculation. Monotonicity can potentially fix all of this. There are several ad-hoc solutions or heuristics to introduce monotonicity, but a principled introduction is rarely found in literature so far. In this paper, we present a mathematically clean solution to introduce monotonicity, by introducing a new latent variable which represents the audio position or segment boundaries. We compare several monotonic latent models to our global soft attention baseline such as a hard attention model, a local windowed soft attention model, and a segmental soft attention model. We can show that our monotonic models perform as good as the global soft attention model. We perform our experiments on Switchboard 300h. We carefully outline the details of our training and release our code and configs.
- Asia (0.68)
- Europe > Germany (0.28)
- North America > United States (0.28)
- North America > Canada (0.28)
Challenges of DoD's Ethical Principles for AI
In February of this year, the Department of Defense (DoD) issued five Ethical Principles for Artificial Intelligence (AI): Responsible, Equitable, Traceable, Reliable and Governable. The DoD principles build off recommendations from 2019 by the Defense Innovation Board and the interim report of the National Security Commission on AI (NSCAI). The defense industry and others in the private sector have also been considering ethical issues regarding AI, including the issue of whether businesses should have an AI code of ethics. When cyber first became an issue about 22-years ago, the trend was to raise awareness and think through the consequences. Similarly, now we are developing awareness of the issues and beginning to think through the consequences of AI.
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (1.00)