Riley, Michael
Exploring and Improving Drafts in Blockwise Parallel Decoding
Kim, Taehyeon, Suresh, Ananda Theertha, Papineni, Kishore, Riley, Michael, Kumar, Sanjiv, Benton, Adrian
Despite the remarkable strides made by autoregressive language models, their potential is often hampered by the slow inference speeds inherent in sequential token generation. Blockwise parallel decoding (BPD) was proposed by Stern et al. [38] as a method to improve the inference speed of language models by simultaneously predicting multiple future tokens, termed block drafts, which are subsequently verified and conditionally accepted by the autoregressive model. This paper contributes to the understanding and improvement of block drafts in two ways. First, we analyze the token distributions produced by multiple prediction heads. Second, we leverage this analysis to develop algorithms that improve BPD inference speed by refining the block drafts using n-gram and neural language models. Experiments demonstrate that refined block drafts yield a 5-21% increase in block efficiency (i.e., the number of accepted tokens from the block draft) across diverse datasets.
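To make the verification step concrete, the sketch below shows the standard accept-longest-matching-prefix rule for a block draft under greedy decoding. It is an illustration only; the function name, array shapes, and greedy acceptance criterion are assumptions, not the paper's implementation.

```python
import numpy as np

def verify_block_draft(draft_tokens, base_logits):
    """Accept the longest prefix of a block draft that matches the
    autoregressive model's greedy predictions.

    draft_tokens: list[int], tokens proposed by the k prediction heads.
    base_logits:  np.ndarray of shape (k, vocab_size), the autoregressive
                  model's logits when scoring the draft positions in parallel.
    Returns the number of accepted draft tokens, i.e. this step's
    contribution to block efficiency.
    """
    accepted = 0
    for token, logits in zip(draft_tokens, base_logits):
        if token == int(np.argmax(logits)):  # draft agrees with greedy choice
            accepted += 1
        else:
            break                            # first mismatch ends acceptance
    return accepted
```

The refinement algorithms studied in the paper would modify `draft_tokens` (e.g., by rescoring the head predictions with an n-gram or neural language model) before this verification step, with the aim of raising the number of accepted tokens.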
Self-supervised Adaptive Weighting for Cooperative Perception in V2V Communications
Liu, Chenguang, Chen, Jianjun, Chen, Yunfei, Payton, Ryan, Riley, Michael, Yang, Shuang-Hua
Perception of the driving environment is critical for collision avoidance and route planning to ensure driving safety. Cooperative perception has been widely studied as an effective approach to addressing the shortcomings of single-vehicle perception. However, the practical limitations of vehicle-to-vehicle (V2V) communications have not been adequately investigated. In particular, current cooperative fusion models rely on supervised learning and do not address the dynamic performance degradation caused by arbitrary channel impairments. In this paper, a self-supervised adaptive weighting model is proposed for intermediate fusion to mitigate the adverse effects of channel distortion. The performance of cooperative perception is investigated under different system settings. Rician fading and imperfect channel state information (CSI) are also considered. Numerical results demonstrate that the proposed adaptive weighting algorithm significantly outperforms the benchmarks without weighting. Visualization examples validate that the proposed weighting algorithm can flexibly adapt to various channel conditions. Moreover, the adaptive weighting algorithm generalizes well to untrained channels and to test datasets from different domains.
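As a rough illustration of weighted intermediate fusion (not the paper's specific network), the sketch below scales each cooperating vehicle's received feature map by a per-vehicle weight before aggregation, so that features corrupted by channel impairments can be down-weighted; all names and shapes here are assumptions.

```python
import numpy as np

def weighted_intermediate_fusion(ego_feat, coop_feats, weights):
    """Fuse intermediate feature maps with per-vehicle weights.

    ego_feat:   np.ndarray (C, H, W), the ego vehicle's own feature map.
    coop_feats: list of np.ndarray (C, H, W), received (possibly channel-
                distorted) feature maps from cooperating vehicles.
    weights:    list of floats in [0, 1], one per cooperating vehicle,
                e.g. produced by a self-supervised weighting network.
    """
    fused = ego_feat.copy()
    for w, feat in zip(weights, coop_feats):
        fused += w * feat   # distorted features contribute less when w is small
    return fused
```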
Large-scale Language Model Rescoring on Long-form Data
Chen, Tongzhou, Allauzen, Cyril, Huang, Yinghui, Park, Daniel, Rybach, David, Huang, W. Ronny, Cabrera, Rodrigo, Audhkhasi, Kartik, Ramabhadran, Bhuvana, Moreno, Pedro J., Riley, Michael
In this work, we study the impact of Large-scale Language Models (LLMs) on Automatic Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR. We demonstrate up to an 8% relative reduction in Word Error Rate (WER) on US English (en-us) and code-switched Indian English (en-in) long-form ASR test sets, and up to a 30% relative reduction in Salient Term Error Rate (STER), over a strong first-pass baseline that uses a maximum-entropy based language model. Improved lattice processing that results in a lattice with a proper (non-tree) digraph topology and carries context from the 1-best hypothesis of the previous segment(s) yields significant wins in rescoring with LLMs. We also find that the gains from combining LLMs trained on vast quantities of available data (such as C4) with conventional neural LMs are additive and significantly outperform a strong first-pass baseline with a maximum-entropy LM.
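The sketch below illustrates generic second-pass rescoring with an interpolated external language model score, in the spirit of combining an LLM with a conventional neural LM; the hypothesis interface and interpolation weight are hypothetical, and the paper's lattice rescoring with carried-over segment context is considerably more involved than this n-best view.

```python
def rescore_nbest(hypotheses, neural_lm_logprob, llm_logprob, lam=0.5):
    """Pick the best hypothesis after second-pass LM rescoring.

    hypotheses:        list of (text, first_pass_logprob) pairs extracted
                       from the first-pass lattice.
    neural_lm_logprob: callable text -> log-probability under a
                       conventional neural LM.
    llm_logprob:       callable text -> log-probability under the LLM
                       (optionally conditioned on previous-segment context).
    lam:               interpolation weight between the two rescoring LMs.
    """
    def total_score(hyp):
        text, first_pass = hyp
        rescore = lam * llm_logprob(text) + (1.0 - lam) * neural_lm_logprob(text)
        return first_pass + rescore

    return max(hypotheses, key=total_score)
```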
LAST: Scalable Lattice-Based Speech Modelling in JAX
Wu, Ke, Variani, Ehsan, Bagby, Tom, Riley, Michael
Despite these WFSA algorithms being well-known in the literature, new challenges arise from the performance characteristics of modern architectures and from nuances in automatic differentiation. We describe a suite of generally applicable techniques employed in LAST to address these challenges, and demonstrate their effectiveness with benchmarks on TPUv3 and V100 GPU. We refer readers to [8, 9] for a comprehensive introduction to finite state automata and related algorithms. A weighted finite state automaton (WFSA) $A = (\Sigma, Q, i, f, E)$ over a semiring $(\mathbb{K}, \oplus, \otimes, \bar{0}, \bar{1})$ is specified by a finite alphabet $\Sigma$, a finite set of states $Q$, an initial state $i \in Q$, a final state $f$, and a finite set of arcs $E \subseteq Q \times (\Sigma \cup \{\varepsilon\}) \times \mathbb{K} \times Q$ ($\varepsilon$ denotes the empty string).
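To make the WFSA definition concrete, here is a minimal sketch, in plain Python rather than LAST's JAX API, of the $\oplus$-sum over all paths of an acyclic WFSA in the log semiring (weights are negative log-probabilities, $\oplus$ is a stable log-add, $\otimes$ is addition); the topological state numbering is an assumed simplification.

```python
import math

def log_add(a, b):
    """Semiring plus in the log semiring: -log(exp(-a) + exp(-b)), stably."""
    if math.isinf(a):
        return b
    if math.isinf(b):
        return a
    m = min(a, b)
    return m - math.log1p(math.exp(-abs(a - b)))

def total_path_weight(arcs, initial, final, num_states):
    """Sum (log_add) over all initial-to-final paths of the sum (+) of arc
    weights, for an acyclic WFSA whose states are numbered 0..num_states-1
    in topological order.

    arcs: list of (src, label, weight, dst) tuples; label may be None for epsilon.
    """
    INF = float("inf")
    alpha = [INF] * num_states   # path weight accumulated at each state
    alpha[initial] = 0.0         # the semiring one in the log semiring
    by_src = {}
    for src, _label, w, dst in arcs:
        by_src.setdefault(src, []).append((w, dst))
    for q in range(num_states):
        if alpha[q] == INF:
            continue
        for w, dst in by_src.get(q, []):
            alpha[dst] = log_add(alpha[dst], alpha[q] + w)
    return alpha[final]
```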
Alignment Entropy Regularization
Variani, Ehsan, Wu, Ke, Rybach, David, Allauzen, Cyril, Riley, Michael
Existing training criteria in automatic speech recognition (ASR) permit the model to freely explore more than one time alignment between the feature and label sequences. In this paper, we use entropy to measure a model's uncertainty, i.e., how it chooses to distribute the probability mass over the set of allowed alignments. Furthermore, we evaluate the effect of entropy regularization in encouraging the model to concentrate the probability mass on a smaller subset of the allowed alignments. Experiments show that entropy regularization enables a much simpler decoding method without sacrificing word error rate, and provides better time alignment quality.
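As a toy illustration of such a regularizer (assuming the allowed alignments can be enumerated explicitly, whereas in practice the quantity would be computed over the alignment lattice with forward-backward), the sketch below adds the entropy of the alignment posterior to the marginal negative log-likelihood.

```python
import numpy as np

def entropy_regularized_loss(alignment_logprobs, lam=0.1):
    """Marginal NLL plus an entropy penalty over the allowed alignments.

    alignment_logprobs: np.ndarray of joint log-probabilities log p(y, a | x),
                        one entry per allowed alignment a of label sequence y.
    lam:                regularization strength; larger values push the
                        probability mass onto fewer alignments.
    """
    # Marginal log-likelihood log p(y | x): logsumexp over alignments.
    m = alignment_logprobs.max()
    log_marginal = m + np.log(np.exp(alignment_logprobs - m).sum())
    nll = -log_marginal

    # Entropy of the alignment posterior p(a | y, x).
    posterior = np.exp(alignment_logprobs - log_marginal)
    entropy = -np.sum(posterior * (alignment_logprobs - log_marginal))

    return nll + lam * entropy
```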
Learning discrete distributions: user vs item-level privacy
Liu, Yuhan, Suresh, Ananda Theertha, Yu, Felix, Kumar, Sanjiv, Riley, Michael
Much of the literature on differential privacy focuses on item-level privacy, where loosely speaking, the goal is to provide privacy per item or training example. However, recently many practical applications such as federated learning require preserving privacy for all items of a single user, which is much harder to achieve. Therefore understanding the theoretical limit of user-level privacy becomes crucial. We study the fundamental problem of learning discrete distributions over $k$ symbols with user-level differential privacy. If each user has $m$ samples, we show that straightforward applications of the Laplace or Gaussian mechanisms require the number of users to be $\mathcal{O}(k/(m\alpha^2) + k/\epsilon\alpha)$ to achieve an $\ell_1$ distance of $\alpha$ between the true and estimated distributions, with the privacy-induced penalty $k/\epsilon\alpha$ independent of the number of samples per user $m$. Moreover, we show that any mechanism that only operates on the final aggregate requires a user complexity of the same order. We then propose a mechanism such that the number of users scales as $\tilde{\mathcal{O}}(k/(m\alpha^2) + k/\sqrt{m}\epsilon\alpha)$ and further show that it is nearly optimal in certain regimes. Thus the privacy penalty is $\mathcal{O}(\sqrt{m})$ times smaller compared to the standard mechanisms. We also propose general techniques for obtaining lower bounds on restricted differentially private estimators and a lower bound on the total variation between binomial distributions, both of which might be of independent interest.
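For reference, the "straightforward application" of the Laplace mechanism mentioned above can be sketched as follows: average the users' empirical distributions and add per-coordinate Laplace noise calibrated to the user-level $\ell_1$ sensitivity $2/n$. Variable names and the simplex projection are illustrative choices; this is the baseline being improved upon, not the paper's proposed nearly optimal mechanism.

```python
import numpy as np

def laplace_user_level_estimate(user_samples, k, epsilon, rng=None):
    """Baseline user-level DP estimator of a discrete distribution.

    user_samples: list of n lists, each holding one user's m symbols in {0,...,k-1}.
    epsilon:      user-level privacy parameter.
    Replacing one user's entire sample changes the averaged histogram by at
    most 2/n in l1 norm, so Laplace noise of scale 2/(n*epsilon) per
    coordinate gives epsilon user-level differential privacy.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(user_samples)
    avg = np.zeros(k)
    for samples in user_samples:
        counts = np.bincount(samples, minlength=k).astype(float)
        avg += counts / len(samples)       # each user's empirical distribution
    avg /= n
    noisy = avg + rng.laplace(scale=2.0 / (n * epsilon), size=k)
    # Project back to the probability simplex (clip negatives, renormalize).
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    return noisy / total if total > 0 else np.full(k, 1.0 / k)
```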