
Collaborating Authors

cornish


How I Built ASR for Endangered Languages with a Spoken Dictionary

Bartley, Christopher, Ragni, Anton

arXiv.org Artificial Intelligence

Nearly half of the world's languages are endangered. Speech technologies such as Automatic Speech Recognition (ASR) are central to revival efforts, yet most languages remain unsupported because standard pipelines expect utterance-level supervised data. Speech data often exist for endangered languages but rarely match these formats. Manx Gaelic (~2,200 speakers), for example, has had transcribed speech since 1948, yet remains unsupported by modern systems. In this paper, we explore how little data, and in what form, is needed to build ASR for critically endangered languages. We show that a short-form pronunciation resource is a viable alternative, and that 40 minutes of such data produces usable ASR for Manx (<50% WER). We replicate our approach, applying it to Cornish (~600 speakers), another critically endangered language. Results show that the barrier to entry, in quantity and form, is far lower than previously thought, giving hope to endangered language communities that cannot afford to meet the requirements arbitrarily imposed upon them.
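The abstract reports accuracy as word error rate (WER). A result under 50% therefore means fewer than one word-level error per two reference words. As an illustration only, here is a minimal Python sketch of the usual WER definition: count the substitutions, deletions and insertions in a word-level edit-distance alignment, then divide by the number of reference words. It assumes plain whitespace tokenisation and no text normalisation. The paper's actual scoring setup (casing, punctuation, orthographic conventions for Manx or Cornish) is not described here and may differ. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or a match if cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted against the reference length, WER can exceed 100% for very noisy output.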


Neural Network Symmetrisation in Concrete Settings

Cornish, Rob

arXiv.org Artificial Intelligence

Cornish (2024) recently gave a general theory of neural network symmetrisation in the abstract context of Markov categories. We give a high-level overview of these results and of their concrete implications for the symmetrisation of deterministic functions and of Markov kernels.
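For a concrete anchor, the most familiar way to symmetrise a deterministic function is group averaging. For a finite group G acting on inputs and outputs, the symmetrised function is (1/|G|) Σ_g g⁻¹ · f(g · x), which is G-equivariant for any f. The sketch below applies this to the permutation group S_n acting on length-n vectors by reordering coordinates. It is shown only as a classical special case. It is not claimed to be the construction in Cornish (2024), whose framework is stated abstractly in Markov categories. The helper `symmetrise` is hypothetical.

```python
import itertools

import numpy as np


def symmetrise(f, n):
    """Group-average f over S_n, acting by permuting vector coordinates.

    Returns x -> (1/n!) * sum_p  p^{-1} . f(p . x), where p . x = x[p].
    The result is S_n-equivariant for any f mapping length-n vectors to
    length-n vectors.
    """
    perms = [np.array(p) for p in itertools.permutations(range(n))]

    def f_sym(x):
        x = np.asarray(x, dtype=float)
        outputs = []
        for p in perms:
            y = np.asarray(f(x[p]), dtype=float)  # evaluate f on a permuted input
            outputs.append(y[np.argsort(p)])       # undo the permutation on the output
        return np.mean(outputs, axis=0)

    return f_sym


# np.cumsum is not permutation-equivariant; its symmetrisation is.
f_sym = symmetrise(np.cumsum, 3)
x = np.array([1.0, 2.0, 5.0])
h = np.array([2, 0, 1])
print(np.allclose(f_sym(x[h]), f_sym(x)[h]))
```

Summing over all n! permutations is only practical for small n. That cost is one reason sampling-based (stochastic) symmetrisation is of interest.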


SymDiff: Equivariant Diffusion via Stochastic Symmetrisation

Zhang, Leo, Ashouritaklimi, Kianoosh, Teh, Yee Whye, Cornish, Rob

arXiv.org Machine Learning

We propose SymDiff, a novel method for constructing equivariant diffusion models using the recently introduced framework of stochastic symmetrisation. SymDiff resembles a learned data augmentation that is deployed at sampling time, and is lightweight, computationally efficient, and easy to implement on top of arbitrary off-the-shelf models. Notably, in contrast to previous work, SymDiff typically does not require any neural network components that are intrinsically equivariant, avoiding the need for complex parameterisations and the use of higher-order geometric features. Instead, our method can leverage highly scalable modern architectures as drop-in replacements for these more constrained alternatives. We show that this additional flexibility yields significant empirical benefit on E(3)-equivariant molecular generation. To the best of our knowledge, this is the first application of symmetrisation to generative modelling, suggesting its potential in this domain more generally.
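Stochastic symmetrisation replaces the average over the group with a single random group element. First draw g ~ γ(x), then return g · k(g⁻¹ · x), where k is an unconstrained (possibly stochastic) model. If γ is itself equivariant, the resulting kernel is equivariant in distribution. Below is a minimal NumPy sketch under deliberately simplified assumptions. It uses 2D rotations (SO(2)) rather than E(3). It also uses the simplest valid γ, a uniformly (Haar) distributed angle drawn independently of x, whereas SymDiff learns γ. The names `rotation`, `symmetrise_with_angle`, `stochastic_symmetrise` and the toy model `k` are hypothetical.

```python
import numpy as np


def rotation(theta):
    """2x2 matrix for an anticlockwise rotation by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def symmetrise_with_angle(k, x, theta):
    """Return g . k(g^{-1} . x) for g = rotation(theta).

    x has shape (n_points, 2), one point per row, so applying a matrix R
    to every point is x @ R.T, and applying R^{-1} = R.T is x @ R.
    """
    R = rotation(theta)
    y = k(x @ R)      # run the unconstrained model on g^{-1} . x
    return y @ R.T    # map its output back with g


def stochastic_symmetrise(k, x, rng):
    """One sample from the symmetrised kernel, with g drawn uniformly from SO(2)."""
    theta = rng.uniform(0.0, 2.0 * np.pi)  # Haar measure on SO(2), independent of x
    return symmetrise_with_angle(k, x, theta)


# Hypothetical toy "model": shifts every point along the x-axis,
# so k on its own is NOT rotation-equivariant.
def k(x):
    return x + np.array([1.0, 0.0])


rng = np.random.default_rng(0)
points = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]])
sample = stochastic_symmetrise(k, points, rng)
```

Rotating the input by φ and using angle θ gives the same result as using angle θ - φ on the original input and then rotating by φ. Since θ - φ is also uniformly distributed, this is exactly the equivariance-in-distribution property, even though `k` alone is not equivariant.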


Can companies police the biases found in artificial intelligence?

#artificialintelligence

Artificial intelligence has seeped into almost every corner of our lives, including how people are hired for work. AI is used to screen and evaluate applicants, but there's a problem with that. Research has shown that AI can produce biased results, especially against women and minorities. That's something that Kenneth Chenault, chairman and managing director at the venture capital firm General Catalyst, is trying to address through the Data and Trust Alliance, which he co-chairs.


ICE Turned To DMV Driver's License Databases For Help With Facial Recognition

NPR Technology

Now we're going to look more broadly at what's been revealed today about ICE turning to DMV offices for help with facial recognition, that is, using driver's license photographs and algorithms to identify people suspected of being in the country illegally. Now, this collaboration was unearthed by a team at Georgetown University, and here to brief us is NPR's Aarti Shahani.

CORNISH: I understand that in the past, ICE has gone to DMV offices and just asked for records on immigrants. We just heard about the case in Vermont that alleges that much. What exactly is new here?


Microsoft President Brad Smith Discusses The Ethics Of Artificial Intelligence

#artificialintelligence

Just because we can use it, should we? That's the question more and more people are asking about face recognition technology, software that's already in our phones and our social media feeds and many security systems. San Francisco leaders have voted to ban the police from using it, and even some in the tech industry say there should be limits.

BRAD SMITH: It's the kind of technology that can do a lot of good for a lot of people, but it can be misused. It can be used in ways that lead to discrimination and bias.


Want To Know How Far Artificial Intelligence Has Come? Just Look At CAPTCHA

NPR Technology

We're going to look now at the state of artificial intelligence this month in All Tech Considered. You've probably seen that statement online alongside a prompt that says something like, type the letters you see, or, click on all the stoplights. Do it right, and you get to go on to the next page. These games are developed by Google. Researcher Jason Polakis of the University of Illinois at Chicago has proven that, in fact, robots are pretty good at CAPTCHAs.


Scholars Delve Deeper Into The Ethics Of Artificial Intelligence

#artificialintelligence

In 1941, science-fiction writer Isaac Asimov stated "The Three Laws of Robotics" in his short story "Runaround." Law One: A robot may not injure a human being or, through inaction, allow a human being to come to harm. Law Two: A robot must obey orders given it by human beings except where such orders would conflict with the First Law. Law Three: A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws. These laws come from the world of science fiction, but the real world is catching up.


On the Opportunities and Pitfalls of Nesting Monte Carlo Estimators

Rainforth, Tom, Cornish, Robert, Yang, Hongseok, Warrington, Andrew, Wood, Frank

arXiv.org Machine Learning

We present a formalization of nested Monte Carlo (NMC) estimation, whereby terms in an outer estimator themselves involve calculation of separate, nested, Monte Carlo (MC) estimators. We demonstrate that, under mild conditions, NMC can provide consistent estimates of nested expectations, including cases involving arbitrary levels of nesting; establish corresponding rates of convergence; and provide empirical evidence that these rates are observed in practice. We further establish a number of pitfalls that can arise from naïve nesting of MC estimators, provide guidelines about how these can be avoided, and lay out novel methods for reformulating certain classes of nested expectation problems into single expectations, leading to improved convergence rates. Finally, we use one of these reformulations to derive a new estimator for use in discrete Bayesian experimental design problems which has a better convergence rate than existing methods. Our results have implications for a wide range of fields from probabilistic programming to deep generative models and serve both as an invitation for further inquiry and a caveat against careless use.
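To make the nesting concrete, here is a minimal NumPy sketch of an NMC estimator for E_y[f(y, E_{z|y}[g(y, z)])]. It uses N outer samples and M inner samples per outer sample. The toy problem is an assumption, chosen because its answer is known: y ~ N(0, 1), z | y ~ N(y, 1), g(y, z) = z and f(y, γ) = γ². The true value is therefore E[y²] = 1. Each inner estimate equals y + ε̄ with ε̄ ~ N(0, 1/M). Because f is nonlinear, the estimator's expectation is 1 + 1/M, so increasing N alone does not remove the bias; M must grow as well. This illustrates the kind of pitfall the abstract mentions. It is not the paper's experimental setup, and the name `nested_mc` is hypothetical.

```python
import numpy as np


def nested_mc(f, g, sample_y, sample_z_given_y, n_outer, m_inner, rng):
    """Nested Monte Carlo estimate of E_y[ f(y, E_{z|y}[ g(y, z) ]) ]."""
    y = sample_y(n_outer, rng)               # outer samples, shape (N,)
    z = sample_z_given_y(y, m_inner, rng)    # M inner samples per y, shape (N, M)
    inner = g(y[:, None], z).mean(axis=1)    # inner MC estimates of E[g(y, z) | y]
    return f(y, inner).mean()                # outer MC average of f


# Toy problem with a known answer (true value = 1.0).
def f(y, inner):
    return inner ** 2                        # nonlinear in the inner expectation


def g(y, z):
    return z


def sample_y(n, rng):
    return rng.standard_normal(n)            # y ~ N(0, 1)


def sample_z_given_y(y, m, rng):
    return y[:, None] + rng.standard_normal((y.shape[0], m))  # z | y ~ N(y, 1)


rng = np.random.default_rng(0)
for m in (1, 4, 16):
    est = nested_mc(f, g, sample_y, sample_z_given_y, 200_000, m, rng)
    print(f"M={m:2d}: estimate {est:.3f}, predicted bias 1/M = {1 / m:.3f}")
```

The printed estimates should sit near 1 + 1/M rather than 1, approaching the true value only as M increases.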


Rise Of Artificial Intelligence Met With Mixed Reaction At SXSW

NPR Technology

We head to Austin now for the annual South by Southwest Conference in this week's All Tech Considered.

CORNISH: Now, South by Southwest is known for the music, but running alongside the shows are panels that bring leaders across industries together to discuss what's cutting edge. And one emerging technology being talked about a lot is artificial intelligence. For more on that, NPR's Laura Sydell joins us from Austin.

CORNISH: To begin, obviously, people are talking about AI across the tech industry.