Calgary
Rats Are Invasive Menaces. These Cameras Spy on Them
Off the coast of Southern California, amid a literal sea of troubles--warming waters, microplastic pollution, overfishing--is a 96-square-mile conservation success story. Santa Cruz Island once teemed with feral pigs and invasive Argentine ants until the Nature Conservancy unleashed a coordinated campaign of eradication. That's allowed the adorable island fox to bounce back from the brink of extinction. The battle was won, but the war wasn't over, because the Nature Conservancy now has to defend that territory from yet another invader: rats. The scourge of islands everywhere, rats get ashore and breed like crazy, devouring just about everything in their paths--native plant seeds, bird and reptile eggs, local people's crops.
Embodied, Situated, and Grounded Intelligence: Implications for AI
Millhouse, Tyler, Moses, Melanie, Mitchell, Melanie
In April of 2022, the Santa Fe Institute hosted a workshop on embodied, situated, and grounded intelligence as part of the Institute's Foundations of Intelligence project. The workshop brought together computer scientists, psychologists, philosophers, social scientists, and others to discuss the science of embodiment and related issues in human intelligence, and its implications for building robust, human-level AI. In this report, we summarize each of the talks and the subsequent discussions. We also draw out a number of key themes and identify important frontiers for future research.
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation
Wang, Chen, Liu, Yuchen, Chen, Boxing, Zhang, Jiajun, Luo, Wei, Huang, Zhongqiang, Zong, Chengqing
End-to-end Speech Translation (ST) aims at translating the source language speech into target language text without generating the intermediate transcriptions. However, the training of end-to-end methods relies on parallel ST data, which are difficult and expensive to obtain. Fortunately, the supervised data for automatic speech recognition (ASR) and machine translation (MT) are usually more accessible, making zero-shot speech translation a potential direction. Existing zero-shot methods fail to align the two modalities of speech and text into a shared semantic space, resulting in much worse performance compared to the supervised ST methods. In order to enable zero-shot ST, we propose a novel Discrete Cross-Modal Alignment (DCMA) method that employs a shared discrete vocabulary space to accommodate and match both modalities of speech and text. Specifically, we introduce a vector quantization module to discretize the continuous representations of speech and text into a finite set of virtual tokens, and use ASR data to map corresponding speech and text to the same virtual token in a shared codebook. This way, source language speech can be embedded in the same semantic space as the source language text, which can then be transformed into target language text with an MT module. Experiments on multiple language pairs demonstrate that our zero-shot ST method significantly improves the SOTA, and even performs on par with the strong supervised ST baselines.
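The quantization step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the codebook, the two-dimensional vectors, and the function name nearest_code are all hypothetical, and plain nearest-neighbour lookup is assumed as the quantization rule. The point is only that a speech representation and a text representation that lie close together map to the same virtual token.

```python
import math

def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for index, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

# Hypothetical shared codebook of three 2-D "virtual tokens".
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]

# Toy stand-ins for the encoder outputs of a paired utterance and its transcript.
speech_repr = (0.9, 1.2)
text_repr = (1.1, 0.8)

speech_token = nearest_code(speech_repr, codebook)
text_token = nearest_code(text_repr, codebook)

# Both modalities land on the same discrete token, which is the alignment goal.
print(speech_token, text_token)
```

In a trained system the codebook and encoders would be learned jointly from ASR pairs; here they are fixed by hand purely for illustration.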
Pitfalls of Epistemic Uncertainty Quantification through Loss Minimisation
Bengs, Viktor, Hüllermeier, Eyke, Waegeman, Willem
Uncertainty quantification has received increasing attention in machine learning in the recent past. In particular, a distinction between aleatoric and epistemic uncertainty has been found useful in this regard. The latter refers to the learner's (lack of) knowledge and appears to be especially difficult to measure and quantify. In this paper, we analyse a recent proposal based on the idea of a second-order learner, which yields predictions in the form of distributions over probability distributions. While standard (first-order) learners can be trained to predict accurate probabilities, namely by minimising suitable loss functions on sample data, we show that loss minimisation does not work for second-order predictors: The loss functions proposed for inducing such predictors do not incentivise the learner to represent its epistemic uncertainty in a faithful way.
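The first-order half of this claim can be checked numerically. The sketch below assumes a binary outcome and log loss as the "suitable loss function"; the names expected_log_loss, p_true, and candidates are illustrative. It shows that the expected loss is minimised by predicting the true probability. It does not reproduce the paper's second-order analysis.

```python
import math

def expected_log_loss(p_true, p_pred):
    """Expected log loss of predicting `p_pred` when the outcome is Bernoulli(p_true)."""
    return -(p_true * math.log(p_pred) + (1 - p_true) * math.log(1 - p_pred))

p_true = 0.3

# Search a grid of candidate predictions strictly inside (0, 1).
candidates = [i / 100 for i in range(1, 100)]
best = min(candidates, key=lambda q: expected_log_loss(p_true, q))

# The minimiser coincides with the true probability: log loss is a proper scoring rule.
print(best)
```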
The COVID That Wasn't: Counterfactual Journalism Using GPT
In this paper, we explore the use of large language models to assess human interpretations of real world events. To do so, we use a language model trained prior to 2020 to artificially generate news articles concerning COVID-19 given the headlines of actual articles written during the pandemic. We then compare stylistic qualities of our artificially generated corpus with a news corpus, in this case 5,082 articles produced by CBC News between January 23 and May 5, 2020. We find our artificially generated articles exhibit a considerably more negative attitude towards COVID and a significantly lower reliance on geopolitical framing. Our methods and results hold importance for researchers seeking to simulate large scale cultural processes via recent breakthroughs in text generation.
Indoor Localization with Robust Global Channel Charting: A Time-Distance-Based Approach
Stahlke, Maximilian, Yammine, George, Feigl, Tobias, Eskofier, Bjoern M., Mutschler, Christopher
Fingerprinting-based positioning significantly improves the indoor localization performance in non-line-of-sight-dominated areas. However, its deployment and maintenance are cost-intensive as it needs ground-truth reference systems for both the initial training and the adaptation to environmental changes. In contrast, channel charting (CC) works without explicit reference information and only requires the spatial correlations of channel state information (CSI). While CC has shown promising results in modelling the geometry of the radio environment, a deeper insight into CC for localization using multi-anchor large-bandwidth measurements is still pending. We contribute a novel distance metric for time-synchronized single-input/single-output CSIs that approaches a linear correlation to the Euclidean distance. This allows the environment's global geometry to be learned without annotations. To efficiently optimize the global channel chart we approximate the metric with a Siamese neural network. This enables full CC-assisted fingerprinting and positioning only using a linear transformation from the chart to the real-world coordinates. We compare our approach to the state of the art in CC on two different real-world data sets recorded with a 5G and UWB radio setup. Our approach outperforms others with localization accuracies of 0.69 m for the UWB and 1.4 m for the 5G setup. We show that CC-assisted fingerprinting enables highly accurate localization and reduces (or eliminates) the need for annotated training data.
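The final mapping step, from chart coordinates to real-world coordinates, might look like the following sketch. It makes several assumptions not stated in the abstract: the chart is two-dimensional, the transformation includes an offset (so it is affine), and a handful of reference positions are available to fit it by least squares. All function names and data points are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in reversed(range(n)):
        rest = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - rest) / aug[row][row]
    return solution

def fit_affine(chart_points, world_points):
    """Least-squares fit of world = a*u + b*v + c per output axis; returns two (a, b, c) rows."""
    design = [(u, v, 1.0) for u, v in chart_points]
    # Normal equations: (A^T A) w = A^T y, solved once per world axis.
    gram = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    rows = []
    for axis in range(2):
        moment = [sum(r[i] * w[axis] for r, w in zip(design, world_points)) for i in range(3)]
        rows.append(solve_linear_system(gram, moment))
    return rows

def apply_affine(rows, point):
    """Map one chart point to world coordinates with the fitted rows."""
    u, v = point
    return tuple(a * u + b * v + c for a, b, c in rows)

# Hypothetical reference pairs: the world frame is the chart rotated by 90 degrees,
# scaled by 2, and shifted by (5, 3).
chart_refs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
world_refs = [(5.0, 3.0), (5.0, 5.0), (3.0, 3.0), (3.0, 5.0)]

rows = fit_affine(chart_refs, world_refs)
estimate = apply_affine(rows, (0.5, 0.5))
print(estimate)
```

Because the chart already captures the global geometry, only the six parameters of this map need reference positions, which is far less annotation than a full fingerprinting database would require.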
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Triantafyllopoulos, Andreas, Schuller, Björn W., İymen, Gökçe, Sezgin, Metin, He, Xiangheng, Yang, Zijiang, Tzirakis, Panagiotis, Liu, Shuo, Mertes, Silvan, André, Elisabeth, Fu, Ruibo, Tao, Jianhua
Speech is the fundamental mode of human communication, and its synthesis has long been a core priority in human-computer interaction research. In recent years, machines have managed to master the art of generating speech that is understandable by humans. But the linguistic content of an utterance encompasses only a part of its meaning. Affect, or expressivity, has the capacity to turn speech into a medium capable of conveying intimate thoughts, feelings, and emotions -- aspects that are essential for engaging and naturalistic interpersonal communication. While the goal of imparting expressivity to synthesised utterances has so far remained elusive, following recent advances in text-to-speech synthesis, a paradigm shift is well under way in the fields of affective speech synthesis and conversion as well. Deep learning, as the technology which underlies most of the recent advances in artificial intelligence, is spearheading these efforts. In the present overview, we outline ongoing trends and summarise state-of-the-art approaches in an attempt to provide a comprehensive overview of this exciting field.
Differentially Private Speaker Anonymization
Shamsabadi, Ali Shahin, Srivastava, Brij Mohan Lal, Bellet, Aurélien, Vauquier, Nathalie, Vincent, Emmanuel, Maouche, Mohamed, Tommasi, Marc, Papernot, Nicolas
Sharing real-world speech utterances is key to the training and deployment of voice-based services. However, it also raises privacy risks as speech contains a wealth of personal data. Speaker anonymization aims to remove speaker information from a speech utterance while leaving its linguistic and prosodic attributes intact. State-of-the-art techniques operate by disentangling the speaker information (represented via a speaker embedding) from these attributes and re-synthesizing speech based on the speaker embedding of another speaker. Prior research in the privacy community has shown that anonymization often provides brittle privacy protection, let alone any provable guarantee. In this work, we show that disentanglement is indeed not perfect: linguistic and prosodic attributes still contain speaker information. We remove speaker information from these attributes by introducing differentially private feature extractors based on an autoencoder and an automatic speech recognizer, respectively, trained using noise layers. We plug these extractors into the state-of-the-art anonymization pipeline and generate, for the first time, private speech utterances with a provable upper bound on the speaker information they contain. We evaluate empirically the privacy and utility resulting from our differentially private speaker anonymization approach on the LibriSpeech data set. Experimental results show that the generated utterances retain very high utility for automatic speech recognition training and inference, while being much better protected against strong adversaries who leverage the full knowledge of the anonymization process to try to infer the speaker identity.
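A noise layer of the general kind mentioned here can be sketched with the standard Laplace mechanism. This is an assumption for illustration, not the paper's layer: features are clipped to a bounded L1 norm so that any two inputs differ by at most twice the bound, and Laplace noise with scale 2 * bound / epsilon is then added per coordinate. The function names, the bound, and epsilon are hypothetical.

```python
import random

def clip_l1(features, bound):
    """Rescale `features` so that their L1 norm is at most `bound`."""
    norm = sum(abs(x) for x in features)
    if norm <= bound:
        return list(features)
    return [x * bound / norm for x in features]

def laplace_noise_layer(features, bound, epsilon, rng):
    """Clip, then add Laplace noise calibrated to the L1 sensitivity 2 * bound."""
    clipped = clip_l1(features, bound)
    scale = 2.0 * bound / epsilon
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return [
        x + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for x in clipped
    ]

rng = random.Random(0)  # seeded only to make the demo repeatable
clipped_demo = clip_l1([3.0, -1.0], bound=2.0)
noisy = laplace_noise_layer([3.0, -1.0], bound=2.0, epsilon=1.0, rng=rng)
print(clipped_demo, noisy)
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of utility, which is the trade-off the abstract evaluates empirically.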
Global AI firm, Sidetrade, Chooses Calgary for North America Expansion
Global AI-powered Order-to-Cash platform Sidetrade announced that it is accelerating its North America strategy, with plans to invest $24 million and add 110 full-time jobs in Calgary over the next three years. Just one year since the launch of its North America operations, Sidetrade has exceeded expectations, with 58% of its new bookings now coming from the North America market. The SaaS provider has been recognized by Gartner as one of just three Leaders in the 2022 Magic Quadrant for Invoice to Cash applications. Brad Parry, President and CEO of Calgary Economic Development, said: "Sidetrade's expansion in Calgary as its North American headquarters speaks to the city's leading business environment and the exciting momentum in our tech and innovation ecosystem. Alberta and Calgary are centres for AI excellence with highly skilled talent, and as a global leader in AI, Sidetrade joins a growing roster of multinational companies that call Calgary home, where bright minds with big ideas are solving global challenges."
Sketched Reality: Sketching Bi-Directional Interactions Between Virtual and Physical Worlds with AR and Actuated Tangible UI
Kaimoto, Hiroki, Monteiro, Kyzyl, Faridan, Mehrad, Li, Jiatong, Farajian, Samin, Kakehi, Yasuaki, Nakagaki, Ken, Suzuki, Ryo
This paper introduces Sketched Reality, an approach that combines AR sketching and actuated tangible user interfaces (TUI) for bi-directional sketching interaction. Bi-directional sketching enables virtual sketches and physical objects to "affect" each other through physical actuation and digital computation. In existing AR sketching, the relationship between virtual and physical worlds is only one-directional -- while physical interaction can affect virtual sketches, virtual sketches have no return effect on the physical objects or environment. In contrast, bi-directional sketching interaction allows seamless coupling between sketches and actuated TUIs. In this paper, we employ small tabletop-size robots (Sony Toio) and an iPad-based AR sketching tool to demonstrate the concept. In our system, virtual sketches drawn and simulated on an iPad (e.g., lines, walls, pendulums, and springs) can move, actuate, collide, and constrain physical Toio robots, as if virtual sketches and the physical objects exist in the same space through seamless coupling between AR and robot motion. This paper contributes a set of novel interactions and a design space of bi-directional AR sketching. We demonstrate a series of potential applications, such as tangible physics education, explorable mechanisms, tangible gaming for children, and in-situ robot programming via sketching.