Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation
Peng, Rui, Wang, Ronggang, Lai, Yawen, Tang, Luyang, Cai, Yangang
Self-supervised methods play an increasingly important role in monocular depth estimation due to their great potential and low annotation cost. To close the gap with supervised methods, recent works take advantage of extra constraints, e.g., semantic segmentation. However, these methods inevitably increase the burden on the model. In this paper, we show theoretical and empirical evidence that the potential capacity of self-supervised monocular depth estimation can be excavated without increasing this cost. In particular, we propose (1) a novel data augmentation approach called data grafting, which forces the model to explore more cues to infer depth besides the vertical image position, (2) an exploratory self-distillation loss, which is supervised by the self-distillation label generated by our new post-processing method, selective post-processing, and (3) the full-scale network, designed to specialize the encoder for the depth estimation task and enhance the representational power of the model. Extensive experiments show that our contributions bring significant performance improvements to the baseline with even less computational overhead, and our model, named EPCDepth, surpasses the previous state-of-the-art methods, even those supervised by additional constraints.
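The data-grafting idea in (1) can be illustrated with a minimal sketch. It assumes that grafting means stacking a horizontal strip of one training image onto the complementary strip of another, so that a pixel's vertical position no longer predicts its depth on its own. The function name `graft` and the parameters `ratio` and `swap` are illustrative assumptions, not names taken from the paper.

```python
import numpy as np

def graft(img_a, img_b, ratio, swap=False):
    """Vertically graft two images of identical shape (H, W, C).

    Assumption for illustration: the top `ratio` fraction of rows comes
    from img_b and the remaining rows come from img_a. With swap=True the
    two strips trade places, which moves content to an unusual height.
    """
    if img_a.shape != img_b.shape:
        raise ValueError("images must share the same shape")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    cut = int(round(img_a.shape[0] * ratio))
    strip_b = img_b[:cut]   # upper rows of the second image
    strip_a = img_a[cut:]   # lower rows of the first image
    parts = [strip_a, strip_b] if swap else [strip_b, strip_a]
    return np.concatenate(parts, axis=0)

# Toy example: a black image grafted with a white image.
a = np.zeros((4, 2, 3))
b = np.ones((4, 2, 3))
mixed = graft(a, b, ratio=0.5)
swapped = graft(a, b, ratio=0.5, swap=True)
```

In a training pipeline, any paired data used for supervision (for example the other view of a stereo pair) would presumably need the same graft applied with the same `ratio` and `swap` values.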
First-person footage from PS5 racing game 'looks identical to real life'
First-person footage from Ride 4, the new racing game for PlayStation 5 and Xbox, is circulating, and it looks almost identical to real life. Ride 4, from Milan-based Italian video game developer Milestone, brings to life 34 racing tracks from around the world, including Donington and Snetterton in the UK, all of which were meticulously digitally recreated with the aid of laser scanning. The gameplay features little touches such as overcast lighting and rain on the tracks, in an incredibly convincing recreation of the British weather. Gamers can also choose from more than 250 bikes from 22 official manufacturers, such as Honda, Suzuki and Yamaha. Ride 4 is now available on PlayStation 5 and Xbox Series X, as well as on PC via the online gaming platform Steam.
Identifying Distributional Differences in Convective Evolution Prior to Rapid Intensification in Tropical Cyclones
McNeely, Trey, Vincent, Galen, Izbicki, Rafael, Wood, Kimberly M., Lee, Ann B.
Tropical cyclone (TC) intensity forecasts are issued by human forecasters who evaluate spatio-temporal observations (e.g., satellite imagery) and model output (e.g., numerical weather prediction, statistical models) to produce forecasts every 6 hours. Within these time constraints, it can be challenging to draw insight from such data. While high-capacity machine learning methods are well suited for prediction problems with complex sequence data, extracting interpretable scientific information with such methods is difficult. Here we leverage powerful AI prediction algorithms and classical statistical inference to identify patterns in the evolution of TC convective structure leading up to the rapid intensification of a storm, hence providing forecasters and scientists with key insight into TC behavior.
DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference
Eyzaguirre, Cristóbal, del Río, Felipe, Araujo, Vladimir, Soto, Álvaro
Large-scale pre-trained language models have shown remarkable results in diverse NLP applications. Unfortunately, these performance gains have been accompanied by a significant increase in computation time and model size, stressing the need to develop new or complementary strategies to increase the efficiency of these models. In this paper, we propose DACT-BERT, a differentiable adaptive computation time strategy for BERT-like models. DACT-BERT adds an adaptive computational mechanism to BERT's regular processing pipeline, which controls the number of Transformer blocks that need to be executed at inference time. By doing this, the model learns to combine the most appropriate intermediate representations for the task at hand. Our experiments demonstrate that our approach, when compared to the baselines, excels in a reduced computational regime and is competitive in other, less restrictive ones.
Ideas
A woman living in Kenya's Dadaab, which is among the world's largest refugee camps, wanders across the vast, dusty site to a central hut lined with computers. Like many others who have been brutally displaced and then warehoused at the margins of our global system, her days are spent toiling away for a new capitalist vanguard thousands of miles away in Silicon Valley. A day's work might include labelling videos, transcribing audio, or showing algorithms how to identify various photos of cats. Amid a drought of real employment, "clickwork" represents one of few formal options for Dadaab's residents, though the work is volatile, arduous, and, when waged, paid by the piece. Cramped and airless workspaces, festooned with a jumble of cables and loose wires, are the antithesis to the near-celestial campuses where the new masters of the universe reside.
Named Entity Recognition and Classification on Historical Documents: A Survey
Ehrmann, Maud, Hamdi, Ahmed, Pontes, Elvys Linhares, Romanello, Matteo, Doucet, Antoine
After decades of massive digitisation, an unprecedented number of historical documents are available in digital format, along with their machine-readable texts. While this represents a major step forward with respect to preservation and accessibility, it also opens up new opportunities in terms of content mining, and the next fundamental challenge is to develop appropriate technologies to efficiently search, retrieve and explore information from this 'big data of the past'. Among semantic indexing opportunities, the recognition and classification of named entities are in great demand among humanities scholars. Yet, named entity recognition (NER) systems are heavily challenged by diverse, historical and noisy inputs. In this survey, we present the array of challenges posed by historical documents to NER, inventory existing resources, describe the main approaches deployed so far, and identify key priorities for future developments.
Theory of overparametrization in quantum neural networks
Larocca, Martin, Ju, Nathan, García-Martín, Diego, Coles, Patrick J., Cerezo, M.
The prospect of achieving quantum advantage with Quantum Neural Networks (QNNs) is exciting. Understanding how QNN properties (e.g., the number of parameters $M$) affect the loss landscape is crucial to the design of scalable QNN architectures. Here, we rigorously analyze the overparametrization phenomenon in QNNs with periodic structure. We define overparametrization as the regime where the QNN has more than a critical number of parameters $M_c$ that allows it to explore all relevant directions in state space. Our main results show that the dimension of the Lie algebra obtained from the generators of the QNN is an upper bound for $M_c$, and for the maximal rank that the quantum Fisher information and Hessian matrices can reach. Underparametrized QNNs have spurious local minima in the loss landscape that start disappearing when $M\geq M_c$. Thus, the overparametrization onset corresponds to a computational phase transition where the QNN trainability is greatly improved by a more favorable landscape. We then connect the notion of overparametrization to the QNN capacity, so that when a QNN is overparametrized, its capacity achieves its maximum possible value. We run numerical simulations for eigensolver, compilation, and autoencoding applications to showcase the overparametrization computational phase transition. We note that our results also apply to variational quantum algorithms and quantum optimal control.
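The quantity that the abstract names as an upper bound for $M_c$, the dimension of the Lie algebra obtained from the QNN generators, can be computed numerically for small systems. The sketch below is one way to do so under stated assumptions: the generators are given as Hermitian matrices $H_k$, the algebra is taken as the real span of nested commutators of $iH_k$, and the function name `lie_algebra_dimension` and the tolerance `tol` are illustrative rather than taken from the paper.

```python
import numpy as np

def lie_algebra_dimension(generators, tol=1e-9):
    """Dimension of the real Lie algebra generated by {i * H_k}.

    generators: iterable of Hermitian matrices H_k (assumed input format).
    New directions are detected by Gram-Schmidt on real vectorisations.
    """
    basis_ops = []    # linearly independent, unit-norm operators
    basis_vecs = []   # orthonormal real vectors spanning the same space

    def try_add(op):
        norm = np.linalg.norm(op)
        if norm < tol:
            return False
        op = op / norm
        v = np.concatenate([op.real.ravel(), op.imag.ravel()])
        for b in basis_vecs:
            v = v - np.dot(b, v) * b   # remove components already spanned
        residual = np.linalg.norm(v)
        if residual < tol:
            return False
        basis_vecs.append(v / residual)
        basis_ops.append(op)
        return True

    for h in generators:
        try_add(1j * np.asarray(h, dtype=complex))

    # Close the span under commutators; newly found operators are appended
    # and later paired with every earlier operator.
    i = 0
    while i < len(basis_ops):
        for j in range(i):
            a, b = basis_ops[i], basis_ops[j]
            try_add(a @ b - b @ a)
        i += 1
    return len(basis_ops)

# Single-qubit example: X and Z generate all of su(2).
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
dim_xz = lie_algebra_dimension([X, Z])   # 3
dim_z = lie_algebra_dimension([Z])       # 1
```

Because the operators are dense matrices, this approach is only practical for a handful of qubits; it is meant to make the definition concrete rather than to scale.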
Learning the noise fingerprint of quantum devices
Martina, Stefano, Buffoni, Lorenzo, Gherardini, Stefano, Caruso, Filippo
In the context of quantum technologies, no quantum device can be considered an isolated (ideal) quantum system. For this reason, the acronym Noisy Intermediate-Scale Quantum (NISQ) technology has recently been introduced [1] to identify the class of early devices in which noise in quantum gates dramatically limits the size of circuits and algorithms that can be reliably performed [2, 3]. As early quantum devices become more widespread, a question that naturally arises is whether, at the experimental level, the signature left by inner noise processes in a generic quantum device exhibits universal features or is characteristic of the specific quantum platform. Moreover, one may wonder whether such a noise signature has a time-dependent profile or can be effectively considered stable, in the sense of constant over time, while the device is operating. The answers to these questions are expected to be crucial in defining a proper strategy to mitigate the influence of noise and systematic errors [4-8], possibly going beyond standard quantum sensing techniques [9-14] and overcoming current limitations on probe dimension and resolution [9, 10, 15-18].
FooBaR: Fault Fooling Backdoor Attack on Neural Network Training
Breier, Jakub, Hou, Xiaolu, Ochoa, Martín, Solano, Jesus
Neural network implementations are known to be vulnerable to physical attack vectors such as fault injection attacks. So far, these attacks have only been utilized during the inference phase with the intention of causing a misclassification. In this work, we explore a novel attack paradigm by injecting faults during the training phase of a neural network in such a way that the resulting network can be attacked during deployment without the need for further faulting. In particular, we discuss attacks against ReLU activation functions that make it possible to generate a family of malicious inputs, which are called fooling inputs, to be used at inference time to induce controlled misclassifications. Such malicious inputs are obtained by mathematically solving a system of linear equations that would cause a particular behaviour on the attacked activation functions, similar to the one induced in training through faulting. We call such attacks fooling backdoors, as the fault attacks at the training phase inject backdoors into the network that allow an attacker to produce fooling inputs. We evaluate our approach against multi-layer perceptron networks and convolutional networks on a popular image classification task, obtaining high attack success rates (from 60% to 100%) and high classification confidence when as few as 25 neurons are attacked, while preserving high accuracy on the originally intended classification task.
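The step of obtaining inputs by solving a system of linear equations can be made concrete with a small sketch. It is not the authors' procedure: it assumes a single fully connected layer with known weights `W` and bias `b`, and it looks for an input whose pre-activations on the attacked neurons equal a chosen target. The function name, the layer sizes, and the target value are all hypothetical, and practical constraints such as valid pixel ranges are ignored.

```python
import numpy as np

def solve_fooling_input(weights, bias, target_preactivation):
    """Least-squares solution x of weights @ x + bias = target_preactivation.

    With fewer attacked neurons than input dimensions the system is
    underdetermined, and lstsq returns the minimum-norm exact solution.
    """
    rhs = target_preactivation - bias
    x, *_ = np.linalg.lstsq(weights, rhs, rcond=None)
    return x

rng = np.random.default_rng(0)
W = rng.normal(size=(25, 784))   # 25 attacked neurons, 784 inputs (illustrative sizes)
b = rng.normal(size=25)
target = np.full(25, 5.0)        # hypothetical positive pre-activations, so ReLU passes them through

x = solve_fooling_input(W, b, target)
achieved = np.maximum(W @ x + b, 0.0)   # ReLU outputs of the attacked neurons
```

Here `achieved` matches `target` up to numerical precision because the random weight matrix has full row rank.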
Joint speaker diarisation and tracking in switching state-space model
Wong, Jeremy H. M., Gong, Yifan
Speakers may move around while diarisation is being performed. When a microphone array is used, the instantaneous locations from which the sounds originated can be estimated, and previous investigations have shown that such information can be complementary to speaker embeddings in the diarisation task. However, these approaches often assume that speakers are fairly stationary throughout a meeting. This paper relaxes this assumption by proposing to explicitly track the movements of speakers while jointly performing diarisation within a unified model. A state-space model is proposed, where the hidden state expresses the identity of the current active speaker and the predicted locations of all speakers. The model is implemented as a particle filter. Experiments on a Microsoft rich meeting transcription task show that the proposed joint location tracking and diarisation approach is able to perform comparably with other methods that use location information.
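A particle filter over such a hidden state can be sketched as follows. This is a generic bootstrap filter written under simplifying assumptions, not the model from the paper: locations are one-dimensional, speakers follow a Gaussian random walk, the observation is a noisy location of the active speaker, and the parameters `switch_prob`, `motion_std`, and `obs_std` are invented for illustration.

```python
import numpy as np

def particle_filter_step(speakers, positions, observation, rng,
                         switch_prob=0.1, motion_std=0.05, obs_std=0.2):
    """One predict / update / resample step of a bootstrap particle filter.

    speakers:    (N,) int array, active-speaker index held by each particle
    positions:   (N, S) float array, location of every speaker per particle
    observation: scalar, measured location of the current sound source
    """
    n, s = positions.shape
    # Predict: the active speaker occasionally switches; all speakers drift.
    switch = rng.random(n) < switch_prob
    speakers = np.where(switch, rng.integers(0, s, size=n), speakers)
    positions = positions + rng.normal(0.0, motion_std, size=(n, s))
    # Update: Gaussian likelihood of the observation given the location
    # of the speaker that each particle believes is active.
    predicted = positions[np.arange(n), speakers]
    log_w = -0.5 * ((observation - predicted) / obs_std) ** 2
    weights = np.exp(log_w - log_w.max())
    weights /= weights.sum()
    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(n, size=n, p=weights)
    return speakers[idx], positions[idx], weights

rng = np.random.default_rng(0)
n_particles, n_speakers = 200, 2
speakers = np.zeros(n_particles, dtype=int)
positions = rng.uniform(-1.0, 1.0, size=(n_particles, n_speakers))
speakers, positions, weights = particle_filter_step(
    speakers, positions, observation=0.8, rng=rng)
```

After each step, the most frequent value in `speakers` gives a diarisation estimate and the column means of `positions` give location estimates. A fuller model would presumably also weight particles by speaker-embedding evidence.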