

Efficient Deep Learning: From Theory to Practice


Modern machine learning often relies on deep neural networks that are prohibitively expensive in terms of memory and computational footprint. This in turn significantly limits the range of applications facing non-negligible resource constraints, e.g., real-time data processing, embedded devices, and robotics. In this thesis, we develop theoretically grounded algorithms to reduce the size and inference cost of modern, large-scale neural networks. By taking a theoretical approach from first principles, we aim to understand and analytically describe the performance-size trade-offs of deep networks, i.e., their generalization properties. We then leverage these insights to devise practical algorithms for obtaining more efficient neural networks via pruning or compression. Beyond theoretical aspects and the inference-time efficiency of neural networks, we study how compression can yield novel insights into the design and training of neural networks. We investigate the practical aspects of the generalization properties of pruned neural networks beyond simple metrics such as test accuracy. Finally, we show how in certain applications pruning neural networks can improve training and hence generalization performance.
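As a minimal illustration of the pruning idea discussed above, the sketch below applies magnitude pruning to a single weight matrix. The function name and thresholding rule are illustrative only, not the thesis's actual (theoretically grounded) algorithm, and the fine-tuning that normally follows pruning is omitted.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Ties at the threshold may prune slightly more entries; a real
    pipeline would also retrain the network after pruning.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold   # keep only weights above threshold
    return weights * mask

w = np.array([[0.1, -2.0, 0.03],
              [1.5, -0.2, 0.7]])
pruned = magnitude_prune(w, 0.5)   # half the entries are zeroed
```

Pruning the smallest-magnitude weights is the classic baseline that theoretically motivated methods (such as sensitivity- or coreset-based pruning) are typically compared against.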

Resolution of the Burrows-Wheeler Transform Conjecture

Communications of the ACM

The Burrows-Wheeler Transform (BWT) is an invertible text transformation that permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the main component of popular lossless compression programs (such as bzip2) as well as recent powerful compressed indexes (such as the r-index), central in modern bioinformatics. The compressibility of BWT is quantified by the number r of equal-letter runs in the output. Despite the practical significance of BWT, no nontrivial upper bound on r was known. By contrast, the sizes of nearly all other known compression methods have been shown to be either always within a polylog n factor (where n is the length of the text) of z, the size of the Lempel–Ziv (LZ77) parsing of the text, or much larger in the worst case (by an n^ε factor for ε > 0). In this paper, we show that r = O(z log² n) holds for every text. This result has numerous implications for text indexing and data compression; in particular: (1) it proves that many results related to BWT automatically apply to methods based on LZ77; for example, it is possible to obtain the functionality of the suffix tree in O(z polylog n) space; (2) it shows that many text processing tasks can be solved in optimal time assuming the text is compressible using LZ77 by a sufficiently large polylog n factor; and (3) it implies the first nontrivial relation between the number of runs in the BWT of a text and of its reverse. In addition, we provide an O(z polylog n)-time algorithm converting the LZ77 parsing into the run-length compressed BWT. To achieve this, we develop several new data structures and techniques of independent interest. In particular, we define compressed string synchronizing sets (generalizing the recently introduced powerful technique of string synchronizing sets) and show how to efficiently construct them.
Next, we propose a new variant of wavelet trees for sequences of long strings, establish a nontrivial bound on their size, and describe efficient construction algorithms. Finally, we develop new indexes that can be constructed directly from the LZ77 parsing and efficiently support pattern matching queries on text substrings. Lossless data compression aims to exploit redundancy in the input data to represent it in a small space.
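To make the run count r concrete, here is a textbook construction of the BWT via sorted rotations, together with a run counter. This naive O(n² log n) sketch is for illustration only; it is not the paper's O(z polylog n) construction.

```python
def bwt(text):
    """Naive Burrows-Wheeler Transform via sorted rotations.

    A sentinel smaller than every symbol is appended so the transform
    is invertible; real implementations build the BWT from the suffix
    array instead of materializing all rotations.
    """
    s = text + "\0"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def runs(s):
    """Number r of equal-letter runs in a string."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])
```

For example, `bwt("banana")` groups equal letters together, so its run count is smaller than the text length — exactly the compressibility that r measures.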

Sony WH-1000XM5 review: In a league of their own


The rumors were (mostly) true. Sony did indeed have a follow-up to its stellar WH-1000XM4 ready for a proper debut. Today the company announced the WH-1000XM5 ($400), its latest flagship noise-canceling headphones equipped with all of the things we've come to expect from Sony's 1000X line. This time around the company gave its premium cans a big exterior redesign. In the process, it massively increased comfort while further improving its already impressive noise cancelation and overall sound quality.

Qualcomm Touts Eight AI "Firsts"


Whenever people take photos or speak to a digital assistant using a mobile phone, they often don't realize that they just took advantage of Artificial Intelligence (AI). If they think of AI at all, it is typically in the context of autonomous vehicles or perhaps Facebook's (Meta's) massive data centers. While AI is becoming ubiquitous and distributed across edge devices and cloud servers, many challenges remain to realize the connected intelligent edge vision CEO Cristiano Amon has for AI to enable automated perception, reasoning, and action. To deliver those levels of automation and personalization, Qualcomm AI Research VP Jilei Hou believes that AI hardware and software must become much smaller, faster, more efficient, lower power, and able to learn continuously at the edge in the real world. This provides the perfect complement to remote processing in the cloud, whose reach has been further extended by Qualcomm's 5G technology.

How AI could help enterprises to reduce data storage costs


The amount of data managed by the world's enterprises is growing. According to one source, the total amount of data created, captured, copied and consumed globally was about 64.2 zettabytes in 2020 -- one zettabyte being equal to a trillion gigabytes. Unsurprisingly, companies report that the cost of storing their data is also climbing. In a 2018 Enterprise Storage Forum survey, business leaders said that the high costs of operation, a lack of storage capacity, and aging equipment were among their top concerns.

Fidelity vs. Realism in Deepfake Videos


Not all deepfake practitioners share the same objective: the impetus of the image synthesis research sector – backed by influential proponents such as Adobe, NVIDIA and Facebook – is to advance the state of the art so that machine learning techniques can eventually recreate or synthesize human activity at high resolution and under the most challenging conditions (fidelity). By contrast, the objective of those who wish to use deepfake technologies to spread disinformation is to create plausible simulations of real people by many means beyond the mere veracity of deepfaked faces. In this scenario, adjunct factors such as context and plausibility matter almost as much as a video's potential to simulate faces (realism). This 'sleight-of-hand' approach extends to degrading the final image quality of a deepfake video, so that the entire video (and not just the deceptive portion represented by a deepfaked face) has a cohesive 'look' that matches the expected quality of the medium. 'Cohesive' doesn't have to mean 'good' – it's enough that the quality is consistent across the original and the inserted, adulterated content, and adheres to expectations.

Machine Learning Reimagines the Building Blocks of Computing


Like tiny gears inside a watch, algorithms execute well-defined tasks within more complicated programs. They're ubiquitous, and in part because of this, they've been painstakingly optimized over time. When a programmer needs to sort a list, for example, they'll reach for a standard "sort" algorithm that's been used for decades. Now researchers are taking a fresh look at traditional algorithms, using the branch of artificial intelligence known as machine learning. Their approach, called algorithms with predictions, takes advantage of the insights machine learning tools can provide into the data that traditional algorithms handle.
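As a toy instance of the algorithms-with-predictions idea, the sketch below searches a sorted array starting from a predicted index and widens the window geometrically until the target is bracketed. The function and predictor are illustrative (not from the article): a good predictor makes the search nearly constant-time, while a bad one only costs a logarithmic fallback.

```python
import bisect

def search_with_prediction(arr, target, predict):
    """Search sorted `arr` for `target`, starting from a predicted index.

    Runs in O(log err) comparisons, where err is the distance between the
    prediction and the true position -- fast when the predictor is good,
    and no worse than binary search by more than a constant factor.
    """
    n = len(arr)
    pos = min(max(predict(target), 0), n - 1)
    lo = hi = pos
    step = 1
    while lo > 0 and arr[lo] > target:      # widen left until bracketed
        lo = max(0, lo - step)
        step *= 2
    step = 1
    while hi < n - 1 and arr[hi] < target:  # widen right until bracketed
        hi = min(n - 1, hi + step)
        step *= 2
    i = bisect.bisect_left(arr, target, lo, hi + 1)
    return i if i < n and arr[i] == target else -1
```

This captures the hallmark of the approach: the machine-learned prediction is advisory, and a classical algorithm guarantees correctness regardless of prediction quality.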

Learnable Nonlinear Compression for Robust Speaker Verification

In this study, we focus on nonlinear compression methods for spectral features in speaker verification based on deep neural networks. We consider different kinds of channel-dependent (CD) nonlinear compression methods optimized in a data-driven manner. Our methods are based on power nonlinearities and dynamic range compression (DRC). We also propose a multi-regime (MR) design for the nonlinearities, aimed at improving robustness. Results on VoxCeleb1 and VoxMovies data demonstrate improvements brought by the proposed compression methods over both the commonly used logarithm and their static counterparts, especially for those based on the power function. While CD generalization improves performance on VoxCeleb1, MR provides more robustness on VoxMovies, with a maximum relative equal error rate reduction of 21.6%.
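A minimal sketch of the channel-dependent power-compression idea, assuming one learnable exponent per spectral channel; the class and parameter names are illustrative, and the data-driven update of the exponents (done jointly with the network in the paper's setup) is omitted.

```python
import numpy as np

class ChannelPowerCompression:
    """Channel-dependent power compression |X|**alpha_c, replacing the
    fixed logarithm with one trainable exponent per spectral channel."""

    def __init__(self, n_channels, init_alpha=0.3):
        # alpha would be optimized by backpropagation during training
        self.alpha = np.full(n_channels, init_alpha)

    def forward(self, spec):
        # spec: (n_channels, n_frames) array of nonnegative magnitudes
        return spec ** self.alpha[:, None]
```

The static counterpart mentioned in the abstract corresponds to freezing `alpha` at its initial value; the channel-dependent variant lets each channel learn its own dynamic-range behavior.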

NeurIPS 2021 – 10 Papers You Shouldn't Miss


Authors' TL;DR We use self-supervised play to train artificial agents to communicate by drawing and then show that with the appropriate inductive bias a human can successfully play the same games with the pretrained drawing agent.

PolarDenseNet: A Deep Learning Model for CSI Feedback in MIMO Systems

In multiple-input multiple-output (MIMO) systems, high-resolution channel state information (CSI) is required at the base station (BS) to ensure optimal performance, especially in multi-user MIMO (MU-MIMO) systems. In the absence of channel reciprocity in frequency division duplex (FDD) systems, the user needs to send the CSI to the BS. The large overhead associated with this CSI feedback in FDD systems often becomes the bottleneck in improving system performance. In this paper, we propose an AI-based CSI feedback scheme built on an auto-encoder architecture that encodes the CSI at the user equipment (UE) into a low-dimensional latent space and decodes it back at the BS, effectively reducing the feedback overhead while minimizing the loss during recovery. Our simulation results show that the proposed AI-based architecture outperforms the state-of-the-art high-resolution linear combination codebook using the DFT basis adopted in the 5G New Radio (NR) system.
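The encoder/decoder split can be sketched as follows, with untrained linear maps standing in for the paper's learned convolutional architecture; all dimensions and variable names here are illustrative assumptions, chosen only to show where the feedback saving comes from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy CSI: Nt antennas x Nc subcarriers, flattened to a real vector.
Nt, Nc, latent = 8, 16, 12
csi = rng.standard_normal(Nt * Nc)

# Untrained linear encoder/decoder standing in for the learned network;
# in the paper both sides are trained end-to-end to minimize recovery loss.
W_enc = rng.standard_normal((latent, Nt * Nc)) / np.sqrt(Nt * Nc)
W_dec = rng.standard_normal((Nt * Nc, latent)) / np.sqrt(latent)

code = W_enc @ csi    # low-dimensional feedback the UE sends to the BS
recon = W_dec @ code  # reconstruction at the BS
# feedback overhead shrinks from Nt*Nc values to `latent` values
```

Only `code` crosses the air interface, so the feedback cost scales with the latent dimension rather than with the full antenna-by-subcarrier channel matrix.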