Google Cloud's New TPU v4 ML Hub Packs 9 Exaflops of AI


Almost exactly a year ago, Google launched its Tensor Processing Unit (TPU) v4 chips at Google I/O 2021, promising twice the performance of the TPU v3. At the time, Google CEO Sundar Pichai said that Google's datacenters would "soon have dozens of TPU v4 Pods, many of which will be operating at or near 90 percent carbon-free energy." Now, at Google I/O 2022, Pichai revealed the blue-ribbon fruit of those labors: a TPU v4-powered datacenter in Mayes County, Oklahoma, that Google says is the world's largest publicly available machine learning hub. "This machine learning hub has eight Cloud TPU v4 Pods, custom-built on the same networking infrastructure that powers Google's largest neural models," Pichai said. Each TPU v4 Pod consists of 4,096 TPU v4 chips, and each chip delivers 275 teraflops of ML-targeted bfloat16 ("brain floating point") performance.
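The headline figure follows from the numbers quoted above: eight Pods, 4,096 chips per Pod, and 275 teraflops per chip. A minimal sketch of that arithmetic is below. The function and constant names are illustrative, and the result is a peak bfloat16 figure, not a measured throughput.

```python
# Figures quoted in the article; names are illustrative.
PODS = 8
CHIPS_PER_POD = 4096
TERAFLOPS_PER_CHIP = 275  # peak bfloat16 teraflops per TPU v4 chip


def hub_peak_exaflops(pods, chips_per_pod, teraflops_per_chip):
    """Aggregate peak performance of the hub, in exaflops."""
    total_teraflops = pods * chips_per_pod * teraflops_per_chip
    return total_teraflops / 1_000_000  # 1 exaflop = 10^6 teraflops


peak = hub_peak_exaflops(PODS, CHIPS_PER_POD, TERAFLOPS_PER_CHIP)
print(f"{peak:.2f} exaflops")  # prints: 9.01 exaflops
```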

Google unveils the world's largest publicly available machine learning hub


Google I/O 2022, Google's largest developer conference, kicked off with a keynote speech from Alphabet CEO Sundar Pichai. The keynote included major announcements, among them the launch of the Pixel Watch, updates on PaLM and LaMDA, and advances in AR and immersive technology. Let us look at the key highlights. "Recently we announced plans to invest USD 9.5 billion in data centers and offices across the US. One of our state-of-the-art data centers is in Mayes County, Oklahoma. I'm excited to announce that, there, we are launching the world's largest, publicly-available machine learning hub for our Google Cloud customers," Sundar Pichai said.

Introducing Voice Search Experience at


Communication is a natural part of our everyday lives. People interact using voice and text, forming sentences to express what they desire. And yet, most of the search and discovery patterns out there rely on menu items and filter facets. Building on our mission at "Making it easier for everyone to experience the world", the ML & AI Product teams based in Tel Aviv decided to challenge the conventional search patterns by allowing the most natural way for everyone to communicate: using their voice. This is the story of how we built a native in-app voice assistant at, and as far as I know, the first voice search available today by a global online travel company.

Virtual Adversarial Training for Semi-supervised Breast Mass Classification

This study aims to develop a novel computer-aided diagnosis (CAD) scheme for mammographic breast mass classification using semi-supervised learning. Although supervised deep learning has achieved huge success across various medical image analysis tasks, that success relies on large amounts of high-quality annotations, which can be challenging to acquire in practice. To overcome this limitation, we propose a semi-supervised method, virtual adversarial training (VAT), to learn useful information latent in unlabeled data for better classification of breast masses. Accordingly, our VAT-based models have two types of losses, namely supervised and virtual adversarial losses. The former acts as in supervised classification, while the latter aims to enhance model robustness against virtual adversarial perturbations, thus improving model generalizability. To evaluate the performance of our VAT-based CAD scheme, we retrospectively assembled a total of 1024 breast mass images, with equal numbers of benign and malignant masses. A large CNN and a small CNN were used in this investigation, and both were trained with and without the adversarial loss. When the labeled ratios were 40% and 80%, VAT-based CNNs delivered the highest classification accuracies of 0.740 and 0.760, respectively. The experimental results suggest that the VAT-based CAD scheme can effectively utilize meaningful knowledge from unlabeled data to better classify mammographic breast mass images.
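For readers unfamiliar with the two losses, the standard VAT objective can be written as follows. This is the commonly used formulation and not necessarily the exact one in the study; the weight \alpha and the perturbation radius \epsilon are assumed symbols that the abstract does not specify.

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{sup}}(\theta) + \alpha \, \mathcal{L}_{\mathrm{vadv}}(\theta)

\mathcal{L}_{\mathrm{vadv}}(\theta) = \mathbb{E}_{x}\left[ D_{\mathrm{KL}}\big( p(y \mid x, \hat{\theta}) \,\|\, p(y \mid x + r_{\mathrm{vadv}}, \theta) \big) \right]

r_{\mathrm{vadv}} = \arg\max_{\|r\|_2 \le \epsilon} D_{\mathrm{KL}}\big( p(y \mid x, \hat{\theta}) \,\|\, p(y \mid x + r, \hat{\theta}) \big)
```

Here \hat{\theta} denotes the current parameters treated as constants. The expectation runs over both labeled and unlabeled images, because the virtual adversarial term needs only the model's own predictions and no ground-truth label.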

Machine-learned, light-field camera detects 3D facial expressions – News Medical


The facial expressions in the acquired 3D images were distinguished through machine learning with an average of 85% accuracy – a statistically …

Distributed Cooperative Multi-Agent Reinforcement Learning with Directed Coordination Graph

Existing distributed cooperative multi-agent reinforcement learning (MARL) frameworks usually assume undirected coordination graphs and communication graphs while estimating a global reward via consensus algorithms for policy evaluation. Such a framework may incur expensive communication costs and scale poorly because it requires global consensus. In this work, we study MARL with directed coordination graphs, and propose a distributed RL algorithm in which the local policy evaluations are based on local value functions. The local value function of each agent is obtained by local communication with its neighbors through a directed learning-induced communication graph, without using any consensus algorithm. A zeroth-order optimization (ZOO) approach based on parameter perturbation is employed to achieve gradient estimation. Through comparison with existing ZOO-based RL algorithms, we show that our proposed distributed RL algorithm guarantees high scalability. A distributed resource allocation example illustrates the effectiveness of our algorithm.
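To illustrate what gradient estimation by parameter perturbation means, here is a minimal sketch of a generic two-point zeroth-order estimator. It is not the paper's distributed algorithm. The function name, the Gaussian perturbation, and the defaults for `delta` and `num_samples` are assumptions made for illustration.

```python
import random


def zeroth_order_gradient(f, theta, delta=1e-3, num_samples=1000, rng=None):
    """Estimate the gradient of f at theta using only function evaluations.

    Each sample perturbs theta along a random Gaussian direction u and uses
    the symmetric difference (f(theta + delta*u) - f(theta - delta*u)) / (2*delta)
    as the slope along u. Averaging slope * u over many samples approximates
    the gradient.
    """
    rng = rng or random.Random(0)
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = [t + delta * ui for t, ui in zip(theta, u)]
        minus = [t - delta * ui for t, ui in zip(theta, u)]
        slope = (f(plus) - f(minus)) / (2.0 * delta)
        for i in range(dim):
            grad[i] += slope * u[i] / num_samples
    return grad


def quadratic(x):
    """Toy objective with known gradient 2*x, used only to sanity-check the estimator."""
    return sum(v * v for v in x)


estimate = zeroth_order_gradient(quadratic, [1.0, -2.0, 0.5], num_samples=20000)
print(estimate)  # should be close to the true gradient [2.0, -4.0, 1.0]
```

The estimate is noisy, and its variance grows with the number of perturbed parameters. This is one reason why restricting each agent to local value functions can help scalability.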

LoMar: A Local Defense Against Poisoning Attack on Federated Learning

Federated learning (FL) provides a highly efficient decentralized machine learning framework, where the training data remains distributed at remote clients in a network. Though FL enables a privacy-preserving mobile edge computing framework using IoT devices, recent studies have shown that this approach is susceptible to poisoning attacks from the side of remote clients. To address poisoning attacks on FL, we provide a two-phase defense algorithm called Local Malicious Factor (LoMar). In phase I, LoMar scores model updates from each remote client by measuring the relative distribution over their neighbors using a kernel density estimation method. In phase II, an optimal threshold is approximated to distinguish malicious and clean updates from a statistical perspective. Comprehensive experiments on four real-world datasets have been conducted, and the experimental results show that our defense strategy can effectively protect the FL system. Specifically, the defense performance on the Amazon dataset under a label-flipping attack indicates that, compared with FG+Krum, LoMar increases the target label testing accuracy from 96.0% to 98.8%, and the overall averaged testing accuracy from 90.1% to 97.0%.
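The idea behind phase I, scoring an update by its density relative to its neighbors, can be sketched as follows. This is a generic relative-density score built on an unnormalized Gaussian kernel and is not LoMar's exact scoring rule. The function names, `k`, and `bandwidth` are hypothetical.

```python
import math


def gaussian_kde_density(point, neighbors, bandwidth):
    """Mean Gaussian kernel value between point and its neighbors.

    The kernel's normalizing constant is dropped because it cancels in the
    density ratio computed below.
    """
    total = 0.0
    for nb in neighbors:
        sq_dist = sum((a - b) ** 2 for a, b in zip(point, nb))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / len(neighbors)


def k_nearest(index, updates, k):
    """Indices of the k updates closest (Euclidean distance) to updates[index]."""
    others = [j for j in range(len(updates)) if j != index]
    others.sort(key=lambda j: math.dist(updates[index], updates[j]))
    return others[:k]


def relative_density_scores(updates, k=3, bandwidth=1.0):
    """Score each update by its density divided by its neighbors' mean density.

    Scores near 1 mean the update sits in a region as dense as its neighbors;
    scores near 0 mean the update is isolated from them.
    """
    neigh = [k_nearest(i, updates, k) for i in range(len(updates))]
    dens = [
        gaussian_kde_density(updates[i], [updates[j] for j in neigh[i]], bandwidth)
        for i in range(len(updates))
    ]
    scores = []
    for i in range(len(updates)):
        neighbor_mean = sum(dens[j] for j in neigh[i]) / len(neigh[i])
        scores.append(dens[i] / neighbor_mean)
    return scores


# Toy flattened model updates: five similar clients and one outlier.
updates = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [-0.1, 0.0], [5.0, 5.0]]
scores = relative_density_scores(updates)
flagged = [i for i, s in enumerate(scores) if s < 0.5]  # 0.5 is a placeholder threshold
print(flagged)
```

The fixed threshold here is only a stand-in. According to the abstract, phase II approximates an optimal threshold statistically.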

TempAMLSI: Temporal Action Model Learning based on Grammar Induction

Hand-encoding PDDL domains is generally accepted as difficult, tedious and error-prone. The difficulty is even greater when temporal domains have to be encoded, because actions have a duration and their effects are not instantaneous. In this paper, we present TempAMLSI, an algorithm based on the AMLSI approach that is able to learn temporal domains. TempAMLSI is based on the classical assumption made in temporal planning that it is possible to convert a non-temporal domain into a temporal domain. TempAMLSI is the first approach able to learn temporal domains with single hard envelopes and Cushing's intervals. We show experimentally that TempAMLSI is able to learn accurate temporal domains, i.e., temporal domains that can be used directly to solve new planning problems, with different forms of action concurrency.

Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations

A classical problem in computer vision is to infer a 3D scene representation from a few images that can be used to render novel views at interactive rates. Previous work focuses on reconstructing pre-defined 3D representations, e.g. textured meshes, or implicit representations, e.g. radiance fields, and often requires input images with precise camera poses and long processing times for each novel scene. In this work, we propose the Scene Representation Transformer (SRT), a method which processes posed or unposed RGB images of a new area, infers a "set-latent scene representation", and synthesises novel views, all in a single feed-forward pass. To calculate the scene representation, we propose a generalization of the Vision Transformer to sets of images, enabling global information integration, and hence 3D reasoning. An efficient decoder transformer parameterizes the light field by attending into the scene representation to render novel views. Learning is supervised end-to-end by minimizing a novel-view reconstruction error. We show that this method outperforms recent baselines in terms of PSNR and speed on synthetic datasets, including a new dataset created for the paper. Further, we demonstrate that SRT scales to support interactive visualization and semantic segmentation of real-world outdoor environments using Street View imagery.
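PSNR, the quality metric mentioned above, is computed from the mean squared error between a rendered view and the reference image. The sketch below uses the standard definition on flattened pixel lists. The function name and the assumption that pixel values lie in [0, 1] are illustrative and are not taken from the paper's evaluation code.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in decibels between two flattened images.

    PSNR = 10 * log10(max_value^2 / MSE); higher means the rendered view is
    closer to the reference. Identical images give infinity.
    """
    if len(reference) != len(rendered):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Three-pixel toy example with one pixel off by 0.1.
print(round(psnr([0.0, 0.5, 1.0], [0.0, 0.5, 0.9]), 2))  # prints: 24.77
```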

Lensless multicore-fiber microendoscope for real-time tailored light field generation with phase encoder neural network (CoreNet)

The generation of tailored light with multi-core fiber (MCF) lensless microendoscopes is widely used in biomedicine. However, the computer-generated holograms (CGHs) used for such applications are typically generated by iterative algorithms, which demand high computational effort and thus limit advanced applications such as in vivo optogenetic stimulation and fiber-optic cell manipulation. The random and discrete distribution of the fiber cores induces strong spatial aliasing in the CGHs; hence, an approach that can rapidly generate tailored CGHs for MCFs is in high demand. We demonstrate a novel phase encoder deep neural network (CoreNet), which can generate accurate tailored CGHs for MCFs at near video rate. Simulations show that CoreNet can reduce the computation time by two orders of magnitude and increase the fidelity of the generated light field compared to conventional CGH techniques. For the first time, tailored CGHs generated in real time are loaded on the fly onto the phase-only SLM for dynamic light-field generation through the MCF microendoscope in experiments. This paves the way for real-time cell rotation and several further applications that require real-time high-fidelity light delivery in biomedicine.