
Collaborating Authors

Baidu Research


Baidu Research: 10 Technology Trends in 2021 - KDnuggets

#artificialintelligence

While global economic and social uncertainties in 2020 caused significant stress, progress in intelligent technologies continued. The digital and intelligent transformation of all industries significantly accelerated, with AI technologies showing great potential in combatting COVID-19 and helping people resume work. Understanding future technology trends may never have been as important as it is today. Baidu Research is releasing our prediction of the 10 technology trends in 2021, hoping that these clear technology signposts will guide us to embrace the new opportunities and embark on new journeys in the age of intelligence. In 2020, COVID-19 drove the integration of AI and emerging technologies like 5G, big data, and IoT.


Baidu Research

#artificialintelligence

With natural language processing being widely adopted across a growing number of use cases, from search engines to mobile smart assistants, leading pre-trained language models like Baidu's ERNIE (Enhanced Representation through kNowledge IntEgration) are receiving strong attention within the machine learning community. After substantial progress since its release earlier this year, today we are excited to announce that ERNIE has achieved new state-of-the-art performance on GLUE and become the world's first model to score over 90 in macro-average score (90.1).
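
The GLUE leaderboard ranks models by the unweighted mean of their per-task scores. As a minimal sketch of that arithmetic (the task names and scores below are hypothetical placeholders, not ERNIE's actual results):

```python
def macro_average(scores):
    """Unweighted mean of per-task scores: every task counts equally,
    regardless of its dataset size."""
    return sum(scores.values()) / len(scores)

# Hypothetical per-task scores, for illustration only.
scores = {"task_a": 88.0, "task_b": 92.0, "task_c": 90.3}
macro = round(macro_average(scores), 1)
```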


Baidu Research

#artificialintelligence

Simulations are also increasingly essential to training autonomous driving systems, but generating high-definition computer graphics within limited budgets remains a difficult challenge. In March, we presented our augmented autonomous driving simulation (AADS) in the journal Science Robotics. Our solution augmented real-world pictures with simulated traffic flow to create photorealistic simulation images and renderings. More specifically, we used LiDAR and cameras to scan street scenes and, from that acquired trajectory data, generated plausible traffic flows for cars and pedestrians and composed them into the background. These composite images can be resynthesized with different viewpoints and sensor models (camera or LiDAR) to simulate different use cases.
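
The composition step can be pictured as masked blending: pixels covered by the rendered traffic layer replace the corresponding pixels of the scanned background. The sketch below is a toy illustration with hypothetical arrays, not the AADS rendering pipeline:

```python
import numpy as np

def composite(background, foreground, mask):
    """Blend a rendered traffic layer onto a scanned street-scene image.
    mask is 1 where simulated cars/pedestrians cover the pixel, 0 elsewhere."""
    m = mask[..., None].astype(float)  # broadcast the mask over RGB channels
    return (m * foreground + (1.0 - m) * background).astype(background.dtype)

# Toy 2x2 RGB images: the simulated car covers only the top-left pixel.
bg = np.zeros((2, 2, 3), dtype=np.uint8)      # scanned background
fg = np.full((2, 2, 3), 255, dtype=np.uint8)  # rendered traffic layer
out = composite(bg, fg, np.array([[1, 0], [0, 0]]))
```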


Baidu Research

#artificialintelligence

In terms of quantum hardware, the performance of programmable noisy intermediate-scale quantum devices will improve further, and these devices will gain error-correction capabilities. Quantum algorithms of practical value will be able to run on them, and applications of quantum artificial intelligence will advance substantially. In terms of quantum software, high-quality quantum computing platforms and software will emerge and become deeply integrated with AI and cloud computing technologies. In addition, as a quantum computing industry chain takes shape, quantum computing will attract attention in more application fields. More and more industry giants are investing R&D resources as a strategic bet, which could reshape the future of AI and cloud computing.


Baidu Research

#artificialintelligence

However, besides co-occurrence, there is other valuable lexical, syntactic and semantic information in training corpora. For example, named entities, such as names, locations and organizations, could contain conceptual information. Sentence order and proximity between sentences would allow models to learn structure-aware representations. What's more, semantic similarity at the document level or discourse relations among sentences could train the models to learn semantic-aware representations. Hypothetically speaking, would it be possible to further improve performance if the model was trained to constantly learn a larger variety of tasks?


Baidu Research's breast cancer detection algorithm outperforms human pathologists

#artificialintelligence

Baidu Research today announced it has developed a deep learning algorithm that, in initial tests, outperforms human pathologists in identifying breast cancer metastasis. The convolutional neural network was trained by splitting 400 large images into grids of tens of thousands of smaller images, then randomly selecting 200,000 of those smaller images. The algorithm then classifies each small image together with its neighboring regions. A variety of algorithms have been introduced to help pathologists examine images that can be gigabytes in size by cutting them into smaller parts. Baidu Research's algorithm advances this technique by mimicking how a pathologist examines the area surrounding a breast cancer tumor cell, considering individual cells and their neighbors at once.
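
The gridding step described above can be sketched as follows; `extract_patches`, the patch size, and the toy slide are illustrative assumptions, not the published implementation (which additionally feeds each patch's neighborhood to the classifier):

```python
import numpy as np

def extract_patches(slide, size):
    """Cut a large whole-slide image into a grid of non-overlapping
    size x size tiles, as a prelude to per-tile classification."""
    h, w = slide.shape[:2]
    return [slide[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

# Toy 8x8 "slide" cut into four 4x4 tiles; real slides are gigapixel images.
slide = np.arange(64).reshape(8, 8)
tiles = extract_patches(slide, 4)
```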


Neural Voice Cloning with a Few Samples - Baidu Research

#artificialintelligence

Speaker encoding trains a separate model to infer a new speaker embedding directly from the cloning audio samples; the embedding is then used with a multi-speaker generative model. The speaker encoding model has time- and frequency-domain processing blocks to retrieve speaker identity information from each audio sample, and attention blocks to combine them optimally. The advantages of speaker encoding include fast cloning time (only a few seconds) and a small number of parameters per speaker, making it favorable for low-resource deployment. Besides accurately estimating the speaker embeddings, we observe that speaker encoders map different speakers into the embedding space in a meaningful way: for example, different genders and regional accents form distinct clusters. Applying transformations in this learned latent space can convert the gender or regional accent of a speaker.
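
The combination step can be sketched as an attention-weighted average of per-sample embeddings. In the actual model the attention scores are learned; here they are passed in (or uniform) so only the pooling is visible. Names and shapes are illustrative:

```python
import numpy as np

def encode_speaker(sample_embeddings, attn_scores=None):
    """Pool per-sample embeddings of shape (n_samples, dim) into a single
    speaker embedding via a softmax-weighted average over the samples."""
    E = np.asarray(sample_embeddings, dtype=float)
    s = np.zeros(len(E)) if attn_scores is None else np.asarray(attn_scores, dtype=float)
    w = np.exp(s - s.max())
    w /= w.sum()          # softmax over the cloning samples
    return w @ E          # attention-weighted average embedding

emb = encode_speaker([[1.0, 0.0], [3.0, 0.0]])  # uniform weights give the mean
```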


Deep Speech 3: Even more end-to-end speech recognition - Baidu Research

#artificialintelligence

Accurate speech recognition systems are vital to many businesses, whether they are virtual assistants taking commands, video review systems that understand user feedback, or tools that improve customer service. However, today's world-class speech recognition systems can only be built with user data from third-party providers or by recruiting graduates from the world's top speech and language technology programs. At Baidu Research, we have been working on a speech recognition system that can be built, debugged, and improved by a team with little to no experience in speech recognition (but with a solid understanding of machine learning). We believe a highly simplified speech recognition pipeline should democratize speech recognition research, just as convolutional neural networks revolutionized computer vision. Along this endeavor we developed Deep Speech 1 as a proof of concept to show that a simple model can be highly competitive with state-of-the-art models.


Mixed Precision Training - Baidu Research

#artificialintelligence

Figure 2: Mixed precision training for deep learning models. Secondly, we introduce a technique called loss scaling that allows us to recover some of the small-magnitude gradients. During training, some weight gradients have very small exponents that become zero in FP16 format. To overcome this problem, we scale the loss by a scaling factor at the start of back-propagation; through the chain rule, the gradients are scaled up by the same factor and become representable in FP16.
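
The effect is easy to reproduce numerically. In this sketch (using NumPy's float16 and an illustrative scale factor of 65536; real systems choose the factor empirically or dynamically), a gradient that underflows to zero in FP16 survives when it is pre-scaled:

```python
import numpy as np

scale = 65536.0                        # power of two, so scaling is exact in binary FP
tiny_grad = np.float32(1e-8)           # representable in FP32 ...

lost = np.float16(tiny_grad)           # ... but underflows to 0.0 when cast to FP16
kept = np.float16(tiny_grad * scale)   # pre-scaled, it survives the FP16 cast
recovered = np.float32(kept) / scale   # unscale in FP32 before the weight update
```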


PaddlePaddle Fluid: Elastic Deep Learning on Kubernetes - Baidu Research

#artificialintelligence

Two open source communities, PaddlePaddle (the deep learning framework that originated at Baidu) and Kubernetes (the best-known scheduler for containerized applications), are announcing the Elastic Deep Learning (EDL) feature in PaddlePaddle's new release, codenamed Fluid. Fluid EDL includes a Kubernetes controller, the PaddlePaddle auto-scaler, which changes the number of processes of distributed jobs according to the idle hardware resources in the cluster, and a new fault-tolerant architecture as described in the PaddlePaddle design doc. Industrial deep learning requires significant computational power. Research labs and companies often build GPU clusters managed by SLURM, MPI, or SGE. These clusters either run a submitted job immediately if it requires fewer resources than are idle, or queue the job for an unpredictably long time.
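
The auto-scaler's decision can be caricatured in a few lines of logic. The function below is a hypothetical sketch of the idea (grow a job into idle capacity, shrink it under pressure instead of queueing others), not the actual Fluid EDL controller API:

```python
def desired_parallelism(idle_gpus, running, min_procs, max_procs):
    """Toy elastic-scaling rule: expand a distributed job while the
    cluster has idle GPUs, and shrink it toward its configured minimum
    when no capacity is free, so other jobs need not wait."""
    if idle_gpus > 0:
        return min(running + idle_gpus, max_procs)  # scale up into idle capacity
    return max(running - 1, min_procs)              # release one process under pressure
```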