Collaborating Authors


On-Device Learning with Cloud-Coordinated Data Augmentation for Extreme Model Personalization in Recommender Systems Artificial Intelligence

Data heterogeneity is an intrinsic property of recommender systems, making models trained over the global data on the cloud, which is the mainstream in industry, suboptimal for each individual user's local data distribution. To deal with data heterogeneity, model personalization with on-device learning is a potential solution. However, on-device training using a user's small set of local samples incurs severe overfitting and undermines the model's generalization ability. In this work, we propose a new device-cloud collaborative learning framework, called CoDA, to break the dilemmas of purely cloud-based learning and on-device learning. The key principle of CoDA is to retrieve similar samples from the cloud's global pool to augment each user's local dataset to train the recommendation model. Specifically, after a coarse-grained sample matching on the cloud, a personalized sample classifier is further trained on each device for a fine-grained sample filtering, which can learn the boundary between the local data distribution and the outside data distribution. We also build an end-to-end pipeline to support the flows of data, model, computation, and control between the cloud and each device. We have deployed CoDA in a recommendation scenario of Mobile Taobao. Online A/B testing results show a remarkable performance improvement of CoDA over both cloud-based learning without model personalization and on-device training without data augmentation. Overhead testing on a real device demonstrates the computation, storage, and communication efficiency of the on-device tasks in CoDA.
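The two-stage augmentation described in this abstract can be illustrated with a minimal sketch. It is not CoDA's actual implementation: the function names, the centroid-distance matching, the plain logistic-regression classifier, and the choice of unmatched cloud samples as "outside" examples are all assumptions made for illustration.

```python
import numpy as np

def coarse_match(local_x, cloud_x, k):
    """Cloud side (assumed): indices of the k cloud samples nearest the local centroid."""
    centroid = local_x.mean(axis=0)
    dist = np.linalg.norm(cloud_x - centroid, axis=1)
    return np.argsort(dist)[:k]

def _add_bias(x):
    # Append a constant column so the classifier can learn an intercept.
    return np.hstack([x, np.ones((len(x), 1))])

def train_sample_classifier(local_x, outside_x, steps=500, lr=0.1):
    """Device side (assumed): logistic regression, label 1 = local, 0 = outside."""
    x = _add_bias(np.vstack([local_x, outside_x]))
    y = np.concatenate([np.ones(len(local_x)), np.zeros(len(outside_x))])
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        w -= lr * (x.T @ (p - y)) / len(y)  # gradient step on the log loss
    return w

def fine_filter(w, candidates, threshold=0.5):
    """Keep only the candidates the classifier scores as local-looking."""
    p = 1.0 / (1.0 + np.exp(-(_add_bias(candidates) @ w)))
    return candidates[p >= threshold]

# Toy 2-D data (hypothetical): the user's samples sit near (2, 2).
local_x = np.array([[2.0, 2.0], [2.2, 1.8], [1.8, 2.2], [2.1, 2.1]])
cloud_x = np.array([
    [1.9, 2.0], [2.3, 2.1],        # close to the user's data
    [-1.0, -1.0], [-1.5, -0.5],    # matched coarsely, but not really similar
    [-2.0, -2.0], [-2.5, -1.5], [-1.8, -2.2], [-2.2, -2.4],  # clearly outside
])

matched = coarse_match(local_x, cloud_x, k=4)
outside = np.delete(cloud_x, matched, axis=0)  # assumption: unmatched samples act as negatives
w = train_sample_classifier(local_x, outside)
kept = fine_filter(w, cloud_x[matched])
augmented = np.vstack([local_x, kept])         # dataset used for on-device training
print(len(kept), "cloud samples kept;", len(augmented), "training samples in total")
```

In this toy setting the coarse stage over-retrieves on purpose, and the on-device classifier discards the matched samples that fall on the outside part of the learned boundary.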

Top 10 Keyword Extraction API


This article is brought to you by the Eden AI team. We allow you to test and use in production a large number of AI engines from different providers directly through our API and platform. If you are a solution provider and want to integrate Eden AI, contact us at : Intro: In this article, we are going to see how you can easily integrate a Keyword Extraction engine into your project and how to choose and access the right engine according to your data. Definition: Keyword extraction (a

Phoebe: A Learning-based Checkpoint Optimizer Artificial Intelligence

Easy-to-use programming interfaces paired with cloud-scale processing engines have enabled big data system users to author arbitrarily complex analytical jobs over massive volumes of data. However, as the complexity and scale of analytical jobs increase, they encounter a number of unforeseen problems: hotspots with large intermediate data on temporary storage, longer job recovery time after failures, and worse query optimizer estimates are examples of issues that we are facing at Microsoft. To address these issues, we propose Phoebe, an efficient learning-based checkpoint optimizer. Given a set of constraints and an objective function at compile-time, Phoebe is able to determine the decomposition of job plans, and the optimal set of checkpoints to preserve their outputs to durable global storage. Phoebe consists of three machine learning predictors and one optimization module. For each stage of a job, Phoebe makes accurate predictions for: (1) the execution time, (2) the output size, and (3) the start/end time taking into account the inter-stage dependencies. Using these predictions, we formulate checkpoint optimization as an integer programming problem and propose a scalable heuristic algorithm that meets the latency requirement of the production environment. We demonstrate the effectiveness of Phoebe in production workloads, and show that we can free the temporary storage on hotspots by more than 70% and restart failed jobs 68% faster on average with minimal performance impact. Phoebe also illustrates that adding multiple sets of checkpoints is not cost-efficient, which dramatically reduces the complexity of the optimization.
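To make the idea of choosing a checkpoint from predicted stage statistics concrete, here is a deliberately simplified sketch. It is not Phoebe's integer program or heuristic: it assumes a linear chain of stages, a single checkpoint, a hypothetical storage budget as the only constraint, and re-execution time after a failure in the last stage as the objective. The function name and all numbers are made up for illustration.

```python
def best_single_checkpoint(exec_time, output_size, storage_budget):
    """Brute-force the best single cut point in a linear chain of stages.

    Checkpointing after stage k persists output_size[k] to global storage.
    If the job later fails in its last stage, only stages k+1.. must rerun.
    Returns (k, recovery_time), or (None, full_rerun_time) if no stage fits.
    """
    best_k, best_recovery = None, sum(exec_time)
    for k in range(len(exec_time) - 1):          # the last stage is the job output itself
        if output_size[k] > storage_budget:      # constraint: checkpoint must fit the budget
            continue
        recovery = sum(exec_time[k + 1:])        # work redone after restarting from stage k
        if recovery < best_recovery:
            best_k, best_recovery = k, recovery
    return best_k, best_recovery

# Hypothetical per-stage predictions (minutes and GB).
exec_time = [10, 30, 20, 5]
output_size = [50, 400, 80, 10]
print(best_single_checkpoint(exec_time, output_size, storage_budget=100))
```

A real job plan is a DAG with many candidate cuts, which is why an exhaustive search like this one does not scale and a heuristic is needed in production.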

Towards Personalized and Human-in-the-Loop Document Summarization Artificial Intelligence

The ubiquitous availability of computing devices and the widespread use of the internet continuously generate large amounts of data. As a result, the amount of available information on any given topic is far beyond what humans can properly process, causing what is known as information overload. To cope efficiently with large amounts of information and generate content of significant value to users, we need to identify, merge and summarise information. Data summaries can help gather related information and collect it into a shorter format that enables answering complicated questions, gaining new insight and discovering conceptual boundaries. This thesis focuses on three main challenges to alleviate information overload using novel summarisation techniques. It further intends to facilitate the analysis of documents to support personalised information extraction. This thesis separates the research issues into four areas, covering (i) feature engineering in document summarisation, (ii) traditional static and inflexible summaries, (iii) traditional generic summarisation approaches, and (iv) the need for reference summaries. We propose novel approaches to tackle these challenges by (i) enabling automatic intelligent feature engineering, (ii) enabling flexible and interactive summarisation, and (iii) utilising intelligent and personalised summarisation approaches. The experimental results prove the efficiency of the proposed approaches compared to other state-of-the-art models. We further propose solutions to the information overload problem in different domains through summarisation, covering network traffic data, health data and business process data.

DeepTriage: Automated Transfer Assistance for Incidents in Cloud Services Artificial Intelligence

As cloud services grow and generate high revenues, the cost of downtime in these services is becoming significant. To reduce loss and service downtime, a critical primary step is to execute incident triage, the process of assigning a service incident to the correct responsible team, in a timely manner. An incorrect assignment risks additional incident reroutings and increases its time to mitigate by 10x. However, automated incident triage in large cloud services faces many challenges: (1) a highly imbalanced incident distribution from a large number of teams, (2) wide variety in formats of input data or data sources, (3) scaling to meet production-grade requirements, and (4) gaining engineers' trust in using machine learning recommendations. To address these challenges, we introduce DeepTriage, an intelligent incident transfer service combining multiple machine learning techniques - gradient boosted classifiers, clustering methods, and deep neural networks - in an ensemble to recommend the responsible team to triage an incident. Experimental results on real incidents in Microsoft Azure show that our service achieves an F1 score of 82.9%. For highly impacted incidents, DeepTriage achieves F1 scores between 76.3% and 91.3%. We have applied best practices and state-of-the-art frameworks to scale DeepTriage to handle incident routing for all cloud services. DeepTriage has been deployed in Azure since October 2017 and is used by thousands of teams daily.
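The abstract does not say how the individual models are combined, so the following is only a sketch of one common ensembling scheme, weighted soft voting, applied to team recommendation. The function name, team names, and scores are hypothetical.

```python
from collections import defaultdict

def recommend_team(model_scores, weights=None):
    """Average per-team probabilities across models and rank the teams.

    model_scores: one dict per model, mapping team name -> predicted probability.
    weights: optional per-model weights (assumed equal if omitted).
    """
    if weights is None:
        weights = [1.0] * len(model_scores)
    total = sum(weights)
    combined = defaultdict(float)
    for scores, weight in zip(model_scores, weights):
        for team, prob in scores.items():
            combined[team] += weight * prob / total
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)

# Hypothetical outputs of three models for one incident.
boosted_trees = {"storage": 0.6, "network": 0.3, "compute": 0.1}
clustering = {"storage": 0.2, "network": 0.7, "compute": 0.1}
neural_net = {"storage": 0.5, "network": 0.4, "compute": 0.1}

ranking = recommend_team([boosted_trees, clustering, neural_net])
print(ranking[0][0])  # team with the highest averaged score
```

Returning the full ranking rather than a single label lets an engineer see the runner-up teams, which may help with the trust concern the abstract mentions.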

IBM's Watson AIOps automates IT anomaly detection and remediation


Today during its annual IBM Think conference, IBM announced the launch of Watson AIOps, a service that taps AI to automate the real-time detection, diagnosis, and remediation of network anomalies. It also unveiled new offerings targeting the rollout of 5G technologies and the devices on those networks, as well as a coalition of telecommunications partners -- the IBM Telco Network Cloud Ecosystem -- that will work with IBM to deploy edge computing technologies. Watson AIOps marks IBM's foray into the mammoth AIOps market, which is expected to grow from $2.55 billion in 2018 to $11.02 billion by 2023, according to Markets and Markets. That might be a conservative projection in light of the pandemic, which is forcing IT teams to increasingly conduct their work remotely. In lieu of access to infrastructure, tools like Watson AIOps could help prevent major outages, the cost of which a study from Aberdeen pegged at $260,000 per hour.

A Survey on the Use of Preferences for Virtual Machine Placement in Cloud Data Centers Artificial Intelligence

With the rapid development of virtualization techniques, cloud data centers allow for cost-effective, flexible, and customizable deployments of applications on virtualized infrastructure. Virtual machine (VM) placement aims to assign each virtual machine to a server in the cloud environment. VM placement is of paramount importance to the design of cloud data centers. Typically, VM placement involves complex relations and multiple design factors as well as local policies that govern the assignment decisions. It also involves different constituents, including cloud administrators and customers, who might have disparate preferences when opting for a placement solution. Thus, it is often valuable to return not only an optimized solution to the VM placement problem but also one that reflects the given preferences of the constituents. In this paper, we provide a detailed review of the role of preferences in the recent literature on VM placement. We further discuss key challenges and identify possible research opportunities to better incorporate preferences within the context of VM placement.

Kubernetes For AI Hyperparameter Search Experiments NVIDIA Developer Blog


The software industry has recently seen a huge shift in how software deployments are done thanks to technologies such as containers and orchestrators. While container technologies have been around, credit goes to Docker for making containers mainstream, by greatly simplifying the process of creating, managing and deploying containerized applications. Teams of developers and data scientists are increasingly moving their training and inference workloads from a one-developer-one-workstation model to shared centralized infrastructure, to improve resource utilization and sharing. With container orchestration tools such as Kubernetes, Docker Swarm and Marathon, developers and data scientists get more control over how and when their apps are run and ops teams don't have to deal with deploying and managing workloads. NVIDIA actively contributes to making container technologies and orchestrators GPU friendly, enabling the same deployment best practices that exist for traditional software development and deployment to be applied to AI software development.

Microsoft Teardown


We dive into the strategies Microsoft is pursuing across cloud, enterprise IT, AI, gaming, and more to see how the company is positioning itself for the future. As the world's most valuable company, and with a current market cap hovering around $780B, Microsoft may be the next company to reach the $1T threshold. While it may not grab as many headlines as its buzzier tech giant counterparts, the company is quietly adapting across its core business areas, led by a future-focused Satya Nadella. Since assuming the CEO role in 2014, Nadella has deprioritized the Windows offering that initially helped Microsoft become a household name, refocusing the company's efforts on implementing AI across all its products and services. That's not the only change: in addition to an increased focus on AI, cloud and subscription services have become unifying themes across products. And to maintain its dominance in enterprise technology, Microsoft is expanding in new areas -- like gaming and personal computing -- that leverage the company's own cloud infrastructure. Below, we outline Microsoft's key priorities, initiatives, investments, and acquisitions across its various business segments. The majority of Microsoft's revenue comes from its enterprise technologies, which fall under its Intelligent Cloud and Productivity & Business Processes segments. The Productivity & Business Processes segment includes software products like Office 365, Skype, LinkedIn, and Microsoft's ERP (enterprise resource planning) and CRM (customer relationship management) platform, Dynamics 365. Microsoft's Intelligent Cloud segment includes cloud platform Azure, the Visual Studio developer platform, and Windows Server, a version of Microsoft's proprietary operating system optimized for running in the cloud. Outside of enterprise technology, Microsoft generates revenue from products like Xbox and Microsoft Surface, among other areas. These products are bucketed into the company's More Personal Computing segment. In addition to its in-house efforts, Microsoft has a number of initiatives that look to support promising young businesses. These include Microsoft's venture capital arm, M12, Microsoft's accelerator, ScaleUp, and other initiatives like Microsoft for Startups.

Post-prognostics decision in Cyber-Physical Systems Artificial Intelligence

Abstract-- Prognostics and Health Management (PHM) offers several benefits for predictive maintenance. It predicts the future behavior of a system as well as its Remaining Useful Life (RUL). The RUL is used to plan maintenance operations so as to avoid failures, reduce downtime, and optimize the cost of maintenance and failure. However, as industry has developed, assets have become distributed, which is why PHM needs to be built on new information technologies. In our work, we propose a PHM solution based on a cyber-physical system in which the physical side is connected to the PHM analysis processes, which are developed in the cloud so that they can be shared and can benefit from the cloud's characteristics. Keywords-- Cyber-physical systems (CPS), Prognostics and Health Management (PHM), post-prognostics decision, cloud computing, Internet of Things.