
Global Big Data Conference - AI Summary

#artificialintelligence

China Poised To Dominate The Artificial Intelligence (AI) Market. Posted on: Mar 15, 2021. Many westerners have no idea that China is actually setting a course to become a world leader in AI, and that a number of key Chinese AI companies have already become a major part of everyday life in China, the United States, and the rest of the world. China's leaders have made AI a strategic priority and are driving the Chinese tech industry to define standards and norms for global artificial intelligence practices. There is a plethora of potential AI applications for this market, so China is fertile ground for AI companies to thrive. The government and business sector in China understand that AI is revolutionizing virtually all aspects of consumers' lives, and they are embracing the new technology wholeheartedly and providing strong strategic guidance for innovative AI tech companies.


Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape

#artificialintelligence

It's been a hot, hot year in the world of data, machine learning and AI. Just when you thought it couldn't grow any more explosively, the data/AI landscape just did: rapid pace of company creation, exciting new product and project launches, a deluge of VC financings, unicorn creation, IPOs, etc. It has also been a year of multiple threads and stories intertwining. One story has been the maturation of the ecosystem, with market leaders reaching large scale and ramping up their ambitions for global market domination, in particular through increasingly broad product offerings. Some of those companies, such as Snowflake, have been thriving in public markets (see our MAD Public Company Index), and a number of others (Databricks, Dataiku, DataRobot, etc.) have raised very large (or in the case of Databricks, gigantic) rounds at multi-billion valuations and are knocking on the IPO door (see our Emerging MAD company Index – both indexes will be updated soon). But at the other end of the spectrum, this year has also seen the rapid emergence of a whole new generation of data and ML startups. Whether they were founded a few years or a few months ago, many experienced a growth spurt in the last year or so. As we will discuss, part of it is due to a rabid VC funding environment and part of it, more fundamentally, is due to inflection points in the market. In the last year, there's been less headline-grabbing discussion of futuristic applications of AI (self-driving vehicles, etc.), and a bit less AI hype as a result. Regardless, data and ML/AI-driven application companies have continued to thrive, particularly those focused on enterprise use cases. Meanwhile, a lot of the action has been happening behind the scenes on the data and ML infrastructure side, with entire new categories (data observability, reverse ETL, metrics stores, etc.) appearing and/or drastically accelerating.
To keep track of this evolution, this is our eighth annual landscape and "state of the union" of the data and AI ecosystem – co-authored this year with my FirstMark colleague John Wu. (For anyone interested, here are the prior versions: 2012, 2014, 2016, 2017, 2018, 2019 (Part I and Part II) and 2020.) For those who have remarked over the years how insanely busy the chart is, you'll love our new acronym – Machine learning, Artificial intelligence and Data (MAD) – this is now officially the MAD landscape! We've learned over the years that those posts are read by a broad group of people, so we have tried to provide a little bit for everyone – a macro view that will hopefully be interesting and approachable to most; and then a slightly more granular overview of trends in data infrastructure and ML/AI for people with deeper familiarity with the industry. This (long!) post is organized as follows: Let's start with the high level view of the market. As the number of companies in the space keeps increasing every year, the inevitable questions are: why is this happening?


The 2021 machine learning, AI, and data landscape

#artificialintelligence

Just when you thought it couldn't grow any more explosively, the data/AI landscape just did: the rapid pace of company creation, exciting new product and project launches, a deluge of VC financings, unicorn creation, IPOs, etc. It has also been a year of multiple threads and stories intertwining. One story has been the maturation of the ecosystem, with market leaders reaching large scale and ramping up their ambitions for global market domination, in particular through increasingly broad product offerings. Some of those companies, such as Snowflake, have been thriving in public markets (see our MAD Public Company Index), and a number of others (Databricks, Dataiku, DataRobot, etc.) have raised very large (or in the case of Databricks, gigantic) rounds at multi-billion valuations and are knocking on the IPO door (see our Emerging MAD company Index). But at the other end of the spectrum, this year has also seen the rapid emergence of a whole new generation of data and ML startups. Whether they were founded a few years or a few months ago, many experienced a growth spurt in the past year or so. Part of it is due to a rabid VC funding environment and part of it, more fundamentally, is due to inflection points in the market. In the past year, there's been less headline-grabbing discussion of futuristic applications of AI (self-driving vehicles, etc.), and a bit less AI hype as a result. Regardless, data and ML/AI-driven application companies have continued to thrive, particularly those focused on enterprise use cases. Meanwhile, a lot of the action has been happening behind the scenes on the data and ML infrastructure side, with entirely new categories (data observability, reverse ETL, metrics stores, etc.) appearing or drastically accelerating. To keep track of this evolution, this is our eighth annual landscape and "state of the union" of the data and AI ecosystem -- coauthored this year with my FirstMark colleague John Wu.
(For anyone interested, here are the prior versions: 2012, 2014, 2016, 2017, 2018, 2019: Part I and Part II, and 2020.) For those who have remarked over the years how insanely busy the chart is, you'll love our new acronym: Machine learning, Artificial intelligence, and Data (MAD) -- this is now officially the MAD landscape! We've learned over the years that those posts are read by a broad group of people, so we have tried to provide a little bit for everyone -- a macro view that will hopefully be interesting and approachable to most, and then a slightly more granular overview of trends in data infrastructure and ML/AI for people with a deeper familiarity with the industry. Let's start with a high-level view of the market. As the number of companies in the space keeps increasing every year, the inevitable questions are: Why is this happening? How long can it keep going?


NI-UDA: Graph Adversarial Domain Adaptation from Non-shared-and-Imbalanced Big Data to Small Imbalanced Applications

arXiv.org Artificial Intelligence

We propose a new general Graph Adversarial Domain Adaptation (GADA) method, based on semantic knowledge reasoning over class structure, to solve the problem of unsupervised domain adaptation (UDA) from big data with non-shared and imbalanced classes to specified small and imbalanced applications (NI-UDA), where non-shared classes are those whose labels lie outside the target domain's label space. Our goal is to leverage prior hierarchy knowledge to enhance domain-adversarial aligned feature representations with graph reasoning. In this paper, to address two challenges in NI-UDA, we equip adversarial domain adaptation with a Hierarchy Graph Reasoning (HGR) layer and a Source Classifier Filter (SCF). For the sparse-class transfer challenge, our HGR layer aggregates local features to hierarchy graph nodes by node prediction and enhances domain-adversarial aligned features with hierarchy graph reasoning for sparse classes. HGR helps learn direct semantic patterns for sparse classes through hierarchy attention in self-attention, non-linear mapping, and graph normalization. Our SCF addresses the challenge of sharing knowledge from non-shared data without negative transfer by filtering low-confidence non-shared data in the HGR layer. Experiments on two benchmark datasets show that our GADA methods consistently improve on state-of-the-art adversarial UDA algorithms; e.g., GADA(HGR) improves the F1 score of MDD by 7.19% and of GVB-GD by 7.89% on the imbalanced source task in the Meal300 dataset. The code is available at https://gadatransfer.wixsite.com/gada.


True-data Testbed for 5G/B5G Intelligent Network

arXiv.org Artificial Intelligence

Future beyond fifth-generation (B5G) and sixth-generation (6G) mobile communications will shift from facilitating interpersonal communications to supporting the Internet of Everything (IoE), where intelligent communications with full integration of big data and artificial intelligence (AI) will play an important role in improving network efficiency and providing high-quality service. As a rapidly evolving paradigm, AI-empowered mobile communications demand large amounts of data acquired from real network environments for systematic testing and verification. Hence, we build the world's first true-data testbed for 5G/B5G intelligent networks (TTIN), which comprises 5G/B5G on-site experimental networks, data acquisition and a data warehouse, and an AI engine with network optimization. In the TTIN, true network data acquisition, storage, standardization, and analysis are available, which enables system-level online verification of B5G/6G-oriented key technologies and supports data-driven network optimization through a closed-loop control mechanism. This paper elaborates on the system architecture and module design of TTIN. Detailed technical specifications and some of the established use cases are also showcased.


Federated Learning for Privacy-Preserving AI

Communications of the ACM

There has been remarkable success of machine learning (ML) technologies in empowering practical artificial intelligence (AI) applications, such as automatic speech recognition and computer vision. However, we are facing two major challenges in adopting AI today. One is that data in most industries exist in the form of isolated islands. The other is the ever-increasing demand for privacy-preserving AI. Conventional AI approaches based on centralized data collection cannot meet these challenges.


A Distributed Privacy-Preserving Learning Dynamics in General Social Networks

arXiv.org Artificial Intelligence

In this paper, we study a distributed privacy-preserving learning problem in general social networks. Specifically, we consider a very general problem setting where the agents in a given multi-hop social network are required to make sequential decisions to choose among a set of options characterized by unknown stochastic quality signals. Each agent is allowed to interact with its peers through multi-hop communications, but with its privacy preserved. To serve the above goals, we propose a four-stage distributed social learning algorithm. In a nutshell, our algorithm proceeds iteratively, and in every round, each agent i) randomly perturbs its adoption to preserve privacy, ii) disseminates the perturbed adoption over the social network in a nearly uniform manner through random walks, iii) selects an option by referring to its peers' latest perturbed adoptions, and iv) decides whether or not to adopt the selected option according to its latest quality signal. Through theoretical analysis, we answer two fundamental algorithmic questions about the performance of our four-stage algorithm: on one hand, we show the convergence of our algorithm when there is a sufficient number of agents in the social network, each of which has only incomplete and perturbed knowledge as input; on the other hand, we quantify the trade-off between privacy loss and the communication overhead required for convergence. We also perform extensive simulations to validate our theoretical analysis and to verify the efficacy of our algorithm.
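The four stages listed in this abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the authors' algorithm: the abstract does not give the perturbation mechanism, the walk length, the selection rule, or the adoption rule, so the randomized-response perturbation, the fixed-length random walk, the uniform pick among received adoptions, and the Bernoulli quality signal below are all hypothetical choices, as are the names `run_round` and `simulate` and every parameter.

```python
import random


def run_round(adoptions, neighbors, qualities, flip_prob, walk_len, rng):
    """Run one round of the four stages and return the new adoptions."""
    num_options = len(qualities)

    # Stage i: each agent perturbs the adoption it is about to share.
    # ASSUMPTION: randomized response, i.e. with probability flip_prob the
    # agent reports a uniformly random option instead of its true one.
    perturbed = [
        rng.randrange(num_options) if rng.random() < flip_prob else own
        for own in adoptions
    ]

    # Stage ii: each perturbed adoption is passed along a random walk.
    # ASSUMPTION: a fixed number of steps (walk_len) on the given graph.
    inbox = [[] for _ in adoptions]
    for start, token in enumerate(perturbed):
        node = start
        for _ in range(walk_len):
            node = rng.choice(neighbors[node])
        inbox[node].append(token)

    new_adoptions = list(adoptions)
    for agent, received in enumerate(inbox):
        if not received:
            continue  # nothing arrived this round; keep the current option
        # Stage iii: select an option from the peers' perturbed adoptions.
        # ASSUMPTION: uniform choice among the received tokens.
        candidate = rng.choice(received)
        # Stage iv: adopt only if the quality signal is favourable.
        # ASSUMPTION: the signal is a Bernoulli draw with mean qualities[c].
        if rng.random() < qualities[candidate]:
            new_adoptions[agent] = candidate
    return new_adoptions


def simulate(num_agents=200, qualities=(0.2, 0.5, 0.9), rounds=200,
             flip_prob=0.1, walk_len=5, seed=0):
    """Simulate the rounds on a ring network (a hypothetical topology)."""
    rng = random.Random(seed)
    neighbors = [[(i - 1) % num_agents, (i + 1) % num_agents]
                 for i in range(num_agents)]
    adoptions = [rng.randrange(len(qualities)) for _ in range(num_agents)]
    for _ in range(rounds):
        adoptions = run_round(adoptions, neighbors, qualities,
                              flip_prob, walk_len, rng)
    return adoptions


if __name__ == "__main__":
    final = simulate()
    for option in range(3):
        print(option, final.count(option))
```

In this toy version, raising `flip_prob` hides more about each agent's true adoption but adds noise to what peers receive, and raising `walk_len` spreads adoptions more evenly at the cost of more messages. That loosely mirrors the privacy and communication trade-off the abstract mentions, without reproducing the paper's analysis.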


Adversarial Privacy Preserving Graph Embedding against Inference Attack

arXiv.org Machine Learning

Recently, the surge in popularity of the Internet of Things (IoT), mobile devices, social media, etc. has opened up a large source of graph data. Graph embedding has proved extremely useful for learning low-dimensional feature representations from graph-structured data. These feature representations can be used for a variety of prediction tasks, from node classification to link prediction. However, existing graph embedding methods do not consider users' privacy and so do not prevent inference attacks. That is, adversaries can infer users' sensitive information by analyzing node representations learned from graph embedding algorithms. In this paper, we propose Adversarial Privacy Graph Embedding (APGE), a graph adversarial training framework that integrates disentangling and purging mechanisms to remove users' private information from learned node representations. The proposed method preserves the structural information and utility attributes of a graph while concealing users' private attributes from inference attacks. Extensive experiments on real-world graph datasets demonstrate the superior performance of APGE compared to the state of the art. Our source code can be found at https://github.com/uJ62JHD/Privacy-Preserving-Social-Network-Embedding.


Federated Extra-Trees with Privacy Preserving

arXiv.org Machine Learning

It is commonly observed that data are scattered everywhere and difficult to centralize. Data privacy and security have also become sensitive topics. Laws and regulations such as the European Union's General Data Protection Regulation (GDPR) are designed to protect the public's data privacy. However, machine learning requires a large amount of data for better performance, and the current circumstances make deploying real-life AI applications extremely difficult. To tackle these challenges, in this paper we propose a novel privacy-preserving federated machine learning model, named Federated Extra-Trees, which applies local differential privacy in the federated trees model. A secure multi-institutional machine learning system was developed to provide superior performance by carrying out the modeling jointly on different clients without exchanging any raw data. We have validated the accuracy of our work by conducting extensive experiments on public datasets, and its efficiency and robustness were also verified by simulating real-world scenarios. Overall, we present an extensible, scalable and practical solution to the data island problem.


Report highlights Asia Pacific digital transformation market

#artificialintelligence

Research and Markets has published a new report titled "Digital Transformation Asia Pacific: 5G, Artificial Intelligence, Internet of Things, and Smart Cities in APAC 2019 – 2024." This report identifies market opportunities for deployment and operations of key technologies within the Asia-Pacific region. The report cites examples such as H3C Technologies' plans to offer a comprehensive digital transformation platform within Thailand. The platform includes core cloud and edge computing, big data, interconnectivity, information security, IoT, AI, and 5G solutions. The report predicts what will happen with 5G technology and identifies how 5G will digitally transform business in Asia Pacific.