The Internet-of-Things generates vast quantities of data, much of it attributable to an individual's activity and behaviour. Holding and processing such personal data in a central location presents a significant privacy risk to individuals (of being identified or of their sensitive data being leaked). However, analytics based on machine learning, and in particular deep learning, benefit greatly from large amounts of data to develop high-performance predictive models. Traditionally, data and models are stored and processed in a data centre environment where models are trained in a single location. This work reviews research on an alternative approach to machine learning known as federated learning, which seeks to train machine learning models in a distributed fashion on devices in the user's domain, rather than by a centralised entity. Furthermore, we review additional privacy-preserving methods applied to federated learning that are used to protect individuals from being identified during training and once a model is trained. Throughout this review, we identify the strengths and weaknesses of different methods applied to federated learning. Finally, we outline future directions for privacy-preserving federated learning research, particularly focusing on Internet-of-Things applications.
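As a rough illustration of the distributed training idea described above, the following minimal Python sketch averages locally trained parameters in the style of federated averaging. The 1-D linear model, the synthetic client data, and the names local_update, federated_average and train are assumptions made for illustration; they are not taken from the reviewed work, and no additional privacy-preserving mechanism is included.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """SGD on one client's private data for the model y ~ w * x + b."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Average client parameters, weighted by local dataset size."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def train(clients, rounds=50):
    """Only model parameters leave a client; raw (x, y) pairs never do."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights

# Hypothetical setup: three clients, each holding noise-free samples of y = 2x + 1.
rng = random.Random(0)
clients = []
for _ in range(3):
    xs = [rng.uniform(-1.0, 1.0) for _ in range(20)]
    clients.append([(x, 2.0 * x + 1.0) for x in xs])

w, b = train(clients)
print(round(w, 3), round(b, 3))
```

In this sketch the server sees only the per-client parameters, which is the property that the additional privacy-preserving methods mentioned above aim to strengthen further.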
When users connect to the Internet to fulfill their needs, they often encounter a huge amount of related information. Recommender systems are techniques for filtering information at scale and offering items that users find satisfying and interesting. Advances in machine learning methods, especially deep learning, have led to great achievements in recommender systems, although these systems still suffer from challenges such as the cold-start and sparsity problems. To solve these problems, context information such as the user communication network is usually used. In this paper, we propose a novel recommendation method based on Matrix Factorization and graph analysis methods, namely Louvain for community detection and HITS for finding the most important node within the trust network. In addition, we leverage deep Autoencoders to initialize the latent factors of users and items, and the Node2vec deep embedding method gathers users' latent factors from the user trust graph. The proposed method is applied to the standard Ciao and Epinions datasets. Experimental results and comparisons demonstrate that the proposed approach outperforms existing state-of-the-art recommendation methods, improving RMSE by 15.56% on Epinions and 18.41% on Ciao.
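To make the HITS step more concrete, here is a minimal Python sketch of the standard HITS power iteration on a tiny directed trust graph. The edge list, the user names, and the choice to read the authority score as "importance" are assumptions for illustration; the abstract does not say how the authors combine hub and authority scores.

```python
def hits(edges, iterations=50):
    """Return (hub, authority) scores for a directed graph given as (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a node: sum of hub scores of the nodes pointing to it.
        auths = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auths[dst] += hubs[src]
        # Hub score of a node: sum of authority scores of the nodes it points to.
        hubs = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hubs[src] += auths[dst]
        # L2-normalise both score vectors so they stay bounded.
        a_norm = sum(v * v for v in auths.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hubs.values()) ** 0.5 or 1.0
        auths = {n: v / a_norm for n, v in auths.items()}
        hubs = {n: v / h_norm for n, v in hubs.items()}
    return hubs, auths

# Hypothetical trust statements: (truster, trustee).
trust_edges = [
    ("alice", "carol"), ("bob", "carol"), ("dave", "carol"),
    ("alice", "bob"), ("carol", "bob"),
]
hub_scores, authority_scores = hits(trust_edges)
most_trusted = max(authority_scores, key=authority_scores.get)
print(most_trusted)
```

Here the node with the highest authority score is the user trusted by the strongest hubs, which is one plausible reading of "the most important node within the trust network".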
Edge intelligence refers to a set of connected systems and devices for data collection, caching, processing, and analysis in locations close to where data is captured based on artificial intelligence. The aim of edge intelligence is to enhance the quality and speed of data processing and protect the privacy and security of the data. Although it emerged only recently, spanning the period from 2011 to now, this field of research has shown explosive growth over the past five years. In this paper, we present a thorough and comprehensive survey of the literature on edge intelligence. We first identify four fundamental components of edge intelligence, namely edge caching, edge training, edge inference, and edge offloading, based on theoretical and practical results pertaining to proposed and deployed systems. We then aim for a systematic classification of the state of the solutions by examining research results and observations for each of the four components, and present a taxonomy that includes practical problems, adopted techniques, and application goals. For each category, we elaborate on, compare and analyse the literature from the perspectives of adopted techniques, objectives, performance, advantages and drawbacks, etc. This survey article provides a comprehensive introduction to edge intelligence and its application areas. In addition, we summarise the development of the emerging research field and the current state of the art, and discuss the important open issues and possible theoretical and technical solutions.
The growing availability of user-specific data has welcomed the exciting era of personalized recommendation, a paradigm that uncovers the heterogeneity across individuals and provides tailored service decisions that lead to improved outcomes. Such heterogeneity is ubiquitous across a variety of application domains (including online advertising, medical treatment assignment, and product/news recommendation) and manifests itself as different individuals responding differently to the recommended items. Rising to this opportunity, contextual bandits ([8, 39, 22, 1, 3]) have emerged as the predominant mathematical formalism that provides an elegant and powerful formulation: its three core components, the features (representing individual characteristics), the actions (representing the recommendation), and the rewards (representing the observed feedback), capture the salient aspects of the problem and provide fertile ground for developing algorithms that balance exploring and exploiting users' heterogeneity. As such, the last decade has witnessed extensive research efforts in developing effective and efficient contextual bandit algorithms. In particular, two types of algorithms, upper confidence bound (UCB) based algorithms ([29, 20, 15, 26, 30]) and Thompson sampling (TS) based algorithms ([4, 5, 40, 41, 2]), stand out from this flourishing and fruitful line of work: their theoretical guarantees have been analyzed in many settings, often yielding (near-)optimal regret bounds; their empirical performance has been thoroughly validated, often providing insights into their practical efficacy (including the consensus that TS based algorithms, although sometimes suffering from intensive computation for posterior updates, are generally more effective than their UCB counterparts, whose performance can be sensitive to hyper-parameter tuning). To a large extent, these two families of algorithms have been widely deployed in many modern recommendation engines.
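The exploration/exploitation trade-off behind the two algorithm families can be sketched in a few lines of Python. This is a deliberately simplified, non-contextual version (the feature component is dropped, so every user is treated alike): Beta-Bernoulli Thompson sampling and UCB1 on a three-armed bandit. The click rates, round count, and function names are assumptions for illustration only and do not come from the cited works.

```python
import math
import random

def thompson_sampling(true_rates, rounds, rng):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was pulled."""
    k = len(true_rates)
    wins = [0] * k
    losses = [0] * k
    for _ in range(rounds):
        # Draw one plausible success rate per arm from its Beta posterior.
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(k)]
        arm = samples.index(max(samples))
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [w + l for w, l in zip(wins, losses)]

def ucb1(true_rates, rounds, rng):
    """UCB1: pick the arm with the highest empirical mean plus confidence bonus."""
    k = len(true_rates)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # play every arm once to initialise the estimates
        else:
            scores = [totals[a] / pulls[a] + math.sqrt(2.0 * math.log(t) / pulls[a])
                      for a in range(k)]
            arm = scores.index(max(scores))
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls

# Hypothetical click-through rates of three recommended items.
rates = [0.2, 0.5, 0.8]
ts_pulls = thompson_sampling(rates, 5000, random.Random(1))
ucb_pulls = ucb1(rates, 5000, random.Random(1))
print(ts_pulls, ucb_pulls)
```

A contextual variant would replace the per-arm counters with a model of the reward given the features, but the way each method trades exploration against exploitation is the same.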
Are you looking for the best Python tutorial online to learn Python fast? The best way to learn Python is with this list of the best online Python courses, books, training, and certification programs, which will help you become an expert Python programmer. It is the largest curated list of everything you need to know about Python. Don't be afraid: if you have even a little programming experience, Python is easy for beginners like you to use and learn, so let's get started! We have also included some bonus Python certification books to help you become a certified Python programmer. Python can now be learned from many different sources, and installing it is easy. Many Linux and UNIX distributions include a recent Python, and many Windows computers now come with Python already installed. If you don't know how to install Python, you can find a few notes on the BeginnersGuide/Download wiki page.
Point-of-Interest (POI) recommendation has been extensively studied and successfully applied in industry recently. However, most existing approaches build centralized models on the basis of collecting users' data. Both private data and models are held by the recommender, which causes serious privacy concerns. In this paper, we propose a novel Privacy preserving POI Recommendation (PriRec) framework. First, to protect data privacy, users' private data (features and actions) are kept on their own side, e.g., on a cellphone or pad. Meanwhile, the public data that need to be accessed by all users are kept by the recommender to reduce the storage costs of users' devices. These public data include: (1) static data related only to the status of a POI, such as POI categories, and (2) dynamic data that depend on user-POI actions, such as visit counts. The dynamic data could be sensitive, and we develop local differential privacy techniques to release such data to the public with privacy guarantees. Second, PriRec follows the representation of the Factorization Machine (FM), which consists of a linear model and a feature interaction model. To protect model privacy, the linear models are saved on the users' side, and we propose a secure decentralized gradient descent protocol for users to learn them collaboratively. The feature interaction model is kept by the recommender since it poses no privacy risk, and we adopt a secure aggregation strategy in the federated learning paradigm to learn it. In this way, PriRec keeps users' private raw data and models in users' own hands, and protects user privacy to a large extent. We apply PriRec to real-world datasets, and comprehensive experiments demonstrate that, compared with FM, PriRec achieves comparable or even better recommendation accuracy.
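To illustrate how a sensitive dynamic statistic such as a visit count might be released under local differential privacy, the sketch below uses binary randomized response, a textbook LDP mechanism. The abstract does not state which mechanism PriRec uses, so the mechanism, the privacy budget epsilon, the population size, and the function names are all assumptions for illustration.

```python
import math
import random

def randomize_bit(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it.
    The report probabilities for bit=0 and bit=1 differ by a factor of at most e^eps."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit

def estimate_count(reports, epsilon):
    """Unbiased estimate of the number of true 1-bits from the noisy reports.
    E[sum(reports)] = p * c + (1 - p) * (n - c), solved here for c."""
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return (sum(reports) - (1.0 - p) * n) / (2.0 * p - 1.0)

# Hypothetical population: 6000 of 20000 users visited a given POI.
rng = random.Random(42)
true_bits = [1] * 6000 + [0] * 14000
epsilon = 1.0  # assumed privacy budget
reports = [randomize_bit(b, epsilon, rng) for b in true_bits]
estimate = estimate_count(reports, epsilon)
print(round(estimate))
```

Each user perturbs the "visited" bit on the device before sending it, so the recommender only ever sees noisy reports, yet the aggregate visit count can still be estimated.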
Companies may be achieving only a third of the value they could be getting from data science in industry applications. In this paper, we propose a methodology for categorizing and answering 'The Big Three' questions (what is going on, what is causing it, and what actions can I take that will optimize what I care about) using data science. The applications of data science seem to be nearly endless in today's landscape, with each company jockeying for position in the new data and insights economy. Yet, data scientists seem to be solely focused on using classification, regression, and clustering methods to answer the question 'what is going on'. Answering questions about why things are happening or how to take optimal actions to improve metrics is relegated to niche fields of research and generally neglected in industry data science analysis. We survey technical methods to answer these other important questions, describe areas in which some of these methods are being applied, and provide a practical example of how to apply our methodology and selected methods to a real business use case.
Reinforcement Learning (RL) has demonstrated great potential for automatically solving decision-making problems in complex, uncertain environments. RL offers a computational approach that learns through interaction with a stochastic environment, with agents taking actions to maximize cumulative short-term and long-term rewards. Some of the most impressive results have been shown in Game Theory, where agents exhibited super-human performance in games such as Go and StarCraft II, which led to the adoption of RL in many other domains, including Cloud Computing. In particular, workflow autoscaling exploits Cloud elasticity to optimize the execution of workflows according to given optimization criteria. This is a decision-making problem in which it is necessary to establish when and how to scale computational resources up or down, and how to assign them to the upcoming processing workload. Such actions have to be taken considering some optimization criteria in the Cloud, a dynamic and uncertain environment. Motivated by this, many works apply RL to the autoscaling problem in the Cloud. In this work, we exhaustively survey those proposals from major venues and uniformly compare them based on a set of proposed taxonomies. We also discuss open problems and provide a perspective on future research in the area.
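The autoscaling decision problem described above can be sketched as tabular Q-learning on a toy environment. Everything in this Python example is an assumption made for illustration (three load levels, one to three VMs, the cost and penalty values, and the load dynamics); it is not drawn from any of the surveyed proposals.

```python
import random

ACTIONS = (-1, 0, 1)      # remove a VM, keep the pool, add a VM
MIN_VMS, MAX_VMS = 1, 3
LOADS = (0, 1, 2)         # assumed: load level L needs L + 1 VMs

def apply_action(vms, action):
    """Scale the VM pool, clamped to the allowed range."""
    return max(MIN_VMS, min(MAX_VMS, vms + action))

def reward(vms, load, vm_cost=1.0, sla_penalty=5.0):
    """Negative cost: pay per VM, plus a penalty per missing VM."""
    shortfall = max(0, (load + 1) - vms)
    return -(vm_cost * vms) - sla_penalty * shortfall

def next_load(load, rng, stay_prob=0.8):
    """Assumed workload dynamics: the load usually persists, sometimes jumps."""
    return load if rng.random() < stay_prob else rng.choice(LOADS)

def q_learning(steps=50000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(v, l): {a: 0.0 for a in ACTIONS}
         for v in range(MIN_VMS, MAX_VMS + 1) for l in LOADS}
    vms, load = MIN_VMS, 0
    for _ in range(steps):
        state = (vms, load)
        if rng.random() < epsilon:          # explore
            action = rng.choice(ACTIONS)
        else:                               # exploit the current estimates
            action = max(ACTIONS, key=lambda a: q[state][a])
        new_vms = apply_action(vms, action)
        r = reward(new_vms, load)           # scaling takes effect immediately
        new_state = (new_vms, next_load(load, rng))
        target = r + gamma * max(q[new_state].values())
        q[state][action] += alpha * (target - q[state][action])
        vms, load = new_state
    return q

q_table = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q_table[s][a]) for s in q_table}
print(policy[(1, 2)])   # learned action with 1 VM under the highest load
```

The state here is the pair (current VMs, current load), the actions are the scaling decisions, and the reward encodes one possible optimization criterion that trades resource cost against unmet demand.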
Online Social Networks (OSNs) have established virtual platforms that enable people to express their opinions, interests and thoughts in a variety of contexts and domains, allowing legitimate users as well as spammers and other untrustworthy users to publish and spread their content. Hence, the concept of social trust has attracted the attention of information processors/data scientists and information consumers/business firms. One of the main reasons for acquiring the value of Social Big Data (SBD) is to provide frameworks and methodologies with which the credibility of OSN users can be evaluated. These approaches should be scalable to accommodate large-scale social data. Hence, a sound understanding of social trust is needed to improve and expand the analysis process and to infer the credibility of SBD. Given the open environment and the few restrictions of OSNs, the medium allows legitimate and genuine users as well as spammers and other users of low trustworthiness to publish and spread their content. Hence, this paper presents an approach that incorporates semantic analysis and machine learning modules to measure and predict users' trustworthiness in numerous domains over different time periods. The evaluation of the conducted experiment validates the applicability of the incorporated machine learning techniques to predict highly trustworthy domain-based users.
Studies of networked phenomena, such as interactions in online social media, often rely on incomplete data, either because these phenomena are partially observed, or because the data is too large or expensive to acquire all at once. Analysis of incomplete data leads to skewed or misleading results. In this paper, we investigate the limitations of learning to complete partially observed networks via node querying. Concretely, we study the following problem: given (i) a partially observed network, (ii) the ability to query nodes for their connections (e.g., by accessing an API), and (iii) a budget on the number of such queries, sequentially learn which nodes to query in order to maximally increase observability. We call this querying process Network Online Learning and present a family of algorithms called NOL*. These algorithms learn to choose which partially observed node to query next based on a parameterized model that is trained online through a process of exploration and exploitation. Extensive experiments on both synthetic and real-world networks show that (i) it is possible to sequentially learn to choose which nodes are best to query in a network and (ii) some macroscopic properties of networks, such as the degree distribution and modular structure, impact the potential for learning and the optimal amount of random exploration.
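The querying loop described above can be sketched as follows. This Python example is a loose illustration in the spirit of the problem statement, not the NOL* algorithms themselves: the single feature (observed degree), the reward (number of newly discovered nodes), the linear model with an SGD update, the epsilon-greedy exploration, and the toy graph are all assumptions.

```python
import random

def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def query_network(hidden_graph, start, budget, epsilon=0.2, lr=0.01, seed=0):
    """Spend up to `budget` node queries to grow the observed network."""
    rng = random.Random(seed)
    queried = set()
    known = {start: set()}          # observed adjacency, grows with each query
    weights = [0.0, 0.0]            # assumed model: bias + weight on observed degree
    history = []
    for _ in range(budget):
        candidates = sorted(n for n in known if n not in queried)
        if not candidates:
            break
        scores = {n: weights[0] + weights[1] * len(known[n]) for n in candidates}
        if rng.random() < epsilon:                   # explore
            node = rng.choice(candidates)
        else:                                        # exploit the learned model
            node = max(candidates, key=scores.get)
        features = (1.0, float(len(known[node])))
        before = len(known)
        for neighbour in hidden_graph[node]:         # the "query" reveals all neighbours
            known.setdefault(neighbour, set()).add(node)
            known[node].add(neighbour)
        queried.add(node)
        gained = len(known) - before                 # reward: newly discovered nodes
        error = gained - scores[node]
        weights = [w + lr * error * f for w, f in zip(weights, features)]
        history.append((node, gained))
    return known, history

# Hypothetical hidden network: a 12-node ring with a few chords from node 0.
hidden = build_graph([(i, (i + 1) % 12) for i in range(12)] +
                     [(0, j) for j in (3, 5, 7, 9)])
observed, history = query_network(hidden, start=0, budget=5)
print(len(observed), history)
```

The model is updated after every query from the observed gain, so the choice of the next node depends on what earlier queries revealed, while the epsilon parameter controls the amount of random exploration.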