Goto

Collaborating Authors

 Distributed Architectures


A Compositional Approach to Creating Architecture Frameworks with an Application to Distributed AI Systems

arXiv.org Artificial Intelligence

Artificial intelligence (AI) in its various forms finds more and more its way into complex distributed systems. For instance, it is used locally, as part of a sensor system, on the edge for low-latency high-performance inference, or in the cloud, e.g. for data mining. Modern complex systems, such as connected vehicles, are often part of an Internet of Things (IoT). To manage complexity, architectures are described with architecture frameworks, which are composed of a number of architectural views connected through correspondence rules. Despite some attempts, the definition of a mathematical foundation for architecture frameworks that are suitable for the development of distributed AI systems still requires investigation and study. In this paper, we propose to extend the state of the art on architecture framework by providing a mathematical model for system architectures, which is scalable and supports co-evolution of different aspects for example of an AI system. Based on Design Science Research, this study starts by identifying the challenges with architectural frameworks. Then, we derive from the identified challenges four rules and we formulate them by exploiting concepts from category theory. We show how compositional thinking can provide rules for the creation and management of architectural frameworks for complex systems, for example distributed systems with AI. The aim of the paper is not to provide viewpoints or architecture models specific to AI systems, but instead to provide guidelines based on a mathematical formulation on how a consistent framework can be built up with existing, or newly created, viewpoints. To put in practice and test the approach, the identified and formulated rules are applied to derive an architectural framework for the EU Horizon 2020 project ``Very efficient deep learning in the IoT" (VEDLIoT) in the form of a case study.


Device Selection for the Coexistence of URLLC and Distributed Learning Services

arXiv.org Artificial Intelligence

Recent advances in distributed artificial intelligence (AI) have led to tremendous breakthroughs in various communication services, from fault-tolerant factory automation to smart cities. When distributed learning is run over a set of wirelessly connected devices, random channel fluctuations and the incumbent services running on the same network impact the performance of both distributed learning and the coexisting service. In this paper, we investigate a mixed service scenario where distributed AI workflow and ultra-reliable low latency communication (URLLC) services run concurrently over a network. Consequently, we propose a risk sensitivity-based formulation for device selection to minimize the AI training delays during its convergence period while ensuring that the operational requirements of the URLLC service are met. To address this challenging coexistence problem, we transform it into a deep reinforcement learning problem and address it via a framework based on soft actor-critic algorithm. We evaluate our solution with a realistic and 3GPP-compliant simulator for factory automation use cases. Our simulation results confirm that our solution can significantly decrease the training delay of the distributed AI service while keeping the URLLC availability above its required threshold and close to the scenario where URLLC solely consumes all network resources.


Why distributed AI is key to pushing the AI innovation envelope

#artificialintelligence

The future of AI is distributed, said Ion Stoica, co-founder, executive chairman and president of Anyscale on the first day of VB Transform. And that's because model complexity shows no signs of slowing down. "For the past couple of years, the compute requirements to train a state-of-the-art model, depending on the data set, grow between 10 times and 35 times every 18 months," he said. Just five years ago the largest models were fitting on a single GPU; fast forward to today and just to fit the parameters of the most advanced models, it takes hundreds or even thousands of GPUs. PaLM, or the Pathway Language Model from Google, has 530 billion parameters -- and that's only about half of the largest, at more than 1 trillion parameters.


ISO/IEC AI workshop - JTC 1

#artificialintelligence

Babak Hodjat is the CTO for AI at Cognizant where he leads a team of developers and researchers bringing advanced AI solutions to businesses. Babak is the former co-founder and CEO of Sentient, responsible for the core technology behind the world's largest distributed artificial intelligence system. Babak was also the founder of the world's first AI-driven hedge-fund, Sentient Investment Management. Babak is a serial entrepreneur, having started a number of Silicon Valley companies as main inventor and technologist. Prior to co-founding Sentient, Babak was senior director of engineering at Sybase iAnywhere, where he led mobile solutions engineering.


Birmingham

AAAI Conferences

We are interested in mixed human and agent systems in the context of networked computer games. These games require a fully distributed computer system. State changes must be transmitted by network messages subject to possibly significant latency. The system then is composed of agents' mutually inconsistent views of the world state that cannot be reconciled because no single agent's state is naturally more correct than another's. The paper discusses the implications of this inconsistency for distributed AI systems. While our example is computer games, we argue the implications affect a much larger class of human/AI problems.


Roadmap for Edge AI: A Dagstuhl Perspective

arXiv.org Artificial Intelligence

Based on the collective input of Dagstuhl Seminar (21342), this paper presents a comprehensive discussion on AI methods and capabilities in the context of edge computing, referred as Edge AI. In a nutshell, we envision Edge AI to provide adaptation for data-driven applications, enhance network and radio access, and allow the creation, optimization, and deployment of distributed AI/ML pipelines with given quality of experience, trust, security and privacy targets. The Edge AI community investigates novel ML methods for the edge computing environment, spanning multiple sub-fields of computer science, engineering and ICT. The goal is to share an envisioned roadmap that can bring together key actors and enablers to further advance the domain of Edge AI.


Distributed Artificial Intelligence

#artificialintelligence

Let's start from the broader classification. Distributed Artificial Intelligence (DAI) is a class of technologies and methods that span from swarm intelligence to multi-agent technologies and that basically concerns the development of distributed solutions for a specific problem. It can mainly be used for learning, reasoning, and planning, and it is one of the subsets of AI where simulation has a way greater importance than point-prediction. In this class of systems, autonomous learning processing agents (distributed at large scale and independent) reach conclusions or a semi-equilibrium through interaction and communication (even asynchronously). One of the big benefits of those with respect to neural networks is that they do not require the same amount of data to work -- far to say though these are simple systems.


Institutional Metaphors for Designing Large-Scale Distributed AI versus AI Techniques for Running Institutions

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) started out with an ambition to reproduce the human mind, but, as the sheer scale of that ambition became manifest, it quickly retreated into either studying specialized intelligent behaviours, or proposing over-arching architectural concepts for interfacing specialized intelligent behaviour components, conceived of as agents in a kind of organization. This agent-based modeling paradigm, in turn, proves to have interesting applications in understanding, simulating, and predicting the behaviour of social and legal structures on an aggregate level. For these reasons, this chapter examines a number of relevant cross-cutting concerns, conceptualizations, modeling problems and design challenges in large-scale distributed Artificial Intelligence, as well as in institutional systems, and identifies potential grounds for novel advances.


Pervasive AI for IoT Applications: Resource-efficient Distributed Artificial Intelligence

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has witnessed a substantial breakthrough in a variety of Internet of Things (IoT) applications and services, spanning from recommendation systems to robotics control and military surveillance. This is driven by the easier access to sensory data and the enormous scale of pervasive/ubiquitous devices that generate zettabytes (ZB) of real-time data streams. Designing accurate models using such data streams, to predict future insights and revolutionize the decision-taking process, inaugurates pervasive systems as a worthy paradigm for a better quality-of-life. The confluence of pervasive computing and artificial intelligence, Pervasive AI, expanded the role of ubiquitous IoT systems from mainly data collection to executing distributed computations with a promising alternative to centralized learning, presenting various challenges. In this context, a wise cooperation and resource scheduling should be envisaged among IoT devices (e.g., smartphones, smart vehicles) and infrastructure (e.g. edge nodes, and base stations) to avoid communication and computation overheads and ensure maximum performance. In this paper, we conduct a comprehensive survey of the recent techniques developed to overcome these resource challenges in pervasive AI systems. Specifically, we first present an overview of the pervasive computing, its architecture, and its intersection with artificial intelligence. We then review the background, applications and performance metrics of AI, particularly Deep Learning (DL) and online learning, running in a ubiquitous system. Next, we provide a deep literature review of communication-efficient techniques, from both algorithmic and system perspectives, of distributed inference, training and online learning tasks across the combination of IoT devices, edge devices and cloud servers. Finally, we discuss our future vision and research challenges.


Analyzing Power Grid, ICT, and Market Without Domain Knowledge Using Distributed Artificial Intelligence

arXiv.org Artificial Intelligence

Modern cyber-physical systems (CPS), such as our energy infrastructure, are becoming increasingly complex: An ever-higher share of Artificial Intelligence (AI)-based technologies use the Information and Communication Technology (ICT) facet of energy systems for operation optimization, cost efficiency, and to reach CO2 goals worldwide. At the same time, markets with increased flexibility and ever shorter trade horizons enable the multi-stakeholder situation that is emerging in this setting. These systems still form critical infrastructures that need to perform with highest reliability. However, today's CPS are becoming too complex to be analyzed in the traditional monolithic approach, where each domain, e.g., power grid and ICT as well as the energy market, are considered as separate entities while ignoring dependencies and side-effects. To achieve an overall analysis, we introduce the concept for an application of distributed artificial intelligence as a self-adaptive analysis tool that is able to analyze the dependencies between domains in CPS by attacking them. It eschews pre-configured domain knowledge, instead exploring the CPS domains for emergent risk situations and exploitable loopholes in codices, with a focus on rational market actors that exploit the system while still following the market rules.