AITopics

2207.07825

Country:

North America > United States > New York (0.04)
Asia > Japan (0.04)
North America > United States > New Jersey (0.04)

Genre: Research Report (0.40)

Industry: Water & Waste Management > Water Management (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Dengler, Nils, Großklaus, David, Bennewitz, Maren

Learning Goal-Oriented Non-Prehensile Pushing in Cluttered Scenes

arXiv.org Artificial Intelligence, Jul-15-2022

Pushing objects through cluttered scenes is a challenging task, especially when the objects to be pushed have initially unknown dynamics and touching other entities has to be avoided to reduce the risk of damage. In this paper, we approach this problem by applying deep reinforcement learning to generate pushing actions for a robotic manipulator acting on a planar surface where objects have to be pushed to goal locations while avoiding other items in the same workspace. With the latent space learned from a depth image of the scene and other observations of the environment, such as contact information between the end effector and the object as well as distance to the goal, our framework is able to learn contact-rich pushing actions that avoid collisions with other objects. As the experimental results with a six-degree-of-freedom robotic arm show, our system is able to successfully push objects from start to end positions while avoiding nearby objects. Furthermore, we evaluate our learned policy in comparison to a state-of-the-art pushing controller for mobile robots and show that our agent performs better in terms of success rate, collisions with other objects, and continuous object contact in various scenarios.

agent, observation space, obstacle

2203.02389

Country: Europe > Germany (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Carroll, Micah, Dragan, Anca, Russell, Stuart, Hadfield-Menell, Dylan

Estimating and Penalizing Induced Preference Shifts in Recommender Systems

arXiv.org Artificial Intelligence, Jul-14-2022

The content that a recommender system (RS) shows to users influences them. Therefore, when choosing a recommender to deploy, one is implicitly also choosing to induce specific internal states in users. Moreover, systems trained via long-horizon optimization will have direct incentives to manipulate users: in this work, we focus on the incentive to shift user preferences so they are easier to satisfy. We argue that, before deployment, system designers should: estimate the shifts a recommender would induce; evaluate whether such shifts would be undesirable; and perhaps even actively optimize to avoid problematic shifts. These steps involve two challenging ingredients. Estimation requires anticipating how hypothetical algorithms would influence user preferences if deployed: we do this by using historical user interaction data to train a predictive user model which implicitly contains their preference dynamics. Evaluation and optimization additionally require metrics to assess whether such influences are manipulative or otherwise unwanted: we use the notion of "safe shifts", which define a trust region within which behavior is safe; for instance, the natural way in which users would shift without interference from the system could be deemed "safe". In simulated experiments, we show that our learned preference dynamics model is effective in estimating user preferences and how they would respond to new recommenders. Additionally, we show that recommenders that optimize for staying in the trust region can avoid manipulative behaviors while still generating engagement.

arxiv, penalizing preference shift induced, slate

2204.11966

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New York > Monroe County > Rochester (0.04)

Genre: Research Report (0.83)

Industry:

Information Technology > Services (0.46)
Media (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Mazzaglia, Pietro, Verbelen, Tim, Çatal, Ozan, Dhoedt, Bart

The Free Energy Principle for Perception and Action: A Deep Learning Perspective

The free energy principle, and its corollary active inference, constitute a bio-inspired theory that assumes biological agents act to remain in a restricted set of preferred states of the world, i.e., they minimize their free energy. Under this principle, biological agents learn a generative model of the world and plan actions in the future that will maintain the agent in a homeostatic state that satisfies its preferences. This framework lends itself to being realized in silico, as it comprises important aspects that make it computationally affordable, such as variational inference and amortized planning. In this work, we investigate the tool of deep learning to design and realize artificial agents based on active inference, presenting a deep-learning-oriented presentation of the free energy principle, surveying works that are relevant in both machine learning and active inference areas, and discussing the design choices that are involved in the implementation process. This manuscript probes newer perspectives for the active inference framework, grounding its theoretical aspects into more pragmatic affairs, offering a practical guide to active inference newcomers and a starting point for deep learning practitioners that would like to investigate implementations of the free energy principle.

artificial intelligence, machine learning, survey article

doi: 10.3390/e24020301

2207.06415

Country:

Europe > Switzerland > Basel-City > Basel (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Overview (0.92)
Research Report (0.63)

Industry:

Health & Medicine (0.68)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Wang, Zifu, Blaschko, Matthew B.

MRF-UNets: Searching UNet with Markov Random Fields

UNet [27] is widely used in semantic segmentation due to its simplicity and effectiveness. However, its manually designed architecture is applied to a large number of problem settings, either with no architecture optimizations, or with manual tuning, which is time-consuming and can be sub-optimal. In this work, firstly, we propose Markov Random Field Neural Architecture Search (MRF-NAS) that extends and improves the recent Adaptive and Optimal Network Width Search (AOWS) method [4] with (i) a more general MRF framework, (ii) diverse M-best loopy inference, and (iii) differentiable parameter learning. This provides the necessary NAS framework to efficiently explore network architectures that induce loopy inference graphs, including loops that arise from skip connections. With UNet as the backbone, we find an architecture, MRF-UNet, that shows several interesting characteristics. Secondly, through the lens of these characteristics, we identify the sub-optimality of the original UNet architecture and further improve our results with MRF-UNetV2. Experiments show that our MRF-UNets significantly outperform several benchmarks on three aerial image datasets and two medical image datasets while maintaining low computational costs.

architecture, inference, mrf-unet

2207.06168

Country:

Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
Asia > Thailand (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

Bai, Qinbo, Bedi, Amrit Singh, Agarwal, Mridul, Koppel, Alec, Aggarwal, Vaneet

Reinforcement learning is widely used in applications where one needs to make sequential decisions while interacting with the environment. The problem becomes more challenging when the decision requirement includes satisfying some safety constraints. The problem is mathematically formulated as a constrained Markov decision process (CMDP). In the literature, various algorithms are available to solve CMDP problems in a model-free manner to achieve $\epsilon$-optimal cumulative reward with $\epsilon$-feasible policies. An $\epsilon$-feasible policy is one that may still suffer from constraint violation. An important question here is whether we can achieve $\epsilon$-optimal cumulative reward with zero constraint violations or not. To achieve that, we advocate the use of a randomized primal-dual approach to solve the CMDP problems and propose a conservative stochastic primal-dual algorithm (CSPDA) which is shown to exhibit $\tilde{\mathcal{O}}\left(1/\epsilon^2\right)$ sample complexity to achieve $\epsilon$-optimal cumulative reward with zero constraint violations. In the prior works, the best available sample complexity for the $\epsilon$-optimal policy with zero constraint violation is $\tilde{\mathcal{O}}\left(1/\epsilon^5\right)$. Hence, the proposed algorithm provides a significant improvement as compared to the state of the art.
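
For orientation, the distinction between an $\epsilon$-feasible and a zero-violation policy can be written out as follows. This is only a sketch in assumed notation: $J_r$ (expected cumulative reward), $J_g$ (expected cumulative constraint value), the margin $\kappa$, and the multiplier $\lambda$ do not appear in the abstract, and tightening the constraint by a margin is one common way to read "conservative", not necessarily the exact form used by CSPDA.

```latex
\begin{align}
  &\text{CMDP:} && \max_{\pi} \; J_r(\pi) \quad \text{s.t.} \quad J_g(\pi) \ge 0, \\
  &\epsilon\text{-feasible policy:} && J_g(\pi) \ge -\epsilon, \\
  &\text{tightened constraint (assumed):} && J_g(\pi) \ge \kappa, \quad \kappa > 0, \\
  &\text{Lagrangian of the tightened problem:} && L(\pi, \lambda) = J_r(\pi) + \lambda \bigl( J_g(\pi) - \kappa \bigr), \quad \lambda \ge 0.
\end{align}
```

A primal-dual method alternates between improving $\pi$ (ascent on $L$) and updating $\lambda$ (descent on $L$); if the tightened problem is solved to within an error smaller than $\kappa$, the original constraint $J_g(\pi) \ge 0$ holds without violation.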

algorithm, constraint, constraint violation

2109.06332

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Haeri, Hossein, Ahmadzadeh, Reza, Jerath, Kshitij

Reward-Sharing Relational Networks in Multi-Agent Reinforcement Learning as a Framework for Emergent Behavior

In this work, we integrate 'social' interactions into the MARL setup through a user-defined relational network and examine the effects of agent-agent relations on the rise of emergent behaviors. Leveraging insights from sociology and neuroscience, our proposed framework models agent relationships using the notion of Reward-Sharing Relational Networks (RSRN), where network edge weights act as a measure of how much one agent is invested in the success of (or 'cares about') another. We construct relational rewards as a function of the RSRN interaction weights to collectively train the multi-agent system via a multi-agent reinforcement learning algorithm. The performance of the system is tested for a 3-agent scenario with different relational network structures (e.g., self-interested, communitarian, and authoritarian networks). Our results indicate that reward-sharing relational networks can significantly influence learned behaviors. We posit that RSRN can act as a framework where different relational networks produce distinct emergent behaviors, often analogous to the intuited sociological understanding of such networks.
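
To make the idea of relational rewards concrete, here is a minimal sketch. The abstract only says the relational reward is "a function of" the RSRN weights; the weighted sum used below is an assumption, and the function name `relational_rewards` and the three example weight matrices are hypothetical illustrations rather than the authors' definitions.

```python
from typing import List, Sequence


def relational_rewards(
    weights: Sequence[Sequence[float]], rewards: Sequence[float]
) -> List[float]:
    """Combine individual rewards through a reward-sharing relational network.

    weights[i][j] is how much agent i is invested in agent j's success.
    Assumption: agent i's relational reward is sum_j weights[i][j] * rewards[j].
    """
    n = len(rewards)
    if len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix matching len(rewards)")
    return [sum(w * r for w, r in zip(row, rewards)) for row in weights]


# Hypothetical 3-agent networks (illustrative only).
self_interested = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
communitarian = [[1 / 3, 1 / 3, 1 / 3]] * 3
# Every agent is invested only in agent 0's success.
authoritarian = [[1.0, 0.0, 0.0]] * 3

individual = [1.0, 0.0, -2.0]
for name, net in [
    ("self-interested", self_interested),
    ("communitarian", communitarian),
    ("authoritarian", authoritarian),
]:
    print(name, relational_rewards(net, individual))
```

With an identity matrix each agent simply keeps its own reward, while a uniform matrix gives every agent the same averaged signal; each agent would then be trained on its relational reward instead of its individual one.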

agent, emergent behavior, relational network

2207.05886

Country:

North America > United States > Washington (0.04)
North America > United States > Massachusetts > Middlesex County > Lowell (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP

Chen, Fan, Zhang, Junyu, Wen, Zaiwen

As an important framework for safe Reinforcement Learning, the Constrained Markov Decision Process (CMDP) has been extensively studied in the recent literature. However, despite the rich results under various on-policy learning settings, an essential understanding of the offline CMDP problems is still lacking, in terms of both algorithm design and the information-theoretic sample complexity lower bound. In this paper, we focus on solving the CMDP problems where only offline data are available. By adopting the concept of the single-policy concentrability coefficient $C^*$, we establish an $\Omega\left(\frac{\min\left\{|\mathcal{S}||\mathcal{A}|,|\mathcal{S}|+I\right\} C^*}{(1-\gamma)^3\epsilon^2}\right)$ sample complexity lower bound for the offline CMDP problem, where $I$ stands for the number of constraints. By introducing a simple but novel deviation control mechanism, we propose a near-optimal primal-dual learning algorithm called DPDL. This algorithm provably guarantees zero constraint violation and its sample complexity matches the above lower bound except for an $\tilde{\mathcal{O}}((1-\gamma)^{-1})$ factor. A comprehensive discussion of how to deal with the unknown constant $C^*$ and the potential asynchronous structure of the offline dataset is also included.
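
To get a feel for how the lower bound scales, the expression inside the $\Omega(\cdot)$ can be evaluated directly. This is a small sketch: the function name and parameter names are hypothetical, and because $\Omega$ hides constant factors, the returned value is only the scaling term, not an actual sample count.

```python
def offline_cmdp_lower_bound_term(
    num_states: int,
    num_actions: int,
    num_constraints: int,
    c_star: float,
    gamma: float,
    epsilon: float,
) -> float:
    """Evaluate min{|S||A|, |S| + I} * C* / ((1 - gamma)^3 * epsilon^2).

    This is the term inside the Omega(.) bound; constants are omitted.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    size_term = min(num_states * num_actions, num_states + num_constraints)
    return size_term * c_star / ((1.0 - gamma) ** 3 * epsilon ** 2)


# Example: |S| = 10, |A| = 5, I = 2, C* = 1, gamma = 0.5, epsilon = 0.5.
# min{50, 12} = 12 and (1 - 0.5)^3 * 0.5^2 = 0.03125, so the term is 384.
print(offline_cmdp_lower_bound_term(10, 5, 2, 1.0, 0.5, 0.5))
```

Note that the $\min$ makes the bound depend on $|\mathcal{S}| + I$ rather than $|\mathcal{S}||\mathcal{A}|$ whenever the number of constraints is small relative to the state-action space, as in the example above.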

constraint, inequality, probability

2207.06147

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > Singapore (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Visuo-Tactile Manipulation Planning Using Reinforcement Learning with Affordance Representation

Liang, Wenyu, Fang, Fen, Acar, Cihan, Toh, Wei Qi, Sun, Ying, Xu, Qianli, Wu, Yan

Robots are increasingly expected to manipulate objects in ever more unstructured environments where the object properties have high perceptual uncertainty from any single sensory modality. This directly impacts successful object manipulation. In this work, we propose a reinforcement learning-based motion planning framework for object manipulation which makes use of both on-the-fly multisensory feedback and a learned attention-guided deep affordance model as perceptual states. The affordance model is learned from multiple sensory modalities, including vision and touch (tactile and force/torque), which is designed to predict and indicate the manipulable regions of multiple affordances (i.e., graspability and pushability) for objects with similar appearances but different intrinsic properties (e.g., mass distribution). A DQN-based deep reinforcement learning algorithm is then trained to select the optimal action for successful object manipulation. To validate the performance of the proposed framework, our method is evaluated and benchmarked using both an open dataset and our collected dataset. The results show that the proposed method and overall framework outperform existing methods and achieve better accuracy and higher efficiency.

affordance, exploration, information

2207.06608

Country: Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Aslansefat, Koorosh, Nikolaou, Panagiota, Walker, Martin, Akram, Mohammed Naveed, Sorokos, Ioannis, Reich, Jan, Kolios, Panayiotis, Michael, Maria K., Theocharides, Theocharis, Ellinas, Georgios, Schneider, Daniel, Papadopoulos, Yiannis

SafeDrones: Real-Time Reliability Evaluation of UAVs using Executable Digital Dependable Identities

arXiv.org Artificial Intelligence, Jul-12-2022

The use of Unmanned Aerial Vehicles (UAVs) offers many advantages across a variety of applications. However, safety assurance is a key barrier to widespread usage, especially given the unpredictable operational and environmental factors experienced by UAVs, which are hard to capture solely at design time. This paper proposes a new reliability modeling approach called SafeDrones to help address this issue by enabling runtime reliability and risk assessment of UAVs. It is a prototype instantiation of the Executable Digital Dependable Identity (EDDI) concept, which aims to create a model-based solution for real-time, data-driven dependability assurance for multi-robot systems. By providing real-time reliability estimates, SafeDrones allows UAVs to update their missions accordingly in an adaptive manner.

machine learning, real time system, uav

doi: 10.1007/978-3-031-15842-1_18

2207.05643

Country:

North America > United States > Virginia (0.04)
North America > United States > Michigan (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.88)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.31)