 Oceania


Optimal Use of Experience in First Person Shooter Environments

arXiv.org Machine Learning

Although reinforcement learning has made great strides recently, a continuing limitation is that it requires an extremely high number of interactions with the environment. In this paper, we explore the effectiveness of reusing experience from the experience replay buffer in the Deep Q-Learning algorithm. We test the effectiveness of applying learning update steps multiple times per environmental step in the VizDoom environment and show, first, that this requires a change in the learning rate and, second, that it does not improve the performance of the agent. Furthermore, we show that updating less frequently is effective up to a ratio of 4:1, after which performance degrades significantly. These results quantitatively confirm the widespread practice of performing learning updates every 4th environmental step.
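The update schedule discussed in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `run_schedule`, the placeholder transitions, and the default batch and buffer sizes are all assumptions made for illustration, and no network or loss is included. It only shows where a learning update would fall when one update is performed every `update_every` environment steps.

```python
import random
from collections import deque


def run_schedule(total_env_steps, update_every=4, batch_size=32,
                 buffer_capacity=10_000, seed=0):
    """Count learning updates when one update runs every `update_every` steps.

    Hypothetical helper: it mimics only the bookkeeping of a replay-based
    agent, not the learning itself.
    """
    rng = random.Random(seed)
    replay_buffer = deque(maxlen=buffer_capacity)  # oldest transitions are evicted
    updates = 0
    for step in range(1, total_env_steps + 1):
        # Placeholder transition (state, action, reward, next_state); a real
        # agent would obtain it by acting in the environment.
        replay_buffer.append((step - 1, rng.randrange(3), rng.random(), step))

        # Learn only on every `update_every`-th step, once a full batch exists.
        if step % update_every == 0 and len(replay_buffer) >= batch_size:
            batch = rng.sample(list(replay_buffer), batch_size)
            # A real agent would compute the Q-learning loss on `batch` here.
            updates += 1
    return updates
```

For example, `run_schedule(1000, update_every=4)` performs roughly a quarter as many updates as `run_schedule(1000, update_every=1)`; the sketch says nothing about the learning-rate change or the performance effects reported in the abstract.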


3d Deep Learning Github

#artificialintelligence

As a result, in the early period, people use deep learning as a tool to learn high level features from low level cues usually hand-crafted. Feb 2017 - IEEE Transactions on Image Processing M. Well then write a Python script that will use OpenCV and GoogLeNet pre-trained on ImageNet to classify images. This course will introduce the fundamental technologies for autonomous vehicle sensors, perception and machine learning, from electromagnetic spectrum characteristics and signal acquisition, vehicle extrospective sensor data analysis, perspective geometry models, image and point cloud processing, to machine/deep learning approaches. Using Keras and Deep Deterministic Policy Gradient to play TORCS. Deep learning is an exciting, young field that specializes in discovering and extracting intricate structures in large, unstructured datasets for parameterizing artificial neural networks with many layers.


The Value of Collaboration in Convex Machine Learning with Differential Privacy

arXiv.org Machine Learning

In this paper, we apply machine learning to distributed private data owned by multiple data owners, entities with access to non-overlapping training datasets. We use noisy, differentially-private gradients to minimize the fitness cost of the machine learning model using stochastic gradient descent. We quantify the quality of the trained model, using the fitness cost, as a function of privacy budget and size of the distributed datasets to capture the trade-off between privacy and utility in machine learning. This way, we can predict the outcome of collaboration among privacy-aware data owners prior to executing potentially computationally-expensive machine learning algorithms. Particularly, we show that the difference between the fitness of the trained machine learning model using differentially-private gradient queries and the fitness of the trained machine learning model in the absence of any privacy concerns is inversely proportional to the size of the training datasets squared and the privacy budget squared. We successfully validate the performance prediction with the actual performance of the proposed privacy-aware learning algorithms, applied to financial datasets for determining interest rates of loans using regression and for detecting credit card fraud using support vector machines.
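The idea of descending on noisy, differentially-private gradients can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the one-parameter squared loss, the gradient clipping bound `clip`, the use of the Laplace mechanism, and all function names are choices made here. Each gradient query is epsilon-differentially private on its own; the privacy cost accumulated over many steps is not tracked.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean_gradient(w, data, clip, epsilon, rng):
    """Mean clipped gradient of 0.5 * (w*x - y)**2, plus Laplace noise."""
    clipped = []
    for x, y in data:
        g = (w * x - y) * x
        clipped.append(max(-clip, min(clip, g)))  # bound each record's influence
    mean_grad = sum(clipped) / len(data)
    # Replacing one record moves the mean by at most 2*clip/n (its sensitivity).
    sensitivity = 2.0 * clip / len(data)
    return mean_grad + laplace_noise(sensitivity / epsilon, rng)


def train(data, epsilon, steps=200, lr=0.1, clip=1.0, seed=0):
    """Gradient descent on noisy gradients; `epsilon` is the per-step budget."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        w -= lr * private_mean_gradient(w, data, clip, epsilon, rng)
    return w
```

In this sketch the noise scale is `2 * clip / (n * epsilon)`, so the noise variance shrinks with the square of both the dataset size and the privacy budget. That matches the flavor of the scaling stated in the abstract but is not a derivation of the paper's result.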


Leveraging Reinforcement Learning Techniques for Effective Policy Adoption and Validation

arXiv.org Artificial Intelligence

Rewards and punishments in different forms are pervasive and present in a wide variety of decision-making scenarios. By observing the outcome of a sufficient number of repeated trials, one would gradually learn the value and usefulness of a particular policy or strategy. However, in a given environment, the outcomes resulting from different trials are subject to chance influence and variations. In learning about the usefulness of a given policy, significant costs are involved in systematically undertaking the sequential trials; therefore, in most learning episodes, one would wish to keep the cost within bounds by adopting learning stopping rules. In this paper, we examine the deployment of different stopping strategies in given learning environments which vary from highly stringent for mission critical operations to highly tolerant for non-mission critical operations, and emphasis is placed on the former with particular application to aviation safety. In policy evaluation, two sequential phases of learning are identified, and we describe the outcome variations using a probabilistic model, with closed-form expressions obtained for the key measures of performance. Decision rules that map the trial observations to policy choices are also formulated. In addition, simulation experiments are performed, which corroborate the validity of the theoretical results.


A Halo Merger Tree Generation and Evaluation Framework

arXiv.org Machine Learning

Semi-analytic models are best suited to compare galaxy formation and evolution theories with observations. These models rely heavily on halo merger trees, and their realistic features (i.e., no drastic changes in halo mass or jumps in physical location). Our aim is to provide a new framework for halo merger tree generation that takes advantage of the results of large volume simulations, with a modest computational cost. We treat halo merger tree construction as a matrix generation problem, and propose a Generative Adversarial Network that learns to generate realistic halo merger trees. We evaluate our proposal on merger trees from the EAGLE simulation suite, and show the quality of the generated trees.


Researchers develop 'vaccine' against attacks on machine learning

#artificialintelligence

Algorithms 'learn' from the data they are trained on to create a machine learning model that can perform a given task effectively without needing specific instructions, such as making predictions or accurately classifying images and emails. These techniques are already used widely, for example to identify spam emails, diagnose diseases from X-rays and predict crop yields, and will soon drive our cars. While the technology holds enormous potential to positively transform our world, artificial intelligence and machine learning are vulnerable to adversarial attacks, a technique employed to fool machine learning models through the input of malicious data causing them to malfunction. Dr Richard Nock, machine learning group leader at CSIRO's Data61, said that by adding a layer of noise (i.e. an adversary) over an image, attackers can deceive machine learning models into misclassifying the image. "Adversarial attacks have proven capable of tricking a machine learning model into incorrectly labelling a traffic stop sign as speed sign, which could have disastrous effects in the real world. "Our new techniques prevent adversarial attacks using a process similar to vaccination," Dr Nock said. "We implement a weak version of an adversary, such as small modifications or distortion to a collection of images, to create a more 'difficult' training data set.


Continuous intelligence: Building a Modern Digital Business for agility and growth

#artificialintelligence

Business today is more than simply matching traditional competitors; it's about exploiting digital technologies to create new opportunities, and being able to repeat this. The economy is quickly going digital and Australian businesses must evolve into Modern Digital Businesses (MDBs) which strategically use intelligence assets to improve operations and deploy new products and services, in order to stay competitive and create value for their customers. A group of digital business leaders recently gathered at ThoughtWorks Live in Sydney and Melbourne, to share their insights into how organisations can take advantage of data to adapt and thrive in the digital economy. This report includes strategic and practical advice taken from the event for any business leader – regardless of their organisation's digital maturity – on best practices for taking advantage of data and driving change. A Continuous Intelligence (CI) framework starts with the process of acquiring data and, with the help of analytics and machine learning, deriving insights from it to be able to make confident decisions and actions – which are in turn reviewed and validated, to ensure the organisation continuously improves its decision-making capabilities. Steps organisations can take to apply CI to building an MDB which is agile and technology-driven are also covered.


Executive Leadership Insights - TRANSEARCH International Australia

#artificialintelligence

To train this AI, we only need to input articles, books, languages, and internet sources. Therefore, the bias and complexity could be reduced and minimized. The model could be used for general purpose selection. Apart from that, the existing voice recognizers and chatbots can support better candidate communication. The aim of first calls by contingent firms is simply laying out the basic job descriptions and asking for a YES or NO, which chatbots could easily handle.


Wheat myth comes a cropper

#artificialintelligence

The myth that modern wheat varieties are more heavily reliant on pesticides and fertilisers than older varieties has been debunked by new research. The University of Queensland's Dr Kai Voss-Fels said modern wheat varieties have out-performed older varieties in side-by-side field trials under both optimum and harsh growing conditions. "There is a view that intensive selection and breeding, which has produced the high-yielding wheat cultivars used in modern cropping, has also made them less resilient and more dependent on chemicals to thrive," Dr Voss-Fels said. "However, the data published today unequivocally shows that modern wheat out-performs older varieties, even under conditions of reduced amounts of fertilisers, fungicides and water. "We also found that genetic diversity within the relatively narrow modern wheat gene pool is rich enough to potentially generate a further 23 per cent increase in yields."


Wasserstein Reinforcement Learning

arXiv.org Machine Learning

We propose behavior-driven optimization via Wasserstein distances (WDs) to improve several classes of state-of-the-art reinforcement learning (RL) algorithms. We show that WD regularizers acting on appropriate policy embeddings efficiently incorporate behavioral characteristics into policy optimization. We demonstrate that they improve Evolution Strategy methods by encouraging more efficient exploration, can be applied in imitation learning and to speed up training of Trust Region Policy Optimization methods. Since the exact computation of WDs is expensive, we develop approximate algorithms based on the combination of different methods: dual formulation of the optimal transport problem, alternating optimization and random feature maps, to effectively replace exact WD computations in the RL tasks considered. We provide theoretical analysis of our algorithms and exhaustive empirical evaluation in a variety of RL settings.
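To make the notion of a Wasserstein regularizer on policy embeddings concrete, here is a small sketch for the simplest possible case. It assumes each policy is summarized by an equal-sized sample of scalar behavioral embeddings, for which the 1-Wasserstein distance is exactly the mean absolute difference between the sorted samples. The abstract describes approximate algorithms for the general case; those are not reproduced here. The function names and the signed weight `beta` are hypothetical.

```python
def wasserstein_1d(samples_a, samples_b):
    """Exact 1-Wasserstein distance between two equal-sized scalar samples."""
    if len(samples_a) != len(samples_b) or not samples_a:
        raise ValueError("need two non-empty samples of equal size")
    # In one dimension the optimal transport plan matches sorted points in order.
    a, b = sorted(samples_a), sorted(samples_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def regularized_objective(reward, embed_new, embed_ref, beta):
    """Reward plus a weighted behavioral distance to a reference policy.

    A positive `beta` rewards behaving differently from the reference (as one
    might want for exploration); a negative `beta` penalizes it (as one might
    want when imitating the reference). Both uses are illustrative assumptions.
    """
    return reward + beta * wasserstein_1d(embed_new, embed_ref)
```

For instance, `wasserstein_1d([0, 1, 2], [1, 2, 3])` is the cost of shifting every point by one unit, and the order in which samples are listed does not matter.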