Lagoudakis, Michail G.
Detecting Olives with Synthetic or Real Data? Olive the Above
Karabatis, Yianni, Lin, Xiaomin, Sanket, Nitin J., Lagoudakis, Michail G., Aloimonos, Yiannis
Modern robotics has enabled advances in yield estimation for precision agriculture. When applied to the olive industry, however, the high variation in olive color and the fruit's similarity to the background leaf canopy present a challenge. Labeling several thousand very dense olive grove images for segmentation is a labor-intensive task. This paper presents a novel approach to detecting olives without the need to manually label data. In this work, we present the world's first olive detection dataset composed of synthetic and real olive tree images. This is accomplished by generating an auto-labeled photorealistic 3D model of an olive tree, whose geometry is then simplified for lightweight rendering. In addition, experiments are conducted with a mix of synthetically generated and real images, yielding an improvement of up to 66% compared to using only a small sample of real data. When access to real, human-labeled data is limited, a combination of mostly synthetic data and a small amount of real data can enhance olive detection.
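The following is a minimal sketch, not the paper's pipeline, of the kind of data mixing the abstract describes: training a detector on mostly synthetic, auto-labeled images together with a small, oversampled set of real labeled images. The dataset classes are hypothetical placeholders standing in for real detection datasets.

```python
# Hedged sketch of mixing a large synthetic dataset with a small real one.
# The PlaceholderImages class is a stand-in; a real pipeline would load
# rendered olive-tree images and human-labeled grove photos instead.
import torch
from torch.utils.data import Dataset, ConcatDataset, WeightedRandomSampler, DataLoader


class PlaceholderImages(Dataset):
    """Stand-in for a labeled detection dataset yielding (image, target)."""

    def __init__(self, n, tag):
        self.n, self.tag = n, tag

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        image = torch.rand(3, 256, 256)                       # fake RGB image
        target = {"boxes": torch.zeros(0, 4), "source": self.tag}
        return image, target


synthetic = PlaceholderImages(5000, "synthetic")   # auto-labeled renders
real = PlaceholderImages(100, "real")              # scarce human-labeled images

mixed = ConcatDataset([synthetic, real])

# Oversample the scarce real images so every batch still contains a few.
weights = torch.cat([
    torch.full((len(synthetic),), 1.0 / len(synthetic)),
    torch.full((len(real),), 1.0 / len(real)),
])
sampler = WeightedRandomSampler(weights, num_samples=len(mixed), replacement=True)
loader = DataLoader(mixed, batch_size=8, sampler=sampler,
                    collate_fn=lambda batch: tuple(zip(*batch)))
```

Any torchvision-style detection model could then be trained on `loader` in the usual way; the point of the sketch is only the sampling scheme, which keeps the small real set visible during training.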
Rollout Sampling Approximate Policy Iteration
Dimitrakakis, Christos, Lagoudakis, Michail G.
Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme that address the core sampling problem of evaluating a policy through simulation by treating it as a multi-armed bandit problem. The resulting algorithm offers performance comparable to that of the previous algorithm, achieved, however, with significantly less computational effort. An order-of-magnitude improvement is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain car.
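A minimal sketch of the central idea follows, assuming a generic simulator interface; the `simulate_rollout` function below is a placeholder, not the paper's code. Rollouts at a sampled state are allocated across actions with an upper-confidence-bound rule, so simulation effort concentrates on the most promising actions.

```python
# Hedged sketch: bandit-style allocation of a rollout budget at one state.
import math
import random


def simulate_rollout(state, action, policy, horizon=100):
    """Placeholder: return a sampled return for taking `action` in `state`
    and then following `policy`. A real version would step a simulator."""
    return random.gauss(mu=float(action), sigma=1.0)   # fake returns


def ucb_rollout_evaluation(state, actions, policy, budget=200, c=1.0):
    """Spend `budget` rollouts across actions with UCB; return the action
    whose estimated rollout value is highest."""
    counts = {a: 0 for a in actions}
    means = {a: 0.0 for a in actions}

    # Initialize with one rollout per action.
    for a in actions:
        means[a] = simulate_rollout(state, a, policy)
        counts[a] = 1

    for t in range(len(actions), budget):
        # Pick the action with the largest upper confidence bound.
        a = max(actions,
                key=lambda a: means[a] + c * math.sqrt(math.log(t + 1) / counts[a]))
        g = simulate_rollout(state, a, policy)
        counts[a] += 1
        means[a] += (g - means[a]) / counts[a]          # incremental mean update

    return max(actions, key=lambda a: means[a])


# States where a clearly best action emerges become labeled examples
# (state -> action) for the classifier that represents the improved policy.
best = ucb_rollout_evaluation(state=0, actions=[0, 1, 2], policy=None)
```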
Reports on the Twenty-First National Conference on Artificial Intelligence (AAAI-06) Workshop Program
Achtner, Wolfgang, Aimeur, Esma, Anand, Sarabjot Singh, Appelt, Doug, Ashish, Naveen, Barnes, Tiffany, Beck, Joseph E., Dias, M. Bernardine, Doshi, Prashant, Drummond, Chris, Elazmeh, William, Felner, Ariel, Freitag, Dayne, Geffner, Hector, Geib, Christopher W., Goodwin, Richard, Holte, Robert C., Hutter, Frank, Isaac, Fair, Japkowicz, Nathalie, Kaminka, Gal A., Koenig, Sven, Lagoudakis, Michail G., Leake, David B., Lewis, Lundy, Liu, Hugo, Metzler, Ted, Mihalcea, Rada, Mobasher, Bamshad, Poupart, Pascal, Pynadath, David V., Roth-Berghofer, Thomas, Ruml, Wheeler, Schulz, Stefan, Schwarz, Sven, Seneff, Stephanie, Sheth, Amit, Sun, Ron, Thielscher, Michael, Upal, Afzal, Williams, Jason, Young, Steve, Zelenko, Dmitry
The workshop program of the Twenty-First National Conference on Artificial Intelligence (AAAI-06) was held July 16-17, 2006, in Boston, Massachusetts. The program was chaired by Joyce Chai and Keith Decker. The titles of the 17 workshops were AI-Driven Technologies for Service-Oriented Computing; Auction Mechanisms for Robot Coordination; Cognitive Modeling and Agent-Based Social Simulations; Cognitive Robotics; Computational Aesthetics: Artificial Intelligence Approaches to Beauty and Happiness; Educational Data Mining; Evaluation Methods for Machine Learning; Event Extraction and Synthesis; Heuristic Search, Memory-Based Heuristics, and Their Applications; Human Implications of Human-Robot Interaction; Intelligent Techniques in Web Personalization; Learning for Search; Modeling and Retrieval of Context; Modeling Others from Observations; and Statistical and Empirical Approaches for Spoken Dialogue Systems.
Learning in Zero-Sum Team Markov Games Using Factored Value Functions
Lagoudakis, Michail G., Parr, Ronald
We present a new method for learning good strategies in zero-sum Markov games in which each side is composed of multiple agents collaborating against an opposing team of agents. Our method requires full observability and communication during learning, but the learned policies can be executed in a distributed manner. The value function is represented as a factored linear architecture and its structure determines the necessary computational resources and communication bandwidth. This approach permits a tradeoff between simple representations with little or no communication between agents and complex, computationally intensive representations with extensive coordination between agents. Thus, we provide a principled means of using approximation to combat the exponential blowup in the joint action space of the participants. The approach is demonstrated with an example that shows the efficiency gains over naive enumeration.
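As an illustration of the representational idea, here is a minimal, hedged sketch of a factored linear Q-function over a team of agents, where each component depends only on a small subset of agents' actions. The scopes, sizes, and random weights below are illustrative assumptions, not the paper's construction.

```python
# Hedged sketch: a team Q-value expressed as a sum of local components,
# each over a pair of neighbouring agents, so the number of weights grows
# with the local scopes rather than with the exponential joint action space.
from itertools import product

import numpy as np

n_agents = 4
n_actions = 3                                   # actions per agent

# Each local component covers a pair of neighbouring agents (a chain).
scopes = [(0, 1), (1, 2), (2, 3)]

# One weight table per component, indexed by the local joint action only.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(n_actions, n_actions)) for _ in scopes]


def q_value(joint_action):
    """Q(s, a) = sum_k Q_k(s, a[scope_k]); the state dependence is omitted."""
    return sum(w[joint_action[i], joint_action[j]]
               for w, (i, j) in zip(weights, scopes))


# Naive maximization enumerates all n_actions ** n_agents joint actions;
# exploiting the factored structure (e.g., variable elimination) would cost
# time exponential only in the largest local scope.
best = max(product(range(n_actions), repeat=n_agents), key=q_value)
print(best, q_value(best))
```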
Model-Free Least-Squares Policy Iteration
Lagoudakis, Michail G., Parr, Ronald
We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off policy. We are motivated by the least squares temporal difference learning algorithm (LSTD), which is known for its efficient use of sample experiences compared to pure temporal difference algorithms. LSTD is ideal for prediction problems; however, it has heretofore not had a straightforward application to control problems. Moreover, approximations learned by LSTD are strongly influenced by the visitation distribution over states.
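For concreteness, here is a minimal sketch of the least-squares solve that underlies this approach, an LSTDQ-style step inside policy iteration over a fixed batch of samples. The feature map, action set, and sample format are generic placeholders rather than the paper's experimental domains.

```python
# Hedged sketch of LSTDQ within policy iteration over a batch of (s, a, r, s')
# samples: solve A w = b for the weights of a linear Q-function, then iterate
# with the greedy policy induced by the current weights.
import numpy as np


def lstdq(samples, phi, policy, n_features, gamma=0.95):
    """One LSTDQ solve:
    A = sum phi(s,a) (phi(s,a) - gamma * phi(s', policy(s')))^T,
    b = sum phi(s,a) * r,  and w = A^{-1} b (with light regularization)."""
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += f * r
    return np.linalg.solve(A + 1e-6 * np.eye(n_features), b)


def lspi(samples, phi, actions, n_features, iterations=10):
    """Policy iteration: each new policy is greedy in the latest Q estimate,
    reusing the same batch of samples at every iteration."""
    w = np.zeros(n_features)
    for _ in range(iterations):
        policy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w = lstdq(samples, phi, policy, n_features)
    return w
```

The batch of samples can come from any source (and be reused across iterations), which is what makes the method off-policy in the sense described above.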