setter
Devious humour and painful puns: will the cryptic crossword remain the last thing AI can't conquer?
The Times hosts an annual crossword-solving competition and it remains, until such time as the Guardian has its own version, the gold standard. This year's competitors included a dog. Rather, an AI represented as a jolly coffee-drinking dog named Ross (a name hidden in "crossword"), and who is embedded on the Crossword Genius smartphone app. The human competitors at the event โ which took place at Times' parent company News UK's London headquarters, in the shadow of the Shard โ were, as usual, bafflingly fast: pondering the next clue while scribbling the letters of the previous. An AI can conceivably "think" about multiple puzzles at once: so did it outwit us mortals?
Efficient Mitigation of Bus Bunching through Setter-Based Curriculum Learning
Shah, Avidan, Tran, Danny, Tang, Yuhan
Curriculum learning has been growing in the domain of reinforcement learning as a method of improving training efficiency for various tasks. It involves modifying the difficulty (lessons) of the environment as the agent learns, in order to encourage more optimal agent behavior and higher reward states. However, most curriculum learning methods currently involve discrete transitions of the curriculum or predefined steps by the programmer or using automatic curriculum learning on only a small subset training such as only on an adversary. In this paper, we propose a novel approach to curriculum learning that uses a Setter Model to automatically generate an action space, adversary strength, initialization, and bunching strength. Transportation and traffic optimization is a well known area of study, especially for reinforcement learning based solutions. We specifically look at the bus bunching problem for the context of this study. The main idea of the problem is to minimize the delays caused by inefficient bus timings for passengers arriving and departing from a system of buses. While the heavy exploration in the area makes innovation and improvement with regards to performance marginal, it simultaneously provides an effective baseline for developing new generalized techniques. Our group is particularly interested in examining curriculum learning and its effect on training efficiency and overall performance. We decide to try a lesser known approach to curriculum learning, in which the curriculum is not fixed or discretely thresholded. Our method for automated curriculum learning involves a curriculum that is dynamically chosen and learned by an adversary network made to increase the difficulty of the agent's training, and defined by multiple forms of input. Our results are shown in the following sections of this paper.
Rock Climbing Route Generation and Grading as Computational Creativity
In this paper, we bridge work in rock climbing route generation and grading into the computational creativity community. We provide the necessary background to situate that literature and demonstrate the domain's intellectual merit in the computational creativity community. We provide a guiding set of desiderata for future work in this area. We propose an approach to computational route grading. Finally, we identify important gaps in the literature and consider how they may be filled. This paper thus also serves as a pilot study, planting a flag for our ongoing research in this domain.
ChatGPT: can artificial intelligence create crosswords?
First, if you're a solver of the Mephisto series โ which is unusual in giving the actual names of its setters โ and have wondered what Paul McKenna does when he's not setting, you can now find out. The same setter is the Financial Times' Jason, and that paper interviews him as part of "an occasional series": Did your school mention crossword compiling in career discussions? It was never mentioned as a career option. I am a construction manager in the oil and gas pipeline industry. It is still a rare event for us to welcome a new compiler to the series.
Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback
Abramson, Josh, Ahuja, Arun, Carnevale, Federico, Georgiev, Petko, Goldin, Alex, Hung, Alden, Landon, Jessica, Lhotka, Jirka, Lillicrap, Timothy, Muldal, Alistair, Powell, George, Santoro, Adam, Scully, Guy, Srivastava, Sanjana, von Glehn, Tamara, Wayne, Greg, Wong, Nathaniel, Yan, Chen, Zhu, Rui
An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback. Here we demonstrate how to use reinforcement learning from human feedback (RLHF) to improve upon simulated, embodied agents trained to a base level of competency with imitation learning. First, we collected data of humans interacting with agents in a simulated 3D world. We then asked annotators to record moments where they believed that agents either progressed toward or regressed from their human-instructed goal. Using this annotation data we leveraged a novel method - which we call "Inter-temporal Bradley-Terry" (IBT) modelling - to build a reward model that captures human judgments. Agents trained to optimise rewards delivered from IBT reward models improved with respect to all of our metrics, including subsequent human judgment during live interactions with agents. Altogether our results demonstrate how one can successfully leverage human judgments to improve agent behaviour, allowing us to use reinforcement learning in complex, embodied domains without programmatic reward functions. Videos of agent behaviour may be found at https://youtu.be/v_Z9F2_eKk4.
VREN: Volleyball Rally Dataset with Expression Notation Language
Xia, Haotian, Tracy, Rhys, Zhao, Yun, Fraisse, Erwan, Wang, Yuan-Fang, Petzold, Linda
This research is intended to accomplish two goals: The first goal is to curate a large and information rich dataset that contains crucial and succinct summaries on the players' actions and positions and the back-and-forth travel patterns of the volleyball in professional and NCAA Div-I indoor volleyball games. While several prior studies have aimed to create similar datasets for other sports (e.g. badminton and soccer), creating such a dataset for indoor volleyball is not yet realized. The second goal is to introduce a volleyball descriptive language to fully describe the rally processes in the games and apply the language to our dataset. Based on the curated dataset and our descriptive sports language, we introduce three tasks for automated volleyball action and tactic analysis using our dataset: (1) Volleyball Rally Prediction, aimed at predicting the outcome of a rally and helping players and coaches improve decision-making in practice, (2) Setting Type and Hitting Type Prediction, to help coaches and players prepare more effectively for the game, and (3) Volleyball Tactics and Attacking Zone Statistics, to provide advanced volleyball statistics and help coaches understand the game and opponent's tactics better. We conducted case studies to show how experimental results can provide insights to the volleyball analysis community. Furthermore, experimental evaluation based on real-world data establishes a baseline for future studies and applications of our dataset and language. This study bridges the gap between the indoor volleyball field and computer science.
Imitating Interactive Intelligence
Abramson, Josh, Ahuja, Arun, Brussee, Arthur, Carnevale, Federico, Cassin, Mary, Clark, Stephen, Dudzik, Andrew, Georgiev, Petko, Guy, Aurelia, Harley, Tim, Hill, Felix, Hung, Alden, Kenton, Zachary, Landon, Jessica, Lillicrap, Timothy, Mathewson, Kory, Muldal, Alistair, Santoro, Adam, Savinov, Nikolay, Varma, Vikrant, Wayne, Greg, Wong, Nathaniel, Yan, Chen, Zhu, Rui
A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central challenges of artificial intelligence (AI) research: complex visual perception and goal-directed physical control, grounded language comprehension and production, and multi-agent social interaction. To build agents that can robustly interact with humans, we would ideally train them while they interact with humans. However, this is presently impractical. Therefore, we approximate the role of the human with another learned agent, and use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour. Rigorously evaluating our agents poses a great challenge, so we develop a variety of behavioural tests, including evaluation by humans who watch videos of agents or interact directly with them. These evaluations convincingly demonstrate that interactive training and auxiliary losses improve agent behaviour beyond what is achieved by supervised learning of actions alone. Further, we demonstrate that agent capabilities generalise beyond literal experiences in the dataset. Finally, we train evaluation models whose ratings of agents agree well with human judgement, thus permitting the evaluation of new agent models without additional effort. Taken together, our results in this virtual environment provide evidence that large-scale human behavioural imitation is a promising tool to create intelligent, interactive agents, and the challenge of reliably evaluating such agents is possible to surmount.
Classes and Instances ยท Crafting Interpreters
If you see a mistake, find something unclear, or have a suggestion, please let me know. Don't worry, I won't spam you.) Caring too much for objects can destroy you. Only--if you care for a thing enough, it takes on a life of its own, doesn't it? And isn't the whole point of things--beautiful things--that they connect you to some larger beauty? The last area left to implement in clox is object-oriented programming. OOP is a bundle of intertwined features: classes, instances, fields, methods, initializers, and inheritance. Using relatively high-level Java, we packed all that into two chapters. Now that we're coding in C, which feels like building a model of the Eiffel tower out of toothpicks, we'll devote three chapters to covering the same territory.
Automated curricula through setter-solver interactions
Racaniere, Sebastien, Lampinen, Andrew K., Santoro, Adam, Reichert, David P., Firoiu, Vlad, Lillicrap, Timothy P.
A BSTRACT Reinforcement learning algorithms use correlations between policies and rewards to improve agent performance. But in dynamic or sparsely rewarding environments these correlations are often too small, or rewarding events are too infrequent to make learning feasible. Human education instead relies on curricula-the breakdown of tasks into simpler, static challenges with dense rewards-to build up to complex behaviors. While curricula are also useful for artificial agents, handcrafting them is time consuming. This has lead researchers to explore automatic curriculum generation. Here we explore automatic curriculum generation in rich, dynamic environments. Using a setter-solver paradigm we show the importance of considering goal validity, goal feasibility, and goal coverage to construct useful curricula. We demonstrate the success of our approach in rich but sparsely rewarding 2D and 3D environments, where an agent is tasked to achieve a single goal selected from a set of possible goals that varies between episodes, and identify challenges for future work. Finally, we demonstrate the value of a novel technique that guides agents towards a desired goal distribution. Altogether, these results represent a substantial step towards applying automatic task curricula to learn complex, otherwise unlearnable goals, and to our knowledge are the first to demonstrate automated curriculum generation for goal-conditioned agents in environments where the possible goals vary between episodes. 1 I NTRODUCTION Reinforcement learning (RL) algorithms use correlations between policies and environmental rewards to reinforce and improve agent performance. But such correlation-based learning may struggle in dynamic environments with constantly changing settings or goals, because policies that correlate with rewards in one episode may fail to correlate with rewards in a subsequent episode. Correlation-based learning may also struggle in sparsely rewarding environments since by definition there are fewer rewards, and hence fewer instances when policy-reward correlations can be measured and learned from. In the most problematic tasks, agents may fail to begin learning at all. While RL has been used to achieve expert-level performance in some sparsely rewarding games (Silver et al., 2016; OpenAI, 2018; Vinyals et al., 2019), success has often required carefully engineered curricula to bootstrap learning, such as learning from millions of expert games or handcrafted shaping rewards. In some cases self-play between agents as they improve can serve as a powerful automatic curriculum for achieving expert or superhuman performance (Silver et al., 2018; Vinyals et al., 2019).
Strange Beta: An Assistance System for Indoor Rock Climbing Route Setting Using Chaotic Variations and Machine Learning
Phillips, Caleb, Becker, Lee, Bradley, Elizabeth
This paper applies machine learning and the mathematics of chaos to the task of designing indoor rock-climbing routes. Chaotic variation has been used to great advantage on music and dance, but the challenges here are quite different, beginning with the representation. We present a formalized system for transcribing rock climbing problems, then describe a variation generator that is designed to support human route-setters in designing new and interesting climbing problems. This variation generator, termed Strange Beta, combines chaos and machine learning, using the former to introduce novelty and the latter to smooth transitions in a manner that is consistent with the style of the climbs This entails parsing the domain-specific natural language that rock climbers use to describe routes and movement and then learning the patterns in the results. We validated this approach with a pilot study in a small university rock climbing gym, followed by a large blinded study in a commercial climbing gym, in cooperation with experienced climbers and expert route setters. The results show that {\sc Strange Beta} can help a human setter produce routes that are at least as good as, and in some cases better than, those produced in the traditional manner.