to

### Giving a Genetic Algorithm Control in a Physics Engine

The most common/commercial application of AI is to gain insights or make predictions on a dataset. Even though I think that this is interesting, these techniques have been widely covered by others. This article covers a more experimental application of machine learning: this is allowing machine learning to manipulate a physics simulation. There is a lot of different possibilities of putting an AI into a physics environment, but all of them involve manipulating and controlling objects and forces in the environment. To get good results from this project, we need to add limitations to the AI's control in the environment.

### Sniffy Bug: a fully autonomous swarm of gas-seeking nano quadcopters in cluttered environments

Tiny drones are ideal candidates for fully autonomous jobs that are too dangerous or time-consuming for humans. A commonly shared dream by engineers and fire & rescue services, would be to have swarms of such drones help in search-and-rescue scenarios [1], for instance to localize gas leaks without endangering human lives. Tiny drones are ideal for such tasks, since they are small enough to navigate in narrow spaces, safe, agile, and very inexpensive. However, their small footprint also makes the design of an autonomous swarm extremely challenging, both from a software and hardware perspective. From a software perspective, it is really challenging to come up with an algorithm capable of autonomous and collaborative navigation within such tight resource constraints.

### Differential Evolution from Scratch in Python

Differential evolution is a heuristic approach for the global optimisation of nonlinear and non- differentiable continuous space functions. The differential evolution algorithm belongs to a broader family of evolutionary computing algorithms. Similar to other popular direct search approaches, such as genetic algorithms and evolution strategies, the differential evolution algorithm starts with an initial population of candidate solutions. These candidate solutions are iteratively improved by introducing mutations into the population, and retaining the fittest candidate solutions that yield a lower objective function value. The differential evolution algorithm is advantageous over the aforementioned popular approaches because it can handle nonlinear and non-differentiable multi-dimensional objective functions, while requiring very few control parameters to steer the minimisation.

### A Gentle Introduction to Premature Convergence

Population-based optimization algorithms, like evolutionary algorithms and swarm intelligence, often describe their dynamics in terms of the interplay between selective pressures and convergence. For example, strong selective pressures result in faster convergence and likely premature convergence. Weaker selective pressures may result in a slower convergence (greater computational cost) although perhaps locate a better or even global optima. An operator with a high selective pressure decreases diversity in the population more rapidly than operators with a low selective pressure, which may lead to premature convergence to suboptimal solutions. A high selective pressure limits the exploration abilities of the population.

### Meta-control of social learning strategies

Social learning, copying other's behavior without actual experience, offers a cost-effective means of knowledge acquisition. However, it raises the fundamental question of which individuals have reliable information: successful individuals versus the majority. The former and the latter are known respectively as success-based and conformist social learning strategies. We show here that while the success-based strategy fully exploits the benign environment of low uncertainly, it fails in uncertain environments. On the other hand, the conformist strategy can effectively mitigate this adverse effect. Based on these findings, we hypothesized that meta-control of individual and social learning strategies provides effective and sample-efficient learning in volatile and uncertain environments. Simulations on a set of environments with various levels of volatility and uncertainty confirmed our hypothesis. The results imply that meta-control of social learning affords agents the leverage to resolve environmental uncertainty with minimal exploration cost, by exploiting others' learning as an external knowledge base.

### A Guide to Genetic 'Learning' Algorithms for Optimization

Genetic algorithms are random, adaptive heuristic search algorithms that act on a population of doable solutions. Genetic algorithms are based on the ideas of natural selection and genetics. New solutions are typically made by'mutating' members of this population, and by'mating' 2 resolutions along to create a replacement solution. The upper solutions are selected to breed and change and so the more severe ones are discarded. They are probabilistic search methods; this implies that the states that they explore are not determined entirely by the properties of the problems.

### SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

Generalization has been a long-standing challenge for reinforcement learning (RL). Visual RL, in particular, can be easily distracted by irrelevant factors in high-dimensional observation space. In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift. We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization. Specifically, an expert policy is first trained by RL from scratch with weak augmentations. A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert. Extensive experiments demonstrate that SECANT significantly advances the state of the art in zero-shot generalization across 4 challenging domains. Our average reward improvements over prior SOTAs are: DeepMind Control (+26.5%), robotic manipulation (+337.8%), vision-based autonomous driving (+47.7%), and indoor object navigation (+15.8%). Code release and video are available at https://linxifan.github.io/secant-site/.

### A Dataset-Level Geometric Framework for Ensemble Classifiers

Ensemble classifiers have been investigated by many in the artificial intelligence and machine learning community. Majority voting and weighted majority voting are two commonly used combination schemes in ensemble learning. However, understanding of them is incomplete at best, with some properties even misunderstood. In this paper, we present a group of properties of these two schemes formally under a dataset-level geometric framework. Two key factors, every component base classifier's performance and dissimilarity between each pair of component classifiers are evaluated by the same metric - the Euclidean distance. Consequently, ensembling becomes a deterministic problem and the performance of an ensemble can be calculated directly by a formula. We prove several theorems of interest and explain their implications for ensembles. In particular, we compare and contrast the effect of the number of component classifiers on these two types of ensemble schemes. Empirical investigation is also conducted to verify the theoretical results when other metrics such as accuracy are used. We believe that the results from this paper are very useful for us to understand the fundamental properties of these two combination schemes and the principles of ensemble classifiers in general. The results are also helpful for us to investigate some issues in ensemble classifiers, such as ensemble performance prediction, selecting a small number of base classifiers to obtain efficient and effective ensembles.

### Automated Self-Supervised Learning for Graphs

Graph self-supervised learning has gained increasing attention due to its capacity to learn expressive node representations. Many pretext tasks, or loss functions have been designed from distinct perspectives. However, we observe that different pretext tasks affect downstream tasks differently cross datasets, which suggests that searching pretext tasks is crucial for graph self-supervised learning. Different from existing works focusing on designing single pretext tasks, this work aims to investigate how to automatically leverage multiple pretext tasks effectively. Nevertheless, evaluating representations derived from multiple pretext tasks without direct access to ground truth labels makes this problem challenging. To address this obstacle, we make use of a key principle of many real-world graphs, i.e., homophily, or the principle that like attracts like,'' as the guidance to effectively search various self-supervised pretext tasks. We provide theoretical understanding and empirical evidence to justify the flexibility of homophily in this search task. Then we propose the AutoSSL framework which can automatically search over combinations of various self-supervised tasks. By evaluating the framework on 7 real-world datasets, our experimental results show that AutoSSL can significantly boost the performance on downstream tasks including node clustering and node classification compared with training under individual tasks. Code will be released at https://github.com/ChandlerBang/AutoSSL.

### Invariant Information Bottleneck for Domain Generalization

The main challenge for domain generalization (DG) is to overcome the potential distributional shift between multiple training domains and unseen test domains. One popular class of DG algorithms aims to learn representations that have an invariant causal relation across the training domains. However, certain features, called \emph{pseudo-invariant features}, may be invariant in the training domain but not the test domain and can substantially decreases the performance of existing algorithms. To address this issue, we propose a novel algorithm, called Invariant Information Bottleneck (IIB), that learns a minimally sufficient representation that is invariant across training and testing domains. By minimizing the mutual information between the representation and inputs, IIB alleviates its reliance on pseudo-invariant features, which is desirable for DG. To verify the effectiveness of the IIB principle, we conduct extensive experiments on large-scale DG benchmarks. The results show that IIB outperforms invariant learning baseline (e.g. IRM) by an average of 2.8\% and 3.8\% accuracy over two evaluation metrics.