[{"content":"A Quick Presentation of my Presentation Well, my research at the University of Tsukuba (Tsukuba, Japan) is about the study of the evolution of diploidy in the context of single genome soft robots. I\u0026rsquo;ll simply let you enjoy this presentation without any context. If I have time, I\u0026rsquo;ll try to make a video to present it in a more formal way.\nHere is the pdf of this presentation.\nInside this pdf, you\u0026rsquo;ll find the videos of the early results of my experiments. Enjoy the decoding !\nAcknowledgements I also want to give a very special thanks to my supervisor Pr. Claus Aranha for welcoming me in his lab and for his invaluable support, guidance and patience throughout this project.\n","permalink":"https://matteovacher.github.io/news/evogym-lab/first-steps/","summary":"A presentation of my research on Diploidy and Single Genome Soft Robots","title":"First Steps of my Research"},{"content":"Introduction Hello dear Reader, on this page I will explain my complete and systemic implementation of the EvoGym library. When I first read the work of my predecessor Fabio Tanaka, who did an extraordinary work on Single Genome Soft Robots, I was really confused on the whole implementation of the different programs that co-existed in his workspace. To implement my own version of Hyper-Neat, I spent hours understanding what he has done and why. This is mainly because of this reason that I\u0026rsquo;ll explain how my code works and how I will work in the future on the EvoGym library. This article is made for people who want to learn more about Entity Component System, or the EvoGym Library and how to use it, or just how I manage my projects to reproduce my work.\nTo finish the introduction, you can find the example of such an implementation on a GitHub repository that I have made specially for this article here. 
The key idea of this small example is to evolve the controller neural network of the robot in EvoGym with an evolutionary algorithm.\nWhat is Entity Component System ? First of all, the idea of using Entity Component System (ECS) did not come from me but from my supervisor (Claus Aranha, University of Tsukuba). I hope that this article will help others as much as it helped me organize my work, and for this particular reason, I want to thank Claus.\nA brief example Well, ECS is usually used when creating video games, therefore the example I\u0026rsquo;m about to write will be about the creation of video games. I\u0026rsquo;m not a creator of video games myself, but it is close to one big interest of mine : Simulations.\nSo, that being said, let\u0026rsquo;s go back to creating video games. Imagine yourself, for two seconds, trying to implement Non Playable Characters (NPC) in a fantasy world where the hero has to explore a vast land filled with lots of available races for these NPCs (Goblins, Elves\u0026hellip;). In the mountains of the East you find Goblins, a cruel race that will steal every piece of gold that you carry. In the West part of this land reside dozens of Human towns with lots of market places. Well, at some point a tiny and very kind goblin decided to quit his life of violence in the mountains, to settle in one of the many human towns and to become a merchant. Normally in Object-Oriented Programming you would implement a Monster Class and then implement a Goblin Class that inherits from the Monster one, then the same goes for humans and their merchants. With such a hierarchy, a merchant Goblin belongs to two branches at once, so it quickly becomes complicated to add other features to our poor little Goblin that just wants to live a normal life as a merchant.\nThen, a simple way to implement all of this, and to manage every entity without going deeply into Class inheritance, is to think about adding components to our entity/goblin. 
For example, in the case of our charming Goblin, we would add to the entity id number 7476572645 the following components/features :\nSkin Color : green Length : short (around 120cm probably) Ears : long and pointed Profession : merchant Health : 100 Weapon : axe This is a simple example but this way we can simply destroy/create/manage the desired components/features of every entity in the game.\nHow to describe such an architecture ? What\u0026rsquo;s inside As you\u0026rsquo;ve probably understood, ECS is a software architectural pattern. It contains 3 fundamental things :\nEntity, it is a unique ID that we give to every entity in the simulation. This ID will allow us to go get the components of a given entity and to modify them.\nComponents, it is what will compose our entity, an entity can have as much as components as we want. But every entity must have one different component for a given type. For example, entity 793468682 can only have one component Health, not two. This will allow us to use dictionaries to store data and access them quickly. Here you might say : \u0026lsquo;Then it is completely useless if my entity can only have one weapon and not two\u0026rsquo;. Well, if you want any character to possess more than one weapon simply add another slot of weapon, but in each slot, only a single weapon will be stored.\nSystems, it is what will handle the registry of components and will write what different entity possesses. It\u0026rsquo;s basically functions that takes every individuals possessing the same components and write/modify information directly in the component registries.\nWhat does it look like ? I have added here some other elements of the architecture that I will explain later.\nproject | |-main.py |-entity_manager.py |-components.py |-registry.py |-world.py | | systems |---| | |-system1.py | |-system2.py | | tools |---| | |-tool1 | |-tool2 entity_manager.py The aim of this file is to create, destroy, know which entity is alive. 
Your entity manager should look like this :\nclass EntityManager : def __init__(self) : self.next_id = 0 self._alive = set() def create_entity(self) : id = self.next_id self.next_id += 1 self._alive.add(id) return id def destroy_entity(self, entity_id) : self._alive.remove(entity_id) def is_alive(self, entity_id) : return entity_id in self._alive Here, entity_id is a unique ID associated to a single entity. Knowing which entities are alive is useful for the simulation : it avoids running the programs on every entity created since the beginning of the simulation. Moreover, if your experiment uses lots of RAM, you will be able to remove the old entities from their registries, therefore guaranteeing a lower memory usage and longer runs.\ncomponents.py In this file, we only implement the components. Remember that a component is an object that does not possess any function, its unique purpose is to store data, not to process them (this will be the job of the systems).\nHere is the example of a components file :\nclass GenomeComponent : def __init__(self, connections, nodes) : self.connections = connections self.nodes = nodes class FitnessComponent : def __init__(self, fitness, finished) : self.fitness = fitness self.finished = finished class ControllerComponent : def __init__(self, node_evals, input_nodes, output_nodes) : self.node_evals = node_evals self.input_nodes = input_nodes self.output_nodes = output_nodes See, we only store data and don\u0026rsquo;t try to process them.\nregistry.py Well, here we begin to explore more deeply the architecture. The idea to access the components of a given entity is to use the registry. 
Basically, the registry stores, in dictionaries, the id of an individual and the object component that store the data.\nThe registry looks like this :\nfrom components import * class ComponentRegistry : def __init__(self) : self.genome_registry = {} self.fitness_registry = {} self.controller_registry = {} # ADDER METHODS def add_genome(self, entity_id, connections, nodes) : self.genome_registry[entity_id] = GenomeComponent(connections, nodes) def add_fitness(self, entity_id, fitness, finished) : self.fitness_registry[entity_id] = FitnessComponent(fitness, finished) def add_controller(self, entity_id, node_evals, input_nodes, output_nodes) : self.controller_registry[entity_id] = ControllerComponent(node_evals, input_nodes, output_nodes) # GETTER METHODS def get_genome(self, entity_id) : return self.genome_registry[entity_id] def get_fitness(self, entity_id) : return self.fitness_registry[entity_id] def get_controller(self, entity_id) : return self.controller_registry[entity_id] # CHECKER METHODS def has_genome(self, entity_id) : return entity_id in self.genome_registry def has_fitness(self, entity_id) : return entity_id in self.fitness_registry def has_controller(self, entity_id) : return entity_id in self.controller_registry # ADVANCED GETTER METHODS def get_all_id_with_genome(self) : return self.genome_registry.keys() def get_all_id_with_fitness(self) : return self.fitness_registry.keys() def get_all_with_controller(self) : return self.controller_registry.keys() # MODIFIERS, please give an object def modify_genome(self, entity_id, genome) : self.genome_registry[entity_id] = genome def modify_fitness(self, entity_id, fitness) : self.fitness_registry[entity_id] = fitness def modify_controller(self, entity_id, controller) : self.controller_registry[entity_id] = controller # CLEARER METHODS def clear_all_except_genome(self) : self.fitness_registry.clear() def clear_genome(self) : self.genome_registry.clear() def clear_fitness(self) : self.fitness_registry.clear() def 
clear_controller(self) : self.controller_registry.clear() So, here we have the registry that manages what component is associated to which entity. Moreover this registry is able to get a component given an ID, or to add a component to an entity, or to modify a component, or to remove a component, or to know if an entity has a component or not.\nThe only bothering thing here is that every time you want to create a new component for your project, you have to add it to the registry and write all of its associated methods which can consume time that you don\u0026rsquo;t want to spend on this.\nTools Before showing you what a typical system looks like, I want to talk a little bit about tools. Tools are meant to allow you to handle components data and to use them. For example here, in the components.py file above, I have a neural network component, but to create it, I need to extract the building information from the genome component. This is exactly the role of the tools. Here it would be extracting the building information from the genome component and create all of the data necessary to create a neural network component. To achieve this, I would simply write this function in a controller_operator.py file. 
Here is what the controller_operator.py file looks like :\nimport math class ControllerOperator : activation_function = math.tanh output_activation_function = lambda x : x agregation_function = sum response = 1 bias = 0 def __init__(self) : pass def generate_controller_from_genome(self, genome) : node_evals = [] for index_of_layer in range(len(genome.nodes.keys())) : if index_of_layer == 0 : input_nodes = [] for input_node in genome.nodes[index_of_layer] : input_nodes.append(input_node) previous_layer = genome.nodes[index_of_layer] continue if index_of_layer == len(genome.nodes.keys()) - 1 : output_nodes = [] for output_node in genome.nodes[index_of_layer] : output_nodes.append(output_node) for node in genome.nodes[index_of_layer] : inputs_of_node = [] for previous_node in previous_layer : weight = genome.connections[(previous_node, node)] inputs_of_node.append((previous_node, weight)) node_evals.append((node, self.activation_function, self.agregation_function, self.bias, self.response, inputs_of_node)) previous_layer = genome.nodes[index_of_layer] return node_evals, input_nodes, output_nodes def activate(self, controller, input_values) : values = {} for key, value in zip(controller.input_nodes, input_values) : values[key] = value for node, activation_function, agregation_function, bias, response, inputs_of_node in controller.node_evals : node_inputs = [] for previous_node, weight in inputs_of_node : node_inputs.append(values[previous_node] * weight) entering_node = agregation_function(node_inputs) values[node] = activation_function(bias + response * entering_node) return [values[node] for node in controller.output_nodes] Then, these tools will be used in the systems. 
This will allow the reader to easily understand (I hope so) what is going on in the different system files and to easily modify the different tools and systems if it is required.\nSystems Systems are meant to write/modify data from the registries.\nI think that here an example is ten times more valuable than explanations. The following evaluation_system.py file will take all the individuals alive, add their controllers to the registry, evaluate them and then add their fitness to the registry. Here goes the file :\nimport numpy as np class EvaluationSystem : def __init__(self, entity_manager, controller_operator, config, robot_simulator, reporter_tool, parallel_tool) : self.config = config self.entity_manager = entity_manager self.controller_operator = controller_operator self.robot_simulator = robot_simulator self.reporter_tool = reporter_tool self.parallel_tool = parallel_tool self.generation = 1 def __str__(self) : return \u0026#34;EvaluationSystem, evaluate all individuals in the current population and add their fitness to registry\u0026#34; def process(self, registry) : self.reporter_tool.start_generation(self.generation) self.generation += 1 entity_ids = [id for id in registry.get_all_id_with_genome() if self.entity_manager.is_alive(id)] for entity_id in entity_ids : genome = registry.get_genome(entity_id) node_evals, input_nodes, output_nodes = self.controller_operator.generate_controller_from_genome(genome) registry.add_controller(entity_id, node_evals, input_nodes, output_nodes) controllers = [registry.get_controller(entity_id) for entity_id in entity_ids] body = self.config.body bodies = [np.array(body) for _ in range(len(entity_ids))] function = self.robot_simulator.simulate chunk = list(zip(entity_ids, bodies, controllers)) results = self.parallel_tool.run(function, chunk) fitnesses = [] ids = [] for entity_id, fitness, finished in results : ids.append(entity_id) fitnesses.append(fitness) registry.add_fitness(entity_id, fitness, 
finished) fitnesses = np.array(fitnesses) arg_sorted_fitnesses = np.argsort(fitnesses) bests = [] number_of_reported_individuals = self.config.number_of_reported_individuals for taken in range(number_of_reported_individuals) : id = arg_sorted_fitnesses[len(entity_ids) - 1 - taken] bests.append((ids[id], fitnesses[id])) self.reporter_tool.bests(bests) world.py One of the last thing that we must do is to assemble all the different systems together in a world. This world is also an object, it contains all the different systems and the registry. Here goes the example of the world.py file :\nfrom registry import ComponentRegistry class World : def __init__(self) : self.registry = ComponentRegistry() self._builder_systems = [] self._step_systems = [] self.all_systems = [] def add_builder_system(self, system) : self._builder_systems.append(system) self.all_systems.append(system) def add_step_system(self, system) : self._step_systems.append(system) self.all_systems.append(system) def reset(self) : self.registry.clear_all_except_genome() def build(self) : for system in self._builder_systems : system.process(self.registry) def step(self) : for system in self._step_systems : system.process(self.registry) main.py The last thing to do is to put all of this together in a main file to run the simulation. 
Here goes the example of the main.py file :\nimport os import json from config import Config from entity_manager import EntityManager from world import World from systems.build_system import BuildSystem from systems.evaluation_system import EvaluationSystem from systems.tournament_system import TournamentSystem from tools.controller_operator import ControllerOperator from tools.genome_operator import GenomeOperator from tools.robot_simulator import RobotSimulator from tools.parallel_tool import ParallelTool from tools.reporter_tool import ReporterTool from results_manager.results_saver import ResultsSaver def main() : entity_manager = EntityManager() world = World() config_path = input(\u0026#34;\\nEnter the path to the config file from the configs folder (can be just config.json) : \u0026#34;) local_dir = os.path.dirname(os.path.abspath(__file__)) config_path_final = os.path.join(local_dir, \u0026#34;configs\u0026#34;, config_path) with open(config_path_final, \u0026#39;r\u0026#39;) as f : config = json.load(f) config = Config(config) results_saver = ResultsSaver() results_saver.add_results_path() controller_operator = ControllerOperator() robot_simulator = RobotSimulator(config, controller_operator) genome_operator = GenomeOperator(config, robot_simulator) parallel_tool = ParallelTool(config) reporter_tool = ReporterTool(config) build_system = BuildSystem(config, entity_manager, genome_operator, reporter_tool) evaluation_system = EvaluationSystem(entity_manager, controller_operator, config, robot_simulator, reporter_tool, parallel_tool) tournament_system = TournamentSystem(entity_manager, config, genome_operator, reporter_tool) world.add_builder_system(build_system) world.add_step_system(evaluation_system) world.add_step_system(tournament_system) world.build() for generation in range(config.generations) : world.step() results_saver.save_results(world.registry, config, config_path) if __name__ == \u0026#34;__main__\u0026#34; : main() As you can see, this file is 
just there to initialize every object and run the different systems in a loop and in the right order.\nConclusion Now we have all the necessary information to implement a simulation using ECS. This was the first part of this series of article, next I\u0026rsquo;ll explain how to install the different necessary libraries and use Docker to run all of your EvoGym simulations in a container. Moreover, I will shortly explain how I use the Evogym library to reproduce my work.\n","permalink":"https://matteovacher.github.io/news/evogym-lab/evo-ecs/","summary":"Here I present my EvoGym-ECS package so that people can better handle EvoGym and their research.","title":"Entity Component System"},{"content":"Introduction In the previous section on Evolutionary Strategies (ES), we saw how a population of individuals sampled around a center can approximate a gradient without ever computing a derivative. The key limitation, however, is that standard ES uses a fixed, isotropic distribution - a perfect sphere of noise. The algorithm does not learn the shape of the landscape it is exploring.\nCMA-ES (Covariance Matrix Adaptation Evolution Strategy) solves exactly this problem. Instead of sampling noise from a fixed spherical distribution, it progressively deforms the sampling distribution to match the geometry of the search space.\nFrom ES to CMA-ES What does ES miss In Canonical ES, the offspring are generated as follow :\n$$x_i = x + \\sigma \\cdot \\epsilon_i, \\text{ with } \\epsilon_i \\sim N(0, 1)$$\nwith :\n$x$ the current center of the population. $\\sigma$ the step size (standard deviation), a fixed scalar. $\\epsilon_i$ a noise vector drawn from a standard normal distribution (isotropic, i.e. all directions are likely to be chosen). The problem is that many real-world landscapes are not isotropic. 
The optimum might be located at the end of a long, narrow valley -and a spherical distribution will sample mostly useless points on the sides of the valley instead of exploring along it. This is where an ellipsoidal distribution is better to explore the search space. Next question : How can we learn the local geometry of the landscape ?\nCMA-ES addresses this by replacing the fixed identity matrix $I$ with a covariance matrix $C$ that learns the local geometry of the landscape over generations.\nRank-Based Weighting Building on what we saw in ES, CMA-ES still selects the top $\\mu$ individuals from a population of $\\lambda$ offspring. The key is that it uses fitness rank rather than fitness values, which makes the algorithm robust to the scale and shape of the objective function.\nThe weights associated to the $\\mu$ selected individuals follow (as seen in the previous article) a logarithmic decay :\n$$w_i = \\frac{\\ln(\\mu + \\frac{1}{2}) - \\ln(i)}{\\sum_{j=1}^{\\mu}\\left [\\ln(\\mu + \\frac{1}{2}) - \\ln(j)\\right]}, \\quad i = 1, \u0026hellip;, \\mu$$\nwith :\n$w_1 \\geq w_2 \\geq \u0026hellip; \\geq w_\\mu \u0026gt; 0$ and $\\sum_{i=1}^{\\mu} w_i = 1$. The best individual (rank 1) gets the most weight, the $\\mu$-th individual gets the least but still got one because he belongs to the elites of the population. This logarithmic scale makes sure that the best individuals drive the direction of update, but every selected individual contributes.\nThe weighted direction of movement (equivalent to an approximate gradient) is then :\n$$s = \\sum_{i=1}^{\\mu} w_i \\cdot \\epsilon_{\\sigma(i)}$$\nwith :\n$\\epsilon_{\\sigma(i)}$ the noise vector of the $i$-th best individual (sorted by rank). $s$ the weighted step direction - this is the signal CMA-ES uses to update both the center and the covariance matrix. The Covariance Matrix The covariance matrix $C$ measures how much the different dimensions of the search space vary together. 
For instance, if $C_{i, j}$ is positive, then when the $i$-th coordinate increases, the $j$-th coordinate tends to increase as well. At initialization, it is set to the identity matrix :\n$$C = I$$\nwhich corresponds to an isotropic (spherical) distribution - no dimension is preferred over another. As the algorithm runs, $C$ is updated to reflect the directions in which the selected offspring tend to be better.\nIntuitively, if the best individuals are always found along a particular diagonal direction in the search space, $C$ will elongate the sampling distribution along that direction. To put it more concretely, if the best individuals are always located along a certain axis, the geometrical shape that determines where to put the next offspring will closely resemble an ellipse stretched along that axis.\nUpdating the Covariance Matrix The Rank-One Update The simplest way to update $C$ is to use the outer product of the current step direction $s$ with itself :\n$$C \\leftarrow (1 - c_i) \\cdot C + c_i \\cdot s \\cdot s^\\top$$\nwith :\n$c_i \\approx \\frac{2}{n^2}$ the learning rate for the covariance matrix update, where $n$ is the dimension of the problem. $s \\cdot s^\\top$ a rank-one matrix that points in the direction we just moved - it \u0026ldquo;stretches\u0026rdquo; $C$ along that direction. $(1 - c_i)$ the forgetting factor - old information about $C$ decays slowly to make room for new observations. Each generation, we add a small piece of information about which direction was productive, and we dilute the past history slightly. By drawing samples from a distribution fitted to the well-performing parts of the search space, there is a higher chance of sampling good individuals.\nTransforming the Distribution Once $C$ is no longer the identity matrix, we can no longer simply sample $\\epsilon \\sim N(0, 1)$ and multiply by $\\sigma$. 
We need to transform the distribution to match the geometry of $C$.\nTo do this efficiently, CMA-ES uses the eigendecomposition of $C$ :\n$$C = P \\cdot D \\cdot P^\\top$$\nwith :\n$P$ the matrix of eigenvectors of $C$ - the principal axes of the sampling ellipse. In other words, $P$ is the change-of-basis matrix from the standard orthonormal basis to the eigenvector basis of $C$. $D$ a diagonal matrix with the eigenvalues of $C$ - the squared lengths of the principal axes. A new offspring is then generated as :\n$$x_i = x + \\sigma \\cdot P \\cdot D^{1/2} \\cdot \\epsilon_i, \\quad \\epsilon_i \\sim N(0, 1)$$\nGeometrically, $D^{1/2}$ scales the noise and $P$ rotates it by a change of basis. The result is a sample from an ellipsoidal distribution aligned with the most productive directions of the landscape.\nIt is in these moments that I really want to thank my Maths teachers from my Preparatory School : Miss Benhamou (MPSI 1) and Mr Mohan (PSI * 1). By choosing this path I can now focus more on the content of the course itself rather than on mathematical equations.\nStep-Size Control Adapting $C$ alone is not sufficient. If $\\sigma$ (the overall scale) is too large, the population will fail to converge to the optimum. If it is too small, convergence stalls.\nA simple method called the one-fifth rule gives the intuition : if more than one fifth of the offspring are better than the parent, the step size is too small and should increase; on the other hand, if less than one fifth are better, it is too large and should decrease.\nCMA-ES implements a more complicated version of this idea called Cumulative Step-size Adaptation (CSA). Without going into the details of the formula (which involves an evolution path tracking the history of recent steps), the principle is the same as the one-fifth rule : the algorithm watches whether recent steps are consistently going in the same direction (increase $\\sigma$) or cancelling each other (decrease $\\sigma$). 
The key difference here, is that CSA accumulates this information over several generations rather than just looking at the last one, which makes it more stable. At this stage I would not be able to implement this by myself.\nThe CMA-ES Algorithm Putting everything together :\nAlgorithm 1: CMA-ES $(\\mu, \\lambda)$\nInitialize center x, step size sigma, covariance matrix C = I Initialize evolution paths s = 0, s_sigma = 0, c_i = 2/n^2 # Precompute weights w = [log(mu + 1/2) - log(i) for i = 1 to mu] w = w / sum(w) For each generation g from 1 to G : # 1. Eigendecomposition of C P, D = eig(C) # C = P * D * P^T D = sqrt(D) # 2. Generate lambda offspring For each individual i from 1 to lambda : epsilon[i] = Normal(0, 1, size=n) x_i = x + sigma * P * D * epsilon[i] f[i] = objective(x_i) End For # 3. Sort by fitness (best first) sorted_ids = argsort(f) # 4. Update center # y_i is the actual displacement from the center to offspring i y = [sigma * P * D * epsilon[i] for i = 1 to lambda] s = sum(w[i] * y[sorted_ids[i]] for i = 1 to mu) x = x + s # 5. Update covariance matrix C = (1 - c_i) * C + c_i * s * s^T # 6. Update step size sigma = csa(...) 
End For Return x or the best individual found In Practice In the notebook of the elective, CMA-ES is applied to standard benchmark functions : the Rosenbrock function (a narrow curved valley, hard for isotropic methods) and the shifted Rastrigin function (highly multimodal, with many local optima).\nThe pycma library (actively maintained by Hansen, one of the original authors) provides a production-ready implementation :\nimport cma import numpy as np es = cma.CMAEvolutionStrategy(2 * [0], 0.1, {\u0026#39;popsize\u0026#39;: 20}) # x0, sigma0, options solutions = np.array(es.ask()) # ask for new offspring es.tell(solutions, [optf(x[0], x[1]) for x in solutions]) # update the characteristics of the algorithm # Then repeat for each generation A few observations from the exercises in the notebook :\nOn the Rosenbrock function, CMA-ES converges quickly because the covariance matrix learns to align with the valley direction. On the shifted Rastrigin function, success depends heavily on the initial $\\sigma$ : too small and the algorithm gets trapped in a local optimum; large enough and CMA-ES can escape and find the global minimum. Final Remarks CMA-ES represents a significant improvement over standard ES. By combining rank-based selection, covariance matrix adaptation, and cumulative step-size control, it turns the blind exploration of ES into a self-adaptive, geometry-aware search. The algorithm progressively learns not just where to go, but in what shape to explore.\nIts main limitation is computational : the eigendecomposition of $C$ costs $O(n^3)$ per generation, which becomes prohibitive in very high dimensions (I don\u0026rsquo;t know from which $n$ this becomes a problem).\n","permalink":"https://matteovacher.github.io/resources/courses-evo/cmaes/","summary":"Synthesis of the core concepts of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES)","title":"CMA-ES - Evolutionary Computation Elective by Prof. 
Dennis Wilson, ISAE-SUPAERO"},{"content":"Introduction Careful : This is a really long article. If you want to see the videos of the simulation, all of them will be displayed below. I also want to add that this article will probably be the last of this series since it took me an incredible amount of time and I also want to explore other things in the future. I am now in Tsukuba and I am looking forward to writing an article about my current project. I also want to master C++. I guess all of this will take me a lot of time since I am really dedicated to these projects. Anyway this series gave me a first glimpse of what research could look like and I really like it. It also gave me the opportunity to master a few tools I never had the chance to use before. I am writing these lines after having finished this article and the videos. The only video that will be displayed in the introduction is the video where everything works :)\nIn the previous post, we built the Ant class : movement by angle, rebound on the walls, pheromone deposit and antenna positions. But the antennas were only computed, not used. The ants were walking randomly and did not read the pheromones they were depositing.\nStep 3 is about connecting the sensory inputs to the movement. The rule is still hand-crafted - a simple weighted difference between the left and the right antenna - but it is enough to observe the first emergent trails. The goal of this step is to see a collective structure appear from purely local decisions, without any central coordination.\nThe path to get there was not straight. Many parameters were tested, many things did not work, and the model itself evolved during the process (2 antennas became 3, a single evaporation rate became two, the deposit rule was rewritten). 
The article below describes the final state of the model and the reasoning behind each choice.\nThis article, by its complexity, is a really long one, and if you are more interested in the results than in the process (which I totally understand), I highly recommend you to skip all this article to directly go see the videos at the end of it.\nThe Starting Point Before describing the final model, it is useful to show where Step 3 actually started. Here is the config.py at the very beginning of this step, before any of the changes described below :\n# GRID AND PHEROMONE EVAPORATION_RATE = 0.997 # between 0 and 1, the higher the slower the evaporation DIFFUSION_SIGMA = 0.3 # between 0 and inf, the higher the more the pheromone spreads, but also the more it evaporates GRID_WIDTH = 360 # width of the grid in cells, also in pixels if cell size is 1, max 2880 for 8k screen GRID_HEIGHT = 240 # height of the grid in cells, also in pixels if cell size is 1, max 1920 for 8k screen CELL_SIZE = 3 # size of each cell in pixels FPS = 24 # frames per second WINDOW_WIDTH = GRID_WIDTH*CELL_SIZE # width of the window = width*cellsize (2880 pixels max ) WINDOW_HEIGHT = GRID_HEIGHT*CELL_SIZE # height of the window = height*cellsize (1920 pixels max ) PHEROMONE_DEPOSIT = 0.7 # amount of pheromone deposited by an ant at each step, between 0 and 1 # ENVIRONMENT COLOR_HOME = (139, 90, 43) # brown color for the pheromone leading to the nest COLOR_FOOD = (50, 205, 50) # green color for the pheromone leading to the food COLOR_BACKGROUND = (0, 0, 0) # black color for the background COLOR_NEST = (255, 255, 255) # white color for the nest NEST_RADIUS = 2 # radius of the nest in cells NEST_X = 100 # coordinates of the nest NEST_Y = 100 # coordinates of the nest # ANT N_ANTS = 20 # number of ants in the simulation, must be an integer greater than 0 LENGTH_ANTENNA = 0.5 # length from the head to the tip of the antenna in cells, must be greater than 0 ANGLE_ANTENNA = np.pi/4 # angle between the 
direction of the ant and the direction of the antenna in radians, between 0 and pi/2, if 0 then antennas are in the same direction as the ant, if pi/2 then antennas are perpendicular to the direction of the ant COLOR_ANT = (255, 165, 0) # orange color for the ants MAX_FOOD_CARRIED = 0.5 # maximum amount of food an ant can carry, between 0 and 1, if 0.5 then an ant can carry half of a food source FOOD_COLLECT_AMOUNT = 0.5 # amount of food an ant can collect at one time, between 0 and MAX_FOOD_CARRIED EAT_DURATION = 8 # number of steps an ant needs to eat a food source, during this time the ant cannot move or interact with other food sources, must be an integer ANTENNA_WEIGHT = np.pi/3 # weight of the pheromone bias on the ant\u0026#39;s direction, between 0 and pi/2, if 0 then the ant ignores the pheromones, if pi/2 then the ant turns directly towards the strongest pheromone TRESHOLD_FOOD = 0.45 # threshold of food carried for an ant to switch from following home pheromone to following food pheromone, between 0 and MAX_FOOD_CARRIED RANDOM_DIR = np.pi/8 # maximum random change in direction for an ant at each step, between 0 and pi, if 0 then the ant never changes direction randomly, if pi then the ant can turn in any direction at each step HALF_LENGTH_BODY = 0.5 # half of the length of the ant\u0026#39;s body in cells, used for drawing the ant as a line before adding the antenna # FOOD SOURCES N_FOOD_TYPES = 2 # number of different types of food sources, must be an integer greater than 0 COLOR_APHID = (255, 220, 0) # color yellow for aphids, which are a type of food source that can recharge COLOR_SUGAR = (100, 200, 255) # color blue for sugar, which is another type of food source that does not recharge RECHARGE_RATE_APHID = 0.01 # recharge rate for aphids, between 0 and 1 RECHARGE_RATE_SUGAR = 0 # recharge rate for sugar, must be 0 since sugar does not recharge From this starting configuration, many things evolved. 
One of the very first changes was about the behavior at the food source. In the initial version, once an ant had finished eating, she just kept walking in the same direction she was heading before, and simply crossed the food source without turning back to the nest. To fix this, I added an automatic U-turn right after food collection, so that the ant now faces the opposite direction when she leaves the source. It is a small change, but it is the one that unlocked the very first complete nest -\u0026gt; food -\u0026gt; nest cycles.\nA Simple Angular Rule The ant has two antennas, left and right. Each one reads the pheromone concentration at its position. If the concentration on the left is higher, the ant should turn to the left, and the opposite if the right is higher. The simplest way to translate this into code is a weighted difference :\n$$\\delta\\theta = W \\cdot (C_L - C_R) + U(-\\delta_r, \\delta_r)$$\nwhere $W$ is ANTENNA_WEIGHT, $C_L$ and $C_R$ are the concentrations at the left and right antenna, and the last term is a uniform random noise of amplitude RANDOM_DIR. The noise is here to keep an exploration component - without it, the ants would stick to the strongest trail and never find new food sources.\nWhich pheromone to follow ? An ant with food wants to go back to the nest, an ant without food is looking for a source. So the rule is simple :\nfood_carried \u0026gt; THRESHOLD -\u0026gt; follows the HOME pheromones food_carried \u0026lt;= THRESHOLD -\u0026gt; follows the FOOD pheromones This hand-crafted rule is not the final objective. The final objective is to replace it with a neural network that learns the mapping from sensory inputs to movement. But before that, we need a baseline that works - something good enough to produce emergent trails, so we can study them and later compare them with what the network produces.\nFrom Two Antennas to Three The left/right rule alone is not enough. 
In straight lines it works fine, but in a curve the ant often leaves the trail. My hypothesis is that during the turn the two antennas miss the trail at the same time : the trail is thin, and if neither antenna happens to be on it, the ant reads zero on both sides. The angular bias becomes zero, only the random noise remains, and the ant drifts away from the path.\nTo fix that, a third antenna is added in front of the ant. The rule becomes :\nif $C_F \\geq \\max(C_L, C_R)$ -\u0026gt; the concentration is highest in front, so we do not bias the direction and the ant keeps going straight (up to the random noise). otherwise -\u0026gt; we apply the same left/right weighted difference as before. This is inspired by Draft et al. (2018). They show that during trail tracking, carpenter ants oscillate their antennas perpendicularly to the trail. Each sweep of the head produces a temporal concentration gradient : a short moment of high concentration when the antenna crosses the trail, followed by a lower signal. In our model the oscillation is not explicit, but the front antenna plays the same role : it tells the ant \u0026ldquo;you are still on the trail, keep going\u0026rdquo;.\nOne more geometric detail. At first, the antennas started from the center of the ant, which is biologically wrong : the real antennas start from the head, not from the middle of the body. A new constant HALF_LENGTH_BODY was added so that the antennas start at the head position, computed as (x + HALF_LENGTH_BODY * cos(theta), y + HALF_LENGTH_BODY * sin(theta)).\nA Step-Counting Deposit Until now, an ant deposited a constant amount of pheromone at each step. The problem is that the trail looks the same everywhere along the path : it carries no information about the distance to the source or to the nest. 
If a network (or a hand-crafted rule) only reads the concentration, it cannot tell if it is close to the source or far from it.\nThe solution used here is to make the deposit decrease with the number of steps since the last reset (arrival at the nest, or collection of food). At each step, the deposit value is multiplied by a factor $\\lambda \u0026lt; 1$ :\n$$\\text{deposited}_{t+1} = \\lambda \\cdot \\text{deposited}_t$$\nIn the code, $\\lambda$ is the constant DECAY_FACTOR_STEP. This produces a gradient of intensity along the trail : strong at the start (just after the nest or just after the food), weak at the end. An ant that reads a rising concentration is moving toward the source of the trail, an ant that reads a falling concentration is moving away from it.\nBiologically, this is not completely arbitrary. Wittlinger et al. (2006), in their \u0026ldquo;stilts and stumps\u0026rdquo; experiment, show that ants count their steps to estimate the distance walked. Collett et al. (2025) cite this as one of the core mechanisms used by ants for navigation. So a deposit that depends on the number of steps since the last event is a reasonable inspiration, even if the real chemical process is more complex than a simple geometric decay.\nOne important nuance here : the Wittlinger experiment was done on Cataglyphis, the desert ant. The stride-counting mechanism (a pedometer-like integrator of the steps) is specific to this genus, which is adapted to navigate in open terrain with almost no pheromone trails and almost no visual landmarks (except perhaps the sun). Applying this mechanism directly to forest ants or to Pheidole species is therefore a simplification : I keep it here because the idea of a distance-dependent deposit is useful for the model, not because every ant species really does this.\nTwo Pheromones, Two Dynamics In Step 1 we introduced two layers of pheromones, HOME and FOOD, but they were sharing the same evaporation rate and the same diffusion sigma. 
After many simulations it became clear that this symmetric treatment was not the right choice.\nThe main inspiration comes from Dussutour et al. (2009), who studied Pheidole megacephala and described how the colony uses two chemical signals of very different nature. One is a long-lasting exploration signal, deposited in many places across the territory, which works as a slow memory of where the ants have been. The other is a short-lived and stronger recruitment signal, deposited by ants returning from a food source, which triggers a fast collective response. A personal discussion with Guy Theraulaz later confirmed this reading and added a useful nuance : the HOME signal in our model is probably closer to a passive body odor trail (cuticular hydrocarbons) than to a true recruitment pheromone.\nTranslated into the code, this gives two separate constants for each process :\nEVAPORATION_RATE_HOME = 0.999 EVAPORATION_RATE_FOOD = 0.996 DIFFUSION_SIGMA_HOME = 0.25 DIFFUSION_SIGMA_FOOD = 0.275 The HOME trail lasts longer and is slightly less diffused, which fits the role of a stable map of the colony\u0026rsquo;s territory. The FOOD trail is more volatile and spreads a bit more, which makes sense for a signal that is supposed to attract other ants quickly and then disappear when the source is entirely consumed.\nA More Realistic Deposit Rule The first version of the deposit was the simplest possible : at each step, we add value to the cell and clip the result to 1.\nself.grids[type_of_pheromone, y, x] = min(1, self.grids[type_of_pheromone, y, x] + value) On paper it works, but in the simulation the cells crossed by many ants saturate to 1 almost immediately and stay there forever. Everywhere the ants pass often, the concentration is a flat surface at 1, with no gradient left for the antennas to read.\nA better rule is to deposit an amount that depends on how much space is still available on the cell. 
The new formula is :\n$$C_{t+1} = C_t + v \\cdot (1 - C_t)$$\nwhere $v$ is the amount the ant wants to deposit (value in the code) and $C_t$ is the current concentration. When the cell is empty ($C_t = 0$), we add the full $v$. When the cell is full ($C_t = 1$), we add nothing. In between, the deposit is proportional to the remaining margin $(1 - C_t)$. The concentration converges asymptotically to 1 without ever getting stuck on the saturation surface.\ncurrent = self.grids[type_of_pheromone, y, x] self.grids[type_of_pheromone, y, x] = min(1, current + value * (1 - current)) The min(1, ...) stays as a safety, but mathematically the sum cannot exceed 1 anymore as long as $v \\leq 1$.\nBeyond the realism argument, there is also a secondary benefit that is relevant for the bug we observed before : when two HOME trails arrive at the nest from two different directions, their cells near the nest used to stay at 1 in a large saturated zone. Diffusion then spread this surface into neighboring cells on both sides, which is how the two trails were merging into a single wide zone where the ants could not distinguish one path from the other. With the new rule, cells near the nest no longer saturate permanently (or at least not immediately), so the diffusion from these cells injects less pheromone into the neighbors and the two trails stay more distinct for longer.\nHonestly, I do not think this will fix the whole problem. The main reason why ants get lost on a merged trail is probably the simplicity of the model itself : the ant has no memory, and only reads three scalar values at each step. If two trails merge, the left/right/front rule just follows the strongest gradient and does not care about where this gradient actually leads. 
The deposit rule change is a real improvement, but the deeper problem will probably only be solved when the hand-crafted rule is replaced by a network that can learn to integrate information over time.\nSeparating Physics from Rendering At the start of the project, ants and food sources were displayed at their real size inside the simulation, which means very small : a few pixels wide. It was hard to see what was going on, and I was spending a lot of time with my face stuck to the screen trying to follow the trajectories by eye.\nAt some point I decided to separate the visual rendering from the physical simulation. The ants and the aphids are now drawn much larger on the screen than they actually are in the underlying grid. From a readability point of view, this makes observation much more comfortable. It also has a side effect that was not planned at the beginning but turned out to be useful.\nThe side effect is about how the ant detects a food source or the nest. The detection is based on a circle : the aphid has a radius, the nest has a radius. An ant is considered \u0026ldquo;on\u0026rdquo; a food source when the center of the ant (not its drawn body, not its rays of perception) enters the circle of the source. Same rule for the nest : the center of gravity of the ant has to enter the circle of the nest for the food to be delivered. This is a simple geometric test, independent of the visual size.\nBefore this change, the detection was more fragile : ants were sometimes visually on the food or on the nest but not triggering the interaction because of small rounding issues. With the new rule, the detection is clean and happens every time the center enters the circle. 
The collection rate and the nest return rate both improved after this change, without changing anything in the behavior of the ants themselves.\nA Calibration Tool for the Parameters The number of parameters grew quickly during this step : two evaporation rates, two diffusion sigmas, one deposit decay factor, the antenna length, the antenna angle, the random noise amplitude, the bias weight, and so on. Each time one of them is changed, it is not always obvious whether the new configuration is still in a biologically reasonable range or not.\nTwo values in particular were the real blocker for me at the beginning of the step, and I wanted a fast way to read them before each run :\nthe antenna separation, which is the distance between the left and the right antenna tips, given LENGTH_ANTENNA, ANGLE_ANTENNA and HALF_LENGTH_BODY. If the two antennas are too close, they read almost the same concentration and the left/right difference is always close to zero. If they are too far apart, one of them is always outside the trail and no useful signal can be extracted. the effective sigma of a trail at its half-life, which combines evaporation and diffusion into a single number that tells how wide a trail actually looks on the grid at a typical moment of its life. This is the width that the antenna separation has to be matched against. These two geometric values were the main reason I asked for this script, but they were not the only ones. I also asked for most of the other metrics because I already knew more or less which parameters would be important. For example, a signal-to-noise ratio between the angular bias and the random noise was obviously needed, but I did not know exactly how to formulate the signal part of the ratio, so I let the LLM take care of the exact expression. I checked the formulas afterwards and they were consistent. 
Being in an engineering school, I can usually tell when a derivation is correctly done, even if I did not write it myself :)\nThe script does not run any simulation, it only reads config.py and prints the derived values. It was built to display cleanly in the terminal so that I can evaluate all parameters within a few seconds.\nThe interesting part here is not the formulas themselves but the fact that I could run this script many times during a debug session, instead of launching dozens of simulations in the void and guessing what was wrong from the visuals alone.\nThe Final Parameters After all the iterations described in the previous sections, here is the config.py that the simulation is running on now. These are the values that were finally adopted for the current state of the project :\nimport numpy as np # GRID AND PHEROMONE EVAPORATION_RATE_HOME = 0.999 # between 0 and 1, the higher the slower the evaporation EVAPORATION_RATE_FOOD = 0.9985 # between 0 and 1 DIFFUSION_SIGMA_HOME = 0.25 # between 0 and inf, the higher the more the pheromone spreads, but also the more it evaporates DIFFUSION_SIGMA_FOOD = 0.275 # between 0 and inf, must be greater than sigma home because evap is less for home GRID_WIDTH = 600 # width of the grid in cells, also in pixels if cell size is 1, max 2880 for 8k screen GRID_HEIGHT = 400 # height of the grid in cells, also in pixels if cell size is 1, max 1920 for 8k screen CELL_SIZE = 2 # size of each cell in pixels FPS = 45 # frames per second WINDOW_WIDTH = GRID_WIDTH*CELL_SIZE # width of the window = width*cellsize (2880 pixels max ) WINDOW_HEIGHT = GRID_HEIGHT*CELL_SIZE # height of the window = height*cellsize (1920 pixels max ) PHEROMONE_DEPOSIT = 1 # amount of pheromone deposited by an ant at each step, between 0 and 1 # ENVIRONMENT COLOR_HOME = (139, 90, 43) # brown color for the pheromone leading to the nest COLOR_FOOD = (50, 205, 50) # green color for the pheromone leading to the food COLOR_BACKGROUND = (0, 0, 0) # black color 
for the background COLOR_NEST = (255, 255, 255) # white color for the nest NEST_RADIUS = 20 # radius of the nest in cells NEST_X = int(GRID_WIDTH/2) # coordinates of the nest, must be an integer NEST_Y = int(GRID_HEIGHT/2) # coordinates of the nest, must be an integer # ANT N_ANTS = 200 # number of ants in the simulation, must be an integer greater than 0 LENGTH_ANTENNA = 1.5 # length from the head to the tip of the antenna in cells, must be greater than 0 ANGLE_ANTENNA = np.pi/3 # angle between the direction of the ant and the direction of the antenna in radians, between 0 and pi/2, if 0 then antennas are in the same direction as the ant, if pi/2 then antennas are perpendicular to the direction of the ant COLOR_ANT_HOME = (255, 165, 0) # orange color for the ants exploring COLOR_ANT_FOOD = (0, 200, 80) # different green color for ants carrying food MAX_FOOD_CARRIED = 0.5 # maximum amount of food an ant can carry, between 0 and 1, if 0.5 then an ant can carry half of a food source FOOD_COLLECT_AMOUNT = 0.5 # amount of food an ant can collect at one time, between 0 and MAX_FOOD_CARRIED EAT_DURATION = 8 # number of steps an ant needs to eat a food source, during this time the ant cannot move or interact with other food sources, must be an integer ANTENNA_WEIGHT = np.pi/2.5 # weight of the pheromone bias on the ant\u0026#39;s direction, between 0 and pi/2, if 0 then the ant ignores the pheromones, if pi/2 then the ant turns directly towards the strongest pheromone LIM_ANGLE = np.pi/3 # maximum angle an ant can turn at each step TRESHOLD_FOOD = 0.45 # threshold of food carried for an ant to switch from following home pheromone to following food pheromone, between 0 and MAX_FOOD_CARRIED RANDOM_DIR = np.pi/25 # maximum random change in direction for an ant at each step, between 0 and pi, if 0 then the ant never changes direction randomly, if pi then the ant can turn in any direction at each step HALF_LENGTH_BODY = 1.5 # half of the length of the ant\u0026#39;s body in 
cells, used for drawing the ant as a line before adding the antenna ANT_RADIUS = 2 # must be integer for arrays and only for the visual here DECAY_FACTOR_STEP = 0.995 # decay factor applied to the pheromone deposit at each step, between 0 and 1, slowly decrease the amount of pheromone deposited NEST_DURATION = 12 # number of steps an ant needs to stay at the nest after depositing food, during this time the ant cannot move or interact with food sources, must be an integer NEST_ANGLE_VARIATION = np.pi/3 # maximum random change in direction for an ant when leaving the nest, between 0 and pi, if 0 then the ant leaves the nest in a straight line, if pi then the ant can leave the nest in any direction # FOOD SOURCES N_FOOD_TYPES = 2 # number of different types of food sources, must be an integer greater than 0 COLOR_APHID = (255, 220, 0) # color yellow for aphids, which are a type of food source that can recharge COLOR_SUGAR = (100, 200, 255) # color blue for sugar, which is another type of food source that does not recharge RECHARGE_RATE_APHID = 0.01 # recharge rate for aphids, between 0 and 1 RECHARGE_RATE_SUGAR = 0 # recharge rate for sugar, must be 0 since sugar does not recharge FOOD_RADIUS = 3 # radius of the food source in cells, must be an integer Compared to the starting config at the top of the article, almost every value has changed : the grid is larger, the number of ants is ten times higher, the deposit decay DECAY_FACTOR_STEP and the angular clipping LIM_ANGLE are new, the evaporation rates and the diffusion sigmas are now split per pheromone type, and several constants related to the ant\u0026rsquo;s body geometry (HALF_LENGTH_BODY, ANT_RADIUS) were added along the way. Each of these changes is the direct consequence of one of the sections above.\nWhere We Stand Now After all these iterations, the simulation finally produces something that looks like collective behavior. 
With 200 ants on a 600x400 grid, the three-antennas rule, the step-counting deposit, differentiated evaporation rates, and the proportional deposit, a few things work well and a few things are still broken.\nWhat works Trails appear and get reinforced over time. They are not perfectly stable, but they are visible and they last long enough to be useful. Ants come back to the nest often enough to keep the HOME and FOOD cycles running together, which was not really the case in the first iterations. The curve problem described earlier is strongly reduced. Ants leave the trail much less often in sharp turns than they used to. What is still broken When two HOME trails are close to each other, they merge through diffusion into a single path, and the ants stuck on this merged zone never go back to the nest. But what is interesting is that after some time, because of evaporation, this dead trail gets replaced by a new more efficient one that another group of ants has reinforced in the meantime. This is actually one of the main objectives of the simulation : adaptability, a colony that does not rely on a single trail but keeps rewriting its own map over time. The biggest current limitation is that when an ant arrives perpendicularly to a trail, it just passes through it without turning. The reason is that the gradient along the trail, the one pointing toward the nest (or toward the food, depending on the trail type), is not strong enough to trigger a clear response. If I increased this gradient, the ants that are already far from the nest would have their own deposit decayed to almost zero because of DECAY_FACTOR_STEP, and they would stop reinforcing the trail at long distance. I could not find how to make the parameters evolve correctly to balance both ends, so I left the behavior as it is for now. 
A related problem comes after the detection itself : even when an ant does notice a trail, it often joins it without knowing which direction is the right one, again because the gradient along the trail is too weak. There are a few rare cases where an ant arriving perpendicularly actually turns in the correct direction, but they are the exception, not the rule. I hope the neural network in Step 4 will handle this better than the current hand-crafted rule. A map creation tool for the next step To prepare for Step 4, I also added a map creation tool on the side. The tool lets me build a custom map (nest position, food sources, obstacles if I add them later), save it if I want, and when the simulation starts it now asks which map to load. This will be useful for training : the colony will not always see the same environment, and the network will have to generalize across different map layouts.\nVideos Four videos show the evolution of the simulation across this step, from the first chaotic runs to the current state.\nHere are the chaotic runs. I forgot which parameters differed between them, but we can clearly see that something is not working correctly. Here is the current state, where we can clearly see the creation of paths and the emergence of collective behavior : What\u0026rsquo;s Next Step 3 ends here with a hand-crafted rule that mostly works : ants follow trails, come back to the nest, trigger the recruitment cycle, and sometimes rebuild a new trail when the old one gets corrupted. But it is not possible to keep tuning the parameters by hand, and rewriting new rules for every situation would never end.\nStep 4 is about replacing the hand-crafted rule with a neural network trained for this task. I do not know yet exactly how the network will look. What I know is that I will use a genetic algorithm to evolve the weights of the ants. The key point here is that I want a collective reward, not an individual one. 
A single ant that brings back food is not doing well by itself, it only matters because the colony as a whole finds the source and exploits it. So the fitness score will be computed at the colony level (amount of food brought back, etc.), and each individual ant will be evolved based on the performance of the whole colony. The inputs are not fully decided yet, but I will try to keep them based on the antennas only.\nReferences Collett T., Graham P., Heinze S. (2025). The neuroethology of ant navigation. Draft R.W., McGill M.R., Kapoor V., Murthy V.N. (2018). Carpenter ants use diverse antennae sampling strategies to track odor trails. Wittlinger M., Wehner R., Wolf H. (2006). The ant odometer : stepping on stilts and stumps. Dussutour A., Nicolis S.C., Shepard G., Beekman M., Sumpter D.J.T. (2009). The role of multiple pheromones in food recruitment by ants. Personal discussion with Guy Theraulaz at EvoStar, Toulouse. ","permalink":"https://matteovacher.github.io/news/myrmico-lab/third_of_the_project/","summary":"Study of the design of the colony and emergence of collective behaviors.","title":"Step 3 (Perhaps Final) of Ant Simulation Project : Colony and Emergence"},{"content":"Introduction In the previous post, I introduced the pheromone grid and the rules behind pheromone evaporation. Now, we\u0026rsquo;ll look into the ant agent that will move around the grid, deposit pheromones, and interact with the environment.\nTo do this we have implemented the Ant class in the core/ant.py file.\nAn Angle instead of an Index To represent the ant and its position on the grid, we have two possibilities : either use the ant position as an index in a two-dimensional array (like the pheromone grids), or use an angle to represent the direction of the ant together with a continuous position.\nInstead of using 8 discrete directions, and therefore 8 if/elif statements to determine the next position, we will use a floating-point angle $\\theta \\in [0, 2\\pi]$. 
At each time step, the displacement is simply :\n$$\\Delta x = \\cos(\\theta + \\delta \\theta), \\Delta y = \\sin(\\theta + \\delta \\theta)$$\nHere $\\delta \\theta$ is a random value sampled from a uniform distribution in $[-\\pi/6, \\pi/6]$. For now, its only role is to make the ants move visibly on the screen. Later, this value will be the output of a neural network.\nThen, we only have to take the integer part of the resulting position to determine the cell on the surface that we\u0026rsquo;ll give to pygame.\nThe Movement of the Ant How do the ants move ? The core method of the Ant class is move(delta_theta, put_pheromones, value_pheromone). This method moves the ant by changing the current direction by delta_theta, and also deposits pheromones if put_pheromones is True. The value_pheromone parameter is the amount of pheromone to be added.\nHere we only manage and execute the movement of the ant. The value of delta_theta will be the output of the neural network.\nWith $\\theta_{t+1} = \\theta_t + \\delta \\theta$, the new position is then :\n$$x_{t+1} = x_t + \\cos(\\theta_{t+1}), \\quad y_{t+1} = y_t + \\sin(\\theta_{t+1})$$\nThe oscillation of the ants, a normal behavior This point deserves further explanation. Draft et al. (2018) showed that ants move in an oscillating pattern (sinusoidal, in zigzag). This behavior is not accidental, it is a functional behavior found across species. It exists in species that follow pheromones but also in species like Cataglyphis bombycina that rely on vision.\nThis oscillation has two purposes :\nIt maximizes the detection surface of the pheromones by sweeping over a wide range of directions. It allows visual corrections to occur at the peaks and valleys of the oscillation when the head is stable. In our case, we\u0026rsquo;ll begin by simulating ants without vision. This oscillation may or may not be visible in the simulation. 
The neural network will have to reproduce this behavior by learning and not through encoded rules.\nRebound on the Walls Because the simulation is basically a grid, the ants can only move inside this grid. If the new position is outside the grid, we put the ant back on the grid with numpy.clip. But the problem is what happens next. If the direction remains the same, the ant will be stuck on the edge of the grid forever.\nTo avoid this, the direction is reversed by a simple symmetry along the axis of the edge :\nVertical wall : $x$ is out of bounds, $\\theta \\leftarrow \\pi - \\theta$, we invert the horizontal component. Horizontal wall : $y$ is out of bounds, $\\theta \\leftarrow -\\theta$, we invert the vertical component. With this method the ant always goes back towards the inside of the grid.\nIn the real world, ants have a tendency to follow the walls (thigmotaxis). This behavior is also not encoded in the ant and we hope that it will emerge and be learned by the neural network if we give it the detection of the wall as an input.\nThe Pheromones At each time step, the ant may or may not deposit pheromones on the current cell. We decided to encode which type of pheromone to deposit depending on the situation :\nhas_food = True : the ant has food and deposits a green pheromone leading to the food source. has_food = False : the ant has no food and deposits a brown pheromone. As said in the first article, this is the basis of stigmergy, the indirect communication between ants via the trails. Again, as already cited in Step 1, this system with two pheromones is biologically inspired : Dussutour et al. 
(2009) have shown that some species like Pheidole megacephala deposit a long-lasting pheromone while exploring and another short-lasting pheromone while bringing back food.\nThe Antennas for the detection of the Gradient of Pheromones The Method get_antenna_pos The method get_antenna_pos returns the positions of the two antennas.\n$$\\text{left antenna} = (x + L \\cdot \\cos(\\theta + \\alpha); y + L \\cdot \\sin(\\theta + \\alpha))$$ $$\\text{right antenna} = (x + L \\cdot \\cos(\\theta - \\alpha); y + L \\cdot \\sin(\\theta - \\alpha))$$\nwhere $\\alpha$ is ANGLE_ANTENNA and $L$ is the length of the antenna LENGTH_ANTENNA.\nThe values of the pheromones at these positions will be inputs of the neural network.\nThis bilateral system of gradient detection is a well-documented mechanism in ants. Collett et al. (2025) recall the foundational experiment by Hangartner (1967) on Lasius fuliginosus : with one antenna removed, the ant follows the edge of the trail detected by the remaining antenna. With both antennas surgically crossed, the ant becomes unable to follow the trail. These results show that the ant\u0026rsquo;s brain indeed computes a bilateral spatial gradient - it compares the concentration on the left and right and turns toward the maximum. This is exactly the comparison that the network inputs in our model make possible.\nHangartner (1967) also provides a key quantitative data point : the detection threshold for the bilateral gradient is about 1/10. One third of the workers respond to a concentration ratio of 1/10 between the two antennas. This order of magnitude is useful for evaluating whether the neural network correctly exploits the difference between the two antennal inputs.\nBiological Calibration of the angle $\\alpha$ and the length $L$ Draft et al. (2018) provide the most precise quantitative data available on antenna angles in Camponotus pennsylvanicus. 
The relevant parameter for our model is $\\theta$, defined as the angle between the body axis and the line connecting the head to the antenna tip \u0026ndash; which is exactly what ANGLE_ANTENNA represents in our code.\nFrom the cumulative distributions of $\\theta$ in Figure 3C of the paper, the three behavioral modules show distinct distributions :\nBehavioral Module | $\\theta$ (approx. median) | Interpretation\nProbing | ~30 deg | Antennas pointing forward, close to the body axis\nTrail following | ~50 deg | Semi-spread antennas\nSinusoidal / Exploratory | ~60-70 deg | Widely spread, nearly perpendicular antennas\nThe key data point for our model is the trail following module. During precise trail tracking, the antennas are excluded from the central zone of approximately 2 mm width (the trail width) and move perpendicularly to the direction of motion, which corresponds to a $\\theta$ value of approximately 50 deg. We also know from the paper that the antenna length is approximately 0.5 times the body length (by approximation from the head-to-centroid distance of about 4.0 mm used as normalization unit).\nThe current values ANGLE_ANTENNA = pi/4 (45 deg) and LENGTH_ANTENNA = 1 are therefore a reasonable approximation of trail following behavior. Here L = 1 represents the distance from the ant\u0026rsquo;s center to the antenna tip, combining approximately half the body length (~0.5) and the antenna length (~0.5 body length).\nIt could be interesting in a later stage to set ANGLE_ANTENNA as an output of the neural network, so that the ant dynamically adapts its antenna configuration depending on its behavioral state - exactly as real ants switch between probing, trail following and sinusoidal modules.\nStoring t - 1 : Model Hypotheses In addition to the two values of pheromones at instant $t$, we consider adding the two values of pheromones at instant $t-1$. 
This would grant the model an explicit access to the temporal gradient.\nThe question is now, do ants use this type of information ?\nThe response from the literature is nuanced. Collett et al. (2025) show, citing Draft et al. (2018) on carpenter ants, that during trail tracking, the antennas oscillate perpendicular to the trail and that brief contact with the trail produces a high temporal concentration gradient. But this temporal gradient emerges from the ant\u0026rsquo;s oscillating movement - it is not an explicit memory of a past value. The ant does not \u0026ldquo;store\u0026rdquo; $C(t-1)$ : it experiences it through its movement.\nAs a result, storing $t-1$ in our model is an original modeling hypothesis, not directly observed in ants. It is inspired by another mechanism and provides the neural network with an additional signal that could be useful. It is an experimental extension of the model, not a biological constraint.\nThe Food, Sources, Collect and Transport Two types of Food Sources In the model, we consider two types of food sources with each their own purposes :\nThe Aphids : A consistent number of ant species have been observed feeding on aphids, a common example is the ginger wood ant Formica rufa that raises aphids like cattle. Aphids provide a constant source of sugar for the colony. In the simulation, we hope that ants will build a consistent network of pheromones to guide them towards the aphid source. Each aphid will have a RECHARGE_RATE and at each time step, the aphid will stock this amount of food.\nThe Sugar : The sugar will represent a non-persistent food source for the ant. If a sugar source is consumed, it will never recharge again. We will be the only one deciding whether or not it is added on the map. 
We hope that ants will still keep looking for food sources other than aphids to feed their colony.\nThe Food Transport We changed the former model, where each ant had a boolean attribute has_food, to a float attribute that better reflects the natural behavior of ants. This attribute is called food_carried and food_carried $\\in [0, MAX FOOD CARRIED]$\nThis choice is made to highlight that ants do not always carry the maximum amount of food they can hold. For instance, if the food source is almost consumed, the ant goes back with what\u0026rsquo;s left, which is consistent with a partially-consumed source.\nThe method interact() The way the ant thinks is simple and directly inspired by what everyone can observe at their own scale :\nIf the ant is on the food source and can still carry more food, she takes whatever she can from the food source, limited by FOOD_COLLECT_AMOUNT and by what\u0026rsquo;s left in the source. If the ant carries food and reaches the nest, she deposits the food in the nest. The collect pause : EAT_DURATION When an ant collects food, she stays still for a few time steps. This behavior is directly inspired by nature. Ants don\u0026rsquo;t collect the food right away, they wait a few time steps before moving to the nest.\nThe Threshold THRESHOLD_FOOD An ant that carries very little food will deposit HOME pheromones and not FOOD pheromones. This is directly inspired by the fact that ants don\u0026rsquo;t deposit pheromones if they have very little food. 
This avoids 2 problems :\nA representation problem : an ant that carries $0.01$ units of food doesn\u0026rsquo;t really find food, so she needs to keep going and look for food.\nA numerical problem : instead of comparing units with == which is dangerous with float number, we use this threshold to solve the problem.\nThe Results Here is a video showing how the model that we currently have works :\nOn the video, we can see that ants may find the food sources, whether it is sugar or aphids, then they deposit the right pheromone but are not able to find their way back to the nest because we have not implemented the sensory mechanism. Still 2 ants happen to go back to the nest increasing the total number of food collected. Then they deposit again the right type of pheromone.\nWhat to do next ? The ants currently move and deposit pheromones, but they do so blindly. They cannot yet read the pheromone gradient with their antennas, which means the trail system has no effect on their behavior. This is the core problem to address in the next step.\nStep 3 — Colony and Emergence will focus on connecting the sensory inputs to the movement. Concretely, this means reading the pheromone concentrations at the two antenna positions and using that difference to bias delta_theta. At this stage, the rule will still be hand-crafted - a simple weighted difference between left and right antenna - but it will be enough to observe the first emergent trails. The goal is to see a collective structure appear from purely local decisions, without any central coordination.\nReferences Collett T., Graham P., Heinze S. (2025). The neuroethology of ant navigation. Draft R.W., McGill M.R., Kapoor V., Murthy V.N. (2018). Carpenter ants use diverse antennae sampling strategies to track odor trails. Hangartner W. (1967). Spezifitat und Inaktivierung des Spurpheromons von Lasius fuliginosus. Dussutour A. et al. (2009). The role of multiple pheromones in food recruitment by ants. 
","permalink":"https://matteovacher.github.io/news/myrmico-lab/second_of_the_project/","summary":"Implementation of the ant agent : orientation by angle, movement, reflection on walls, pheromone deposit.","title":"Step 2 of Ant Simulation Project : Ant Agent"},{"content":"Introduction In the previous post, I introduced the project and set up the technical environment. Now it is time to build the first real component of the simulation : the pheromone grid.\nBefore introducing any form of ant in the code, we need to build the substrate the ant will interact on. Each individual ant has a tiny brain, they don\u0026rsquo;t act because the queen gave them an order. The queen itself, in fact has a purpose and a role, lay eggs, and not give them an order. Yet, they collectively build optimized trail networks and adapt to disruptions in seconds. The trick is that they do not need to talk to each other (depending on the species of course), they write down the information on the ground and read it back, all from a shared memory (chemical memory).\nReproducing this in simulation means building a data structure that every agent perceives, every agent modifies, and that evolves according to its own physical laws (that we have to respect) between agent interactions. That is the pheromone grid.\nTwo Layers for One Grid In real ant colonies, chemical communication does not rely on a single signal but on a combination of pheromones with complementary roles. Dussutour et al. (2009) experimentally showed that some species deposit a long-lasting pheromone while exploring their territory - even in the absence of food - and a short-lasting, much stronger pheromone when returning to the nest with food. The first one builds a collective memory of the environment, while the second triggers targeted recruitment towards an identified food source. This distinction, grounded to real biology, justifies the use of two separate pheromone in the simulations. 
Rather than just simplifying the system to a single signal, keeping this duality opens the door to richer collective behaviors - and gives the neural network the freedom to exploit this complexity on its own. If possible we\u0026rsquo;ll be able to implement a more complex system of pheromones. For now, we\u0026rsquo;ll focus on the two following signals :\nHOME pheromone : left by ants leaving the nest, marking the explored territory. FOOD pheromone : left by ants carrying food, marking the route toward food sources for other individuals. Together, these two signals are enough to produce the core emergent behavior we are looking for : the formation of efficient foraging trails from a colony of individually simple agents.\nFirst, the grid is implemented as a single NumPy array of shape (2, HEIGHT, WIDTH). The first axis selects the pheromone type; the two remaining axes are the spatial dimensions (2D). Keeping both layers in a single object is really convenient since it will reduce the time and memory complexity of the simulation : operations that apply to all pheromone types simultaneously, evaporation, diffusion, can be expressed as a single array operation with no Python loop which is much more efficient.\nself.grids = np.zeros((2, GRID_HEIGHT, GRID_WIDTH)) The two types of pheromones are called via class constants HOME = 0 and FOOD = 1. All of this will allow me to review the code and the structure faster later : grids[PheromoneGrid.HOME] is simple in a way that grids[0] is not.\nThe Physics of Pheromones Once deposited, a pheromone does not stay at the deposit point forever. To represent that, we use two physical processes shaping its evolution over time : evaporation and diffusion. Getting these two processes right is the core scientific work of this step of the simulation.\nEvaporation First, evaporation is the simplest of the two to implement. 
Over time, we can simply imagine that the pheromone concentration decreases due to chemical degradation, substrate absorption and air condition. Then, what is the mathematical form of this decay ?\nEdelstein-Keshet et al. (1995) gave us the answer. They use a first-order kinetic model (an exponential decay) for pheromone evaporation, it is for them the most convenient way of representing decay :\n$$\\frac{dC}{dt} = -r \\cdot C$$\nThis is the same equation that governs radioactive decay, capacitor discharge, and countless other physical systems. Its solution is the exponential :\n$$C(t) = C(0) \\cdot e^{-rt}$$\nThen we discretize this equation into a geometrical solution :\n$$C_{t+1} = C_t (1 - \\rho)$$\nIn my code, EVAPORATION_RATE stores the factor $(1 - \\rho)$, so that the update is a single multiplication without any loop over the entire 3D array :\nself.grids *= EVAPORATION_RATE What makes this result practically useful is that Edelstein-Keshet also measured $r$ for several species, we\u0026rsquo;ll see in the future if we use the different values :\nSpecies Time of decay Rate $r$ in $s^{-1}$ Atta texana \u0026gt; 6 days very small Eciton burchelli 2 to 8 days $4 \\times 10^{-7}$ Iridomyrmex humilis ~30 min $5 \\times 10^{-4}$ Solenopsis saevissima 104 s to 20 min 0.008 to 0.12 Myrmica rubra 2 to 3 min 0.005 to 0.008 Pogonomyrmex badius ~35 sec 0.028 These values describes five orders of magnitude. Then, the choice of the species directly set the entire timescale of the simulation. Furthermore, we have to decide which configuration of the evaporation and diffusion we want to use because it will describe a certain species and set the time of the simulation. 
For example, a slow-evaporating species like Atta texana produces persistent trails, while a fast-evaporating species like Pogonomyrmex badius produces short-lived signals that require constant reinforcement.\nDiffusion But there is also another process that takes place : Diffusion.\nMolecules spread to the surrounding cells by diffusion, creating a spatial gradient that ants can detect and follow. Following such a gradient is called chemotaxis, and it is how ants detect trails.\nThe governing equation is the classical diffusion equation, a Partial Differential Equation (PDE) :\n$$\\frac{\\partial C}{\\partial t} = D \\cdot \\nabla^2 C$$\nwhere $D$ is the diffusion coefficient. Solving this numerically is computationally expensive.\nTo implement the diffusion of pheromones, we choose to use a Gaussian filter because of the fundamental mathematical equivalence between linear diffusion and a Gaussian convolution. As highlighted in Prof. Daniel Cremers\u0026rsquo; lectures (Variational Methods for Computer Vision), a Gaussian filter does not merely approximate the solution of the linear diffusion equation : on an unbounded domain, convolving the initial concentration with a Gaussian of standard deviation $\\sigma = \\sqrt{2Dt}$ is the exact solution.\nThis implementation strategy is a well-established standard in scientific computing :\nWeickert et al. (1998) validate this solution. This method also guarantees that :\nMass conservation : the total amount of pheromone in the grid is unchanged by diffusion alone. Convergence : without any deposit, the grid will converge to a uniform concentration and then decay to zero with evaporation. Maximum-minimum principle : diffusion never creates a value above the current maximum or below the current minimum. 
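The properties listed above can be checked numerically. The following is only a minimal sketch, not the project's code : it assumes SciPy's gaussian_filter with zero padding outside the grid (mode='constant', cval=0.0), a hypothetical SIGMA of 1.0 and a hypothetical 41 x 41 grid. Passing a sigma of 0 on the first axis is one way to keep the two pheromone layers separate. Under these assumptions, mass is conserved only while the pheromone stays away from the borders : a deposit in a corner leaks out of the grid.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

SIGMA = 1.0  # hypothetical value, not a calibrated diffusion parameter


def diffuse(grids, sigma=SIGMA):
    # A sigma of 0 on the first axis leaves the pheromone-type axis untouched.
    # Zero padding treats everything outside the grid as empty (an assumption).
    return gaussian_filter(grids, sigma=(0, sigma, sigma), mode='constant', cval=0.0)


# One unit of pheromone in layer 0, in the middle of the grid
center = np.zeros((2, 41, 41))
center[0, 20, 20] = 1.0
after_center = diffuse(center)

# The same deposit, but in a corner cell
corner = np.zeros((2, 41, 41))
corner[0, 0, 0] = 1.0
after_corner = diffuse(corner)

print(after_center[0].sum())  # total mass of the central deposit after diffusion
print(after_center[0].max())  # the peak is lower than the initial deposit
print(after_center[1].sum())  # the second layer receives nothing
print(after_corner[0].sum())  # part of the corner deposit is lost through the border
```

If border losses matter for the calibration, a reflecting padding mode would be one alternative to explore; that is a design choice this sketch does not settle.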
Here is the implementation of the gaussian filter in the code (we will find the appropriate parameters later):\ngaussian_filter(self.grids, sigma = DIFFUSION_SIGMA, order = 0, output = self.grids, mode = \u0026#39;constant\u0026#39;, cval = 0.0, truncate = 7.0, radius = None, axes = (1,2) ) Finding the good parameters for the diffusion process is a real challenge, in the literature, most of the models mix the evaporation and the diffusion processes, both Edelstein et al. (1995) and Watmough et al. (1995) do not consider the diffusion process by itself and therefore $D$ remains a free parameter. In order to find the good parameters, I will have to test different configuration and find the best trade off that respect the biological data find in the literature.\nThe axes=(1, 2) argument deserves attention. Without it, gaussian_filter would treat the array as a 3D object and blur along all three axes - including the first one, which separates HOME and FOOD. The result would be a mix of the two pheromone types, which is physically wrong. Restricting the filter to the spatial axes only is mandatory and will allow the two grid to not mix together.\nThe Complete Update Combining both processes, the full pheromone update at each time step is :\n$$C_{t+1} = \\text{GaussianFilter}_\\sigma\\left(C_t \\cdot (1 - \\rho)\\right) + \\text{Deposit}_t$$\nEvaporation happens first, then diffusion spreads what remains, then new deposits are added by the ants. This ordering is a modeling choice : it means diffusion propagates the already-decayed signal. The alternative (diffusion then evaporation) is also possible. 
Neither is strictly more biologically correct - both of them are physically correct.\nA final threshold removes values below $10^{-4}$ to prevent noise from accumulating over thousands of iterations and from making the calculations less stable :\nself.grids[self.grids \u0026lt; 0.0001] = 0 From Grid to Screen The pheromone grid is internal state - to observe it during development, we need to convert it to pixels. The Environment class handles this with a fully vectorized approach (RGB approach). Each cell on the grid is converted to a triplet of RGB values. R for red, G for green, B for blue.\nThe rendering uses a dominant pheromone logic : at each cell, whichever concentration is higher determines the pixel color. HOME maps to one configured color (brown), FOOD to another (green), and the brightness scales linearly with concentration (The pheromone values are scaled from $0$ to $1$). The key is that this is done with NumPy operations over the entire grid at once - no Python loop over cells which is much slower than vectorized operations .\nhome_stronger = home \u0026gt; food red = np.where(home_stronger, home * COLOR_HOME[0], food * COLOR_FOOD[0]) green = np.where(home_stronger, home * COLOR_HOME[1], food * COLOR_FOOD[1]) blue = np.where(home_stronger, home * COLOR_HOME[2], food * COLOR_FOOD[2]) In addition, NumPy indexes arrays as (row, column), which equals to (y, x). Pygame\u0026rsquo;s surfarray, however, expects (width, height), which equals to (x, y). A transposition is mandatory before passing the array to pygame, otherwise the entire image would be rotated by 90 degrees :\npygame.surfarray.make_surface(np.transpose(env_surface, (1, 0, 2))) Please note that we only transpose the array on the first and second dimensions, not the third since this one derives the color of the pixel.\nTesting Without Ants Since no ant agent exists yet, how do we verify the grid behaves correctly ? The answer is in tests/tests_pheromones.py. 
It reads mouse input each frame via pygame.mouse.get_pressed() and deposits pheromones wherever the cursor is held down. Left click is for HOME, right click is for FOOD.\nThis is enough to check if the grid is updated correctly :\nDeposits appear at the correct location. Evaporation gradually turns the signal off over time, proportionally to concentration. Diffusion spreads the concentration in a smooth gradient - visible as the color spreads away from the deposit point each frame. The nest is displayed above the pheromones. What we can already observe at this stage is that the two pheromones spreads and evaporate independently of each other.\nWhat Remains Open A few points need to be resolved before this step is fully validated :\nDiffusion coefficient $D$: No biologically calibrated value exists for ant trail pheromones. It will be determined empirically by matching the simulated active space to the biological literature values.\nDeposit model for agents: The classic ACO (Ant Colony Optimization) formula $\\Delta\\tau = Q / L_k$ (Dorigo et al. 2000) was derived for graph-based ACO, not for 2D spatial grids. An adapted deposit model will be needed once ant agents are introduced in Step 2. The biological evidence (Watmough and Edelstein-Keshet, 1995) suggests a deposit amount on the order of $0.6 \\times C_s$, where $C_s$ is the saturation concentration, as a starting point. Despite the fact that this is a simplification, it is a good starting point.\nSummary The pheromone grid is a (2, H, W) NumPy array updated each tick by three sequential operations: geometrical decay (evaporation, derived in Edelstein-Keshet\u0026rsquo;s first-order kinetic model), Gaussian convolution (diffusion, exactly equivalent to solving the linear PDE), and addition (deposit). 
The rendering converts concentration values to an RGB image using vectorized NumPy operations, with a transposition required for pygame Surface compatibility.\nThe grid is now physically coherent and visually observable. The next step is to populate it with agents.\nReferences Edelstein-Keshet L., Watmough J., \u0026amp; Ermentrout G. B. (1995). Trail following in ants: individual properties determine population behaviour.\nWeickert J., ter Haar Romeny, B. M., \u0026amp; Viergever M. A. (1998). Efficient and reliable schemes for nonlinear diffusion filtering.\nvon Thienen W., Metzler D., Choe D.-H., \u0026amp; Witte V. (2014). Pheromone communication in ants: a detailed analysis of concentration-dependent decisions in three species.\nDorigo M., Bonabeau E., \u0026amp; Theraulaz G. (2000). Ant algorithms and stigmergy.\nWatmough J., \u0026amp; Edelstein-Keshet L. (1995). Modelling the Formation of Trail Networks by Foraging Ants.\nDussutour A., Nicolis S.C., Shepard G., Beekman M., \u0026amp; Sumpter D.J.T. (2009). The role of multiple pheromones in food recruitment by ants.\n","permalink":"https://matteovacher.github.io/news/myrmico-lab/first-of-the-project/","summary":"Building the pheromone grid : the substrate every ant reads from and writes to, grounded in biological and mathematical models.","title":"Step 1 of Ant Simulation Project : Pheromones"},{"content":"Introduction Evolutionary Strategies (ES) are a powerful class of stochastic search algorithms that have been used to solve a wide range of optimization problems. While traditional optimization often relies on calculating exact derivatives, ES excels in environments where the objective function is unknown, non-differentiable, or noisy. In the notebook of the elective, we used standard test functions like the Himmelblau function or the Rosenbrock function.\nEvolutionary Strategies Building on what we previously explored in the Genetic Algorithms (GA) section, we now dive into Evolutionary Strategies (ES). 
While GA often focuses on discrete populations, ES is the fundamental tool for continuous optimization.\nFrom Individual Survival to Population Exploration As we\u0026rsquo;ve seen before :\nThe $(1 + 1)$ ES : is the simplest form of ES, a single parent produces one child and the parent is replaced only if the child is better. The $(1 + \\lambda)$ ES : is a more complex form of ES, a single parent produces $\\lambda$ children and the parent is replaced only if the best child is better. Exploration vs Speed While the $(1 + 1)$ ES moves quickly, the $(1 + \\lambda)$ ES is more exploratory, which helps to avoid getting stuck in local optima. But be careful : generating more points gives you better information on the landscape, but it can become computationally expensive (at least for complex simulations).\nApproximating the Gradient What is brilliant with ES is that it allows us to approximate the gradient of a function without calculating its derivative.\nBased on Performance The algorithm samples multiple points around the parent, using for example a normal distribution, and by observing which offspring perform better than the others, we can identify a promising direction to move in the search space.\nNormalization In order to make this process robust, we normalize the fitness values.\nThe Approximation After some non-intuitive calculus, we can approximate the gradient of the function using the following formula :\n$$ \\nabla f \\approx \\pm \\frac{A \\cdot N}{\\lambda} $$\nwhere :\n$N$ is the matrix of noise vectors sampled from a normal distribution $\\mathcal{N}(0, 1)$. Here, for the whole population, $N$ is a matrix of size $[\\lambda, d]$ where $d$ is the dimension of the problem.\n$A$ is the standardized fitness score of the population. It is a vector that contains the standardized fitness score of each individual in the population. 
This means that for each individual $i$, $A_i = \\frac{f(x_i) - \\mu(f(x))}{\\sigma(f(x))}$ where $f(x)$ is the fitness function and $\\mu(f(x))$ and $\\sigma(f(x))$ are the mean and standard deviation of the population.\n$\\lambda$ is the number of offspring.\nThe Learning Rate Lastly we define the new center of the future normal distribution by :\n$$ x = x + \\alpha \\frac{A \\cdot N}{\\lambda} $$\nThe Algorithm With all of this we can build our algorithm :\nAlgorithm 1: Evolutionary Strategies $(\\mu, \\lambda)$\nInitializing parent x in the search space Set learning rate alpha and the population size of the offspring lambda and the standard deviation of the noise sigma For each generation g from 1 to G : # 1. Generating the offsprings N = [N1, N2, ..., Nlambda] from a normal distribution N(0, 1) F = [f1=0, f2=0, ..., flambda=0] For each offspring i from 1 to lambda : ind_i = x + sigma *N[i] F[i] = f(ind_i) # Here we can keep track of the best individual in the population. End For # 2. Normalization mu_f = mean(F) sigma_f = std(F) A = [A1, A2, ..., Alambda] = [(f1 - mu_f) / sigma_f, (f2 - mu_f) / sigma_f, ..., (flambda - mu_f) / sigma_f] # 3. Approximation of the gradient gradient_approx = dot(A, N) / lambda # 4. Update parent x = x + alpha * gradient_approx End For Return x or the best individual in the population. To Summary In this approach we turned a blind and random search into a gradient descent. 
We no longer need to know the derivative of our landscape, we just feel where we should move.\nRank-Based Updates In ES, rank-based updates are a robust method for moving toward a solution by focusing on the relative ordering of individuals rather than their absolute fitness scores.\nThe concept Instead of using fitness values to approximate a gradient, the algorithm sorts the individuals (the offspring here) from the best to the worst based on their performance.\nSelection of $\\mu$ individuals : Usually, not all individuals are useful for finding the center of the search. In the notebook, we typically select the best 50% of the population.\nWeight update : Then, we create a set of weights that decreases logarithmically from the best to the worst selected individual. This ensures that the most promising individuals have more influence on where the algorithm moves.\nWhy Reducing noise : By only using ranks, the worst individuals do not weigh as much as the best ones, and the algorithm is not disturbed by individuals with bad performance.\nAvoiding local optima : focusing only on the top individuals prevents the search from being pulled away from a nearby optimum.\nInvariance property : I didn\u0026rsquo;t really notice this one at first, but rank-based updates are invariant to order-preserving transformations. 
That means if you scale the fitness values (multiply by a scalar take the log), the update remains exactly the same as long as the ranking remains the same.\nThe Algorithm Here is the associated algorithm :\nAlgorithm 2: Rank-based Evolutionary Strategies $(\\mu, \\lambda)$\nInitializing parent x in the search space Set the learning rate sigma (or standard deviation) and the population size of the offspring lambda and the number of elites mu # Compute the weights based on log rank w = [w1 = (log(mu + 1/2) - log(1)), w2 = (log(mu + 1/2) - log(2)), ..., wmu = (log(mu + 1/2) - log(mu))] w = w / sum(w) For each generation g from 1 to G : epsilon = Normal(0, 1, size=(lambda, d)) # where d is the dimension of the problem f = [f1, f2, ..., flambda] # 1. Generating the offsprings For each individual i from 1 to lambda : x_i = x + sigma * epsilon[i] f[i] = f(x_i) end For # 2. Sorting the offsprings sorted_f = argsort(f) # where the best is at index 0 # 3. Approximation of the gradient gradient_approx = dot(w, epsilon[sorted_f[:mu]]) = N[first]*w1 + N[second]*w2 + ... + N[last]*wmu # 4. Update parent x = x + sigma * gradient_approx End For Return x or the best individual in the population. This algorithm is called Canonical ES.\n","permalink":"https://matteovacher.github.io/resources/courses-evo/evo_strat/","summary":"Synthesis of the core concepts of Evolutionary Strategies (ES)","title":"Evolutionary Strategies (ES) - Evolutionary Computation Elective by Prof. Dennis Wilson, ISAE-SUPAERO"},{"content":"Introduction to Evolutionary Computation Individuals and Genes The basic unit of an EA (Evolutionary Algorithm) is the individual, which represents a potential solution to a given problem. In this tutorial, individuals are represented by binary strings, consisting of genes that are either 0 or 1. 
Each individual of the population is initialized with random genes and a default fitness score of zero.\nEvaluation and Objective Functions What is an objective function ?\nAn objective function (or fitness function) is a function that gives a score to the individual (potential solution), i.e. it tells us how good the individual is on the given problem. For each problem, there could be multiple objective functions. Each of them will give a different score to the individual. In the first Notebook we have two objective functions :\nOneMax : This function returns the sum of all the bits in the genotype. The aim is to find the individual with the maximum number of 1 bits. LeadingOnes : This function counts the number of consecutive 1s at the start of the genotype, reading from the left and stopping at the first 0. The (1+1) Evolutionary Algorithm In Evolutionary Strategies (ES), population management is defined by the $(\\mu / \\rho , \\lambda)$ or $(\\mu / \\rho + \\lambda)$ notation. Here, $\\mu$ is the number of parents, $\\rho$ is the number of parents that are combined to create each offspring, and finally, $\\lambda$ is the number of offspring. Also, note that the notation with the $+$ indicates that the parents creating the offspring can be part of the next generation while the $,$ indicates that the parents will not be part of the next generation. For the $+$ notation, whether the parents are included or not in the next generation depends entirely on their fitness compared to the fitness of their offspring.\nWell then, what is the $(1 + 1)$ EA ? This algorithm follows really simple steps :\nInitialization : Create an individual by drawing a random bit string. Mutation : Create a child by flipping each bit of the parent with a certain probability. Selection : Select (by comparing their fitness) the best individual between the parent and the child. Iteration : repeat the process until a stopping condition is met. 
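The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the notebook's code : the function names, the mutation rate of 1/n_genes, the rule that a child replaces the parent when it is at least as good, and the stopping condition (all bits set to 1, or a budget of evaluations) are assumptions.

```python
import random


def one_max(genome):
    # OneMax : number of bits set to 1
    return sum(genome)


def leading_ones(genome):
    # LeadingOnes : count the 1s from the left, stopping at the first 0
    count = 0
    for bit in genome:
        if bit == 0:
            break
        count += 1
    return count


def one_plus_one_ea(fitness, n_genes=20, max_evals=5000, seed=0):
    rng = random.Random(seed)
    rate = 1.0 / n_genes  # assumed mutation rate
    # Initialization : one random bit string
    parent = [rng.randint(0, 1) for _ in range(n_genes)]
    parent_fit = fitness(parent)
    history = [parent_fit]
    for _ in range(max_evals):
        # Assumed stopping condition : the optimum of both objectives is n_genes
        if parent_fit == n_genes:
            break
        # Mutation : flip each bit independently with probability rate
        child = [1 - bit if rng.random() < rate else bit for bit in parent]
        child_fit = fitness(child)
        # Selection : keep the child if it is at least as good (assumed tie rule)
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
        history.append(parent_fit)
    return parent, parent_fit, history


best, fit, history = one_plus_one_ea(one_max, n_genes=20, max_evals=500, seed=1)
print(fit, len(history))
```

Because the parent is only ever replaced by a child that is at least as good, the fitness recorded in history never decreases from one iteration to the next.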
Performance This algorithm is the simplest one possible. Its performance depends primarily on the problem and on the objective function.\nThe $(1 + \\lambda)$ EA The Algorithm In the previous algorithm, a single parent produces a single offspring. In this algorithm, a single parent produces $\\lambda$ offspring. The best child is then compared to the parent, and if the child has a higher fitness, the parent is replaced by the child.\nThe Performance The $(1 + \\lambda)$ EA usually requires fewer generations to converge, but it performs more evaluations per generation ($\\lambda$ to be exact). In order to compare the $(1 + 1)$ and the $(1 + \\lambda)$ algorithms, we need to compare the total number of evaluations and not only the number of generations.\nMore on These Algorithms Finally, the choice of the mutation rate and the population size significantly impacts the performance of the EA.\nGenetic Algorithms What are GA ? GA are a class of stochastic optimization techniques that are highly adaptable to specific problem constraints. GA can, for example, replace gradient-based optimization algorithms. In the notebook of the elective we study the case of the Traveling Salesman Problem (TSP), a problem where a salesman must visit every city in a list exactly once while minimizing the total distance traveled.\nGA are basically composed of different steps :\nThe population : The population is a set of individuals that represent potential solutions to the problem. The Evaluation : Each individual is evaluated using a fitness function to determine its \u0026ldquo;fitness\u0026rdquo;. Here, we want to minimize the distance traveled in the 2D space. The Selection : The selection is then the process of selecting the best individuals from the population. Several methods are available. The Crossover : The crossover is the process of creating new individuals by combining the genes of two parents. 
The Mutation : The mutation is then the process of modifying the genes of an individual to explore new solutions. By repeating these steps, we can generate a new population of individuals that are more likely to be good solutions to the problem.\nThe Population The first step in a GA is defining the representation of a solution, known as the genome. For the TSP, a genome is represented as a permutation of city indices, ensuring each city is visited exactly once.\nUnlike simpler ES, GA typically use a large population size, typically around 100-1000.\nMaintaining a large population size allows for more diversity in the population, which can lead to better solutions by looking for multiples different potential solutions at the same time.\nThis diversity is the key to avoiding being trapped in a local optimum.\nThe Evaluation To guide the evolution, every individual in the population must be assessed by an objective function to determine their fitness.\nIn the context of the TSP, the fitness is measured by the total distance traveled by the salesman.\nThe objective function of an individual $I$ (which is represented by a list of the cities visited) is then expressed as : $$f(I) = \\sum_{i=1}^{n-1} \\text{dist}(p_{i-1}, p_i)$$ with :\n$n$ is the number of cities. $p_{i}$ is the index of the city at position $i$ in the tour. $\\text{dist}(a, b)$ is the pre-computed distance between city $a$ and city $b$ from the distance matrix. The selection The selection is the process of deciding which individuals will be selected and will pass their genes to the next generation. The goal is to favor the best-performing individuals while preserving the diversity in the group.\nHere, there exists different type of selection methods :\nTruncation Selection : This really simple method (compared to CMA-ES, which we\u0026rsquo;ll see later) simply takes the top percentage of the population. 
It is a really efficient method but, just as in nature, keeping only the top individuals does not preserve the diversity of the population : it may lose diversity too quickly.\nFitness Proportionate Selection : This gives every individual a chance to be selected, with a probability proportional to its fitness.\nTournament Selection : We first choose a subset of individuals randomly from the population. Then the best among them is given the right to reproduce. This method relies more on the rank than on the absolute value of the fitness, which means that if you are selected to reproduce it is because you are the best of the subset and not because you crush your opponents.\nThe Crossover The crossover is the \u0026ldquo;sexual\u0026rdquo; part of the algorithm where the information from two parents is combined to create an offspring.\nIn many problems, a one-point crossover is used to swap genetic segments after a random point.\nNevertheless, for the TSP, a standard crossover can break constraints by creating an offspring that visits a city twice.\nTo solve this, we use a specialized operator like the Edge Recombination Operator (ERX), which builds a child by prioritizing the existing links between the cities found in both parents.\nThe Mutation The mutation step is then necessary to introduce entirely new and random genetic material into the population.\nMutation usually modifies an individual by changing only a small fraction of its genes (typically at a rate of 1/n_genes).\nTo respect TSP constraints, we swap the order of a single random pair of cities.\nThis step is vital for maintaining genetic diversity and allows the algorithm to search beyond the combinations already present in the population.\nThe Process Then we combine all of those steps and we obtain the algorithm. 
For example, a typical GA looks like this :\nAlgorithm 1: General Genetic Algorithm\nInitialize population P with N individuals\nFor each generation g from 1 to G:\n    F = Evaluate(P, objective_function)\n    # 1. Elitism: select the top percentage of the population\n    P_next = TruncationSelection(P, F, proportion=0.15)\n    # 2. Reproduction: fill the rest of the population\n    While size(P_next) \u0026lt; N:\n        # Selection of parents\n        parent1 = TournamentSelection(P, F)\n        parent2 = TournamentSelection(P, F)\n        # Creation of offspring\n        child = Crossover(parent1, parent2)\n        child = Mutate(child)\n        # Integration into the new population\n        P_next = add(P_next, child)\n    End While\n    # 3. Transition to the next generation\n    P = P_next\n    Record best(F)\nEnd For\nReturn the best individual found in P\nBy combining these steps, along with elitism, which passes the very best individuals directly to the next generation, the Genetic Algorithm iteratively refines its population toward an optimal solution.\nFinal Remarks By effectively balancing the preservation of elite individuals with the continuous introduction of novel genetic traits, the Genetic Algorithm shifts from a mere random search to a sophisticated, directed exploration of the solution space. This iterative refinement allows the population to escape local optima, eventually converging toward a robust and highly optimized solution.\nIn the next articles, we will explore more advanced strategies: the Evolutionary Strategies (ES) and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), and see how they handle even more complex continuous optimization landscapes.\n","permalink":"https://matteovacher.github.io/resources/courses-evo/ga/","summary":"Synthesis of the core concepts of Genetic Algorithms (GA)","title":"Genetic Algorithms (GA) - Evolutionary Computation Elective by Prof. Dennis Wilson, ISAE-SUPAERO"},{"content":"Elective Synthesis This elective course is a synthesis of the concepts of evolutionary computation. 
I want to document the core principles I\u0026rsquo;ve learned through this elective so that I can go back to them later, if needed, in my different projects.\nWhy this \u0026ldquo;Resources\u0026rdquo; section ? I have decided to document the cornerstones of this course and of the different papers that I read for two main reasons :\nReinforcement Learning : Explaining complex concepts is for me the best way to master them. Research base : These summaries will serve as a starting point for my own projects. Key Takeaways The approach chosen by Prof. Dennis Wilson allowed us to navigate through different strategies and algorithms to solve different complex problems :\nGenetic Algorithms (GA) \u0026amp; Evolutionary Strategies (ES) : Here, we understand the fundamental mechanics of selection, crossover and mutation to explore the parameter space. Neuroevolution : Here, we study the connection between evolutionary algorithms and neural networks. Multi-Objective Evolution : Here, we learn how to optimize multiple, possibly competing, objectives at the same time. Genetic Programming : Here, we evolve actual computer programs or mathematical expressions. ","permalink":"https://matteovacher.github.io/resources/courses-evo/intro_elective/","summary":"Personal synthesis to remind me of the course content","title":"Introduction to the Evolutionary Computation Elective by Prof. Dennis Wilson, ISAE-SUPAERO"},{"content":"The Start of the Road This post marks the beginning of my journey. My goal is to explore the world of collective intelligence and multi-agent systems through ant colony simulations.\nInstead of waiting for a finished product, I prefer to use this website as a space where I document my work as it happens. I still have no idea where all of this will lead, but I hope that I will observe interesting simulations.\nCurrent Context \u0026amp; Acknowledgements This research is directly linked to the elective course taught by Prof. Dennis Wilson (ISAE-SUPAERO). 
It serves as both the academic foundation and the practical starting point for everything I am building right now.\nThe concepts of evolutionary computation and complex systems explored in his course will be the foundation of my current project.\nProjet Dennis Wilson I particularly want to thank Dennis Wilson for his invaluable support, both for this project and others we have discussed. He has shown incredible patience and provided essential guidance at times when I struggled to find the right words to describe the ideas I wanted to explore. His mentorship has been the catalyst for this journey.\nThe Vision My goal is to visualize thousands of digital agents interacting in real time. I want to see paths forming, food sources being depleted, aphids raised and protected against ladybugs, and the colony adapting to obstacles.\nWhat I will share I plan to share my progress here.\nInitial Ideas : Concepts and questions that I am exploring Progress : Seeing how my simulations and results evolve with my code Tools : Notes on the tools I will use Fails : I also want to share when I fail From Theory to the Real World Coming from a background of \u0026ldquo;Classes Préparatoires\u0026rdquo; followed by my first two years at ISAE-SUPAERO, I have written a lot of code in C, Java, MATLAB, and Python. However, my experience has been mostly focused on the theoretical aspect of computer science.\nWith these projects, I want to bridge that gap. I want to confront the reality of practice: the bugs and unexpected behaviors of a live simulation. Documenting my failures is, for me, part of becoming a better engineer and will allow others to see where to be careful.\n","permalink":"https://matteovacher.github.io/news/myrmico-lab/the-beginning/","summary":"Starting a journey into collective intelligence and multi-agent systems. 
A look at the project\u0026rsquo;s origins, from theory to live ant colony simulations.","title":"The Beginning"},{"content":"Mattéo Vacher Student \u0026amp; Researcher, AI Enthusiast\nI am a student passionate about the intersection of myrmecology (the study of ants), neuro-ethology (the study of the neural mechanisms responsible for animal behavior) and Artificial Intelligence. I am currently researching the evolution of soft robots at the University of Tsukuba, Japan with Prof. Claus Aranha.\nResearch Interests I am interested in :\nCollective Intelligence : How simple individuals solve complex problems together. Bio-Inspired AI : Applying biological knowledge to AI. Simulations : Developing efficient models on my computer to visualize agent-based behaviors. Experience 2026 - Present : University of Tsukuba (Tsukuba, Japan) Exchange Research Student : Working on the EvoGym library and the evolution of single genome soft robots. Education 2024 - Present : ISAE-SUPAERO (Toulouse, France) Master\u0026rsquo;s in Engineering : General engineering training. I will be specializing in Data \u0026amp; Decision Sciences for my final year. Developed a strong foundation in Python programming, Automatic Control, Java and C. Currently learning C++ on my own. 2022 - 2024 : Lycée Saint-Louis (Paris, France) CPGE MPSI/PSI* : Intensive French Mathematics and Physics Preparatory Classes. Projects \u0026amp; Research Myrmeco-Lab This website serves as my digital laboratory. I will document my progress in building ant simulations that mimic ants\u0026rsquo; natural behaviors.\nCurrent Research Exchange Research Student at the University of Tsukuba, Japan (May - Sept 2026) researching the evolution of robot structures and neural controllers with Prof. Claus Aranha and Prof. Dennis G. Wilson.\nPersonal I love running, sports, reading, TV series and board games. I ran my first marathon in Toulouse at 20 years old, and my objective is to run an ultra-marathon. 
I maintain an ant colony at home (Messor barbarus) and I love watching their organization in nature.\n","permalink":"https://matteovacher.github.io/about/","summary":"\u003ch2 id=\"mattéo-vacher\"\u003eMattéo Vacher\u003c/h2\u003e\n\u003cp\u003e\u003cstrong\u003eStudent \u0026amp; Researcher, AI Enthusiast\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eI am a student passionate about the intersection of \u003cstrong\u003emyrmecology\u003c/strong\u003e (the study of ants), \u003cstrong\u003eneuro-ethology\u003c/strong\u003e (the study of the neural mechanisms responsible for animal behavior) and \u003cstrong\u003eArtificial Intelligence\u003c/strong\u003e. I am currently researching the evolution of soft robots at the University of Tsukuba, Japan with Prof. Claus Aranha.\u003c/p\u003e\n\u003chr\u003e\n\u003ch3 id=\"research-interests\"\u003eResearch Interests\u003c/h3\u003e\n\u003cp\u003eI am interested in :\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eCollective Intelligence :\u003c/strong\u003e How simple individuals solve complex problems together.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eBio-Inspired AI :\u003c/strong\u003e Applying biological knowledge to AI.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eSimulations :\u003c/strong\u003e Developing efficient models on my computer to visualize agent-based behaviors.\u003c/li\u003e\n\u003c/ul\u003e\n\u003chr\u003e\n\u003ch3 id=\"experience\"\u003eExperience\u003c/h3\u003e\n\u003ch4 id=\"2026---present--university-of-tsukuba-tsukuba-japan\"\u003e\u003cstrong\u003e2026 - Present :\u003c/strong\u003e University of Tsukuba (Tsukuba, Japan)\u003c/h4\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eExchange Research Student :\u003c/strong\u003e Working on the EvoGym library and the evolution of single genome soft robots.\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch3 id=\"education\"\u003eEducation\u003c/h3\u003e\n\u003ch4 
id=\"2024---present--isae-supaero-toulouse-france\"\u003e\u003cstrong\u003e2024 - Present :\u003c/strong\u003e ISAE-SUPAERO (Toulouse, France)\u003c/h4\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eMaster\u0026rsquo;s in Engineering :\u003c/strong\u003e General engineering training. I will be specializing in Data \u0026amp; Decision Sciences for my final year.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eDeveloped\u003c/strong\u003e a strong foundation in Python programming, Automatic Control, Java and C. Currently learning C++ on my own.\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch4 id=\"2022---2024--lycée-saint-louis-paris-france\"\u003e\u003cstrong\u003e2022 - 2024 :\u003c/strong\u003e Lycée Saint-Louis (Paris, France)\u003c/h4\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eCPGE MPSI/PSI\u003c/strong\u003e* \u003cstrong\u003e:\u003c/strong\u003e Intensive French Mathematics and Physics Preparatory Classes.\u003c/li\u003e\n\u003c/ul\u003e\n\u003chr\u003e\n\u003ch3 id=\"projects--research\"\u003eProjects \u0026amp; Research\u003c/h3\u003e\n\u003ch4 id=\"myrmeco-lab\"\u003eMyrmeco-Lab\u003c/h4\u003e\n\u003cp\u003eThis website serves as my digital laboratory. I will document my progress in building ant simulations that mimic ants\u0026rsquo; natural behaviors.\u003c/p\u003e","title":"About"}]