The Catalyst project is coming to an end, and its developments are being tested in realistic conditions across the partners’ DCs. The goal of the Catalyst project is to use the features and specificities of DCs to bring flexibility to the electricity grid. At Qarnot, we investigated the flexibility that can be offered by leveraging the IT workloads running on the servers, and thereby managing power consumption, which is at the same time the local heat production.
The first pilot realized by Qarnot aimed at providing demand response features locally. In an Edge DC, the amount of electrical energy used in a building or a city may be capped, as the infrastructure and the supply contracts may not be as favorable as for large DCs. At the same time, Qarnot is expected to provide heat according to users’ needs. It is therefore valuable for Qarnot to be able to reduce its energy consumption / heat production at a local level, either to comply with local needs or with external requests.
The difficulty is to quantify the flexibility of the local resources. To do so, a Catalyst component, the Energy Optimizer (EO), has been developed. By monitoring the infrastructure, the tool forecasts energy consumption and proposes optimisation plans. The AI algorithms for the Catalyst Energy Optimizer were developed by the University of Cluj-Napoca.
We deployed this component on our Scalemax production site near Paris, an Edge DC currently running at about 100 kW. This site has very few of the flexibility features available in a “regular DC”, such as a UPS, and its cooling system is limited to the bare minimum. Nevertheless, for the pilot we identified “delay-tolerant” IT workloads to assess the site’s flexibility using the optimisation plans proposed by the EO.
During the pilot tests, we ran the EO and manually executed optimisation plans to pause delay-tolerant workloads and observe the actual reduction in energy consumption. Before applying the plans, we measured a demand response potential of about 72 kW of delay-tolerant workload. The EO decided to use this capacity in two load-shifting batches of 28.6 kW and 19.07 kW, for different durations. We manually postponed the delay-tolerant workloads and observed a corresponding actual reduction in electrical load of about 51 kW.
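The EO’s internal algorithms are not detailed here, but the selection step can be illustrated with a minimal sketch: a greedy pass that picks delay-tolerant workloads (largest first) until a requested power reduction is covered. The workload names and the simple greedy rule are assumptions for illustration, not the EO’s actual method; the power figures are loosely inspired by the pilot’s numbers.

```python
def plan_shedding(workloads, target_kw):
    """Greedily pick delay-tolerant workloads to pause until the
    requested power reduction (in kW) is covered."""
    # Consider only delay-tolerant workloads, largest consumers first,
    # so as few workloads as possible are paused.
    candidates = sorted(
        (w for w in workloads if w["delay_tolerant"]),
        key=lambda w: w["power_kw"],
        reverse=True,
    )
    plan, shed_kw = [], 0.0
    for w in candidates:
        if shed_kw >= target_kw:
            break
        plan.append(w["name"])
        shed_kw += w["power_kw"]
    return plan, shed_kw

# Hypothetical workload inventory, loosely inspired by the pilot's figures.
workloads = [
    {"name": "render-batch-1", "power_kw": 28.6, "delay_tolerant": True},
    {"name": "render-batch-2", "power_kw": 19.07, "delay_tolerant": True},
    {"name": "web-frontend", "power_kw": 5.0, "delay_tolerant": False},
]
plan, shed_kw = plan_shedding(workloads, target_kw=45.0)
print(plan, round(shed_kw, 2))  # pauses both render batches: 47.67 kW shed
```

A real plan would also weigh durations and deadlines, as the EO did when it split the capacity into two batches of different durations.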
The second strategy, investigated by Qarnot along with Power Ops and Singular Logic, is related to Virtual Machine (VM) migration. Indeed, when considering energy consumption, servers obviously remain the main contributors. The idea was therefore to migrate IT workloads onto servers in a remote DC, relocating the corresponding energy consumption (and the associated waste heat). To this end, a Catalyst software component was developed, enabling the migration of VMs without service interruption. The IT workload relocation was tested with two of the partners: Schuberg Philis near Amsterdam and the Poznan Supercomputing and Networking Center (PSNC) in Poland. At each partner’s site, the client component handling migration was installed, while the main server was deployed in the Engineering Pont Saint Martin DC.
Then, we generated VMs to test and validate the working conditions. One VM ran a CPU burn, basically a program that does nothing except drive the CPU at full capacity, generating significant power consumption and heat production. The purpose of this VM is to verify that the VM has been relocated, is still working, and that the power consumption has actually moved from one DC to another. Since the power drawn depends on the CPU model, the absolute consumption figures are not representative.
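The actual CPU burn tool used in the pilot is not specified; as a minimal stand-in, the same effect can be obtained in a few lines of Python by running one busy-loop process per core for a fixed duration. The duration and the arithmetic inside the loop are arbitrary, chosen only to keep the cores fully loaded.

```python
import multiprocessing
import time

def burn(seconds):
    """Busy-loop for `seconds`, keeping one core at 100% utilisation."""
    deadline = time.monotonic() + seconds
    x = 0
    while time.monotonic() < deadline:
        # Pointless integer arithmetic: the goal is heat, not results.
        x = (x * 1103515245 + 12345) % 2**31
    return x

if __name__ == "__main__":
    # One burner process per core, so the VM draws near-maximal power.
    n = multiprocessing.cpu_count()
    with multiprocessing.Pool(n) as pool:
        pool.map(burn, [2] * n)  # burn all cores for 2 seconds
```

Watching the power meters of both DCs while such a VM migrates makes the relocation of consumption directly visible.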
Then we generated two other VMs: one with a simple web server, and one with a 3D rendering program. The web server is easy to deploy and demonstrates that the VM is still responding and communicating with its original client (HTTP/1.0 200 OK). The power consumption of such a VM is too small to really be noticeable. The second VM is representative of a real Qarnot IT workload: a Blender 3D rendering job (rendering an image of raspberries), which is CPU intensive and carries data.
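The liveness check described above (the client still receiving HTTP 200 from the VM) can be sketched as a small probe. The function below is an illustration, not the component used in the pilot; the demo targets a local stand-in server, whereas the real check would hit the VM’s address before and after migration.

```python
import http.server
import threading
import urllib.request

def vm_is_alive(url, timeout=5):
    """Return True if the (possibly just-migrated) VM's web server
    still answers HTTP 200 from the original client's point of view."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

# Demo against a local stand-in server on an ephemeral port.
server = http.server.HTTPServer(
    ("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler
)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
print(vm_is_alive(f"http://127.0.0.1:{port}/"))  # True while the server is up
server.shutdown()
```

Polling such a probe during migration confirms the “no service interruption” property from the client’s side.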
The migration was successful and the rendering never stopped. The VM size is approximately 14 GB, which corresponds to the data transferred between DCs. The migration from Qarnot to PSNC took about 26 minutes, and the migration back about 22 minutes. It is important to note that the Internet connection of Qarnot’s distributed infrastructure cannot be considered “pro” or “DC grade”. Connections in DCs are usually much more efficient and should significantly reduce migration time.
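A quick back-of-envelope calculation puts these durations in perspective, assuming the 14 GB figure means gigabytes and that transfer time dominates the migration:

```python
def effective_mbit_per_s(size_gb, minutes):
    """Effective transfer rate implied by moving `size_gb` gigabytes
    in `minutes` minutes, in megabits per second."""
    bits = size_gb * 1e9 * 8
    return bits / (minutes * 60) / 1e6

print(round(effective_mbit_per_s(14, 26), 1))  # Qarnot -> PSNC: ~71.8 Mbit/s
print(round(effective_mbit_per_s(14, 22), 1))  # PSNC -> Qarnot: ~84.8 Mbit/s
```

Both figures sit well below typical DC interconnect capacity, which supports the point that a “DC grade” link would shorten migration considerably.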
This experiment is quite representative of Qarnot’s daily operation. Indeed, one can imagine this migration being triggered by the thermostat of a Qarnot heater: if a user equipped with a Qarnot heater turns down the thermostat, the IT workload already running on it, very possibly a 3D rendering job, would be affected. To reduce heat production, several strategies can be considered: stopping the workload (and restarting it elsewhere), reducing the CPU frequency, pausing the server (and resuming it later), or relocating the VM running on the heater onto another server (the strategy investigated in Catalyst). This relocation can take place within the Qarnot domain, on another production site, or in another domain of the Catalyst federated DCs.
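The choice between these strategies can be sketched as a simple decision rule. Everything below is hypothetical: the function name, the thresholds, and the ordering of strategies are illustrative assumptions, not Qarnot’s actual scheduler logic.

```python
# Hypothetical decision sketch over the four strategies listed above.
def heat_reduction_action(excess_kw, delay_tolerant, remote_capacity_kw):
    """Pick one strategy when a heater must produce `excess_kw` less heat."""
    if excess_kw <= 0.05:
        return "reduce_cpu_frequency"       # small adjustment: DVFS suffices
    if remote_capacity_kw >= excess_kw:
        return "migrate_vm"                 # relocate heat, no interruption
    if delay_tolerant:
        return "pause_server"               # resume once heat is wanted again
    return "stop_and_restart_elsewhere"     # last resort for the workload

print(heat_reduction_action(0.3, True, remote_capacity_kw=1.0))  # migrate_vm
```

In this sketch, migration is preferred whenever a remote site has spare capacity, matching the Catalyst approach of relocating rather than killing workloads.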
For Qarnot, these strategies have multiple potential applications, depending on the site and on the IT workload actually running. We can imagine actual energy services at the building or local smart-grid level. This is also another way of maintaining high resiliency of the computing grid. Finally, it gives Qarnot another way of leveraging its many small edge computing sites in a flexible manner, seamlessly for its clients.