This article contains the following:
Ensure required time windows due to the changed shutdown sequences in case of a vSAN:
Take DRS within the vSAN cluster into account
Special role: The witness server
Tutorial: If the shutdown on a vSAN has to be aborted once...
What else is important when aborting a vSAN shutdown
How long is the estimated shutdown time?
Before starting, please read the following configuration notes carefully to prevent shutdown issues caused by a wrong configuration of RCCMD.
RCCMD can handle vSAN VMware environments.
Because a vSAN is complex and its operating conditions differ from those of a single host or a standard cluster, some preconditions must be met before RCCMD can shut down a vSAN:
The RCCMD client that handles the vSAN cannot be installed inside a vSAN.
Each host of a vSAN must be set to maintenance mode before it can be switched off. As long as one virtual machine is running, it is not possible to switch off the hosts.
The vCenter that handles the vSAN is always the first virtual machine and the last virtual machine.
The vCenter is the control unit of a vSAN. It is allowed to install the vCenter inside the vSAN as well as to run it on a single host that is not part of the cluster. The essential function of the vCenter is managing the data synchronization inside a vSAN after all other virtual machines are down. You need to ensure that the vCenter can complete this operation.
If you run a Witness Server as a virtual machine inside a vSAN.
The Witness Server has a special task. If two hosts do not agree on which host holds the most recent data, they ask the Witness Server. The Witness Server acts like a complete host but cannot maintain virtual machines.
For this reason, the Witness Server can also be virtualized in the vSAN and still acts as a stand-alone host. In that case, you need to differentiate between the IP address of the Witness Server and the IP address of the host on which the Witness Server's virtual machine is located.
The Witness Server is shut down regularly within the vSAN cluster.
The host that maintains the virtual machine that contains a Witness Server needs a second RCCMD client for enabling the maintenance mode after the Witness Server is switched off. Technically, an RCCMD client can only handle the vSAN or the host it runs on:
Therefore, if you have a single host AND a vSAN cluster, you will need at least 2 RCCMD clients: RCCMD 1 manages the shutdown of the vSAN cluster and RCCMD 2 manages the shutdown of the single host. The shutdown routine is then divided into 2 different commands for the CS141:
- Shut down the vSAN cluster
- Shut down the single host
Since the two RCCMD clients run side by side:
When choosing the correct time window for shutdown tasks, ensure the vSAN has turned off all hosts completely before turning off the last remaining single host - otherwise the RCCMD client that manages the shutdown of the vSAN may not be able to complete the shutdown routine because the second RCCMD client performs a local virtual machine shutdown.
Note: Appliance vs. Appliance – What is a Virtual Machine and what is the RCCMD Client?
Both appliances are essentially the same: each is a virtual machine. However, since you are running two appliances, the name of the virtual machine they run on will differ. Specifying the virtual machine name prevents an RCCMD client from shutting itself down prematurely.
For example, if you tell RCCMD 2 the name of its own virtual machine, it will treat RCCMD 1 as just another virtual machine and shut it down. When using a vSAN, the shutdown commands of the CS141 synchronize the shutdown behavior of both appliances.
Ensure required time windows due to the changed shutdown sequences in case of a vSAN:
The goal of a vSAN is to combine maximum resource availability with data redundancy. The system is therefore not well suited to a fast shutdown without strict procedures. Since a complete, system-wide shutdown is the exception rather than the rule, it is difficult to estimate how much time the vCenter within a vSAN will need to put all hosts into maintenance mode.
In principle, vSAN proceeds with a shutdown in three steps.
The time-critical part is the post-synchronization phase, as this phase is difficult to estimate:
Maintenance mode can only be reached after the synchronization of the data between all hosts has been completed. This process is dynamic and changes depending on available hardware, the number of virtual machines, and the amount and type of data contained within the virtual machines that ultimately need to be synchronized between all hosts.
What makes matters worse is that this process takes place inside the vSAN and is not visible from outside: at some point the hosts are simply in maintenance mode, which means that the process is complete.
This open-ended duration is set against the fixed maximum operating time of the UPS.
RCCMD needs clear time windows to be specified for the shutdown, which, in addition to the calculated times for a shutdown, must also be based on the operating time of the UPS. RCCMD therefore needs a reserved time window to carry out the shutdown:
to allow the IT to be shut down in a timely manner,
to provide a time buffer if the post-synchronization phase changes,
to carry out shutdown procedures within the safety range of the UPS's running time,
to ensure there is enough time for the host outside the cluster to shut down.
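The four requirements above can be sketched as a small backward calculation from the UPS runtime. This is only an illustration: the function name `plan_triggers`, the placement of the buffer directly after the cluster phase, and the split into 30 and 8 minutes are assumptions made for the example, not values or logic taken from RCCMD or the CS141.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShutdownPlan:
    cluster_trigger_min: float  # minutes after the power failure: start the vSAN shutdown
    host_trigger_min: float     # minutes after the power failure: start the single-host shutdown


def plan_triggers(ups_runtime_min: float,
                  cluster_shutdown_min: float,
                  host_shutdown_min: float,
                  resync_buffer_min: float) -> ShutdownPlan:
    """Work backwards from the UPS runtime to the latest possible trigger times."""
    # The host outside the cluster must be finished when the UPS runtime ends.
    host_trigger = ups_runtime_min - host_shutdown_min
    # The vSAN cluster, plus a buffer for a longer post-synchronization phase,
    # must be finished before the single host starts its own shutdown.
    cluster_trigger = host_trigger - resync_buffer_min - cluster_shutdown_min
    if cluster_trigger < 0:
        raise ValueError("UPS runtime is too short for this shutdown sequence")
    return ShutdownPlan(cluster_trigger, host_trigger)


# Hypothetical figures: 45 min UPS runtime, 30 min cluster, 8 min single host, 2 min buffer.
plan = plan_triggers(ups_runtime_min=45, cluster_shutdown_min=30,
                     host_shutdown_min=8, resync_buffer_min=2)
print(plan)
```

With these assumed figures the cluster shutdown has to start no later than minute 5 and the single-host shutdown no later than minute 37. A real plan should additionally keep a safety margin at the end of the UPS runtime.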
Take DRS within the vSAN cluster into account
Contrary to popular belief, DRS and vCenter are independent system services that simply communicate with each other. It's even possible to set up an HA cluster without DRS, although this doesn't necessarily make sense, as the HA cluster itself actually implies DRS.
In contrast to an HA cluster, DRS is an essential component in a vSAN because this service can detect unused storage resources in the background and bundle them for a virtual machine. This intelligent, real-time resource management allows the operation of more virtual machines than in a normal cluster:
Virtual machines that now assume this distributed state are particularly vulnerable to a host-based cluster shutdown because vCenter and DRS only implement the cluster shutdown cleanly together if explicitly instructed to do so by an administrator via vCenter.
Although VMware recommends a cluster shutdown procedure via maintenance mode, if the hosts are put into maintenance mode too quickly, the DRS service may not be able to immediately determine the necessary resources for an internal migration of VM data and will attempt again after a sort of waiting loop. Virtual machines whose allocated disk space is withdrawn in this case because the hardware is no longer available are acutely affected by the outage. Whether and to what extent the operating system and data are damaged depends heavily on the memory state and the operating system's use.
The new RCCMD Shutdown Management, with its customized shutdown procedure, can not only shut down interdependent sensitive systems in the required order but also save valuable system resources once the general shutdown takes effect.
Important:
An overview of the time window the UPS can provide is mandatory for an orderly shutdown. This will determine not only the specific schedule but also the latest time slot to start the system shutdown:
Shutting down a vSAN is a very system-critical process due to its technical nature. A vSAN reacts sensitively if it is not shut down properly.
Preparing RCCMD for the vSAN
Under VMware Settings, enable “Hosts are also vSAN nodes”.
To manage the shutdown routine, ensure that the RCCMD appliance is located outside the vSAN cluster.
Once you have activated the vSAN mode, you will get additional menus:
Mode for decommissioning vSAN nodes
Leave the decommissioning mode on No data evacuation - this mode is the fastest method to shut down a vSAN cluster:
The virtual machines are shut down in a structured way and then all data will be synchronized on all affected hosts.
Definition of the vSAN Resync timeout
Unlike the default procedure, the vCenter becomes active after the virtual machines shut down and starts synchronizing all records within the cluster.
This post-synchronization phase defines the critical phase of the shutdown procedure:
All datasets from the virtual machines must be in sync with mirrored data stored on other hosts.
As long as this synchronous system state is not reached, the maintenance mode cannot be entered by any host.
Note:
This process is highly dynamic and depends on the type of data that needs to be synchronized. Creating several new virtual machines may only marginally affect the synchronization time. However, in some cases, adding a single virtual machine can significantly increase the post-synchronization time. Similarly, data within a virtual machine may grow organically due to usage, which in turn impacts the time required.
This value cannot be defined once during the initial installation as a fixed figure; it must be reviewed regularly to ensure accuracy and adjusted if necessary.
The vCenter takes all the time needed for this process. Unfortunately, this relative amount of time is in direct contrast to a clearly defined time window that can be provided by the UPS during an emergency power operation. You need to calculate a sufficiently large time window to give the vCenter a time reserve in case the calculated period is insufficient.
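One way to keep the resync timeout under review is to derive it from the longest resync time observed in shutdown tests plus a reserve. The sketch below assumes that you record these durations yourself; the function names and the reserve factor of 1.5 are hypothetical and not an RCCMD recommendation.

```python
import math


def recommend_resync_timeout(measured_seconds: list[float],
                             reserve_factor: float = 1.5) -> int:
    """Return the longest measured resync time plus a reserve, in whole seconds."""
    if not measured_seconds:
        raise ValueError("at least one measurement is required")
    if reserve_factor < 1:
        raise ValueError("reserve_factor must be 1 or greater")
    return math.ceil(max(measured_seconds) * reserve_factor)


def needs_review(configured_seconds: int, measured_seconds: list[float],
                 reserve_factor: float = 1.5) -> bool:
    """True if the configured timeout no longer covers the measurements plus reserve."""
    return configured_seconds < recommend_resync_timeout(measured_seconds, reserve_factor)


# Hypothetical measurements from three shutdown tests, in seconds.
tests = [420, 510, 480]
print(recommend_resync_timeout(tests), needs_review(600, tests))
```

Whatever value results still has to fit into the runtime the UPS can actually provide.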
Defining maintenance mode for the vCenter.
This setting defines how much time the vCenter has to shut itself down after synchronizing data. If the vCenter runs as a virtual machine within the vSAN, this point in time becomes interesting: After this time window, the hosts are put into maintenance mode and the vCenter is switched off by its host.
Enter data for the vSAN managing vCenter
Since RCCMD must coordinate with the vCenter over the entire process, the access data for the vCenter, which manages the vSAN, is mandatory.
In this configuration dialog, do not enter credentials for individual hosts.
Define the vSAN managing RCCMD client:
RCCMD has the task of shutting down all virtual machines and turning off the hosts at the end. Since within a vCenter not only a vSAN but further hosts can be mapped, RCCMD can shut them down, too. There are two exceptions that need more attention:
Information about the virtual machine running RCCMD
Although RCCMD itself cannot run in the vSAN that should be shut down, the vCenter that manages the vSAN may include additional hosts in its list. The RCCMD appliance is a virtual machine that must comply with the control commands of the host on which it is running itself - if the host advises a shutdown, the appliance will do it. To prevent RCCMD from inadvertently giving itself a shutdown command, enter the name of the virtual machine you chose for RCCMD. When entered, the virtual machine that holds this name will be excluded from the shutdown process.
Define the virtual machine that contains the vSAN managing vCenter
Within the vSAN system, the vCenter performs special administrative tasks, but is also a virtual machine. During the shutdown, RCCMD first gets an overview of active virtual machines and then shuts them down, migrates them, etc. With this setting, RCCMD will know which of the virtual machines is the vCenter and will shut it down exclusively as the last machine in the vSAN shutdown procedure.
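The effect of these two settings can be illustrated with a short sketch: the virtual machine that runs RCCMD is left out entirely, and the vCenter is moved to the end of the list. The function and all virtual machine names are hypothetical; this shows the ordering rule described above, not RCCMD's internal implementation.

```python
def vm_shutdown_order(running_vms: list[str],
                      rccmd_vm_name: str,
                      vcenter_vm_name: str) -> list[str]:
    """Return the order in which the virtual machines would be shut down."""
    # Ordinary virtual machines: everything except the RCCMD appliance and the vCenter.
    order = [vm for vm in running_vms
             if vm not in (rccmd_vm_name, vcenter_vm_name)]
    # The vCenter is shut down exclusively as the last machine.
    if vcenter_vm_name in running_vms:
        order.append(vcenter_vm_name)
    # The RCCMD appliance is never added, so it cannot shut itself down.
    return order


print(vm_shutdown_order(["db01", "vcenter", "rccmd-1", "web01"],
                        rccmd_vm_name="rccmd-1", vcenter_vm_name="vcenter"))
```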
Definition of the vSAN ESXi host nodes
Define the hosts to be shut down by RCCMD. The virtual machines can be moved to other hosts via the vCenter. To shut down a host, RCCMD requires the following information:
HOST / IP name
We recommend using the IP address of the host at this point to avoid addressing problems when parts of the IT infrastructure are down.
Because RCCMD also supports host names, you may enter a host name instead.
User
A user with the system rights required to shut down the VMware environment. Use a local host administrator with root rights so that the shutdown command is permitted.
Password
The password assigned to the user that allows RCCMD to authenticate itself as authorized.
Shutdown delay
The next step is to determine how much time RCCMD should allow the virtual machines to shut down before the ESXi host will quit all operations and switch off:
The vSAN has a special feature compared to other operating modes:
The shutdown duration typically defines the time window that a host grants the operating systems within virtual machines before the virtual machine is simply powered off. It does not matter whether a vCenter has previously tried to migrate machines.
When this command is issued to the hosts running in a vSAN, there are no more virtual machines that need to be powered off:
- All hosts must be in maintenance mode
- A host can only be in maintenance mode if all virtual machines have been moved or switched off.
For the hosts in vSAN, this means that the shutdown time of virtual machines can be set to 1 second:
The shutdown routine on a vSAN has already brought all hosts into maintenance mode. Consequently, no time window is required to grant operating systems within a virtual machine for a shutdown.
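The recommendations for the host entries can be checked with a small review helper. This is a sketch under assumptions: `HostEntry` and `review_vsan_host` are hypothetical names and do not reflect RCCMD's configuration format; the helper only flags host names (where an IP address is recommended) and shutdown delays longer than 1 second on vSAN nodes.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class HostEntry:
    address: str              # IP address or host name of the ESXi host
    user: str                 # local host administrator
    vm_shutdown_delay_s: int  # time granted to virtual machines before power-off


def review_vsan_host(entry: HostEntry) -> list[str]:
    """Return hints for a vSAN node entry; an empty list means nothing to remark."""
    hints = []
    try:
        ipaddress.ip_address(entry.address)
    except ValueError:
        hints.append(f"{entry.address}: host name used; an IP address avoids "
                     "addressing problems when parts of the infrastructure are down")
    if entry.vm_shutdown_delay_s > 1:
        hints.append(f"{entry.address}: {entry.vm_shutdown_delay_s} s delay is not "
                     "needed on a vSAN node; 1 s is sufficient")
    return hints


for hint in review_vsan_host(HostEntry("esx01.example.local", "root", 30)):
    print(hint)
```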
Special role: The witness server
Small vSAN systems lack the resources needed to reconcile all datasets on their own.
To prevent problems with data synchronization in minimalist vSAN systems, a witness server is used:
This witness server acts as a stand-alone host in the vSAN, but is not responsible for hosting and managing virtual machines. As soon as hosts are unable to agree on which of their datasets is the most recent, the witness server decides which host has to synchronize its data.
The witness server can be either a real physical machine with its own hardware or a system that acts like a physical host but runs within a virtual machine. The vSAN nodes cannot see the difference between these two setups. However, the difference affects the RCCMD configuration:
If running a real witness server as a standalone machine:
In this case, assign the witness server and any hosts that you want to shut down. The hosts will go into maintenance mode accordingly:
- Shut down virtual machines
- The vCenter will perform the resync
- The hosts switch into maintenance mode
- The hardware can be switched off.
When using a virtual machine to run a witness server
If you run the witness server as a virtual machine in the vSAN, you must differentiate between the host on which the witness server is stored and the witness server as a stand-alone host. Since the witness server acts like a host within the vSAN, it is perceived and treated accordingly. The installation type does not matter:
While the host that maintains the virtual machine of the witness internally perceives only one virtual machine running "some kind of system", it accepts the witness server as a standalone host and network node on the network. If the wrong IP address has now been specified, the host responsible for the virtual machine will respond correctly:
- The host will stop running the virtual machine
- The host changes to Maintenance Mode
However, since the (albeit virtualized) witness server represents a full-fledged host and network node, it must consequently be treated as a real host and put into maintenance mode before being turned off. Formally, you need two RCCMD appliances to shut down a vSAN. If you use a virtualized witness server, you can use the second RCCMD to regularly switch off the host that manages the virtual witness server.
Tutorial: If the shutdown on a vSAN has to be aborted once...
A normal ESXi cluster with vCenter differs from a vSAN:
While a normal cluster with individual hosts can ultimately shut down and switch off its virtual machines on its own if maintenance mode is not possible, a vSAN can only be switched off if absolutely no virtual machines except the vCenter are running on the vSAN. The second major difference is that the RCCMD client that manages the vSAN cannot logically run within the vSAN cluster; it must control it from outside. The third point is that a vSAN requires a run-on time after all virtual machines have been shut down, during which all inventory data is synchronized; only then may the hosts be switched off safely.
Due to these differences, the shutdown sequence consists of logical sections between which an abort is quite possible. This tutorial shows one way in which an automated shutdown could be aborted. Please note that this is not the intended way to use RCCMD, and you do it at your own risk.
In this example, the framework conditions are fulfilled that allow operation without a Witness server:
Problem:
As soon as there is a power failure, RCCMD 1 becomes active and starts the shutdown process in time. Measurements have shown that the entire shutdown takes around 38 minutes. Since the UPS can cover up to 45 minutes, the shutdown must be initiated no later than 5 minutes after the power failure; otherwise the system cannot be shut down correctly. In this example, the main power supply returns after 20 minutes, so continuing the shutdown is no longer necessary.
Since the RCCMD appliance, by definition of its purpose, cannot revert or stop the shutdown sequence, RCCMD 1 will carry it through to the end and shut down the vSAN cleanly. The CS141 cannot send a "clear all pending commands" signal to RCCMD 1.
Make a decision of principle
Since the UPS has been running on battery for 20 minutes, another mains failure is likely to have fatal consequences. You should therefore consider whether to complete the shutdown and wait until the minimum 40-minute hold-off time has passed, or whether to abort the shutdown. The commands given are identical at this point, but the event changes. In this example, an abort of the shutdown was chosen. Like the shutdown, the abort is initiated via the CS141, only with the event "Power Restored".
It is important always to keep in mind: The shutdown itself is already an emergency measure - thinking about a forced termination of the emergency measure is legitimate, but always associated with additional risks.
Interrupt the shutdown sequence
When the "Power Restored" event occurs, a shutdown signal is sent to RCCMD 2, which is responsible for shutting down the last single host. This includes the immediate shutdown of all virtual machines on that host. RCCMD 1, as a virtual machine, complies and shuts down and powers off accordingly, dropping the control commands for the vSAN that were still pending.
The vCenter will process the last commands and then wait for further instructions accordingly. Since the control commands are bound to time windows that have been stored in RCCMD 1, it is possible to quickly find out in this way what has already been done.
- vMotion is active and tries to move the virtual machines by default.
- vSAN - shutdown is active and the virtual machines are shut down or moved.
- The post-synchronization phase is running.
This method ends the phase that is currently running. If no new command is received afterwards, the vSAN remains in this system state and waits for further orders.
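Because the control commands are bound to time windows, the phase that was interrupted can be estimated from the time elapsed since the shutdown was triggered. The sketch below assumes hypothetical durations for the three phases listed above (5, 10 and 23 minutes); the real values are the ones stored in your RCCMD 1 configuration.

```python
def phase_at(elapsed_min: float, windows: list[tuple[str, float]]) -> str:
    """Return the phase that is active after elapsed_min minutes of shutdown."""
    phase_start = 0.0
    for name, duration_min in windows:
        # A phase is active from its start up to (not including) its end.
        if elapsed_min < phase_start + duration_min:
            return name
        phase_start += duration_min
    return "all time windows elapsed"


# Hypothetical time windows, in minutes, in the order they are executed.
windows = [
    ("vMotion", 5),
    ("vSAN shutdown of virtual machines", 10),
    ("post-synchronization", 23),
]

# Shutdown triggered 5 minutes after the power failure, mains back after 20 minutes.
print(phase_at(20 - 5, windows))
```

The result is only an estimate, since RCCMD does not receive feedback from the vCenter about how far each phase has actually progressed.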
Structured Restart: Reactivate RCCMD Protection
Send the WOL signal to the ESXi Single Host - it will power up via the WOL signal and accordingly the RCCMD appliances also start and move to their start position. The RCCMD protection is now up.
Starting the depowered hosts
Now, send a WOL signal to each single host - it does not matter whether this host has already been switched off or not: If the host is running, the WOL signal falls into a void and is ignored at its destination.
Start the Virtual Machines (VMs)
Depending on the system configuration, you may decide to switch on virtual machines that have been switched off via a WOL signal. To do this, send a WOL signal to the MAC address of the respective virtual machine.
Note:
Keep in mind that WOL signals are sent to the MAC address. The CS141 must therefore be in the same network segment, or the signal must be routed accordingly.
Since you can freely define the timing, it is even possible to specify a special order in which virtual machines start up. This allows you to have the basic network start up automatically.
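For testing from a workstation, the WOL signal itself can be reproduced with a few lines of Python. A standard magic packet consists of 6 bytes of 0xFF followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast. The function names, the MAC address in the example, and the use of port 9 are choices made for this sketch; in normal operation the CS141 sends the signal itself.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC such as 00:11:22:33:44:55."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a valid MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex characters
    # 6 x 0xFF as synchronization stream, then the MAC address 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast in the local network segment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))


# Example with a made-up MAC address; send_wol() is not called here on purpose.
packet = build_magic_packet("00:11:22:33:44:55")
print(len(packet))
```

Because the packet is a broadcast, it normally does not leave the local network segment, which is why the sender has to be in the same segment or the signal has to be routed explicitly.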
What else is important when aborting a vSAN shutdown
Do not just restart the RCCMD client via web interface
RCCMD has a protection mechanism that executes a valid shutdown even if someone stops the RCCMD service via the web interface after the shutdown has been triggered. The system is nevertheless shut down cleanly and switched off. That is why you need to shut down the managing RCCMD client itself; just clicking "restart" in the web interface will not interrupt the shutdown.
A vSAN may change the time window and therefore may require adapting the timing parameters regularly:
Depending on the current data situation and the expansion stage of the vSAN, this shutdown can take a very long time, and the procedure must be started correspondingly early. If the main power supply or an emergency generator becomes available during this time, this does not change the shutdown routine: RCCMD follows its instruction and carries it out.
And, of course, if a shutdown in progress must be stopped…
This process cannot be automated completely in the end, because RCCMD coordinates the instructions and their time windows, but does not receive any feedback from the vCenter about logical sections. As a result, an administrator must consider whether to let the shutdown run to the end and restart the system or to provoke an abort within a shutdown routine:
Both have their respective advantages and disadvantages.
To stop the shutdown process and set the associated scripts back to 0, stop the appliance virtual machine on the host. Once this happens, no more commands are transmitted to the hosts and the system stops the shutdown process after the last command has been successfully transmitted. You may save time, but a clean restart may also be the better way for some server operations.
IMPORTANT:
For a shutdown abort, ensure that the virtual machine of the appliance is actually shut down. If you only set RCCMD to "Stop" in the web interface, the shutdown sequence will still be executed, even though the web interface reports the service as stopped.
How long is the estimated shutdown time?
Basic shutdown time
After entering all the data, an estimated time will be shown that RCCMD may need for a full shutdown.
You can see the estimated shutdown time below the ESXi host settings.
Please note that this value is a guideline calculated from the data you entered.
This value is intended to help you to find the optimized trigger time to run an emergency shutdown routine if a power failure occurs.
Note:
Each UPS can only provide emergency power for a pre-defined time window. When the batteries are depleted, the UPS will shut itself down to avoid damaging the batteries. In general, it will not help to simply adjust the numbers within RCCMD until the estimated shutdown time matches the data sheet of the UPS.
Furthermore, these values are just a snapshot of your system based on the data you entered! Please check regularly whether the entered values match the real shutdown conditions in case of an emergency.
Keep in mind that the shutdown conditions may change between two shutdown tests. When calculating and adapting the average shutdown time, we recommend allowing more time than the minimum required.
v.: 2025-08-26