Sunday, January 20, 2008

Transactional Debian Upgrades with ZFS on Nexenta

The Problem explained

No software is perfect; it always has bugs. Minor, major, or security issues will always exist, and modern operating systems need to deal with this fact.

"Today, engineers can design and test something as complex as a Boeing 777 in cyberspace. But paradoxically, that's not possible with big software programs. The physical laws governing how metal behaves when shaped into a plane and hurled through the air are well known. For software, there is no such body of basic science." (www.businessweek.com)

The Solution

What if any software a user installs could be rolled back to a previously known good point, and the rollback itself took almost no time?

What if a developer or user had a tool that could checkpoint the operating system and revert changes in no time?

This is possible if we marry two great technologies: ZFS and Debian APT. Both are now part of the Nexenta Operating System, which is the core foundation for its derivative distributions.

Meet apt-clone(8), the tool which integrates with the NexentaCP system, keeps track of upgrade checkpoints, and lets you create, destroy, or edit checkpoints on request.

Example #1. Recovering from an unsuccessful upgrade

In this first example we will try to bring the system up to date with the official 'unstable' repository. The unstable repository might break things, but sometimes it has fixes or features that outweigh the risk of upgrading. An upgrade usually involves risk, even for well-tested software. With Nexenta's integrated ZFS capabilities and the apt-clone utility, the risk is minimal: just checkpoint the system before the upgrade and roll it back in case of failure, or for any other reason.
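Judging by the dataset names in the listings later in this post (a snapshot 'syspool/rootfs-nmu-000@nmu-001' next to a clone 'syspool/rootfs-nmu-001'), a checkpoint looks like a ZFS snapshot of the running root dataset plus a clone of that snapshot. The sketch below only prints the two zfs commands such a checkpoint would roughly correspond to; it does not run them. The function name checkpoint_cmds is hypothetical, and whatever else apt-clone does internally (GRUB entries, zones and so on) is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the ZFS commands that a
# checkpoint appears to correspond to, based on the dataset names in this post.
# Arguments: pool name, current checkpoint number, new checkpoint number.
checkpoint_cmds() {
    cc_pool=$1
    cc_cur=$2
    cc_new=$3
    cc_snap="${cc_pool}/rootfs-nmu-${cc_cur}@nmu-${cc_new}"
    # 1. Freeze the current root filesystem in a read-only snapshot.
    echo "zfs snapshot ${cc_snap}"
    # 2. Create a writable clone of that snapshot; this is the checkpoint.
    echo "zfs clone ${cc_snap} ${cc_pool}/rootfs-nmu-${cc_new}"
}

checkpoint_cmds syspool 000 001
```

Both operations are cheap because ZFS is copy-on-write: a fresh clone shares its blocks with the snapshot and only grows as files change, which is why creating a checkpoint takes seconds rather than minutes.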

Note that initially a NexentaCP 1.0 system has the following ZFS layout:

root@myhost:/export/home/erast/apt# zfs list
NAME USED AVAIL REFER MOUNTPOINT
syspool 1.36G 2.18G 23K none
syspool/rootfs-nmu-000 1.36G 2.18G 1.14G legacy
syspool/rootfs-nmu-000@initial 226M - 786M -

where "syspool/rootfs-nmu-000" is the bootable ZFS dataset. Now, let's do an upgrade from the unstable repository using apt-clone:

root@myhost:/export/home/erast/apt# apt-clone dist-upgrade
This operation will upgrade your system using ZFS capabilities. Proceed ? (y/n) y

Updating APT sources ...
Downloading upgrades and checking if reboot will be required.
This may take a few minutes. Please wait...
Verifying free space...
Success. Upgrade requires 279.60MB of available free space.
This upgrade will require REBOOT. Proceed? (y/n) y

Upgrade is in progress. Please DO NOT interrupt...
Creating Upgrade Checkpoint...

Upgrade Checkpoint has been created: rootfs-nmu-001

Use 'zfs list -r syspool' command to list all available
upgrade/rollback checkpoints

Extracting templates from packages: 100%
Preconfiguring packages ...
(Reading database ... 40453 files and directories currently installed.)
Preparing to replace nexenta-sunw 5.11.79-1 (using .../nexenta-sunw_5.11.80-1_solaris-i386.deb) ...
Unpacking replacement nexenta-sunw ...
Setting up nexenta-sunw (5.11.80-1) ...
(Reading database ... 40453 files and directories currently installed.)
Preparing to replace nexenta-lu 5.11.79-1 (using .../nexenta-lu_5.11.80-1_solaris-i386.deb) ...
Initiating NLU protected environment ...
Unpacking replacement nexenta-lu ...
Setting up nexenta-lu (5.11.80-1) ...
...
...
Unpacking replacement sunwwbsup ...
Preparing to replace sunwesu 5.11.79-1 (using .../sunwesu_5.11.80-1_solaris-i386.deb) ...
Unpacking replacement sunwesu ...
Errors were encountered while processing:
/var/cache/apt/archives/sunwscmu_5.11.80-1_solaris-i386.deb
/var/cache/apt/archives/sunwsmbsu_5.11.80-1_solaris-i386.deb
***********************************************************************
* *
* Upgrade sequence returned an error. To enter NLU protected *
* environment please type 'source /tmp/nlubin/env.sh' *
* *
***********************************************************************
E: Sub-process /usr/bin/dpkg returned an error code (1)
Upgrade failed. Would you like to rollback changes now? (y/n) y

All upgrade changes now rolled back.

And indeed, the root filesystem didn't change at all:

root@myhost:/export/home/erast/apt# zfs list
NAME USED AVAIL REFER MOUNTPOINT
syspool 1.36G 2.18G 23K none
syspool/rootfs-nmu-000 1.36G 2.18G 1.14G legacy
syspool/rootfs-nmu-000@initial 226M - 786M -

We are back at the starting point: a few minutes spent on the upgrade, just a few seconds on the rollback, and the system is in exactly its previous state, with no reboot required. Not bad!

Example #2. Successful upgrade, then reverting to the previous state

Sometimes we want to go back in time to a previously known state even if the software upgrade went successfully. Let's see how this can be done.

This time, the upgrade succeeds:

...

Setting up sunwsshdr (5.11.80-1) ...
Setting up class: sshdconfig /etc/ssh/sshd_config
Setting up class: manifest /var/svc/manifest/network/ssh.xml

Setting up sunwsshdu (5.11.80-1) ...

Setting up sunwwbsup (5.11.80-1) ...

Setting up sunwesu (5.11.80-1) ...

Creating ram disk for /tmp/upgrade-attempt.23979
updating /tmp/upgrade-attempt.23979/platform/i86pc/amd64/boot_archive
updating /tmp/upgrade-attempt.23979/platform/i86pc/boot_archive
* * *
SYSTEM NOTICE

The first phase of upgrade has completed successfully:
- created Upgrade Checkpoint 'rootfs-nmu-001'
- the system is ready to reboot into the new checkpoint
- all Zones been checkpointed and upgraded

+------------------------------------------------------------------+
| |
| At this point you have three options: |
| |
| 1. You can reboot now, make sure that system is healthy and |
| then activate the current (i.e., newly created) checkpoint. |
| |
| 2. You can activate the newly created (upgraded) checkpoint |
| right now, and then reboot. |
| |
| 3. Or, you can simply continue using the system as is and |
| do (1) or (2) later. |
| |
+------------------------------------------------------------------+
Would you like to follow the option (1) above and reboot now ? (y/n) y

Activate upgrade command: 'apt-clone -a rootfs-nmu-001'
Rollback changes command: 'apt-clone -r rootfs-nmu-001'
Operation in progress. Please wait...

After the machine is rebooted, let's see what has happened to the system pool:

root@myhost:/export/home/erast# zfs list
NAME USED AVAIL REFER MOUNTPOINT
syspool 1.87G 1.67G 23.5K legacy
syspool/rootfs-nmu-000 1.37G 1.67G 1.13G legacy
syspool/rootfs-nmu-000@initial 234M - 786M -
syspool/rootfs-nmu-000@nmu-001 7.13M - 1.13G -
syspool/rootfs-nmu-001 514M 1.67G 1.17G legacy

root@myhost:/export/home/erast# apt-clone -l
A C BOOTFS TITLE
o   rootfs-nmu-000 Nexenta Core Platform "Elatte" [initial]
  o rootfs-nmu-001 Upgrade Checkpoint [nmu-001 : Jan 17 04:09:32 2008]

Notice that the active checkpoint is 'rootfs-nmu-000' while the currently loaded one is 'rootfs-nmu-001'. Suppose that for some reason (say, we discovered that some software no longer behaves the way it used to) we decide to roll back this upgrade. It is not possible to roll back the current checkpoint, which is understandable: we are currently using it and the dataset is locked:

root@myhost:/export/home/erast/apt# apt-clone -r rootfs-nmu-001
This will destroy clone 'syspool/rootfs-nmu-001'. Proceed ? (y/n) y

apt-clone.WrongArguments: Can not destroy currently active system folder
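The refusal above follows a simple rule: the checkpoint the system is currently booted from cannot be destroyed. A minimal sketch of such a guard is shown below. The function name can_destroy and the way the current checkpoint is passed in are assumptions made for illustration; this is not apt-clone's actual code.

```shell
#!/bin/sh
# Hypothetical guard: refuse to destroy the checkpoint we are booted from.
# $1 = checkpoint to destroy, $2 = checkpoint the system is running from.
can_destroy() {
    if [ "$1" = "$2" ]; then
        echo "cannot destroy '$1': it is the currently loaded checkpoint" >&2
        return 1
    fi
    return 0
}

# Booted from rootfs-nmu-001: destroying it is refused (the case above).
can_destroy rootfs-nmu-001 rootfs-nmu-001 || echo "refused"
# Booted from rootfs-nmu-000: destroying rootfs-nmu-001 is fine.
can_destroy rootfs-nmu-001 rootfs-nmu-000 && echo "allowed"
```

This is why the next step is a reboot into the older checkpoint: once the system runs from 'rootfs-nmu-000', the newer clone is no longer in use.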

So we reboot and select the previous checkpoint from the GRUB menu.

After the reboot, notice that the current bootfs is 'rootfs-nmu-000':

root@myhost:/export/home/erast/apt/apt-0.6.46.4nexenta12# apt-clone -l
A C BOOTFS TITLE
o o rootfs-nmu-000 Nexenta Core Platform "Elatte" [initial]
    rootfs-nmu-001 Upgrade Checkpoint [nmu-001 : Jan 17 04:09:32 2008]

Now let's revert the previous upgrade:

root@myhost:/export/home/erast/apt/apt-0.6.46.4nexenta12# apt-clone -r rootfs-nmu-001
This will destroy clone 'syspool/rootfs-nmu-001'. Proceed ? (y/n) y

Upgrade changes for clone 'syspool/rootfs-nmu-001' now rolled back/destroyed.
root@myhost:/export/home/erast/apt/apt-0.6.46.4nexenta12# apt-clone -l
A C BOOTFS TITLE
o o rootfs-nmu-000 Nexenta Core Platform "Elatte" [initial]

As you can see, the 'rootfs-nmu-001' dataset has been deleted, and we simply continue to work with the unmodified system:

root@myhost:/export/home/erast/apt# zfs list
NAME USED AVAIL REFER MOUNTPOINT
syspool 1.36G 2.18G 23K none
syspool/rootfs-nmu-000 1.36G 2.18G 1.14G legacy
syspool/rootfs-nmu-000@initial 226M - 786M -
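To confirm the pool state from a script rather than by eye, the checkpoint filesystems can be filtered out of the 'zfs list' output. The sketch below runs on a pasted copy of the listing taken right after the upgrade in this example, so it works without a ZFS pool. The function name list_checkpoints is hypothetical, and the pattern assumes the 'rootfs-nmu-NNN' naming seen in this post. On a live system the same filter could be fed from 'zfs list -H -o name -r syspool'.

```shell
#!/bin/sh
# Hypothetical helper: read 'zfs list' output on stdin and print only the
# checkpoint filesystems (names ending in /rootfs-nmu-NNN). The header line,
# the pool itself, and snapshots (names containing '@') do not match.
list_checkpoints() {
    awk '$1 ~ /\/rootfs-nmu-[0-9]+$/ { print $1 }'
}

# Sample input copied from the listing taken after the upgrade.
list_checkpoints <<'EOF'
NAME USED AVAIL REFER MOUNTPOINT
syspool 1.87G 1.67G 23.5K legacy
syspool/rootfs-nmu-000 1.37G 1.67G 1.13G legacy
syspool/rootfs-nmu-000@initial 234M - 786M -
syspool/rootfs-nmu-000@nmu-001 7.13M - 1.13G -
syspool/rootfs-nmu-001 514M 1.67G 1.17G legacy
EOF
```

Run against the listing just above, the same filter would print only 'syspool/rootfs-nmu-000', matching what 'apt-clone -l' reports after the rollback.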

Example #3. Installing an application under ZFS supervision

Assume we have an application we want to deploy, but the changes involved could be so intrusive that our system ends up unusable even after the software is removed. Think of the Windows registry, or UNIX /etc, or who knows what else; sometimes it happens and there is no way back short of a complete OS re-installation.

In this example, let's install Apache:

root@myhost:/export/home/erast# apt-clone install apache2
This operation will upgrade your system using ZFS capabilities. Proceed ? (y/n) y

Updating APT sources ...
Downloading upgrades and checking if reboot will be required.
This may take a few minutes. Please wait...
Verifying free space...
Success. Upgrade requires 4.05MB of available free space.
Upgrade is in progress. Please DO NOT interrupt...
Creating Rollback Checkpoint...

Rollback Checkpoint has been created: rootfs-nmu-001

Use 'zfs list -r syspool' command to list all available
upgrade/rollback checkpoints

Preconfiguring packages ...
Selecting previously deselected package libapr0.
(Reading database ... 40408 files and directories currently installed.)

...

Setting up apache2-mpm-worker (2.0.55-4nexenta2.3) ...
Starting apache 2.0 web server....

Setting up apache2 (2.0.55-4nexenta2.3) ...

It is installed. Let's assume that we modified the system pool to make apache2 work with a self-compiled PHP, changed some other aspects of the system, and then suddenly decided that apache2 is not what we wanted and that we had better go back to the apache1 setup.

Let's see if the rollback checkpoint was created:

root@myhost:/export/home/erast# apt-clone -l
A C BOOTFS TITLE
o o rootfs-nmu-000 Nexenta Core Platform "Elatte" [initial]
    rootfs-nmu-001 Rollback Checkpoint [nmu-001 : Jan 17 17:54:45 2008]
root@myhost:/export/home/erast# zfs list
NAME USED AVAIL REFER MOUNTPOINT
syspool 1.37G 2.17G 23.5K legacy
syspool/rootfs-nmu-000 1.37G 2.17G 1.14G legacy
syspool/rootfs-nmu-000@initial 234M - 786M -
syspool/rootfs-nmu-000@nmu-001 3.75M - 1.14G -
syspool/rootfs-nmu-001 77.5K 2.17G 1.14G legacy

Good, we see that checkpoint 'rootfs-nmu-001' was successfully created. Now let's activate it:

root@myhost:/export/home/erast# apt-clone -a rootfs-nmu-001
This will set default GRUB entry to 'syspool/rootfs-nmu-001'. Proceed ? (y/n) y

Upgrade changes for clone 'syspool/rootfs-nmu-001' has been activated.
Default GRUB entry '0' will boot 'syspool/rootfs-nmu-001' ZFS clone.
root@myhost:/export/home/erast# apt-clone -l
A C BOOTFS TITLE
o   rootfs-nmu-001 Nexenta Core Platform [nmu-001 : Jan 17 17:54:45 2008]
  o rootfs-nmu-000 Upgrade Checkpoint [nmu-000 : Dec 30 09:46:19 2007]

Yes, the checkpoint 'rootfs-nmu-001' is active now. However, we need to reboot to get back to our previous state. This shouldn't take long, and in a couple of minutes the system is back in the checkpointed state:

root@myhost:/export/home/erast# dpkg -l|grep apache2

That is, apache2 is gone and the system state is restored. Note that checkpoint 'rootfs-nmu-001' is now both active and current:

root@myhost:/export/home/erast# apt-clone -l
A C BOOTFS TITLE
o o rootfs-nmu-001 Nexenta Core Platform [nmu-001 : Jan 17 17:54:45 2008]
    rootfs-nmu-000 Upgrade Checkpoint [nmu-000 : Dec 30 09:46:19 2007]

We can safely destroy the previous checkpoint 'rootfs-nmu-000' now, or keep it if we want to continue with this setup later on. Let's destroy it to reclaim the space taken by the apache2 modifications:

root@myhost:/export/home/erast# apt-clone -r rootfs-nmu-000
This will destroy clone 'syspool/rootfs-nmu-000'. Proceed ? (y/n) y

Upgrade changes for clone 'syspool/rootfs-nmu-000' now rolled back/destroyed.
root@myhost:/export/home/erast# apt-clone -l
A C BOOTFS TITLE
o o rootfs-nmu-001 Nexenta Core Platform [nmu-001 : Jan 17 17:54:45 2008]

The system state is back, but the layout has changed slightly:

root@myhost:/export/home/erast# zfs list
NAME USED AVAIL REFER MOUNTPOINT
syspool 1.37G 2.17G 23.5K legacy
syspool/rootfs-nmu-001 1.36G 2.17G 1.14G legacy
syspool/rootfs-nmu-001@initial 234M - 786M -

You can see that 'rootfs-nmu-000' was used by apt-clone to perform an in-place upgrade, which is why the checkpoint was called a 'Rollback Checkpoint' and was activated later. It would also have been possible to use an 'Upgrade Checkpoint' and install the apache2 server into a ZFS-cloned filesystem in a chroot environment, using the '-s' option like this:

apt-clone -s install apache2

In that case, the rollback procedure is not much different from what was explained in Examples #1 and #2.

Good luck and happy checkpointing!

The End

Finally, I would like to mention NexentaStor, a NexentaCP-based derivative that extends this idea by providing production-quality upgrades integrated with its management software.
