Guest Author : Sander Stad (@sqlstad)
PSDatabaseClone is a PowerShell module that has the ability to create images of databases ("clones") and distribute those clones to one or more hosts.
Why use this module?
Database administrators, and really anyone who has to deal with data provisioning, have users that want to work with our databases. This can be to develop new solutions, to extract the data for ETL processes, simulate performance issues, retrieve data for reporting purposes, and so on.
Provisioning the data to other locations can be a very tedious task and most of us spend a lot of time and resources performing that task.
Known solutions to provision the data is by restoring databases or developing some extensive scripting to solve this problem. These solutions are not always the most efficient way of provisioning data and can take a considerate amount of time.
Besides the time it costs to get the data to the other locations, it can also cost a lot of disk space.
What if there was a way to reduce the amount of time it takes to provision the data and, at the same time, reduce the amount of resources needed? Now there is a solution for that and it's called PSDatabaseClone.
How to install the module
The installation of the module can be accomplished in several ways.
The easiest way is by using the command
Install-Module. The command looks for the module in the PowerShell Gallery. Execute the following code to install the module using the PowerShell Gallery:
Using this method also makes it easier to update the module when a new version gets released. Just use the
Update-Module command like below to update it to the latest version:
Cloning the repository
Another way to install the module is by cloning the Git repository to your local machine. I would advise people who want to contribute to the project to use this method.
Execute the following command:
git clone https://github.com/sanderstad/PSDatabaseClone.git
This will download the entire repository to your local machine from which you can use the module or develop new features for it.
How does the module work?
The goal of the module was to use as minimum external applications as possible, so everyone can use the module without having to install any extras. The module works by using native technology in Windows to create virtual hard disks (VHDs). A VHD is a file format that represents a hard disk and can contain volumes, partitions which in turn can contain files and folders. VHDs can be distributed to other hosts and mounted as an additional disk, much like plugging in an external USB drive. Once connected, the host can access the files and folders like any other disk on the system.
The module creates a VHD which will be our so-called image. The image will contain all the files and folders that are necessary for the database.
During the process of creating an image, the module will get a backup from the database you've specified and restore the database into a VHD.
After the restore, the VHD is closed and unmounted and now contains all the files needed to clone the database.
Figure 1: Process of creating an image
To create a clone, the module creates a "differencing disk" which contains the same files and folders as the parent disk, but only saves the delta between parent and child.
That functionality immediately saves disk space because only a fraction of the original database size is used for the clone.
Another advantage is that, because the clone saves the delta of the changes, the image itself is not affected by any of the changes. This means that other clones are also not affected.
The process of creating a clone is to create a differencing disk inside a created image (parent disk). The clone is mounted and the database inside the clone is attached to an instance.
From that moment on, the database is operational in the same way the original one would be. It contains the same objects, data, and behaviors. Because all the objects are in the database like the original database, query plans and query statistics will act the same.
This gives the user or developer the chance to create queries that will run the same on production as on their development system, reducing the risk of running inefficient queries.
Creating the images and clones
Create an image
The first step of the process is to create an image. To create an image you have the following prerequisites:
- Source instance
- Destination instance (can be the same as the source instance)
- Network path
The command that creates the image is:
It's very important to use a network path to the images to make it possible for other hosts to find the image and use it to create a clone.
When the following command is executed:
New-PSDCImage -SourceInstance SQLDB1 -DestinationInstance SQLDB1 -ImageNetworkPath \\sqldb1\psdatabaseclone\images -Database DB1 -CreateFullBackup
You'll get the following result:
This command was executed on 2018-07-24 around 9 PM, which explains the dates in the image
In this example the command looks at the instance SQLDB1 for a particular database and tries to create an image using the same instance. It will create a new full backup to get the latest data.
The result is an image located in the network share
The numbers in the name of the image are a timestamp at which time the image was created.
Creating a clone
The clone is dependent on the image and the easiest way to create a clone from a particular database is to let the module figure that out for you.
Executing the following command:
New-PSDCClone -SqlInstance SQLDB3 -Database DB1 -CloneName DB1_Clone2 -LatestImage
Will create a similar result like below:
The command just created a differential disk on another instance. The clone is based on the latest image for database DB1, and the clone will be named "DB1_Clone2."
Because we did not supply a destination, the command will lookup the default data location of the SQL Server instance.
Within that directory it will create a subdirecory called "clone". The clone will be placed within that folder.
During the setup of the module, you assigned a location where all the clones can be found. This was either a database or a directory, and more specifically a share, where the configuration files are written.
To retrieve the information you can use the commands
Get-PSDCClone. These commands will retrieve the information from the information store and return it back to the user.
To get the image information you have to use the command
Get-PSDCCimage. This will give you a result similar as below:
You see that there can be multiple images from the same database which each could have different data.
Similar as to retrieving image information, you can get clone information. Use the command Get-PSDCClone and you will get a similar result as below:
The next thing with PSDatabaseClone is to make it possible to obfuscate the data in the images to make it suited for enterprise environments. We all know about the data breach incidents lately and we want to make sure our data, at least the data that is not on premise, is masked in such a way that nobody would know what to do with it.
How can you help?
PSDatabaseClone is an open source PowerShell module and everybody who's interested can help to improve the module. If you're familiar with developing PowerShell functions and/or modules you can help out developing new features, fix bugs (I'm sure there are bugs somewhere). If you're not comfortable developing the code, don't worry; there is plenty of stuff to do.
You can help test the different functions and try to break them. I, as one person, can only go so far and most of the time I test the happy flow of my programs. You probably test in a different way and get other results that I would never have thought of. This will help make the module better.
You can also help think of new features to add to the current solution. You may have a great idea for new functionality which I could not have thought of on my own.
If you have an idea for a new feature or found a bug, go to the github page and create a feature request or a bug report. It is greatly appreciated.
About the Author
He has worked in a variety of industries, from agriculture, financial industries and healthcare.
He has been working with PowerShell since version 1 and is embracing automation where he can.