2.7 Backup Repositories
Backup repositories are the destination of backup and backup copy jobs in Veeam Backup & Replication. They can also store configuration backups of remote Veeam backup servers. A repository can be built on a Windows or Linux machine with locally attached storage or remote storage carved out of a SAN, on a storage appliance exposing its disk space via the SMB protocol, or on one of the supported deduplication appliances.
2.17: Veeam Backup & Replication supports different repository types
Once a backup repository is configured and registered in the Veeam Backup & Replication console, a cloud repository can be created and assigned to a tenant during the creation of a new Veeam Cloud Connect customer consuming backup resources, carving out a portion of the existing backup repository. From a service point of view, a cloud repository is a remote repository exclusively assigned to its user. From an infrastructure point of view, instead, a cloud repository is a sub-folder of a backup repository with an applied quota.
For a Veeam Cloud Connect deployment, there are no special requirements for repositories compared to regular repositories used in private environments, but the general rules of Veeam Backup & Replication are still valid.
Starting from Veeam Backup & Replication v9.5, service providers can leverage Veeam Scale-Out Backup Repositories (SOBR) and per-VM backup chains to further improve the efficiency of their cloud repositories. We will discuss this topic later in this chapter, but we can anticipate that we highly recommend using SOBR in any Veeam Cloud Connect deployment.
Extreme caution should be taken when using deduplication appliances. Tenants have the option to encrypt their backup files, and service providers cannot forcefully disable this option (although they can make encryption mandatory if needed). Encryption can nullify any advantage of a deduplication appliance, which will be filled with encrypted backups that cannot be deduplicated. If a service provider can control the configuration of incoming backup jobs or backup copy jobs, a deduplication appliance may be a good choice; but for those service providers offering Veeam Cloud Connect Backup as a self-service solution, a deduplication appliance may not be the best choice. Also, specific limits on supported/integrated deduplication appliances may restrict the options available when using one of them; for more information, read the corresponding part of the User Guide: Limitations for Cloud Repository.
Once deployed, a repository has different components, listening over different TCP ports:
| Component | TCP port |
|---|---|
| SSH Server (Linux only) | 22 |
| Veeam Installer Service (Windows only) | 6160 |
| Veeam Data Mover Service (control) | 6162 |
| Veeam Data Mover Service (data) | 2500-5000 |
Ports from 2500 to 5000 need to be open between WAN Accelerators and Backup Repositories for data transfers of WAN accelerated jobs, or between Backup Repositories for direct jobs.
Monitoring considerations are different for Windows and Linux repositories. The latter, in fact, has no permanent component installed: instead, a temporary component is loaded dynamically every time a job is executed. For this reason, the monitoring information is split between the two options:
| Service Display Name | Service Name | Startup Type | Log On As |
|---|---|---|---|
| Veeam Data Mover Service | VeeamTransportSvc | Automatic | Local System |
| Veeam Installer Service | VeeamDeploySvc | Automatic | Local System |
Administrators need to verify that the Linux repository has the SSH server enabled and running, and a Perl subsystem available. The Veeam Backup & Replication server connects to the Linux machine via SSH, copies the temporary binaries, and executes them using Perl. No permanent Veeam component is installed on the repository, so there is no Veeam service to monitor. Service providers may, however, want to monitor the SSH server to guarantee that it is up and running.
From a protection standpoint, backup repositories need proper protection. They are the components storing customers' backup data, and losing them means losing the customers' data. Because of the many available technologies used to build a repository, there are no universal considerations that apply to every scenario. A service provider must carefully evaluate the available options with regard to the technology used to create the backup repository.
Storage space sizing is not covered in this book. There are too many options available on the market to build a Veeam backup repository, and each solution has its own limits in terms of number of disks, stripes, volumes, etc. The only limit on Veeam Cloud Connect Backup is 2 PB (petabyte) for a single cloud repository; service providers may plan to have such large repositories, or build smaller blocks to reduce the failure domain of each repository.
With regard to the memory sizing of a backup repository, it is important to understand how a Veeam repository uses memory. Veeam Backup & Replication v9.5 has four different levels of storage optimization for a backup job:
2.18: Storage optimization options for a backup job
A repository uses memory to store incoming blocks. This queue collects all blocks coming from the source data movers, caches them in memory and, after some optimization, flushes the content to disk. This reduces the random I/O affecting the backup files to a minimum, while serializing as many write operations as possible. The amount of memory consumed by the queue is simple to calculate: it uses 2 GB of memory per active job. When per-VM chains are used, as we suggest doing once both the service provider and its users have upgraded to version 9.5, the same calculation applies. This value has to be multiplied by the number of concurrent tasks configured in the Load Control settings to find the total amount of memory that a repository needs.
However, this is not the only memory consumed by the repository: Veeam backup files contain deduplicated information about the saved blocks. As with any deduplicated storage, metadata is stored alongside the file itself in order to keep track of the stored blocks. To improve performance, the repository dynamically loads this metadata into memory. Starting from Veeam Backup & Replication v8 Update 2, the cache accelerates both write and read operations, although the cache is populated and used differently in the two cases. The amount of memory consumed for metadata depends on the block size selected for deduplication:
| VBK size | Optimization | VBK block size | Memory consumption for VBK metadata |
|---|---|---|---|
| 1 TB | WAN target | 256 KB | 700 MB |
| 1 TB | LAN target | 512 KB | 350 MB |
| 1 TB | Local target | 1024 KB | 175 MB |
| 1 TB | Local target 16+ TB | 4096 KB | 44 MB |
Note: Starting from Veeam Backup & Replication v9, the block size for Local target 16+ TB is 4 MB instead of the previous 8 MB; with the old 8 MB block size, memory consumption was 22 MB per TB.
By adjusting these values to a real scenario, service providers can estimate how much data a given repository will be able to process at a certain point in time; or said differently, how much memory will be needed for an expected amount of processed data.
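The figures above can be combined into a quick sizing sketch. The following Python snippet is illustrative only (not an official Veeam sizing tool); the function name and structure are our own, but the constants come straight from this section:

```python
# Illustrative repository RAM estimate, based on the numbers in this section.
# Per-TB metadata cost (MB) of a VBK, by storage optimization setting:
METADATA_MB_PER_TB = {
    "WAN target": 700,           # 256 KB blocks
    "LAN target": 350,           # 512 KB blocks
    "Local target": 175,         # 1024 KB blocks
    "Local target 16+ TB": 44,   # 4096 KB blocks (v9 and later)
}

QUEUE_GB_PER_TASK = 2  # ingestion queue: ~2 GB of RAM per active job/task


def estimate_repo_ram_gb(backup_tb, optimization, concurrent_tasks):
    """Worst-case RAM estimate: ingestion queues plus VBK metadata cache."""
    queue_gb = QUEUE_GB_PER_TASK * concurrent_tasks
    metadata_gb = backup_tb * METADATA_MB_PER_TB[optimization] / 1024
    return queue_gb + metadata_gb


# Example: 100 TB of backups at the default "Local target" block size,
# with 16 concurrent task slots configured in Load Control:
print(estimate_repo_ram_gb(100, "Local target", 16))  # prints 49.08984375
```

Note that the same formula reproduces the 2 PB figure quoted later in this section: 2048 TB at the default block size yields roughly 350 GB of metadata.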
If a given backup repository is assigned to different customers and all of them are executing their jobs at the same time, the total memory must be divided among all the incoming jobs. The Veeam repository doesn't constantly consume the same amount of RAM, because it can dynamically load and offload metadata, but planning for the maximum possible consumption is a good choice to be prepared for the worst-case scenario.
Finally, it's worth remembering that backup and backup copy jobs are configured by the customer, not by the service provider. There is no direct way for the service provider to plan for an accurate utilization of the backup repository memory, because he does not know in advance which block size will be used or what the total size of a backup set will be. However, the quota configured for a tenant in Veeam Cloud Connect can also be considered the maximum possible size of that customer's backup files. For these reasons, proper monitoring of the backup repository is paramount, so the provider can quickly identify when the system is too stressed.
The maximum possible size of a single cloud repository is 2 PB (2,097,151 GB to be precise); the memory required to manage this amount of data at the default block size would be theoretically around 350 GB. This value will never be reached because there are mechanisms in place to flush the cache. However, it is up to the service provider to design a single large backup repository, decide to have multiple independent pods, or leverage Scale-Out Backup Repositories, and size their memory accordingly.
Simple Repository / Pod
The first design is based on one or more "pods", where each pod is a single repository built with any supported storage (local disks in a Windows or Linux machine, a SAN, a NAS or a deduplication appliance) that has a fixed size or is expandable up to a certain limit.
2.19: Pod design for repositories
With this kind of repository, if used alone, a service provider needs to plan how to manually distribute customers among the several repositories he could have. In addition, he must keep some free space for future increases of the cloud repositories' quotas and for transform operations. A customer may start with a small amount of space, but after some time, the customer could ask for an increase in the storage quota. If there is no free space left in the repository, the service provider will be able to satisfy the customer's request only by migrating the customer's backup files to another repository. This can be done almost transparently, but it involves some manual activities on the part of the service provider and some downtime of the customer's Veeam Cloud Connect service. Cloud repository quotas are strictly applied, but as long as the customer is not using the entire assigned quota, the service provider can use some over-commitment. However, the service provider should carefully evaluate the level of over-commitment to avoid any interruption of the service.
In order to prevent these issues, we recommend creating one or more Veeam Scale-Out Backup Repositories and grouping multiple simple repositories into these objects.
Scale-Out Backup Repository
The second type of design is the Scale-Out Backup Repository. This design is only slightly more complex than the previous one, but it has several advantages; for this reason, we highly suggest using it. A Veeam Scale-Out Backup Repository (SOBR) groups multiple simple repositories into a unique logical entity that can be dynamically expanded, modified and shrunk, while the logical entity as a whole is constantly seen as unchanged by the client component.
To create it, multiple Veeam simple repositories are grouped together into a SOBR as "extents".
2.20: Scale-Out Repository logical architecture
This solution helps service providers avoid capacity problems in their repository design, especially when enabling self-service capabilities for their customers. If a customer can set up his storage quota freely, capacity planning cannot be fully effective: a scale-out approach helps the provider react quickly to a capacity shortage without changing any configuration of the repository structure.
Even if a provider is planning to start with a small Cloud Connect deployment and is thus only going to deploy one simple repository, we suggest creating a SOBR entity immediately and adding the simple repository as its only extent:
2.21: Scale-Out Repository with one extent
By starting immediately with SOBR, its file and folder structure is in place from the first received backup, so any future expansion of the group with additional extents will not require any migration. Placement policies will take care of automatically placing new backups on the newer extents.
NOTE: existing Veeam Cloud Connect deployments using simple repositories can be migrated to the new Scale-Out Backup Repository. The procedure is not freely available to service providers, but it can be requested from Veeam Support, which will execute the migration for the service provider; simply open a support ticket.
Concurrency in a backup repository is an interesting and important topic: the Load Control section of a repository has two main values, and while the data rate limit is pretty easy to understand, the limits applied to concurrent tasks can be a bit tricky, also because the behaviour in a Cloud Connect environment is different.
The limits of a repository should be carefully evaluated by the service provider, in order to find a balance between:
- avoiding overloading the Cloud Connect environment with too many tasks (we also recommend never removing the limit, but always setting a number);
- having customers' jobs wait too long for available resources at the service provider because there are too few task slots.
2.22: Configure carefully the repository load control
A note about removing the limit: before any cloud quota is created, each extent of a SOBR group has to have the limit configured. Unlimited task slots will prevent the creation of cloud quotas:
2.23: Cannot create quota on extents with unlimited task slots
Let's start from the basic concept: a task is an operation that can be executed by a Veeam repository. A backup job, a backup copy job, but also compact or merge operations: they all consume a task slot. So, maximum concurrent tasks is the number of these operations a repository can run at the same time.
This is the general concept, but in Veeam Cloud Connect the behaviour is different: in order to guarantee to tenants that a job is always executed when the tenant planned for it, every job is executed as long as there are still free tenant concurrent tasks:
2.24: Max concurrent tasks for a tenant
This is the most important concept to understand when planning Veeam Cloud Connect backup services, so let's repeat it:
Veeam Cloud Connect allows each tenant to execute up to the max amount of concurrent tasks assigned to the tenant itself, REGARDLESS of the available concurrent tasks slots in the repositories.
This is done on purpose to let tenants consume the slots they are paying for, but it can lead to undesired results if the environment has not been sized correctly.
Let's use the example of my own lab: there are 4 repositories joined together into a SOBR group. The first three have 4 CPUs and 16 GB of memory, and the fourth has double those resources; they all offer 400 GB of disk space for backups.
The overall SOBR group is this:
| Server | CPUs | Memory | Storage space | Max concurrent tasks |
|---|---|---|---|---|
| REFS1 | 4 | 16 GB | 400 GB | 16 |
| REFS2 | 4 | 16 GB | 400 GB | 16 |
| REFS3 | 4 | 16 GB | 400 GB | 16 |
| REFS4 | 8 | 32 GB | 400 GB | 32 |
In an end-user environment, this SOBR group would be able to accept up to 80 concurrent tasks. But in Cloud Connect, connection limits are regulated by the tenants' configurations. If, for example, we have 100 tenants, each with 5 allocated tasks, the total number of tasks that the Cloud Connect environment will accept is 500, way more than 80.
Obviously, this will only happen if every tenant is using all of their assigned tasks. The assigned value is also a hard limit, so even if there's only one tenant actively running tasks, its limit of 5 will never be surpassed, even if there are 80 task slots in the SOBR group.
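The gap between physical slots and tenant entitlements is easy to quantify. The following sketch is only illustrative, using the lab numbers above; all variable names are our own:

```python
# Illustrative comparison of the SOBR group's physical task slots with the
# total tasks tenants are entitled to run concurrently in Cloud Connect.
extent_slots = {"REFS1": 16, "REFS2": 16, "REFS3": 16, "REFS4": 32}
tenants = 100
tasks_per_tenant = 5

sobr_slots = sum(extent_slots.values())    # slots the extents were sized for
tenant_slots = tenants * tasks_per_tenant  # tasks Cloud Connect will accept

print(sobr_slots, tenant_slots)  # prints 80 500

# Tenant limits win over extent limits, so the environment is effectively
# oversubscribed by this factor in the worst case:
oversubscription = tenant_slots / sobr_slots  # 6.25x in this example
```

Keeping an eye on this ratio is a simple way to decide when to add extents or revisit per-tenant task limits.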
So, why do we need to configure Max concurrent tasks on each extent of a SOBR group? It is still an important parameter to set, because it influences the way the internal load balancing of SOBR works.
Let's explain how a SOBR group selects which extent has to be used.
For an existing backup chain, as long as there is enough free space, the choice is simple: SOBR selects the same extent where the full was stored (for data locality policy) or the extent where the previous incremental was stored (for performance policy).
For a completely new chain or (in case of the performance policy) for the first incremental, the placement of the backup file has to be decided. The SOBR algorithm works in this way:
- First, SOBR selects only those extents that will not break the placement policy (so, in case of the performance policy, the extent holding the full file will NOT be used);
- Then, SOBR ranks the extents based on their actual load, dividing the number of used slots by the Max concurrent tasks value. We may have a situation like this:
| Order | Server | Max concurrent tasks | Used slots | Load |
|---|---|---|---|---|
| 1 | REFS4 | 32 | 14 | 44% |
| 2 | REFS1 | 16 | 10 | 63% |
| 3 | REFS2 | 16 | 11 | 69% |
| 4 | REFS3 | 16 | 12 | 75% |
In our example, REFS4 will be the selected extent: it has the most active sessions, but its load is the lowest among the SOBR extents. The load is calculated again at each new session, and since concurrency is governed by the tenants' limits rather than the extents' limits, we may also see load values above 100%.
Finally, if there's a tie between two or more extents in terms of load, SOBR uses free space as the second parameter: between extents with the same load, the one with more free space will be selected.
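Putting the steps together, the extent-selection logic can be sketched as follows. This is our reading of the algorithm described above, not Veeam source code; all names and values are illustrative:

```python
# Illustrative sketch of SOBR extent selection for a new backup file:
# filter by placement policy, rank by load, tie-break on free space.

def select_extent(extents, excluded=()):
    """Pick the extent for a new backup file.

    extents: list of dicts with 'name', 'max_tasks', 'used_slots', 'free_gb'.
    excluded: extent names that would break the placement policy (e.g. the
    extent holding the full file, when the performance policy is in use).
    """
    candidates = [e for e in extents if e["name"] not in excluded]
    # Load = used slots / max concurrent tasks; in Cloud Connect it may
    # exceed 1.0, since tenant limits can push usage past the extent limits.
    # min() on the (load, -free_gb) tuple implements the tie-break: equal
    # loads are decided by the larger free space.
    return min(candidates, key=lambda e: (e["used_slots"] / e["max_tasks"],
                                          -e["free_gb"]))

extents = [
    {"name": "REFS1", "max_tasks": 16, "used_slots": 10, "free_gb": 120},
    {"name": "REFS2", "max_tasks": 16, "used_slots": 11, "free_gb": 150},
    {"name": "REFS3", "max_tasks": 16, "used_slots": 12, "free_gb": 90},
    {"name": "REFS4", "max_tasks": 32, "used_slots": 14, "free_gb": 200},
]
print(select_extent(extents)["name"])  # prints REFS4: most sessions, lowest load
```

Excluding REFS4 (as the performance policy would if it held the full file) makes the sketch fall back to the next least-loaded extent, REFS1.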
Windows 2016 and ReFS
Veeam Backup & Replication 9.5 introduced support for Microsoft Windows Server 2016, where Microsoft has introduced, among other features, the new ReFS 3.1 filesystem. Some of its characteristics make it a potentially great choice for new Veeam repositories.
Microsoft has introduced a new feature in ReFS 3.1 called BlockClone, which can be leveraged via API calls. Thanks to this feature, ReFS can clone blocks by just updating its metadata, without actually reading and writing the same block multiple times: only the reference count of the block is updated. Let's suppose we have two files made of multiple blocks (images are taken from Microsoft MSDN):
2.25: Cluster layout before clone
Now suppose an application issues a block clone operation from File X, over file regions A and B, to File Y at the offset where E currently is. This is the same operation that a Veeam Backup transform operation does, where an incremental backup is merged into the full backup. The result on the file system after the clone operation is this:
2.26: Cluster layout after clone
The new file Y is not actually written again; it's just an update of the reference count in the ReFS file table, and block regions A and B are now used twice in the file system, by both files X and Y. The net result is that transform operations in Veeam Backup & Replication are now extremely fast, as only metadata needs to be updated. There are also advantages for GFS retention (a typical choice for a backup copy job sent to Veeam Cloud Connect): a complete full backup is written each time, but it no longer consumes additional space on disk, as the same blocks are simply referenced multiple times.
NOTE: there is NO space saving on incremental backups during transform operations, as each block is written only once and simply moved from the incremental backup file to the full one. Transform operations are about time saving, not space saving. You get space savings when you run a synthetic full, either in backup or backup copy jobs (such as GFS retention).
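The reference-count mechanics can be illustrated with a toy model. The following sketch only simulates the bookkeeping of a synthetic full; a real implementation would go through the Windows block-cloning API on an actual ReFS volume, and the class and method names here are entirely hypothetical:

```python
# Toy model of ReFS block cloning: data clusters are written once, and a
# synthetic full only increments reference counts in the file table.

class RefsVolume:
    def __init__(self):
        self.refcount = {}   # cluster id -> number of referencing files
        self.files = {}      # file name -> ordered list of cluster ids

    def write(self, name, clusters):
        """Write a file from new clusters (the only real data I/O)."""
        self.files[name] = list(clusters)
        for c in clusters:
            self.refcount[c] = self.refcount.get(c, 0) + 1

    def clone_into(self, src, dst):
        """Block-clone src's clusters into dst: metadata only, no data copy."""
        for c in self.files[src]:
            self.files.setdefault(dst, []).append(c)
            self.refcount[c] += 1

vol = RefsVolume()
vol.write("full.vbk", ["A", "B", "C"])
vol.write("inc1.vib", ["D"])
# Synthetic full: clone the existing clusters instead of rewriting them.
vol.clone_into("full.vbk", "synthetic-full.vbk")
vol.clone_into("inc1.vib", "synthetic-full.vbk")
print(vol.refcount)  # prints {'A': 2, 'B': 2, 'C': 2, 'D': 2}
```

Every cluster ends up referenced twice, yet it exists on disk only once; that is why the synthetic full consumes no extra space and completes in metadata-update time.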
In order to leverage this technology, service providers first of all need to upgrade Veeam Cloud Connect to Veeam Backup & Replication 9.5, and have at least one repository using Windows Server 2016 as the underlying operating system, with a volume formatted with ReFS 3.1.
IMPORTANT: we highly suggest formatting ReFS volumes using the 64 KB cluster size instead of the default 4 KB. To learn more about the reasons behind this recommendation, read here: http://www.virtualtothecore.com/en/refs-cluster-size-with-veeam-backup-replication-64kb-or-4kb/
The Veeam data mover in Veeam Backup & Replication 9.5 recognizes the ReFS filesystem and can then leverage BlockClone. Service providers and their customers can verify the effectiveness of the API first of all by looking at the time needed to complete a merge operation, and by looking at this line in the Job Statistics:
2.27: Fast Clone leveraged in a Veeam backup job
One optimal design we suggest is the use of multiple pods, each using ReFS volumes, grouped together into a SOBR with the data locality policy.