The simplest solution is to use an external disk array for data storage and have a second computer ready to take over if the primary fails. If the primary fails, detach it from the disk array, plug in the secondary server, and turn it on. Provided the downtime required to switch over the disk array is acceptable (for example, 5 minutes), this is also the cheapest solution.
Note: RAID arrays are themselves subject to failure. See KB article Bad Raid Controller can Corrupt Data for an example.
Perforce can be deployed on a clustered filesystem using highly available storage. In order for the Perforce Server to function properly in a clustered environment, the flock() call must work reliably across all nodes using the filesystem. As long as file locking is respected cluster-wide, the Perforce Server will work well from an availability point of view; however, performance will suffer to some degree in such a distributed locking configuration.
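As a rough way to check the flock() requirement described above, the following Python sketch tries to take a non-blocking exclusive lock on a file. The helper names and the test file path are hypothetical, and this is only an illustration of the locking behaviour, not a Perforce-supplied tool.

```python
import fcntl
import os


def try_exclusive_lock(path):
    """Try to take a non-blocking exclusive flock() on path.

    Returns the open file descriptor if the lock was acquired, or None if
    another open file description already holds a conflicting lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock; give up without waiting.
        os.close(fd)
        return None
    return fd


def release_lock(fd):
    """Release the lock taken by try_exclusive_lock() and close the file."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

To exercise the cluster, call try_exclusive_lock() on a file that lives on the clustered filesystem from one node and keep the lock held, then call it for the same file from a second node. If locking is respected cluster-wide, the second call should return None until the first node releases the lock.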
When using Perforce in a clustered server environment, there might be special considerations for licensing. The Perforce Server must be able to look up the cluster alias and resolve it to an IP address on the node on which the Perforce Server is running. The Perforce Server performs this check as part of its license validation. If the license check fails, the server will refuse to start. Contact Perforce Support for license assistance.
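Before starting the server on a node, it can help to confirm that the cluster alias resolves to an address owned by that node. The sketch below is an assumption about how such a pre-check might look; it is not the server's actual license check, and the alias name, the address list, and the function name are all hypothetical.

```python
import socket


def alias_resolves_to_node(alias, node_addresses, resolve=socket.gethostbyname):
    """Return True if alias resolves to one of this node's IP addresses.

    node_addresses is the list of IPv4 addresses configured on the node
    (supplied by the caller). resolve can be replaced for testing.
    """
    try:
        resolved = resolve(alias)
    except OSError:
        # Covers socket.gaierror: the alias could not be resolved at all.
        return False
    return resolved in set(node_addresses)
```

For example, alias_resolves_to_node("p4cluster", ["10.0.0.5"]) would be run on the node that is expected to host the server, with the node's own addresses filled in.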
Perforce recommends a "warm standby" strategy: restore the latest checkpoint on a read-only standby server, then regularly or continually update the replica. The standby server can be kept up to date by one of the following methods:
- Continual replication of metadata and versioned files
Starting with version 2010.2, commands were added to the Perforce server to allow continual unidirectional replication of archive files and metadata from one server to another. Please see the Perforce Replication chapter in the System Administrator's Guide for details.
- Daily or regular replication of metadata and versioned files
Every N minutes or hours, truncate the journal file on the primary server and replay it on the warm standby. To create and update a warm standby server, please see KB article Offline Checkpoints regarding the off-line checkpoint process. In addition to maintaining a standby copy of the Perforce metadata, a copy of the Perforce archive files must also be maintained (using rsync or a similar utility), or the archives must be accessible to the standby server through a SAN or similar network-mounted file system.
- Continual replication of metadata only
Starting with Perforce server version 2009.1, support was added for the p4 replicate command, which replicates Perforce metadata only. Refer to KB article Perforce Metadata Replication for more information.
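The daily or regular method in the list above can be sketched as two commands: one that truncates the journal on the primary and one that replays the rotated journal on the standby. The Python sketch below only builds and prints those commands by default. It assumes the p4d -jj (truncate journal) and -jr (replay) flags; the server root paths and the rotated journal file name are hypothetical, and copying the rotated journal to the standby is left to the site.

```python
import subprocess


def rotate_journal_cmd(primary_root):
    """Command that truncates (rotates) the journal on the primary server."""
    return ["p4d", "-r", primary_root, "-jj"]


def replay_journal_cmd(standby_root, journal_file):
    """Command that replays a rotated journal into the standby's database."""
    return ["p4d", "-r", standby_root, "-jr", journal_file]


def run(cmd, dry_run=True):
    """Print the command, or execute it when dry_run is False."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    # Hypothetical paths; the rotated journal name depends on the site setup.
    run(rotate_journal_cmd("/p4/primary/root"))
    run(replay_journal_cmd("/p4/standby/root", "/p4/standby/journals/journal.42"))
```

Keeping the commands as data makes it easy to review the sequence in a dry run before scheduling it.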
To provide additional assurance that the replica server is a copy of the master server, Perforce recommends "re-seeding" the Perforce replica at least once a month. During off-hours, take a checkpoint on the master server, stop the replica server and move the db.* and lbr files off the replica for safekeeping, then restore the checkpoint and restart the replication.
Note: For Perforce server versions prior to 2009.1 there is a user-contributed utility called p4jrep that enables maintaining an "offline" replica of the primary Perforce database. This utility is available from our Public Depot. Please note that while p4jrep is not supported, it has been implemented successfully at several large sites.
The p4jrep utility allows for the replication of the Perforce database (but not the archive files) to a remote machine. Again, if the remote machine also has access to the Perforce archive files (using NFS, for example), it is ready to act as a replacement server.
Note: If a CPU or disk fails in a clustered environment, Perforce cannot guarantee that database corruption will not occur. In the case of a hot backup, the Perforce Server is dependent on the third-party failover solution, something Perforce cannot guarantee. A "warm standby" server provides some protection against metadata corruption, whether it be in a clustered environment or not.
There are also third-party solutions for Perforce based on the journal replication concept, such as the ICManage solution used by several large companies. The ICManage solution is Linux-based.
Perforce replication can be used for disaster recovery where the disaster recovery replica server is on a different site, with caveats:
- The network connection between the sites must be highly reliable and fast enough to keep up with replication data transfer.
- Perform regular testing to confirm the replica has the proper data.
- Conduct disaster recovery drills at least once a quarter, as described in Failing over to a replica server.
- Perforce also recommends "re-seeding" the Perforce replica at least once a month as described above.
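One way to approach the regular testing mentioned in the list above is to compare a copy of the archive files on the master with the copy on the replica. The following Python sketch hashes every file under two directory trees and reports paths that are missing or different on the replica. It is an independent illustration rather than a Perforce feature, it checks archive files only (not metadata), and the function names are hypothetical.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large archive files fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def diff_trees(master_root, replica_root):
    """Return sorted paths that are missing or different on the replica."""
    master = tree_digests(master_root)
    replica = tree_digests(replica_root)
    return sorted(p for p in master if replica.get(p) != master[p])
```

An empty list from diff_trees() means every file found under the master tree has an identical counterpart under the replica tree. Files that changed on the master while the comparison was running may show up as differences.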
Perforce does not recommend using a continual read-only replica server as a migration tool. The best practice is to stop Perforce and take a checkpoint on the old server, then install Perforce on the new server and restore the checkpoint there. Then copy the versioned files from the old server to the new server, start Perforce, and run sanity tests. See our KB article, Cross-Platform Perforce Server Migration.
If downtime is an issue, do most of the work before taking the old server down: install Perforce on the new server, erase any new db.* files on the new server, restore the old server's checkpoint onto the new server, and copy the versioned files from the old server to the new server. Then, during the downtime, stop the old server, play back the old server's journal onto the new server, and use rsync, xcopy, or robocopy to bring the versioned files up to date with the latest changes. Finally, start Perforce on the new server, disable the old server or move it to a different, unadvertised port number, and run sanity tests on the new server to verify the migration.
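The reduced-downtime procedure above splits into work done before the outage and work done during it. The Python sketch below lays out that split as lists of commands so the order can be reviewed. It assumes p4d -jr for restoring a checkpoint and replaying a journal, p4 admin stop for stopping the old server, and rsync for the versioned files; every path and the port value are hypothetical, and nothing is executed.

```python
def low_downtime_migration_plan(old_port, new_root, checkpoint, journal,
                                old_depots, new_depots):
    """Return the command sequence for each phase of the migration.

    All arguments are site-specific values supplied by the caller.
    """
    sync_files = ["rsync", "-a", old_depots + "/", new_depots + "/"]
    before_downtime = [
        # Restore the old server's checkpoint into the new server root.
        ["p4d", "-r", new_root, "-jr", checkpoint],
        # First bulk copy of the versioned files.
        sync_files,
    ]
    during_downtime = [
        # Stop the old server so no further changes are made.
        ["p4", "-p", old_port, "admin", "stop"],
        # Replay the journal written since the checkpoint was taken.
        ["p4d", "-r", new_root, "-jr", journal],
        # Second pass picks up versioned files changed since the first copy.
        sync_files,
    ]
    return {"before_downtime": before_downtime,
            "during_downtime": during_downtime}
```

Starting the new server and running the sanity tests follow the "during_downtime" phase and are not part of the sketch.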
In all cases, make sure the old and new servers are on the same Perforce release. A Perforce upgrade, if desired, should be a separate step.
High Availability and Disaster Recovery Planning
User Conference Presentations
The following presentations from the Perforce User Conference 2005 might be of interest to anyone configuring a high reliability Perforce solution: