Perforce Public Knowledge Base - Checkpoints in a Distributed Helix environment

Checkpoints in a Distributed Helix environment


How is backup and recovery handled in a Distributed Perforce Helix environment?
In a Distributed Helix environment, file content and metadata are spread out over multiple servers; no single server contains all of the installation's data.

The Commit Server contains:
  • All committed file revisions and their content
  • Global data, such as security information, stream and depot specifications, and the jobs database
  • Work-in-progress data for clients not bound to any Edge Server
Each Edge Server contains:
  • A (possibly filtered) copy of the Commit Server's database content
  • Some or all of the file content for committed file revisions
  • All of the work-in-progress data for clients bound to the Edge Server
  • Work-in-progress data includes both database content and shelved file content
In a Distributed Helix environment, every server needs to be backed up regularly so that it can be restored from those backups if necessary. It is not necessary, however, that all servers be backed up simultaneously.

Commit Server

Checkpointing the Commit Server

Taking a checkpoint of the Commit Server impacts the entire Distributed installation, since Edge Servers are constantly communicating with the Commit Server. For that reason, it's recommended that a Distributed Perforce Service use a no-downtime backup technique for the Commit Server, such as:
  • Creating a read-only replica of the Commit Server and checkpointing the replica in coordination with journal rotation on the Commit Server
  • Using the Server Deployment Package (SDP) or a similar approach to maintain a separate copy of the Commit Server database and checkpointing that database periodically
Checkpointing a Commit Server is identical to checkpointing a classic Perforce Helix Server, with the exception that rotated journals should not initially be compressed. The replication process may need to read rotated Commit Server journals and can't do so if they're compressed. Therefore, the following rules should be followed when checkpointing the Commit Server directly and rotating the Commit Server journal:
  • Use p4d -jc -Z or p4 admin checkpoint -Z, which compress the checkpoint but leave the rotated journal uncompressed
  • Use p4d -jj or p4 admin journal without the -z flag, so the rotated journal isn't compressed
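The two rules above can be sketched as a small helper that assembles the command lines and flags any that would compress a rotated journal. This is a minimal sketch, not an official tool: the function names are hypothetical, and passing the server root with -r is an assumption (the article shows only the -jc, -jj, -Z, and -z flags). Nothing here runs p4d; it only builds argument lists.

```python
# Hypothetical helpers; only the -jc, -jj, -Z and -z flags come from the article.
# Passing the server root with "-r" is an assumption of this sketch.

def commit_checkpoint_argv(p4root):
    """Checkpoint and rotate the journal, compressing only the checkpoint (-Z)."""
    return ["p4d", "-r", p4root, "-jc", "-Z"]


def commit_rotate_argv(p4root):
    """Rotate the journal without a checkpoint and without -z."""
    return ["p4d", "-r", p4root, "-jj"]


def compresses_journal(argv):
    """True if the command line carries lowercase -z, which would compress the rotated journal."""
    return "-z" in argv


if __name__ == "__main__":
    for argv in (commit_checkpoint_argv("/p4/commit/root"),
                 commit_rotate_argv("/p4/commit/root")):
        status = "UNSAFE for replication" if compresses_journal(argv) else "ok"
        print(" ".join(argv), "->", status)
```

A wrapper script could call compresses_journal on any command line it is about to run and refuse to proceed when it returns True.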

Restoring the Commit Server

Restoring the Commit Server is a major operation and should not be undertaken lightly. After restoring the Commit Server, you will most likely have to re-seed all Edge Server instances and run p4 reconcile on every client.

Guarding against Commit Server Outages

By minimizing the overall work that is performed by the Commit Server, the installation is able to institute high levels of protection for that server, minimizing its downtime. Considerations for the Commit Server should therefore include:
  • High-quality server hardware, with error-correcting memory, RAID-protected or otherwise reliable disk subsystems, a UPS, and so on
  • Restricted physical access: no other work should be occurring on this machine, no unprivileged users should be logging into it
  • Standby hardware in case of hardware failure
  • Server Deployment Package, read-only replica, or other hot-spare protections
  • Routine runs of p4 verify, p4d -xx, and p4d -xv on the standby spare

Edge Server

As described above, Edge Servers contain both global and local data:
  • In a single set of database tables
  • With a single journal containing both the replicated changes from the Commit Server and the local changes for local clients/labels

Checkpointing an Edge Server

Taking a checkpoint of the Edge Server impacts users, because the server is unavailable for the duration of the checkpointing process. For that reason, best practice for checkpointing an Edge Server is to create a read-only replica of the Edge Server and checkpoint the read-only replica. If checkpointing the Edge Server directly, the process is the same as taking a scheduled checkpoint of a replica server, as described in Taking Checkpoints on Edge and Replica Servers.

Restoring an Edge Server

The Edge Server checkpoint contains the complete data for all of the Edge Server's database tables, and the Edge Server journals contain the transactional data for all updates that the Edge Server performs. Restoring an Edge Server is a completely standard process; simply restore the checkpoint and then replay any necessary journals.
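As an illustration of "restore the checkpoint and then replay any necessary journals", the sketch below orders the p4d -jr invocations for a restore. It is a sketch under stated assumptions: the files are assumed to follow the default checkpoint.N / journal.N naming, where the rotated journals numbered N and higher were written after checkpoint.N; the function name and the -r root argument are hypothetical; and a still-live journal file, if one survives, would be replayed last and is left out here. Adjust for any journal prefix your installation uses.

```python
import re

# Assumption: default naming, where journal.N and later follow checkpoint.N.
_JOURNAL_RE = re.compile(r"^journal\.(\d+)$")


def restore_plan(p4root, checkpoint_number, rotated_journals):
    """Return p4d command lines: the checkpoint first, then later journals in numeric order."""
    numbered = []
    for name in rotated_journals:
        match = _JOURNAL_RE.match(name)
        # Journals older than the checkpoint are already contained in it.
        if match and int(match.group(1)) >= checkpoint_number:
            numbered.append((int(match.group(1)), name))
    numbered.sort()  # numeric order, so journal.10 follows journal.9

    plan = [["p4d", "-r", p4root, "-jr", f"checkpoint.{checkpoint_number}"]]
    plan += [["p4d", "-r", p4root, "-jr", name] for _, name in numbered]
    return plan


if __name__ == "__main__":
    for argv in restore_plan("/p4/edge/root", 9,
                             ["journal.10", "journal.8", "journal.9"]):
        print(" ".join(argv))
```

Sorting on the parsed number rather than the file name matters once the counter reaches two digits, since a plain string sort would place journal.10 before journal.9.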

Offline Checkpointing an Edge Server

Offline checkpointing techniques can be used for Edge Server instances in all the standard ways. For example, you can set up a read-only replica of your Edge Server and checkpoint the read-only replica instead of the Edge Server directly. This eliminates checkpoint downtime on the Edge Server. Alternatively, you can use Server Deployment Package (SDP) style techniques to maintain a mirror copy of the database by replaying each rotated Edge Server journal into a separate P4ROOT, and then periodically checkpoint that P4ROOT.
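The journal-replay approach depends on applying every rotated journal exactly once and in order. The sketch below shows one way to pick the next journals for the offline P4ROOT and to refuse to continue if one is missing. It is not the SDP's actual implementation: the function names, the journal.N naming, the -r root argument, and the idea of tracking a "last applied" number are all assumptions made for illustration.

```python
def pending_journals(last_applied, available):
    """Return the journal numbers to replay next, in order; raise if there is a gap."""
    todo = sorted(n for n in set(available) if n > last_applied)
    expected = last_applied + 1
    for number in todo:
        if number != expected:
            # Skipping a journal would leave the offline database inconsistent.
            raise ValueError(f"journal.{expected} is missing; refusing to skip ahead")
        expected += 1
    return todo


def offline_checkpoint_argv(offline_root, journal_numbers, checkpoint_file):
    """Replay the given journals into the offline root, then dump a checkpoint from it."""
    cmds = [["p4d", "-r", offline_root, "-jr", f"journal.{n}"] for n in journal_numbers]
    cmds.append(["p4d", "-r", offline_root, "-jd", checkpoint_file])
    return cmds


if __name__ == "__main__":
    todo = pending_journals(4, [3, 4, 5, 6])
    for argv in offline_checkpoint_argv("/p4/edge/offline_db", todo, "/backup/edge.ckp"):
        print(" ".join(argv))
```

Because the checkpoint is taken from the separate P4ROOT, the live Edge Server keeps serving users while it runs.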

Journal Dumping to an Edge Server

The p4d -jd command takes a point-in-time checkpoint without incrementing the journal counter on the Edge Server, so it can be used when Edge Server administrators wish to create a checkpoint of that Edge Server independently, without rotating journals system-wide.
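Since p4d -jd does not number the checkpoint for you, the administrator chooses the output file name. A minimal sketch of building such a command with a timestamped file name follows; the function name, the "edge-checkpoint-" naming scheme, and the -r root argument are assumptions for illustration, and only the -jd flag comes from the text above.

```python
from datetime import datetime, timezone


def edge_dump_argv(p4root, backup_dir, when=None):
    """Build a p4d -jd command line that writes to a timestamped file (hypothetical naming)."""
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    target = f"{backup_dir.rstrip('/')}/edge-checkpoint-{stamp}"
    return ["p4d", "-r", p4root, "-jd", target]


if __name__ == "__main__":
    print(" ".join(edge_dump_argv("/p4/edge/root", "/backup/edge")))
```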
Related Links
Backup and Recovery ('Fundamentals' Admin Guide)
Backup and high availability / disaster recovery (HA/DR) planning ('Multi-site' Admin Guide)


