The offline checkpoint process involves creating a second instance of your Helix Server database and then performing checkpoints on this second database instance. By taking checkpoints offline, users are no longer blocked from accessing the primary server during lengthy checkpoint operations; the only downtime on the primary server is associated with rotating the journal. You can configure and maintain an offline checkpoint server using Helix Server replication or through a manual process of checkpoint restore and subsequent replay of rotated master journals. Both methods are detailed below.
Using Helix Server Replication for Offline Checkpoints
With this method, the secondary server instance used for offline checkpoints is configured and maintained through Helix Server replication. Follow the Helix Server replication documentation to set up a read-only replica server to use as the offline checkpoint server, keeping the following points in mind:
- The database content in the offline checkpoint server is updated automatically by the Helix Server replication process.
- To take checkpoints, a replica server need only be configured to replicate metadata (Helix Server database content) with db.replication=readonly and lbr.replication=none. The replica server can optionally be configured to replicate versioned file (archive) content with db.replication=readonly and lbr.replication=readonly and can then serve as an offline checkpoint server in addition to a potential failover option should the master server become unavailable.
- Ideally, choose a machine other than the primary server to host the replica server to reduce any potentially unfavorable performance effects of the checkpoint process.
- Helix Server releases 2015.1 and later that are configured as replica servers don't require a separate server license as they use the master server license.
- Should the replica server used for offline checkpoints become corrupt or otherwise unusable, you can reseed the replica from a new primary server checkpoint to restart the process. For details, see the article "How to reseed a replica server".
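As a rough illustration of the two replication settings described above, the following Python sketch builds the corresponding p4 configure set commands for a named replica. The server ID offline-ckp and the function name are hypothetical, and a working replica needs further configuration that is not shown here, so treat this as a sketch rather than a complete setup procedure.

```python
def replica_configure_commands(server_id, replicate_archives=False):
    """Return p4 argument lists that set the two replication configurables.

    server_id is a hypothetical replica name; replicate_archives selects
    lbr.replication=readonly (metadata plus archives) instead of none.
    """
    lbr_mode = "readonly" if replicate_archives else "none"
    settings = [("db.replication", "readonly"), ("lbr.replication", lbr_mode)]
    return [
        ["p4", "configure", "set", f"{server_id}#{name}={value}"]
        for name, value in settings
    ]


# Print the commands for a metadata-only replica (nothing is executed).
for argv in replica_configure_commands("offline-ckp"):
    print(" ".join(argv))
```

Passing replicate_archives=True yields lbr.replication=readonly instead, which matches the configuration described for a replica that may also be used as a failover option.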
Once the read-only replica server has been configured and is running, to take a checkpoint on the offline server follow the instructions in the following article:
Taking Checkpoints on Edge and Replica Servers
Using an Offline Helix Server Database Directory for Offline Checkpoints
The secondary server instance used for offline checkpoints is maintained manually by first seeding an offline database directory with a current primary server checkpoint and then manually replaying subsequently rotated master journals into the offline database. Checkpoints are taken against the offline database after each manual replay of the master server rotated journal.
A prerequisite to setting up a manual offline checkpoint process is choosing a location for your offline database. In the example below, D:\Offline_P4ROOT is used as the offline database location. To seed the offline database, checkpoint the primary server:
p4 -p MASTER-HOST:MASTER-P4PORT admin checkpoint
and restore the checkpoint to the offline database location:
p4d -r D:\Offline_P4ROOT -jr checkpoint.NNN
Running the command 'p4 admin checkpoint' on the primary server will use the server configuration to determine the location and name of the checkpoint to be created. The journalPrefix server configurable dictates behavior:
- Check the setting using p4 configure show journalPrefix
- When journalPrefix is set, the configured prefix is automatically used as a prefix argument to the p4 admin checkpoint command and the checkpoint is written with the name and location of the configured prefix. For example, with a journalPrefix setting of C:\P4ROOT\backups\helixServer the checkpoint file written would be C:\P4ROOT\backups\helixServer.ckp.NNN.
- When journalPrefix is not set and no prefix is specified on the command line to the p4 admin checkpoint command, default behavior writes the checkpoint to the P4ROOT directory using the default naming convention checkpoint.NNN.
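The naming rules above can be sketched in Python as a small helper that predicts where the checkpoint file will be written. The function name is hypothetical, the sketch assumes journalPrefix (if set) is an absolute Windows path as in the example, and it does not account for compressed checkpoints.

```python
import ntpath  # Windows path rules, usable on any platform


def expected_checkpoint_path(p4root, sequence, journal_prefix=None):
    """Predict the checkpoint file name for a given checkpoint number.

    With journalPrefix set, the file is <journalPrefix>.ckp.<NNN>.
    Otherwise it is checkpoint.<NNN> inside P4ROOT.
    """
    if journal_prefix:
        return f"{journal_prefix}.ckp.{sequence}"
    return ntpath.join(p4root, f"checkpoint.{sequence}")


print(expected_checkpoint_path(r"C:\P4ROOT", 42, r"C:\P4ROOT\backups\helixServer"))
print(expected_checkpoint_path(r"C:\P4ROOT", 42))
```

The checkpoint number 42 is only an example value; the real number comes from the server when the checkpoint is taken.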
Once the primary server checkpoint has been successfully restored to the offline database location, the offline checkpoint process is to rotate the primary server journal:
p4 -p MASTER-HOST:MASTER-P4PORT admin journal
manually restore the rotated journal to the offline database:
p4d -r D:\Offline_P4ROOT -jr journal.NNN
and checkpoint the offline server without truncating the journal using p4d -jd:
p4d -r D:\Offline_P4ROOT -jd checkpoint.mmddyyyy
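The three steps above (rotate, replay, checkpoint) lend themselves to scripting. The following Python sketch assembles the same commands in order and, by default, only prints them. The function names are hypothetical, and the caller is assumed to supply the actual rotated journal file name and checkpoint name in place of the placeholders.

```python
import subprocess


def offline_checkpoint_cycle(p4port, offline_root, rotated_journal, checkpoint_name):
    """Return the commands for one offline checkpoint cycle, in order."""
    return [
        ["p4", "-p", p4port, "admin", "journal"],             # rotate the primary journal
        ["p4d", "-r", offline_root, "-jr", rotated_journal],  # replay it into the offline database
        ["p4d", "-r", offline_root, "-jd", checkpoint_name],  # checkpoint without truncating the journal
    ]


def run_steps(steps, dry_run=True):
    """Print each command, or execute it and stop on the first failure."""
    for argv in steps:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


steps = offline_checkpoint_cycle(
    "MASTER-HOST:MASTER-P4PORT", r"D:\Offline_P4ROOT",
    "journal.NNN", "checkpoint.mmddyyyy",
)
run_steps(steps)  # dry run: prints the three commands
```

Keeping the -r argument explicit in every p4d command is deliberate: it ensures the replay and checkpoint run against the offline database rather than the primary one.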
The server configuration determines where the p4 admin journal command on the primary server writes the rotated journal and also determines its name. The journalPrefix server configurable (p4 configure show journalPrefix) dictates behavior:
- When journalPrefix is set, the configured prefix is automatically used as a prefix argument to the p4 admin journal command and the rotated journal is written with the name and location of the configured prefix. For example, with a journalPrefix setting of C:\P4ROOT\backups\helixServer the rotated journal file written would be C:\P4ROOT\backups\helixServer.jnl.NNN.
- When journalPrefix is not set and no prefix is specified on the command line to the p4 admin journal command, default behavior writes the rotated journal to the P4ROOT directory using the default naming convention journal.NNN.
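If several journals have been rotated since the last replay, they need to be replayed in numeric order. The Python sketch below picks out the rotated journals that follow the naming conventions above and sorts them by their numeric suffix. The function name and the last_replayed bookkeeping value are hypothetical; how you record the last replayed journal number is up to your own process.

```python
import re


def pending_journals(file_names, stem, last_replayed):
    """Return rotated journal names newer than last_replayed, in replay order.

    stem is the part before the number, for example "journal" (default
    naming) or "helixServer.jnl" (when journalPrefix ends in helixServer).
    """
    pattern = re.compile(re.escape(stem) + r"\.(\d+)")
    numbered = []
    for name in file_names:
        match = pattern.fullmatch(name)
        if match and int(match.group(1)) > last_replayed:
            numbered.append((int(match.group(1)), name))
    # Sort numerically so that 9 is replayed before 10.
    return [name for _, name in sorted(numbered)]


names = ["helixServer.jnl.10", "helixServer.jnl.9", "helixServer.ckp.9", "helixServer.jnl.8"]
print(pending_journals(names, "helixServer.jnl", 8))
```

Sorting on the parsed integer rather than the file name matters because a plain string sort would place helixServer.jnl.10 before helixServer.jnl.9.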
Upon completion of the checkpoint process, the offline checkpoint can be used to rebuild your primary server should the need arise.
- Do not use p4d -jc to checkpoint the offline database. Doing so increments the journal counter, and the next attempt to replay a primary journal will fail with an out-of-sequence error.
- If creating offline checkpoints on the same system as the primary Helix Server, always specify the path to the Helix Server root directory using p4d -r P4ROOT to prevent inadvertently running checkpoint commands against your primary Helix Server database.
- Should the offline database become corrupt or out of sync with the primary server, reseed the offline server from a new checkpoint of the primary server and restart the process of replaying subsequently rotated master journals.
- The primary server journal can be rotated at arbitrary intervals, such as once a day or once an hour, depending on your particular requirements.
- When the primary journal is rotated, the primary server briefly stalls during the copy operation. This stall is often imperceptible, but might be noticeable for busy servers with large journal files. Regardless of this short stall, the journal rotation operation is markedly faster than a checkpoint operation.
- It's good practice to restart the process periodically by taking a checkpoint on the primary server, erasing the db.* files on the offline server, and restoring the checkpoint onto the offline server. This ensures that the offline server is in sync with the data on the primary server.
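For the periodic reseed described in the last note, the step of erasing the db.* files can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that no p4d process is using the offline directory and that the path passed in is the offline database location, never the primary P4ROOT. It defaults to a dry run that only reports what would be removed.

```python
import tempfile
from pathlib import Path


def remove_offline_db_files(offline_root, dry_run=True):
    """List (and optionally delete) the db.* files in the offline database directory."""
    targets = sorted(p for p in Path(offline_root).glob("db.*") if p.is_file())
    if not dry_run:
        for path in targets:
            path.unlink()
    return [p.name for p in targets]


# Demonstration against a scratch directory rather than a real database.
with tempfile.TemporaryDirectory() as scratch:
    for name in ("db.have", "db.rev", "journal", "checkpoint.3"):
        Path(scratch, name).touch()
    planned = remove_offline_db_files(scratch)                 # dry run: nothing deleted
    removed = remove_offline_db_files(scratch, dry_run=False)  # deletes only db.* files
    remaining = sorted(p.name for p in Path(scratch).iterdir())

print(planned, removed, remaining)
```

After the db.* files are removed, the new primary checkpoint is restored into the offline directory with p4d -r and -jr, as in the seeding step earlier in this article.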
Please contact Perforce Support if you have any questions.