Perforce Public Knowledge Base - Taking Checkpoints on Edge and Replica Servers

Taking Checkpoints on Edge and Replica Servers

Problem
How do I take a checkpoint of an Edge or Replica Server?
Solution

The p4 admin checkpoint command is used to take a checkpoint of an edge or replica server. Do not use p4d -jc to checkpoint an edge/replica server: it increments the journal counter, which adversely affects the replication process. Perforce Server 2015.1 and later refuse to checkpoint an edge/replica server with p4d -jc:

p4d -r /p4/1/root -jc
Perforce server error:
A replica may not be checkpointed directly using 'p4d -jc'.
Use 'p4 admin checkpoint' to initiate a coordinated replica checkpoint.

Using p4 admin checkpoint produces a coordinated checkpoint, one that coincides directly with journal rotation on the commit/master. This simplifies the recovery process when a restore from backup is required. The p4d -jd command can also be run against an edge/replica server; it produces a point-in-time checkpoint of the edge/replica without incrementing the journal counter. Because a checkpoint taken by p4d -jd is not coordinated with journal rotation on the commit/master, recovery from backup is less straightforward. Best practice for an edge or replica server is therefore to use p4 admin checkpoint to produce a coordinated checkpoint.


Taking a Coordinated Checkpoint

Run p4 admin checkpoint against the edge/replica:

p4 -p edge:1666 admin checkpoint
The 'pull' command will perform the checkpoint at the next rotation of the journal on the master.

The command prints a message confirming that the checkpoint has been scheduled, and writes a file called stateCKP to the edge/replica server root (P4ROOT) directory. The file records the scheduling details:

Checkpoint scheduled at 1472141783 (2016/08/25 09:16:23 -0700 PDT ); opts:

To cancel a scheduled checkpoint, remove the stateCKP file from the edge/replica P4ROOT prior to rotating the journal on the commit/master server. 

Run p4 admin journal against the commit/master:

p4 -p commit:1666 admin journal
Rotating journal to journal.40...

Note: do not use the -z flag with p4 admin journal (or p4d -jj). Rotated commit/master server journals must initially remain uncompressed so as not to affect the replication process. The edge/replica server detects commit/master journal rotation as part of metadata replication, using special journal notes. Once it detects the rotation, the edge/replica checks for the stateCKP file and, if the file is found, begins the checkpoint process. The stateCKP file is removed once the checkpoint process begins.
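
The two steps above can be wrapped in a small script. This is a minimal sketch, not a supported tool: the function name coordinated_checkpoint and the EDGE_PORT and COMMIT_PORT variables are illustrative, and the addresses default to the placeholder values used in this article.

```shell
#!/bin/sh
# Sketch: schedule a coordinated checkpoint on an edge/replica, then rotate
# the commit/master journal so that the scheduled checkpoint starts.
# EDGE_PORT and COMMIT_PORT are illustrative names; adjust to your servers.
EDGE_PORT="${EDGE_PORT:-edge:1666}"
COMMIT_PORT="${COMMIT_PORT:-commit:1666}"

coordinated_checkpoint() {
    # Step 1: schedule the checkpoint on the edge/replica (writes stateCKP).
    p4 -p "$EDGE_PORT" admin checkpoint || return 1
    # Step 2: rotate the journal on the commit/master. No -z flag is used,
    # so the rotated journal stays uncompressed for replication.
    p4 -p "$COMMIT_PORT" admin journal || return 1
}
```

Calling coordinated_checkpoint runs both commands in order and stops if the scheduling step fails, so the journal is not rotated without a scheduled checkpoint.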

 

Detecting Coordinated Checkpoint Completion

To determine when a coordinated checkpoint has completed, record the journal counter on the commit/master at the time the edge/replica checkpoint is scheduled. For example, the following sequence of commands:

p4 -p commit:1666 counter journal
40
p4 -p edge:1666 admin checkpoint
The 'pull' command will perform the checkpoint at the next rotation of the journal on the master.

report the journal counter on the commit/master server and immediately schedule a checkpoint on the edge/replica server. Based on the journal counter value from the commit/master, the next edge/replica checkpoint will be checkpoint.41 and we can use various methods to detect completion of that checkpoint.
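
The arithmetic is simply the commit/master journal counter plus one. A minimal sketch, assuming the counter value has already been captured from p4 counter journal; the function name next_checkpoint_number is illustrative:

```shell
# Sketch: derive the expected edge/replica checkpoint number from the
# journal counter reported by the commit/master.
next_checkpoint_number() {
    # $1: journal counter value, e.g. captured with
    #     counter=$(p4 -p commit:1666 counter journal)
    counter="$1"
    case "$counter" in
        ''|*[!0-9]*)
            echo "not a journal counter: $counter" >&2
            return 1
            ;;
    esac
    echo $((counter + 1))
}
```

With the counter value 40 from the example, next_checkpoint_number 40 prints 41, so the file to watch for is checkpoint.41.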

Checkpoint Checksum

When a checkpoint completes, an md5 checksum of the checkpoint contents is written alongside the checkpoint:

$ ls -l edge1/checkpoint.41*
-r--r--r-- 1 bruno staff 11833462 Aug 25 09:59 checkpoint.41
-r--r--r-- 1 bruno staff 55 Aug 25 09:59 checkpoint.41.md5

The appearance of the .md5 file therefore signals that the checkpoint has completed.
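
One way to watch for the checksum file is a simple polling loop. This is a sketch under assumptions: the function name wait_for_md5, the default timeout, and the default polling interval are illustrative choices, not Perforce settings.

```shell
# Sketch: wait until <checkpoint>.md5 exists and is non-empty.
# Returns 0 when the file appears, 1 if the timeout expires first.
wait_for_md5() {
    # $1: checkpoint path, e.g. edge1/checkpoint.41
    # $2: timeout in seconds (illustrative default: 3600)
    # $3: polling interval in seconds (illustrative default: 10)
    ckp="$1"
    timeout="${2:-3600}"
    interval="${3:-10}"
    waited=0
    while [ ! -s "$ckp.md5" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

For example, wait_for_md5 edge1/checkpoint.41 1800 15 checks every 15 seconds for up to 30 minutes.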

Using Checkpoint History

The p4 journals command displays information from the db.ckphist table which holds historical information about checkpoint and journal activity. For example, you can report on the last checkpoint taken using:

p4 journals -F type=checkpoint -m1
... start 1472142210
... startDate 2016/08/25 09:23:30
... end 1472142211
... endDate 2016/08/25 09:23:31
... pid 53536
... type checkpoint
... flags -q true (admin checkpoint)
... jnum 40
... jfile checkpoint.40
... jdate 1472142211
... jdateDate 2016/08/25 09:23:31
... jdigest 7A5080F52EC13518305AD2A93919864A
... jsize 11833462
... jtype text

Once a checkpoint has been scheduled and you know the sequence number of the next edge/replica checkpoint, you can poll the edge/replica with p4 journals for that checkpoint:

p4 journals -F 'type=checkpoint jnum=41'

The command won't return data until the checkpoint completes, at which time you'll see the details of the checkpoint completion in the p4 journals output:

p4 journals -F 'type=checkpoint jnum=41'
... start 1472144358
... startDate 2016/08/25 09:59:18
... end 1472144358
... endDate 2016/08/25 09:59:18
... pid 53757
... type checkpoint
... flags -q true (admin checkpoint)
... jnum 41
... jfile checkpoint.41
... jdate 1472144358
... jdateDate 2016/08/25 09:59:18
... jdigest 22971CDC1E26C70B1E6A58C92C4820AA
... jsize 11833460
... jtype text
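
The polling described above could be scripted as follows. This is a sketch, not an official procedure: the function name wait_for_checkpoint_record, the attempt limit, and the interval are illustrative, and the filter string is the one used in this article.

```shell
# Sketch: poll p4 journals on the edge/replica until a record for the
# expected checkpoint number appears.
# Returns 0 and prints the record when found, 1 if attempts run out,
# 2 if the p4 command itself fails.
wait_for_checkpoint_record() {
    # $1: edge/replica address, e.g. edge:1666
    # $2: expected checkpoint number, e.g. 41
    # $3: maximum attempts (illustrative default: 60)
    # $4: seconds between attempts (illustrative default: 30)
    port="$1"
    jnum="$2"
    attempts="${3:-60}"
    interval="${4:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        out=$(p4 -p "$port" journals -F "type=checkpoint jnum=$jnum") || return 2
        if [ -n "$out" ]; then
            printf '%s\n' "$out"
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}
```

For example, wait_for_checkpoint_record edge:1666 41 prints the record for checkpoint 41 as soon as p4 journals returns one.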

Note: if a checkpoint fails, the p4 journals output includes a failed field and the error message associated with the failure. For example:

p4 journals -m1
... start 1452184543
... startDate 2016/01/07 08:35:43
... end 1452184543
... endDate 2016/01/07 08:35:43
... pid 98622
... type checkpoint
... flags  (admin checkpoint)
... jnum 41
... jfile /Volumes/backups/checkpoint.41
... jdate 1452184543
... jdateDate 2016/01/07 08:35:43
... jdigest CFF44FD4B9B26AD90F93AC71D4E47418
... jsize 65536
... jtype text
... failed 1
... errmsg write: /Volumes/backups/checkpoint.41: No space left on device
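
A monitoring script can check for the failed field in this output. The sketch below assumes the output format shown above (lines beginning with "..." followed by a field name) and reads it from standard input; the function name checkpoint_failed is illustrative.

```shell
# Sketch: read p4 journals output on stdin and report a failed checkpoint.
# Prints the errmsg value and returns 0 if "... failed 1" is present;
# returns 1 if no failure is recorded.
checkpoint_failed() {
    awk '
        $1 == "..." && $2 == "failed" && $3 == "1" { failed = 1 }
        $1 == "..." && $2 == "errmsg" {
            sub(/^\.\.\. errmsg /, "")   # keep only the message text
            msg = $0
        }
        END {
            if (failed) {
                print msg
                exit 0
            }
            exit 1
        }
    '
}
```

For example, p4 journals -m1 | checkpoint_failed prints the error message when the most recent record is a failed checkpoint.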

Note: the db.ckphist database table is not included in a checkpoint. If you recover the database tables from a checkpoint, the server starts a new db.ckphist, and p4 journals reports only the activity recorded since the recovery.

Using a Journal-Rotate Trigger

Configure a journal-rotate trigger on the edge/replica which fires when the edge/replica journal is rotated. Since journal rotation is a sign of a successful checkpoint, if the trigger fires you know the checkpoint has completed. 
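
The trigger script itself can be very small. The sketch below only records that the rotation happened; everything in it is an assumption for illustration: the function name record_rotation, the log path, and the idea that the trigger definition passes the journal number as the first argument. Check the p4 triggers documentation for the exact trigger table fields and the variables available to journal-rotate triggers.

```shell
# Sketch: body of a script that a journal-rotate trigger might call on the
# edge/replica. It appends one timestamped line per rotation to a log file.
record_rotation() {
    # $1: journal number as passed by the trigger definition (assumed)
    # $2: log file to append to (hypothetical default path)
    jnum="$1"
    log="${2:-/tmp/edge-journal-rotate.log}"
    printf '%s journal rotated, jnum=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$jnum" >> "$log"
}
```

A monitoring job can then look for the expected journal number in the log instead of polling the server.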
