Perforce Public Knowledge Base - Fixing a hung replica server
Fixing a hung replica server

Problem
Replica server appears to be hung
Solution

You can check the replica's progress with the following commands:

p4 pull -lj
p4 pull
p4 servers -J

Sometimes the numbers reported on the replica do not change for many minutes.
This is often because the replica is processing a very large transaction.  A Perforce replica plays back journal records only once the whole transaction has arrived.  Otherwise, if the network connection broke while a transaction was only halfway transferred, only half of the transaction would be applied on the replica, causing inconsistencies.  Because the replica has to receive a whole transaction before it begins to play it back, it should also be configured with plenty of memory.

You can find out the progress the Perforce replica has made by looking at the number of bytes processed.

  1. From the output of "p4 pull -lj", determine which journal (a numbered journal or the running journal) the "p4 pull" command is working on.  In the example below, the replica is working on journal 2817 and has processed it up to sequence (byte offset) 3239.
$ p4 pull -lj
Current replica journal state is:       Journal 2817,   Sequence 3239.
Current master journal state is:        Journal 2817,   Sequence 7899.
The statefile was last modified at:     2017/03/23 16:45:57.
The replica server time is currently:   2017/03/23 16:48:21 -0700 PDT
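The comparison in this output can be scripted. The following is a minimal sketch, not an official tool: `replica_lag` is a hypothetical helper name, and it assumes the two "journal state" lines look exactly like the sample above and that "Sequence" is a byte offset into the journal, as this article treats it.

```shell
# Hypothetical helper: reads `p4 pull -lj` output on stdin and prints the
# replica's journal number, its byte offset, and the gap to the master.
replica_lag() {
  awk '
    # Strip the trailing "," and "." so the last field is a plain number.
    /^Current replica journal state/ { gsub(/[,.]/, ""); rj = $(NF-2); rs = $NF }
    /^Current master journal state/  { gsub(/[,.]/, ""); mj = $(NF-2); ms = $NF }
    END {
      if (rj == mj)
        printf "journal=%s offset=%s behind=%d\n", rj, rs, ms - rs
      else
        # Offsets in different journals cannot be subtracted directly.
        printf "journal=%s offset=%s behind=unknown (master is on journal %s)\n", rj, rs, mj
    }'
}
```

Typical use would be `p4 pull -lj | replica_lag`, run twice a few minutes apart. If the offset does not move between runs, the replica is probably still waiting on the transaction that starts at that offset.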
  1. Find the journal on the master
This could be the running journal, or it could be a numbered (rotated) journal.  In this case, look for a journal file with 2817 in its name, or check whether the running journal is currently journal 2817.
  1. Run the split command with the -b flag on the journal in question to split the journal file at the specified number of bytes.
$ split -b 3239 journal
$

This creates a number of files named xaa, xab, xac, and so on.  Each file is exactly 3239 bytes, except the last one, which holds the remainder.
$ ls -l xa*
-rw-r--r--. 1 rfong team 3239 Mar 23 16:52 xaa
-rw-r--r--. 1 rfong team 3239 Mar 23 16:52 xab
-rw-r--r--. 1 rfong team 1421 Mar 23 16:52 xac

  1. That means xaa ends with the journal entries of the last transaction the replica processed, and xab begins with the journal entries of the next transaction, the one the replica is waiting on.
  1. Convert the time stamps to a human-readable date and time.
By looking at the end of xaa and the start of xab, you may be able to guess what Perforce is working on.  Alternatively, take the Unix time stamp from the first entries in xab, convert it to local time, and cross-reference that time with the Perforce server log.
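If creating split files next to a large journal is inconvenient, the same bytes can be read in place. This is a sketch of an alternative to `split`, not the method the article prescribes; `peek_after_offset` is a hypothetical helper name, and the journal path and offset are whatever you determined in the earlier steps.

```shell
# Hypothetical helper: print the first few lines of a journal that follow
# a given byte offset. `tail -c +N` starts output at byte N (1-based), so
# offset + 1 is the first byte after what the replica has processed,
# which is the same byte that xab starts with.
peek_after_offset() {
  journal=$1
  offset=$2
  lines=${3:-5}   # number of lines to show; defaults to 5
  tail -c "+$((offset + 1))" "$journal" | head -n "$lines"
}
```

For the example above, `peek_after_offset journal 3239` would show the same lines as the top of xab without writing any files.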

In this case, the second line of xab is
@ex@ 30062 1490313033
which converts to
Thu 23 Mar 2017 04:50:33 PM PDT GMT-7:00 DST
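The conversion can be done with the `date` command. A minimal sketch follows; the helper names `epoch_to_date` and `ex_line_time` are hypothetical, and it assumes the time stamp is the third whitespace-separated field of an @ex@ line, as in the example above.

```shell
# Hypothetical helper: convert Unix epoch seconds to a readable date in the
# current time zone. GNU date takes "-d @SECONDS"; BSD/macOS date takes
# "-r SECONDS", so try the GNU form first and fall back to the BSD form.
epoch_to_date() {
  date -d "@$1" '+%Y-%m-%d %H:%M:%S %Z' 2>/dev/null ||
    date -r "$1" '+%Y-%m-%d %H:%M:%S %Z'
}

# Hypothetical helper: pull the time stamp (third field) out of an @ex@
# journal line and convert it.
ex_line_time() {
  epoch_to_date "$(printf '%s\n' "$1" | awk '{ print $3 }')"
}
```

For example, `ex_line_time '@ex@ 30062 1490313033'` prints the time in the shell's time zone; set TZ to the server's zone if the log was written elsewhere.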
  1. Look at the Perforce server log at the given time
If you see a resource intensive command at that time, a large transaction may be what is causing the replica to be delayed. 
  1. Check for large transfers of versioned files
If you don't see a huge transaction at this time, run
 
p4 pull -l
 
to see whether large file transfers are in progress or whether some files are failing to transfer.  You can cancel these pending transfers with "p4 pull -d" and, optionally, copy the files over to the replica server manually.
A faster way to abort large file transfers is to make a note of the "p4 pull -l" entries, stop the replica, delete the rdb.lbr file (which discards all "p4 pull -l" entries), and then restart the replica.
 