The metadata p4 pull thread that replicates metadata from a commit or master server sometimes needs to read rotated journal files on that server. When rotated journals can't be located, or can't be read because they are compressed, the 'open for read' error appears in the edge/replica log and in the output of the p4 pull -lj command run against the edge/replica.
How does the metadata pull thread locate rotated journal files on the commit/master server?
The metadata p4 pull thread looks by default in the P4ROOT directory on the master server and uses the default journal.N naming convention for uncompressed rotated journals where N is the sequence number of the rotated journal. If the journalPrefix configurable is set on the commit or master server and no prefix setting is configured for the metadata pull thread using p4 pull -i N -J prefix, the metadata pull thread will look in the location referenced by the commit/master journalPrefix setting. If the metadata p4 pull thread is configured with the -J option specifying a prefix, the prefix overrides the commit/master journalPrefix setting. For Helix Server releases prior to 2011.1, the journalPrefix configurable is not available so only p4 pull -i N -J prefix can be used. For additional details on configuring checkpoint and rotated journal location in a distributed Helix environment, see the following article.
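The lookup order described above can be sketched as a small shell function. This is an illustration only, not Helix Server code: the function name and argument order are hypothetical, and the prefix.jnl.N file name used when a prefix is set is an assumption about the naming the server applies, so verify it against your own rotated journals.

```shell
# Hypothetical helper: print the rotated journal path the pull thread is
# expected to look for.
# Usage: rotated_journal_path PULL_PREFIX JOURNAL_PREFIX P4ROOT N
# Pass an empty string for any prefix that is not set.
rotated_journal_path() {
    pull_prefix=$1
    journal_prefix=$2
    p4root=$3
    n=$4

    if [ -n "$pull_prefix" ]; then
        # A -J prefix on the pull thread overrides journalPrefix.
        # Assumption: prefixed rotated journals are named prefix.jnl.N.
        printf '%s.jnl.%s\n' "$pull_prefix" "$n"
    elif [ -n "$journal_prefix" ]; then
        # Otherwise the commit/master journalPrefix setting is used.
        printf '%s.jnl.%s\n' "$journal_prefix" "$n"
    else
        # No prefix anywhere: default journal.N in P4ROOT.
        printf '%s/journal.%s\n' "$p4root" "$n"
    fi
}
```

For example, rotated_journal_path "" "" /p4/1/root 37 prints /p4/1/root/journal.37, matching the default case.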
Error Causes and Resolutions
Compressed Rotated Journal
When checkpoint or journal rotation commands, whether issued directly on the commit/master or run by backup scripts, compress rotated journals, the edge/replica metadata p4 pull thread can't read them. Uncompress the rotated journal file using or a similar utility so the metadata p4 pull thread can process it. In addition, modify backup procedures on the commit/master so rotated journals are not compressed. This means using the -Z flag with checkpoint commands:
p4 admin checkpoint -Z
p4d -r /p4/1/root -J /logs/p4/1/journal -Z -jc
which creates a compressed checkpoint file but leaves the rotated journal uncompressed, and not using the -z flag with the p4 admin journal and p4d -jj commands when rotating the commit or master journal. It's okay to compress a rotated commit/master journal after every edge/replica has finished processing the transactions in that journal. Use p4 servers -J against the commit/master:
p4 -F '%ServerID% %PersistedJournal%' -ztag servers -J
to get a report on which journal counter each edge/replica server is currently replicating. It's safe to compress rotated journals with a sequence number lower than the lowest journal counter reported by p4 servers -J, so journal.38 and below in this example. On a Helix Server release prior to 2014.2, p4 servers -J isn't available, so use p4 pull -lj against each edge/replica:
p4 -F %replicaJournalNumber% -ztag pull -lj
to determine the journal counter of each edge/replica and which commit/master journals are safe to compress.
Moved or Renamed Rotated Journal
When checkpoint or journal rotation commands, whether issued directly on the commit/master or run by backup scripts, move or rename rotated journals, the edge/replica metadata p4 pull thread can't locate them.
- If the metadata p4 pull thread specifies a -J prefix option, return the rotated journal to the directory path and name used in the prefix
- If journalPrefix is set on the commit/master, return the rotated journal to the directory path and name used in the journalPrefix setting
- If neither p4 pull -J prefix nor journalPrefix is set, return the rotated journal to the default P4ROOT location as journal.N
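After returning a rotated journal to its expected location, a quick check on the commit/master can confirm that the file exists there and is not compressed. The sketch below is illustrative only: the function name is hypothetical, and it assumes a compressed journal is a gzip file, which it detects by the two gzip magic bytes (1f 8b) at the start of the file.

```shell
# Hypothetical helper: report whether a rotated journal is usable.
# Prints "missing", "compressed", or "ok" for the path given as $1.
check_rotated_journal() {
    path=$1

    if [ ! -f "$path" ]; then
        echo "missing"
        return 1
    fi

    # Read the first two bytes as hex; gzip files begin with 1f 8b.
    magic=$(od -An -tx1 -N2 "$path" | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        echo "compressed"
        return 2
    fi

    echo "ok"
}
```

Run it against the path that matches your configuration, for example check_rotated_journal "$P4ROOT/journal.37" when no prefix is set.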
See the Configuring Checkpoint and Rotated Journal location in Distributed Helix Environments Knowledge Base article for further details on proper configuration using the journalPrefix server configurable or p4 pull -J.
Rotated Journal No Longer Available
If rotated commit/master journals are simply no longer available or backup copies don't exist, reseeding the edge/replica server is required.
Unexpected journal number
The journal number referenced in the error above (journal.37) is obtained from the state file located by default as
. The first number in the state file is the number of the journal file the replica last worked with. If the state file does not exist, the edge/replica journal counter is used. If the open for read error references an old journal number and you just reloaded a checkpoint into the edge/replica, delete the state file and restart the edge/replica.
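To see which journal number the replica will ask for, the first number in the state file can be read directly. The sketch below is a minimal example: the function name is hypothetical, and it assumes only that the file's first line begins with a run of digits, as described above. It makes no other assumption about the file's format.

```shell
# Hypothetical helper: print the leading number from the first line of
# the state file given as $1. Prints nothing if the line does not start
# with a digit.
state_journal_number() {
    sed -n '1s/^\([0-9][0-9]*\).*/\1/p' "$1"
}
```

Pass the path to your replica's state file, for example state_journal_number "$STATE_FILE", where STATE_FILE is a placeholder for that path. If the number printed is older than the checkpoint you just reloaded, that matches the stale-state case described above.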