There are times when the replica metadata pull thread needs to read rotated journal files on the master server. When rotated journals can't be located, or can't be read because they are compressed, you'll see the 'open for read' error in the replica log and in the output of the p4 pull -lj command run against the replica.
How does the metadata pull thread locate rotated journal files on the master server?
- The metadata pull thread looks by default in the P4ROOT directory on the master server and uses the default journal.N naming convention for uncompressed rotated journals where N is the sequence number of the rotated journal.
- If the journalPrefix configurable is set on the master server (p4 -p master:1666 configure show journalPrefix) and no prefix setting (pull -J prefix) is configured for the replica metadata pull thread (p4 -p replica:1666 configure show startup.1), the metadata pull thread looks in the location referenced by the master journalPrefix setting.
- If the replica metadata pull thread is configured with the -J option specifying a prefix, that prefix overrides the master journalPrefix setting.
- For Helix Server releases prior to 2011.1, the journalPrefix configurable is not available so only pull -J prefix can be used.
- For additional details on configuring checkpoint and rotated journal location in a distributed Helix environment, see the Knowledge Base article Configuring Checkpoint and Rotated Journal location in Distributed Helix Environments.
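The lookup order above can be sketched as a small shell function. This is only an illustration: rotated_journal_path is a hypothetical helper, not a Perforce command, and it assumes that a prefix-based rotated journal is named prefix.jnl.N. Verify that suffix against your own master before relying on it.

```shell
# Hypothetical helper (not part of p4): print the path where the replica
# metadata pull thread is expected to look for rotated journal N.
# Usage: rotated_journal_path N P4ROOT [MASTER_JOURNALPREFIX] [PULL_J_PREFIX]
rotated_journal_path() {
  local n=$1 root=$2 master_prefix=${3:-} pull_prefix=${4:-}
  if [ -n "$pull_prefix" ]; then
    # pull -J prefix overrides the master journalPrefix setting
    # (assumption: prefix-based journals are named <prefix>.jnl.N)
    printf '%s.jnl.%s\n' "$pull_prefix" "$n"
  elif [ -n "$master_prefix" ]; then
    # journalPrefix set on the master, no pull -J prefix on the replica
    printf '%s.jnl.%s\n' "$master_prefix" "$n"
  else
    # neither is set: default journal.N in the master P4ROOT
    printf '%s/journal.%s\n' "$root" "$n"
  fi
}
```

For example, rotated_journal_path 37 /p4/1/root prints /p4/1/root/journal.37, while passing a master prefix of /p4/1/checkpoints/p4_1 as the third argument prints /p4/1/checkpoints/p4_1.jnl.37. Both the prefix and the P4ROOT path here are made-up values.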
Error Causes and Resolutions
Compressed Rotated Journal
Checkpoint or journal rotation commands issued on the master, or backup scripts, compress rotated journals. Uncompress the rotated journal file so the replica metadata pull thread can process it. In addition, modify backup procedures on the master so rotated journals are not compressed. This means using the -Z flag with checkpoint commands:
p4 admin checkpoint -Z
p4d -r /p4/1/root -J /logs/p4/1/journal -Z -jc
which creates a compressed checkpoint file but leaves the rotated journal uncompressed, and not using the -z flag with the p4 admin journal and p4d -jj commands when rotating the master journal.
It's okay to compress rotated master journals after the replica has finished processing the journal transactions in that journal. Use p4 servers -J against the master:
p4 -F '%ServerID% %PersistedJournal%' -ztag servers -J
to get a report of which journal counter each replica of this master is currently replicating. It's safe to compress rotated journals with a sequence number lower than the lowest journal counter reported by p4 servers -J, so journal.38 and below in this example.
On a Helix Server release prior to 2014.2, p4 servers -J isn't available, so use p4 pull -lj against each replica:
p4 -F %replicaJournalNumber% -ztag pull -lj
to determine the journal counter of each replica and which master journals are safe to compress.
Moved or Renamed Rotated Journal
Checkpoint or journal rotation commands issued on the master or backup scripts move or rename rotated journals so the replica metadata pull thread can't locate them.
- If the replica metadata pull thread specifies a -J prefix option, return the rotated journal to the directory path and name used in the prefix.
- If journalPrefix is set on the master, return the rotated journal to the directory path and name used in the journalPrefix setting.
- If neither pull -J prefix nor journalPrefix is set, return the rotated journal to the default P4ROOT location as journal.N.
See the Configuring Checkpoint and Rotated Journal location in Distributed Helix Environments Knowledge Base article for further details on proper configuration using the journalPrefix server configurable or p4 pull -J.
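To tell the causes in this article apart, it can help to check what actually exists at the expected location on the master. The sketch below is a hypothetical helper, not a Perforce command. It assumes that a compressed rotated journal sits next to the expected path with a .gz suffix, which may not match how your backup scripts name files.

```shell
# Hypothetical helper (not part of p4): classify the state of a rotated journal.
# $1 is the path where the pull thread is expected to find the journal.
diagnose_rotated_journal() {
  local path=$1
  if [ -r "$path" ]; then
    echo "ok"            # present and readable, uncompressed
  elif [ -e "$path" ]; then
    echo "unreadable"    # present, but the current user cannot read it
  elif [ -e "$path.gz" ]; then
    echo "compressed"    # assumption: compressed copy uses a .gz suffix
  else
    echo "missing"       # moved, renamed, or no longer available
  fi
}
```

A result of compressed points to the Compressed Rotated Journal case, while missing means the file has either been moved or renamed, or is no longer available at all.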
Rotated Journal No Longer Available
If rotated commit/master journals are simply no longer available or backup copies don't exist, reseeding the edge/replica server is required.
Unexpected Journal Number
The journal number referenced in the error above (journal.37) is obtained from the replica's state file. The first number in the state file is the number of the journal file the replica last worked with. If the state file does not exist, the replica uses the journal counter found in the db.counters table in the replica P4ROOT. If the open for read error references an old journal number and you just reloaded a checkpoint into the replica, delete the state file and restart the replica.
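Before deleting the state file, you may want to confirm which journal number it records. The following is a minimal sketch: state_journal_number is a hypothetical helper, and it only assumes that the journal number is the first run of digits on the first line of the file. Anything after that number is ignored.

```shell
# Hypothetical helper (not part of p4): print the first number found on the
# first line of a replica state file, i.e. the journal the replica last used.
# $1 is the path to the state file.
state_journal_number() {
  sed -n '1s/^[^0-9]*\([0-9][0-9]*\).*/\1/p' "$1"
}
```

If the number printed matches the old journal named in the open for read error, and a checkpoint was just reloaded into the replica, that supports removing the state file and restarting the replica as described above.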