Perforce Public Knowledge Base - Filtering Data for Replica Servers
Filtering Data for Replica Servers

Problem
I do not want my replica to be a full copy of all the master server's data. I only want specific data to be replicated.
Can I filter the data between the master and replica server?
Solution

Summary

The 2013.1 version of the Perforce Server introduces some additional methods of filtering data destined for a replica server.  See Distributing Perforce.

Caveats:

Be aware that, in the current implementation, file/revision data filtering works best if you re-seed the replica or edge server after the Perforce administrator updates the server specification.


Detail

In addition to the existing -T option, which provides table-level exclusion filters, additional filtering options are available to address the following use cases:
 
1) A replica server is to contain a subset of the master data
2) A checkpoint dump is to contain a subset of the data
3) A journal export is to contain a subset of the data

Replica filtering adds new functionality to three commands: p4 pull, p4d -jd, and p4 export. This provides a common mechanism to describe the filtering.

The optimal mechanism for describing metadata filtering is to specify it as part of a server spec, and then to specify that server as an argument to the p4 pull, p4 export, and p4d commands.
 

Setting up Filtered Replication

Let us assume that the Perforce Replica named gabriel is a forwarding replica:
 
[perforce@replica1 bruno]$ p4 info
User name: bruno
Client name: client1
Client host: replica1
Client root: /home/perforce/bruno/client1
Current directory: /home/perforce/bruno
Peer address: 127.0.0.1:44768
Client address: 127.0.0.1
Server address: replicaClientDataFilter1:1666
Server root: /home/perforce/bruno
Server date: 2013/06/27 12:46:51 -0700 PDT
Server uptime: 00:00:06
Server version: P4D/LINUX26X86_64/2013.1/659207 (2013/06/18)
ServerID: myforward
Server license: Perforce Software Inc. 3000 users (expires 2014/01/30)
Server license-ip: 10.20.30.222
Case Handling: sensitive

[perforce@replica1 bruno]$ p4 configure show gabriel
gabriel: P4LOG = /home/perforce/bruno/repllog
gabriel: P4TARGET = master1:1666
gabriel: P4TICKETS = /home/perforce/bruno/.p4tickets
gabriel: db.replication = readonly
gabriel: lbr.replication = readonly
gabriel: monitor = 2
gabriel: rpl.forward.all = 1
gabriel: server = 3
gabriel: serviceUser = service
gabriel: startup.1 = pull -i 1
gabriel: startup.2 = pull -u -i 1
gabriel: startup.3 = pull -u -i 1

1. Add "-P serverID" to the startup.1 configurable
 
To enable the new replication filtering for a specific replica, change the replica's startup.1 configurable to include the -P option followed by the replica's server ID. Note that this -P option is distinct from the -Pic, -Pxc, -Pif, and -Pxf options, which are also available for filtering.
For example:
 
[perforce@replica1 bruno]$ p4 configure set "gabriel#startup.1=pull -P myforward -i 1"
For server 'gabriel', configuration variable 'startup.1' set to 'pull -P myforward -i 1'

The startup.1 configurable now shows as
 
gabriel: startup.1 = pull -P myforward -i 1

but note that "p4 monitor show" does not reflect this change until the replica server is restarted.

Under the covers, the journal records produced for this change will look similar to
@rv@ 1 @db.config@ @gabriel@ @startup.1@ @pull -P myforward -i 1@
@rv@ 1 @db.config@ @gabriel@ @configurationVersion@ @69@
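
If several replicas need the same change, the configurable value can be generated rather than typed by hand. The following is a minimal sketch only: the function name pull_startup_value is hypothetical, and the server name (gabriel), server ID (myforward), and pull interval (1) are the values used in this article. It prints the command for review and does not run it.

```shell
#!/bin/sh
# Hypothetical helper: build the argument for "p4 configure set" that adds
# the -P filter option to a replica's startup.1 pull thread.
#   $1 = server name used as the configure prefix (for example gabriel)
#   $2 = server ID whose server spec holds the filters (for example myforward)
#   $3 = pull interval in seconds (optional, defaults to 1)
pull_startup_value() {
    name=$1
    server_id=$2
    interval=${3:-1}
    printf '%s#startup.1=pull -P %s -i %s\n' "$name" "$server_id" "$interval"
}

# Print the command for review instead of running it.
echo "p4 configure set \"$(pull_startup_value gabriel myforward)\""
```

Printing the command first makes it easy to confirm that the value after -P is the server ID and not the server name.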

 
2. Run p4 server serverid and add RevisionDataFilter and/or ClientDataFilter
 
The new spec options are ClientDataFilter: and RevisionDataFilter: described in p4 server.

RevisionDataFilter

In this example, we specifically allow //depot/... but we specifically disallow //depot/main/release4/perl_proj/... 

Make sure you run p4 server on the ServerID and not the server name. 
[perforce@replica1 perl_proj]$ p4 serverid
Server ID: myforward
 
[perforce@replica1 bruno]$ p4 server myforward
ServerID:       myforward
Type:   server
Name:   gabriel
Address:        tcp:replica1:1666
Services:       forwarding-replica
Description:
        Forwarding replica pointing to master1:1666
RevisionDataFilter:
        //depot/...
        -//depot/main/release4/perl_proj/...

[perforce@replica1 perl_proj]$ p4 admin restart
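
Before restarting, it can help to confirm that the filter lines were actually saved in the spec. The sketch below is one possible way to do that: spec_field is a hypothetical helper (not a Perforce command) that reads spec text on standard input and prints the value lines of one multi-line field. The sample input is copied from the spec shown above; in practice the input would come from "p4 server -o myforward".

```shell
#!/bin/sh
# Hypothetical helper: print the value lines of one multi-line field from a
# Perforce spec read on standard input, one value per line.
spec_field() {
    awk -v field="$1:" '
        # A line that starts in column one (and is not a comment) begins a field.
        /^[^ \t#]/ { in_field = ($1 == field); next }
        # Indented lines belong to the field that was opened most recently.
        in_field && /^[ \t]+[^ \t]/ {
            sub(/^[ \t]+/, "")
            print
        }
    '
}

# Sample spec text copied from the example above.
spec_field RevisionDataFilter <<'EOF'
ServerID:       myforward
Type:   server
Name:   gabriel
Services:       forwarding-replica
Description:
        Forwarding replica pointing to master1:1666
RevisionDataFilter:
        //depot/...
        -//depot/main/release4/perl_proj/...
EOF
```

An empty result for a field you expected to be set suggests the spec was edited under the wrong ServerID.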


On the master, we add a file to release4

[bruno@master1 perl_proj]$ echo "release4 file" > release4.txt
[bruno@master1 perl_proj]$ p4 add release4.txt
//depot/main/release4/perl_proj/release4.txt#1 - opened for add
[bruno@master1 perl_proj]$ p4 submit -d "adding release4.txt"
Submitting change 24508.
Locking 1 files ...
add //depot/main/release4/perl_proj/release4.txt#1
Change 24508 submitted.

On the replica, we see the changelist, but it does not contain the file because the file is filtered out.

[perforce@replica1 perl_proj]$ p4 changes -m 1
Change 24508 on 2013/06/27 by bruno@Bruno_Perl 'adding release4.txt'

[perforce@replica1 perl_proj]$ p4 describe 24508
Change 24508 by bruno@Bruno_Perl on 2013/06/27 14:34:58

        adding release4.txt

Affected files ...

Differences ...

The file release4.txt is not on the replica server although it is on the master.

[perforce@replica1 perl_proj]$ p4 files //depot/main/release4/perl_proj/release4.txt
//depot/main/release4/perl_proj/release4.txt - no such file(s).

[perforce@replica1 perl_proj]$ p4 fstat //depot/main/release4/perl_proj/release4.txt
//depot/main/release4/perl_proj/release4.txt - no such file(s).

On the master, as a sanity check, we add a file that is not in release4 and therefore will not be filtered.

[bruno@master1 perl_proj]$ echo "release1 file" > release1.txt
[bruno@master1 perl_proj]$ p4 add release1.txt
//depot/main/release1/perl_proj/release1.txt#1 - opened for add
[bruno@master1 perl_proj]$ p4 submit -d "adding release1.txt"
Submitting change 24509.
Locking 1 files ...
add //depot/main/release1/perl_proj/release1.txt#1
Change 24509 submitted.

On the replica, the file appears as expected under Affected files.

[perforce@replica1 perl_proj]$ p4 changes -m 1
Change 24509 on 2013/06/27 by bruno@Bruno_Perl 'adding release1.txt'

[perforce@replica1 perl_proj]$ p4 describe 24509
Change 24509 by bruno@Bruno_Perl on 2013/06/27 14:38:04

        adding release1.txt

Affected files ...

... //depot/main/release1/perl_proj/release1.txt#1 add

Differences ...

[perforce@replica1 perl_proj]$ p4 sync //depot/main/release1/perl_proj/release1.txt
//depot/main/release1/perl_proj/release1.txt#1 - added as /home/perforce/bruno/myworkspace/main/release1/perl_proj/release1.txt

Under the covers, the journal records that correspond to the two RevisionDataFilter lines are
@rv@ 0 @db.svrview@ @myforward@ 1 0 0 @//depot/...@
@rv@ 0 @db.svrview@ @myforward@ 1 1 1 @//depot/main/release4/perl_proj/...@

Values are explained in the Perforce schema.
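
To read these records more easily, the filter lines can be listed in sequence order. The sketch below assumes the field layout of the two sample records above and relies on this article's statement that the second of the three numeric fields is the sequence number. The helper name svrview_lines is hypothetical, the other numeric columns are deliberately not interpreted, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: list db.svrview journal records as "sequence path".
# Field positions are taken from the two sample records in this article; the
# meaning of the other numeric columns is left to the Perforce schema.
svrview_lines() {
    awk '$3 == "@db.svrview@" {
        path = $8
        gsub(/^@|@$/, "", path)   # strip the @ quoting around the path
        print $6, path
    }'
}

svrview_lines <<'EOF'
@rv@ 0 @db.svrview@ @myforward@ 1 0 0 @//depot/...@
@rv@ 0 @db.svrview@ @myforward@ 1 1 1 @//depot/main/release4/perl_proj/...@
EOF
```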



ClientDataFilter

Suppose you no longer want have data (the record of which file revisions a workspace has synced) for the client workspace named myworkspace sent to the replica, because the client is not relevant to the replica's remote location.  Add a ClientDataFilter line to exclude this workspace.
 
[perforce@replica1 perl_proj]$ p4 info | grep "Client name"
Client name: myworkspace
 
[perforce@replica1 perl_proj]$ p4 server myforward

ServerID:       myforward
Type:   server
Name:   gabriel
Address:        tcp:replica1:1666
Services:       forwarding-replica
Description:
        Forwarding replica pointing to master1:1666
ClientDataFilter:
        -//myworkspace/...

[perforce@replica1 perl_proj]$ p4 admin restart

The sync works

[perforce@replica1 perl_proj]$ p4 sync -f //depot/main/release1/perl_proj/test2
//depot/main/release1/perl_proj/test2#1 - refreshing /home/perforce/bruno/myworkspace/main/release1/perl_proj/test2

But notice the replica does not know the file has been synced.

[perforce@replica1 perl_proj]$ p4 have //depot/main/release1/perl_proj/test2
//depot/main/release1/perl_proj/test2 - file(s) not on client.

The master does have this information; it simply was not replicated:

[bruno@master1 perl_proj]$ p4 -c myworkspace -H replica1 have //...
//depot/main/release1/perl_proj/fromrep.txt#1 - /home/perforce/bruno/myworkspace/main/release1/perl_proj/fromrep.txt
//depot/main/release1/perl_proj/test2#1 - /home/perforce/bruno/myworkspace/main/release1/perl_proj/test2
//depot/main/release1/perl_proj/utf8.txt#1 - /home/perforce/bruno/myworkspace/main/release1/perl_proj/utf8.txt
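
To see at a glance which have records were withheld from the replica, the "p4 have" output from both servers can be saved to files and compared. This is a rough sketch under stated assumptions: have_only_in_first is a hypothetical helper, the two input files are assumed to hold "p4 have" output captured from the master and the replica respectively, and mktemp is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: given two files holding "p4 have" output, print the
# depot paths (with revision) recorded in the first file but not the second.
#   $1 = have output captured on the master
#   $2 = have output captured on the replica
have_only_in_first() {
    first_sorted=$(mktemp) || return 1
    second_sorted=$(mktemp) || return 1
    # Keep only the text before " - ", that is, the depot path and revision.
    sed 's/ - .*//' "$1" | sort > "$first_sorted"
    sed 's/ - .*//' "$2" | sort > "$second_sorted"
    # comm -23 prints lines that appear only in the first sorted file.
    comm -23 "$first_sorted" "$second_sorted"
    rm -f "$first_sorted" "$second_sorted"
}

# Example call, assuming master.have and replica.have were saved beforehand:
#   have_only_in_first master.have replica.have
```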


ArchiveDataFilter (requires Perforce 2013.2)

Suppose the master has large versioned files that are not relevant to the remote replica location.  You can prevent these versioned files from being replicated until they are specifically requested.

Example:  Do not transfer *.c files to the replica unless specifically requested.
 

A. On the server, change the server specification so that .c versioned files in the depot named depot are not transferred.
 
$ p4 server -o myforward
# A Perforce Server Specification.                   
<snip>
ArchiveDataFilter:
        -//depot/....c

B. On the replica, restart the replica server
 
$ p4 admin restart

C.  On the master, make changes to file networker.c at changelist 24833.
 
$ p4 edit networker.c
//depot/main/release1/perl_proj/networker.c#5 - opened for edit
$ vi networker.c
$ p4 submit -d "test of archive filter"
Submitting change 24833.
Locking 1 files ...
edit //depot/main/release1/perl_proj/networker.c#6
Change 24833 submitted.
$

D.  On the replica, p4 pull -l lists no pending transfers, so versioned file transfer is up to date.
$ p4 pull -l
$

E.  On the replica, the versioned file, networker.c,v, has not been updated.  Note the head revision is at changelist 24832 and not the latest changelist 24833.
 
$ head networker.c,v
head     1.24832;
access   ;
symbols  ;
locks    ;comment  @@;


1.24832
date     2013.12.09.12.46.17;  author p4;  state Exp;
branches ;
next     1.24831;

F.  On the replica, get the latest revision of networker.c
 
$ p4 sync networker.c
//depot/main/release1/perl_proj/networker.c#6 - updating /home/perforce/rfong/centclient/main/release1/perl_proj/networker.c

G.  On the replica, the versioned file of networker.c is updated only after the sync.  The versioned file on the replica now shows changelist 24833.
 
$ head networker.c,v
head     1.24833;
access   ;
symbols  ;
locks    ;comment  @@;

1.24833
date     2013.12.09.13.06.31;  author p4;  state Exp;
branches ;
next     1.24832;
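
Steps E and G compare the head revision of the archive file by eye. The same check can be scripted, as in the sketch below. It assumes the archive is an RCS-format ,v file like the one shown above (files stored in other formats are not covered), and rcs_head_change is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read an RCS-format archive (,v file) on standard input
# and print the changelist number encoded in its "head" revision, so that
# "head     1.24833;" prints 24833.
rcs_head_change() {
    awk '$1 == "head" {
        rev = $2
        sub(/;$/, "", rev)          # drop the trailing semicolon
        sub(/^[0-9]+\./, "", rev)   # drop the leading "1."
        print rev
        exit
    }'
}

# Sample header copied from step G above.
rcs_head_change <<'EOF'
head     1.24833;
access   ;
symbols  ;
locks    ;comment  @@;
EOF
```

On a replica this could be run as rcs_head_change < networker.c,v before and after the sync to confirm that the archive content arrived only on request.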


Filtered Replication Will Show Versions Before Filtering

Below is an example of an edit on the master server to a filtered branch.  The filtering was put into place after revision 13 of file gx10 was created, so revision 13 is on the replica server, but revision 14 and later exist only on the master server.  When RevisionDataFilter is enabled, users may be confused to find that a revision exists on the master but not on the replica.


For example, a user on the master makes a change to //depot/branch2/gx10

$ p4 edit //depot/branch2/gx10
//depot/branch2/gx10#13 - opened for edit
... //depot/branch2/gx10 - also opened by giles@ws10200
... //depot/branch2/gx10 - also opened by giles@ws12100

$ p4 submit -d edit
Submitting change 2170.
Locking 1 files ...
edit //depot/branch2/gx10#14
Change 2170 submitted.

Because the branch is filtered, the user on the replica affected by this will not see this latest (filtered) change:
 
$ p4 fstat //depot/branch2/gx10
... depotFile //depot/branch2/gx10
... clientFile /home/giles/workspaces/filtered-client/depot/branch2/gx10
... isMapped
... headAction edit
... headType binary
... headTime 1358168377
... headRev 13               <--- note: previous revision (filtered)
... headChange 2169          <--- note: previous change   (filtered)
... headModTime 1358168158
... haveRev 13
... ... otherOpen0 giles@ws10200
... ... otherAction0 edit
... ... otherChange0 2009
... ... otherOpen1 giles@ws12100
... ... otherAction1 edit
... ... otherChange1 2165
... ... otherOpen 2

On the master, the same command shows the new revision:
 
$ p4 fstat //depot/branch2/gx10
... depotFile //depot/branch2/gx10
... headAction edit
... headType binary
... headTime 1358169395
... headRev 14
... headChange 2170
... headModTime 1358168158
... ... otherOpen0 giles@ws10200
... ... otherAction0 edit
... ... otherChange0 2009
... ... otherOpen1 giles@ws12100
... ... otherAction1 edit
... ... otherChange1 2165
... ... otherOpen 2

Additionally, we can see from checkpoints of the master and the replica that the data has in fact been filtered for change 2170:
 
$ grep gx10 replica.ckp.170 | grep 2170
No data returned

$ grep gx10 master.ckp.170 | grep 2170
@pv@ 9 @db.rev@ @//depot/branch2/gx10@ 14 65539 1 2170 1358169395 1358168158 E77BBA67A14ABA34C3E2B1FC573F3873 48 0 0 @//depot/branch2/gx10@ @1.2170@ 65539
@pv@ 0 @db.revcx@ 2170 @//depot/branch2/gx10@ 14 1
@pv@ 9 @db.revhx@ @//depot/branch2/gx10@ 14 65539 1 2170 1358169395 1358168158 E77BBA67A14ABA34C3E2B1FC573F3873 48 0 0 @//depot/branch2/gx10@ @1.2170@ 65539

A note on the db.svrview journal records shown in the RevisionDataFilter section: the second of the three numeric fields is a sequence number that keeps the rows ordered. If the server's filter has multiple lines, the first line has seq=0, the second seq=1, the third seq=2, and so on.

Note: Metadata already present in the server is used as the basis for 'p4 verify -t'; the filter specification does not prevent the verify command from pulling archive files referenced in the replica's metadata.
Seeding a filtered replica from a filtered dump file alleviates this issue.

 

Filtered Replication Will Still Allow Retrieval of Filtered Files

Even if filtered replication is in place, files that have been filtered out of the replica can still be synced from it.  This is because sync is a read-write command, which the forwarding replica sends to the master.

For example:
  1. Here we create a file on the master that will be filtered out
[master perl_proj]$ echo "filterrep1" > filterrep1.txt
[master perl_proj]$ p4 add filterrep1.txt             
//depot/main/release2/perl_proj/filterrep1.txt#1 - opened for add
[master perl_proj]$ p4 submit -d "adding filterrep1.txt"
Submitting change 41593.                                           
Locking 1 files ...                                                
add //depot/main/release2/perl_proj/filterrep1.txt#1               
Change 41593 submitted. 
         
                                  
  2. The file in directory release2 is not replicated; only files in release1 are replicated.
 
[master perl_proj]$ p4 server -o myforward
# A Perforce Server Specification.                   
<snip>
ServerID:       myforward
Type:   server
Name:   gabriel
Services:       forwarding-replica
Description:
        Filtered forwarding replica pointing to master:20152
RevisionDataFilter:
        -//depot/...
        //depot/main/release1/perl_proj/...
  3. On the master, the new file exists
[master perl_proj]$ p4 fstat -Oc //depot/main/release2/perl_proj/filterrep1.txt
... depotFile //depot/main/release2/perl_proj/filterrep1.txt
... clientFile /home/perforce/p4work/20152/client/main/release2/perl_proj/filterrep1.txt
... isMapped
... headAction add
... headType text
... headTime 1478301002
... headRev 1
... headChange 41593
... headModTime 1478300981
... haveRev 1
... lbrFile //depot/main/release2/perl_proj/filterrep1.txt
... lbrRev 1.41593
... lbrType text
... lbrIsLazy 0

[master perl_proj]$ p4 files //depot/main/release2/perl_proj/filterrep1.txt
//depot/main/release2/perl_proj/filterrep1.txt#1 - add change 41593 (text)
  4. On the master, the versioned file exists
[master perl_proj]$ p4 info | grep -i "Server root"
Server root: /home/perforce/p4work/20152

[master perl_proj]$ ls /home/perforce/p4work/20152/depot/main/release2/perl_proj/filterrep1.txt,v
/home/perforce/p4work/20152/depot/main/release2/perl_proj/filterrep1.txt,v
[master perl_proj]$
  5. Switching to the replica, the replica is set to filter out release2 files.

[replica perl_proj]$ p4 configure show | grep pull
startup.1=pull -P myforward -i 1 (configure)
startup.2=pull -u -i 1 (configure)
startup.3=pull -u -i 1 (configure)

[replica perl_proj]$ p4 server -o myforward
<snip>
ServerID:       myforward
Type:   server
Name:   gabriel
Services:       forwarding-replica
Description:
        Filtered forwarding replica pointing to linux-perforce:20152
RevisionDataFilter:
        -//depot/...
        //depot/main/release1/perl_proj/...

  6. As expected, the replica cannot see the release2 files.

[replica perl_proj]$ p4 files //depot/main/release2/perl_proj/filterrep1.txt
//depot/main/release2/perl_proj/filterrep1.txt - no such file(s).

[replica perl_proj]$ p4 fstat -Oc //depot/main/release2/perl_proj/filterrep1.txt
//depot/main/release2/perl_proj/filterrep1.txt - no such file(s).

  7. As expected, the versioned file is not on the replica.

[replica perl_proj]$ p4 info | grep -i "Server root"
Server root: /home/perforce/replica

[replica perl_proj]$ ls /home/perforce/p4work/20152/depot/main/release2/perl_proj/filterrep1.txt,v
ls: cannot access /home/perforce/p4work/20152/depot/main/release2/perl_proj/filterrep1.txt,v: No such file or directory

  8. Yet the file can still be synced

[replica perl_proj]$ p4 sync //depot/main/release2/perl_proj/filterrep1.txt
//depot/main/release2/perl_proj/filterrep1.txt#1 - added as /home/perforce/replica/replclient/main/release2/perl_proj/filterrep1.txt

This is because sync is a read-write command, so it is serviced by the master, which has the file.  To prevent such files from being retrieved, the protections table (edited with "p4 protect") must specifically exclude them.
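
As an illustration of such an exclusion, the sketch below prints one exclusionary protections line for the release2 path used in this example. The helper name exclusion_line is hypothetical, and the choice of the list access level, all users, and all hosts is an assumption made for the example. As printed, the line would apply to every user on every host, including users who connect to the master directly, so the user and host columns would need to be narrowed to suit your site.

```shell
#!/bin/sh
# Hypothetical helper: print one exclusionary protections-table line.
# Columns follow the protections table layout: mode, type, name, host, path.
#   $1 = depot path to exclude
#   $2 = user name pattern (optional, defaults to *)
#   $3 = host pattern (optional, defaults to *)
exclusion_line() {
    path=$1
    user=${2:-*}
    host=${3:-*}
    # A leading dash on the path marks the line as an exclusion.
    printf 'list user %s %s -%s\n' "$user" "$host" "$path"
}

exclusion_line '//depot/main/release2/...'
```

Because later lines in the protections table override earlier ones, an exclusion like this belongs after the lines that grant access to the same path.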


Filtering a Dump File


The server spec is used to determine what data is filtered when creating a dump file; for instance:
 
$ p4d-2013.1 -r. -P Replica13FR -jd filtered-dump


Comparing an unfiltered dump with the filtered one shows the difference:
 
$ grep branch2 unfiltered-dump | wc -l
40498

$ grep branch2 filtered-dump | wc -l
217

$ p4 -p popeye:13288 files //depot/branch2/...
//depot/branch2/... - no such file(s).
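
The comparison above can be wrapped in a small script so that it is repeatable for other paths. This is only a sketch: count_records and records_removed are hypothetical helper names, and the dump file names passed in are whatever you used with -jd. With the counts shown above, the difference would be 40498 - 217 = 40281 records.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a dump file that mention a string.
# grep -c counts matching lines; -F treats the pattern as a fixed string.
#   $1 = string to look for (for example branch2)
#   $2 = dump file
count_records() {
    grep -cF -- "$1" "$2"
}

# Hypothetical helper: report how many matching records the filter removed.
#   $1 = string to look for
#   $2 = unfiltered dump file
#   $3 = filtered dump file
records_removed() {
    before=$(count_records "$1" "$2")
    after=$(count_records "$1" "$3")
    echo $((before - after))
}

# Example call, assuming both dump files exist in the current directory:
#   records_removed branch2 unfiltered-dump filtered-dump
```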
 

Filtering 'p4 export' Output

If we export all of the records for checkpoint 170 we find 54243 records with 'branch2' in them:
 
$ p4 export -c 170 | grep branch2 | wc -l
54243

Using the Replica13FR server spec, whose filter excludes '//depot/branch2/...' records, this is reduced to:
 
$ p4 export -c 170 -P Replica13FR | grep branch2 | wc -l
273

We can also specify the actual filter on the command line:
 
$ p4 export -c 170 -Pxf://depot/branch2/... | grep branch2 | wc -l
273

The full list of options is as follows:
 
-Pic://client/pattern  -- client records to include
-Pxc://client/pattern  -- client records to exclude
-Pif://depot/pattern   -- depot records to include
-Pxf://depot/pattern   -- depot records to exclude


More information can be found using p4 help export.
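
When the filter is given on the command line, the four option forms listed above differ only in the two-letter kind. The sketch below builds one such option and rejects any other kind; export_filter_opt is a hypothetical helper name, and the checkpoint number in the usage comment is the one from this article.

```shell
#!/bin/sh
# Hypothetical helper: build one -P filter option for "p4 export".
#   $1 = ic, xc, if or xf (include or exclude; client or depot records)
#   $2 = the client or depot pattern
export_filter_opt() {
    case $1 in
        ic|xc|if|xf) printf '%s%s:%s\n' '-P' "$1" "$2" ;;
        *) echo "unknown filter kind: $1" >&2; return 1 ;;
    esac
}

export_filter_opt xf '//depot/branch2/...'

# Example use with the export command shown above:
#   p4 export -c 170 "$(export_filter_opt xf '//depot/branch2/...')"
```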

 

Filtering and Performance

-P filter interpretation requires the master to parse journal records. Unlike the pre-2013.1 -T journal record filtering, which works with the un-parsed journal record data (it only needs to know the table name and the raw journal record), the 2013.1 -P filters need to interpret the columns in the journal record.

This means that with release 2013.1, if the p4 pull and p4 export commands include -P filters, the master will be doing more processing of journal records than it did in prior releases.