Perforce Public Knowledge Base - Improving diff results when comparing files with more than 50,000 lines
Improving diff results when comparing files with more than 50,000 lines

Problem

When comparing large files, P4Merge sees matching lines as different, potentially requiring more manual resolves than should be necessary.

Solution

All Perforce diffing and merging tools use the same diff algorithm to detect matching lines between two files.

Counterintuitively, the algorithm works harder to find matching lines between smaller files than between large ones. The reason is performance: comparing two large files, particularly if they have few matching lines, makes the algorithm work very hard and consume more resources, which can result in a crash or a hung process. As processing power continues to improve, this performance problem becomes less likely.

To manage the potential performance problem, a few undocumented configurables are available (see p4 help undoc and p4 configure show undoc). Their default settings are:

    diff.slimit1           10M Longest diff snake; smaller is faster
    diff.slimit2          100M Longest diff snake for smaller files
    diff.sthresh           50K Use slimit2 if lines to diff < sthresh
 

diff.sthresh is a threshold that is compared against the average number of lines in the two files being compared. As long as that average is less than 50,000 (the default), the algorithm uses diff.slimit2, which causes it to consider more data (work harder) when looking for matching lines. When the average is equal to or greater than 50,000, it uses diff.slimit1, which causes it to consider less data (work less hard) to protect against resource exhaustion.
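The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this article, not the actual Perforce implementation. The function name effective_snake_limit is hypothetical, the average is assumed to be a simple arithmetic mean, and K and M are assumed to mean 1,000 and 1,000,000.

```python
# Default values as listed in this article (K = 1,000 and M = 1,000,000 assumed).
DIFF_SLIMIT1 = 10_000_000   # limit used for larger files
DIFF_SLIMIT2 = 100_000_000  # limit used for smaller files
DIFF_STHRESH = 50_000       # average line count that separates the two cases


def effective_snake_limit(lines_a, lines_b,
                          sthresh=DIFF_STHRESH,
                          slimit1=DIFF_SLIMIT1,
                          slimit2=DIFF_SLIMIT2):
    """Return the limit that would apply to two files with the given line counts.

    Hypothetical helper: it only mirrors the rule described in the text.
    """
    average = (lines_a + lines_b) / 2
    # Below the threshold the larger limit (slimit2) applies;
    # at or above the threshold the smaller limit (slimit1) applies.
    return slimit2 if average < sthresh else slimit1
```

For example, two 60,000-line files fall back to diff.slimit1 with the defaults, but use diff.slimit2 again once sthresh is raised to 100,000.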

What if you are comparing two files larger than diff.sthresh and the output seems to find differences when it should find matching lines?

First, increase diff.sthresh to a value greater than the average number of lines in the two files, so that the algorithm continues to use diff.slimit2 when comparing larger files. If the results are still not satisfactory, try increasing diff.slimit2 to make the algorithm work harder to find matching lines.

With P4Merge, you can set different limits in a P4ENVIRO file. This requires P4V/P4Merge 2015.1 or later.
  1. Set the P4ENVIRO environment variable.
     On Windows, set P4ENVIRO as a Windows environment variable (Advanced settings->Environment Variables):

       P4ENVIRO=%USERPROFILE%\p4enviro.txt

     On Linux:

       export P4ENVIRO=~/.p4enviro

  2. If it does not already exist, create the file specified by P4ENVIRO (%USERPROFILE%\p4enviro.txt or ~/.p4enviro as appropriate), and add a line like the following:

       diff.sthresh=100000

This example uses 100,000, but the value you choose only needs to be greater than the average number of lines in the two files being compared.
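To pick a value for a specific pair of files, you can count their lines and round up. The Python sketch below is one way to do that. The helper names count_lines and suggest_sthresh and the rounding step of 10,000 are assumptions made for illustration, and the way the sketch counts lines may differ slightly from how the Perforce tools count them.

```python
def count_lines(path):
    """Count newline-delimited lines in a file, reading it as bytes."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def suggest_sthresh(path_a, path_b, step=10_000):
    """Suggest a diff.sthresh value strictly greater than the average line count.

    Hypothetical helper: rounds up to the next multiple of `step` above the
    average of the two line counts, using integer arithmetic throughout.
    """
    total = count_lines(path_a) + count_lines(path_b)
    # A threshold exceeds the average exactly when 2 * threshold > total.
    return (total // (2 * step) + 1) * step
```

Calling suggest_sthresh on two files that average 75,000 lines returns 80000, which could then be written to the P4ENVIRO file as diff.sthresh=80000.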

p4 diff supports these configurables through a P4CONFIG file.

To adjust the settings on the server, use p4 configure set.  Note that increasing diff.sthresh makes it less likely that diff.slimit1 will be used at all.  If you also increase diff.slimit2, there is no prescribed setting for diff.slimit1, but maintaining it at 10% of diff.slimit2 is sensible.

   p4 configure set diff.sthresh=100000
   p4 configure set diff.slimit2=200000000
   p4 configure set diff.slimit1=20000000
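The 10% guideline can be applied mechanically. The following Python sketch builds the three p4 configure set commands from chosen diff.sthresh and diff.slimit2 values. The helper name configure_commands is hypothetical, and the sketch only produces the command text; it does not run anything against a server.

```python
def configure_commands(sthresh, slimit2):
    """Build p4 configure set commands, keeping diff.slimit1 at 10% of diff.slimit2.

    Hypothetical helper: returns the commands as strings for review.
    """
    slimit1 = slimit2 // 10
    return [
        f"p4 configure set diff.sthresh={sthresh}",
        f"p4 configure set diff.slimit2={slimit2}",
        f"p4 configure set diff.slimit1={slimit1}",
    ]


if __name__ == "__main__":
    # Print the commands for the values used in this article.
    for command in configure_commands(100_000, 200_000_000):
        print(command)
```

With the values from this article (100,000 and 200,000,000), the sketch yields the same three commands shown above.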

