Perforce Public Knowledge Base - Legacy SCM System Migrations

Legacy SCM System Migrations

Problem

This article provides information for planning a migration from a legacy SCM system to Perforce SCM.

This article discusses migration planning and review strategies for importing your legacy SCM data. In particular, a lightweight migration strategy known as the baseline & branch import strategy (BBI) is explored in detail. The BBI strategy provides an alternative to detailed history import (DHI) strategies.

Solution

Perforce migration projects vary greatly in scale and complexity. Small, simple environments with basic migration requirements are typically migrated in a few weeks, including Perforce server setup, SCM data migration, build system changes, and training for users and administrators. Large, complex environments may perform a series of migrations that occur over the course of a year or more, as teams migrate at times convenient for them.

This article is not intended to be a replacement for an actual assessment of your environment. An assessment would focus on those factors most relevant to your environment, and produce a migration strategy tailored to your situation and requirements.

Migration Preparation

Review Your Existing Branching Strategy

Early in migration planning, it is a good idea to review the current branching strategy used in your legacy SCM system.  You may want to adjust your branching strategy to be in line with best practices, both generic and specific to Perforce.  Perforce has powerful branching and merging capabilities you can take advantage of, whether you map your existing branching strategy to Perforce or establish a new one.

Creating an initial branching strategy is a best practice when getting started in Perforce in any case.

Directory Structure Planning

With Perforce's Inter-File branching mechanism, the directory structure and branch model are related. A well-defined directory structure helps convey branch structure and software life cycle information, making it intuitive to use. Once a branching strategy is identified, it is mapped to a high-level directory structure in Perforce.

Release Processes and Directory Structure Standards

The directory structure in Perforce can be thought of as having low and high levels. The low levels represent your software products. The high levels convey branching structure, project management, and software life cycle information. A well-designed high-level directory structure is intuitive for developers and lends itself well to project management metrics.

Migration to Perforce typically involves defining a Perforce Directory Standard (PDS) for your organization.  Creating a PDS helps you visualize what your imported software will look like in Perforce.  It encourages consistency in release processes across software products, yet it can be as flexible as needed to allow different software products to have different release processes.

For example, one software product might be a licensed software product, the release process for which would define how to maintain old releases and deliver patches.  A web-based software product operated in your own data center would follow a different release process, in which there is little need to maintain old releases, but which must support rapid updates.  Still another product might be a set of generic components that are delivered to customers and then heavily customized, perhaps by your own professional services organization.

Release processes for different software products may also vary due to the number of contributors and the degree of structure of QA processes.  Software products can follow the same release process, even though they might have very different release schedules.

The low levels of the directory structure are left untouched by the migration. This makes the migration easier to perform and minimizes its impact on your environment, including build scripts and release processes and tools.

Addressing Intellectual Property Concerns

Maintaining IP provenance (i.e., knowing where your source code came from, knowing what legal rights you have to it) can be a priority in legacy SCM migration scenarios.  From the perspective of a migration process, your goal should be to ensure that IP provenance is not affected by the migration.  Your migration processes should provide a clear audit trail so that all imported files can be traced back to the original SCM repository.
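
One way to support such an audit trail is to record a content digest for every file exported from the legacy system, so that imported files can later be compared against that record. The following is a minimal Python sketch under that assumption. The function name build_manifest and the choice of SHA-256 are illustrative only and are not part of any Perforce tool.

```python
import hashlib
from pathlib import Path


def build_manifest(export_root):
    """Map each file's path (relative to export_root) to its SHA-256 digest.

    Hypothetical helper: export_root is assumed to be a directory that
    holds files exported from the legacy SCM system.
    """
    root = Path(export_root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest
```

A manifest like this could be stored alongside the migration records and regenerated from a Perforce workspace after import; matching digests would indicate that file contents survived the migration unchanged.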

SCM systems inherently store valuable intellectual property.  If sensitive information is being migrated, both the migration process and the resulting Perforce environment should ensure that access is controlled to at least the same degree as it was in the original system.

Migrations provide an opportunity to review access control policies.  In some cases, ensuring strong IP protections requires extra effort, causing people to wonder if strong access controls really benefit the organization.  In other cases, migrations expose particularly weak access controls, and Perforce's powerful and flexible access control capabilities provide a straightforward means of guarding IP with relative ease.

Training

Training for Perforce users and administrators is essential to help a migration go smoothly, and to help get the most from Perforce after the migration.  With respect to scheduling, we find it most effective if training for the bulk of Perforce users occurs between a few days and a few weeks prior to the cutover to Perforce.

For information on training options available from Perforce, see: http://www.perforce.com/perforce/services/training.html.

Perforce Transition Team

We recommend establishing a transition team.  This core group may include application administrators, system administrators, and influential users.  If you need assistance, contact Perforce consulting to participate in your transition team.   The transition team defines how Perforce will be used in your organization, how it will tie into your various processes and workflows, and how it will be integrated with other systems.

For larger and more complex migrations, training for this team should occur early in the planning process.  This allows the best practices established by the team to evolve, be documented, and be passed on during training for the larger user community, which occurs later in the migration process.

Import Strategies

There are three approaches for importing files into Perforce: starting over; detailed history import (DHI), which can be exhaustive or selective; and baseline & branch import (BBI). The following is an overview of the strengths and limitations of each import strategy.

Starting Over

This isn't really a conversion strategy.  This "get latest" approach is to get the latest file versions from your legacy SCM system and simply add them to Perforce.

 

Based on the PDS, a high level directory is identified in which the files will be stored, perhaps something like:

//Eng/Gizmo/MAIN/src

Here, "Eng" is for Engineering, Gizmo is a product name, MAIN indicates files in the main stream of development, and src is the root of the low-level directory tree.  The low-level directory tree is copied verbatim into Perforce.
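
As an illustration of how a PDS can be applied consistently, the following Python sketch assembles a depot path from its high-level and low-level components. The function name depot_path and its three-part high-level layout are assumptions taken from the example path above, not a Perforce API.

```python
def depot_path(group, product, branch, *low_level):
    """Join PDS components into a depot path such as //Eng/Gizmo/MAIN/src.

    Hypothetical helper: group, product, and branch form the high-level
    structure; low_level holds the untouched product directory tree.
    """
    parts = [group, product, branch, *low_level]
    for part in parts:
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return "//" + "/".join(parts)


print(depot_path("Eng", "Gizmo", "MAIN", "src"))  # //Eng/Gizmo/MAIN/src
```

Generating paths from named components, rather than typing them by hand, is one way to keep every imported product aligned with the same standard.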

The Starting Over approach is sometimes appropriate for things like documentation repositories, or repositories for shelved (but not terminated) projects.  It is rarely ideal for source code, except for prototype and demo code.

Pros:

  • It is easy.  You need only define target directories in Perforce, and then add the files.
  • It is fast.
  • It results in no undesirable metadata in Perforce.

Cons:

  • No historical information is available in Perforce.

Detailed History Import (DHI)

This is the logical extreme case of conversion.  The goal with this approach is to capture as much legacy SCM information as practical, so that comprehensive historical research can be done in the new system, with the old system being taken offline entirely.

Published, supported tools exist to import data from various SCM systems into Perforce.  See "The First 20 Minutes with a new conversion" for details.  Custom conversion tools can also be developed to extract legacy SCM data and create roughly equivalent Perforce data.  Or, you might consider the Baseline & Branch Import strategy, discussed below.

Because SCM systems vary in architecture, functionality, and what data they store, there are limits on exactly what "detailed history" can be imported.  At the very least, the contents of file versions at each checkin are preserved, along with associated metadata such as the userid of the submitter, the checkin comments, and the timestamp of the checkin.  More sophisticated approaches may also capture branching and integration history.

Pros:

  • Comprehensive historical research can be done without the benefit of the old system.
  • Your old SCM system can be taken offline permanently after the migration.
  • Once in Perforce, you can use Perforce's powerful file and directory diff tools, the Revision Graph, and Time Lapse view to see your old files in a new light.

Cons:

  • Complexity of migrations translates into potential schedule and budget risks if snags are encountered.
  • Even proven detailed history import tools might not work with your data set.  This might be true if your data set is unusually large, or contains unusual patterns or instances of data corruption.  Unlike Perforce, most legacy SCM systems do not have a way to validate the integrity of versioned file contents using checksums.  Corruption of file contents, e.g. due to disk failures, can go undetected.
  • Hardware capacity planning is impacted.  Any SCM system with, say, 7 years of history could be expected to require more hardware (more disk space, more RAM, faster CPUs and I/O subsystems, etc.) than one with no history.  If you do a detailed import of 7 years of history, your brand new system will still have 7 years of history, and will initially require as much hardware as if it had been in operation for 7 years.
  • Detailed history imports may require temporary allocation of powerful hardware to support the migration effort.  Import tools are not always efficient, and have hardware resource needs typically much greater than a nominal operating Perforce server would require for the same size data set.
  • This approach might be initially contemplated in an attempt to meet the objective of achieving a 100% guarantee of reproducing all old builds.  However, because an SCM system migration involves modification of build and release processes, the "100% guarantee" is not achievable in any case.  Even the exhaustive detailed import does not provide 100% guarantee of reproducibility.
  • Detailed history import tools tend to have a variety of limitations and technical caveats.  This is particularly true of complicated legacy systems like IBM Rational ClearCase, where so-called "evil twin" files or directories, or unnatural branching scenarios created by misconfigured config specs, can be difficult to follow.
  • Detailed history imports generate significant Perforce metadata, potentially excessive amounts.
 

"Front Door" and "Back Door" Conversions

Conversion tools come in two flavors, "front door" and "back door".  Back door conversions generate Perforce metadata in the form of a Perforce checkpoint.  Front door conversions translate SCM data into a series of Perforce commands that can be run against a live Perforce server.

Back door conversions produce stand-alone Perforce server instances.  If you have an existing Perforce server in use and don't want to manage multiple Perforce server instances, you will need to use Perfmerge++ to combine Perforce server instances. This is a non-trivial operation, and requires downtime for the target Perforce server instance.

Back door conversions do not lend themselves well to a phased cutover approach, where different teams cutover to Perforce at different times.  However, back door conversion tools typically operate much faster than front door tools.

Baseline & Branch Import (BBI)

The baseline & branch import (BBI) strategy provides a lightweight migration alternative that is far more sophisticated than the simple Starting Over approach, yet without the technical complexity and the schedule and budget risks involved in detailed history imports.

The baseline & branch import process is a generic from-anything-to-Perforce process, and has been used to migrate to Perforce from a variety of SCM systems, including IBM Rational ClearCase, Borland StarTeam, Merant PVCS, Subversion, CVS, Microsoft Visual SourceSafe, AccuRev, and even unsophisticated arrangements like a set of network drives with directories named to indicate releases.

With the BBI approach, the interesting history to be imported is described in the form of a branch diagram that shows the baselines (snapshots of a directory structure at a point in time) and major branching operations.  For example, a diagram like the following might represent a software product:


Figure 1:  Sample Baseline & Branch Import Diagram

The baselines (blue dots) indicate what interesting versions are to be imported.  Baselines represent a specific set of files, perhaps representing a software product or component, as they existed at a certain point in time.  The arrows indicate major branching operations, that is, branching operations that affect an entire branch.  In this scenario, a 2.0-Rel branch has been created, and four patches were created on that branch.  As of the time of cutover to Perforce, only two of those four patches have been merged back to MAIN.  The BBI process imports all the baselines, records the fact that the merges of two patches were completed with resulting updates to MAIN, and tracks the two unmerged patches remaining on the release branch.  Once the data is in Perforce, Perforce can be used to complete those merges.
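
To make the idea of a branch diagram concrete, the following Python sketch records a scenario like the one just described as an ordered list of steps and checks that the list can be replayed in order. The class names, the function check_replayable, and the baseline labels (2.0, p1 through p4) are hypothetical, as is the assumption that p1 and p2 are the two merged patches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    branch: str   # branch the snapshot lives on, e.g. "MAIN"
    label: str    # name of the snapshot, e.g. "p1"


@dataclass(frozen=True)
class BranchOp:
    source: str   # branch the arrow starts from
    target: str   # branch the arrow points to


def check_replayable(steps):
    """Return the branches in creation order, raising on an unknown source.

    A Baseline creates its branch if the branch is new. A BranchOp needs
    an existing source and creates the target if the target is new.
    """
    known = []
    for step in steps:
        if isinstance(step, Baseline):
            if step.branch not in known:
                known.append(step.branch)
        elif step.source not in known:
            raise ValueError(f"{step.source} is used before it exists")
        elif step.target not in known:
            known.append(step.target)
    return known


steps = [
    Baseline("MAIN", "2.0"),
    BranchOp("MAIN", "2.0-Rel"),     # creation of the release branch
    Baseline("2.0-Rel", "p1"),
    BranchOp("2.0-Rel", "MAIN"),     # merge of p1 back to MAIN (assumed)
    Baseline("2.0-Rel", "p2"),
    BranchOp("2.0-Rel", "MAIN"),     # merge of p2 back to MAIN
    Baseline("2.0-Rel", "p3"),       # not yet merged at cutover
    Baseline("2.0-Rel", "p4"),       # not yet merged at cutover
]
print(check_replayable(steps))  # ['MAIN', '2.0-Rel']
```

Writing the diagram down as data makes it easier to review with release engineers and to catch ordering mistakes before any import is attempted.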

Importing the branching operations allows Perforce to select common ancestors when doing merge work, thus allowing Perforce to pick up where you left off with branching activities after the cutover to Perforce.  The BBI process imports branching operations at a high level, capturing the sum of merge operations.  For example, in the diagram above, the arrow representing the merge of p2 back to MAIN would likely have occurred as a series of merges carried out by several developers.  The individual file merges are not tracked, but the sum of the results of the merge (file adds, edits, and deletes) is tracked.  The imported baseline represents a point in time when the merge of p2 is considered complete.

The intent is to bring over just enough branching history to answer key questions, like "What did Release 2.0 look like?", "Where was this file branched from?" and "What files do I need in my workspace to start maintenance work on Release 2.3?"  The BBI approach preserves file contents at key points in time, and preserves enough branching history so that cutover to Perforce can happen at any point in the release cycle, rather than just at ideal times in the schedule (which can be elusive).

After conversion, Perforce's Revision Graph and Time Lapse View tools show the history of your software product much as they would have had development occurred in Perforce to begin with.  You will know what the state of your product looked like at Release 1.0 and Release 2.0.  The hundreds or thousands of checkins between those baselines are ignored, as are the userid, date, time, and checkin comments.

At Perforce we promote using good, detailed checkin comments as a best practice to help maintain software over time.  But if you have years of checkin comments like "asdf", "aaa", or "Updated.", do you really need to import that history?

Accurate diagrams are essential for planning a BBI migration.  Ideally, release engineers can quickly draw a branch history picture that they know to be accurate for each software product to be imported.  If that is not the case, such information can be extracted by exploring the legacy SCM system.  Once the diagram is drawn and vetted by key people, it is translated into a set of Perforce commands that replay the high-level history in Perforce.  The first baseline will appear as an initial addition of the entire product directory tree.  Subsequent baselines result in Perforce changelists that show only the changes (files added, deleted, or modified).  Branching operations are translated into Perforce equivalents.  Merges done in the legacy SCM system are recorded in such a way that Perforce honors the results of the merges done in the original system.
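
The step in which a subsequent baseline becomes a changelist showing only the changes can be sketched as a comparison of two snapshots. The Python below is a minimal illustration, assuming each baseline is available as a mapping from relative path to a content digest (any content hash would do). The function name baseline_delta and the sample data are hypothetical.

```python
def baseline_delta(previous, current):
    """Compare two baselines given as {relative_path: content_digest}.

    Returns sorted lists of the paths added, modified, and deleted when
    moving from the previous baseline to the current one.
    """
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"added": added, "modified": modified, "deleted": deleted}


v1_0 = {"src/hello.c": "a1", "src/util.c": "b2", "README": "c3"}
v1_1 = {"src/hello.c": "a9", "src/util.c": "b2", "src/new.c": "d4"}
print(baseline_delta(v1_0, v1_1))
# {'added': ['src/new.c'], 'modified': ['src/hello.c'], 'deleted': ['README']}
```

Each of the three lists corresponds to one kind of file action in the changelist that represents the later baseline.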

If detailed historical research is often needed, the legacy SCM system can be kept online (perhaps with a single license).  It is a good idea to keep the legacy SCM system around for a year or two after a BBI migration.

Pros:

  • BBI is a front-door approach.  Ideally, it is nice to have the flexibility to do a multi-system migration of different teams, with each team going to Perforce on their own schedule and without impacting others.  Because the BBI approach works against a live, running Perforce server (rather than generating separate Perforce server instances like some detailed history import tools), the project planning for the various teams does not require much coordination.  Each team can migrate to Perforce without impacting those already on Perforce.
  • Interesting history is available in Perforce. Once in Perforce, you can use Perforce's powerful file and directory diff tools, the Revision Graph, and Time Lapse view to see your old files in a new light.  Unlike the detailed imports, you won't be able to tell exactly who changed what, when, and why.  But you can tell how the software product evolved from baseline to baseline.
  • The BBI process is fairly straightforward, and has less risk of technical snags compared to DHI conversions.
  • Compared to DHI conversions, BBI lends itself well to early validation, because all the historical information can be loaded into Perforce at any point in time prior to cutover.  Then, on the day of cutover, only the baselines representing latest state of development on active branches need to be brought into Perforce, as all historical information would already be imported and verified.
  • BBI runs very quickly, so throw-away dry runs can be done in order to develop and test any source code changes that may need to be done as part of the migration, such as updates to build scripts or makefiles.
  • The amount of metadata resulting from BBI is negligible, and does not unduly impact performance or initial capacity planning.
  • You have the opportunity to normalize past history into a new Perforce Directory Standard, which indicates activities such as software releases and creation of development branches in a consistent manner.  In cases where branching strategies evolved over time with the legacy SCM system (or even over a series of legacy SCM systems), this provides a chance to simplify historical research of the imported baselines.  With the BBI approach, it can be made so that common concepts such as "software product X went to production" can be indicated the same way for each of the imported software products.

Cons:

  • In cases where files were renamed or directory structures reorganized between releases, the historical connection between the old name and the new name is hard (but possible) to capture.  In other words, if a file hello.c in v1.0 of your software product was renamed greetings.c in v1.1, establishing that greetings.c used to be hello.c requires analysis of your data to detect refactoring changes (file moves/renames) in your legacy system.  Typically that historical linkage of renamed files is forgone in BBI migrations, because that detection is often deemed more work than it is worth.  However, it is possible to handle refactoring, and it is straightforward once occurrences of refactoring are detected.
  • If detailed history research must be done, the old SCM system must be kept online (in read-only mode, and perhaps with just a single concurrent license).  With the BBI approach, most teams plan to keep the old SCM system available for 1-2 years after the migration, longer in some cases.
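
The rename detection mentioned in the first drawback above can be approximated by matching content between consecutive baselines. The Python sketch below pairs a deleted path with an added path when their content digests are identical and the match is unambiguous. The function name detect_renames and the sample digests are hypothetical, and this is only one possible heuristic.

```python
def detect_renames(previous, current):
    """Pair deleted paths with added paths that have identical content.

    Both arguments map relative paths to content digests. Only exact,
    unambiguous content matches are reported.
    """
    deleted = {p: d for p, d in previous.items() if p not in current}
    added = {p: d for p, d in current.items() if p not in previous}

    by_digest_old, by_digest_new = {}, {}
    for path, digest in deleted.items():
        by_digest_old.setdefault(digest, []).append(path)
    for path, digest in added.items():
        by_digest_new.setdefault(digest, []).append(path)

    renames = {}
    for digest, old_paths in by_digest_old.items():
        new_paths = by_digest_new.get(digest, [])
        if len(old_paths) == 1 and len(new_paths) == 1:
            renames[old_paths[0]] = new_paths[0]
    return renames


v1_0 = {"src/hello.c": "a1", "src/util.c": "b2"}
v1_1 = {"src/greetings.c": "a1", "src/util.c": "b2"}
print(detect_renames(v1_0, v1_1))  # {'src/hello.c': 'src/greetings.c'}
```

A file that was both renamed and edited between baselines would not be caught by exact matching, which is part of why this analysis is often judged not worth the effort.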

BBI and Integrated Defect Tracking/Workflow Management Systems

Loss of detailed SCM data can have implications for integrated systems.  For example, if you have a defect tracking system integrated with your legacy SCM system, that integration would typically indicate something like "Activity 324 was fixed by the 83rd checkin between baselines v1.0 and v1.1".  If you value that level of detail, the BBI approach may not be right for you.  As an alternative, you might summarize with a "roll up" technique.

For example, say that Activity 324 was fixed by the 83rd checkin between two baselines, Activities 325 and 340 were fixed by the 106th checkin, and the 124th checkin was imported as the v1.1 baseline.  You could represent the Activities as Perforce jobs, and associate them all with the changelist representing the v1.1 baseline.  This indicates that Activities 324, 325, and 340 are addressed as of baseline v1.1.  The file contents at the 83rd and 106th checkins would not be imported, but since those fixes are presumably included in the imported 124th checkin, you might have a "good enough" solution, where Activities are rolled up to the first baseline after the Activities were marked as complete.
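
The roll-up rule in this example can be sketched in a few lines of Python: each Activity is assigned to the first baseline whose checkin number is at or after the checkin that fixed it. The function name roll_up is hypothetical, and placing the v1.0 baseline at checkin 0 is an assumption made only to complete the example.

```python
from bisect import bisect_left


def roll_up(activity_checkins, baseline_checkins):
    """Assign each activity to the first baseline at or after its fix.

    activity_checkins: {activity_id: checkin_number}
    baseline_checkins: {baseline_name: checkin_number}
    Activities fixed after the last baseline are left out.
    """
    ordered = sorted(baseline_checkins.items(), key=lambda item: item[1])
    numbers = [number for _, number in ordered]
    rolled = {}
    for activity, checkin in sorted(activity_checkins.items()):
        index = bisect_left(numbers, checkin)
        if index < len(ordered):
            rolled.setdefault(ordered[index][0], []).append(activity)
    return rolled


activities = {324: 83, 325: 106, 340: 106}
baselines = {"v1.0": 0, "v1.1": 124}
print(roll_up(activities, baselines))  # {'v1.1': [324, 325, 340]}
```

The resulting mapping lists, for each imported baseline, the Activities that could be associated with it as jobs.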

BBI Conversion Tool

At some point, we expect to make a generic from-anything-to-Perforce BBI conversion tool and documented conversion process available in the Perforce public depot. For more details, contact Perforce Consulting.

 
