Perforce Public Knowledge Base - Internationalization and Localization
Internationalization and Localization

Problem

This article explains how to configure a Perforce Server to run in internationalization mode and how to configure Perforce clients to work with different character sets. It also discusses problems you might encounter when handling Unicode or non-ASCII data in Perforce, and how to remedy them.

Solution
In Perforce there are several ways to work with multiple character sets depending on your requirements:
 
  • If your filenames or Perforce metadata contain non-ASCII characters, then your Perforce administrator might need to consider switching your Perforce Server into unicode mode as described below. When running in unicode mode, all non-file data (identifiers, descriptions, and so on), as well as the content of all files of type "unicode", are translated between the character set specified by the P4CHARSET variable on the client and UTF8 in the server.
    Before switching to unicode mode, verify that the character set you want to work with is supported.
  • If the goal is only to manage files whose content contains unicode characters, you may not need p4d in unicode mode at all: the "utf8" and "utf16" filetypes already handle unicode file content.
  • If you need to work on unicode files that contain characters saved in the users directory, syncing/submitting such files to/from a single client machine can become a cumbersome process, as extra steps (switching between different P4CHARSETS, installing additional Code Pages and so on) are required to complete the task.  
  • Unicode files can always be added as binary files. This makes diffing such files more difficult, because by default Perforce does not diff true binary files. However, if your binary files are really UTF8 or UTF16 files, the default diff/merge tool in P4V diffs them correctly. P4V users can also specify a third-party diff/merge tool for such files, and command line users can force the diff using the "-t" flag (for example, p4 diff -t).

Switching the Perforce server into unicode mode

Before you use Perforce in a unicode environment, you must first instruct your Perforce Server to run in unicode mode. To set up your server to run in this mode, stop the server, and then run this command from within your Perforce server root directory:

p4d -xi

This command verifies that all existing metadata is valid UTF8 and sets a protected unicode counter, so that future invocations of p4d operate in unicode mode. Once set on the server, unicode mode cannot be deactivated (that is, you cannot return to non-unicode mode). After p4d -xi has switched your server into unicode mode, you can invoke p4d with your usual flags.

Note:

If you run the p4d -xi command while the server is running, you must restart your Helix server for unicode mode to be fully operational.

Important:

Should you try to switch the server to unicode mode with the p4d -xi command and the server responds with "invalid UTF8" messages:

Table db.user has 14 rows with invalid UTF8.
Table db.domain has 1 rows with invalid UTF8.
...

Perforce server error:
Database has 14 tables with non-UTF8 text and can't be switched to Unicode mode.

Take special note of the table names with invalid UTF8: if any of the db.rev* or db.working tables is listed, you might have a file name with invalid UTF8 whose archive file or directory will need to be renamed.

To fix this problem, do the following:

  1. Stop the server to prevent updates during this process.
  2. Take a checkpoint.
  3. Convert the checkpoint file to UTF8 encoding.
    Summary: use any editor or process to remove non-UTF8 byte sequences or convert them to valid UTF8 byte sequences.
    On Unix, consider using iconv. A Windows version of iconv is available at: http://gnuwin32.sourceforge.net/packages/libiconv.htm
    On Windows, you can also use any editor (for example, notepad2) that can save in UTF8 encoding. The editor itself is not important, although word processors should be avoided because they may introduce additional formatting. Most Windows editors have file size limitations. Notepad saves with a BOM, which must be removed.
  4. Remove all db.* files.
  5. Restore from the UTF8 checkpoint file.
  6. Verify the result with p4 verify.
  7. Try p4d -xi again.
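Before editing a large checkpoint by hand, it can help to know which lines actually contain invalid byte sequences and whether the file starts with a BOM. The following Python sketch is one possible way to do that; it is not a Perforce tool. The function names and the checkpoint file name in the usage comment are hypothetical, and the sketch only assumes that the checkpoint can be read as newline-separated bytes.

```python
import codecs


def find_invalid_utf8_lines(lines):
    """Return the 1-based numbers of the lines that are not valid UTF-8.

    `lines` is any iterable of bytes objects, for example a file
    opened in binary mode ("rb").
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


def strip_utf8_bom(data):
    """Remove a leading UTF-8 BOM (EF BB BF) from a bytes object, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical usage; "checkpoint.42" is a made-up file name:
#
#     with open("checkpoint.42", "rb") as fh:
#         print(find_invalid_utf8_lines(fh))
```

The reported line numbers only tell you where to look; deciding how each byte sequence should be converted still depends on the encoding the data was originally written in.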

To convert to legal UTF8, you can use any of the available character set conversion tools. The "iconv" converter is a good choice and is available for both Unix and Windows. Note that "iconv" might miss some German umlaut characters, so check its output carefully. If identifying non-UTF8 metadata becomes a bigger issue, ask support@perforce.com for a tool called "jnltool.pl".
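As an illustration of what such a conversion does, here is a minimal Python sketch that leaves lines that are already valid UTF8 untouched and re-encodes the rest from an assumed legacy encoding. The function name and the default of "cp1252" (the Windows code page 1252) are assumptions for the example; use the encoding your metadata was really written in.

```python
def line_to_utf8(line, legacy_encoding="cp1252"):
    """Return `line` (bytes) as valid UTF-8.

    Lines that already decode as UTF-8 are returned unchanged.
    Other lines are interpreted as `legacy_encoding` (an assumption
    supplied by the caller) and re-encoded as UTF-8. A byte that is
    undefined in the legacy encoding raises UnicodeDecodeError.
    """
    try:
        line.decode("utf-8")
        return line
    except UnicodeDecodeError:
        return line.decode(legacy_encoding).encode("utf-8")


# b"J\xfcrgen" is "Jürgen" in code page 1252; 0xFC alone is not valid UTF-8.
converted = line_to_utf8(b"J\xfcrgen")
```

A legacy byte sequence can occasionally happen to be valid UTF8 as well, in which case a sketch like this leaves it alone, so the converted checkpoint should still be reviewed.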

Run p4 verify immediately if you had to convert your checkpoint by any method. If the db.rev* tables contained invalid UTF8, p4 verify might show all affected revisions as MISSING!, and the archive file or directory will need to be renamed.

User Notes

When connecting to a unicode-enabled Helix server, Helix clients detect and set the client's charset automatically. In very rare cases, users of P4V and other Helix client applications might be asked to choose their encoding when first connecting to a unicode-enabled server.

Important:
Be aware that mixing different encodings and, consequently, P4CHARSET settings on the same computer is likely to cause file corruption and/or translation problems.

The following table lists a few of the most used (in the USA) P4CHARSET values:

Language            Platform     Windows code page   Unix locale   P4CHARSET setting
English/High-ASCII  Windows      1252                n/a           winansi
English/High-ASCII  UNIX/Linux   n/a                 varies        iso8859-1/utf8
English/High-ASCII  Mac OS X     n/a                 n/a           utf8
All/untranslated    All          n/a                 n/a           utf8*
All                 All          n/a                 n/a           utf16**

 

The P4CHARSET value "none" is also worth mentioning: a) when passed with the "-C" switch, it overrides any existing P4CHARSET setting, and b) it allows you to connect to a (non-)unicode-enabled server. For the complete list of supported P4CHARSET values, run p4 help charset or visit: http://www.perforce.com/perforce/doc.current/user/i18nnotes.txt

If you need a charset other than what we support, please contact Perforce Support regarding the character set encoding you would like supported.  Until we support your charset, you must work with your unicode text files in a currently supported charset.

* utf8 is untranslated, but the file content is validated.

** utf16 requires that P4COMMANDCHARSET be set to a different (non-utf16) charset for the p4 command line client to function, for example:

p4 -C utf16 -Q utf8 sync some_files
where "-C" is a command line flag for P4CHARSET and "-Q" is for P4COMMANDCHARSET.
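For scripts that call the p4 command line client, the same flags can be assembled programmatically. The Python sketch below only builds the argument list; the helper name p4_argv is hypothetical, and the check covers just the utf16 rule from the footnote above. It looks at the explicit argument and the P4COMMANDCHARSET environment variable only, which is a simplification, because the setting can also come from other places.

```python
import os


def p4_argv(command, charset=None, command_charset=None):
    """Build an argument list for the p4 command line client.

    "-C" carries P4CHARSET and "-Q" carries P4COMMANDCHARSET.
    Raises ValueError when charset is utf16 and no different
    command charset is known to this helper.
    """
    # Simplification: only the argument and the environment are consulted.
    effective = command_charset or os.environ.get("P4COMMANDCHARSET")
    if charset == "utf16" and effective in (None, "", "utf16"):
        raise ValueError("utf16 needs a different (non-utf16) P4COMMANDCHARSET")
    argv = ["p4"]
    if charset:
        argv += ["-C", charset]
    if command_charset:
        argv += ["-Q", command_charset]
    return argv + list(command)


# Mirrors the example above: p4 -C utf16 -Q utf8 sync some_files
example = p4_argv(["sync", "some_files"], charset="utf16", command_charset="utf8")
```

The resulting list can be passed to subprocess.run as is, which avoids shell quoting problems with non-ASCII file names.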

Note that P4V has a field in the Preferences dialog for resetting P4CHARSET.

 

Determining if the server is unicode enabled

If you try to run most commands against a unicode-mode server from a client that has no P4CHARSET set, the server returns an error:
$ p4 counters
Unicode server permits only unicode enabled clients.
If unicode is enabled, the output of p4 counters will include a 'unicode' counter with a value of '1'.

Example:
$ p4 counters
change = 1
unicode = 1
upgrade = 21
If you do not have P4CHARSET set, or cannot run p4 counters, you can use tagged output with p4 info. The tagged output generated by p4 -Ztag info includes a unicode field that is set to enabled.

Example:
$ p4 -Ztag info
[...]
... clientAddress 127.0.0.1:50936
... unicode enabled
... serverAddress localhost:9988
... serverRoot introot/
... serverDate 2010/10/21 11:36:37 -0700 PDT
... serverUptime 02:46:52
... caseHandling sensitive
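A script can make the same check by parsing the tagged output. The Python sketch below assumes the output has already been captured as text and has the "... key value" shape shown in the example above; the function names are hypothetical, and anything other than a unicode field set to enabled is treated as "not unicode".

```python
def parse_tagged(text):
    """Parse '... key value' lines of tagged p4 output into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("... "):
            # Split on the first space only; values may contain spaces.
            key, _, value = line[4:].partition(" ")
            fields[key] = value
    return fields


def is_unicode_server(tagged_info):
    """True if the tagged info output reports 'unicode enabled'."""
    return parse_tagged(tagged_info).get("unicode") == "enabled"


sample_output = """\
... clientAddress 127.0.0.1:50936
... unicode enabled
... serverAddress localhost:9988
... serverDate 2010/10/21 11:36:37 -0700 PDT
... caseHandling sensitive
"""
unicode_enabled = is_unicode_server(sample_output)
```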

 

Possible problems encountered running in unicode mode

"Cannot translate" error message

This message is displayed if your client machine is configured with a character set that does not include characters being sent to it by the Perforce Server. Your client machine cannot display unmapped characters.

For example, if your client machine is configured to use the shift-JIS character set and your depot contains files named using characters from the Japanese EUC character set that do not have mappings in shift-JIS, you see the "Cannot translate..." error message when you execute a p4 files or p4 changes command that lists those files.
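The underlying issue, a character with no mapping in the target character set, can be reproduced outside Perforce. The Python sketch below uses Python's own codecs as a stand-in; the helper name is hypothetical, and Python codec names (such as "shift_jis") are not P4CHARSET values and may not map characters exactly as the Perforce translation tables do.

```python
def can_encode(text, codec):
    """Return True if every character in `text` has a mapping in `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# A Japanese file name is representable in Shift-JIS ...
japanese_ok = can_encode("\u65e5\u672c\u8a9e.txt", "shift_jis")
# ... but an accented Latin letter such as U+00E9 is not.
accent_ok = can_encode("caf\u00e9.txt", "shift_jis")
```

Running a check like this over a list of depot file names, with the codec that corresponds to the client's character set, gives a rough idea of which names are likely to trigger the error.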

Length limit for Unicode Perforce identifiers

The Perforce Server has internal limits on the lengths of strings used to index job descriptions, specify filenames, control view mappings, and identify client names, label names, and other objects.

The most common limit is 1024 bytes. Because some characters in Unicode can expand to more than one byte, it is possible for certain Unicode entries to exceed Perforce internal limits.

Because no basic Unicode character expands to more than three bytes, dividing the Perforce internal limit by three ensures that no Unicode sequence exceeds the limit.

To ensure that no Unicode sequence exceeds the Perforce limit, do not create client names or view patterns that exceed 341 Unicode characters.
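The arithmetic behind the 341-character guideline can be checked directly. This Python sketch measures the UTF8 byte length of an identifier against the 1024-byte limit mentioned above; the names are hypothetical, and other objects may have different limits. Note that characters outside the "basic" range this section refers to (for example, many emoji) take four bytes in UTF8, so for such text the byte count is what matters.

```python
LIMIT_BYTES = 1024              # the most common limit, per the text above
SAFE_CHARS = LIMIT_BYTES // 3   # 341 characters at up to three bytes each


def utf8_length(identifier):
    """Number of bytes the identifier occupies when encoded as UTF-8."""
    return len(identifier.encode("utf-8"))


def fits_limit(identifier, limit=LIMIT_BYTES):
    """True if the UTF-8 encoding of the identifier is within `limit` bytes."""
    return utf8_length(identifier) <= limit


# U+3042 (HIRAGANA LETTER A) takes three bytes in UTF-8.
name_341 = "\u3042" * 341   # 1023 bytes, within the limit
name_342 = "\u3042" * 342   # 1026 bytes, over the limit
```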

Under normal usage conditions, this length limit is not expected to pose a significant limitation.

Possible problems encountered using unicode filetype with a non-unicode server

With a server that is not running in internationalized mode, the Perforce "unicode" filetype behaves very differently.
The client and server both assume that a file is valid UTF8 and store it as such. The server does not attempt to translate or verify the content of the file in any way. It is imperative that the files be saved using an editor that can save as UTF8 prior to submitting such files to Perforce. Outside of this requirement, users can access the Perforce server normally. There is no need to set P4CHARSET on the client.

Newlines are not correctly saved

A user checked the file in as UTF16 instead of UTF8. Roll back to an older revision or resave the file as UTF8.
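One way to catch this before submitting is to look at the first bytes of the file. The Python sketch below is a rough heuristic, not a Perforce feature: the function name and the returned labels are made up for the example, it only recognizes UTF16 by its byte order mark, and it does not distinguish UTF32.

```python
import codecs


def classify_text_encoding(data):
    """Roughly classify file content (bytes) as 'utf16', 'utf8' or 'unknown'."""
    # UTF-16 files usually start with a byte order mark: FF FE or FE FF.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf16"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    # NUL bytes are valid UTF-8 but unusual in text; UTF-16 without
    # a BOM typically contains many of them.
    if b"\x00" in data:
        return "unknown"
    return "utf8"


with_bom = classify_text_encoding("line one\nline two\n".encode("utf-16"))
as_utf8 = classify_text_encoding("line one\nline two\n".encode("utf-8"))
```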

P4V hanging when connecting to your newly unicode enabled server

The Helix server was not restarted after you ran the p4d -xi command while the Helix server was running.
