Perforce Public Knowledge Base - Unicode File Type Handling Changes In 2009.2
Unicode File Type Handling Changes In 2009.2

Problem

After upgrading the server to 2009.2, I can no longer submit changes to files with the file type "unicode". However, my files with the file type "utf16" are not affected.

Solution

Perforce Server releases 2009.2 and later include a bug fix that prevents users from submitting files with the "unicode" file type to Perforce Servers that are not unicode-enabled:

#217426 (Bug #13033) **
	'p4 submit' will disallow submitting files with a
	unicode file type using a non-unicode server.

The "unicode" file type is intended for use with unicode-enabled Perforce Servers. Non-unicode servers do not support "unicode" files, but this restriction was not enforced until the bug fix was implemented. Before the fix, servers not running in unicode mode accepted files submitted as "unicode" but treated them as file type "text", translating only the line endings.

Overriding the automatic file type detection and setting the file type to "unicode" can sometimes corrupt files as a result of the line-ending conversion. For example, files encoded in UTF-16, which Microsoft commonly calls "unicode", can be corrupted because converting the line endings byte by byte can break the file encoding.
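
The corruption can be illustrated on a hex dump. The following is a minimal sketch, not Perforce's actual conversion code: the helper names utf16le_sample_hex and naive_crlf_hex are hypothetical, and the "conversion" is simulated by inserting a 0d byte before every 0a byte.

```shell
# Hex bytes of the UTF-16LE text "a<LF>": 61 00 0a 00
utf16le_sample_hex() {
  printf 'a\000\n\000' | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Simulate a byte-wise LF -> CRLF conversion on a hex dump read from stdin
# by inserting a 0d byte before each 0a byte.
naive_crlf_hex() {
  sed 's/0a/0d 0a/g'
}

utf16le_sample_hex                    # original bytes
utf16le_sample_hex | naive_crlf_hex   # bytes after the naive conversion
```

The converted dump has five bytes (61 00 0d 0a 00), so it no longer divides into 2-byte UTF-16 code units. A correct UTF-16LE CRLF would be 0d 00 0a 00.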

Note: the automatic file type detection only happens when a file is opened for add.

After upgrading to 2009.2 or later, some users have encountered problems as a result of this bug fix: new revisions of existing "unicode" files can no longer be submitted.

To fix the problem, change the file type of "unicode" files to "text" using the unsupported command p4 retype. More information about this command is available by running p4 help retype.

For example, to find all files with type "unicode", including types with modifiers such as "unicode+w":

  • in a Unix environment:
p4 files -a //... | grep "(unicode[^)]*)$"
  • in a Windows environment:
p4 files -a //... | findstr "\(unicode.*\)"

Then, to change the file type to "text" you can use the command:

p4 retype -t text //depot/path/to/file

Important: you will need to include any file type modifiers during the retype. For example, if the file type was "unicode+w" you should retype it to "text+w".
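
Carrying the modifiers over by hand is error-prone when many files are involved. The following is a minimal sketch that turns p4 files -a output into one p4 retype command per "unicode" revision. It assumes the usual `//path#rev - action change N (type)` output format and that p4 retype accepts a revision range, as its help text describes. The name retype_cmds is a hypothetical helper, not a Perforce command.

```shell
# Reads `p4 files -a` output on stdin and prints one `p4 retype` command per
# "unicode" revision, carrying any +modifiers over to the new "text" type.
# Lines whose type is not "unicode" are skipped.
retype_cmds() {
  sed -n 's/^\(.*\)#\([0-9][0-9]*\) - .* (unicode\([^)]*\))$/p4 retype -t text\3 "\1#\2,#\2"/p'
}

# Intended use (prints commands only; review them before running any):
# p4 files -a //... | retype_cmds
```

The function only prints commands, so the list can be inspected or saved to a file before anything is changed on the server.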

Additionally, the archive type of the file (also called the librarian type) is not changed from "unicode" to "text" by the retype. You can confirm this by running the undocumented command p4 fstat -Oc against the file and checking the lbrType field. On a server that is not unicode-enabled, the "unicode" storage type (such as utf8, which is backward-compatible with ASCII) is equivalent to "text".

$ p4 retype -l -t text Makefile
$ p4 fstat -Oc Makefile
... depotFile //depot/test/Makefile
... clientFile /Users/clients_20082/client_ws/test/Makefile
... isMapped 
... headAction add
... headType text
... headTime 1280176969
... headRev 1
... headChange 858
... headModTime 1280176928
... haveRev 1
... lbrFile //depot/test/Makefile
... lbrRev 1.858
... lbrType unicode
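
To check many files at once, the fstat output can be filtered. The following is a minimal sketch that assumes the field layout shown above, with headType appearing before lbrType in each record. The name storage_type_diff is a hypothetical helper.

```shell
# Reads `p4 fstat -Oc` output on stdin and reports each file whose
# librarian (archive) type differs from its head file type.
storage_type_diff() {
  awk '
    /^\.\.\. depotFile / { file = substr($0, 15) }   # text after "... depotFile "
    /^\.\.\. headType /  { head = $3 }
    /^\.\.\. lbrType /   { if ($3 != head) print file ": headType=" head ", lbrType=" $3 }
  '
}

# Intended use:
# p4 fstat -Oc //depot/test/... | storage_type_diff
```

A reported difference of headType=text and lbrType=unicode is the expected result after the retype described above.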

 

Files already opened for edit or add

Users who already have files opened for edit or add must use the p4 reopen command to change the file type of those files. For example:

p4 reopen -t text //depot/path/to/file
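
When many files are open, the reopen commands can be generated from p4 opened output. The following is a minimal sketch that assumes the usual `//path#rev - action ... (type)` line format, possibly followed by a suffix such as *locked*. The name reopen_cmds is a hypothetical helper.

```shell
# Reads `p4 opened` output on stdin and prints a `p4 reopen` command for each
# file opened with a "unicode" type, keeping any +modifiers.
reopen_cmds() {
  sed -n 's/^\(.*\)#[0-9][0-9]* - .* (unicode\([^)]*\)).*$/p4 reopen -t text\2 "\1"/p'
}

# Intended use (prints commands only; review them before running any):
# p4 opened | reopen_cmds
```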

What about the "utf16" file type?

This file type does not require the Perforce Server to be unicode enabled, and is not affected by this change to the Perforce Server. If Perforce detects a UTF-16 BOM at the start of a file when it is opened for add, Perforce will automatically set the file type to "utf16".
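
To see ahead of time whether a file starts with a UTF-16 BOM, you can inspect its first two bytes. The following is a minimal sketch of such a check, not Perforce's own detection logic. The name has_utf16_bom is a hypothetical helper.

```shell
# Succeeds (exit status 0) if the file named by $1 begins with a UTF-16 byte
# order mark: FF FE (little-endian) or FE FF (big-endian).
has_utf16_bom() {
  bom=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
  [ "$bom" = "fffe" ] || [ "$bom" = "feff" ]
}

# Intended use:
# if has_utf16_bom notes.txt; then echo "starts with a UTF-16 BOM"; fi
```

A UTF-32 little-endian BOM also begins with FF FE, so this check alone cannot tell the two encodings apart.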

What to do with "unicode" type files in the future?

In most cases, Perforce's automatic file type detection will correctly identify whether the file is safe to be submitted as "text", "utf16" or "binary" when the file is opened for add. Overriding this should only be done if you are certain that Perforce has incorrectly detected the file type. If your typemap uses the "unicode" file type, you will need to update it to prevent assigning this file type.
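
To find typemap entries that still assign the "unicode" type, the output of p4 typemap -o can be filtered. The following is a minimal sketch that assumes each mapping appears as an indented "filetype pattern" line. The name typemap_unicode_entries is a hypothetical helper.

```shell
# Reads `p4 typemap -o` output on stdin and prints the mappings whose file
# type is "unicode", with or without +modifiers.
typemap_unicode_entries() {
  awk '$1 ~ /^unicode([+]|$)/ { sub(/^[ \t]+/, ""); print }'
}

# Intended use:
# p4 typemap -o | typemap_unicode_entries
```

Only the first field of each line is tested, so a depot path that merely contains the word "unicode" is not reported.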

Non-unicode Servers

In 2001.2, internationalization of the Perforce Server was introduced, along with the unicode file type and the P4CHARSET environment variable for translating unicode content. The P4CHARSET variable is only available for use on unicode-enabled servers. Unicode file content is translated and validated via the P4CHARSET value to UTF-8 for storage on the Server. Prior to 2009.2, non-unicode servers did not translate or verify 'unicode' type files. Instead, the UTF-8 data is converted to UTF-16 using the byte order appropriate to the client platform.

Because non-unicode servers could never configure the translation of unicode content via P4CHARSET, release 2009.2 disabled the ability to add 'unicode' files to such servers. The change reflects the inability of these servers to translate unicode content into the standard supported charsets.
