UTF8 BOM on Linux after GET


tp
Posts: 1
Joined: Mon Jun 07, 2004 3:51 pm

Post by tp » Thu Jun 10, 2004 4:24 pm

Hi. I am trying to use the Vault 2.0.3 command-line client on Linux to GET the source files for an automated build.

For the most part, this seems to work fine. However, when I tried to compile my code after the GET, I noticed that ALL of my .cpp and .hpp files had 3 bytes prepended to them (357, 273, 277 in octal, i.e. EF BB BF in hex). These 3 bytes are the UTF-8 byte order mark (BOM), and they cause my compilation to fail:
Hello.cpp:1: error: stray '\357' in program
Hello.cpp:1: error: stray '\273' in program
Hello.cpp:1: error: stray '\277' in program

Note that the remainder of each file is correct.
I am using Red Hat 8, Mono 0.30.1 and ICU 2.6.1.

Any ideas why this is happening?
Thanks.
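
As a stopgap for an automated build, the three bytes could be stripped after the GET and before compiling. A minimal sketch in Python, assuming the only damage is a leading EF BB BF and that nothing in the tree was deliberately checked in with a BOM. The names `strip_bom` and `strip_bom_in_file` are made up for illustration and are not part of Vault:

```python
import codecs
import os
import stat

BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf', i.e. 357 273 277 in octal


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; all other bytes are untouched."""
    return data[len(BOM):] if data.startswith(BOM) else data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite path without its leading BOM. Returns True if the file changed."""
    with open(path, "rb") as f:          # binary mode: no decoding, no newline translation
        data = f.read()
    cleaned = strip_bom(data)
    if cleaned == data:
        return False
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IWUSR)  # fetched files may be read-only
    with open(path, "wb") as f:
        f.write(cleaned)
    os.chmod(path, mode)                 # restore the original permissions
    return True
```

Calling `strip_bom_in_file("Hello.cpp")` before the compile step would remove the stray bytes from that one file. This only hides the symptom; it does not explain where the bytes come from.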

sterwill
Posts: 256
Joined: Thu Nov 06, 2003 10:01 am
Location: SourceGear

Post by sterwill » Wed Jun 16, 2004 8:50 pm

I haven't seen that behavior on the systems I've tested Vault on (mostly Debian Woody and Sid, but also Red Hat 9). Vault doesn't have any code to insert those bytes at any time. Perhaps it's an issue with the way we use Mono's output streams.

Did you build ICU and Mono from source, or use RPMs from somewhere? Also, are the bytes prepended to non-text files too (like a JPEG image)?
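
One way to answer the second question is to scan the working folder and count which extensions start with the BOM. A rough sketch in Python; `bom_counts_by_extension` is a hypothetical helper, and a file that was legitimately checked in with a BOM would be counted too:

```python
import codecs
import os
from collections import Counter


def bom_counts_by_extension(root: str) -> Counter:
    """Count files under root whose first three bytes are the UTF-8 BOM, per extension."""
    counts = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            with open(os.path.join(dirpath, name), "rb") as f:
                head = f.read(3)         # only the first three bytes matter here
            if head == codecs.BOM_UTF8:
                ext = os.path.splitext(name)[1].lower()
                counts[ext] += 1
    return counts
```

If `.jpg` or other binary extensions show up in the result, the bytes are not limited to text files.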
Shaw Terwilliger
SourceGear LLC
`echo sterwill5sourcegear6com | tr 56 @.`


Post by sterwill » Fri Jun 18, 2004 8:39 am

I compiled Mono beta 3 (0.96) on both Debian Sid (updated last week) and Red Hat 9 systems with ICU 2.8. These are newer versions than were available at the time of the Vault release (and Mono HOWTO writeup), but I think it's good to track the Mono development closely with 1.0 so close on the horizon.

The results were interesting. On the Debian system, all files were fine. I got back exactly what was checked in (from a Windows client). On the Red Hat system, files with certain extensions had three extra bytes prepended (same as your case).

-r--r--r--    1 sterwill sterwill        3 Jun 18 09:27 foo.asc
-r--r--r--    1 sterwill sterwill        3 Jun 18 09:27 foo.bat
-r--r--r--    1 sterwill sterwill        6 Jun 18 09:28 foo.c
-r--r--r--    1 sterwill sterwill        3 Jun 18 09:27 foo.exe
-r--r--r--    1 sterwill sterwill        6 Jun 18 09:28 foo.h
-rw-r--r--    1 sterwill sterwill        3 Jun 18 09:28 foo.java
-r--r--r--    1 sterwill sterwill        3 Jun 18 09:27 foo.pdf
-r--r--r--    1 sterwill sterwill        3 Jun 18 09:27 foo.ps
-r--r--r--    1 sterwill sterwill        3 Jun 18 09:27 foo.text
-r--r--r--    1 sterwill sterwill        6 Jun 18 09:28 foo.txt
So somewhere, Mono is deciding to treat streams as Unicode text for files that end in .c, .h, and .txt (those are the 6-byte files in the listing; the others are 3 bytes). I don't consider this a very helpful feature, but it may be documented somewhere, and Vault may be using text streams when it should be using binary streams.
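
To illustrate the text-versus-binary difference (this is a Python analogy, not Vault's or Mono's code): a text stream whose encoding emits a signature writes the BOM ahead of the content, while a binary stream writes exactly the bytes it is given. The helper names and the `b"abc"` payload are made up for the example.

```python
import io


def write_as_text(data: bytes) -> bytes:
    """Write data through a text stream that uses a BOM-emitting UTF-8 encoding."""
    buf = io.BytesIO()
    text = io.TextIOWrapper(buf, encoding="utf-8-sig", newline="")
    text.write(data.decode("ascii"))  # the encoder prepends EF BB BF on the first write
    text.flush()
    result = buf.getvalue()
    text.detach()                     # leave buf open instead of closing it with the wrapper
    return result


def write_as_binary(data: bytes) -> bytes:
    """Write data through a binary stream: bytes in, same bytes out."""
    buf = io.BytesIO()
    buf.write(data)
    return buf.getvalue()


payload = b"abc"                          # stands in for a 3-byte file as checked in
print(len(write_as_text(payload)))        # 6
print(len(write_as_binary(payload)))      # 3
```

The 6 versus 3 matches the file sizes in the listing, which is why a text stream with a Unicode encoding is the first thing to suspect.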
Shaw Terwilliger
SourceGear LLC
`echo sterwill5sourcegear6com | tr 56 @.`
