Editor entering non-english symbols

Support for our DiffMerge utility.

Moderator: SourceGear

dmo
Posts: 3
Joined: Sun Feb 10, 2013 2:05 am

Editor entering non-english symbols

Post by dmo » Sun Feb 10, 2013 2:19 am

DiffMerge has a long-standing problem in the editor.
My OS is set up with 2 input languages (English and Russian).
DiffMerge accepts only English characters. After I switch to the Russian input language, DiffMerge ignores all Cyrillic characters that I type. Can you help me?

jeffhostetler
Posts: 534
Joined: Tue Jun 05, 2007 11:37 am
Location: SourceGear

Re: Editor entering non-english symbols

Post by jeffhostetler » Mon Feb 11, 2013 10:23 am

What OS and version do you have?
What version of DiffMerge do you have?

thanks

deeprus
Posts: 3
Joined: Tue Feb 19, 2013 4:16 am

Re: Editor entering non-english symbols

Post by deeprus » Tue Feb 19, 2013 4:33 am

I have the same problem.
Russian UTF-8 characters are not displayed properly.

Here is a screenshot of the problem: http://goo.gl/tAhzK
This screenshot shows the correct Russian characters (in another program): http://goo.gl/vjXZA

I have Mac OS X 10.7.5 and DiffMerge 3.3.2 (1139) [x86].

jeffhostetler
Posts: 534
Joined: Tue Jun 05, 2007 11:37 am
Location: SourceGear

Re: Editor entering non-english symbols

Post by jeffhostetler » Thu Mar 07, 2013 5:31 pm

Could you check the status bar and see what character encoding DiffMerge
selected for each file? In the right-most field, it should say something like
"UTF-8(BOM)" or it may have 2 encodings with a ":" between them.

Do these files have byte-order-marks (BOM) in them?

What are your Ruleset settings for these types of files?
(See the Preferences dialog / Rulesets.)

Does it help if you switch from "System Local/Default Encoding"
to a specific "Named Character Encoding"?
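
The BOM question can also be checked outside DiffMerge. Below is a minimal sketch in Python, assuming the file is readable as raw bytes; `has_utf8_bom` and `file_has_utf8_bom` are hypothetical helper names for illustration and are not part of DiffMerge.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 byte-order mark."""
    # codecs.BOM_UTF8 is the three bytes EF BB BF.
    return data.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path: str) -> bool:
    """Read only the first three bytes of the file and test them for a BOM."""
    with open(path, "rb") as f:
        return has_utf8_bom(f.read(3))
```

A file for which `file_has_utf8_bom` returns False may still be UTF-8; it only means the encoding has to be determined some other way, such as a Ruleset setting.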

deeprus
Posts: 3
Joined: Tue Feb 19, 2013 4:16 am

Re: Editor entering non-english symbols

Post by deeprus » Fri Mar 08, 2013 4:06 am

jeffhostetler wrote: Could you check the status bar and see what character encoding DiffMerge
selected for each file? In the right-most field, it should say something like
"UTF-8(BOM)" or it may have 2 encodings with a ":" between them.

Do these files have byte-order-marks (BOM) in them?

The Russian files are originally encoded as UTF-8, but they have no BOM signature, because a BOM often causes errors on websites.
In the right-most field of the status bar I can only see the text "default", with no caption.

jeffhostetler wrote: What are your Ruleset settings for these types of files?
(See the Preferences dialog / Rulesets.)

Initially, it was the "default" ruleset. I tried changing it to "UTF-8 Text Files", but nothing changed.
The settings are:
[ ] Search for Unicode BOM => NOT checked (this check box does not affect the display of the Russian text).
Fallback Character Encoding Options: (*) Use Named Character Encoding Below (CHECKED) => Unicode 8 bit (UTF-8)

jeffhostetler wrote: Does it help if you switch from "System Local/Default Encoding"
to a specific "Named Character Encoding"?

No, switching rulesets or encoding settings does not affect the appearance of the Russian text.

jeffhostetler
Posts: 534
Joined: Tue Jun 05, 2007 11:37 am
Location: SourceGear

Re: Editor entering non-english symbols

Post by jeffhostetler » Fri Mar 08, 2013 6:54 am

I'm not sure why it isn't working for you.

Could you send me a zip or tar file with those 2 files?
Either post it here or email it to me at jeffh at sourcegear.com.

It would help if you also included the contents of the
Support Dialog (available from the About Dialog).
That will show me all of the preference settings and
info on the open files.

Thanks.

deeprus
Posts: 3
Joined: Tue Feb 19, 2013 4:16 am

Re: Editor entering non-english symbols

Post by deeprus » Sun Mar 10, 2013 3:27 am

I'm sorry, the problem went away by itself. Maybe the program needed a restart, which I had not done after the last time I changed the settings.

Now when I run the program, it prompts me to choose a ruleset for each of the compared files. At that point the preview window at the bottom shows the Russian characters incorrectly (for example "\xd0\x90\xd0\xb4\xd1\x80\xd0\xb5\xd1\x81"). But after the files have been opened, everything is displayed correctly!

Here are the contents of my "Support dialog" window: https://www.dropbox.com/s/np6n061yarx7j ... t_info.txt

Thank you for help!

jeffhostetler
Posts: 534
Joined: Tue Jun 05, 2007 11:37 am
Location: SourceGear

Re: Editor entering non-english symbols

Post by jeffhostetler » Mon Mar 11, 2013 7:22 am

... for example "\xd0\x90\xd0\xb4\xd1\x80\xd0\xb5\xd1\x81") ...
The \xd0... characters are the raw UTF-8 multi-byte sequences. That dialog
shows the first few raw bytes of the file and is asking for help to figure out
what the encoding is, so it doesn't yet know whether \xd0\x90 is a Cyrillic capital A
or just 2 bytes from some other code page.
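
To illustrate the point, here is a minimal Python sketch that decodes the same ten bytes from the preview in two ways. It only demonstrates the encoding ambiguity and says nothing about how DiffMerge itself is implemented; the variable names are illustrative.

```python
# The raw bytes shown in the ruleset preview, copied from the post above.
raw = b"\xd0\x90\xd0\xb4\xd1\x80\xd0\xb5\xd1\x81"

# Interpreted as UTF-8, each pair of bytes is one Cyrillic letter.
as_utf8 = raw.decode("utf-8")

# Interpreted as a single-byte code page (here CP1251), every byte
# becomes its own character, so the result is unreadable.
as_cp1251 = raw.decode("cp1251")

print(as_utf8)                                  # Адрес
print(len(raw), len(as_utf8), len(as_cp1251))   # 10 5 10
```

Both decodes succeed without error, which is why the bytes alone cannot tell the program which interpretation is intended.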

As for it asking you about each file, that setting can be changed. Previously the
"default" Ruleset was set to assume the system default encoding (which is usually
Latin-1 or a code page). You currently have it set to "ask for each file" (which
gives you the most flexibility, but can be annoying). You can also set it to "use
the specific encoding named below" and then force it to UTF-8 in the bottom
combo-box. Take a look at your settings on the "UTF-8 Text Files" Ruleset.

If you wanted, you could add "ini" as a suffix to the "UTF-8 Text Files" Ruleset
and use it rather than the "default" Ruleset.

Let us know if you have any problems getting it to work for you.

dmo
Posts: 3
Joined: Sun Feb 10, 2013 2:05 am

Re: Editor entering non-english symbols

Post by dmo » Thu Mar 28, 2013 7:10 am

jeffhostetler wrote: What OS and version do you have?
What version of DiffMerge do you have?

thanks

Sorry, I was away for a long time.
I now have Windows 7 x64, but I had the same problem on Windows XP.
DiffMerge Version 3.3.2 (1139) [x64]

Codepage: WINDOWS-1251
I can use copy and paste, but that is very inconvenient.

jeffhostetler
Posts: 534
Joined: Tue Jun 05, 2007 11:37 am
Location: SourceGear

Re: Editor entering non-english symbols

Post by jeffhostetler » Thu Mar 28, 2013 9:11 am

I think you're having a different issue than "deeprus" is/was having.
I think his files were UTF-8 without BOMs.

You mentioned CP1251. Are your files CP1251 rather than UTF-8?

If you have CP-based files, look at the Options dialog and go to the
corresponding Ruleset and select the "Character Encodings" page.
Try changing the "Fallback Character Encoding Options" to
"Use Named Character Encoding Below" and at the bottom, select "CP 1251"
(the exact content and spelling of the drop-down varies by platform).
And see if that helps.
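
If it is unclear whether a file is UTF-8 or CP1251, a rough check can be done outside DiffMerge. The sketch below, in Python, tries strict UTF-8 first and falls back to a named code page, loosely mirroring the idea of a fallback encoding. `decode_with_fallback` is a hypothetical helper, not a DiffMerge function, and the heuristic is an assumption: a CP1251 file that happens to be valid UTF-8 would be misclassified, although that is unlikely for real Russian text.

```python
from typing import Tuple


def decode_with_fallback(data: bytes, fallback: str = "cp1251") -> Tuple[str, str]:
    """Decode bytes as strict UTF-8, or with the fallback code page on failure.

    Returns the decoded text and the name of the encoding that was used.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the single-byte fallback code page.
        return data.decode(fallback), fallback


text, encoding = decode_with_fallback("Привет".encode("cp1251"))
print(encoding)  # cp1251
```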

dmo
Posts: 3
Joined: Sun Feb 10, 2013 2:05 am

Re: Editor entering non-english symbols

Post by dmo » Fri Mar 29, 2013 2:02 am

Hi!
My files are CP1251, and "Fallback Character Encoding Options" is set to CP 1251.
I have no problem with the display of Russian characters. The problem is that I cannot type them: when I press a key for a Russian character, DiffMerge ignores it.
I can record a video. Do you need one?
