Ignoring whitespace causes false positives on text

Support for our DiffMerge utility.

Moderator: SourceGear

kurtis.miller
Posts: 5
Joined: Fri Feb 20, 2009 3:06 pm

Post by kurtis.miller » Tue Sep 20, 2011 8:58 am

I'm comparing two JavaScript files. I've added the js file type to the Java ruleset. Whitespace is set as not important, and unimportant differences are hidden.

Example 1:
(function( $, undefined ){
(function ($, undefined) {
Basically, in both cases, the space and paren are transposed. As you can see, it highlights the parens as being differences. If I insert a space after the first paren, or remove the existing space before it, the difference goes away. Same thing with the second paren - inserting a space before it or removing the one after it also removes the difference.

Another example, slight variation on the above theme:
var data = $.data( obj ),
var data = $.data(obj),
Here, inserting a space either before or after "obj" removes the difference highlight.

(I was running 3.3.0 when I noticed this; 3.3.1 exhibits the same behavior.)

jeffhostetler
Posts: 534
Joined: Tue Jun 05, 2007 11:37 am
Location: SourceGear

Re: Ignoring whitespace causes false positives on text

Post by jeffhostetler » Tue Sep 20, 2011 11:10 am

There are a couple of things here. Marking whitespace as unimportant
doesn't make it go away; rather, it just turns down the volume for it.
I think what you're wanting (and what I'd like to have) is a full-blown
ignore whitespace in addition to the existing importance marking.

As it is now, when we see the "( " vs " (", we have to sync-up on either
the '(' or the space. Then the other char looks like an Insert on one side
and a Delete on the other side or vice versa. If we sync up on the '(' life
is good and the hide-unimportant stuff kicks in; if we happen to sync up
on the space, then the LParen stands out as an Insert and Delete (next
to an undecorated space). I think you're seeing the latter case. The
mechanism for choosing which char to sync-up on depends on the surrounding
context. We give preference to longer runs of text, so you might see different
sync-ups depending on the amount of text on the line before and after the
"( ", for example.

Sorry.
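
To make the two sync-up outcomes concrete, here is a minimal Python sketch. It is not DiffMerge's actual matching code; the function names are hypothetical and the "anchor" is simply the single character both sides are assumed to sync up on. Whatever is left over on each side is treated as an insert or delete, and the change counts as important only if a leftover character is not whitespace.

```python
def leftover_after_sync(left, right, anchor):
    """Align both strings on the first occurrence of `anchor` and
    return the characters on each side that were not matched."""
    i = left.index(anchor)
    j = right.index(anchor)
    return left[:i] + left[i + 1:], right[:j] + right[j + 1:]


def has_important_change(left, right, anchor):
    """True if any unmatched character is non-whitespace."""
    left_rest, right_rest = leftover_after_sync(left, right, anchor)
    return any(not ch.isspace() for ch in left_rest + right_rest)


# Sync on the paren: only spaces are left over, so the change can be hidden.
print(has_important_change("( ", " (", "("))
# Sync on the space: the paren is left over on both sides and stands out.
print(has_important_change("( ", " (", " "))
```

Under these assumptions the first call reports no important change and the second one does, which matches the two cases described above.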

FWIW, there's one more thing you might look at: On the "Detail Level"
page of the Options Dialog, I set the "Intra-Line Smoothing" to 3. This
causes little spans (of 1, 2, or 3 chars) of *equal* text between 2 changes
to be included in the change and the whole span to be combined into 1
large change. In your second example, you might see "( obj )" as one 7
char change rather than 2 little changes on both sides of an equal "obj".
You can turn this down if you want (0 disables it). That might help a little,
but the real problem (missing feature) that I described earlier is still going
to get in your way.
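
As a rough illustration of what the smoothing setting does, here is a small Python sketch. It assumes a line has already been split into (is_change, text) segments; that representation and the function name are made up for the example and are not how DiffMerge stores its results.

```python
def smooth(segments, threshold):
    """Absorb short equal spans that sit between two changes.

    segments:  list of (is_change, text) tuples in line order.
    threshold: longest equal span (in chars) that gets absorbed; 0 disables.
    """
    out = []
    last = len(segments) - 1
    for idx, (is_change, text) in enumerate(segments):
        between_changes = (
            not is_change
            and 0 < idx < last
            and segments[idx - 1][0]
            and segments[idx + 1][0]
        )
        if between_changes and len(text) <= threshold:
            is_change = True  # treat the short equal span as part of the change
        if out and out[-1][0] == is_change:
            out[-1] = (is_change, out[-1][1] + text)  # merge with neighbour
        else:
            out.append((is_change, text))
    return out


# "$.data( obj )," split into equal and changed spans
line = [(False, "$.data"), (True, "( "), (False, "obj"), (True, " )"), (False, ",")]
print(smooth(line, 3))  # "obj" is absorbed, giving one 7-char change "( obj )"
print(smooth(line, 0))  # smoothing disabled, segments are unchanged
```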

jeff

kurtis.miller
Posts: 5
Joined: Fri Feb 20, 2009 3:06 pm

Re: Ignoring whitespace causes false positives on text

Post by kurtis.miller » Tue Sep 20, 2011 6:16 pm

I get what you're saying. You can't fully ignore whitespace, because "words" are still important. For example, you don't want to identify "int eger" and "integer" as identical.

jeffhostetler
Posts: 534
Joined: Tue Jun 05, 2007 11:37 am
Location: SourceGear

Re: Ignoring whitespace causes false positives on text

Post by jeffhostetler » Wed Sep 21, 2011 7:49 am

Exactly.

There are times when you want to be able to say "treat n spaces/tabs as being equivalent to 1 space"
versus "treat n spaces/tabs as being equivalent to 0 spaces". And it's not clear that one is always
more appropriate than the other. And sometimes the answer is context dependent (such as code vs
string literals vs comments or in the case of Python leading- vs non-leading spaces).

What we currently have is a way of recognizing that the spaces are there, but unimportant.
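
The difference between the two policies can be sketched in a few lines of Python. This is only an illustration of the idea, not a DiffMerge feature; the two helper names are hypothetical.

```python
import re


def collapse_runs(text):
    """Treat n spaces/tabs as equivalent to 1 space."""
    return re.sub(r"[ \t]+", " ", text)


def drop_all(text):
    """Treat n spaces/tabs as equivalent to 0 spaces."""
    return re.sub(r"[ \t]+", "", text)


# n-vs-1: word boundaries survive, so "int eger" stays different from "integer"
print(collapse_runs("int   eger") == collapse_runs("integer"))
# n-vs-0: word boundaries are lost, so the two compare as identical
print(drop_all("int eger") == drop_all("integer"))
# n-vs-1 still sees a difference in the $.data example; n-vs-0 does not
print(collapse_runs("$.data( obj )") == collapse_runs("$.data(obj)"))
print(drop_all("$.data( obj )") == drop_all("$.data(obj)"))
```

Neither normalization is right everywhere: dropping all whitespace hides the paren-spacing noise but also merges "int eger" into "integer", while collapsing runs keeps words apart but still reports "( obj )" versus "(obj)".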

hope this helps,
jeff

jvdh
Posts: 1
Joined: Thu Dec 01, 2011 7:01 am

Re: Ignoring whitespace causes false positives on text

Post by jvdh » Thu Dec 01, 2011 7:08 am

hi,
having similar problems...
I understand that there is no easy general solution, but what about optionally (e.g. as part of, or in addition to, the current whitespace "ignoring") ignoring repetitions of whitespace vs. a single whitespace? I'm having this problem continuously when comparing simple text files (latex, asciidoc, etc.) where such repetitions (which are irrelevant for the formatted output of latex and asciidoc, for instance) creep in easily and produce loads of spurious differences which I'm seemingly not able to get rid of with the available whitespace options (or am I missing something?).

best regards (and thanks for a great tool)

joerg

jeffhostetler
Posts: 534
Joined: Tue Jun 05, 2007 11:37 am
Location: SourceGear

Re: Ignoring whitespace causes false positives on text

Post by jeffhostetler » Fri Dec 02, 2011 5:14 pm

It's hard to say without an example. Assuming you've set up a Ruleset for LaTeX,
marked whitespace as unimportant in the default context, and turned
on "hide unimportant", you should get most of the n-vs-1 space differences to
disappear -- assuming the non-blank chars on the line can be matched up
reasonably. Also be sure that multi-line intra-line matching is turned on (which
lets the intra-line sniffing span more than 1 line).

I know that there are a lot of knobs to turn (sorry) and that it can be a bit confusing
(sorry again).

If you want to post a small tarball/zipfile with a pair of files, maybe I can take
a look and make some additional suggestions.

jeff

DanNeely
Posts: 4
Joined: Thu May 10, 2012 8:45 am

Re: Ignoring whitespace causes false positives on text

Post by DanNeely » Thu May 10, 2012 8:52 am

I'm seeing this sort of false positive on lines that contain whitespace and {}'s and have been switched from tabs to spaces (or back). The whitespace is disregarded as expected, but the braces are marked as changes.

I have "whitespace is important" unchecked and "treat tabs as spaces" checked.
Attachments:
tabs.txt (93 bytes): cs file with changed extension
spaces.txt (132 bytes): cs file with changed extension

jeffhostetler
Posts: 534
Joined: Tue Jun 05, 2007 11:37 am
Location: SourceGear

Re: Ignoring whitespace causes false positives on text

Post by jeffhostetler » Fri May 11, 2012 6:56 am

It took a little tinkering but I was able to reproduce what you're seeing.
I've logged this.

I do have a workaround that should make the issue go away.
On the "Detail Level" page of the Options/Preferences Dialog,
select "Simple" rather than "Complex" on the second set of
radio buttons.

(This controls how hard it looks across line-breaks to see if
a change is the result of an actual change in the content versus
a change in how the line was broken up. For example, a function
call where all of the args are on one line versus the same call but
with each arg on a separate line. I'll have to dig a little deeper to
see why this made a difference in what you're seeing.)
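
For anyone wondering what "a change in how the line was broken up" means in practice, here is a rough Python sketch of the idea. It is not the "Complex" algorithm itself; the tokenizer and the function names are assumptions made for the example.

```python
import re


def tokens(text):
    """Split into words and single punctuation chars; all whitespace,
    including line breaks, is discarded."""
    return re.findall(r"\w+|[^\w\s]", text)


def only_rewrapped(left, right):
    """True if the texts differ but carry the same token sequence,
    i.e. only spacing or line breaks changed."""
    return left != right and tokens(left) == tokens(right)


one_line = "draw(x, y, width, height);"
wrapped = "draw(x,\n     y,\n     width,\n     height);"
edited = "draw(x,\n     y,\n     width,\n     depth);"

print(only_rewrapped(one_line, wrapped))  # same call, one arg per line
print(only_rewrapped(one_line, edited))   # an argument actually changed
```

Because the comparison works on tokens rather than raw characters, word boundaries are kept: "int eger" and "integer" still produce different token lists.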

When set to "Simple" there are no "Important" changes shown.

jeff

W8584

DanNeely
Posts: 4
Joined: Thu May 10, 2012 8:45 am

Re: Ignoring whitespace causes false positives on text

Post by DanNeely » Fri May 11, 2012 8:36 am

Thanks. That's usable as a workaround on the affected files until a patch that fixes it is available, but having found other cases where the "Complex" option gave better results, I'm reluctant to change my defaults.
