XML Diff whitespace question

Support for our DiffMerge utility.

Moderator: SourceGear

AGBrown
Posts: 34
Joined: Wed Aug 10, 2005 1:42 pm

Post by AGBrown » Mon Apr 21, 2008 1:07 am

Just upgraded to 4.1.1. In the old sgdm app, I thought that my XML files were compared so that:

		<add key="someKeyName" value="someValue"/>
was equal to

		<add key="someKeyName"
			 value="someValue"/>
just with the whitespace highlighted.

Now, though, if I compare two config files in Visual Studio that are formatted differently (in terms of whitespace and line breaks), the first line shows the value attribute as deleted in the second doc, and the second line shows the entire line deleted on the first doc and everything from the value attribute through to the closing bracket inserted on the second doc. After that, even lines are a mess and odd lines are half a mess.

Net result: I can no longer easily see whether config files that are formatted differently in the Vault history and the working folder have any actual changes other than the formatting. I realise I could get the historical version from Vault, format it the same way as the one I want to compare it with, and so on, but it's no longer one-click. I used to use this all the time during application deployments to check config files :(

I tried playing with a ruleset for .config files, but to no avail. Can anyone help? The desired behaviour would be more 3.5-like, where it realises that an inserted line break and whitespace are not the same as a deletion/insertion on the next line.

Andy
Andy Brown
K2n: Energy Monitoring Solutions (http://www.k2nenergy.com/index.htm)

jeffhostetler
Posts: 534
Joined: Tue Jun 05, 2007 11:37 am
Location: SourceGear

Try this.

Post by jeffhostetler » Mon Apr 21, 2008 8:28 am

The feature you're referring to is what I call "multi-line intra-line
analysis". That is, it effectively joins lines before doing the character
highlight. This allows us to detect cases where a line break was inserted
in the middle of a line and yet match the parts with the original/unbroken
line.
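A rough way to picture the joining step, using Python's standard difflib. This is a hypothetical sketch (the function name joined_char_diff is made up), not DiffMerge's implementation. Joining entire files, as done here, is the expensive extreme of the idea; it only illustrates why a split line can still match its unbroken original character by character.

```python
import difflib


def joined_char_diff(left_lines, right_lines):
    """Join each side's lines, then diff the two strings character by character.

    Hypothetical sketch of "multi-line intra-line analysis"; not DiffMerge code.
    Returns the non-matching regions as (tag, left_text, right_text) tuples.
    """
    left = "\n".join(left_lines)
    right = "\n".join(right_lines)
    # autojunk=False so frequent characters (spaces, quotes) are never ignored.
    matcher = difflib.SequenceMatcher(None, left, right, autojunk=False)
    return [
        (tag, left[i1:i2], right[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


original = ['<add key="someKeyName" value="someValue"/>']
reformatted = ['<add key="someKeyName"', '\t value="someValue"/>']
print(joined_char_diff(original, reformatted))  # prints [('insert', '', '\n\t')]
```

With the lines joined, the only difference left is the inserted newline and indentation; the tag and both attributes match.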

But, because it can slow things down considerably, this feature is optional
in the new DiffMerge. It may be turned off on your system.

Look in the "Detail Level" page on the Options Dialog:
[1] set the "Lines and Characters" detail level,
[2] and set either "Simple" or "Complete" for the Multi-Line setting.

This should cause it to see the parts as matching the original. That is,
the tags and attributes should be properly matched up and not colored.

BUT, it will show a (character) change for the new whitespace and EOL
chars. You can hide these by updating your Ruleset:
[1] enable all 4 "Ignore/Strip..." settings on the "Line Handling" page;
[2] clear "Whitespace is Important" on the "Content Handling"
page (see the XML Files ruleset that shipped with DiffMerge);
[3] *AND THEN* toggle "Hide Unimportant Differences" in the main
menu.

This will hide any whitespace differences.
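As a rough model of what those settings achieve, treating all whitespace as unimportant amounts to comparing the text after collapsing every run of spaces, tabs and line breaks. This Python sketch is a hypothetical illustration, not how DiffMerge applies a ruleset; both function names are made up.

```python
import re


def strip_unimportant_whitespace(text):
    """Collapse each run of whitespace (spaces, tabs, CR, LF) to a single space.

    Hypothetical model of a ruleset in which no whitespace is important.
    """
    return re.sub(r"\s+", " ", text).strip()


def same_ignoring_whitespace(left, right):
    return strip_unimportant_whitespace(left) == strip_unimportant_whitespace(right)


one_line = '<add key="someKeyName" value="someValue"/>'
two_lines = '<add key="someKeyName"\n\t value="someValue"/>'
print(same_ignoring_whitespace(one_line, two_lines))  # prints True
```

The catch is that whitespace inside attribute values is collapsed too, so value="a  b" and value="a b" would also compare equal.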

If you want, you can add "config" to the list of suffixes in the "XML Files"
ruleset rather than create your own. This will also get you a definition
for XML comment syntax (so that changes within comments get hidden too).

Let me know if you have any questions or problems.

jeff

AGBrown

Post by AGBrown » Mon Apr 21, 2008 12:09 pm

That's great, thank you; it does what it says on the tin. I can see how the two are useful in different situations: whitespace between tags, or whitespace in attributes/values. It was that elusive "Simple" radio button that was selected that was the problem for me.

I read another post where you stated that XML diff is not a direction you are taking with the DiffMerge tool. I can understand the choice, but given that .aspx files, config files, etc. are all basically XML, and whitespace matters in some places and not in others, it might be very useful as an additional tool in the Visual Studio environment. I would vote for it if it were on a wishlist.

Andy

jeffhostetler

XML diff

Post by jeffhostetler » Mon Apr 21, 2008 1:26 pm

AGBrown wrote: I read another post where you stated that XML diff is not a direction you are taking with the DiffMerge tool. I can understand the choice, but given that .aspx files, config files, etc. are all basically XML, and whitespace matters in some places and not in others, it might be very useful as an additional tool in the Visual Studio environment. I would vote for it if it were on a wishlist.
You can add a context to the ruleset that indicates that text
inside quoted strings is to be treated as a string literal (important),
so whitespace changes there will still be highlighted. (See the
C/C++ ruleset for an example.)
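The effect of such a context can be sketched as "collapse whitespace everywhere except inside quoted strings". This is a hypothetical Python illustration, not how DiffMerge rulesets are implemented; it assumes double-quoted strings never contain escaped quote characters.

```python
import re

# Capturing group: re.split keeps each double-quoted literal as its own item.
QUOTED = re.compile(r'("[^"]*")')


def normalize_outside_quotes(text):
    """Collapse whitespace everywhere except inside double-quoted strings.

    Hypothetical model of a ruleset with a "string literal" context; assumes
    quoted strings contain no escaped quote characters.
    """
    parts = QUOTED.split(text)
    # Quoted literals land at odd indices; only the even-indexed parts are collapsed.
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(r"\s+", " ", parts[i])
    return "".join(parts).strip()


a = '<add key="k" value="a b"/>'
b = '<add key="k"\n\t value="a b"/>'
c = '<add key="k" value="a  b"/>'
print(normalize_outside_quotes(a) == normalize_outside_quotes(b))  # prints True
print(normalize_outside_quotes(a) == normalize_outside_quotes(c))  # prints False
```

The line break between attributes is ignored, while the extra space inside the value is still reported.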

WRT the larger problem of XML diffing, when people talk about that
feature, what they really want is a tool that *knows* what is equivalent
XML rather than just equivalent text. For example:
<foo attr1="abc" attr2="def">
and
<foo attr2="def" attr1="abc">
are equivalent XML but textually different.

A tool that operates on an XML document as a data structure and
knows the various equivalent transformations is a completely different
program than one that operates on line-oriented text.

So, that's what I mean when I say that we're not headed in the XML
diffing direction.
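For comparison, the XML-aware approach described above can be approximated in Python 3.8+ with the standard library's C14N 2.0 canonicalizer, xml.etree.ElementTree.canonicalize, which re-serializes every tag from the parsed data and sorts attributes. This is only a sketch of the idea, not a DiffMerge feature; the function name xml_equivalent is made up, the tags are self-closed so each snippet is well-formed, and strip_text=True is an assumption that discards leading/trailing whitespace in text content (wrong for documents where that whitespace matters).

```python
from xml.etree.ElementTree import canonicalize  # requires Python 3.8+


def xml_equivalent(left, right):
    """True if two XML documents have the same canonical (C14N 2.0) form.

    Canonicalization sorts attributes and re-serializes each tag uniformly, so
    attribute order and line breaks between attributes stop mattering.
    strip_text=True also trims whitespace around text content (an assumption).
    """
    return canonicalize(left, strip_text=True) == canonicalize(right, strip_text=True)


print(xml_equivalent('<foo attr1="abc" attr2="def"/>',
                     '<foo attr2="def" attr1="abc"/>'))  # prints True
print(xml_equivalent('<add key="someKeyName" value="someValue"/>',
                     '<add key="someKeyName"\n\t value="someValue"/>'))  # prints True
```

Comments are dropped by default in this mode, and a file that is not well-formed XML will raise a parse error, which is one reason a line-oriented tool remains the general-purpose choice.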

sorry,
jeff

PS. I am looking to add a "spawn third-party diffing tool" feature that
would let you configure different external tools to handle file types
(such as XML or various binary file formats) that DiffMerge doesn't
handle very well.

AGBrown

Re: XML diff

Post by AGBrown » Mon Apr 21, 2008 1:30 pm

jeffhostetler wrote: You can add a context to the ruleset that indicates that text
inside quoted strings is to be treated as a string literal (important),
so whitespace changes there will still be highlighted. (See the
C/C++ ruleset for an example.)
Pretty cool, I like it more and more.

Thank you,

Andy
