What regular expression (regex) features are supported?

Post by Herters1893 » Tue Jan 29, 2008 12:22 pm

I am using DiffMerge and have created a custom ruleset for "Content Handling" to try and get DiffMerge to ignore certain differences on each line being compared. My goal is to ignore the line numbers present on each line (the first 8 bytes of each line), i.e., classify these differences as "unimportant."

The screenshot labeled "default_ruleset" uses the default DiffMerge ruleset, and it comes out as I expect.

The screenshot labeled "my_ruleset" uses a custom ruleset, and it does not come out as I expect. (I expect the first 8 bytes to be black, and the values after each equal sign to be red.) The custom ruleset is:

start pattern=^
end pattern=\d{8}

I would expect that the start pattern of ^ would match the "start of line" and then the end pattern of \d{8} would match the eight digits in positions 1 through 8, and that after the end pattern, the highlighting of differences would continue again.
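
For reference, that expectation matches how a conventional regex engine behaves when `^` is anchored at the start of every line. The snippet below is only an illustrative sketch in Python's `re` module; it says nothing about DiffMerge's own engine, and the sample text and the `line_number_spans` helper are made up for the example.

```python
import re

# Hypothetical three-line sample where every line starts with an
# 8-digit line number.
SAMPLE = "11111111 a = aaaaaa;\n22222222 b = bbbbbb;\n33333333 c = cccccc;\n"

def line_number_spans(doc):
    """Return (start, end) offsets of 8 digits at the start of each line."""
    # re.MULTILINE makes ^ match at the start of every line,
    # not only at the start of the whole string.
    return [m.span() for m in re.finditer(r"^\d{8}", doc, flags=re.MULTILINE)]

print(line_number_spans(SAMPLE))  # [(0, 8), (21, 29), (42, 50)]
```

Under per-line anchoring like this, only the eight leading digits of each line would be classified as "unimportant."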

I have my DiffMerge options set to show "important" differences as red text and "unimportant" differences as black text. I think the regex I have supplied is fairly standard, as regexes go, so I'm confused about why it isn't working.

Any suggestions? Thanks.
Attachments: default_ruleset.JPG (default ruleset), my_ruleset.JPG (custom ruleset)

Post by jeffhostetler » Wed Jan 30, 2008 9:06 am

Your understanding of regexes is correct. But there's a problem
with doing this.

I'm using the regex to search within each line for patterns such
as quotes and /* */ sequences and then alternate the tagging of
content as "important" and "unimportant". So to allow for things
like multiple quoted strings on a single line (and sequences spanning
multiple lines) to be handled consistently, I need to (effectively)
treat the doc as one long line and alternate searches for the start
regex and the end regex.

So, putting a simple ^ as the start pattern causes the second
match to start immediately after the end of the first. (Don't feel
bad, I had to track this down using the debugger.)

I should put a warning in the program that using ^ isn't going to
generate the intended result - or update the alternate searching
stuff to handle a leading ^ (like the ends-at-eol stuff).


I should also point out that even if we do get that behavior working
as expected, DiffMerge will still use the data in the unimportant
columns in matching up lines. If what you're really wanting is a way
to compare 2 source files where the lines have been renumbered, it
isn't going to vertically line up as expected. Completely ignoring
various columns is a feature that has been requested.


Sorry,
jeff

Post by Herters1893 » Wed Jan 30, 2008 9:46 am

Thanks for the explanation, Jeff.

To help me understand, does your algorithm work like what I have below or differently?

The file content is:
11111111 a = aaaaaa;\n22222222 b = bbbbbb;\n33333333 c = cccccc;\n

Search patterns are:
start=^ end=\d{8}

The numbers in the "search_sequence" image mean:
1=start and end of "start pattern"
2=start of "end pattern"
3=end of "end pattern"

Thanks again for your help.
Attachments: search_sequence.JPG

Post by jeffhostetler » Wed Jan 30, 2008 1:48 pm

It's a little more involved. Your start regex will match the "left edge"
of the 11111111 and the end regex will match the 11111111. Then the
start pattern will match the "left edge" of the space following the
11111111. Then the end pattern will match the 22222222, or the EOL
if you have ends-at-eol set. And so on.
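
The alternating search described here can be modelled roughly as follows. This is only a sketch of the behaviour as explained in this thread, not DiffMerge's actual code. The function name `alternate_spans` is made up, and slicing the document at the resume position is an assumption used so that `^` matches the "left edge" of wherever the previous match ended. The ends-at-eol option is not modelled.

```python
import re

def alternate_spans(doc, start_pat, end_pat):
    """Alternate start/end searches over the whole document as one string.

    Returns the (start, end) offsets tagged by the rule.
    """
    start_re = re.compile(start_pat)
    end_re = re.compile(end_pat)
    spans = []
    pos = 0
    while pos <= len(doc):
        # Searching a slice makes ^ match at the resume position
        # (assumption: this mimics the behaviour described above).
        s = start_re.search(doc[pos:])
        if s is None:
            break
        span_start = pos + s.start()
        # The end pattern is searched for from where the start match ended.
        e = end_re.search(doc[pos + s.end():])
        if e is None:
            break
        span_end = pos + s.end() + e.end()
        spans.append((span_start, span_end))
        if span_end == pos:
            break  # both matches were zero-width; avoid looping forever
        pos = span_end
    return spans

doc = "11111111 a = aaaaaa;\n22222222 b = bbbbbb;\n33333333 c = cccccc;\n"
print(alternate_spans(doc, "^", r"\d{8}"))  # [(0, 8), (8, 29), (29, 50)]
```

In this model the second span starts at offset 8 (the space after 11111111) and runs through 22222222, so the text between the line numbers is tagged along with them, which is consistent with the result seen with the original custom ruleset.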


WAIT! I just figured out how to do it!

[1] Create a "Literal" (Important) with start regex ^[^\d], no end regex,
and ends-at-eol set.
[2] Create a "Comment" (Unimportant) with start regex ^ and end
regex \d{8}. (Ends-at-eol doesn't matter, but I'd set it just in case.)

Make sure that [1] is first in the list in the ruleset.

This worked in my simple 3-line file as in your initial example.
It assumes that all lines in the file will begin with line numbers.
It'll silently hide any lines that don't begin with line numbers, so
be careful.

hope this helps,
jeff

Post by Herters1893 » Thu Jan 31, 2008 9:51 am

Jeff, thanks for the solution; that does the trick.
