Opened 10 years ago

Closed 9 years ago

#746 closed defect (wontfix)

regex replace all is overeager

Reported by: daxim
Owned by: zenogantner
Priority: major
Milestone:
Component: editor
Version: 0.50
Keywords:
Cc:


  1. Paste the following content into a new document.
authors/id/A/AB/ABIGAIL/Algorithm-Numerical-Sample-2009102701.tar.gz ... updated
authors/id/A/AB/ABIGAIL/CHECKSUMS ... updated
authors/id/A/AB/ABIGAIL/Date-Maya-2009102701.tar.gz ... updated
authors/id/A/AG/AGROLMS/LWP-Authen-Negotiate-0.07.tar.gz ... updated
authors/id/A/AG/AGROLMS/CHECKSUMS ... updated
authors/id/A/AZ/AZAWAWI/Acme-CPANAuthors-Padre-0.02.tar.gz ... updated
authors/id/A/AZ/AZAWAWI/CHECKSUMS ... updated
authors/id/B/BE/BERLE/DBIx-Class-Tree-CalculateSets-0.03.tar.gz ... updated
authors/id/B/BE/BERLE/CHECKSUMS ... updated
authors/id/B/BO/BOBTFISH/Text-Markdown-1.000028.tar.gz ... updated
authors/id/B/BO/BOBTFISH/CHECKSUMS ... updated
  2. Search → Replace
  3. Enter \s+.*$ into »Find Text:«
  4. Check »Regular Expression« and »Replace All«
  5. Confirm »Replace«

Expected result

Leaves the following transformed text:


Actual result

Leaves the following transformed text:


Change History (2)

comment:1 Changed 9 years ago by zenogantner

  • Owner set to zenogantner
  • Status changed from new to accepted

I can reproduce this.

comment:2 Changed 9 years ago by zenogantner

  • Resolution set to wontfix
  • Status changed from accepted to closed

The regular expression that is searched for is (?mi-xs:\s+.*$). The flags mean that

  1. m is on: the text is treated as multi-line text, i.e. "$" matches just before the end of each line
  2. i is on: matching is case-insensitive (not relevant here)
  3. x is off: the pattern is not an extended regular expression, so whitespace inside the pattern would be taken literally
  4. s is off: the text is not treated as a single line, i.e. "." means "any character except newline"
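
The effect of the m and s flags can be checked outside the editor. The following is a small sketch in Python rather than Perl (an assumption made for illustration: re.MULTILINE is taken as the counterpart of m, and leaving out re.DOTALL as the counterpart of -s). The two-line string is made up for the example.

```python
import re

# Made-up two-line text, not taken from the ticket.
text = "a b\nc d\n"

# Without MULTILINE, "$" only matches at the end of the whole text
# (or just before its final newline).
end_only = re.findall(r"\w$", text)

# With MULTILINE, "$" also matches just before every newline.
per_line = re.findall(r"\w$", text, flags=re.MULTILINE)

# Without DOTALL, "." does not match a newline, so ".*" stops at the line end.
first_line = re.match(r".*", text).group()

print(end_only)    # ['d']
print(per_line)    # ['b', 'd']
print(first_line)  # a b
```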

Let's dissect the regex:
\s+ matches any run of one or more whitespace characters in the text, newlines included.
\s+.*$ matches such a run plus everything that follows it up to the end of that line.

So we first match

... updated

and then match the newline that ends the first line, followed by the whole second line

authors/id/A/AB/ABIGAIL/CHECKSUMS ... updated

and so on.
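
The sequence of matches can be listed by replaying the search with Python's re module standing in for the editor's Perl engine (an assumption; for this particular pattern the two should behave the same). Only the first three lines of the pasted text are used, and the variable names are illustrative.

```python
import re

# First three lines of the text from the report.
sample = (
    "authors/id/A/AB/ABIGAIL/Algorithm-Numerical-Sample-2009102701.tar.gz ... updated\n"
    "authors/id/A/AB/ABIGAIL/CHECKSUMS ... updated\n"
    "authors/id/A/AB/ABIGAIL/Date-Maya-2009102701.tar.gz ... updated\n"
)

# Same flags as (?mi-xs:...): multi-line and case-insensitive, no DOTALL.
pattern = re.compile(r"\s+.*$", re.MULTILINE | re.IGNORECASE)

matches = [m.group() for m in pattern.finditer(sample)]
for matched in matches:
    # repr() makes the leading newline of the later matches visible.
    print(repr(matched))
```

The first match is only the tail of the first line. Every later match starts with a newline, because \s+ is satisfied by the line break and .* then takes the entire next line.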

This behavior is correct: newline is also a whitespace character.

To get the results you want, you could use this regex instead: [ \t]+.*$
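
A quick way to compare the two patterns is to run both replacements side by side. This is a sketch in Python, not the editor's own code (re.sub with an empty replacement is assumed to correspond to »Replace All« with an empty replacement field), on the first three lines of the pasted text:

```python
import re

sample = (
    "authors/id/A/AB/ABIGAIL/Algorithm-Numerical-Sample-2009102701.tar.gz ... updated\n"
    "authors/id/A/AB/ABIGAIL/CHECKSUMS ... updated\n"
    "authors/id/A/AB/ABIGAIL/Date-Maya-2009102701.tar.gz ... updated\n"
)
flags = re.MULTILINE | re.IGNORECASE

# \s+ also consumes newlines, so every line after the first is swallowed.
overeager = re.sub(r"\s+.*$", "", sample, flags=flags)

# [ \t]+ only consumes spaces and tabs, so each match stays within one line.
per_line = re.sub(r"[ \t]+.*$", "", sample, flags=flags)

print(repr(overeager))
print(per_line)
```

With [ \t]+ the line breaks are never part of a match, so each line keeps its path and only the trailing " ... updated" is removed.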

I would say this is not a bug, but rather unexpected (but correct) behavior.

I am closing this ticket; feel free to re-open it and/or leave comments.
