re.DOTALL

Stefan Ram

2024-07-17 18:09:51 UTC

Below, I use [\s\S] to match each and every character.
I can't seem to get the same effect using "re.DOTALL"!

Yet the Python Library Reference says,

|(Dot.) In the default mode, this matches any character except
|a newline. If the DOTALL flag has been specified, this
|matches any character including a newline.

main.py

import re

text = '''
alpha
<hr>
gamma
<hr>
epsilon
'''[ 1: -1 ]

pattern = r'^.*?\n<hr.*?\n(.*)\n<hr.*$'

output = re.sub( pattern.replace( r'.', r'[\s\S]' ), r'\1', text )
print( output )

print( '--' )

output = re.sub( pattern, r'\1', text, re.DOTALL )
print( output )

stdout

gamma
--
alpha
<hr>
gamma
<hr>
epsilon
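
A likely explanation for the unchanged output, offered as a sketch rather than a definitive diagnosis: the documented signature is re.sub(pattern, repl, string, count=0, flags=0), so a fourth positional argument is taken as count. re.DOTALL has the integer value 16, so the second call above appears to ask for at most 16 replacements with no flags set. Passing the flag by keyword gives the same result as the [\s\S] version. The variable names as_count and as_flag are only illustrative.

```python
import re

text = "alpha\n<hr>\ngamma\n<hr>\nepsilon"
pattern = r'^.*?\n<hr.*?\n(.*)\n<hr.*$'

# What the positional call effectively means: count=16, flags=0.
# Without DOTALL the pattern cannot span lines, so nothing is replaced.
as_count = re.sub(pattern, r'\1', text, count=int(re.DOTALL))

# Passing the flag by keyword makes '.' match newlines as intended.
as_flag = re.sub(pattern, r'\1', text, flags=re.DOTALL)

print(int(re.DOTALL))    # 16
print(as_count == text)  # True
print(as_flag)           # gamma
```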

Stefan Ram

2024-07-17 18:21:26 UTC

Post by Stefan Ram
I can't seem to get the same effect using "re.DOTALL"!

PS: But (?s) works.
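
The inline form presumably works because (?s) sets the flag inside the pattern itself, so no argument position is involved. A minimal sketch, reusing the text and pattern from the first post; compiling the pattern with the flag is another option that sidesteps the count/flags ordering. The names inline and compiled are illustrative only.

```python
import re

text = "alpha\n<hr>\ngamma\n<hr>\nepsilon"
pattern = r'^.*?\n<hr.*?\n(.*)\n<hr.*$'

# Inline flag: placed at the very start of the pattern.
inline = re.sub('(?s)' + pattern, r'\1', text)

# Compiled pattern: the flag is bound at compile time,
# where flags is the second parameter of re.compile.
compiled = re.compile(pattern, re.DOTALL).sub(r'\1', text)

print(inline)    # gamma
print(compiled)  # gamma
```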

Lawrence D'Oliveiro

2024-07-17 23:54:25 UTC

Post by Stefan Ram
Below, I use [\s\S] to match each and every character.
I can't seem to get the same effect using "re.DOTALL"!

This might help clarify things:

import re

text = "alpha\n<hr>\ngamma\n<hr>\nepsilon"
pattern = r'^(.*?)\n(<hr.*?)\n(.*)\n(<hr.*)$'

re.search(pattern, text, re.DOTALL).groups()

⇒

('alpha', '<hr>', 'gamma', '<hr>\nepsilon')
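
Note that re.search takes flags as its third parameter, so a positional re.DOTALL lands where intended there, unlike with re.sub. The fourth group spanning a newline shows '.' matching '\n'. As a small check with the same text and pattern (the names without_flag and with_flag are illustrative), the search without the flag finds nothing:

```python
import re

text = "alpha\n<hr>\ngamma\n<hr>\nepsilon"
pattern = r'^(.*?)\n(<hr.*?)\n(.*)\n(<hr.*)$'

# Without DOTALL, '.' stops at newlines, so the last group cannot
# get from "<hr>" to the end of the text and the search fails.
without_flag = re.search(pattern, text)

# With DOTALL, the last group runs across the newline to the end.
with_flag = re.search(pattern, text, re.DOTALL)

print(without_flag)              # None
print(repr(with_flag.group(4)))  # '<hr>\nepsilon'
```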