Discussion:
re.DOTALL
Stefan Ram
2024-07-17 18:09:51 UTC
Below, I use [\s\S] to match each and every character.
I can't seem to get the same effect using "re.DOTALL"!

Yet the Python Library Reference says,

|(Dot.) In the default mode, this matches any character except
|a newline. If the DOTALL flag has been specified, this
|matches any character including a newline.

main.py

import re

text = '''
alpha
<hr>
gamma
<hr>
epsilon
'''[ 1: -1 ]

pattern = r'^.*?\n<hr.*?\n(.*)\n<hr.*$'

output = re.sub( pattern.replace( r'.', r'[\s\S]' ), r'\1', text )
print( output )

print( '--' )

output = re.sub( pattern, r'\1', text, re.DOTALL )
print( output )

stdout

gamma
--
alpha
<hr>
gamma
<hr>
epsilon
Stefan Ram
2024-07-17 18:21:26 UTC
Post by Stefan Ram
I can't seem to get the same effect using "re.DOTALL"!
PS: But (?s) works.
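
For reference, a minimal sketch of the inline-flag variant, using the sample text from this thread written as a single string. The name inline_output is only illustrative. The (?s) group has to sit at the very start of the pattern, and it switches on DOTALL for the whole expression.

```python
import re

text = "alpha\n<hr>\ngamma\n<hr>\nepsilon"
pattern = r'^.*?\n<hr.*?\n(.*)\n<hr.*$'

# (?s) is the inline spelling of re.DOTALL: "." now also matches "\n",
# so the pattern can span all five lines and capture the middle one.
inline_output = re.sub('(?s)' + pattern, r'\1', text)
print(inline_output)  # gamma
```

Because the flag travels inside the pattern string, it does not depend on which argument slot of re.sub it would otherwise be passed in.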
Lawrence D'Oliveiro
2024-07-17 23:54:25 UTC
Post by Stefan Ram
Below, I use [\s\S] to match each and every character.
I can't seem to get the same effect using "re.DOTALL"!
This might help clarify things:

text = "alpha\n<hr>\ngamma\n<hr>\nepsilon"
pattern = r'^(.*?)\n(<hr.*?)\n(.*)\n(<hr.*)$'

re.search(pattern, text, re.DOTALL).groups()

('alpha', '<hr>', 'gamma', '<hr>\nepsilon')
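
A plausible reading of why re.search behaves here while the re.sub call in the original post does not: re.search takes flags as its third positional parameter, whereas re.sub is declared as re.sub(pattern, repl, string, count=0, flags=0), so a fourth positional argument is taken as count. The sketch below assumes that is what happened in the original call. The names as_count and as_flags are only illustrative.

```python
import re

text = "alpha\n<hr>\ngamma\n<hr>\nepsilon"
pattern = r'^.*?\n<hr.*?\n(.*)\n<hr.*$'

# re.DOTALL is an integer-valued flag; as a fourth positional argument
# to re.sub it fills the count parameter, not flags.
print(int(re.DOTALL))  # 16

# What the positional call amounts to: count=16, no flags.
# Without DOTALL the pattern cannot match, so the text comes back unchanged.
as_count = re.sub(pattern, r'\1', text, count=int(re.DOTALL))
print(as_count == text)  # True

# Passing the flag by keyword gives the intended behaviour.
as_flags = re.sub(pattern, r'\1', text, flags=re.DOTALL)
print(as_flags)  # gamma
```

Passing flags by keyword avoids the mix-up. Python 3.13 also deprecates passing count and flags positionally to re.sub, re.subn and re.split.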
