Discussion:
what would be the regular expression for null byte present in a string
Shambhu Rajak
2015-01-13 13:40:52 UTC
I have a string that I get as an output of a command as:
'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

I want to fetch '10232ae8944a' from the above string.

I want to find a re pattern that replaces all of the \x01..\x0z characters with the empty string '', so that I can get the desired portion of the string.

Can anyone help me with a working regex for it?

Thanks,
Shambhu

Peter Otten
2015-01-13 14:41:51 UTC
'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
Post by Shambhu Rajak
I want to fetch '10232ae8944a' from the above string.
I want to find a re pattern that replaces all of the \x01..\x0z characters
with the empty string '', so that I can get the desired portion of the
string.
Can anyone help me with a working regex for it?
I think you want the str.translate() method rather than a regex.
>>> delenda = "".join(map(chr, range(32)))
>>> identity = "".join(map(chr, range(256)))
>>> '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'.translate(identity, delenda)
'10232ae8944a'

>>> mapping = dict.fromkeys(range(32))
>>> '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'.translate(mapping)
'10232ae8944a'
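
A note for anyone running this on Python 3: the two-argument translate(identity, delenda) call is the Python 2 str form. In Python 3, str.translate() takes a single mapping, and the delete argument survives only on bytes.translate(). A minimal sketch of both variants follows; the variable names are illustrative, and encoding with latin-1 is just an assumption to get a bytes copy of the same sample:

```python
raw = '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

# str: map every code point 0..31 to None, which deletes it.
delete_controls = dict.fromkeys(range(32))
cleaned = raw.translate(delete_controls)

# bytes: pass None as the table and list the bytes to delete.
raw_bytes = raw.encode('latin-1')
cleaned_bytes = raw_bytes.translate(None, bytes(range(32)))

print(cleaned)
print(cleaned_bytes)
```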
Thomas 'PointedEars' Lahn
2015-01-14 13:52:06 UTC
'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
Post by Peter Otten
Post by Shambhu Rajak
I want to fetch '10232ae8944a' from the above string.
I want to find a re pattern that replaces all of the \x01..\x0z characters
with the empty string '', so that I can get the desired portion of the
string.
Can anyone help me with a working regex for it?
I think you want the str.translate() method rather than a regex.
Another possibility, but given the length of the list probably not the most
efficient one. Aside from re and this one, here is another (tested with
Python 3.4.2):

filtered = ''.join(filter(lambda ch: ord(ch) > 0x0f, s))
--
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
Thomas 'PointedEars' Lahn
2015-01-14 14:11:24 UTC
Post by Thomas 'PointedEars' Lahn
Post by Peter Otten
Post by Shambhu Rajak
I want to find a re pattern that replaces all of the \x01..\x0z characters
with the empty string '', so that I can get the desired portion of the
string.
Can anyone help me with a working regex for it?
I think you want the str.translate() method rather than a regex.
Another possibility, but given the length of the list probably not the
most efficient one. Aside from re and this one, here is another (tested
filtered = ''.join(filter(lambda ch: ord(ch) > 0x0f, s))
filtered = ''.join(filter(lambda ch: ch > "\x0f", s))
--
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
Denis McMahon
2015-01-13 17:25:11 UTC
Post by Shambhu Rajak
'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
Post by Shambhu Rajak
I want to fetch '10232ae8944a' from the above string.
I want to find a re pattern that replaces all of the \x01..\x0z characters
with the empty string '', so that I can get the desired portion of the
string.
Can anyone help me with a working regex for it?
What have you tried, and what was the result?

Regex isn't designed to work with byte strings.
>>> str = '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
>>> str.replace('\x00','').replace('\x0c','').replace('\x01','').replace('\x02','').replace('\n','')
'10232ae8944a'

This works for the specific example you gave. Will your "string" ever
contain unwanted characters apart from \x00, \x01, \x02, \x0c and \n, and is
it ever possible for one of those to be in the wanted set?
>>> str[12:24]
'10232ae8944a'

This also works for the specific example you gave. Is the data you want
to extract always going to be at the same offset in the string, and of
the same length?
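
One observation on the offset question: in the sample, the \x0c byte right before the payload has the value 12, which is also the payload's length. If (and this is purely a guess, nothing in the thread confirms it) the command emits an 8-byte little-endian tag followed by a 4-byte little-endian length, the payload could be sliced out without hard-coding 12:24. A sketch under that assumed layout, in Python 3 with the data as bytes:

```python
import struct

raw = b'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

# Hypothetical header layout: 8-byte tag + 4-byte length, both little-endian.
HEADER = '<QI'
header_size = struct.calcsize(HEADER)

tag, length = struct.unpack_from(HEADER, raw, 0)

# Slice the payload using the length read from the header.
payload = raw[header_size:header_size + length]
print(tag, length, payload)
```

If the real format differs, the fixed slice or one of the filtering approaches is the safer choice.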
>>> ''.join([str[x] for x in range(len(str)) if str[x] >= ' ' and str[x] <= '~'])
'10232ae8944a'

This also works for the specific example you gave, and is a way to remove
non-printing and 8-bit characters from a string. Is this what you actually
want to do?
>>> str.strip('\x00\x0c\x01\x02\n')
'10232ae8944a'

This also works for the specific example that you gave. It uses the strip
method with a string of characters to be stripped from both ends; this will
work as long as you can predefine all the characters to strip and none of
them is ever wanted at the start or end of the result.

So 4 different methods, each of which seems to do, in the case of the
specific example you gave, exactly what you want.

However although I tried a few patterns, I don't seem to be able to
create an re that will do the job.
>>> patt = re.compile(r'[0-9a-zA-Z]+')
>>> res = patt.match(str)
>>> res
>>> print res
None
>>> type(res)
<type 'NoneType'>

>>> patt = re.compile(r'[0-z]+')
>>> res = patt.match(str)
>>> res
>>> print res
None
>>> type(res)
<type 'NoneType'>

>>> patt = re.compile(r'[ -~]+')
>>> res = patt.match(str)
>>> res
>>> print res
None
>>> type(res)
<type 'NoneType'>
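
A likely reason all three attempts return None: match() only tries the pattern at the very start of the string, and the sample starts with '\x01', which none of these character classes accept. search() scans forward to the first position where the pattern fits. A minimal sketch in Python 3 (the names raw and patt are illustrative), including a bytes variant, since the re module accepts bytes patterns as well:

```python
import re

raw = '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
patt = re.compile(r'[0-9a-zA-Z]+')

# match() is anchored at index 0, where the character is '\x01'.
anchored = patt.match(raw)

# search() finds the first run of alphanumerics anywhere in the string.
found = patt.search(raw)
wanted = found.group()

# The same idea on bytes, using a bytes pattern.
wanted_bytes = re.search(rb'[0-9a-zA-Z]+', raw.encode('latin-1')).group()

print(anchored, wanted, wanted_bytes)
```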
--
Denis McMahon, ***@gmail.com
Thomas 'PointedEars' Lahn
2015-01-14 13:42:45 UTC
'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
Post by Shambhu Rajak
I want to fetch '10232ae8944a' from the above string.
I want to find a re pattern that replaces all of the \x01..\x0z characters
with the empty string '', so that I can get the desired portion of the
string.
You need a character class with that range.
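
A minimal sketch of such a character class, written for Python 3. It assumes "\x01..\x0z" means the ASCII control characters \x00 through \x1f; narrow the range if that is not what is wanted:

```python
import re

raw = '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

# Character class covering code points 0x00..0x1f (assumed range).
control_chars = re.compile(r'[\x00-\x1f]')

# Replace every control character with the empty string.
cleaned = control_chars.sub('', raw)
print(cleaned)
```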
Post by Shambhu Rajak
Can anyone help me with a working regex for it.
Yes.
Post by Shambhu Rajak
________________________________
PLEASE NOTE: The information contained in this electronic mail message is
intended only for the use of the designated recipient(s) named above. […]
Please disable this nonsense, do not post multi-part messages, or if this is
not possible with your e-mail client or provider, use another one,
respectively.
--
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
Wolfgang Maier
2015-01-14 14:21:33 UTC
Post by Shambhu Rajak
'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
I want to fetch '10232ae8944a' from the above string.
I want to find a re pattern that replaces all of the \x01..\x0z characters
with the empty string '', so that I can get the desired portion of the string.
Can anyone help me with a working regex for it?
Sorry, no regex either, but depending on the exact behavior you need,
str.isprintable() or even str.isalnum() could work for you:

e.g.

s_in = '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
s_out = ''.join(c for c in s_in if c.isprintable())
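
To make the difference between the two predicates concrete, here is a small sketch on a made-up input (the sample string below is hypothetical, not from the original command output). isprintable() keeps spaces and punctuation, while isalnum() keeps only letters and digits:

```python
# Hypothetical input mixing letters, digits, a space, a hyphen and control characters.
sample = 'ab 12-\x00\n'

# Keeps everything printable, including the space and the hyphen.
printable_only = ''.join(c for c in sample if c.isprintable())

# Keeps only letters and digits.
alnum_only = ''.join(c for c in sample if c.isalnum())

print(repr(printable_only), repr(alnum_only))
```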

Wolfgang
Wolfgang Maier
2015-01-14 14:35:40 UTC
'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
I want to fetch '10232ae8944a' from the above string.
I want to find a re pattern that replaces all of the \x01..\x0z characters
with the empty string '', so that I can get the desired portion of the string.
Can anyone help me with a working regex for it?
Thanks,
Shambhu
If the characters you need to keep always form a contiguous stretch, you
can also use:

exclude = ''.join(chr(n) for n in range(32))
s_in.strip(exclude)
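
The "contiguous stretch" condition matters because strip() only trims the two ends. A short sketch of where it stops being enough, using a made-up second input (s_embedded is hypothetical) that has a control character in the middle of the wanted text:

```python
exclude = ''.join(chr(n) for n in range(32))

s_in = '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
stripped = s_in.strip(exclude)

# Hypothetical input with a NUL embedded inside the wanted part.
s_embedded = '\x0010232a\x00e8944a\n'

# strip() leaves the inner NUL in place.
stripped_embedded = s_embedded.strip(exclude)

# translate() with a deletion mapping removes it everywhere.
translated_embedded = s_embedded.translate(dict.fromkeys(range(32)))

print(repr(stripped), repr(stripped_embedded), repr(translated_embedded))
```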

Wolfgang
