what would be the regular expression for null byte present in a string

Discussion:

Shambhu Rajak

2015-01-13 13:40:52 UTC

I have a string that I get as an output of a command as:
'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

I want to fetch '10232ae8944a' from the above string.

I want to find a re pattern that replaces every \x01..\x0z character with the empty string '', so that I can get the desired portion of the string.

Can anyone help me with a working regex for it?

Thanks,
Shambhu

Peter Otten

2015-01-13 14:41:51 UTC

'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

Post by Shambhu Rajak
I want to fetch '10232ae8944a' from the above string.
I want to find a re pattern that could replace all the \x01..\x0z to be
replace by empty string '', so that I can get the desired portion of
string
Can anyone help me with a working regex for it.

I think you want the str.translate() method rather than a regex.

Python 2:

>>> delenda = "".join(map(chr, range(32)))
>>> identity = "".join(map(chr, range(256)))
>>> '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'.translate(identity, delenda)
'10232ae8944a'

Python 3:

>>> mapping = dict.fromkeys(range(32))
>>> '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'.translate(mapping)
'10232ae8944a'
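
If the command output arrives as bytes rather than str (an assumption; the original post does not say how the command is run), a similar deletion can be sketched with bytes.translate() in Python 3. The names raw, control_bytes and cleaned are only illustrative.

```python
# Assumption: the command output is a bytes object, as returned for example
# by subprocess.check_output() in Python 3.
raw = b'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

# Passing None as the table means "no remapping"; the second argument lists
# the byte values to delete (here 0x00 through 0x1f).
control_bytes = bytes(range(32))
cleaned = raw.translate(None, control_bytes)

print(cleaned)                  # b'10232ae8944a'
print(cleaned.decode('ascii'))  # 10232ae8944a
```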

Thomas 'PointedEars' Lahn

2015-01-14 13:52:06 UTC

'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

Post by Peter Otten

I think you want the str.translate() method rather than a regex.

Another possibility, but given the length of the list probably not the most
efficient one. Aside from re and this one, here is another (tested with
Python 3.4.2):

filtered = ''.join(filter(lambda ch: ord(ch) > 0x0f, s))

--
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

Thomas 'PointedEars' Lahn

2015-01-14 14:11:24 UTC

Post by Thomas 'PointedEars' Lahn

Post by Peter Otten

Post by Shambhu Rajak
I want to find a re pattern that could replace all the \x01..\x0z to be
replace by empty string '', so that I can get the desired portion of
string
Can anyone help me with a working regex for it.

I think you want the str.translate() method rather than a regex.

Another possibility, but given the length of the list probably not the
most efficient one. Aside from re and this one, here is another (tested
with Python 3.4.2):

filtered = ''.join(filter(lambda ch: ord(ch) > 0x0f, s))

filtered = ''.join(filter(lambda ch: ch > "\x0f", s))
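
For reference, here is a self-contained sketch of the filter approach with s defined as the string from the original post. It also shows, on a made-up sample string, that a threshold of 0x0f keeps the control characters \x10..\x1f, so a threshold of 0x1f may be closer to what is wanted if those can occur.

```python
s = '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

# Keep only characters whose code point is above 0x0f.
filtered = ''.join(filter(lambda ch: ch > '\x0f', s))
print(repr(filtered))   # '10232ae8944a'

# Hypothetical sample containing \x10 and \x1f, which are also control characters.
sample = 'ab\x10cd\x1f'
kept_above_0f = ''.join(ch for ch in sample if ch > '\x0f')
kept_above_1f = ''.join(ch for ch in sample if ch > '\x1f')
print(repr(kept_above_0f))  # 'ab\x10cd\x1f'
print(repr(kept_above_1f))  # 'abcd'
```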

--
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

Denis McMahon

2015-01-13 17:25:11 UTC

Post by Shambhu Rajak
'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

What have you tried, and what was the result?

Regex isn't designed to work with byte strings.

>>> str = '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
>>> str.replace('\x00','').replace('\x0c','').replace('\x01','').replace('\x02','').replace('\n','')
'10232ae8944a'

This works for the specific example you gave, will your "string" ever
contain unwanted characters apart from \x00, \x01, \x02, \x0c, \n, and is
it ever possible for one of those to be in the wanted set?

>>> str[12:24]
'10232ae8944a'

This also works for the specific example you gave, is the data you want
to extract always going to be at the same offset in the string, and of
the same length?
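
If the offset is fixed but the length is not, one possibility is that the bytes before the text are a header that includes the length. The sketch below is only a guess from this single sample: it assumes an 8-byte little-endian integer followed by a 4-byte little-endian length, and that the data is available as bytes. The names HEADER, tag, length and payload are hypothetical.

```python
import struct

raw = b'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

# Hypothetical layout, inferred only from the one sample in this thread:
#   bytes 0-7  : 8-byte little-endian integer (1 in the sample)
#   bytes 8-11 : 4-byte little-endian payload length (12 in the sample)
#   bytes 12.. : payload of that length
HEADER = struct.Struct('<QI')

tag, length = HEADER.unpack_from(raw, 0)
payload = raw[HEADER.size:HEADER.size + length].decode('ascii')

print(tag, length)  # 1 12
print(payload)      # 10232ae8944a
```

Whether this layout is right depends on the command that produced the output, so it should be checked against more samples.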

>>> ''.join([str[x] for x in range(len(str)) if str[x] >= ' ' and str[x] <= '~'])
'10232ae8944a'

This also works for the specific example you gave, and is a way to remove
non printing and 8bit characters from a string. Is this what you actually
want to do?

>>> str.strip('\x00\x0c\x01\x02\n')
'10232ae8944a'

This also works for the specific example that you gave, it uses the strip
function with a string of characters to be stripped, this will work as
long as you can predefine all the characters to strip and none of the
characters to strip is ever desired as part of the result.

So 4 different methods, each of which seems to do, in the case of the
specific example you gave, exactly what you want.

However although I tried a few patterns, I don't seem to be able to
create an re that will do the job.

>>> import re
>>> patt = re.compile(r'[0-9a-zA-Z]+')
>>> res = patt.match(str)
>>> res
>>> print res
None
>>> type(res)
<type 'NoneType'>

>>> patt = re.compile(r'[0-z]+')
>>> res = patt.match(str)
>>> res
>>> print res
None
>>> type(res)
<type 'NoneType'>

>>> patt = re.compile(r'[ -~]+')
>>> res = patt.match(str)
>>> res
>>> print res
None
>>> type(res)
<type 'NoneType'>
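
A likely reason these attempts return None is that match() only tries to match at the very beginning of the string, and the string starts with \x01, which none of the character classes accept. The sketch below (Python 3 syntax, with the variable named s instead of str to avoid shadowing the built-in) uses search() with the first pattern instead.

```python
import re

s = '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

patt = re.compile(r'[0-9a-zA-Z]+')

# match() is anchored at position 0, where the string has '\x01'.
anchored = patt.match(s)
print(anchored)        # None

# search() scans forward and returns the first run of matching characters.
found = patt.search(s)
print(found.group())   # 10232ae8944a
```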

--
Denis McMahon, ***@gmail.com

Thomas 'PointedEars' Lahn

2015-01-14 13:42:45 UTC

'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

You need a character class with that range.
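
A minimal sketch of such a character class, assuming the range meant is \x00 through \x1f (the "\x0z" in the question is not a valid escape, so the upper bound is a guess):

```python
import re

s = '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

# [\x00-\x1f] matches any single control character; the + groups runs of
# them so each run is replaced in one step.
cleaned = re.sub(r'[\x00-\x1f]+', '', s)

print(repr(cleaned))   # '10232ae8944a'
```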

Post by Shambhu Rajak
Can anyone help me with a working regex for it.

Yes.

Post by Shambhu Rajak
________________________________
PLEASE NOTE: The information contained in this electronic mail message is
intended only for the use of the designated recipient(s) named above. […]

Please disable this nonsense, do not post multi-part messages, or if this is
not possible with your e-mail client or provider, use another one,
respectively.

--
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

Wolfgang Maier

2015-01-14 14:21:33 UTC

Post by Shambhu Rajak
'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
I want to fetch '10232ae8944a' from the above string.
I want to find a re pattern that could replace all the \x01..\x0z to be
replace by empty string '', so that I can get the desired portion of string
Can anyone help me with a working regex for it.

Sorry, no regex either, but depending on what exact behavior you need
str.isprintable() or even str.isalnum() could be for you:

e.g.

s_in = '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
s_out = ''.join(c for c in s_in if c.isprintable())
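
To illustrate how the two predicates differ, here is a small sketch on a made-up input (not the string from the original post) that also contains a space and a colon:

```python
# Hypothetical input with punctuation and a space in the wanted part.
sample = 'id: 10232ae8944a\x00\n'

# isprintable() keeps letters, digits, punctuation and the space character.
printable_only = ''.join(c for c in sample if c.isprintable())

# isalnum() keeps only letters and digits.
alnum_only = ''.join(c for c in sample if c.isalnum())

print(repr(printable_only))  # 'id: 10232ae8944a'
print(repr(alnum_only))      # 'id10232ae8944a'
```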

Wolfgang

Wolfgang Maier

2015-01-14 14:35:40 UTC

'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
I want to fetch '10232ae8944a' from the above string.
I want to find a re pattern that could replace all the \x01..\x0z to be
replace by empty string '', so that I can get the desired portion of string
Can anyone help me with a working regex for it.
Thanks,
Shambhu

If the characters you need to keep always form a contiguous stretch, you
can also use:

exclude = ''.join(chr(n) for n in range(32))
s_in.strip(exclude)
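
The "contiguous stretch" condition matters because strip() only removes characters from the two ends. A short sketch on two made-up strings, with translate() shown for comparison:

```python
exclude = ''.join(chr(n) for n in range(32))

# Hypothetical inputs: one with the wanted text in a single run,
# one with a stray NUL in the middle of it.
contiguous = '\x01\x00abc123\x02\n'
interrupted = '\x01\x00abc\x00123\x02\n'

print(repr(contiguous.strip(exclude)))    # 'abc123'
print(repr(interrupted.strip(exclude)))   # 'abc\x00123'

# translate() with a deletion mapping removes control characters everywhere.
delete_controls = dict.fromkeys(range(32))
print(repr(interrupted.translate(delete_controls)))  # 'abc123'
```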

Wolfgang