string.replace non-ascii characters
Samuel Karl Peterson
2007-02-12 04:55:17 UTC
Greetings Pythonistas. I have recently discovered a strange anomaly
with string.replace. Seemingly at random, it fails to deal with
characters of ordinal value > 127. I ran into this problem while
downloading auction web pages from eBay and trying to replace the
"\xa0" (dec 160, nbsp char in iso-8859-1) in the string I got from
urllib2. Yet today, all is fine, no problems whatsoever. Sadly, I
did not save the exact error message, but I believe it was a
ValueError thrown on string.replace and the message was something to
the effect of "character value not within range(128)".

Some googling seemed to indicate other people have reported similar
troubles:

http://mail.python.org/pipermail/python-list/2006-July/391617.html

Anyone have any enlightening advice for me?
--
Sam Peterson
skpeterson At nospam ucdavis.edu
"if programmers were paid to remove code instead of adding it,
software would be much better" -- unknown
Steven Bethard
2007-02-12 05:23:59 UTC
Post by Samuel Karl Peterson
Greetings Pythonistas. I have recently discovered a strange anomaly
with string.replace. Seemingly at random, it fails to deal with
characters of ordinal value > 127. I ran into this problem while
downloading auction web pages from eBay and trying to replace the
"\xa0" (dec 160, nbsp char in iso-8859-1) in the string I got from
urllib2. Yet today, all is fine, no problems whatsoever. Sadly, I
did not save the exact error message, but I believe it was a
ValueError thrown on string.replace and the message was something to
the effect of "character value not within range(128)".
Was it something like this?
>>> u'\xa0'.replace('\xa0', '')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

You might get that if you're mixing str and unicode. If both strings are
of the same type, the replacement works:

>>> u'\xa0'.replace(u'\xa0', '')
u''
>>> '\xa0'.replace('\xa0', '')
''
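
For readers on Python 3, here is a minimal sketch of the same mismatch. It assumes Python 3, where str is always text and bytes is always raw data, so mixing them raises TypeError instead of triggering an implicit ASCII decode. The helper name strip_nbsp is hypothetical and does not come from this thread.

```python
# Python 3 sketch: str and bytes are never converted into each other implicitly.
NBSP_TEXT = "\xa0"   # str: one code point, U+00A0 NO-BREAK SPACE
NBSP_BYTE = b"\xa0"  # bytes: one byte, 0xA0 (NBSP in ISO-8859-1)


def strip_nbsp(value):
    """Remove NBSP using a pattern of the same type as value (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.replace(NBSP_BYTE, b"")
    return value.replace(NBSP_TEXT, "")


# Mixing the two types fails loudly in Python 3.
try:
    NBSP_TEXT.replace(NBSP_BYTE, b"")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Keeping the pattern and the target the same type is the whole point; which type to use depends on whether the data has been decoded yet.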

STeVe
Samuel Karl Peterson
2007-02-12 05:38:29 UTC
Post by Steven Bethard
Post by Samuel Karl Peterson
Greetings Pythonistas. I have recently discovered a strange anomaly
with string.replace. Seemingly at random, it fails to deal with
characters of ordinal value > 127. I ran into this problem while
downloading auction web pages from eBay and trying to replace the
"\xa0" (dec 160, nbsp char in iso-8859-1) in the string I got from
urllib2. Yet today, all is fine, no problems whatsoever. Sadly, I
did not save the exact error message, but I believe it was a
ValueError thrown on string.replace and the message was something to
the effect of "character value not within range(128)".
Was it something like this?
>>> u'\xa0'.replace('\xa0', '')
  File "<interactive input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
Yeah that looks like exactly what was happening, thank you. I wonder
why I had a unicode string though. I thought urllib2 always spat out
a plain string. Oh well.

u'\xa0'.encode('latin-1').replace('\xa0', " ")

Hooray.
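
An alternative to encoding the text back to bytes is to decode the downloaded bytes once and do all replacements on text. The sketch below assumes Python 3 and a page encoded as ISO-8859-1; the function name clean_page and the sample bytes are made up for illustration and are not taken from the original post.

```python
def clean_page(raw, encoding="iso-8859-1"):
    """Decode downloaded bytes once, then replace NBSP with a plain space.

    Hypothetical helper: the default encoding is an assumption; a real page
    should be decoded with the charset the server actually declares.
    """
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    return text.replace("\u00a0", " ")


# Made-up sample standing in for bytes read from a web page.
page = b"Current bid:\xa0US $12.50"
print(clean_page(page))
```

Decoding at the boundary means every later string operation sees one type only, so the mixed-type error cannot recur further down the code.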
--
Sam Peterson
skpeterson At nospam ucdavis.edu
"if programmers were paid to remove code instead of adding it,
software would be much better" -- unknown
Gabriel Genellina
2007-02-12 06:01:56 UTC
On Mon, 12 Feb 2007 02:38:29 -0300, Samuel Karl Peterson wrote:
Post by Samuel Karl Peterson
"if programmers were paid to remove code instead of adding it,
software would be much better" -- unknown
I just did that last week. Around 250 useless lines removed from a 1000-line
module. I think the original coder didn't read the tutorial past the
dictionary examples: *all* functions returned a dictionary or list of
dictionaries! Of course using different names for the same thing here and
there, ugh... I just threw in a few classes and containers, removed all
the nonsensical packing/unpacking of data going back and forth, for a net
decrease of 25% in size (and a great increase in robustness,
maintainability, etc).
If I were paid for the number of lines *written* that would not be a great
deal :)
--
Gabriel Genellina
Steven D'Aprano
2007-02-12 06:23:14 UTC
Post by Gabriel Genellina
On Mon, 12 Feb 2007 02:38:29 -0300, Samuel Karl Peterson wrote:
Post by Samuel Karl Peterson
"if programmers were paid to remove code instead of adding it,
software would be much better" -- unknown
I just did that last week. Around 250 useless lines removed from a 1000-line
module.
[snip]

Hot out of uni, my first programming job was assisting a consultant who
was writing an application in Apple's "HyperTalk", a so-called "fourth
generation language" with an English-like syntax, aimed at non-programmers.

Virtually the first thing I did was refactor part of his code that looked
something like this:

    set the name of button id 1 to 1
    set the name of button id 2 to 2
    set the name of button id 3 to 3
    ...
    set the name of button id 399 to 399
    set the name of button id 400 to 400

into something like this:

    for i = 1 to 400:
        set the name of button id i to i
--
Steven D'Aprano
Duncan Booth
2007-02-12 10:44:14 UTC
Post by Gabriel Genellina
If I were paid for the number of lines *written* that would not be a
great deal :)
You don't by any chance get paid by the number of posts to c.l.python?
Deniz Dogan
2007-02-12 12:44:29 UTC
Post by Duncan Booth
Post by Gabriel Genellina
If I were paid for the number of lines *written* that would not be a
great deal :)
You don't by any chance get paid by the number of posts to c.l.python?
I was thinking the same thing.
John Machin
2007-02-12 13:29:22 UTC
Post by Deniz Dogan
Post by Duncan Booth
Post by Gabriel Genellina
If I were paid for the number of lines *written* that would not be a
great deal :)
You don't by any chance get paid by the number of posts to c.l.python?
I was thinking the same thing.
O maker of the monstrous millisecond-muncher, I was thinking that you
were paid by the number of times that you typed 3600000 :-)
Gabriel Genellina
2007-02-12 18:11:30 UTC
On Mon, 12 Feb 2007 07:44:14 -0300, Duncan Booth wrote:
Post by Duncan Booth
Post by Gabriel Genellina
If I were paid for the number of lines *written* that would not be a
great deal :)
You don't by any chance get paid by the number of posts to c.l.python?
I post a few messages but certainly I'm not the most prolific poster here!
--
Gabriel Genellina