Discussion:
Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3
John Nagle
2015-03-12 19:55:20 UTC
I have working code from Python 2 which uses "pickle"
to talk to a subprocess via stdin/stdout. I'm trying to
make that work in Python 3.

First, the subprocess Python is invoked with the "-u" option,
so stdin and stdout are supposed to be unbuffered binary streams.
That was enough in Python 2, but it's not enough in Python 3.

The subprocess and its connections are set up with

proc = subprocess.Popen(launchargs, stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, env=env)

...
self.reader = pickle.Unpickler(self.proc.stdout)
self.writer = pickle.Pickler(self.proc.stdin, 2)

after which I get

result = self.reader.load()
TypeError: 'str' does not support the buffer interface

That's as far as traceback goes, so I assume this is
disappearing into C code.

OK, I know I need a byte stream. I tried

self.reader = pickle.Unpickler(self.proc.stdout.buffer)
self.writer = pickle.Pickler(self.proc.stdin.buffer, 2)

That's not allowed. The "stdin" and "stdout" that are
fields of "proc" do not have "buffer". So I can't do that
in the parent process. In the child, though, where
stdin and stdout come from "sys", "sys.stdin.buffer" is valid.
That fixes the "'str' does not support the buffer interface"
error. But now I get the pickle error "Ran out of input"
on the process child side. Probably because there's a
str/bytes incompatibility somewhere.

So how do I get clean binary byte streams between parent
and child process?

John Nagle
Cameron Simpson
2015-03-12 21:56:04 UTC
Post by John Nagle
I have working code from Python 2 which uses "pickle"
to talk to a subprocess via stdin/stdout. I'm trying to
make that work in Python 3.
First, the subprocess Python is invoked with the "-u" option,
so stdin and stdout are supposed to be unbuffered binary streams.
You shouldn't need to use unbuffered streams specifically. It should be enough to
.flush() the output stream (at whichever end) after you have written the pickle
data.
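
A minimal sketch of the write-then-flush pattern described above. It assumes an in-process os.pipe() as a stand-in for the parent/child pipe, so that it runs on its own; the real code would use proc.stdin and proc.stdout instead.

```python
import os
import pickle

# Assumption for illustration: an in-process pipe replaces the real
# parent/child pipe so the sketch is self-contained.
read_fd, write_fd = os.pipe()
writer = os.fdopen(write_fd, "wb")   # buffered binary writer
reader = os.fdopen(read_fd, "rb")    # buffered binary reader

pickle.dump({"request": 1}, writer, protocol=2)
writer.flush()                       # push the buffered bytes into the pipe

received = pickle.load(reader)       # returns once one whole pickle is read

writer.close()
reader.close()
```

Without the flush() call the pickled bytes can sit in the writer's buffer and the load() on the other end would block.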

I'm skipping some of your discussion; I can see nothing wrong. I don't use
pickle itself so aside from saying that your use seems to conform to the python
3 doco I can't comment more deeply. That said, I do use subprocess a fair bit.

[...]
Post by John Nagle
result = self.reader.load()
TypeError: 'str' does not support the buffer interface
That's as far as traceback goes, so I assume this is
disappearing into C code.
No line numbers at all? Or, I suppose, just the line number from your program
and nothing from the pickle module?
Post by John Nagle
OK, I know I need a byte stream. I tried
self.reader = pickle.Unpickler(self.proc.stdout.buffer)
self.writer = pickle.Pickler(self.proc.stdin.buffer, 2)
You should not need to care about these. They're not required.
Post by John Nagle
That's not allowed. The "stdin" and "stdout" that are
fields of "proc" do not have "buffer". So I can't do that
in the parent process. In the child, though, where
stdin and stdout come from "sys", "sys.stdin.buffer" is valid.
But irrelevant. Besides, the stream buffer may not contain the whole pickle
data anyway; it will be empty before a read and quite possibly incomplete
afterwards. It is just a buffer.
Post by John Nagle
That fixes the "'str' does not support the buffer interface"
error.
I'm not sure "fix" is the right characterisation here.
Post by John Nagle
But now I get the pickle error "Ran out of input"
on the process child side. Probably because there's a
str/bytes incompatibility somewhere.
No, probably because the buffer is only ever a snapshot of part of the stream.

str/bytes errors are more glaringly obviously so.
Post by John Nagle
So how do I get clean binary byte streams between parent
and child process?
This is where I'm confused: my experience is that subprocess.Popen gives you
binary streams; I always need to put an encoder/decoder on them to use text.
Did that just the other day.

BTW, this is on some UNIX variant? Should not be very relevant...

Further questions:

What does self.proc.stdout.__class__ say? And for stdin?
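
One way to answer that question is a small check like the sketch below. The inline echo child passed with -c is a hypothetical stand-in for the real subprocess, and the exact classes reported may differ between Python versions.

```python
import subprocess
import sys

# Hypothetical child: copies its stdin to its stdout as raw bytes.
child_code = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

stdin_class = proc.stdin.__class__     # class of the pipe to the child
stdout_class = proc.stdout.__class__   # class of the pipe from the child
has_buffer = hasattr(proc.stdout, "buffer")

# communicate() takes and returns bytes because the pipes are binary.
out, _ = proc.communicate(b"\x80\x02raw bytes")
print(stdin_class, stdout_class, has_buffer, out)
```

If the pipes are already binary there is no "buffer" attribute to reach for, which matches the error seen in the parent process.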

Cheers,
Cameron Simpson <***@zip.com.au>

My opinions are borrowed from someone who no longer needs them.
-- ***@uga.cc.uga.edu
John Nagle
2015-03-13 00:18:54 UTC
I have working code from Python 2 which uses "pickle" to talk to a
subprocess via stdin/stdout. I'm trying to make that work in Python
3. First, the subprocess Python is invoked with the "-u" option, so
stdin and stdout are supposed to be unbuffered binary streams.
You shouldn't need to use unbuffered streams specifically. It should
be enough to .flush() the output stream (at whichever end) after you
have written the pickle data.
Doing that.

It's a repeat-transaction thing. Main process sends pickled
item to subprocess, subprocess reads item, subprocess does work,
subprocess writes pickled item to parent. This repeats.

I call writer.clear_memo() and set reader.memo = {} at the
end of each cycle, to clear Pickle's cache. That all worked
fine in Python 2. Are there any known problems with reusing
Python 3 "pickle"s streams?
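
A small sketch of why the memo matters when one Pickler is reused. It uses an io.BytesIO in place of the pipe (an assumption for illustration) and the hypothetical helper name dump_twice.

```python
import io
import pickle

def dump_twice(clear_between):
    """Pickle the same list twice with one Pickler; return both byte strings."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, 2)
    item = ["a", "b", "c"]

    pickler.dump(item)
    first = buf.getvalue()
    if clear_between:
        pickler.clear_memo()
    pickler.dump(item)
    second = buf.getvalue()[len(first):]
    return first, second

# Without clear_memo() the second pickle is only a memo back-reference,
# so a reader that has reset its own memo cannot decode it.
first, second = dump_twice(clear_between=False)
memo_reference_is_shorter = len(second) < len(first)

# With clear_memo() each pickle is self-contained.
first, second = dump_twice(clear_between=True)
self_contained = (first == second)
```

Creating a fresh Pickler and Unpickler per message, or calling pickle.dump() and pickle.load() directly, would avoid the shared memo altogether.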

The identical code works with Python 2.7.9; it's converted to Python
3 using "six" so I can run on both Python versions and look for
differences. I'm using Pickle format 2, for compatibility.
(Tried 0, the ASCII format; it didn't help.)
I'm skipping some of your discussion; I can see nothing wrong. I
don't use pickle itself so aside from saying that your use seems to
conform to the python 3 docs I can't comment more deeply. That said,
I do use subprocess a fair bit.
I'll have to put in more logging and see exactly what's going
over the pipes.
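
One possible shape for that logging, as a sketch only: a thin wrapper that records every chunk the Unpickler pulls from the stream. LoggingReader is a hypothetical name, and an io.BytesIO stands in for the pipe.

```python
import io
import pickle

class LoggingReader:
    """Wrap a binary stream and record every chunk read through it."""

    def __init__(self, raw, log):
        self.raw = raw
        self.log = log          # list that collects the chunks read so far

    def read(self, n=-1):
        data = self.raw.read(n)
        self.log.append(data)
        return data

    def readline(self):
        data = self.raw.readline()
        self.log.append(data)
        return data

# Stand-in for proc.stdout: a BytesIO holding one protocol-2 pickle.
stream = io.BytesIO(pickle.dumps(("hello", 42), protocol=2))
log = []
result = pickle.Unpickler(LoggingReader(stream, log)).load()
total_logged = b"".join(log)
print(result, total_logged)
```

In the real program the log could be written to a file with repr() of each chunk, which would show whether str or bytes objects are arriving.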

John Nagle
John Nagle
2015-03-13 06:05:41 UTC
I have working code from Python 2 which uses "pickle" to talk to a
subprocess via stdin/stdout. I'm trying to make that work in Python
3.
I'm starting to think that the "cpickle" module, which Python 3
uses by default, has a problem. After the program has been
running for a while, I start seeing errors such as

File "C:\projects\sitetruth\InfoSiteRating.py", line 200, in scansite
if len(self.badbusinessinfo) > 0 : # if bad stuff
NameError: name 'len' is not defined

which ought to be impossible in Python, and

File "C:\projects\sitetruth\subprocesscall.py", line 129, in send
self.writer.dump(args) # send data
OSError: [Errno 22] Invalid argument

from somewhere deep inside CPickle.

I got

File "C:\projects\sitetruth\InfoSiteRating.py", line 223, in
get_rating_text
(ratingsmalliconurl, ratinglargiconurl, ratingalttext) =
DetailsPageBuilder.getratingiconinfo(rating)
NameError: name 'DetailsPageBuilder' is not defined
(That's an imported module. It worked earlier in the run.)

and finally, even after I deleted all .pyc files and all Python
cache directories:

Fatal Python error: GC object already tracked

Current thread 0x00001a14 (most recent call first):
File "C:\python34\lib\site-packages\pymysql\connections.py", line 411
in description
File "C:\python34\lib\site-packages\pymysql\connections.py", line 1248
in _get_descriptions
File "C:\python34\lib\site-packages\pymysql\connections.py", line 1182
in _read_result_packet
File "C:\python34\lib\site-packages\pymysql\connections.py", line 1132
in read
File "C:\python34\lib\site-packages\pymysql\connections.py", line 929
in _read_query_result
File "C:\python34\lib\site-packages\pymysql\connections.py", line 768
in query
File "C:\python34\lib\site-packages\pymysql\cursors.py", line 282 in
_query
File "C:\python34\lib\site-packages\pymysql\cursors.py", line 134 in
execute
File "C:\projects\sitetruth\domaincacheitem.py", line 128 in select
File "C:\projects\sitetruth\domaincache.py", line 30 in search
File "C:\projects\sitetruth\ratesite.py", line 31 in ratedomain
File "C:\projects\sitetruth\RatingProcess.py", line 68 in call
File "C:\projects\sitetruth\subprocesscall.py", line 140 in docall
File "C:\projects\sitetruth\subprocesscall.py", line 158 in run
File "C:\projects\sitetruth\RatingProcess.py", line 89 in main
File "C:\projects\sitetruth\RatingProcess.py", line 95 in <module>

That's a definite memory error.

So something is corrupting memory. Probably CPickle.

All my code is in Python. Every library module came in via "pip", into a
clean Python 3.4.3 (32 bit) installation on Win7/x86-64.
Currently installed packages:

beautifulsoup4 (4.3.2)
dnspython3 (1.12.0)
html5lib (0.999)
pip (6.0.8)
PyMySQL (0.6.6)
pyparsing (2.0.3)
setuptools (12.0.5)
six (1.9.0)

And it works fine with Python 2.7.9.

Is there some way to force the use of the pure Python pickle module?
My guess is that there's something about reusing "pickle" instances
that botches memory use in CPython 3's C code for "cpickle".

John Nagle
Steven D'Aprano
2015-03-13 08:43:08 UTC
Post by John Nagle
I'm starting to think that the "cpickle" module, which Python 3
uses by default, has a problem. After the program has been
running for a while, I start seeing errors such as
File "C:\projects\sitetruth\InfoSiteRating.py", line 200, in scansite
if len(self.badbusinessinfo) > 0 : # if bad stuff
NameError: name 'len' is not defined
which ought to be impossible in Python, and
"Impossible"?

py> len
<built-in function len>
py> import __builtin__ # use builtins in Python 3
py> del __builtin__.len
py> len
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'len' is not defined


Why something is deleting the builtin len is a mystery. Sounds to me like your
Python installation is borked.
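
For Python 3 the same demonstration can be sketched as a script, using the builtins module and restoring len afterwards so the interpreter stays usable:

```python
import builtins

saved_len = builtins.len          # keep a reference so it can be restored
del builtins.len
try:
    try:
        len([1, 2, 3])            # name lookup now fails
        error_message = None
    except NameError as exc:
        error_message = str(exc)
finally:
    builtins.len = saved_len      # always put the builtin back

print(error_message)
```

This only shows that the NameError is reachable from pure Python; it says nothing about what removed the name in the failing program.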
Post by John Nagle
File "C:\projects\sitetruth\subprocesscall.py", line 129, in send
self.writer.dump(args) # send data
OSError: [Errno 22] Invalid argument
from somewhere deep inside CPickle.
Why do you say "deep inside CPickle"? The traceback says
C:\projects\sitetruth\subprocesscall.py

Is it possible you have accidentally shadowed the CPickle module with
something? What does this say?

import cPickle
print cPickle.__file__

Use _pickle in Python 3.
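
A Python 3 version of that check might look like the sketch below. It assumes CPython, where _pickle is the C accelerator; on some builds the module is compiled into the interpreter and has no __file__, hence the getattr default.

```python
import pickle
import _pickle   # CPython's C accelerator module (assumption: CPython)

# Built-in extension modules may lack __file__, so supply a default.
accelerator_path = getattr(_pickle, "__file__", "<built into the interpreter>")

# True when pickle.Pickler is the C implementation rather than pickle.py's.
uses_c_pickler = pickle.Pickler is _pickle.Pickler

print(pickle.__file__)
print(accelerator_path)
print(uses_c_pickler)
```

If pickle.__file__ points somewhere under the project directory instead of the standard library, the module has been shadowed.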
Post by John Nagle
I got
File "C:\projects\sitetruth\InfoSiteRating.py", line 223, in
get_rating_text
(ratingsmalliconurl, ratinglargiconurl, ratingalttext) =
DetailsPageBuilder.getratingiconinfo(rating)
NameError: name 'DetailsPageBuilder' is not defined
(That's an imported module. It worked earlier in the run.)
and finally, even after I deleted all .pyc files and all Python
cache directories:
Fatal Python error: GC object already tracked
File "C:\python34\lib\site-packages\pymysql\connections.py", line 411
in description
File "C:\python34\lib\site-packages\pymysql\connections.py", line 1248
in _get_descriptions
File "C:\python34\lib\site-packages\pymysql\connections.py", line 1182
in _read_result_packet
File "C:\python34\lib\site-packages\pymysql\connections.py", line 1132
in read
File "C:\python34\lib\site-packages\pymysql\connections.py", line 929
in _read_query_result
File "C:\python34\lib\site-packages\pymysql\connections.py", line 768
in query
File "C:\python34\lib\site-packages\pymysql\cursors.py", line 282 in
_query
File "C:\python34\lib\site-packages\pymysql\cursors.py", line 134 in
execute
File "C:\projects\sitetruth\domaincacheitem.py", line 128 in select
File "C:\projects\sitetruth\domaincache.py", line 30 in search
File "C:\projects\sitetruth\ratesite.py", line 31 in ratedomain
File "C:\projects\sitetruth\RatingProcess.py", line 68 in call
File "C:\projects\sitetruth\subprocesscall.py", line 140 in docall
File "C:\projects\sitetruth\subprocesscall.py", line 158 in run
File "C:\projects\sitetruth\RatingProcess.py", line 89 in main
File "C:\projects\sitetruth\RatingProcess.py", line 95 in <module>
That's a definite memory error.
So something is corrupting memory. Probably CPickle.
All my code is in Python. Every library module came in via "pip", into a
clean Python 3.4.3 (32 bit) installation on Win7/x86-64.
beautifulsoup4 (4.3.2)
dnspython3 (1.12.0)
html5lib (0.999)
pip (6.0.8)
PyMySQL (0.6.6)
pyparsing (2.0.3)
setuptools (12.0.5)
six (1.9.0)
And it works fine with Python 2.7.9.
Is there some way to force the use of the pure Python pickle module?
Try renaming the _pickle module. This works on Linux:

mv /usr/local/lib/python3.3/lib-dynload/_pickle.cpython-33m.so /usr/local/lib/python3.3/lib-dynload/_pickle.cpython-33m.so~
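
An alternative that avoids renaming files, offered as a sketch: CPython's pickle.py keeps its pure-Python classes available as pickle._Pickler and pickle._Unpickler. These are underscore names, so treat them as an implementation detail rather than a supported API. The dictionary contents here are made-up sample data.

```python
import io
import pickle

# Pure-Python classes from CPython's pickle.py (implementation detail).
buf = io.BytesIO()
pickle._Pickler(buf, 2).dump({"domain": "example.com", "rating": 3})

buf.seek(0)
result = pickle._Unpickler(buf).load()
print(result)
```

If the failures disappear with these classes and return with pickle.Pickler and pickle.Unpickler, that would point towards the C accelerator.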
Post by John Nagle
My guess is that there's something about reusing "pickle" instances
that botches memory use in CPython 3's C code for "cpickle".
How are you reusing instances?
--
Steven
Peter Otten
2015-03-12 21:57:01 UTC
Post by John Nagle
I have working code from Python 2 which uses "pickle"
to talk to a subprocess via stdin/stdout. I'm trying to
make that work in Python 3.
First, the subprocess Python is invoked with the "-u" option,
so stdin and stdout are supposed to be unbuffered binary streams.
That was enough in Python 2, but it's not enough in Python 3.
The subprocess and its connections are set up with
proc = subprocess.Popen(launchargs,stdin=subprocess.PIPE,
stdout=subprocess.PIPE, env=env)
...
self.reader = pickle.Unpickler(self.proc.stdout)
self.writer = pickle.Pickler(self.proc.stdin, 2)
after which I get
result = self.reader.load()
TypeError: 'str' does not support the buffer interface
That's as far as traceback goes, so I assume this is
disappearing into C code.
OK, I know I need a byte stream. I tried
self.reader = pickle.Unpickler(self.proc.stdout.buffer)
self.writer = pickle.Pickler(self.proc.stdin.buffer, 2)
That's not allowed. The "stdin" and "stdout" that are
fields of "proc" do not have "buffer". So I can't do that
in the parent process. In the child, though, where
stdin and stdout come from "sys", "sys.stdin.buffer" is valid.
That fixes the "'str' does not support the buffer interface"
error. But now I get the pickle error "Ran out of input"
on the process child side. Probably because there's a
str/bytes incompatibility somewhere.
So how do I get clean binary byte streams between parent
and child process?
I don't know what you have to do to rule out deadlocks when you use pipes
for both stdin and stdout, but binary streams are the default for
subprocess. Can you provide a complete example?

Anyway, here is a demo for two-way communication using the communicate()
method:

$ cat parent.py
import pickle
import subprocess

data = (5, 4.3, "üblich ähnlich nötig")

p = subprocess.Popen(
    ["python3", "child.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE)

result = p.communicate(pickle.dumps(data, protocol=2))[0]
print(pickle.loads(result))

$ cat child.py
import sys
import pickle

a, b, c = pickle.load(sys.stdin.buffer)
pickle.dump((a, b, c.upper()), sys.stdout.buffer)

$ python3 parent.py
(5, 4.3, 'ÜBLICH ÄHNLICH NÖTIG')

This is likely not what you want because here everything is buffered so that
continuous interaction is not possible.
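
For continuous interaction, a request/reply loop along the following lines may work. This is a sketch under assumptions: the child is a hypothetical inline program passed with -c, each message is small enough to fit in the pipe buffer, and every write is followed by flush() so neither side waits on buffered data.

```python
import pickle
import subprocess
import sys

# Hypothetical child program, passed inline with -c to keep the sketch
# self-contained. It answers each pickled request until stdin hits EOF.
CHILD = r"""
import pickle, sys
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    try:
        text = pickle.load(inp)
    except EOFError:
        break
    pickle.dump(text.upper(), out, protocol=2)
    out.flush()
"""

proc = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

replies = []
for request in ["first", "second", "third"]:
    pickle.dump(request, proc.stdin, protocol=2)
    proc.stdin.flush()                  # make the request visible to the child
    replies.append(pickle.load(proc.stdout))

proc.stdin.close()                      # EOF tells the child to exit
proc.wait()
proc.stdout.close()
print(replies)
```

Each pickle.dump() and pickle.load() call here starts with a fresh memo, so no clear_memo() bookkeeping is needed. Large messages in both directions at once could still deadlock on full pipe buffers.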