Discussion:
any issues with long running python apps?
Les Schaffer
2010-07-09 19:13:30 UTC
i have been asked to guarantee that a proposed Python application will
run continuously under MS Windows for two months time. And i am looking
to know what i don't know.

The app would read instrument data from a serial port, store the data in
file, and display in matplotlib. typical sampling times are about once
per hour. our design would be to read the serial port once a second
(long story, but it worked for ten days straight so far) and just store
and display the values once an hour. so basically we'd have a long
running for-loop. maybe running in a separate thread.

i have thought through the issues on our end: we have to properly
handle the serial port and make sure our GUI doesn't hang easily. we'd
have to be sure we don't have any memory leaks in numpy or matplotlib
and we're not storing data in memory stupidly. i've never looked into
Windows serial port driver reliability over the long term. But we think
if the serial port hangs, we can detect that and react accordingly.

but none of this has anything to do with Python itself. i am sure python
servers have been running reliably for long periods of time, but i've
never had to deal with a two-month guarantee before. is there something
else i am missing here that i should be concerned about on the
pure-Python side of things? something under the hood of the python
interpreter that could be problematic when run for a long time?

or need we only concern ourselves with the nuts behind the wheel: that
is, we the developers?

thanks

Les
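A minimal sketch of the loop described above (read once a second, store once an hour), not taken from the original post. The names `sample_loop`, `read_value` and `store_value` are hypothetical; a real `read_value` would presumably wrap a serial read that uses a timeout so a hung port cannot block the loop forever. The clock and sleep functions are injectable only so the loop can be exercised quickly in a test.

```python
import time


def sample_loop(read_value, store_value, read_period=1.0, store_period=3600.0,
                max_reads=None, clock=time.monotonic, sleep=time.sleep):
    """Read every `read_period` seconds; store the latest value every `store_period`.

    `max_reads=None` means run until the process is stopped.
    Returns the number of reads performed.
    """
    next_store = clock() + store_period
    reads = 0
    while max_reads is None or reads < max_reads:
        latest = read_value()          # hypothetical: one reading from the instrument
        reads += 1
        if clock() >= next_store:
            store_value(latest)        # hypothetical: append to file, update plot, etc.
            next_store += store_period
        sleep(read_period)
    return reads
```

Using `time.monotonic` rather than wall-clock time means the schedule should not jump if the system clock is adjusted during the two months.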
Michael Torrie
2010-07-09 19:56:12 UTC
Post by Les Schaffer
or need we only concern ourselves with the nuts behind the wheel: that
is, we the developers?
It never hurts to separate the data collection and
visualization/analysis parts into separate programs. That way you can
keep the critical, long-running data collection program running, and if
you get an exception in the GUI because of some divide by zero
programming error or some other problem, you can restart that part
without impacting the overall system.
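One way the separation suggested above could look, as a sketch under assumptions rather than anything from the thread: the collector and the viewer are two programs that share nothing but an append-only CSV file. The function names and the two-column file layout are hypothetical.

```python
import csv
import os


def append_reading(path, timestamp, value):
    """Collector side: append one reading and push it to disk immediately."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([timestamp, value])
        f.flush()
        os.fsync(f.fileno())   # ask the OS to write it out before we continue


def load_readings(path):
    """Viewer side: re-read the file from scratch.

    If the viewer crashes here, the collector is unaffected.
    """
    if not os.path.exists(path):
        return []
    with open(path, newline="") as f:
        return [(row[0], float(row[1])) for row in csv.reader(f) if len(row) == 2]
```

The viewer can be restarted at any time and simply calls `load_readings` again; rows that are not complete two-column records are skipped rather than raising.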
Christian Heimes
2010-07-09 19:58:35 UTC
Post by Les Schaffer
but none of this has anything to do with Python itself. i am sure python
servers have been running reliably for long periods of time, but i've
never had to deal with a two-month guarantee before. is there something
else i am missing here that i should be concerned about on the
pure-Python side of things? something under the hood of the python
interpreter that could be problematic when run for a long time?
At our company we have long running Python processes with an uptime of
60, 90 days or more. Python is up to the challenge. ;)

But it's a challenge to keep any process written in any programming
language running for two months or more. If I were in your position I would
split up the work across multiple Python processes: one guardian process
that keeps the other processes alive and checks that they are still
responding to events, another process that reads data from the serial port
and writes it to a database or file, and a third process to process the
data. Small processes reduce the chance of an error. I assume that the
"read from serial port" part is the mission-critical element of your app.

Christian
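A minimal sketch of the guardian idea, assuming the worker is an ordinary command line that can simply be started again. The name `supervise` and its parameters are hypothetical. It only notices a child that has exited; detecting a child that is alive but no longer responding would need an additional liveness check.

```python
import subprocess
import time


def supervise(cmd, max_restarts=None, delay=1.0):
    """Run `cmd`, starting it again whenever it exits.

    `max_restarts=None` means restart forever. Returns the exit codes seen,
    which is mainly useful for logging and for testing.
    """
    exit_codes = []
    while max_restarts is None or len(exit_codes) <= max_restarts:
        proc = subprocess.Popen(cmd)
        exit_codes.append(proc.wait())   # blocks until the child exits
        time.sleep(delay)                # avoid a tight restart loop
    return exit_codes
```

With `max_restarts=2` the command is started three times in total: the initial run plus two restarts.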
William Heymann
2010-07-09 20:03:15 UTC
Post by Les Schaffer
but none of this has anything to do with Python itself. i am sure python
servers have been running reliably for long periods of time, but i've
never had to deal with a two-month guarantee before. is there something
else i am missing here that i should be concerned about on the
pure-Python side of things? something under the hood of the python
interpreter that could be problematic when run for a long time?
or need we only concern ourselves with the nuts behind the wheel: that
is, we the developers?
thanks
Les
I have been running Zope apps for about 10 years now and they normally run for
many months between restarts, so Python has no inherent problems with
running that long. Your specific Python program might, though.

You have to make sure you don't have any reference leaks that keep your program
growing in memory, but you also have to deal with Windows. The program cannot
be any more reliable than the OS it is running on. Personally I would never
make a guarantee like that on a Windows box. With a guarantee like that, I would
have to hand-pick every piece of hardware and run some kind of Unix on it.

The last time I ran a program for a long time on Windows, it ran into some kind
of address space fragmentation and eventually died. There is
some kind of problem with the Windows VM system. 64-bit Windows will mostly solve
that by having an address space so large you can't fragment it enough to
kill the program in any reasonable time frame. If your program is not
allocating and destroying large data structures all the time you probably
don't have to worry about that, but you do have to test it.
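One possible way to test for the memory growth mentioned above is sketched here with the standard-library `tracemalloc` module. This is an assumption layered on the thread, not something from it: `tracemalloc` arrived in Python 3.4, well after this discussion, and it reports only allocations made through Python's allocators. It says nothing about address space fragmentation. The name `traced_growth` is hypothetical.

```python
import tracemalloc


def traced_growth(step, iterations):
    """Return the change in traced memory (bytes) across `iterations` calls of step()."""
    tracemalloc.start()
    try:
        step()   # warm-up call so one-time allocations are not counted
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            step()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

A result that keeps rising as `iterations` increases suggests that `step` is holding on to references; a roughly flat result suggests it is not.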
Terry Reedy
2010-07-09 20:16:52 UTC
Post by Les Schaffer
i have been asked to guarantee that a proposed Python application will
run continuously under MS Windows for two months time. And i am looking
to know what i don't know.
The app would read instrument data from a serial port, store the data in
file, and display in matplotlib. typical sampling times are about once
per hour. our design would be to read the serial port once a second
(long story, but it worked for ten days straight so far) and just store
and display the values once an hour. so basically we'd have a long
running for-loop. maybe running in a separate thread.
Is this a dedicated machine, so you do not have anything else going that
can delay for more than a second?
Post by Les Schaffer
i have thought through the issues on our end: we have to properly handle
the serial port and make sure our GUI doesn't hang easily. we'd have to
be sure we don't have any memory leaks in numpy or matplotlib and we're
not storing data in memory stupidly. i've never looked into Windows
serial port driver reliability over the long term. But we think if the
serial port hangs, we can detect that and react accordingly.
I read the IBM PC serial port BIOS code in the early 80s. It was pretty
simple then. I would be very surprised if it has been messed up since
and not fixed.
Post by Les Schaffer
but none of this has anything to do with Python itself. i am sure python
servers have been running reliably for long periods of time, but i've
never had to deal with a two-month guarantee before. is there something
else i am missing here that i should be concerned about on the
pure-Python side of things? something under the hood of the python
interpreter that could be problematic when run for a long time?
or need we only concern ourselves with the nuts behind the wheel: that
is, we the developers?
Python has been used for long-running background processes, at least on
*nix, so the Python devs are sensitive to memory leak issues and respond
to leak reports. That still leaves memory fragmentation. To try to avoid
that, I would allocate all the needed data arrays immediately on startup
(with dummy None pointers) and reuse them instead of deleting and
regrowing them. Keeping explicit pointers is, of course, more tedious and
slightly error prone.

I hope someone with actual experience also answers.
--
Terry Jan Reedy
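A sketch of the "allocate once, reuse" idea, assuming a fixed number of recent readings is all that needs to stay in memory. The class name `RingBuffer` is hypothetical. Whether this measurably reduces fragmentation in a given interpreter is not something the sketch demonstrates; it only shows the reuse pattern.

```python
class RingBuffer:
    """Fixed-size buffer whose slots are allocated once and then overwritten."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._slots = [None] * capacity   # dummy None entries, allocated up front
        self._next = 0                    # index of the slot to overwrite next
        self._count = 0                   # number of slots holding real data

    def append(self, value):
        self._slots[self._next] = value
        self._next = (self._next + 1) % len(self._slots)
        self._count = min(self._count + 1, len(self._slots))

    def values(self):
        """Return the stored values, oldest first (as a new list)."""
        if self._count < len(self._slots):
            return self._slots[:self._count]
        return self._slots[self._next:] + self._slots[:self._next]
```

The standard library's `collections.deque(maxlen=n)` gives similar drop-the-oldest behaviour with less code, if explicit index handling is not wanted.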
geremy condra
2010-07-09 20:42:43 UTC
Post by Les Schaffer
i have been asked to guarantee that a proposed Python application will run
continuously under MS Windows for two months time. And i am looking to know
what i don't know.
I normally use Linux for this sort of thing, so YMMV on the following advice.
Post by Les Schaffer
The app would read instrument data from a serial port, store the data in
file, and display in matplotlib. typical sampling times are about once per
hour. our design would be to read the serial port once a second (long story,
but it worked for ten days straight so far) and just store and display the
values once an hour. so basically we'd have a long running for-loop. maybe
running in a separate thread.
I'd keep the two timers, the process that actually checks and logs the data,
and any postprocessing code completely separate. I'd also use something
like the logging module to double up on where your data is stored: one on
the local machine, another physically separated in case you lose a hard
drive. It will also give you more information about where a failure might
have occurred if/when it does. I'd also handle output/display on a separate
machine.
Post by Les Schaffer
i have thought through the issues on our end: we have to properly handle
the serial port and make sure our GUI doesn't hang easily. we'd have to be
sure we don't have any memory leaks in numpy or matplotlib and we're not
storing data in memory stupidly. i've never looked into Windows serial port
driver reliability over the long term. But we think if the serial port
hangs, we can detect that and react accordingly.
I would launch this as a subprocess and log so that even if you miss a
measurement you still get what you need further on in the process.
Just make sure that you can detect it at the time and that you also
log an error when it occurs.

This also minimizes the amount of memory your application directly
handles and the susceptibility of your code to non-fatal problems with
the serial port.
Post by Les Schaffer
but none of this has anything to do with Python itself. i am sure python
servers have been running reliably for long periods of time, but i've never
had to deal with a two-month guarantee before. is there something else i am
missing here that i should be concerned about on the pure-Python side of
things? something under the hood of the python interpreter that could be
problematic when run for a long time?
Just ask all the what-ifs and you'll probably be ok.

Geremy Condra
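The suggestion to double up on storage with the logging module could be sketched roughly as follows, assuming the second location is simply another path (for example a mounted network share). The function name `make_data_logger` and both paths are hypothetical.

```python
import logging


def make_data_logger(primary_path, backup_path, name="readings"):
    """Return a logger that writes every record to two separate files.

    Call this once per logger name; calling it again with the same name
    would attach duplicate handlers.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for path in (primary_path, backup_path):
        handler = logging.FileHandler(path)
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Readings can then go through `logger.info(...)` and detected serial faults through `logger.error(...)`, so both files carry the data as well as a timestamped record of what went wrong.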
Martin P. Hellwig
2010-07-09 21:03:15 UTC
Post by Les Schaffer
i have been asked to guarantee that a proposed Python application will
run continuously under MS Windows for two months time. And i am looking
to know what i don't know.
Get a good lawyer and put that into the contract: the last thing you want is
a Windows update that restarts the host, leaving you responsible because
you guaranteed that it would run continuously.

On the technical side: as Christian Heimes already pointed out, split
the programs. Specifically, I would have one service for data gathering
and two separate watchdog services (each checking whether the other
watchdog and the 'core' service are still running).

The GUI should be a user-side app, and the services could be too;
however, consider running the services under the appropriate system
account, as in the past I have seen some strange things happen with
services under a user account, especially if there are password policies.

From the interpreter's point of view I see no reason why it couldn't
work; it is much more likely that your host system will mess things up
(even if it weren't Windows).
<cut rest>
--
mph
Emile van Sebille
2010-07-09 23:30:56 UTC
On 7/9/2010 12:13 PM Les Schaffer said...
Post by Les Schaffer
i have been asked to guarantee that a proposed Python application will
run continuously under MS Windows for two months time.
Keep users off the machine, turn off automated updates, and point DNS to
127.0.0.1. Oh, and put it on a UPS. I've got a handful or two of these
automated systems in place and rarely have trouble. Add a watchdog
scheduled task to restart the job, check disk space and memory
usage, and push out a heartbeat.

I found Chris Liechti's serial module years ago and have used it
successfully since.

The only times that come to mind when I've had problems on the Python side
were related to memory usage, when the system started thrashing. Adding
memory fixed it.

HTH,

Emile
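A sketch of the checks such a watchdog task might run, assuming the data-collection job touches a heartbeat file on every loop iteration. The function names, the file-based heartbeat and the thresholds are all hypothetical; how the task is scheduled and how the job is restarted are left out.

```python
import os
import shutil
import time


def write_heartbeat(path):
    """Called by the data-collection job on every loop iteration."""
    with open(path, "w") as f:
        f.write(str(time.time()))


def heartbeat_is_stale(path, max_age_seconds, now=None):
    """True if the heartbeat file is missing or older than `max_age_seconds`."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) > max_age_seconds
    except OSError:
        return True   # no file at all counts as a missed heartbeat


def free_disk_fraction(path):
    """Fraction (0.0 to 1.0) of the disk holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total if usage.total else 0.0
```

The watchdog would restart the job when `heartbeat_is_stale(...)` returns True and raise an alert when `free_disk_fraction(...)` drops below whatever margin the data rate calls for.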
Roy Smith
2010-07-09 23:32:57 UTC
Post by Les Schaffer
i have been asked to guarantee that a proposed Python application will
run continuously under MS Windows for two months time. And i am looking
to know what i don't know.
Heh. The OS won't stay up that long.
Tim Chase
2010-07-10 00:23:37 UTC
Post by Roy Smith
Post by Les Schaffer
i have been asked to guarantee that a proposed Python application will
run continuously under MS Windows for two months time. And i am looking
to know what i don't know.
Heh. The OS won't stay up that long.
While I'm not sure how much of Roy's comment was "hah, hah, just
serious", this has been my biggest issue with long-running Python
processes on Win32 -- either power outages the UPS can't handle,
or (more frequently) the updates (whether Microsoft-initiated or
by other vendors' update tools) require a reboot for every
${EXPLETIVE}ing thing. The similar long-running Python processes
I have on my Linux boxes have about 0.1% of the reboots/restarts
for non-electrical reasons (just kernel and Python updates).

As long as you're not storing an ever-increasing quantity of data
in memory (write it out to disk storage and you should be fine),
I've not had problems with Python processes running for months.
If you want belt+suspenders with that, you can take others'
recommendations for monitoring processes and process separation
of data-gathering vs. front-end GUI/web interface.

-tkc
sturlamolden
2010-07-11 00:48:57 UTC
Post by Tim Chase
While I'm not sure how much of Roy's comment was "hah, hah, just
serious", this has been my biggest issue with long-running Python
processes on Win32 -- either power outages the UPS can't handle,
or (more frequently) the updates
Win32 is also the only OS in common use known to fragment memory
enough to make long-running processes crash or hang (including system
services), and to require reboots on a regular basis. The algorithms haven't
changed, but it takes a bit "longer" for the heap to go fubar with
Win64. (That is, "longer" as in "you're dead long before it happens".)
For processes that need to run that long, I would really recommend
using Win64 and Python compiled for amd64.
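A quick way to confirm at startup that the interpreter really is a 64-bit build, sketched with standard-library calls only. The function names are hypothetical.

```python
import struct
import sys


def pointer_bits():
    """Size of a C pointer in this interpreter build, in bits."""
    return struct.calcsize("P") * 8


def is_64bit_python():
    """True when the interpreter can address more than 32 bits' worth of memory."""
    return sys.maxsize > 2**32
```

Note that a 32-bit Python installed on 64-bit Windows still runs as a 32-bit process, so checking the interpreter rather than the OS is the relevant test here.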
Bruno Desthuilliers
2010-07-10 16:23:22 UTC
Post by Les Schaffer
i have been asked to guarantee that a proposed Python application will
run continuously under MS Windows for two months time. And i am looking
to know what i don't know.
(snip)
Post by Les Schaffer
but none of this has anything to do with Python itself. i am sure python
servers have been running reliably for long periods of time, but i've
never had to deal with a two-month guarantee before. is there something
else i am missing here that i should be concerned about on the
pure-Python side of things? something under the hood of the python
interpreter that could be problematic when run for a long time?
Zope is (rightly) considered as a memory/resources hog, and I have a
Zope instance hosting two apps on a cheap dedicated server that has not
been restarted for the past 2 or 3 years. So as long as your code is
clean you should not have much problem with the Python runtime itself,
at least on a Linux box. Can't tell how it would work on Windows.
John Nagle
2010-07-10 18:54:46 UTC
Post by Les Schaffer
i have been asked to guarantee that a proposed Python application will
run continuously under MS Windows for two months time. And i am looking
to know what i don't know.
The app would read instrument data from a serial port,
If the device you're listening to is read-only, and you're just
listening, make a cable to feed the serial data into two machines,
and have them both log it. Put them on separate UPSs and in
a place where nobody can knock them over or mess with them.

John Nagle
Alf P. Steinbach /Usenet
2010-07-10 19:07:12 UTC
Post by John Nagle
Post by Les Schaffer
i have been asked to guarantee that a proposed Python application will
run continuously under MS Windows for two months time. And i am looking
to know what i don't know.
The app would read instrument data from a serial port,
If the device you're listening to is read-only, and you're just
listening, make a cable to feed the serial data into two machines,
and have them both log it. Put them on separate UPSs and in
a place where nobody can knock them over or mess with them.
"The Ramans do everything in triplicate" - Old jungle proverb


Cheers,

- Alf
--
blog at <url: http://alfps.wordpress.com>
Grant Edwards
2010-07-12 14:19:32 UTC
Post by Les Schaffer
i have been asked to guarantee that a proposed Python application will
run continuously under MS Windows for two months time. And i am looking
                       ^^^^^^^^^^

IMO, that's going to be your main problem.
--
Grant Edwards    grant.b.edwards at gmail.com    Yow! PIZZA!!
John Nagle
2010-07-12 17:16:07 UTC
Post by Grant Edwards
Post by Les Schaffer
i have been asked to guarantee that a proposed Python application will
run continuously under MS Windows for two months time. And i am looking
                       ^^^^^^^^^^
IMO, that's going to be your main problem.
If you're doing a real-time job, run a real-time OS. QNX,
a real-time variant of Linux, Windows CE, Windows Embedded, LynxOS,
etc. There's too much background junk going on in a consumer OS
today.

Yesterday, I was running a CNC plasma cutter that's controlled
by Windows XP. This is a machine that moves around a plasma torch that
cuts thick steel plate. A "New Java update is available" window
popped up while I was working. Not good.

John Nagle
CM
2010-07-12 18:54:15 UTC
Post by John Nagle
Post by Grant Edwards
Post by Les Schaffer
i have been asked to guarantee that a proposed Python application will
run continuously under MS Windows for two months time. And i am looking
                       ^^^^^^^^^^
IMO, that's going to be your main problem.
If you're doing a real-time job, run a real-time OS. QNX,
a real-time variant of Linux, Windows CE, Windows Embedded, LynxOS,
etc. There's too much background junk going on in a consumer OS
today.
Yesterday, I was running a CNC plasma cutter that's controlled
by Windows XP. This is a machine that moves around a plasma torch that
cuts thick steel plate. A "New Java update is available" window
popped up while I was working. Not good.
John Nagle
I'm not sure I can like that example any better.
Tim Chase
2010-07-12 22:01:33 UTC
Post by John Nagle
Post by Grant Edwards
Post by Les Schaffer
i have been asked to guarantee that a proposed Python application will
run continuously under MS Windows for two months time. And i am looking
                       ^^^^^^^^^^
IMO, that's going to be your main problem.
Yesterday, I was running a CNC plasma cutter that's controlled
by Windows XP. This is a machine that moves around a plasma torch that
cuts thick steel plate. A "New Java update is available" window
popped up while I was working. Not good.
<Clippy> Hi, it looks like you're attempting to cut something
with a plasma torch. Would you like help?

(_) inserting a steel plate to cut

(_) severing the tip of your finger

[ ] Don't show me this tip again.




-tkc
CM
2010-07-13 04:47:16 UTC
Post by Tim Chase
Post by John Nagle
Yesterday, I was running a CNC plasma cutter that's controlled
by Windows XP. This is a machine that moves around a plasma torch that
cuts thick steel plate. A "New Java update is available" window
popped up while I was working. Not good.
<Clippy> Hi, it looks like you're attempting to cut something
with a plasma torch. Would you like help?
(_) inserting a steel plate to cut
(_) severing the tip of your finger
[ ] Don't show me this tip again.
-tkc
Brilliant. (I almost think you should have gone with
far more than severing the tip of the finger, but then
you'd lose that killer play on words at the end).
Stephen Hansen
2010-07-13 01:36:32 UTC
Post by John Nagle
Yesterday, I was running a CNC plasma cutter that's controlled
by Windows XP. This is a machine that moves around a plasma torch that
cuts thick steel plate. A "New Java update is available" window
popped up while I was working. Not good.
That's downright frightening.
--
Stephen Hansen
... Also: Ixokai
... Mail: me+list/python (AT) ixokai (DOT) io
... Blog: http://meh.ixokai.io/
Maria R
2010-07-14 10:23:24 UTC
I can second the stated opinion that Python per se is stable enough.
We deliver production systems running 24/7 with uptimes counted in
several months, and from what I can see, compared to the OP's app, ours
is vastly more complex.

The only Python-related issue we have encountered so far with respect
to stability is how timers work. Extensive creation of timer threads
locked up after some indeterminable time. We found that the issue was
probably related to some update in Windows at the time. We do not know
whether this issue is resolved (we encountered it back in Python 1.4),
and we rewrote our code to use timers differently.

I also second that partitioning the solution into working (server) parts
and GUI (client) parts is important.

I do not second the generally outspoken distrust in Windows. Indeed, I
would prefer *nix/*nux, but in our case stability concerns are not among
the issues behind that.

We use threading to a certain extent (in addition to partitioning into
processes). One approach we have, which has proven very useful for
gaining stability, is to use exception handling carefully and
extensively. We catch *every* exception, report and counteract, but do
not allow the process/thread to die. This is not a trivial approach by
any means, but when one knows the app sufficiently well, it can be
applied with good results.

Just my 2 c
//Maria
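The "catch every exception, report, and keep going" approach might look roughly like this for a single worker loop. It is a sketch under assumptions, not the poster's code: `run_forever` and `step` are hypothetical names, and the "counteract" part is reduced to a short pause before the next attempt.

```python
import logging
import time

log = logging.getLogger("worker")


def run_forever(step, max_iterations=None, pause=1.0):
    """Call step() repeatedly; log any exception and carry on.

    `max_iterations=None` means loop until the process is stopped.
    Returns the number of failed iterations.
    """
    failures = 0
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        try:
            step()
        except Exception:
            # Report with a full traceback, then continue instead of dying.
            failures += 1
            log.exception("step failed (failure #%d); continuing", failures)
            time.sleep(pause)
    return failures
```

Catching `Exception` rather than using a bare `except:` leaves `KeyboardInterrupt` and `SystemExit` alone, so the process can still be shut down deliberately.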
Les Schaffer
2010-07-16 13:07:42 UTC
thanks to all for the replies.

the Windows memory fragmentation was one of the "i didn't know that"
items. we will use 64-bit Windows OS if the job happens.

agree with all the other suggestions: multiple threads for data and GUI,
etc. Also, might push for Linux even though the company is Windows-only
presently.

but thinking about the issue some more as well as everyone's comments,
have decided to proceed cautiously and let the client decide first that
they really need this app regardless of guaranteed Windows-based uptime.

Les
Stefan Behnel
2010-07-16 13:29:35 UTC
Post by Les Schaffer
agree with all the other suggestions: multiple threads for data and GUI,
The way I read it, the suggestion was to use separate processes, not
multiple threads. That's a pretty important difference.

Stefan
Les Schaffer
2010-07-16 14:13:17 UTC
Post by Stefan Behnel
Post by Les Schaffer
agree with all the other suggestions: multiple threads for data and GUI,
The way I read it, the suggestion was to use separate processes, not
multiple threads. That's a pretty important difference.
check. processes, not threads.

Les
