Discussion:
Thread safety issue (I think) with defaultdict
Israel Brewster
2017-10-31 17:38:10 UTC
Permalink
A question that has arisen before (for example, here: https://mail.python.org/pipermail/python-list/2010-January/565497.html) is whether defaultdict is thread safe, with the answer generally being a conditional "yes", the condition being what is used as the default factory: apparently defaults built from plain Python types, such as list, are thread safe, whereas more complicated constructs, such as lambdas, make it not thread safe. In my situation, I'm using a lambda, specifically:

lambda: datetime.min

So presumably *not* thread safe.

My goal is to have a dictionary of aircraft and when they were last "seen", with datetime.min being effectively "never". When a data point comes in for a given aircraft, the data point will be compared with the value in the defaultdict for that aircraft, and if the timestamp on that data point is newer than what is in the defaultdict, the defaultdict will get updated with the value from the datapoint (not necessarily current timestamp, but rather the value from the datapoint). Note that data points do not necessarily arrive in chronological order (for various reasons not applicable here, it's just the way it is), thus the need for the comparison.
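A minimal sketch of this update-if-newer pattern (the lock and the helper name here are illustrative additions, not the original code; the lock makes the check-then-set step atomic):

```python
from collections import defaultdict
from datetime import datetime
from threading import Lock

last_points = defaultdict(lambda: datetime.min)
_lock = Lock()

def record_point(aircraft_id, point_time):
    """Update the 'last seen' time only if this data point is newer."""
    with _lock:
        # Out-of-order points are simply ignored by this comparison.
        if last_points[aircraft_id] < point_time:
            last_points[aircraft_id] = point_time
```

Without the lock, two threads could both pass the comparison and then write in the wrong order; with it, the older point can never overwrite the newer one.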

When the program first starts up, two things happen:

1) a thread is started that watches for incoming data points and updates the dictionary as per above, and
2) the dictionary should get an initial population (in the main thread) from hard storage.

The behavior I'm seeing, however, is that when step 2 happens (which generally happens before the thread gets any updates), the dictionary gets populated with 56 entries, as expected. However, none of those entries are visible when the thread runs. It's as though the thread is getting a separate copy of the dictionary, although debugging says that is not the case - printing the variable from each location shows the same address for the object.

So my questions are:

1) Is this what it means to NOT be thread safe? I was thinking of race conditions where individual values may get updated wrong, but this apparently is overwriting the entire dictionary.
2) How can I fix this?

Note: I really don't care if the "initial" update happens after the thread receives a data point or two, and therefore overwrites one or two values. I just need the dictionary to be fully populated at some point early in execution. In usage, the dictionary is used to see if an aircraft has been seen "recently", so if the most recent datapoint gets overwritten with a slightly older one from disk storage, that's fine - it's only a problem if it's still showing datetime.min because we haven't gotten in any datapoint since we launched the program, even though we have "recent" data in disk storage. So I don't care about the obvious race condition between the two operations, just that the end result is a populated dictionary. Note also that as datapoints come in, they are being written to disk, so the disk storage doesn't lag significantly anyway.

The framework of my code is below:

File: watcher.py

last_points = defaultdict(lambda:datetime.min)

# This function is launched as a thread using the threading module when the first client connects
def watch():
    while True:
        <wait for datapoint>
        pointtime = <extract/parse timestamp from datapoint>
        if last_points[<aircraft_identifier>] < pointtime:
            <do stuff>
            last_points[<aircraft_identifier>] = pointtime
        # DEBUGGING
        print("At update:", len(last_points))


File: main.py:

from .watcher import last_points

# This function will be triggered by a web call from a client, so could happen at any time
# Client will call this function immediately after connecting, as well as in response to various user actions.
def getac():
    aclist = <load list of aircraft and times from disk>
    <do stuff to send the list to the client>
    for record in aclist:
        last_points[<aircraft_identifier>] = record_timestamp
    # DEBUGGING
    print("At get AC:", len(last_points))


-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
Israel Brewster
2017-11-01 17:04:58 UTC
Permalink
Let me rephrase the question, see if I can simplify it. I need to be able to access a defaultdict from two different threads - one thread that responds to user requests which will populate the dictionary in response to a user request, and a second thread that will keep the dictionary updated as new data comes in. The value of the dictionary will be a timestamp, with the default value being datetime.min, provided by a lambda:

lambda: datetime.min

At the moment my code is behaving as though each thread has a *separate* defaultdict, even though debugging shows the same addresses - the background update thread never sees the data populated into the defaultdict by the main thread. I was thinking race conditions or the like might make it so one particular loop of the background thread occurs before the main thread, but even so subsequent loops should pick up on the changes made by the main thread.

How can I *properly* share a dictionary like object between two threads, with both threads seeing the updates made by the other?
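For reference, threads started with the threading module inside one process do share module-level objects directly, with no copying; a minimal demonstration (illustrative, not the original code):

```python
import threading

shared = {}  # module-level dict, visible to every thread in this process

def writer():
    # runs in a separate thread, mutating the very same dict object
    for i in range(100):
        shared[i] = i

t = threading.Thread(target=writer)
t.start()
t.join()
# after the join, the main thread sees every update made by the worker
print(len(shared))
```

If two "threads" do *not* see each other's updates to a module-level dict, that is strong evidence they are not actually threads in the same process.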
--
https://mail.python.org/mailman/listinfo/python-list
Israel Brewster
2017-11-01 17:45:49 UTC
Permalink
Post by Israel Brewster
lambda: datetime.min
At the moment my code is behaving as though each thread has a *separate* defaultdict, even though debugging shows the same addresses - the background update thread never sees the data populated into the defaultdict by the main thread. I was thinking race conditions or the like might make it so one particular loop of the background thread occurs before the main thread, but even so subsequent loops should pick up on the changes made by the main thread.
How can I *properly* share a dictionary like object between two threads, with both threads seeing the updates made by the other?
For what it's worth, if I insert a print statement in both threads (which I am calling "Get AC", since that is the function being called in the first thread, and "update", since that is the purpose of the second thread), I get the following output:

Length at get AC: 54 ID: 4524152200 Time: 2017-11-01 09:41:24.474788
Length At update: 1 ID: 4524152200 Time: 2017-11-01 09:41:24.784399
Length At update: 2 ID: 4524152200 Time: 2017-11-01 09:41:25.228853
Length At update: 3 ID: 4524152200 Time: 2017-11-01 09:41:25.530434
Length At update: 4 ID: 4524152200 Time: 2017-11-01 09:41:25.532073
Length At update: 5 ID: 4524152200 Time: 2017-11-01 09:41:25.682161
Length At update: 6 ID: 4524152200 Time: 2017-11-01 09:41:26.807127
...

So the object ID hasn't changed as I would expect it to if, in fact, we had created a separate object for the thread. And the first call that populates it with 54 items happens "well" before the first update call - a full .3 seconds, which I would think would be an eternity in code terms. So it doesn't even look like it's a race condition causing the issue.

It seems to me this *has* to be something to do with the use of threads, but I'm baffled as to what.
Ian Kelly
2017-11-01 17:58:29 UTC
Permalink
Post by Israel Brewster
A question that has arisen before (for example, here: https://mail.python.org/pipermail/python-list/2010-January/565497.html) is the question of "is defaultdict thread safe", with the answer generally being a conditional "yes", with the condition being what is used as the default value: apparently default values of python types, such as list, are thread safe,
I would not rely on this. It might be true for current versions of
CPython, but I don't think there's any general guarantee and you could
run into trouble with other implementations.
Post by Israel Brewster
lambda: datetime.min
So presumably *not* thread safe.
My goal is to have a dictionary of aircraft and when they were last "seen", with datetime.min being effectively "never". When a data point comes in for a given aircraft, the data point will be compared with the value in the defaultdict for that aircraft, and if the timestamp on that data point is newer than what is in the defaultdict, the defaultdict will get updated with the value from the datapoint [...]
Since you're going to immediately replace the default value with an
actual value, it's not clear to me what the purpose of using a
defaultdict is here. You could use a regular dict and just check if
the key is present, perhaps with the additional argument to .get() to
return a default value.

Individual lookups and updates of ordinary dicts are atomic (at least
in CPython). A lookup followed by an update is not, and this would be
true for defaultdict as well.
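Ian's plain-dict suggestion might look like this (a sketch; the names are illustrative). The key property is that .get() supplies the default without ever storing it, so no factory runs on lookup:

```python
from datetime import datetime

last_points = {}  # plain dict; no default factory needed

def last_seen(aircraft_id):
    # .get() returns the default for missing keys without inserting it,
    # unlike defaultdict, which would store datetime.min on first access
    return last_points.get(aircraft_id, datetime.min)

last_points["N123"] = datetime(2017, 11, 1, 9, 41)
```

A side benefit: the dict only ever contains aircraft that have actually been seen, so len() counts real entries.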
Post by Israel Brewster
[...]
1) Is this what it means to NOT be thread safe? I was thinking of race conditions where individual values may get updated wrong, but this apparently is overwriting the entire dictionary.
No, a thread-safety issue would be something like this:

account[user] = account[user] + 1

where the value of account[user] could potentially change between the
time it is looked up and the time it is set again. That said it's not
obvious to me what your problem actually is.
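To make that race concrete, the unsafe read-modify-write can be guarded with a lock (an illustrative sketch, not from the original posts):

```python
import threading

account = {"user": 0}
lock = threading.Lock()

def credit(n):
    for _ in range(n):
        # Without the lock, another thread could update account["user"]
        # between this lookup and the assignment, losing increments.
        with lock:
            account["user"] = account["user"] + 1

threads = [threading.Thread(target=credit, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(account["user"])  # 40000 with the lock; possibly less without it
```

Note that this kind of bug produces *wrong values*, not a dict that appears empty, which is why it doesn't match the symptom described above.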
Israel Brewster
2017-11-01 18:53:20 UTC
Permalink
Post by Ian Kelly
Post by Israel Brewster
A question that has arisen before (for example, here: https://mail.python.org/pipermail/python-list/2010-January/565497.html) is the question of "is defaultdict thread safe", with the answer generally being a conditional "yes", with the condition being what is used as the default value: apparently default values of python types, such as list, are thread safe,
I would not rely on this. It might be true for current versions of
CPython, but I don't think there's any general guarantee and you could
run into trouble with other implementations.
Right, completely agreed. Kinda feels "dirty" to rely on things like this to me.
Post by Ian Kelly
Post by Israel Brewster
[...]
[...] You could use a regular dict and just check if
the key is present, perhaps with the additional argument to .get() to
return a default value.
True. Using defaultdict simply saves having to stick the same default in every call to get(). DRY principle and all. That said, see below - I don't think the defaultdict is the issue.
Post by Ian Kelly
Individual lookups and updates of ordinary dicts are atomic (at least
in CPython). A lookup followed by an update is not, and this would be
true for defaultdict as well.
Post by Israel Brewster
[...]
1) Is this what it means to NOT be thread safe? I was thinking of race conditions where individual values may get updated wrong, but this apparently is overwriting the entire dictionary.
account[user] = account[user] + 1
where the value of account[user] could potentially change between the
time it is looked up and the time it is set again.
That's what I thought - changing values/different values from expected, not missing values.

All that said, I just had a bit of an epiphany: the main thread is actually a Flask app, running through UWSGI with multiple *processes*, and using the flask-uwsgi-websocket plugin, which further uses greenlets. So what I was thinking was simply a separate thread was, in reality, a completely separate *process*. I'm sure that makes a difference. So what's actually happening here is the following:

1) the main python process starts, which initializes the dictionary (since it is at a global level)
2) uwsgi launches off a bunch of child worker processes (10 to be exact, each of which is set up with 10 gevent threads)
3a) a client connects (web socket connection to be exact). This connection is handled by an arbitrary worker, and an arbitrary green thread within that worker, based on UWSGI algorithms.
3b) This connection triggers launching of a *true* thread (using the python threading library) which, presumably, is now a child thread of that arbitrary uwsgi worker. <== BAD THING, I would think
4) The client makes a request for the list, which is handled by a DIFFERENT (presumably) arbitrary worker process and green thread.

So the end result is that the thread that "updates" the dictionary, and the thread that initially *populates* the dictionary are actually running in different processes. In fact, any given request could be in yet another process, which would seem to indicate that all bets are off as to what data is seen.

Now that I've thought through what is really happening, I think I need to re-architect things a bit here. For one thing, the update thread should be launched from the main process, not an arbitrary UWSGI worker. I had launched it from the client connection because there is no point in having it running if there is no one connected, but I may need to launch it from the __init__.py file instead. For another thing, since this dictionary will need to be accessed from arbitrary worker processes, I'm thinking I may need to move it to some sort of external storage, such as a redis database. Oy, I made my life complicated :-)
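One stdlib alternative to an external store like redis (an illustrative sketch, not a recommendation for this exact setup) is a multiprocessing.Manager dict: the mapping lives in a manager process and every worker talks to it through a proxy, so updates are visible across process boundaries:

```python
from multiprocessing import Manager, Process
from datetime import datetime

def _update(shared, key, value):
    # runs in a separate worker process; writes go through the proxy
    shared[key] = value

def demo():
    with Manager() as manager:
        last_points = manager.dict()  # proxy to a dict in the manager process
        p = Process(target=_update,
                    args=(last_points, "N123", datetime(2017, 11, 1)))
        p.start()
        p.join()
        # the parent sees the child's write, because both share the proxy
        return dict(last_points)

if __name__ == "__main__":
    print(demo())
```

Every proxy access is an IPC round trip, so for high update rates something like redis (which also survives restarts) may indeed be the better fit.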
Post by Ian Kelly
That said it's not
obvious to me what your problem actually is.
Steve D'Aprano
2017-11-02 00:53:57 UTC
Permalink
On Thu, 2 Nov 2017 05:53 am, Israel Brewster wrote:

[...]
Post by Israel Brewster
So the end result is that the thread that "updates" the dictionary, and the
thread that initially *populates* the dictionary are actually running in
different processes.
If they are in different processes, that would explain why the second
(non)thread sees an empty dict even after the first thread has populated it:


# from your previous post
Post by Israel Brewster
Length at get AC: 54 ID: 4524152200 Time: 2017-11-01 09:41:24.474788
Length At update: 1 ID: 4524152200 Time: 2017-11-01 09:41:24.784399
Length At update: 2 ID: 4524152200 Time: 2017-11-01 09:41:25.228853
You cannot rely on IDs being unique across different processes. It's an
unfortunate coincidence(!) that they end up with the same ID.

Or possibly there's some sort of weird side-effect or bug in Flask that, when
it shares the dict between two processes (how?), clears the dict.

Or... have you considered the simplest option, that your update thread clears
the dict when it is first called? Since you haven't shared your code with us,
I cannot rule out a simple logic error like this:

def launch_update_thread(dict):
    dict.clear()
    # code to start update thread
Post by Israel Brewster
In fact, any given request could be in yet another
process, which would seem to indicate that all bets are off as to what data
is seen.
Now that I've thought through what is really happening, I think I need to
re-architect things a bit here.
Indeed. I've been wondering why you are using threads at all, since there
doesn't seem to be any benefit to initialising the dict and updating it in
different threads. Now I learn that your architecture is even more complex. I
guess some of that is unavoidable, due to it being a web app, but still.
Post by Israel Brewster
For one thing, the update thread should be
launched from the main process, not an arbitrary UWSGI worker. I had
launched it from the client connection because there is no point in having
it running if there is no one connected, but I may need to launch it from
the __init__.py file instead. For another thing, since this dictionary will
need to be accessed from arbitrary worker processes, I'm thinking I may need
to move it to some sort of external storage, such as a redis database
That sounds awful. What if the arbitrary worker decides to remove a bunch of
planes from your simulation, or insert them? There should be one and only one
way to insert or remove planes from the simulation (I **really** hope it is a
simulation).

Surely the right solution is to have the worker process request whatever
information it needs, like "the next plane", and have the main process
provide the data. Having worker processes have the ability to reach deep into
the data structures used by the main program and mess with them seems like a
good way to have mind-boggling bugs.
Post by Israel Brewster
Oy, I made my life complicated :-)
"Some people, when confronted with a problem, think, 'I know, I'll use
threads. Nothew y htwo probave lems."

:-)
--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
Israel Brewster
2017-11-02 16:27:41 UTC
Permalink
Post by Steve D'Aprano
[...]
Post by Israel Brewster
So the end result is that the thread that "updates" the dictionary, and the
thread that initially *populates* the dictionary are actually running in
different processes.
If they are in different processes, that would explain why the second
# from your previous post
Post by Israel Brewster
Length at get AC: 54 ID: 4524152200 Time: 2017-11-01 09:41:24.474788
Length At update: 1 ID: 4524152200 Time: 2017-11-01 09:41:24.784399
Length At update: 2 ID: 4524152200 Time: 2017-11-01 09:41:25.228853
You cannot rely on IDs being unique across different processes. It's an
unfortunate coincidence(!) that they end up with the same ID.
I think it's more than a coincidence, given that it is 100% reproducible. Plus, in an earlier debug test I was calling print() on the defaultdict object, which gives output like "<defaultdict object at 0x1066467f0>", where presumably the 0x1066467f0 is a memory address (correct me if I am wrong in that). In every case, that address was the same. So still a bit puzzling.
Post by Steve D'Aprano
Or possibly there's some sort of weird side-effect or bug in Flask that, when
it shares the dict between two processes (how?) it clears the dict.
Well, it's UWSGI that is creating the processes, not Flask, but that's semantics :-) The real question though is "how does python handle such situations?" because, really, there would be no difference (I wouldn't think) between what is happening here and what is happening if you were to create a new process using the multiprocessing library and reference a variable created outside that process.

In fact, I may have to try exactly that, just to see what happens.
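That experiment might look like the following (illustrative sketch): a multiprocessing child gets its own copy of a module-level dict, so the child's writes never reach the parent, even though both start from the same initial contents:

```python
from multiprocessing import Process, Queue

last_points = {"populated": True}  # module-level, like the real dict

def child(q):
    last_points["from_child"] = True  # mutates the *child's* copy only
    q.put(sorted(last_points))        # report which keys the child sees

def demo():
    q = Queue()
    p = Process(target=child, args=(q,))
    p.start()
    child_keys = q.get()
    p.join()
    return child_keys, sorted(last_points)

if __name__ == "__main__":
    # child sees both keys; the parent still sees only "populated"
    print(demo())
```

This also explains the matching addresses: a forked child inherits the parent's memory layout, so id() values can coincide even though the objects are now independent copies.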
Post by Steve D'Aprano
Or... have you considered the simplest option, that your update thread clears
the dict when it is first called? Since you haven't shared your code with us,
dict.clear()
# code to start update thread
Actually, I did share my code. It's towards the end of my original message. I cut stuff out for readability/length, but nothing having to do with the dictionary in question. So no, clear is never called, nor any other operation that could clear the dict.
Post by Steve D'Aprano
Post by Israel Brewster
In fact, any given request could be in yet another
process, which would seem to indicate that all bets are off as to what data
is seen.
Now that I've thought through what is really happening, I think I need to
re-architect things a bit here.
Indeed. I've been wondering why you are using threads at all, since there
doesn't seem to be any benefit to initialising the dict and updating it in
different thread. Now I learn that your architecture is even more complex. I
guess some of that is unavailable, due to it being a web app, but still.
What it boils down to is this: I need to update this dictionary in real time as data flows in. Having that update take place in a separate thread enables this update to happen without interfering with the operation of the web app, and offloads the responsibility for deciding when to switch to the OS. There *are* other ways to do this, such as using gevent greenlets or asyncio, but simply spinning off a separate thread is the easiest/simplest option, and since it is a long-running thread the overhead of spinning off the thread (as opposed to a gevent style interlacing) is of no consequence.

As far as the initialization, that happens in response to a user request, at which point I am querying the data anyway (since the user asked for it). The idea is I already have the data, since the user asked for it, why not save it in this dict rather than waiting to update the dict until new data comes in? I could, of course, do a separate request for the data in the same thread that updates the dict, but there doesn't seem to be any purpose in that, since until someone requests the data, I don't need it for anything.
Post by Steve D'Aprano
Post by Israel Brewster
For one thing, the update thread should be
launched from the main process, not an arbitrary UWSGI worker. I had
launched it from the client connection because there is no point in having
it running if there is no one connected, but I may need to launch it from
the __init__.py file instead. For another thing, since this dictionary will
need to be accessed from arbitrary worker processes, I'm thinking I may need
to move it to some sort of external storage, such as a redis database
That sounds awful. What if the arbitrary worker decides to remove a bunch of
planes from your simulation, or insert them? There should be one and only one
way to insert or remove planes from the simulation (I **really** hope it is a
simulation).
UWSGI uses worker processes to respond to requests from web clients. What can and can't be done from a web interface is, of course, completely up to me as the developer, and may well be modifying basic data structures. HOW the requests are handled, however, is completely up to UWSGI.
Post by Steve D'Aprano
Surely the right solution is to have the worker process request whatever
information it needs, like "the next plane", and have the main process
provide the data. Having worker processes have the ability to reach deep into
the data structures used by the main program and mess with them seems like a
good way to have mind-boggling bugs.
Except the worker processes *are* the main program. That's how UWSGI works - it launches a number of worker processes to handle incoming web requests. It's not like I have a main process that is doing something, and *additionally* a bunch of worker processes. While I'm sure UWSGI does have a "master" process it uses to control the workers, that's all an internal implementation detail of UWSGI, not something I deal with directly. I just have the flask code, which doesn't deal with or know about separate processes at all. The only exception is the one *thread* I launch (not process, thread) to handle the background updating.
Post by Steve D'Aprano
Post by Israel Brewster
Oy, I made my life complicated :-)
"Some people, when confronted with a problem, think, 'I know, I'll use
threads. Nothew y htwo probave lems."
:-)
Actually, that saying is about regular expressions, not threads :-) . In the end, threads are as good a way of handling concurrency as any other, and simpler than many. They have their drawbacks, of course, mainly in the area of overhead, and of course only multiprocessing can *really* take advantage of multiple cores/CPUs on a machine, but unlike regular expressions, threads aren't ugly or complicated. Only the details of dealing with concurrency make things complicated, and you'll have to deal with that in *any* concurrency model.
Post by Steve D'Aprano
--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
Chris Angelico
2017-11-02 20:24:55 UTC
Permalink
Post by Israel Brewster
Actually, that saying is about regular expressions, not threads :-) . In the end, threads are as good a way of handling concurrency as any other, and simpler than many. They have their drawbacks, of course, mainly in the area of overhead, and of course only multiprocessing can *really* take advantage of multiple cores/CPUs on a machine, but unlike regular expressions, threads aren't ugly or complicated. Only the details of dealing with concurrency make things complicated, and you'll have to deal with that in *any* concurrency model.
Thank you. I've had this argument with many people, smart people (like
Steven), people who haven't grokked that all concurrency has costs -
that threads aren't magically more dangerous than other options. They
have a few limitations (for instance, you can't viably have more than
a few thousand threads in a process, but you could easily have orders
of magnitude more open sockets managed by asyncio), but for many
situations, they're the perfect representation of program logic.
They're also easy to explain, and then other concurrency models can be
explained in terms of threads (eg async functions are like threads but
they only switch from thread to thread at an 'await' point).
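A tiny sketch of that last point (the task names are arbitrary): with asyncio, a task can only lose control at an await, so the interleaving is fully determined by where the awaits are:

```python
import asyncio

order = []

async def task(name):
    for i in range(2):
        order.append((name, i))   # no await here, so this runs atomically
        await asyncio.sleep(0)    # the one and only switch point

async def main():
    await asyncio.gather(task("a"), task("b"))

asyncio.run(main())
print(order)  # the two tasks interleave only at the awaits
```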

ChrisA
Steve D'Aprano
2017-11-03 00:58:16 UTC
Permalink
Post by Chris Angelico
Post by Israel Brewster
Actually, that saying is about regular expressions, not threads :-) . In
the end, threads are as good a way of handling concurrency as any other,
and simpler than many. They have their drawbacks, of course, mainly in the
area of overhead, and of course only multiprocessing can *really* take
advantage of multiple cores/CPU's on a machine, but unlike regular
expressions, threads aren't ugly or complicated. Only the details of
dealing with concurrency make things complicated, and you'll have to deal
with that in *any* concurrency model.
Thank you. I've had this argument with many people, smart people (like
Steven), people who haven't grokked that all concurrency has costs -
Of course I grok that all concurrency has costs. Apart from comparatively rare
cases of "embarrassingly parallel" algorithms, any form of concurrent or
parallel processing is significantly harder than sequential code.
Post by Chris Angelico
that threads aren't magically more dangerous than other options.
There's nothing magical about it.

Threads are very much UNMAGICALLY more dangerous than other options because
they combine:

- shared data; and

- non-deterministic task switching.

Having both together is clearly more dangerous than only one or the other:

- async: shared data, but fully deterministic task switching;

- multiprocessing: non-deterministic task switching, but by default
fully isolated data.
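A sketch of the combination being described (not anyone's real code here): `counter += 1` is a read-modify-write, so unsynchronized threads may lose updates, while a Lock makes the result deterministic again:

```python
import threading

N_THREADS, N_INCREMENTS = 4, 25_000
counter = 0
lock = threading.Lock()

def unsafe():
    global counter
    for _ in range(N_INCREMENTS):
        counter += 1          # not atomic: load, add, store

def safe():
    global counter
    for _ in range(N_INCREMENTS):
        with lock:
            counter += 1      # the critical section is now atomic

def run(worker):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print("unsafe:", run(unsafe))  # may fall short of 100000, timing-dependent
print("safe:  ", run(safe))    # always 100000
```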
--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
Rustom Mody
2017-11-03 03:19:59 UTC
Permalink
Post by Steve D'Aprano
Post by Chris Angelico
Post by Israel Brewster
Actually, that saying is about regular expressions, not threads :-) . In
the end, threads are as good a way of handling concurrency as any other,
and simpler than many. They have their drawbacks, of course, mainly in the
area of overhead, and of course only multiprocessing can *really* take
advantage of multiple cores/CPU's on a machine, but unlike regular
expressions, threads aren't ugly or complicated. Only the details of
dealing with concurrency make things complicated, and you'll have to deal
with that in *any* concurrency model.
Thank you. I've had this argument with many people, smart people (like
Steven), people who haven't grokked that all concurrency has costs -
Of course I grok that all concurrency has costs. Apart from comparatively rare
cases of "embarrassingly parallel" algorithms, any form of concurrent or
parallel processing is significantly harder than sequential code.
Post by Chris Angelico
that threads aren't magically more dangerous than other options.
There's nothing magical about it.
Threads are very much UNMAGICALLY more dangerous than other options because
- shared data; and
- non-deterministic task switching.
… which is to say «bad mix of imperative programming and concurrency»



«The world is concurrent» [Joe Armstrong creator of Erlang]

If you get up from your computer just now for a coffee, it does not mean I have
to at the same time. More pertinently, it would be rather wasteful if the
billion+ transistors of an i7 waited for each other rather than switching independently.

The problem is that von Neumann preferred to simplify the programming task along
the lines nowadays called "imperative programming"… after whom we get the
terms "von Neumann model", "von Neumann machine" etc

IOW threads are a particularly extreme example of the deleterious effects
of stuffing the world into the mold of someone's (von Neumann's) brain.

ie shared data + task switching = combinatorially explosive results

Take your own statement «any form of concurrent or parallel processing is
significantly harder than sequential code»

and apply it to the abc of imperative programming:

Problem: Interchange values of variables x and y

Layman answer:
x = y
y = x

[Ignore for a moment that python has an answer that is almost identical to the
above and is correct: x,y = y,x]

"Correct" answer:
temp = x
x = y
y = temp

Correct? Really???
Or is it that being trained to "think like a programmer" means learning to
convolute our brains into an arbitrary and unnecessary sequentiality?
Stefan Ram
2017-11-03 03:32:53 UTC
Permalink
Post by Rustom Mody
Problem: Interchange values of variables x and y
x = y
y = x
[Ignore for a moment that python has an answer that is almost identical to the
above and is correct: x,y = y,x]
temp = x
x = y
y = temp
Correct? Really???
Or is it that being trained to "think like a programmer" means learning to
convolute our brains into an arbitrary and unnecessary sequentiality?
I give you two buckets. One is filled with water and one
with orange juice. How do you exchange the contents - in
the real world?

Here is an excerpt from a text from Edward E. Lee:

A part of the Ptolemy Project experiment was to see
whether effective software engineering practices could be
developed for an academic research setting. We developed a
process that included a code maturity rating system (with
four levels, red, yellow, green, and blue), design
reviews, code reviews, nightly builds, regression tests,
and automated code coverage metrics [43]. The portion of
the kernel that ensured a consistent view of the program
structure was written in early 2000, design reviewed to
yellow, and code reviewed to green. The reviewers included
concurrency experts, not just inexperienced graduate
students (Christopher Hylands (now Brooks), Bart Kienhuis,
John Reekie, and myself were all reviewers). We wrote
regression tests that achieved 100 percent code coverage.
The nightly build and regression tests ran on a two
processor SMP machine, which exhibited different thread
behavior than the development machines, which all had a
single processor. The Ptolemy II system itself began to be
widely used, and every use of the system exercised this
code. No problems were observed until the code deadlocked
on April 26, 2004, four years later.

Steve D'Aprano
2017-11-03 04:35:45 UTC
Permalink
Post by Stefan Ram
A part of the Ptolemy Project experiment was to see
whether effective software engineering practices could be
developed for an academic research setting.
[...]
Post by Stefan Ram
No problems were observed until the code deadlocked
on April 26, 2004, four years later.
That is a fantastic anecdote, thank you.
--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
Steve D'Aprano
2017-11-03 04:33:33 UTC
Permalink
Post by Rustom Mody
«The world is concurrent» [Joe Armstrong creator of Erlang]
And the world is extremely complex, complicated and hard to understand.

The point of programming is to simplify the world, not emulate it in its full
complexity.
--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
Rhodri James
2017-11-03 11:26:17 UTC
Permalink
Post by Chris Angelico
Thank you. I've had this argument with many people, smart people (like
Steven), people who haven't grokked that all concurrency has costs -
that threads aren't magically more dangerous than other options.
I'm with Steven. To be fair, the danger with threads is that most
people don't understand thread-safety, and in particular don't
understand either that they have a responsibility to ensure that shared
data access is done properly or what the cost of that is. I've seen far
too much thread-based code over the years that would have been markedly
less buggy and not much slower if it had been written sequentially.
--
Rhodri James *-* Kynesim Ltd
Chris Angelico
2017-11-03 14:50:13 UTC
Permalink
Post by Chris Angelico
Thank you. I've had this argument with many people, smart people (like
Steven), people who haven't grokked that all concurrency has costs -
that threads aren't magically more dangerous than other options.
I'm with Steven. To be fair, the danger with threads is that most people
don't understand thread-safety, and in particular don't understand either
that they have a responsibility to ensure that shared data access is done
properly or what the cost of that is. I've seen far too much thread-based
code over the years that would have been markedly less buggy and not much
slower if it had been written sequentially.
Yes, but what you're seeing is that *concurrent* code is more
complicated than *sequential* code. Would the code in question have
been less buggy if it had used multiprocessing instead of
multithreading? What if it used explicit yield points? Multiprocessing
brings with it a whole lot of extra complications around moving data
around. Multithreading brings with it a whole lot of extra
complications around NOT moving data around. Yield points bring with
them the risk of locking another thread out unexpectedly (particularly
since certain system calls aren't async-friendly on certain OSes). All
three models have their pitfalls. It's not that threads are somehow
worse than every other model.

ChrisA
Steve D'Aprano
2017-11-03 15:45:14 UTC
Permalink
Post by Chris Angelico
I'm with Steven. To be fair, the danger with threads is that most people
don't understand thread-safety, and in particular don't understand either
that they have a responsibility to ensure that shared data access is done
properly or what the cost of that is. I've seen far too much thread-based
code over the years that would have been markedly less buggy and not much
slower if it had been written sequentially.
Yes, but what you're seeing is that *concurrent* code is more
complicated than *sequential* code. Would the code in question have
been less buggy if it had used multiprocessing instead of
multithreading?
Maybe.

There's no way to be sure unless you actually compare a threading
implementation with a processing implementation -- and they have to
be "equally good, for the style" implementations. No fair comparing the
multiprocessing equivalent of "Stooge Sort" with the threading equivalent
of "Quick Sort", and concluding that threading is better.

However, we can predict the likelihood of which will be less buggy by
reasoning in general principles. And the general principle is that shared
data tends, all else being equal, to lead to more bugs than no shared data.
The more data is shared, the more bugs, more or less.

I don't know if there are any hard scientific studies on this, but experience
and anecdote strongly suggests it is true. Programming is not yet fully
evidence-based.

For example, most of us accept "global variables considered harmful". With few
exceptions, the use of application-wide global variables to communicate
between functions is harmful and leads to problems. This isn't because of any
sort of mystical or magical malignity from global variables. It is because
the use of global variables adds coupling between otherwise distant parts of
the code, and that adds complexity, and the more complex code is, the more
likely we mere humans are to screw it up.

So, all else being equal, which is likely to have more bugs?


1. Multiprocessing code with very little coupling between processes; or

2. Threaded code with shared data and hence higher coupling between threads?


Obviously the *best* threaded code will have fewer bugs than the *worst*
multiprocessing code, but my heuristic is that, in general, the average
application using threading is likely to be more highly coupled, hence more
complicated, than the equivalent using multiprocessing.

(Async is too new, and to me, too confusing, for me to have an opinion on yet,
but I lean slightly towards the position that deterministic task-switching is
probably better than non-deterministic.)
--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
Chris Angelico
2017-11-03 16:18:11 UTC
Permalink
On Sat, Nov 4, 2017 at 2:45 AM, Steve D'Aprano
Post by Steve D'Aprano
So, all else being equal, which is likely to have more bugs?
1. Multiprocessing code with very little coupling between processes; or
2. Threaded code with shared data and hence higher coupling between threads?
Obviously, option 1. But that's "all else being equal". How often can
you actually have your processes that decoupled? And if you can write
your code to be completely (or largely) decoupled, what's to stop you
having your *threads* equally decoupled? You're assuming that "running
in the same memoryspace" equates to "higher coupling", which is no
more proven than any other assertion. Ultimately, real-world code IS
going to have some measure of coupling (you could concoct a scenario
in which requests are handled 100% independent of each other, but even
with a web application, there's going to be SOME connection between
different requests), so all you do is move it around (eg in a web app
scenario, the most common solution is to do all coupling through a
database or equivalent).

ChrisA
Grant Edwards
2017-11-03 16:28:03 UTC
Permalink
Post by Chris Angelico
On Sat, Nov 4, 2017 at 2:45 AM, Steve D'Aprano
Post by Steve D'Aprano
So, all else being equal, which is likely to have more bugs?
1. Multiprocessing code with very little coupling between processes; or
2. Threaded code with shared data and hence higher coupling between threads?
Obviously, option 1. But that's "all else being equal". How often can
you actually have your processes that decoupled? And if you can write
your code to be completely (or largely) decoupled, what's to stop you
having your *threads* equally decoupled? You're assuming that "running
in the same memoryspace" equates to "higher coupling", which is no
more proven than any other assertion.
The big difference is that with threaded code you can have accidental
coupling. With multiprocessing, code you have to explicitly work to
create coupling.

That said, I do a lot of threading coding (in both Python and C), and
have never found it particularly tricky.

It does require that you understand what you're doing and probably
doesn't work well if you're a stack-overflow, cargo-cult,
cut-and-paste programmer. But then again, what does?
--
Grant Edwards grant.b.edwards Yow! Maybe I should have
at asked for my Neutron Bomb
gmail.com in PAISLEY --
Dennis Lee Bieber
2017-11-03 18:12:10 UTC
Permalink
On Sat, 04 Nov 2017 02:45:14 +1100, Steve D'Aprano
Post by Steve D'Aprano
So, all else being equal, which is likely to have more bugs?
1. Multiprocessing code with very little coupling between processes; or
2. Threaded code with shared data and hence higher coupling between threads?
3) Threaded code created with a discipline to not rely upon shared data?
Post by Steve D'Aprano
(Async is too new, and to me, too confusing, for me to have an opinion on yet,
but I lean slightly towards the position that deterministic task-switching is
probably better than non-deterministic.)
Every time I glance at async and kin (I'm looking at you, coroutine), I
get totally lost. It's like: not only do I have to be concerned about
shared data, but I also have to be concerned about scheduling the tasks to
provide responsiveness.

Note that I'm not discussing the difference between a preemptive vs
non-preemptive threading model. When coroutines were discussed in my
classes (in that long ago era of late 1970s) not only did a "task" have to
explicitly release its control, but it also had to explicitly identify
which "task" (and possibly where in the task) control was given. A
non-preemptive model only requires a task to explicitly give up control (by
invoking some blocking operation: I/O, sleep/suspend, similar) and let the
runtime determine the next task to gain control. Preemptive, of course,
allows for outside events (interrupts -- including timers, or for a
byte-code interpreted language, an instruction counter) to transfer
control.

Hmmm -- is there a way to turn off Python's runtime thread swapper so
that it only activates on blocking operations, and not preemptively on
time/instructions?
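As far as I know there's no supported way to make CPython purely cooperative, but since 3.2 the preemption granularity is tunable with `sys.setswitchinterval`; blocking calls still release the GIL immediately regardless:

```python
import sys

# CPython asks the GIL-holding thread to release it every "switch
# interval" seconds; the default is 5 milliseconds.
print(sys.getswitchinterval())   # 0.005 by default

# Raising it makes preemption of CPU-bound threads much rarer; it does
# not disable it, and blocking I/O still releases the GIL right away.
sys.setswitchinterval(1.0)
print(sys.getswitchinterval())
sys.setswitchinterval(0.005)     # restore the default
```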
--
Wulfraed Dennis Lee Bieber AF6VN
***@ix.netcom.com HTTP://wlfraed.home.netcom.com/
Rhodri James
2017-11-03 15:11:11 UTC
Permalink
Post by Chris Angelico
Post by Chris Angelico
Thank you. I've had this argument with many people, smart people (like
Steven), people who haven't grokked that all concurrency has costs -
that threads aren't magically more dangerous than other options.
I'm with Steven. To be fair, the danger with threads is that most people
don't understand thread-safety, and in particular don't understand either
that they have a responsibility to ensure that shared data access is done
properly or what the cost of that is. I've seen far too much thread-based
code over the years that would have been markedly less buggy and not much
slower if it had been written sequentially.
Yes, but what you're seeing is that *concurrent* code is more
complicated than *sequential* code. Would the code in question have
been less buggy if it had used multiprocessing instead of
multithreading? What if it used explicit yield points?
My experience with situations where I can do a reasonable comparison is
limited, but the answer appears to be "Yes".
Multiprocessing
Post by Chris Angelico
brings with it a whole lot of extra complications around moving data
around.
People generally understand how to move data around, and the mistakes
are usually pretty obvious when they happen. People may not understand
how to move data around efficiently, but that's a separate argument.

Multithreading brings with it a whole lot of extra
Post by Chris Angelico
complications around NOT moving data around.
I think this involves more subtle bugs that are harder to spot. People
seem to find it harder to reason about atomicity and realising that
widely separated pieces of code may interact unexpectedly.

Yield points bring with
Post by Chris Angelico
them the risk of locking another thread out unexpectedly (particularly
since certain system calls aren't async-friendly on certain OSes).
I've got to admit I find coroutines straightforward, but I did cut my
teeth on a cooperative OS. It certainly makes the atomicity issues
easier to deal with.

All
Post by Chris Angelico
three models have their pitfalls.
Assuredly. I just think threads are soggier and hard to light^W^W^W^W^W
prone to subtler and more mysterious-looking bugs.
--
Rhodri James *-* Kynesim Ltd
Ian Kelly
2017-11-03 16:39:36 UTC
Permalink
Post by Israel Brewster
Post by Steve D'Aprano
[...]
Post by Israel Brewster
So the end result is that the thread that "updates" the dictionary, and the
thread that initially *populates* the dictionary are actually running in
different processes.
If they are in different processes, that would explain why the second
# from your previous post
Post by Israel Brewster
Length at get AC: 54 ID: 4524152200 Time: 2017-11-01 09:41:24.474788
Length At update: 1 ID: 4524152200 Time: 2017-11-01 09:41:24.784399
Length At update: 2 ID: 4524152200 Time: 2017-11-01 09:41:25.228853
You cannot rely on IDs being unique across different processes. It's an
unfortunate coincidence(!) that they end up with the same ID.
I think it's more than a coincidence, given that it is 100% reproducible. Plus, in an earlier debug test I was calling print() on the defaultdict object, which gives output like "<defaultdict object at 0x1066467f0>", where presumably the 0x1066467f0 is a memory address (correct me if I am wrong in that). In every case, that address was the same. So still a bit puzzling.
If the empty dict is created before the process is forked then I don't
think it's all that surprising.
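A sketch of why that is (CPython on Unix; `os.fork` doesn't exist on Windows): after a fork, parent and child each hold a copy of the dict at the same virtual address, so id() matches even though the objects are now independent:

```python
import os

d = {}
r, w = os.pipe()

pid = os.fork()
if pid == 0:                          # child process
    d["seen"] = 56                    # only the child's copy changes
    os.write(w, str(id(d)).encode())  # report the child's id(d)
    os._exit(0)
else:                                 # parent process
    os.waitpid(pid, 0)
    child_id = int(os.read(r, 64))
    print(child_id == id(d))          # True: same address in both...
    print("seen" in d)                # False: ...but not the same dict
```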
Israel Brewster
2017-11-03 18:12:39 UTC
Permalink
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
Post by Chris Angelico
Post by Chris Angelico
Thank you. I've had this argument with many people, smart people (like
Steven), people who haven't grokked that all concurrency has costs -
that threads aren't magically more dangerous than other options.
I'm with Steven. To be fair, the danger with threads is that most people
don't understand thread-safety, and in particular don't understand either
that they have a responsibility to ensure that shared data access is done
properly or what the cost of that is. I've seen far too much thread-based
code over the years that would have been markedly less buggy and not much
slower if it had been written sequentially.
Yes, but what you're seeing is that *concurrent* code is more
complicated than *sequential* code. Would the code in question have
been less buggy if it had used multiprocessing instead of
multithreading? What if it used explicit yield points?
My experience with situations where I can do a reasonable comparison is limited, but the answer appears to be "Yes".
Multiprocessing
Post by Chris Angelico
brings with it a whole lot of extra complications around moving data
around.
People generally understand how to move data around, and the mistakes are usually pretty obvious when they happen.
I think the existence of this thread indicates otherwise :-) This mistake was far from obvious, and clearly I didn't understand properly how to move data around *between processes*. Unless you are just saying I am ignorant or something? :-)
People may not understand how to move data around efficiently, but that's a separate argument.
Multithreading brings with it a whole lot of extra
Post by Chris Angelico
complications around NOT moving data around.
I think this involves more subtle bugs that are harder to spot.
Again, the existence of this thread indicates otherwise. This bug was quite subtle and hard to spot. It was only when I started looking at how many times a given piece of code was called (specifically, the part that handled data coming in for which there wasn't an entry in the dictionary) that I spotted the problem. If I hadn't had logging in place in that code block, I would have never realized it wasn't working as intended. You don't get much more subtle than that. And, furthermore, it only existed because I *wasn't* using threads. This bug simply doesn't exist in a threaded model, only in a multiprocessing model. Yes, the *explanation* of the bug is simple enough - each process "sees" a different value, since memory isn't shared - but the bug in my code was neither obvious nor easy to spot, at least until you knew what was happening.
People seem to find it harder to reason about atomicity and realising that widely separated pieces of code may interact unexpectedly.
Yield points bring with
Post by Chris Angelico
them the risk of locking another thread out unexpectedly (particularly
since certain system calls aren't async-friendly on certain OSes).
I've got to admit I find coroutines straightforward, but I did cut my teeth on a cooperative OS. It certainly makes the atomicity issues easier to deal with.
I still can't claim to understand them. Threads? No problem. Obviously I'm still lacking some understanding of how data works in the multiprocessing model, however.
All
Post by Chris Angelico
three models have their pitfalls.
Assuredly. I just think threads are soggier and hard to light^W^W^W^W^W prone to subtler and more mysterious-looking bugs.
And yet, this thread exists because of a subtle and mysterious-looking bug with multiple *processes* that doesn't exist with multiple *threads*. Thus the point - threads are no *worse* - just different - than any other concurrency model.
--
Rhodri James *-* Kynesim Ltd
--
https://mail.python.org/mailman/listinfo/python-list <https://mail.python.org/mailman/listinfo/python-list>
Steve D'Aprano
2017-11-04 01:04:22 UTC
Permalink
On Sat, 4 Nov 2017 05:12 am, Israel Brewster wrote:

[...]
Post by Israel Brewster
People generally understand how to move data around, and the mistakes are
usually pretty obvious when they happen.
I think the existence of this thread indicates otherwise :-) This mistake
was far from obvious, and clearly I didn't understand properly how to move
data around *between processes*. Unless you are just saying I am ignorant or
something? :-)
Yes, you were ignorant -- you didn't even realise that you were using
processes, you thought you were using threaded code when it was actually
multiprocessing code. No wonder you got it wrong.

Of course you have a good excuse: the multiprocessing is hidden deep inside
not just the library you were using, but the library *it* was using.

(I don't know how obvious the documentation of the libraries make this --
maybe they're to blame, for not being clear enough -- or maybe you were
simply ignorant about the tools you were using.)

You can't judge multiprocessing code on the basis of bugs caused by assuming
that it was threading code, writing in a threading style with shared data. If
you misuse your tools, that's not the tool's fault.

If anything, we can say that the ultimate error was that you decided to write
in a threaded style without actually using threads: the error was your
(dangerous?) choice to write non-deterministic code using shared data.
--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
Rhodri James
2017-11-03 19:28:46 UTC
Permalink
Post by Israel Brewster
People generally understand how to move data around, and the mistakes are usually pretty obvious when they happen.
I think the existence of this thread indicates otherwise :-) This mistake was far from obvious, and clearly I didn't understand properly how to move data around *between processes*. Unless you are just saying I am ignorant or something? :-)
Ah, but from the point of view of this argument, you didn't make a
mistake, you made a meta-mistake. It wasn't that you didn't understand
how to move data around between processes, it was that you didn't think
you were moving between processes! Whether or not you do understand
remains to be seen :-)
--
Rhodri James *-* Kynesim Ltd
Gene Heskett
2017-11-03 16:52:00 UTC
Permalink
Post by Chris Angelico
Post by Rhodri James
Post by Chris Angelico
Thank you. I've had this argument with many people, smart people
(like Steven), people who haven't grokked that all concurrency has
costs - that threads aren't magically more dangerous than other
options.
I'm with Steven. To be fair, the danger with threads is that most
people don't understand thread-safety, and in particular don't
understand either that they have a responsibility to ensure that
shared data access is done properly or what the cost of that is.
I've seen far too much thread-based code over the years that would
have been markedly less buggy and not much slower if it had been
written sequentially.
Yes, but what you're seeing is that *concurrent* code is more
complicated than *sequential* code. Would the code in question have
been less buggy if it had used multiprocessing instead of
multithreading? What if it used explicit yield points? Multiprocessing
brings with it a whole lot of extra complications around moving data
around. Multithreading brings with it a whole lot of extra
complications around NOT moving data around. Yield points bring with
them the risk of locking another thread out unexpectedly (particularly
since certain system calls aren't async-friendly on certain OSes). All
three models have their pitfalls. It's not that threads are somehow
worse than every other model.
ChrisA
I think that this discussion of threads and threading must be a different
context than threading as I am using it in linuxcnc.

There, one assigns a function to run in a certain sequence in an assigned
thread, which there can be several of. There, each thread is assigned a
repetition rate, and the higher repetition rate stuff can always
interrupt the slower threaded function in order to get the real time
stuff done in a usable for the job time frame, and the high frequency
thread can be as fast as every 25 microseconds on a good motherboard.
Surprisingly, this seems to be far more board dependent than processor
dependent, altho it's pushing an AMD processor quite hard at 40
microseconds, while the slower intel atoms can do 25 microseconds with
about the same effort.

This is where a stepper motor is being stepped by software which diddles
pins on a parport. And it limits how fast you can move the motor
compared to using an fpga card running at 200 MHz, not because of the
step rate, but because of the latency of the mobo/cpu combination.

Jitter in step rate issuance is death to high performance with stepper
motors because torque to do useful work vanishes when the instant speed
of the motor is wobbling with that timing jitter.

OTOH, hand controls of a machine using an encoder dial are nicely done in
a thread running at 100 Hz, far more reliably than I can do it from the
keyboard on a raspberry pi 3b. Why? The dials data goes into linuxcnc by
way of a hardware fpga card that talks to the pi over an SPI buss, with
the pi writing 32 bit packets at 41 megabaud, and reading the results at
25 megabaud. It doesn't have to get thru that usb2 internal hub in the
pi that all the other i/o has to go thru. Mouse and keyboard events get
dropped on the floor, particularly dangerous when its a keyup event that
gets dropped. The machine keeps moving until it crashes into something,
often breaking drive parts or cutting tooling, all of which cost real
money.

My point is that with an interpreter such as HAL managing things, threads
Just Work(TM).

It does of course take a specially built kernel to do that magic.
I'll get me coat.

Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>