Thread safety issue (I think) with defaultdict

Discussion:

(too old to reply)

Israel Brewster

2017-10-31 17:38:10 UTC

A question that has arisen before (for example, here: https://mail.python.org/pipermail/python-list/2010-January/565497.html <https://mail.python.org/pipermail/python-list/2010-January/565497.html>) is the question of "is defaultdict thread safe", with the answer generally being a conditional "yes", with the condition being what is used as the default value: apparently default values of python types, such as list, are thread safe, whereas more complicated constructs, such as lambdas, make it not thread safe. In my situation, I'm using a lambda, specifically:

lambda: datetime.min

So presumably *not* thread safe.

My goal is to have a dictionary of aircraft and when they were last "seen", with datetime.min being effectively "never". When a data point comes in for a given aircraft, the data point will be compared with the value in the defaultdict for that aircraft, and if the timestamp on that data point is newer than what is in the defaultdict, the defaultdict will get updated with the value from the datapoint (not necessarily current timestamp, but rather the value from the datapoint). Note that data points do not necessarily arrive in chronological order (for various reasons not applicable here, it's just the way it is), thus the need for the comparison.

When the program first starts up, two things happen:

1) a thread is started that watches for incoming data points and updates the dictionary as per above, and
2) the dictionary should get an initial population (in the main thread) from hard storage.

The behavior I'm seeing, however, is that when step 2 happens (which generally happens before the thread gets any updates), the dictionary gets populated with 56 entries, as expected. However, none of those entries are visible when the thread runs. It's as though the thread is getting a separate copy of the dictionary, although debugging says that is not the case - printing the variable from each location shows the same address for the object.

So my questions are:

1) Is this what it means to NOT be thread safe? I was thinking of race conditions where individual values may get updated wrong, but this apparently is overwriting the entire dictionary.
2) How can I fix this?

Note: I really don't care if the "initial" update happens after the thread receives a data point or two, and therefore overwrites one or two values. I just need the dictionary to be fully populated at some point early in execution. In usage, the dictionary is used to see of an aircraft has been seen "recently", so if the most recent datapoint gets overwritten with a slightly older one from disk storage, that's fine - it's just if it's still showing datetime.min because we haven't gotten in any datapoint since we launched the program, even though we have "recent" data in disk storage thats a problem. So I don't care about the obvious race condition between the two operations, just that the end result is a populated dictionary. Note also that as datapoint come in, they are being written to disk, so the disk storage doesn't lag significantly anyway.

The framework of my code is below:

File: watcher.py

last_points = defaultdict(lambda:datetime.min)

# This function is launched as a thread using the threading module when the first client connects
def watch():
while true:
<wait for datapoint>
pointtime= <extract/parse timestamp from datapoint>
if last_points[<aircraft_identifier>] < pointtime:
<do stuff>
last_points[<aircraft_identifier>]=pointtime
#DEBUGGING
print("At update:", len(last_points))

File: main.py:

from .watcher import last_points

# This function will be triggered by a web call from a client, so could happen at any time
# Client will call this function immediately after connecting, as well as in response to various user actions.
def getac():
<load list of aircraft and times from disk>
<do stuff to send the list to the client>
for record in aclist:
last_points[<aircraft_identifier>]=record_timestamp
#DEBUGGING
print("At get AC:", len(last_points))

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

Israel Brewster

2017-11-01 17:04:58 UTC