Discussion:
Can you help me with this memoization simple example?
Add Reply
marc nicole
2024-03-31 00:09:43 UTC
Reply
Permalink
I am creating a memoization example with a function that adds up / averages
the elements of an array and compares it with the cached ones to retrieve
them in case they are already stored.

In addition, I want to store only if the result of the function differs
considerably (passes a threshold e.g. 500000 below).

I created an example using a decorator to do so, the results using the
decorator is slightly faster than without the memoization which is OK, but
is the logic of the decorator correct ? anybody can tell me ?

My code is attached below:



import time


def memoize(f):
cache = {}

def g(*args):
if args[1] == "avg":
sum_key_arr = sum(list(args[0])) / len(list(args[0]))
elif args[1] == "sum":
sum_key_arr = sum(list(args[0]))
if sum_key_arr not in cache:
for (
key,
value,
) in (
cache.items()
): # key in dict cannot be an array so I use the sum of the
array as the key
if (
abs(sum_key_arr - key) <= 500000
): # threshold is great here so that all values are
approximated!
# print('approximated')
return cache[key]
else:
# print('not approximated')
cache[sum_key_arr] = f(args[0], args[1])
return cache[sum_key_arr]

return g


@memoize
def aggregate(dict_list_arr, operation):
if operation == "avg":
return sum(list(dict_list_arr)) / len(list(dict_list_arr))
if operation == "sum":
return sum(list(dict_list_arr))
return None


t = time.time()
for i in range(200, 15000):
res = aggregate(list(range(i)), "avg")

elapsed = time.time() - t
print(res)
print(elapsed)
MRAB
2024-03-31 00:39:09 UTC
Reply
Permalink
Post by marc nicole
I am creating a memoization example with a function that adds up / averages
the elements of an array and compares it with the cached ones to retrieve
them in case they are already stored.
In addition, I want to store only if the result of the function differs
considerably (passes a threshold e.g. 500000 below).
I created an example using a decorator to do so, the results using the
decorator is slightly faster than without the memoization which is OK, but
is the logic of the decorator correct ? anybody can tell me ?
import time
cache = {}
sum_key_arr = sum(list(args[0])) / len(list(args[0]))
'list' will iterate over args[0] to make a list, and 'sum' will iterate
over that list.

It would be simpler to just let 'sum' iterate over args[0].
Post by marc nicole
sum_key_arr = sum(list(args[0]))
for (
key,
value,
) in (
cache.items()
): # key in dict cannot be an array so I use the sum of the
array as the key
You can't use a list as a key, but you can use a tuple as a key,
provided that the elements of the tuple are also immutable.
Post by marc nicole
if (
abs(sum_key_arr - key) <= 500000
): # threshold is great here so that all values are
approximated!
# print('approximated')
return cache[key]
# print('not approximated')
cache[sum_key_arr] = f(args[0], args[1])
return cache[sum_key_arr]
return g
@memoize
return sum(list(dict_list_arr)) / len(list(dict_list_arr))
return sum(list(dict_list_arr))
return None
t = time.time()
res = aggregate(list(range(i)), "avg")
elapsed = time.time() - t
print(res)
print(elapsed)
marc nicole
2024-03-31 08:04:13 UTC
Reply
Permalink
Thanks for the first comment which I incorporated

but when you say "You can't use a list as a key, but you can use a tuple as
a key,
provided that the elements of the tuple are also immutable."

does it mean the result of sum of the array is not convenient to use as
key as I do?
Which tuple I should use to refer to the underlying list value as you
suggest?

Anything else is good in my code ?

Thanks
Post by marc nicole
Post by marc nicole
I am creating a memoization example with a function that adds up /
averages
Post by marc nicole
the elements of an array and compares it with the cached ones to retrieve
them in case they are already stored.
In addition, I want to store only if the result of the function differs
considerably (passes a threshold e.g. 500000 below).
I created an example using a decorator to do so, the results using the
decorator is slightly faster than without the memoization which is OK,
but
Post by marc nicole
is the logic of the decorator correct ? anybody can tell me ?
import time
cache = {}
sum_key_arr = sum(list(args[0])) / len(list(args[0]))
'list' will iterate over args[0] to make a list, and 'sum' will iterate
over that list.
It would be simpler to just let 'sum' iterate over args[0].
Post by marc nicole
sum_key_arr = sum(list(args[0]))
for (
key,
value,
) in (
cache.items()
): # key in dict cannot be an array so I use the sum of the
array as the key
You can't use a list as a key, but you can use a tuple as a key,
provided that the elements of the tuple are also immutable.
Post by marc nicole
if (
abs(sum_key_arr - key) <= 500000
): # threshold is great here so that all values are
approximated!
# print('approximated')
return cache[key]
# print('not approximated')
cache[sum_key_arr] = f(args[0], args[1])
return cache[sum_key_arr]
return g
@memoize
return sum(list(dict_list_arr)) / len(list(dict_list_arr))
return sum(list(dict_list_arr))
return None
t = time.time()
res = aggregate(list(range(i)), "avg")
elapsed = time.time() - t
print(res)
print(elapsed)
--
https://mail.python.org/mailman/listinfo/python-list
MRAB
2024-03-31 19:23:50 UTC
Reply
Permalink
Post by marc nicole
Thanks for the first comment which I incorporated
but when you say "You can't use a list as a key, but you can use a
tuple as a key,
provided that the elements of the tuple are also immutable."
does it mean  the result of sum of the array is not convenient to use
as key as I do?
Which tuple I should use to refer to the underlying list value as you
suggest?
I was suggesting using `tuple` on the argument:

def memoize(f):
     cache = {}

     def g(*args):
         key = tuple(args[0]), args[1]

         if key not in cache:
             cache[key] = f(args[0], args[1])

         return cache[key]

     return g
Post by marc nicole
Anything else is good in my code ?
Thanks
Le dim. 31 mars 2024 à 01:44, MRAB via Python-list
Post by marc nicole
I am creating a memoization example with a function that adds up
/ averages
Post by marc nicole
the elements of an array and compares it with the cached ones to
retrieve
Post by marc nicole
them in case they are already stored.
In addition, I want to store only if the result of the function
differs
Post by marc nicole
considerably (passes a threshold e.g. 500000 below).
I created an example using a decorator to do so, the results
using the
Post by marc nicole
decorator is slightly faster than without the memoization which
is OK, but
Post by marc nicole
is the logic of the decorator correct ? anybody can tell me ?
import time
      cache = {}
              sum_key_arr = sum(list(args[0])) / len(list(args[0]))
'list' will iterate over args[0] to make a list, and 'sum' will
iterate
over that list.
It would be simpler to just let 'sum' iterate over args[0].
Post by marc nicole
              sum_key_arr = sum(list(args[0]))
              for (
                  key,
                  value,
              ) in (
                  cache.items()
              ):  # key in dict cannot be an array so I use the
sum of the
Post by marc nicole
array as the key
You can't use a list as a key, but you can use a tuple as a key,
provided that the elements of the tuple are also immutable.
Post by marc nicole
                  if (
                      abs(sum_key_arr - key) <= 500000
                  ):  # threshold is great here so that all
values are
Post by marc nicole
approximated!
                      # print('approximated')
                      return cache[key]
                  # print('not approximated')
                  cache[sum_key_arr] = f(args[0], args[1])
          return cache[sum_key_arr]
      return g
@memoize
          return sum(list(dict_list_arr)) / len(list(dict_list_arr))
          return sum(list(dict_list_arr))
      return None
t = time.time()
      res = aggregate(list(range(i)), "avg")
elapsed = time.time() - t
print(res)
print(elapsed)
--
https://mail.python.org/mailman/listinfo/python-list
a***@gmail.com
2024-03-31 20:56:19 UTC
Reply
Permalink
I am not sure if it was made clear that there is a general rule in python for what is HASHABLE and lists are changeable while tuples are not so the latter can be hashed as a simple copy of a list, albeit the contents must also be immutable.

The memorize function uses a dictionary to store things and thus the things are hashed to decide how to store it in the inner representation of a dictionary and anything new that you want to look up in the dictionary has similar considerations as it is hashed to see where in the dictionary to look for it.

Of course, if you add enough overhead and the memorize function you make gets relatively few requests that are identical, it may not be worthwhile.

-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=***@python.org> On Behalf Of MRAB via Python-list
Sent: Sunday, March 31, 2024 3:24 PM
To: python-***@python.org
Subject: Re: Can you help me with this memoization simple example?
Post by marc nicole
Thanks for the first comment which I incorporated
but when you say "You can't use a list as a key, but you can use a
tuple as a key,
provided that the elements of the tuple are also immutable."
does it mean the result of sum of the array is not convenient to use
as key as I do?
Which tuple I should use to refer to the underlying list value as you
suggest?
I was suggesting using `tuple` on the argument:

def memoize(f):
cache = {}

def g(*args):
key = tuple(args[0]), args[1]

if key not in cache:
cache[key] = f(args[0], args[1])

return cache[key]

return g
Post by marc nicole
Anything else is good in my code ?
Thanks
Le dim. 31 mars 2024 à 01:44, MRAB via Python-list
Post by marc nicole
I am creating a memoization example with a function that adds up
/ averages
Post by marc nicole
the elements of an array and compares it with the cached ones to
retrieve
Post by marc nicole
them in case they are already stored.
In addition, I want to store only if the result of the function
differs
Post by marc nicole
considerably (passes a threshold e.g. 500000 below).
I created an example using a decorator to do so, the results
using the
Post by marc nicole
decorator is slightly faster than without the memoization which
is OK, but
Post by marc nicole
is the logic of the decorator correct ? anybody can tell me ?
import time
cache = {}
sum_key_arr = sum(list(args[0])) / len(list(args[0]))
'list' will iterate over args[0] to make a list, and 'sum' will
iterate
over that list.
It would be simpler to just let 'sum' iterate over args[0].
Post by marc nicole
sum_key_arr = sum(list(args[0]))
for (
key,
value,
) in (
cache.items()
): # key in dict cannot be an array so I use the
sum of the
Post by marc nicole
array as the key
You can't use a list as a key, but you can use a tuple as a key,
provided that the elements of the tuple are also immutable.
Post by marc nicole
if (
abs(sum_key_arr - key) <= 500000
): # threshold is great here so that all
values are
Post by marc nicole
approximated!
# print('approximated')
return cache[key]
# print('not approximated')
cache[sum_key_arr] = f(args[0], args[1])
return cache[sum_key_arr]
return g
@memoize
return sum(list(dict_list_arr)) / len(list(dict_list_arr))
return sum(list(dict_list_arr))
return None
t = time.time()
res = aggregate(list(range(i)), "avg")
elapsed = time.time() - t
print(res)
print(elapsed)
--
https://mail.python.org/mailman/listinfo/python-list
--
https://mail.python.org/mailman/listinfo/python-list
Loading...