Coding with effect¶
Let’s walk through an example to illustrate my grievances with the effect
library. For starters, let’s say we are using effect
to interact with a
database. Reading values from and writing values to a database are certainly
operations that have side-effects, so we believe this to be a good candidate
use case for our new toy.
Aside
Apologies for this rather long example, I just wanted to walk through a sufficiently complex scenario as a matter of proving to myself that this library adds value.
For sake of example I will assume we are using a simple revision-based document
store (perhaps a wrapper on CouchDB). This document store has a simple
synchronous python API that consists of merely db.get(doc_id, rev=LATEST)
and db.put(doc_id, rev, doc)
. As this is a fictional API, rather than
giving a full spec, I will demonstrate how it works with a simple demo of
functionality:
>>> # Make a new db.
>>> db = DB()
>>> # Create an id for a doc we'll work with.
>>> my_id = uuid4()
>>> # Getting a doc that doesn't exist is an error:
>>> db.get(my_id)
DB Response<NOT_FOUND>
>>> # Putting revision 0 for a doc that doesn't exist succeeds:
>>> db.put(my_id, 0, {'cat': 0})
DB Response<OK rev=0>
>>> # `get`ing a doc gets the latest version:
>>> db.get(my_id)
DB Response<OK rev=0 {"cat": 0}>
>>> # Attempting to put a document at existant revision is an error:
>>> db.put(my_id, 0, {'cat': 12})
DB Response<CONFLICT>
>>> # Instead `put` it at the next revision:
>>> db.put(my_id, 1, {'cat': 12})
DB Response<OK rev=1>
>>> # `get`ing a doc gets the latest version:
>>> db.get(my_id)
DB Response<OK rev=1 {"cat": 12}>
>>> # But old revisions can still be gotten:
>>> db.get(my_id, 0)
DB Response<OK rev=0 {"cat": 0}>
Using this system, we will try to implement a piece of code that will execute a change on a document in the database. This code should take as inputs:
- A
DB
instance where the document is stored. - The
doc_id
of the document that is to be changed within the database. - A pure function to execute on the document.
The code will get the document from the database, execute the pure function on
the document, and put it back in the database. If the put
fails, then the
code should get the latest version of the document, execute the pure function
on the latest version of the document, attempt to put
it again, and repeat
until it succeeds.
For good measure, this code can return the final version of the document.
So let’s take a stab at implementing this piece of code. We are using effect,
so I guess that means we want to put db.get
and db.put
behind intents
and performers, and then we want to create a function that returns an “effect
generator” that can be performed by a dispatcher.
Aside
I’m still pretty new to effect
, and playing around with how to do
good design in this paradigm. You may notice this in my tenative design
desisions. If you have any recommendations on how I could do it better, tell
me on github as an issue filed against
ziffect.
from effect import TypeDispatcher, sync_performer
class GetIntent(object):
def __init__(self, doc_id, rev=LATEST):
self.doc_id = doc_id
self.rev = rev
def get_performer_generator(db):
def get(dispatcher, intent):
return db.get(intent.doc_id, intent.rev)
return get
class UpdateIntent(object):
def __init__(self, doc_id, rev, doc):
"""
Slightly different API that the DB gives us, because we need to update a
document below rather than just put a new doc into the DB.
:param doc_id: The document id of the document to put in the database.
:param rev: The last revision gotten from the database for the document.
This update will put revision rev + 1 into the db.
:param doc: The new document to send to the server.
"""
self.doc_id = doc_id
self.rev = rev
self.doc = doc
def update_performer_generator(db):
def update(dispatcher, intent):
intent.rev += 1
return db.put(intent.doc_id, intent.rev, intent.doc)
return update
def db_dispatcher(db):
return TypeDispatcher({
GetIntent: sync_performer(get_performer_generator(db)),
UpdateIntent: sync_performer(update_performer_generator(db)),
})
Okay, so now we have the Effect
-ive building blocks that we can use to
create our implementation:
from effect import Effect
from effect.do import do
@do
def execute_function(doc_id, pure_function):
result = yield Effect(GetIntent(doc_id=doc_id))
new_doc = pure_function(result.doc)
yield Effect(UpdateIntent(doc_id, result.rev, new_doc))
We still don’t technically have what we set out for, as this effect generator only takes two arguments, not the underlying db. So we’ll add one more convenience function that we can play around with on the interpreter:
from effect import (
sync_perform, ComposedDispatcher, base_dispatcher
)
def sync_execute_function(db, doc_id, function):
dispatcher = ComposedDispatcher([
db_dispatcher(db),
base_dispatcher
])
sync_perform(
dispatcher,
execute_function(
doc_id, function
)
)
The implementation of execute_function
should fairly obviously have bugs,
but it’s a good enough implementation that we can convince ourselves that the
happy case works:
>>> db = DB()
>>> doc_id = uuid4()
>>> doc = {"cat": "mouse", "count": 10}
>>> db.put(doc_id, 0, doc)
DB Response<OK rev=0>
>>> def increment(doc_id):
... return sync_execute_function(
... db,
... doc_id,
... lambda x: dict(x, count=x.get('count', 0) + 1)
... )
>>> increment(doc_id)
>>> db.get(doc_id)
DB Response<OK rev=1 {"cat": "mouse", "count": 11}>
>>> increment(doc_id)
>>> db.get(doc_id)
DB Response<OK rev=2 {"cat": "mouse", "count": 12}>
>>> increment(doc_id)
>>> db.get(doc_id)
DB Response<OK rev=3 {"cat": "mouse", "count": 13}>
In the interest of test driven development, at this point we want to write our
unit tests. They should fail, then we’ll fix the implementation of
execute_function
, write more unit tests, etc.
from effect.testing import perform_sequence
class DBExecuteFunctionTests(TestCase):
def test_happy_case(self):
doc_id = uuid4()
doc_1 = {"test": "doc", "a": 1}
doc_1_u = {"test": "doc", "a": 2}
seq = [
(GetIntent(doc_id),
lambda _: DBResponse(status=DBStatus.OK, rev=0, doc=doc_1)),
(UpdateIntent(doc_id, 0, doc_1_u),
lambda _: DBResponse(status=DBStatus.OK)),
]
perform_sequence(seq, execute_function(
doc_id, lambda x: dict(x, a=x.get("a", 0) + 1)
)
)
def test_sad_case(self):
doc_id = uuid4()
doc_1 = {"test": "doc", "a": 1}
doc_1_u = {"test": "doc", "a": 2}
doc_2 = {"test": "doc2", "a": 5}
doc_2_u = {"test": "doc2", "a": 6}
seq = [
(GetIntent(doc_id),
lambda _: DBResponse(status=DBStatus.OK, rev=0, doc=doc_1)),
(UpdateIntent(doc_id, 0, doc_1_u),
lambda _: DBResponse(status=DBStatus.CONFLICT)),
(GetIntent(doc_id),
lambda _: DBResponse(status=DBStatus.OK, rev=1, doc=doc_2)),
(UpdateIntent(doc_id, 1, doc_2_u),
lambda _: DBResponse(status=DBStatus.OK)),
]
perform_sequence(seq, execute_function(
doc_id, lambda x: dict(x, a=x.get("a", 0) + 1)
)
)
Now a few iterations of TDD:
>>> run_test(DBExecuteFunctionTests)
FAILURE(test_happy_case)
Traceback (most recent call last):
File "<interactive-shell>", line 17, in test_happy_case
File "effect/testing.py", line 115, in perform_sequence
return sync_perform(dispatcher, eff)
File "effect/_sync.py", line 34, in sync_perform
six.reraise(*errors[0])
File "effect/_base.py", line 78, in guard
return (False, f(*args, **kwargs))
File "effect/do.py", line 121, in <lambda>
error=lambda e: _do(e, generator, True))
File "effect/do.py", line 98, in _do
val = generator.throw(*result)
File "<interactive-shell>", line 6, in execute_function
File "effect/_base.py", line 150, in _perform
performer = dispatcher(effect.intent)
File "effect/testing.py", line 108, in dispatcher
intent, fmt_log()))
AssertionError: Performer not found: <GetIntent object at 0x7fff0000>! Log follows:
{{{
NOT FOUND: <GetIntent object at 0x7fff0000>
NEXT EXPECTED: <GetIntent object at 0x7fff0001>
}}}
...
First bug: Intents need to have valid __eq__
implementations. Also let’s give
them a __repr__
that makes them slightly less hard to work with.
class GetIntent(object):
def __init__(self, doc_id, rev=LATEST):
self.doc_id = doc_id
self.rev = rev
def __eq__(self, other):
return (
type(self) == type(other) and
self.doc_id == other.doc_id and
self.rev == other.rev
)
def __repr__(self):
return 'GetIntent<%s, %s>' % (
rev_render(self.rev), self.doc_id)
class UpdateIntent(object):
def __init__(self, doc_id, rev, doc):
self.doc_id = doc_id
self.rev = rev
self.doc = doc
def __eq__(self, other):
return (
type(self) == type(other) and
self.doc_id == other.doc_id and
self.rev == other.rev and
self.doc == other.doc
)
def __repr__(self):
return 'UpdateIntent<%s, %s, %s>' % (
rev_render(self.rev),
self.doc_id,
repr(self.doc)
)
Rerun the tests:
>>> run_test(DBExecuteFunctionTests)
FAILURE(test_sad_case)
Traceback (most recent call last):
File "<interactive-shell>", line 41, in test_sad_case
File "effect/testing.py", line 115, in perform_sequence
return sync_perform(dispatcher, eff)
File "effect/testing.py", line 463, in consume
[x[0] for x in self.sequence]))
AssertionError: Not all intents were performed: [GetIntent<LATEST, f456150c-d4ba-5b09-a3fc-7ce3a7dbe905>, UpdateIntent<1, f456150c-d4ba-5b09-a3fc-7ce3a7dbe905, {'a': 6, 'test': 'doc2'}>]
...
Cool, now that we have a failing test, lets improve our implementation to handle the case where the DB was updated while we were running:
@do
def execute_function(doc_id, pure_function):
done = False
while not done:
original_doc = yield Effect(GetIntent(doc_id=doc_id))
new_doc = pure_function(original_doc.doc)
update_result = yield Effect(
UpdateIntent(doc_id, original_doc.rev, new_doc))
done = (update_result.status == DBStatus.OK)
Rerun the tests:
>>> run_test(DBExecuteFunctionTests)
[OK]
Okay, so that all seems reasonable. This style of testing reminds me a lot of mocks. I am creating a canned sequence of expected inputs and return values for my dependencies, and running my code under test using this canned dependency.
Aside
I’m sure you can search the internet for debates of mocks versus fakes and find out more about the issues that some people have with mocks. In my view, two of the best arguments against mocks are:
- Does the mock sufficiently behave like a real implementation so that the
test is meaningful? This is particularly pertinent in python, because
something simple like, “your mock does not return the correct type of
value” might mean that your unit test fails to catch a
TypeError
that will always happen with the real implementation. - Mocks create tests that are tightly tied to the implementation of the code under test; if the implementation is changed, the test must also be modified. Consider, for instance, if we add a 2nd GetIntent to the beginning of the implementation, it should not change the correctness, but the test would now fail without modification. Specifically the sequence that is passed to perform_sequence would need a second GetIntent call at the beginning of the sequence.
Personally, I think mocks do have a place in unit tests like the one above. Specifically you are interfacing with an API that can return different values for the same inputs, and you need to force some external state change at a specific time in order to force the different inputs.
There are other strategies to do similar testing, but as long as you have a solid, simple interface to mock, I believe that form of testing gets the most bang for your buck.
Let’s build on our existing implementation. Let’s say after using this code for
awhile we realize that the DB commands can also return a NETWORK_ERROR
.
We are going to take the simple policy of retrying any attempt that results in
a NETWORK_ERROR
. We are not going to bother with exponential back-off or
any other nice-to-have right now, just a dead simply retry.
Aside
Assuming that NETWORK_ERRORS
can happen before or after an operation is
complete, this has some interesting ramifications. Our implementation of
execute_function()
will be an at-least-once implementation, where it
guarantees that the function you specified will have occured at least once on
the doc_id specified. A poorly timed NETWORK_ERROR
after a successful
update will cause our code to retry the update, get a conflict, and cycle
through the code again.
In response to some of the fears about using mocks, lets utilize an
InMemoryDB
fake and a NetoworkErrorDB
fake in the next implementation.
This will force our tests to actually test in the performers in conjunction
with the other code. We are still using perform_sequence
to inject the
fakes in a mock-like manner mind you.
class NetworkErrorDB(object):
def get(self, doc_id, rev=LATEST):
return DBResponse(status=DBStatus.NETWORK_ERROR)
def put(self, doc_id, rev, doc):
return DBResponse(status=DBStatus.NETWORK_ERROR)
class DBExecuteNetworkErrorTests(TestCase):
def test_network_error(self):
doc_id = uuid4()
db = InMemoryDB()
update_performer = update_performer_generator(db)
get_performer = get_performer_generator(db)
bad_db = NetworkErrorDB()
bad_update_performer = update_performer_generator(bad_db)
bad_get_performer = get_performer_generator(bad_db)
db.put(doc_id, 0, {"test": "doc", "a": 1})
doc_1 = {"test": "doc", "a": 1}
doc_1_u = {"test": "doc", "a": 2}
seq = [
(GetIntent(doc_id), lambda i: bad_get_performer(None, i)),
(GetIntent(doc_id), lambda i: get_performer(None, i)),
(UpdateIntent(doc_id, 0, doc_1_u),
lambda i: bad_update_performer(None, i)),
(UpdateIntent(doc_id, 0, doc_1_u),
lambda i: update_performer(None, i)),
]
perform_sequence(seq, execute_function(
doc_id, lambda x: dict(x, a=x.get("a", 0) + 1)
)
)
Test Failure:
>>> run_test(DBExecuteNetworkErrorTests)
ERROR(test_network_error)
Traceback (most recent call last):
File "<interactive-shell>", line 36, in test_network_error
File "effect/testing.py", line 115, in perform_sequence
return sync_perform(dispatcher, eff)
File "effect/_sync.py", line 34, in sync_perform
six.reraise(*errors[0])
File "effect/_base.py", line 78, in guard
return (False, f(*args, **kwargs))
File "effect/do.py", line 120, in <lambda>
return val.on(success=lambda r: _do(r, generator, False),
File "effect/do.py", line 100, in _do
val = generator.send(result)
File "<interactive-shell>", line 6, in execute_function
File "<interactive-shell>", line 36, in <lambda>
AttributeError: 'NoneType' object has no attribute 'get'
...
The NETWORK_ERROR
on the get is causing issues...
@do
def execute_function(doc_id, pure_function):
done = False
while not done:
original_doc = None
while original_doc is None:
original_doc = yield Effect(GetIntent(doc_id=doc_id))
if original_doc.status == DBStatus.NETWORK_ERROR:
original_doc = None
new_doc = pure_function(original_doc.doc)
update_result = yield Effect(
UpdateIntent(doc_id, original_doc.rev, new_doc))
done = (update_result.status == DBStatus.OK)
Run the test again:
>>> run_test(DBExecuteNetworkErrorTests)
FAILURE(test_network_error)
Traceback (most recent call last):
File "<interactive-shell>", line 36, in test_network_error
File "effect/testing.py", line 115, in perform_sequence
return sync_perform(dispatcher, eff)
File "effect/_sync.py", line 34, in sync_perform
six.reraise(*errors[0])
File "effect/_base.py", line 78, in guard
return (False, f(*args, **kwargs))
File "effect/do.py", line 121, in <lambda>
error=lambda e: _do(e, generator, True))
File "effect/do.py", line 98, in _do
val = generator.throw(*result)
File "<interactive-shell>", line 7, in execute_function
File "effect/_base.py", line 150, in _perform
performer = dispatcher(effect.intent)
File "effect/testing.py", line 108, in dispatcher
intent, fmt_log()))
AssertionError: Performer not found: GetIntent<LATEST, 9515f7cf-8e34-c0f0-49ab-ddee515684b5>! Log follows:
{{{
sequence: GetIntent<LATEST, 9515f7cf-8e34-c0f0-49ab-ddee515684b5>
sequence: GetIntent<LATEST, 9515f7cf-8e34-c0f0-49ab-ddee515684b5>
sequence: UpdateIntent<1, 9515f7cf-8e34-c0f0-49ab-ddee515684b5, {'a': 2, 'test': 'doc'}>
NOT FOUND: GetIntent<LATEST, 9515f7cf-8e34-c0f0-49ab-ddee515684b5>
NEXT EXPECTED: UpdateIntent<0, 9515f7cf-8e34-c0f0-49ab-ddee515684b5, {'a': 2, 'test': 'doc'}>
}}}
...
The NETWORK_ERROR
on the update is causing issues...
@do
def execute_function(doc_id, pure_function):
done = False
while not done:
original_doc = None
get_intent = GetIntent(doc_id=doc_id)
while original_doc is None:
original_doc = yield Effect(get_intent)
if original_doc.status == DBStatus.NETWORK_ERROR:
original_doc = None
new_doc = pure_function(original_doc.doc)
update_result = None
update_intent = UpdateIntent(doc_id, original_doc.rev, new_doc)
while update_result is None:
update_result = yield Effect(update_intent)
if update_result.status == DBStatus.NETWORK_ERROR:
update_result = None
done = (update_result.status == DBStatus.OK)
>>> run_test(DBExecuteNetworkErrorTests)
FAILURE(test_network_error)
Traceback (most recent call last):
File "<interactive-shell>", line 36, in test_network_error
File "effect/testing.py", line 115, in perform_sequence
return sync_perform(dispatcher, eff)
File "effect/_sync.py", line 34, in sync_perform
six.reraise(*errors[0])
File "effect/_base.py", line 78, in guard
return (False, f(*args, **kwargs))
File "effect/do.py", line 121, in <lambda>
error=lambda e: _do(e, generator, True))
File "effect/do.py", line 98, in _do
val = generator.throw(*result)
File "<interactive-shell>", line 15, in execute_function
File "effect/_base.py", line 150, in _perform
performer = dispatcher(effect.intent)
File "effect/testing.py", line 108, in dispatcher
intent, fmt_log()))
AssertionError: Performer not found: UpdateIntent<1, c2d99fe7-48e7-9846-a601-ce405b5baedf, {'a': 2, 'test': 'doc'}>! Log follows:
{{{
sequence: GetIntent<LATEST, c2d99fe7-48e7-9846-a601-ce405b5baedf>
sequence: GetIntent<LATEST, c2d99fe7-48e7-9846-a601-ce405b5baedf>
sequence: UpdateIntent<1, c2d99fe7-48e7-9846-a601-ce405b5baedf, {'a': 2, 'test': 'doc'}>
NOT FOUND: UpdateIntent<1, c2d99fe7-48e7-9846-a601-ce405b5baedf, {'a': 2, 'test': 'doc'}>
NEXT EXPECTED: UpdateIntent<0, c2d99fe7-48e7-9846-a601-ce405b5baedf, {'a': 2, 'test': 'doc'}>
}}}
...
For those of you who are familiar with Effect
, you probably noticed pretty
early in this post what the error is about. My implementation of the
update_performer
modifies the intent that is passed in when it is called.
Specifically it increments the revision of the intent in place before passing
it to the underlying call to db.put
. With this implementation of how we
handle NETWORK_ERRORS we are re-using the same intent with the next performance
of update. The second run of update
is unaware that the first one already
incremented rev
, so it is incremented a second time. This is the source of
our bug.
Effect recommends against mutating intents, but there is not any mechanism that enforces it. Luckily, depending on your code it might be sort of rare to re-use intents. If you do happen to re-use intents though, and you have not been diligent about never mutating them, you might be vulnerable to some pretty pesky bugs to track down.
The quick fix is simply not to modify intent in the function:
def update_performer_generator(db):
def update(dispatcher, intent):
return db.put(intent.doc_id, intent.rev + 1, intent.doc)
return update
>>> run_test(DBExecuteNetworkErrorTests)
[OK]
This for now pretty much wraps up my implementation using pure Effect
, but
there is one last observation I’d like to make:
TypeDispatchers are just classes¶
Look at db_dispatcher:
def db_dispatcher(db):
return TypeDispatcher({
GetIntent: sync_performer(get_performer_generator(db)),
UpdateIntent: sync_performer(update_performer_generator(db)),
})
This is a chunk of python that describes what functions to execute when a certain identifier (type of intent) occurs. At some later point during the program some values will be passed to one of the code chucks associated with one of the identifiers.
It is sort of a funny way of describing it, but to me this describes a class
definition. The intents are bundles of arguments, the type of the intents are
the names of the methods, and the TypeDispatcher
instance represents an
object that is an instance of that type.
Think about attempting to create a TypeDispatcher
that can perform the same
effects as the objects returned by db_dispatcher
, but rather than
performing db interactions just writes an object to a file or reads an object
from a file:
_FILEPATH = '/tmp/datastore'
def _get_stored_obj():
return json.load(open(_FILEPATH, "r"))
def _store_obj(obj):
return json.dump(obj, open(_FILEPATH, "w"))
def file_update_performer(intent):
file_store = _get_stored_obj()
obj_revs = file_store.get(intent.doc_id, [])
if len(obj_revs) != intent.rev:
return DBResponse(status=DBStatus.CONFLICT)
file_store[doc_id] = obj_revs
obj_revs.push(intent.doc)
_store_obj(file_store)
def file_get_performer(dispatcher, intent):
file_store = _get_stored_obj()
if intent.rev < LATEST:
return DBResponse(status=DBStatus.BAD_REQUEST)
try:
return DBResponse(
status=DBStatus.OK,
rev=intent.rev,
doc=file_store[intent.doc_id][intent.rev]
)
except KeyError:
return DBResponse(
status=DBStatus.NOT_FOUND
)
except IndexError:
return DBResponse(
status=DBStatus.NOT_FOUND
)
def file_dispatcher():
return TypeDispatcher({
GetIntent: sync_performer(file_get_performer),
UpdateIntent: sync_performer(file_update_performer),
})
This feels a lot like implementing another class that implements the same
interface. It is just writing performers for a specific intent types
(GetIntent
and UpdateIntent
) rather than writing methods with specific
names.
If you put a bunch of dispatchers together using a ComposedDispatcher
it
is similar to subclassing, in that you are adding more performers to the same
namespace, just like adding more methods to the same class. There even is the
ability to overload since ComposedDispatchers prefer earlier dispatchers over
later dispatchers.