REST API + Python - Generate infinite consumption

REST is an awesome technology that allows for interactions with complex web services to be expressed in a simple manner. Its minimalistic vocabulary does not belong in client code, however. It's sad (and very obvious) when language bindings to REST APIs do little to enrich the client experience.

The problem

Write a binding library to a rich and shiny-looking REST API, avoiding verbs such as create, fetch and the like.

Although GET, PUT, POST and DELETE totally make sense in the context of a REST backend service, they are unnecessary, ugly-looking artifacts in client code.
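
To make the contrast concrete, here is a hypothetical before/after; none of these names come from a real library:

# Client code where the REST vocabulary leaks through:
user = api.get('/users/42')
api.put('/users/42', data={'name': 'Ada'})

# versus a binding that lets the client speak the domain:
user = users[42]
user.name = 'Ada'  # the binding issues the PUT behind the scenes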

A practical example

Consider iterating over the representations in a REST collection. I've seen code for that which looks like this:

objects = Object.fetch()
count = offset = 0
while some_condition:
    if count == len(objects):
        # the current batch is used up: fetch the next one
        offset += count
        objects = Object.fetch(offset=offset)
        count = 0
    object = objects[count]
    # do something with the object
    count += 1

In the above snippet, the implementation of Object takes care of the network calls. That is good. But the consumer is still obviously aware of the discontinuous nature of the collection stream, and needs to compensate for it (keep state, issue subsequent fetch calls). That is bad.

That is, in fact, broken. Consuming code should be able to approach the remote collection the way it would any collection. Objects can be pulled from the network to local memory on demand, but the client code does not need to know that.

for object in objects():  
    # do something with object

Alternative: Generators

The best definition of Python generators I've ever heard:

Generators transform iterations.

Iterating over a collection, albeit a remote one, is still an iteration. A generator is capable of transforming the iteration experience, handling the details and exposing a Pythonic interface, as in the client code above.
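
A toy illustration of that transformation (the data here is made up): the consumer sees one smooth stream of numbers, while the generator quietly walks a batched source.

def numbers(batches):
    for batch in batches:   # the discontinuous source
        for n in batch:
            yield n         # the smooth, item-by-item experience

for n in numbers([[1, 2, 3], [4, 5], [6]]):
    print(n)                # prints 1 through 6, one per line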

A tentative implementation can be described as:

  • get a batch of default_limit objects and load them into memory,
  • yield from the loaded values until they are exhausted,
  • at that point, if there are more values to request, do so, extending the previously loaded objects.

import requests

def objects(offset=0):
    params = {
        'offset': offset,
        'limit': default_limit,  # default_limit and endpoint are module-level settings
        }
    req = requests.get(endpoint, params=params)
    data = req.json()
    results, paging = data['data'], data['paging']

    count = 0  # how many of the loaded results have been handed out
    while count < len(results):  # an explicit index, since results keeps growing
        yield results[count]
        count += 1
        if count == len(results) and paging['has_more']:
            # the in-memory cache is exhausted: request the next batch
            params['offset'] = offset + count
            req = requests.get(endpoint, params=params)
            data = req.json()
            new_results, paging = data['data'], data['paging']
            results.extend(new_results)

The snippet above encapsulates the call to the remote service, and caches the results in memory as they're requested.
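
As a quick usage sketch (assuming default_limit is at least 5), pulling just a handful of objects triggers only the first request; later batches are fetched only if the consumer actually walks that far:

from itertools import islice

for obj in islice(objects(), 5):
    print(obj)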

One nice thing to notice is that extending the collection in memory is not really necessary. I was assuming it would be nice to cache those values, but if that turns out to be too expensive, or simply not useful, it is easy to discard the used-up objects:

    # ...
            new_results, paging = data['data'], data['paging']
            offset += count  # remember how far into the remote collection we are
            count = 0        # start indexing the fresh batch from the top
            del results[:]   # drop the objects that have already been consumed
            results.extend(new_results)

...just adding the del results[:] line, plus resetting the local offset and count bookkeeping so the index starts over on the fresh batch.
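
If caching is not wanted at all, another option is to structure the generator around one batch at a time. This is only a sketch, under the same assumptions about endpoint, default_limit and the response shape; objects_batched is a made-up name:

def objects_batched(offset=0):
    params = {'offset': offset, 'limit': default_limit}
    while True:
        data = requests.get(endpoint, params=params).json()
        results, paging = data['data'], data['paging']
        for result in results:  # hand out the current batch...
            yield result
        if not paging['has_more']:
            break
        params['offset'] += len(results)  # ...then move on and forget it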

Asynchronous pre-fetching?

The asynchronous approach is not easy to implement. It sounds great to pre-fetch values when the consuming code has reached, say, 80% of the current batch... the complexity of handling a concurrent producer-consumer is not to be taken lightly, however. In the Python world I'd say gevent is still the easiest option if async is required.
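
That said, a modest middle ground is possible without gevent or a full producer-consumer setup: fire the request for the next batch as soon as the current one starts being consumed, using a single worker thread from the standard library. The sketch below is only that, a sketch; objects_prefetch and fetch are made-up names, and endpoint, default_limit and the response shape are the same assumptions as before:

import requests
from concurrent.futures import ThreadPoolExecutor

def objects_prefetch(offset=0):
    def fetch(off):
        params = {'offset': off, 'limit': default_limit}
        data = requests.get(endpoint, params=params).json()
        return data['data'], data['paging']

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch, offset)
        while True:
            results, paging = future.result()        # wait for the batch in flight
            if paging['has_more']:
                offset += len(results)
                future = pool.submit(fetch, offset)  # pre-fetch the next batch
            for result in results:
                yield result
            if not paging['has_more']:
                break

It pre-fetches at the start of each batch rather than at the 80% mark, which keeps the bookkeeping trivial.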

By Elvio Rogelio Toccalino

Professional programmer, enthusiast hacker, mad entrepreneur