Thursday, November 4, 2010

Bad servers, chunked encoding and IncompleteRead

Have you ever gotten an IncompleteRead exception while trying to fetch chunked data with urllib2? I have. Look at this snippet:

import urllib2, httplib

try:
    data = urllib2.urlopen('http://some.url/address').read()
except httplib.IncompleteRead:
    # Achtung! At this point you lose any fetched data except the last chunk.
    data = None

In real life, most bad servers transmit all the data, but due to implementation errors they close the session incorrectly, so urllib2 raises an error and buries your precious bytes.

So how do you handle this situation?

I don't like solutions that involve a manual read loop, so I prefer to patch the read function.

import httplib

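def patch_http_response_read(func):
    # Wrap HTTPResponse.read so that an IncompleteRead returns the data
    # received so far instead of propagating the exception.
    def inner(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except httplib.IncompleteRead as e:
            # e.partial holds the bytes read before the connection closed.
            return e.partial

    return inner

# Apply the patch once, before any urllib2 call is made.
httplib.HTTPResponse.read = patch_http_response_read(httplib.HTTPResponse.read)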

This lets you deal with defective HTTP servers.
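To see what the wrapper does without needing a misbehaving server, here is a minimal sketch that applies the same idea to a stand-in class. FakeResponse and tolerate_incomplete_read are hypothetical names invented for this illustration, and the import fallback is only an assumption so the sketch also runs on Python 3, where httplib is called http.client.

```python
try:
    import httplib  # Python 2 name, as used in the post
except ImportError:
    import http.client as httplib  # Python 3 name for the same module


def tolerate_incomplete_read(func):
    """Same idea as the patch in the post: return the partial data
    carried by IncompleteRead instead of letting the exception escape."""
    def inner(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except httplib.IncompleteRead as e:
            return e.partial
    return inner


class FakeResponse(object):
    """Hypothetical stand-in for HTTPResponse talking to a bad server."""

    def read(self):
        # Simulate a server that sent data and then closed the session badly.
        raise httplib.IncompleteRead(b"payload received before the early close")


# Patch the class attribute, exactly as the post does for HTTPResponse.read.
FakeResponse.read = tolerate_incomplete_read(FakeResponse.read)

body = FakeResponse().read()
print(repr(body))
```

Note that the wrapper cannot tell a server that closed sloppily after sending everything from one that really dropped the connection mid-transfer, so the returned data may be truncated in the second case.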

15 comments:

  1. That is a nice tip. I have run into this issue quite a few times and couldn't find a way around it. Thanks for the post.

  2. Not a good idea to use the word 'beaver' in a blog title. It has several meanings...

  3. > It has several meanings...

    I know, I know.

  4. This comment has been removed by the author.

  5. Thank you very much, you solved my problem.

  6. Does this mean the client will be able to read only a portion of the data/webpage?

  7. > Does this mean the client will be able to read only a portion of the data/webpage?

    This is true in the case of a "real" read error (more exactly, a protocol error). However, I have only seen it when the server closed the connection after all the data had been sent.

  8. This totally, 100% saved my day. +10 Karma. Object patching is awesome.

  9. This comment has been removed by the author.

    Replies
    1. You should run the snippet before using any urllib2 functions. In the simple case, copy and paste it at the top of your module.

  10. Excellent. Just saved my laptop from taking a flight off the balcony! Thanks.

  11. Workaround works with httplib2! Thanks!!!

  12. maximum recursion depth exceeded for inner(*args) function
