Accessing Firefox sync (Weave) data from Python

Since Delicious was terminating its service, I needed a new bookmark syncing service. I’ve switched to Firefox Sync, which finally seemed mature enough and, indeed, it works like a charm. It synchronizes passwords and history perfectly between computers and Android devices.

However, there is no way to access your online data directly or to sync it with, for example, Google Chrome. This made me wonder how Sync works, why there’s no web interface to it and how hard it would be to write custom clients (JavaScript or Python).

Unfortunately, documentation and (working) examples are scarce. The storage server itself is rather simple and doesn’t need a lot of explanation, but the way the Sync code uses the storage definitely does. There is a “weave.py” implementation from Mozilla, but it’s rather outdated and won’t work with the current storage version (version 5).

Nonetheless I managed to figure things out and get a working client that, at least, fetches bookmarks. This should be a good start for anyone trying to access Firefox sync data.

The full code can be found on GitHub. I’ll highlight some specific details in this post.

Basics

When accessing the sync storage, you will need three things:

  • Your username. This is usually an email address.
  • Your password.
  • Your passphrase. This is what is used to encrypt all your data before storing it. It has the format x-xxxxx-xxxxx-xxxxx-xxxxx-xxxxx where x can be a number or letter (with some limitations). If you lose this, you lose your data.
Your data is stored encrypted on the Firefox Sync storage backends. The key used to encrypt the data is also stored there, but it’s encrypted using your passphrase. This means that, to access your data, you need the following steps:
  1. Get your (passphrase-encrypted) private key (../storage/keys)
  2. Decrypt your private key using your passphrase
  3. Fetch data from the storage backend (e.g. ../storage/bookmarks/xyzxyz123)
  4. Decrypt it using your private key

This also explains why there is no web interface offered by Mozilla: they simply can’t access your data!

Request structure

The overall URL structure (when accessing sync data) is as follows:
https://backend/<api>/<username>/<collection>
Here, backend is the base URL of a backend node; api is the API version (1.0 is currently used); username is your username, encoded (see below); and collection is the collection you’re accessing. Known collections include:
  • bookmarks
  • passwords
  • history
  • form history
  • open tabs
Data is returned as JSON, sometimes containing encrypted JSON strings as payload.
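
As a rough illustration of the URL pattern and the nested JSON described above, the sketch below joins the four URL parts and unwraps a record whose payload is itself a JSON string. The helper names (collection_url, unwrap_payload), the host name and the sample record are made up for this example, and the record's field names are an assumption about the storage format rather than something taken from the code on GitHub.

```python
import json

def collection_url(backend, api, username, collection):
    """Join the parts following https://backend/<api>/<username>/<collection>."""
    return '/'.join([backend.rstrip('/'), api, username, collection])

def unwrap_payload(body):
    """Parse a JSON response whose 'payload' field is a JSON-encoded string."""
    record = json.loads(body)
    record['payload'] = json.loads(record['payload'])
    return record

# hypothetical values, for illustration only
url = collection_url('https://node.example.com/', '1.0', 'abc123', 'bookmarks')
sample_body = json.dumps({
    'id': 'xyzxyz123',
    'payload': json.dumps({'ciphertext': '...'}),  # assumed field name
})
record = unwrap_payload(sample_body)
print(url)
print(record['payload'])
```

The second json.loads call is the point here: the outer response parses fine on its own, but the payload stays an opaque string until it is parsed (and, for encrypted records, decrypted) separately.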

Decoding the passphrase

Firefox Sync generates a passphrase for you and creates a base32-encoded string out of it. For readability, it is split into 6 hyphen-separated parts, with ‘l’ replaced by ‘8’ and ‘o’ replaced by ‘9’. The following method transforms it back into its original form:
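    import base64

    def decode_passphrase(p):
        def denormalize(k):
            """ transform x-xxxxx-xxxxx etc into something b32-decodable """
            # drop the hyphens and undo the readability substitutions
            tmp = k.replace('-', '').replace('8', 'l').replace('9', 'o').upper()
            # base32 input must be padded with '=' to a multiple of 8 characters
            padding = (8-len(tmp) % 8) % 8
            return tmp + '=' * padding
        return base64.b32decode(denormalize(p))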
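
To check the decoding, a round trip helps. The sketch below repeats the decoding logic in flattened form and adds a hypothetical inverse, encode_passphrase, which is not part of the code on GitHub. It assumes the underlying key is 16 bytes, which is what produces 26 base32 characters and therefore the six hyphen-separated groups.

```python
import base64

def decode_passphrase(p):
    """Turn x-xxxxx-... back into raw bytes (same logic as above)."""
    tmp = p.replace('-', '').replace('8', 'l').replace('9', 'o').upper()
    padding = (8 - len(tmp) % 8) % 8
    return base64.b32decode(tmp + '=' * padding)

def encode_passphrase(raw):
    """Hypothetical inverse: raw bytes -> hyphenated, human-readable form."""
    s = base64.b32encode(raw).decode('ascii').rstrip('=').lower()
    s = s.replace('l', '8').replace('o', '9')
    # the first group is a single character, the rest are groups of five
    groups = [s[:1]] + [s[i:i + 5] for i in range(1, len(s), 5)]
    return '-'.join(groups)

raw = bytes(range(16))  # assumption: an arbitrary 16-byte key
pretty = encode_passphrase(raw)
print(pretty)
print(decode_passphrase(pretty) == raw)
```

The substitution is unambiguous because the base32 alphabet only uses A-Z and 2-7, so ‘8’ and ‘9’ never occur in the encoded string on their own.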

Encoding the username

The username needs a specific encoding: you take its SHA1 hash and base32-encode the digest, in lowercase:
    import base64, hashlib

    def encode_username(u):
        # hash the UTF-8 bytes of the username, then base32-encode the digest
        digest = hashlib.sha1(u.encode('utf-8')).digest()
        return base64.b32encode(digest).decode('ascii').lower()
When doing API calls to the backend, you authenticate over HTTP using this encoded username and your password (not your passphrase!).
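
A short usage sketch, written for Python 3 (where hashlib needs bytes rather than a string). The email address and password are placeholders. Since a SHA1 digest is 20 bytes, the encoded username is always 32 characters long and needs no ‘=’ padding.

```python
import base64
import hashlib

def encode_username(u):
    """SHA1-hash the username and base32-encode the digest, lowercased."""
    digest = hashlib.sha1(u.encode('utf-8')).digest()
    return base64.b32encode(digest).decode('ascii').lower()

encoded = encode_username('someone@example.com')  # made-up address
print(encoded, len(encoded))

# the encoded name, not the email address, goes into the URL and the
# (username, password) pair used for HTTP authentication
auth = (encoded, 'my-password')  # placeholder password, not the passphrase
```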

Finding a node

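Before fetching data, you’ll need to get the base URL of a backend node:
    import requests

    def get_node(self):
        # the user API version in the path is 1.0
        url = self.server + '/user/1.0/' + self.username + '/node/weave'
        r = requests.get(url, auth=(self.username, self._password))
        # a requests response has no read(); the body is available as r.text
        return r.text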

Migrated comments

Hi, I had to change line 28 (in the script hosted on github) because it didn’t work.
old: url = self.server + '/user/1/' + self.username + '/node/weave'
new: url = self.server + '/user/1.0/' + self.username + '/node/weave'

Comment by unodipassaggio — Sep 12, 2011 9:36:43 PM

hi, shouldn’t it be r.content instead of r.read()? At least it is working for me this way and I get an error otherwise. Thx for the code! Saved me a lot of time.

Comment by geier — Nov 28, 2011 11:59:30 PM

Last updated April 18, 2013, 4:35 p.m.