CHIP-Backup: wallet offline backup formats
Title: wallet offline backup formats
Type: Standards
Layer: Applications
Maintainer: Tom Zander
Status: Draft
Initial Publication Date: 2023-01-01
Latest Revision Date: 2023-01-01
Version: 0.1.0
A standard for formatting and storing wallet backups.
This proposal does not require coordinated deployment. Wallets can begin implementation immediately.
As the market for wallets that serve Bitcoin Cash opens up, people migrating away from the most used wallet will start looking for usability features, and some users have already learned that being locked in is not a good place to be. Backups are currently messy and error-prone, resulting in an overall bad user experience for one of the most stressful parts of using a wallet.
An HD wallet is based on a seed-phrase plus a derivation path, for which there are multiple competing standards. The seed-phrase itself can be written out in a handful of languages, and unfortunately support for multiple languages is lacking in most wallets.
The current state is that the user needs to input a bunch of independent pieces of data which come together to restore a wallet. If one of those is entered incorrectly, or is not supported by the wallet, undefined things happen. Most of the time the user will simply not see their money appear and often is at a loss on how to fix this.
What we propose is to simplify the backup process and combine the different parts into one format, allowing a wallet to massively improve the user experience of restoring a backup.
Multiple derivation paths
Current wallets hide the entire concept of derivation paths from the user as much as they can. A direct effect of this is that a big advantage of derivation paths becomes impractical: ideally, one seed-phrase could power a large number of user wallets.
A small reminder on derivation paths
As defined in BIP 32: Hierarchical Deterministic Wallets, an HD wallet is in essence a single secret: an HD master key.
From this master key we use a pre-defined method to derive individual bitcoin addresses and their private keys. The derivation process consists of multiple steps, each matching one stage of the derivation path.
From that HD master key you can derive a practically unlimited number of bitcoin addresses. For instance, the derivation path m/44'/0'/0' defines an HD wallet. We then append /0/1 to it in order to create the first address, or /0/2 to create the next one, and so on, up to a 31-bit number (2 147 483 647) for each stage of the path.
If you start with a different base derivation path, the list of addresses will be completely different.
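As an illustration of the stages of a path, the following minimal sketch splits a derivation path into the 32-bit child indices that BIP 32 works with. The function name `parse_path` is hypothetical and not part of this proposal; the only rule it relies on is that BIP 32 marks a hardened stage (written with a trailing `'`) by setting the top bit of the index.

```python
HARDENED = 0x80000000  # 2**31: BIP 32 sets this bit for hardened stages


def parse_path(path: str) -> list[int]:
    """Turn a path such as "m/44'/0'/0'/0/1" into a list of child indices."""
    parts = path.split("/")
    if parts[0] != "m":
        raise ValueError("path must start at the master key 'm'")
    indices = []
    for part in parts[1:]:
        hardened = part.endswith("'")
        number = int(part.rstrip("'"))
        if not 0 <= number < HARDENED:
            raise ValueError(f"stage does not fit in 31 bits: {part}")
        indices.append(number + HARDENED if hardened else number)
    return indices


wallet_base = "m/44'/0'/0'"
print(parse_path(wallet_base + "/0/1"))
# [2147483692, 2147483648, 2147483648, 0, 1]
```

Two addresses of the same wallet share the first three indices and differ only in the stages appended to the base path.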
The proposal makes it possible to have a list of derivation paths combined with the one seed-phrase in a single backup. This allows us to have one seed-phrase be the basis for multiple wallets: one backup for a nearly endless list of wallets and keys, which has great potential to improve privacy without requiring yet another backup.
One use case is linking your desktop wallet with a daily-usage phone wallet based on the same backup. This allows your desktop wallet to follow your phone wallet and send funds to a new, unused address every morning to top up the phone wallet. This concept is not impossible today, but an important part of security is making it easy. Such a setup becomes trivial the moment you start multiple wallets from the same backup.
Technical detail: the phone can locally store the already derived xpriv (BIP 32) from the backup, to make sure that compromising the phone does not compromise any secrets on the desktop.
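To make the technical detail concrete, here is a minimal sketch of hardened private-key derivation as described in BIP 32, using only the Python standard library. The function names and the placeholder seed are assumptions for illustration; a real wallet would also serialize the result as an xpriv and would normally rely on a reviewed library. The point it shows is that the phone only needs the key and chain code of its own hardened branch, which do not reveal the master key.

```python
import hashlib
import hmac

# Order of the secp256k1 curve, used by BIP 32 for private-key arithmetic.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
HARDENED = 0x80000000


def master_from_seed(seed: bytes) -> tuple[int, bytes]:
    """BIP 32 master key: HMAC-SHA512 keyed with the string "Bitcoin seed"."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return int.from_bytes(digest[:32], "big"), digest[32:]


def derive_hardened(key: int, chain_code: bytes, index: int) -> tuple[int, bytes]:
    """One hardened derivation step on a private key (no curve math needed)."""
    data = b"\x00" + key.to_bytes(32, "big") + (index + HARDENED).to_bytes(4, "big")
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    tweak = int.from_bytes(digest[:32], "big")
    child = (tweak + key) % N
    if tweak >= N or child == 0:
        raise ValueError("invalid child key; BIP 32 says to use the next index")
    return child, digest[32:]


def derive_account(seed: bytes, stages=(44, 0, 0)) -> tuple[int, bytes]:
    """Walk a fully hardened path such as m/44'/0'/0' from the seed."""
    key, chain_code = master_from_seed(seed)
    for index in stages:
        key, chain_code = derive_hardened(key, chain_code, index)
    return key, chain_code


seed = bytes(range(64))  # placeholder seed, for illustration only
desktop = derive_account(seed, (44, 0, 0))
phone = derive_account(seed, (44, 0, 1))  # hypothetical second wallet path
print(desktop != phone)  # True
```

In this sketch the desktop keeps the seed, while the phone would be handed only the `phone` key and chain code.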
A wallet backup generally exists to recover from disaster, which means end users typically never test it. A digital backup is thus much better than one written by hand on paper or similar: today, a mistake in hand-writing some numbers can cause loss of funds.
For decades there has been a product on the market called "NFC-tags": small, cheap and (very important) dumb data storage. They don't even have a battery or processor, as the process of reading the tag powers it through the antenna. In other words, they are ideal for storing a backup: the tag is offline and a reader has to come within about 4 inches (10 cm) to be able to read it. And it is digital, so there are no typos or bad-handwriting problems!
The downside is that they don't store a lot of data. The simplest store no more than 64 bytes. But it makes sense to turn that limitation into a positive because even with QR codes you don't want them to hold a large text as they get too detailed and too large which negatively affects longer term storage. Needing less data for the full backup is definitely useful.
We propose a binary format based on the basic bitcoin transaction-format primitives in order to have maximum compatibility while keeping the format small. A typical backup, including comment, totals 46 bytes.
Multi-sig and p2sh ownership
While the vast majority of transactions in use today are based on a Bitcoin address, there are many more possibilities on securing money on the Bitcoin Cash blockchain. There are various types of multi-sig. There is payment-to-script-hash and we should not exclude general scripts that don't fit existing templates.
From the perspective of backups there are two stages that are relevant.
A wallet should be able to find such transactions on the blockchain.
For this an imported wallet needs to explicitly search for types of transactions, or even very specific bitcoin-script templates that were used to create them. It follows that a backup needs to specify all types of transactions created in the past.
A wallet should have all the information needed to spend the money.
For pay-to-script-hash this means the wallet has to be able to recreate the entire script; for multi-sig we need to know who our partners are.
Many of these script types are not used much, and the technical challenge of supporting them is significantly larger than for the standard P2PKH script. Those challenges also bleed through into the UX and into doing backups. Probably the biggest issue is that a wallet may start using a new script type at any time, which invalidates any previously made backup: a wallet restored from that old backup will not search for money stored in the newly used script type.
To satisfy the first requirement, we could just include a sha256 hash over the output. But that won't help us spend it since we need to actually unlock the coin. In the p2sh case spending such an output requires appending the actual full script.
It probably is best to expand this specification in a next version when more wallets have been using such types of scripts and we may find a common way of storing them which is generic enough to standardize on.
CashAddress encoding and QR-codes
The backup file itself is a binary format, following the ideas and concepts Satoshi designed for the transaction format. The benefit is that the backup is small enough to be stored offline on a dumb NFC-tag.
It does make sense to define different ways to store or transfer this backup file. One method is as a QR code; QR codes don't have to contain text. The small file size also leads to a compact QR code.
For human-readable usage we suggest using the cash-encoding standard with the prefix "bchbackup". This standard supports encoding files up to 64 bytes in size. If we apply this to the example backup (annotated) and zero-pad it to 48 bytes, we get the following cash-address formatted string:
To compare with the xpriv standard, we include all the same info in the above string as well as a comment that is not in the xpriv, while still being smaller:
The bchbackup encoded text may be used in a QR code as well, which may be preferable for wallets that normally don't have access to a camera and rely on users using the clipboard somehow. The result gets larger, naturally, but is still manageable.
|example as a QR||bchbackup formatted|
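The zero-padding step mentioned above can be sketched as follows. This assumes that the "cash-encoding standard" refers to the CashAddr format, whose version byte only allows a fixed set of payload sizes (160 to 512 bits); that assumption would explain both the 64-byte limit and the padding from 46 to 48 bytes. The function name `pad_for_cashaddr` is hypothetical, and the sketch stops before the base32 encoding itself.

```python
# Payload sizes (in bytes) that the CashAddr version byte can express.
CASHADDR_PAYLOAD_SIZES = (20, 24, 28, 32, 40, 48, 56, 64)


def pad_for_cashaddr(backup: bytes) -> bytes:
    """Zero-pad a backup up to the next payload size CashAddr allows."""
    for size in CASHADDR_PAYLOAD_SIZES:
        if len(backup) <= size:
            return backup + bytes(size - len(backup))
    raise ValueError("backup is larger than 64 bytes and cannot be encoded")


padded = pad_for_cashaddr(bytes(46))  # stand-in for a 46-byte backup
print(len(padded))  # 48
```

Padding at the end fits the note in the format table that higher versions add new fields at the end, although how a reader should treat trailing zero bytes is not spelled out here.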
| Field | Size (bytes) | Format | Description |
|---|---|---|---|
| version | 1 | unsigned byte | Version, currently 1. Higher versions may add new fields at the end |
| secret-length | variable | variable length integer | The size of the secret in bytes |
| secret | variable | bytes(BE) | The secret, as defined above |
| language | 1 | unsigned byte | Language-code for dictionary |
| message-size | variable | variable length integer | The size of the message in bytes |
| message | variable | utf-8 string | A text message the owner can store here to describe this backup |
| paths count | variable | variable length integer | The number of paths in this backup |
| Paths | variable | | paths_count paths (see the table below) |

| Field | Size (bytes) | Format | Description |
|---|---|---|---|
| path length | variable | variable length integer | The size of the path in bytes |
| path | variable | latin1-string | The derivation path |
| start_height | variable | variable length integer | The block-height on the BCH chain where this wallet started |
| message-size | variable | variable length integer | The size of the message in bytes |
| message | variable | utf-8 string | A text message the owner can store here to describe this path |
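The field layout can be sketched as a small serializer. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, and the "variable length integer" is assumed to follow the prefix scheme of the Bitcoin transaction format (one byte below 0xfd, otherwise a 0xfd/0xfe/0xff marker followed by 2, 4 or 8 bytes). Bitcoin stores those multi-byte values little-endian, but the start-height bytes in the example backup further down (`fe 00 0b b0 30`) only read as a plausible block height (766000) when taken big-endian, so the byte order is left as a parameter.

```python
def write_varint(value: int, byteorder: str = "little") -> bytes:
    """Variable length integer with Bitcoin-style size markers."""
    if value < 0xFD:
        return bytes([value])
    if value <= 0xFFFF:
        return b"\xfd" + value.to_bytes(2, byteorder)
    if value <= 0xFFFFFFFF:
        return b"\xfe" + value.to_bytes(4, byteorder)
    return b"\xff" + value.to_bytes(8, byteorder)


def encode_backup(secret: bytes, language: int, message: str,
                  paths: list, byteorder: str = "little") -> bytes:
    """Serialize the fields in table order; paths are (path, height, note)."""
    out = bytearray([1])  # version, currently 1
    out += write_varint(len(secret), byteorder) + secret
    out.append(language)
    text = message.encode("utf-8")
    out += write_varint(len(text), byteorder) + text
    out += write_varint(len(paths), byteorder)
    for path, start_height, note in paths:
        raw_path = path.encode("latin-1")
        raw_note = note.encode("utf-8")
        out += write_varint(len(raw_path), byteorder) + raw_path
        out += write_varint(start_height, byteorder)
        out += write_varint(len(raw_note), byteorder) + raw_note
    return bytes(out)


secret = bytes([0x7F]) * 16 + b"\x80"  # the example secret used in this CHIP
backup = encode_backup(secret, 0, "Hello!", [("m/44'/0'/0'", 766000, "")],
                       byteorder="big")
print(len(backup))  # 46
```

With `byteorder="big"` these inputs reproduce the 46 example bytes shown later in this document; with the little-endian default only the four start-height bytes differ.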
In most of the crypto world, wallet backups are done using the 'mnemonic phrase', sometimes also called 'seed-phrase'. As explained in the introduction, this leaves room for improvement in wallet UX. In those wallets, "making a backup" is an involved process designed to mitigate these user experience issues.
The 12-word phrase (or 24 words, if you want more entropy) can be used directly to create the HD master key using plain hashing (in BIP39, PBKDF2 with HMAC-SHA512), which is great for wallets. Somewhat surprisingly, to validate the seed-phrase and its included checksum, a wallet still needs the word lists in order to convert the seed-phrase to a single byte array of 17 bytes (11 bits per word, so 12 words is 16½ bytes).
In the context of the backup-format CHIP, the "secret" is this bytearray. This bytearray has a one-to-one translation to the much used seed-phrase. We choose to store this much shorter version in our digital backup-file while preserving the checksum and thus any app can trivially convert the secret to a seed-phrase again.
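The one-to-one translation can be sketched as follows: the secret is read as one big-endian number, the unused padding bits at the end are dropped, and the rest is cut into 11-bit word indices. The function name is hypothetical, and looking the indices up in a BIP39 word list is left out because the 2048-word lists are not reproduced here.

```python
def secret_to_word_indices(secret: bytes) -> list[int]:
    """Split a secret into 11-bit BIP39 word indices (0..2047)."""
    total_bits = len(secret) * 8
    word_count = total_bits // 11            # 17 bytes = 136 bits = 12 words
    padding = total_bits - word_count * 11   # 4 unused low bits for 12 words
    value = int.from_bytes(secret, "big") >> padding
    return [(value >> (11 * (word_count - 1 - i))) & 0x7FF
            for i in range(word_count)]


# The example secret used further down in this CHIP.
secret = bytes([0x7F]) * 16 + b"\x80"
indices = secret_to_word_indices(secret)
print(len(indices), indices[0], indices[-1])  # 12 1019 2040
```

Each index selects one word from the dictionary named by the language field, so the same secret yields a different phrase per language.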
The secret is a compact version of the traditional seed-phrase / mnemonic. While the vast majority of people use the English one, there are 10 different language word-lists known at time of writing this CHIP. To restore the HD-masterkey, the appropriate dictionary must be used to expand the secret.
For more info see the original BIP39 word lists.
Please use the following table to find out the mapping between language-list and integer stored in the format.
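Let's create a backup based on an example BIP39 mnemonic from this list. Our example uses the derivation path that appears in the resulting backup further down, together with the following seed-phrase: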
legal winner thank year wave sausage worth useful legal winner thank yellow
Converting this to a simple byte array using the word list, following the BIP39 spec, gives the 16 bytes as specified in the linked JSON, but you also need to attach the checksum. The checksum is 4 bits long and is stored in the high nibble of one extra byte appended at the end. The checksum byte in this case is 0x80. Again, all of this is standard BIP39.
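The checksum step can be checked with a few lines of Python. In BIP39 the checksum is the first bits of the SHA-256 hash of the entropy, one bit per 32 bits of entropy; the sketch below appends those bits left-aligned in one extra byte, as described above. The function name `entropy_to_secret` is an assumption for illustration.

```python
import hashlib


def entropy_to_secret(entropy: bytes) -> bytes:
    """Append the BIP39 checksum bits, left-aligned in one extra byte."""
    checksum_bits = len(entropy) * 8 // 32       # 4 bits for 16 bytes
    first_byte = hashlib.sha256(entropy).digest()[0]
    mask = (0xFF << (8 - checksum_bits)) & 0xFF  # 0xF0 for a 4-bit checksum
    return entropy + bytes([first_byte & mask])


entropy = bytes([0x7F]) * 16  # entropy of the example seed-phrase
secret = entropy_to_secret(entropy)
print(len(secret), hex(secret[-1]))  # 17 0x80
```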
Other values are just filled in (see example), which gives the resulting 46-byte backup:
00000000: 01 11 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f
00000010: 7f 7f 80 00 06 48 65 6c 6c 6f 21 01 0b 6d 2f 34
00000020: 34 27 2f 30 27 2f 30 27 fe 00 0b b0 30 00
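To show how the bytes above map onto the fields of the format, here is a minimal decoding sketch. The helper names are hypothetical, it does no bounds checking, and it assumes Bitcoin-style size markers for the variable length integers. One open point for implementers: the start height bytes `fe 00 0b b0 30` give 766000 only when the four bytes after the marker are read big-endian, whereas Bitcoin's own encoding is little-endian, so the byte order is a parameter here.

```python
def read_varint(buf: bytes, pos: int, byteorder: str) -> tuple[int, int]:
    """Return (value, new position) for a variable length integer."""
    first = buf[pos]
    pos += 1
    if first < 0xFD:
        return first, pos
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    return int.from_bytes(buf[pos:pos + width], byteorder), pos + width


def read_blob(buf: bytes, pos: int, byteorder: str) -> tuple[bytes, int]:
    """Read a length-prefixed run of bytes."""
    size, pos = read_varint(buf, pos, byteorder)
    return buf[pos:pos + size], pos + size


def decode_backup(buf: bytes, byteorder: str = "little") -> dict:
    version, pos = buf[0], 1
    secret, pos = read_blob(buf, pos, byteorder)
    language, pos = buf[pos], pos + 1
    message, pos = read_blob(buf, pos, byteorder)
    count, pos = read_varint(buf, pos, byteorder)
    paths = []
    for _ in range(count):
        path, pos = read_blob(buf, pos, byteorder)
        height, pos = read_varint(buf, pos, byteorder)
        note, pos = read_blob(buf, pos, byteorder)
        paths.append((path.decode("latin-1"), height, note.decode("utf-8")))
    return {"version": version, "secret": secret, "language": language,
            "message": message.decode("utf-8"), "paths": paths}


EXAMPLE = bytes.fromhex(
    "01 11 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f"
    "7f 7f 80 00 06 48 65 6c 6c 6f 21 01 0b 6d 2f 34"
    "34 27 2f 30 27 2f 30 27 fe 00 0b b0 30 00"
)
backup = decode_backup(EXAMPLE, byteorder="big")
print(backup["message"], backup["paths"])
# Hello! [("m/44'/0'/0'", 766000, '')]
```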
Prior Art & Alternatives
The Cashual wallet, unmaintained for 3 years, added a network-backup and NFC-tag backup feature (source). The strategy there was to simply back up the xpriv, which is both more data and more limiting than the proposal laid out in this CHIP.
Feedback & Reviews
Sahid asked about having a password / encryption on this backup.
The basis of this proposal is built on BIP 39. This proposal has as 'secret' an alternative formatting of the mnemonic, and in bip39 you find the topic From mnemonic to seed. In that part you'll also be able to find the usage of a 'passphrase'.
As such the requirement of allowing users to store separately a passphrase is already present in fully supporting bip39 based wallets. As a result, using this specification wallets can use the existing way to password protect their backups.
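For reference, the BIP39 step from mnemonic to seed can be sketched with the Python standard library. BIP39 specifies PBKDF2 with HMAC-SHA512, 2048 iterations, and the salt "mnemonic" followed by the passphrase. The function name is an assumption, and the passphrase shown is a made-up placeholder.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """BIP39 'From mnemonic to seed': the passphrase only enters via the salt."""
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, 64)


phrase = ("legal winner thank year wave sausage "
          "worth useful legal winner thank yellow")
plain = mnemonic_to_seed(phrase)
protected = mnemonic_to_seed(phrase, "example passphrase")  # placeholder value
print(len(plain), plain != protected)  # 64 True
```

Because the passphrase is never part of the secret stored in the backup, someone who reads the backup alone would derive a different seed, and therefore a different wallet, than the owner who also knows the passphrase.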
Copyright (C) 2022-2023 Tom Zander
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.