Participant recovery
Being able to recover from disasters easily and safely is core to custody, which is why we strive to have backup mechanisms that are easy, flexible, and secure.
This guide covers recovering from the total loss of a node for which you have no recent backup. A destroyed node can be recovered using information from the remaining nodes instead of a normal snapshot.
Prerequisites
You will need:
- Any old snapshot of the destroyed node -- this is to recover identity information of the node. This snapshot could have been taken immediately after the node was first set up.
- Decryption phrase used by the destroyed node. If you do not have this, then you may need to start a new cluster.
- Encrypted shares gathered from the remaining participants. By default, nodes will store the encrypted shares of all other nodes.
Take a snapshot of a healthy node
We need a recent copy of the latest policy state. Note that this snapshot does not contain any key material, so it is not encrypted.
cord backup snapshot --home ${TREASURY_HOME} --output healthy-snapshot.tar --engine
If you already have a recent snapshot, you can use that -- it does not need to be decrypted.
Gather encrypted key shares
Encrypted key shares are saved on all nodes to the backup directory. By default this will be in $TREASURY_HOME/backups but could be configured to a separate directory.
ls $TREASURY_BACKUP_DIR/*/participants/*/shares
Alternatively, you can also untar recent snapshot(s) of any node to find encrypted/signer-shares.tar.gz, which will contain the contents of $TREASURY_BACKUP_DIR/participants.
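If you go the snapshot route, the extraction can be sketched as below. This is a minimal sketch, not part of the cord tooling: the function name extract_shares_archive and its arguments are hypothetical, and it assumes the snapshot is a plain tar archive whose member path ends in encrypted/signer-shares.tar.gz.

```shell
# Hypothetical helper: pull encrypted/signer-shares.tar.gz out of a snapshot
# and unpack it into a destination directory.
# Assumption: the snapshot is a plain (uncompressed) tar archive.
extract_shares_archive() {
  snapshot="$1"   # path to a snapshot of any node
  dest="$2"       # directory to unpack the shares into

  # Look up the exact member name, since it may carry a leading "./".
  member=$(tar -tf "$snapshot" | grep 'encrypted/signer-shares\.tar\.gz$' | head -n 1) || true
  if [ -z "$member" ]; then
    echo "no encrypted/signer-shares.tar.gz in $snapshot" >&2
    return 1
  fi

  mkdir -p "$dest"
  # Extract only the shares archive, then unpack its contents next to it.
  tar -xf "$snapshot" -C "$dest" "$member" &&
  tar -xzf "$dest/$member" -C "$dest"
}
```

For example, `extract_shares_archive healthy-snapshot.tar ./shares-from-snapshot` would leave the unpacked share files under ./shares-from-snapshot.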
Shares will be organized by the backup public key used to encrypt them, and by participant.
Gather the key shares for the participant (e.g. "2") that you wish to restore.
Double check you have the correct backup key using cord backup verify-phrase.
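Gathering the shares for one participant can be scripted. The sketch below is illustrative only: collect_shares is a hypothetical helper, and it assumes the layout suggested by the ls glob in this section, that is `<backup-dir>/<backup-key>/participants/<participant>/shares`. Check the layout on your own nodes before relying on it.

```shell
# Hypothetical helper: copy every encrypted share stored for one participant
# into a staging directory, keeping the per-key directory structure.
# Assumed layout: <backup-dir>/<backup-key>/participants/<participant>/shares
collect_shares() {
  backup_dir="$1"    # e.g. "$TREASURY_BACKUP_DIR"
  participant="$2"   # e.g. 2
  dest="$3"          # staging directory (created if missing)

  found=0
  for src in "$backup_dir"/*/participants/"$participant"/shares; do
    [ -e "$src" ] || continue          # the glob matched nothing
    rel="${src#"$backup_dir"/}"        # path relative to the backup dir
    key="${rel%%/*}"                   # first component: the backup key
    mkdir -p "$dest/$key/participants/$participant"
    cp -R "$src" "$dest/$key/participants/$participant/"
    found=$((found + 1))
  done

  if [ "$found" -eq 0 ]; then
    echo "no shares found for participant $participant" >&2
    return 1
  fi
  echo "$found"   # number of share locations copied
}
```

Running `collect_shares "$TREASURY_BACKUP_DIR" 2 ./shares-for-2` on each healthy node, and then merging the results, gives you a single directory to hand to the import step. Because the shares are grouped by backup key, you can see at a glance which key each share was encrypted with.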
If you do not have the encrypted shares
You can calculate the missing shares using a threshold number of healthy nodes. For example, in a 3-of-4 cluster, the 3 healthy nodes can collaborate to calculate the shares of the destroyed node.
Restore from snapshot
Initialize the node using an old snapshot. The snapshot must be from the participant you are restoring, not from a different participant.
cord backup restore --snapshot "old-snapshot.tar"
Now, apply the engine/policy state (this can come from any participant). This step does not decrypt anything.
cord backup restore --snapshot "healthy-snapshot.tar" --home ${TREASURY_HOME} --no-secrets --no-configs
Finally, import the key shares that you gathered (these can come from a single participant or from several).
signer backup import --db ${TREASURY_HOME}/signer.db --import-dir "<path/to/shares-dir>"
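The three restore steps must run in this order, so it can help to wrap them in one function that stops at the first failure. The sketch below reuses the commands and flags exactly as they appear in this guide. The function name restore_participant and its arguments are hypothetical.

```shell
# Sketch: run the three restore steps in order, stopping at the first failure.
# Commands and flags are copied from this guide; the wrapper is hypothetical.
restore_participant() {
  old_snapshot="$1"       # old snapshot of the destroyed node (identity)
  healthy_snapshot="$2"   # recent snapshot from any healthy node (policy state)
  shares_dir="$3"         # directory holding the gathered encrypted shares

  if [ -z "${TREASURY_HOME:-}" ]; then
    echo "TREASURY_HOME must be set" >&2
    return 1
  fi

  # 1. Restore node identity, 2. apply engine/policy state, 3. import shares.
  cord backup restore --snapshot "$old_snapshot" &&
  cord backup restore --snapshot "$healthy_snapshot" --home "${TREASURY_HOME}" --no-secrets --no-configs &&
  signer backup import --db "${TREASURY_HOME}/signer.db" --import-dir "$shares_dir"
}
```

A call might look like `restore_participant old-snapshot.tar healthy-snapshot.tar ./shares-for-2`. Chaining with `&&` means a failed identity restore never proceeds to the share import.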