Participant recovery

Being able to recover from disasters easily and safely is core to custody, which is why we strive to have backup mechanisms that are easy, flexible, and secure.

This guide covers recovery after the total loss of a node for which you have no recent backup. You can rebuild the destroyed node from information held by the remaining nodes instead of from a normal snapshot.

Prerequisites

You will need:

Any old snapshot of the destroyed node -- this is used to recover the node's identity information. The snapshot could have been taken immediately after the node was first set up.
The decryption phrase used by the destroyed node. If you do not have this, then you may need to start a new cluster.
Encrypted shares gathered from the remaining participants. By default, nodes will store the encrypted shares of all other nodes.
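Before starting the restore, it can help to confirm that the file-based prerequisites are actually in place. The following is a minimal sketch, not part of the cord tooling: check_recovery_inputs is a hypothetical helper, and it only checks that the old snapshot is a readable tar archive and that the gathered shares directory is not empty. It cannot check the decryption phrase.

```shell
#!/bin/sh
# Hypothetical preflight helper (not part of cord): checks the inputs that exist as files.
check_recovery_inputs() {
    old_snapshot=$1
    shares_dir=$2

    # The old snapshot must be a tar archive that can be listed.
    if ! tar -tf "$old_snapshot" >/dev/null 2>&1; then
        echo "cannot read snapshot: $old_snapshot" >&2
        return 1
    fi

    # The shares directory must exist and contain at least one entry.
    if [ ! -d "$shares_dir" ] || [ -z "$(ls -A "$shares_dir")" ]; then
        echo "no shares found in: $shares_dir" >&2
        return 1
    fi
}
```

For example, check_recovery_inputs old-snapshot.tar ./shares-for-2 returns non-zero and prints a reason if either input is missing.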

Take a snapshot of a healthy node

You need a recent copy of the latest policy state. This snapshot does not contain any key material, so it is not encrypted.

cord backup snapshot --home ${TREASURY_HOME} --output healthy-snapshot.tar --engine

If you already have a recent snapshot, you can use that -- it does not need to be decrypted.

Gather encrypted key shares

Encrypted key shares are saved on all nodes to the backup directory. By default this will be in $TREASURY_HOME/backups but could be configured to a separate directory.

ls $TREASURY_BACKUP_DIR/*/participants/*/shares

Alternatively, you can untar a recent snapshot of any node to find encrypted/signer-shares.tar.gz, which contains the contents of $TREASURY_BACKUP_DIR/participants.
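As a sketch of the snapshot route, the helper below pulls the shares archive out of a snapshot and unpacks it. extract_shares_archive is a hypothetical name, and the code assumes the archive is stored in the snapshot under exactly the member path encrypted/signer-shares.tar.gz. If your snapshot uses a different prefix (for example ./encrypted/...), list it with tar -tf first and adjust the path.

```shell
#!/bin/sh
# Hypothetical helper: unpack the encrypted shares archive from a node snapshot.
extract_shares_archive() {
    snapshot=$1
    dest=$2

    mkdir -p "$dest/shares" || return 1

    # Extract only the shares archive from the snapshot tarball.
    # Assumption: the member path is exactly encrypted/signer-shares.tar.gz.
    tar -xf "$snapshot" -C "$dest" encrypted/signer-shares.tar.gz || return 1

    # Unpack the still-encrypted shares; nothing is decrypted here.
    tar -xzf "$dest/encrypted/signer-shares.tar.gz" -C "$dest/shares"
}
```

After extract_shares_archive healthy-snapshot.tar ./extracted, the unpacked shares are under ./extracted/shares.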

Shares will be organized by the backup public key used to encrypt them, and by participant. Gather the key shares for the participant (e.g. "2") that you wish to restore.
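The gathering step can be sketched as a small helper that copies every shares directory for one participant into a single working directory. gather_shares is a hypothetical name. The source layout, <backup dir>/<backup public key>/participants/<participant>/shares, is inferred from the ls pattern above. The destination layout, one subdirectory per backup key, is an assumption for illustration and may not be what the import step expects.

```shell
#!/bin/sh
# Hypothetical helper: collect the encrypted shares stored for one participant.
# Assumed source layout: <backup_dir>/<backup-public-key>/participants/<participant>/shares
gather_shares() {
    backup_dir=$1
    participant=$2
    dest=$3
    found=0

    for src in "$backup_dir"/*/participants/"$participant"/shares; do
        # An unmatched glob stays literal, so skip anything that is not a directory.
        [ -d "$src" ] || continue

        # Three levels up from .../shares is the backup-public-key directory.
        key_dir=$(basename "$(dirname "$(dirname "$(dirname "$src")")")")

        # Keep shares encrypted under different backup keys apart.
        mkdir -p "$dest/$key_dir" || return 1
        cp -R "$src"/. "$dest/$key_dir"/ || return 1
        found=$((found + 1))
    done

    # Fail if no shares directory was found for this participant.
    [ "$found" -gt 0 ]
}
```

For example, gather_shares "$TREASURY_BACKUP_DIR" 2 ./shares-for-2 collects the shares for participant "2" and returns non-zero if none exist.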

Double check you have the correct backup key using cord backup verify-phrase.

If you do not have the encrypted shares

You can calculate the missing shares using a threshold number of healthy nodes. For example, in a 3-of-4 cluster, the 3 healthy nodes can collaborate to calculate the shares of the destroyed node.

Restore from snapshot

Initialize the node using an old snapshot. This snapshot must come from the participant you are restoring, not from a different participant.

cord backup restore --snapshot "old-snapshot.tar"

Now, apply the engine/policy state (this can come from any participant). This step does not decrypt anything.

cord backup restore --snapshot "healthy-snapshot.tar" --home ${TREASURY_HOME} --no-secrets --no-configs

Finally, import the key shares that you gathered (these can come from a single participant or from several).

signer backup import --db ${TREASURY_HOME}/signer.db --import-dir "<path/to/shares-dir>"
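The three restore steps above can be chained in a small wrapper so that a failed step stops the sequence. This is a sketch only: run and restore_participant are hypothetical helpers, and the cord and signer invocations are copied from the commands above without any additional flags. Setting DRY_RUN=1 prints the commands instead of executing them, which is useful for reviewing the sequence first.

```shell
#!/bin/sh
# Hypothetical wrapper around the three restore commands shown above.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

restore_participant() {
    old_snapshot=$1      # old snapshot of the destroyed participant
    healthy_snapshot=$2  # recent snapshot from any healthy node
    shares_dir=$3        # directory holding the gathered encrypted shares
    home=$4              # the node's TREASURY_HOME

    # 1. Restore identity, 2. apply engine/policy state, 3. import key shares.
    # Each step runs only if the previous one succeeded.
    run cord backup restore --snapshot "$old_snapshot" &&
    run cord backup restore --snapshot "$healthy_snapshot" --home "$home" --no-secrets --no-configs &&
    run signer backup import --db "$home/signer.db" --import-dir "$shares_dir"
}
```

For example, DRY_RUN=1 followed by restore_participant old-snapshot.tar healthy-snapshot.tar ./shares-for-2 "$TREASURY_HOME" prints the three commands in order without touching the node.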
