StakeWise Infra Package
The StakeWise team has developed a set of Helm charts that provision ETH2 staking infrastructure on top of a Kubernetes cluster. The charts contain all the components needed to deploy a production-grade setup with the following features:
- Fault tolerance for servers: when any of the servers with the validators fails, all of the validators on that server will migrate to another one while preserving the slashing database. Similarly, the ETH1/ETH2 nodes, monitoring tools, and other services will also migrate to other nodes in case the current one fails.
- Interoperable: a Kubernetes cluster can be deployed on all of the major clouds and on bare metal. As a result, the staking setup is infrastructure-agnostic.
- ETH1 and ETH2 client diversity: the Helm charts support various vendors of both ETH1 (GoEthereum, Erigon, OpenEthereum, Nethermind) and ETH2 (Prysm, Lighthouse, Teku, Nimbus) clients. Deploying another set of replicas of a different client can be done by updating values.yaml and calling an upgrade command.
- Fault tolerance for ETH1 nodes: the recommended setup deploys two replicas of ETH1 nodes and uses Gateway.fm, Infura, Alchemy or any other hosted service as a fallback. As a result, if one of the ETH1 nodes fails, the ETH2 nodes connected to that failed node will switch to the second ETH1 node. If it happens that both ETH1 nodes fail, the ETH2 nodes will fall back to the hosted service.
- Fault tolerance for ETH2 beacon nodes: the recommended setup deploys three replicas of the primary ETH2 client and one replica of the stand-by ETH2 client. The validators are evenly distributed across the primary replicas and automatically switch to another primary replica if the connection to their current one fails. If there is an issue with all of the primary nodes, the validators can migrate to the stand-by client without waiting for it to sync the chain.
- Secure validator key storage: HashiCorp Vault is used for storing the validator keys. Only the validators that are responsible for hosting the keys can fetch them from the vault. The keys are synced to the vault using the operator CLI, which also ensures that there are no duplicates and cleans up keys that have exited from staking.
- Automated slashing database migration: when validators are migrated from one ETH2 client to another, the slashing database is converted to the format accepted by the new ETH2 client.
- Close to zero downtime upgrades: application upgrades are fully automated. For instance, the ETH1 and ETH2 nodes are replaced with the new versions one by one, so validators can continue attesting and proposing blocks while connected to another instance that hasn't yet been upgraded. The validators are also upgraded one by one, which ensures that there are never multiple instances running with the same validator keys simultaneously.
- Monitoring and alerts: all of the applications deployed with the Helm charts expose the relevant metrics to Prometheus, where they can be visualized with Grafana. Alertmanager can also be used to notify the cluster admins about important events.
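
The ETH1 failover order described above (first in-cluster replica, then the second, then the hosted service) can be illustrated with a small Python sketch. This is only a model of the selection order, not code from the charts: the function name, the endpoint URLs, and the injected `is_healthy` check are all hypothetical, and in a real deployment the ETH2 clients and Kubernetes handle the switching.

```python
from typing import Callable, Optional, Sequence


def pick_eth1_endpoint(
    local_nodes: Sequence[str],
    hosted_fallback: str,
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy endpoint, preferring in-cluster ETH1 nodes."""
    # Try the in-cluster replicas in order.
    for url in local_nodes:
        if is_healthy(url):
            return url
    # All in-cluster replicas are down: fall back to the hosted service.
    if is_healthy(hosted_fallback):
        return hosted_fallback
    # Nothing is reachable.
    return None


# Hypothetical endpoints, for illustration only.
LOCAL = ["http://eth1-0:8545", "http://eth1-1:8545"]
HOSTED = "https://hosted.example/v1"

# Simulate a failure of the first in-cluster replica.
down = {"http://eth1-0:8545"}
print(pick_eth1_endpoint(LOCAL, HOSTED, lambda url: url not in down))
```

Passing the health check in as a function keeps the sketch free of network calls; a real check might query the node's JSON-RPC endpoint instead.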