Data at Rest Encryption

Introduction


Data at rest encryption refers to encrypting data stored on a server’s disk. If an unauthorized user accesses data files on the file system, encryption ensures the user cannot read their contents. Percona XtraDB Cluster 8.4 inherits from Percona Server for MySQL 8.4 the ability to enable, disable, and apply encryption to the following objects:

Data at rest encryption in Percona XtraDB Cluster

  • File-per-table tablespace

  • Schema

  • General tablespace

  • System tablespace

  • Temporary table

  • Binary log files

  • Redo log files

  • Undo tablespaces

  • Doublewrite buffer files

Data in transit is data transmitted to another node or to a client. Data in transit is encrypted using an SSL connection.

Percona XtraDB Cluster 8.4 supports all data at rest encryption features available from Percona Server for MySQL 8.4.


Use the component_keyring_file

Configuration

Percona XtraDB Cluster inherits the Percona Server for MySQL behavior for configuring the component_keyring_file component. The following example illustrates how to use the component. See Use the keyring vault component for the latest information on keyring components.

Note

The component_keyring_file component should not be used for regulatory compliance.

Install the component using a manifest file and add the following options in the configuration file:

Create a manifest file named mysqld.my in the installation directory:

{
 "read_local_manifest": false,
 "components": "file://component_keyring_file"
}

Add the following options to your configuration file:

[mysqld]
component_keyring_file_data=<PATH>/keyring

The SHOW COMPONENTS statement checks if the component has been successfully loaded:

SHOW COMPONENTS;
Expected output
+----------------------------------------+
| Component_id                           |
+----------------------------------------+
| file://component_keyring_file          |
+----------------------------------------+

Note

Percona recommends the same keyring configuration on all cluster nodes, and every node should have the keyring configured. A mismatch in the keyring configuration prevents the JOINER node from joining the cluster.

If a node is bootstrapped with the keyring enabled, nodes that join the cluster later inherit the keyring (the encrypted keys) from the DONOR node.

Usage

During SST, XtraBackup re-encrypts the data using a transition key, and the JOINER node then re-encrypts it using a newly generated master key.


Percona XtraDB Cluster does not allow combining nodes with encryption and nodes without encryption to maintain data consistency. For example, the user creates node-1 with encryption (keyring) enabled and node-2 with encryption (keyring) disabled. If the user attempts to create a table with encryption on node-1, the creation fails on node-2, causing data inconsistency. A node fails to start if it fails to load the keyring component.

Note

If the user does not specify the keyring parameters, the node does not know that it must load the keyring. The JOINER node may start, but it eventually shuts down when a DML-level inconsistency with the encrypted tablespace is detected.

If a node does not have an encrypted tablespace, its keyring file exists but is empty. When an encrypted table is created on that node, the keyring file becomes populated with the required encryption keys.

The JOINER node generates a keyring local to itself. InnoDB master key rotation, cluster-wide considerations, and backups are discussed in Operational maintenance and Backups and restore for encrypted clusters.

Compatibility

The Percona XtraDB Cluster SST process with keyring support is backward compatible. A higher-version JOINER can join from a lower-version DONOR, but the reverse is not supported.

Operational maintenance

Rotate the InnoDB master key

ALTER INSTANCE ROTATE INNODB MASTER KEY replaces the InnoDB master encryption key and re-encrypts existing tablespace keys inside the keyring on the server where the statement runs. Encrypted pages on disk are not immediately rewritten; instead, data pages are re-encrypted with the new master key as they are loaded into memory and then written back to disk, following the standard InnoDB process.

InnoDB master key and tablespace key relationship

Run rotation on a live member:

ALTER INSTANCE ROTATE INNODB MASTER KEY;
Expected outcome (client)
Query OK, 0 rows affected (0.03 sec)

Check the error log around the time of rotation for keyring or InnoDB messages. If rotation fails, do not assume later DML on encrypted objects is safe. Investigate and correct the failure first.

Replication and cluster behavior

Galera replication does not replicate ALTER INSTANCE ROTATE INNODB MASTER KEY automatically. Run the statement on each member to update that member’s keyring state.

With binary logging, ALTER INSTANCE ROTATE INNODB MASTER KEY is recorded, but PXC peers do not synchronize the key automatically. Plan to rotate on each cluster member as needed.

Example: asynchronous replica reading the PXC binary log

Suppose a traditional asynchronous replica uses CHANGE REPLICATION SOURCE TO (or legacy CHANGE MASTER TO) against one PXC writer, binary logging is enabled on the writer, and the replica applies events from the writer’s binary log.

  1. On the PXC writer, run ALTER INSTANCE ROTATE INNODB MASTER KEY;. The statement succeeds and the server records the DDL in the binary log when binary logging is on.
  2. The replica applies the same statement from the relay log. That execution rotates the InnoDB master key on the replica using the replica’s own keyring (component_keyring_file path or component_keyring_vault token and secret_mount_point on the replica host).
  3. The replica must already load the keyring component; the account that applies the event needs the ENCRYPTION_KEY_ADMIN privilege. With Vault, the replica must reach Vault with a valid token. A replica without a working keyring cannot complete the same rotation step the writer performed.

Operational pattern: finish the rolling rotation on every PXC member first. Then explicitly check that the asynchronous replica has successfully applied the binary log event without errors (using SHOW REPLICA STATUS\G and the replica error log). If the replica is intentionally lagging or its replication is filtered, either plan a separate rotation on the replica after completing the cluster’s rotation, or deliberately accept that the replica’s master key age will diverge until rotation is performed on the replica. Never assume the replica’s keyring state matches a PXC node’s; always verify directly.

SHOW REPLICA STATUS\G
Expected outcome (excerpt, healthy apply path)
...
             Replica_IO_Running: Yes
            Replica_SQL_Running: Yes
              Last_SQL_Errno: 0
              Last_SQL_Error:
...

If Last_SQL_Errno is non-zero or Replica_SQL_Running is No, read Last_SQL_Error and the replica error log before the next rotation on the writer.
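The healthy-apply check above can be scripted so automation gates the next rotation on the writer. A minimal sketch in POSIX shell; the function name and the idea of piping SHOW REPLICA STATUS output into it are assumptions, not a Percona-provided tool:

```shell
# check_replica_health: read `SHOW REPLICA STATUS\G` output on stdin and
# fail when either replication thread is stopped or the last SQL error
# number is non-zero.
check_replica_health() {
    awk '
        /Replica_IO_Running:/  { if ($2 != "Yes") bad = 1 }
        /Replica_SQL_Running:/ { if ($2 != "Yes") bad = 1 }
        /Last_SQL_Errno:/      { if ($2 != "0")  bad = 1 }
        END { exit bad }
    '
}

# Example invocation (requires client authentication configured on the host):
#   mysql -e 'SHOW REPLICA STATUS\G' | check_replica_health || echo "replica needs attention"
```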

Propagation through the cluster

There is no automatic cluster-wide rotation. To align key age or policy across the cluster, run ALTER INSTANCE ROTATE INNODB MASTER KEY on each cluster node in turn during a planned window. Suggested pattern:

Rolling InnoDB master key rotation across cluster members

  1. Pick one member at a time; avoid rotating all members during peak load if the extra I/O is a concern.

  2. On the chosen member, run ALTER INSTANCE ROTATE INNODB MASTER KEY; and confirm success in the log.

  3. Repeat for every remaining member. A node that has not been rotated yet keeps the previous master key material for that node until you run the statement on that node.

  4. After the last member completes rotation, verify encrypted workloads on each node (for example SELECT from encrypted tables, short maintenance queries).

Plan the order and timing so SST and normal replication traffic remain supported. Mixed states (some members rotated, some not) are normal for a short interval; prolonged divergence without a clear reason warrants investigation.
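The rolling pattern above can be driven by a small script that stops at the first failure, so a broken member is investigated before rotation continues. This is an illustrative sketch, not part of PXC: rotate_one is an assumption you replace with your client command and authentication method.

```shell
# rotate_one HOST: run the rotation statement on one member.
# Assumption: adjust client options (user, credential source) for your site.
rotate_one() {
    mysql -h "$1" -e "ALTER INSTANCE ROTATE INNODB MASTER KEY;"
}

# rotate_all "host1 host2 ...": rotate each member in turn and stop at the
# first failure so mixed states do not persist unnoticed.
rotate_all() {
    for node in $1; do
        if rotate_one "$node"; then
            echo "rotated: $node"
        else
            echo "rotation failed on $node; stop and investigate" >&2
            return 1
        fi
    done
}

# Example: rotate_all "pxc-node1 pxc-node2 pxc-node3"
```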

With component_keyring_file, each node holds a keyring file local to that node; rotation affects only the node where the command runs. With component_keyring_vault, several nodes may share the same Vault mount and secret paths depending on deployment; coordinate rotation with Vault policies and monitoring so every instance still resolves the same logical keys after the operation, and confirm connectivity to Vault before rotating on each node.

GCache encryption uses a separate rotation statement (ALTER INSTANCE ROTATE GCACHE MASTER KEY); see GCache and Write-Set cache encryption.

For more background on InnoDB encryption and rotation semantics, see Percona Server for MySQL: InnoDB data encryption.

SST configuration and the [sst] section

State Snapshot Transfer (SST) runs Percona XtraBackup on the donor and on the joiner. The SST script builds a default xtrabackup command line, and that default is enough for many installations. You need to extend the script when, for example, the keyring manifest or component_keyring_* paths live outside the usual layout, when xtrabackup needs an explicit --defaults-file, or when component/plugin directories differ from what the script assumes; in those cases, pass the missing flags yourself.

Encrypted cluster State Snapshot Transfer (SST) with XtraBackup

Add those flags under the [sst] group in my.cnf: inno-backup-opts (donor backup stage), inno-apply-opts (joiner prepare/apply stage), and inno-move-opts (joiner move stage). The SST script appends them to the corresponding xtrabackup invocation. Use the same paths and keyring-related options mysqld uses on that host so XtraBackup can read the same keys during SST.
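A sketch of such an [sst] section, assuming the server's option file lives at /etc/mysql/mysql.cnf (the path and the choice of flags are placeholders; pass whatever your layout requires):

```ini
[sst]
# Appended to the xtrabackup command during the donor backup stage
inno-backup-opts="--defaults-file=/etc/mysql/mysql.cnf"
# Appended during the joiner prepare/apply stage
inno-apply-opts="--defaults-file=/etc/mysql/mysql.cnf"
# Appended during the joiner move stage
inno-move-opts="--defaults-file=/etc/mysql/mysql.cnf"
```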

Timeouts, transfer method, compression, and SSL for the SST channel are also configured under [sst]. For a full list of parameters and examples, see Percona XtraBackup SST configuration.

Note

When using component_keyring_vault, SST must use a method that supports Vault (for example XtraBackup-based SST). The section Configure PXC to use component_keyring_vault component notes that rsync SST is not supported with component_keyring_vault.

Backups and restore for encrypted clusters

Encrypted tablespaces remain opaque without the matching keyring material. Backup design must cover both the data copy and whichever keystore backs the keys.

Use Percona XtraBackup 8.4 with PXC 8.4 (see XtraBackup SST dependencies). The flow below reflects a typical production backup path; adjust paths, users, and retention for your site.

Encrypted PXC backup with Percona XtraBackup

Percona XtraBackup: privileges for encrypted instances

Create a dedicated backup account on each node you back up. The following grants match the minimum set described in Percona XtraBackup connection and privileges for full backups (including SELECT on performance_schema.keyring_component_status):

CREATE USER 'bkpuser'@'localhost' IDENTIFIED BY 's3cr%T';
GRANT BACKUP_ADMIN, PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'bkpuser'@'localhost';
GRANT SELECT ON performance_schema.log_status TO 'bkpuser'@'localhost';
GRANT SELECT ON performance_schema.keyring_component_status TO 'bkpuser'@'localhost';
GRANT SELECT ON performance_schema.replication_group_members TO 'bkpuser'@'localhost';
FLUSH PRIVILEGES;
Expected outcome (client)
Query OK, 0 rows affected (0.01 sec)
...
Query OK, 0 rows affected (0.00 sec)

Replication topologies may also need REPLICATION_SLAVE_ADMIN on the account that runs XtraBackup (same upstream privileges page). Confirm with:

SHOW GRANTS FOR 'bkpuser'@'localhost';
Expected outcome (sample SHOW GRANTS output)
+-------------------------------------------------------------+
| Grants for bkpuser@localhost                                |
+-------------------------------------------------------------+
| GRANT BACKUP_ADMIN, PROCESS, RELOAD, LOCK TABLES, ... ON *.* |
| GRANT SELECT ON `performance_schema`.`log_status` TO ...    |
| GRANT SELECT ON `performance_schema`.`keyring_component_... |
...

Percona XtraBackup: backup, prepare, restore

Point XtraBackup at the same option file the server uses so the keyring manifest and component paths resolve the same way mysqld does. Example full backup to a dedicated directory:

xtrabackup --defaults-file=/etc/mysql/mysql.cnf --backup \
  --target-dir=/backup/$(date +%F)/full \
  --user=bkpuser --password='...'
Expected outcome (excerpt)
...
xtrabackup: Transaction log of LSN (...): (...), was copied.
xtrabackup: completed OK!

Prepare the backup (still offline, on the backup host or a staging host with enough space):

xtrabackup --prepare --target-dir=/backup/$(date +%F)/full
Expected outcome (excerpt)
...
xtrabackup: Shutdown completed; log sequence number ...
xtrabackup: completed OK!

Restore on a clean data directory (stop mysqld on the target, remove or empty the old datadir except what your runbook allows, fix ownership after copy):

xtrabackup --copy-back --target-dir=/backup/$(date +%F)/full
Expected outcome (excerpt)
...
xtrabackup: completed OK!

Use --move-back instead of --copy-back when the runbook requires moving rather than copying prepared files. Start mysqld only after file ownership matches the database user.

Incremental and partial backups add flags such as --incremental and --incremental-basedir; see the Percona XtraBackup 8.4 manual when you need those workflows—the encrypted-keyring requirement stays the same: the backup process must reach the same keys the server used.

Keyring-specific notes for backups

component_keyring_file

Include the file named by component_keyring_file_data in the backup scope (same snapshot window as the XtraBackup run, or a copy taken under your storage team's consistency rules). After --copy-back, place the keyring file where the restored my.cnf expects it before starting mysqld. If the keyring file is lost, the remaining .ibd files cannot be decrypted.
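A dry-run sketch of capturing the keyring file next to the backup target. The defaults here are temporary placeholders so the script is safe to run as-is; in real use, set KEYRING to the component_keyring_file_data path and TARGET to the backup directory:

```shell
# Copy the keyring file into the backup target so a restore has matching
# key material. Placeholders: KEYRING stands in for <PATH>/keyring and
# TARGET for something like /backup/$(date +%F).
KEYRING="${KEYRING:-$(mktemp)}"     # placeholder: real path from component_keyring_file_data
TARGET="${TARGET:-$(mktemp -d)}"    # placeholder: real backup directory

mkdir -p "$TARGET/keyring"
cp -p "$KEYRING" "$TARGET/keyring/" # -p preserves mode and timestamps
```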

component_keyring_vault

The xtrabackup process uses the server’s keyring configuration: component_keyring_vault.cnf must be valid on the host where backup runs, and that host must reach vault_url with a token that can read the same secrets as mysqld. Firewall and certificate requirements apply to the backup host as well as the database node.

XtraBackup does not run Vault AppRole login or token renewal itself. The binary loads the same kind of component configuration mysqld uses (paths come from --defaults-file and the manifest layout described in the Percona Server keyring vault component documentation). Whatever token value is in the JSON at backup time is what XtraBackup sends to Vault.

  • Same token as the server: a common pattern on the database host is to reuse the same component_keyring_vault.cnf that Vault Agent (or a template) keeps updated, and to run xtrabackup from cron or automation with --defaults-file pointing at that instance. The OS user that runs XtraBackup must be able to read the JSON, vault_ca if set, and any separate file the JSON references (for example an agent sink you copy from before backup). Align ownership and mode with your security model; the backup user does not have to be the mysql system user, but the keyring files must be readable for that user.
  • Separate Vault token or AppRole: not required by XtraBackup for correctness. Many teams still create a dedicated Vault policy and token (or AppRole consumed by a second agent or job) with read-only access to the same secret_mount_point paths, then deploy a second component_keyring_vault.cnf (or a backup-only defaults fragment) used only by the backup task. That limits blast radius if a backup host or cron credential leaks. XtraBackup only needs a token that can read the keys for that instance’s mount; whether that token is shared with mysqld or not is an operational choice.

Remote or jump-host backups must ship the same configuration shape to the host that runs xtrabackup, supply a valid token there (agent, secret store, or short-lived credential), and ensure that host can reach vault_url.

A successful restore still requires a live Vault (or restored Vault storage) at the expected mount, plus a valid token on the restored server—see Disaster recovery and Authentication lifecycle.

Binary logs and other encrypted streams

If binary log encryption or redo encryption is enabled, backup and DR plans must include whatever key material those features require, in line with Percona Server for MySQL: InnoDB data encryption.

Configure PXC to use component_keyring_vault component

component_keyring_vault

The component_keyring_vault component stores the master encryption key in a HashiCorp Vault server instead of in a local file, as component_keyring_file does.

Configuration

Configuration options are the same as in Percona Server for MySQL.

Create a manifest file named mysqld.my in the installation directory:

{
 "read_local_manifest": false,
 "components": "file://component_keyring_vault"
}

Create a configuration file component_keyring_vault.cnf in JSON format:

{
 "timeout": 15,
 "vault_url": "https://vault.public.com:8202",
 "secret_mount_point": "secret",
 "secret_mount_point_version": "AUTO",
 "token": "{randomly-generated-alphanumeric-string}",
 "vault_ca": "/data/keyring_vault_confs/vault_ca.crt"
}

The secret_mount_point_version parameter defaults to AUTO and controls whether the Vault KV Secrets Engine is version 1 (kv) or version 2 (kv-v2). Using the wrong KV version can cause silent failures during keyring operations.

After mysqld starts with the manifest and JSON in place, confirm the component:

SHOW COMPONENTS;
Expected outcome
+----------------------------------------+
| Component_id                           |
+----------------------------------------+
| file://component_keyring_vault         |
+----------------------------------------+

Warning

Token Security: Avoid embedding long-lived tokens directly in configuration files. Consider using Vault’s AppRole authentication or dynamic token retrieval mechanisms for enhanced security.

The detailed description of the keyring vault component options can be found in the Percona Server for MySQL keyring vault component documentation.

The Vault server is external to the cluster, so make sure the PXC node can reach it.

Typical Vault layout (what most teams run)

The usual pattern is a highly available Vault cluster in each environment (for example three-node Raft, or Consul-backed), exposed to database hosts as one logical endpoint: a DNS name or load-balanced VIP in vault_url. All PXC nodes talk to that shared Vault infrastructure.

Percona’s keyring still requires each mysqld instance to have its own secret_mount_point namespace (do not point two servers at the same mount point). A common compromise is the same Vault cluster and KV engine, but different mount paths or sub-paths per node (for example pxc-prod/node1, pxc-prod/node2) so policies stay simple and keys never collide.

Less common: one standalone Vault per database host (higher operational load). Multi-region setups usually replicate Vault data or run a Vault cluster per region; database hosts in a region use the regional vault_url.

Authentication lifecycle

The credential in component_keyring_vault.cnf is the Vault token field (sometimes referred to as vault_token in operational runbooks). Vault tokens are not indefinite: each token carries a Time-To-Live (TTL) and expires unless renewed or replaced. When the token expires, the keyring component can no longer read or write keys in Vault, and the server may fail to open encrypted tablespaces or to start.

What the server does not do

component_keyring_vault reads a bearer token string from JSON configuration and calls Vault’s HTTP API. The keyring component does not implement AppRole login, Kubernetes auth, LDAP, or any other Vault authentication method, and does not renew tokens on a schedule.

The legacy keyring Vault plugin available in older releases behaved the same way at the token layer: the server expected a usable token in configuration rather than performing multi-step Vault login flows.

PXC 8.4 uses the keyring component model only; the keyring plugin is not supported in 8.4 (see Upgrade guide). Regardless of plugin versus component, the deployment must supply a valid token string through external automation.

Minimum external automation

Because the server does not refresh Vault credentials by itself, run at least one of the following on each database host (or deliver an equivalent outcome through your platform):

  • Vault Agent with auto_auth (AppRole, AWS, or another method) and a sink or template that writes a fresh token before the current token’s TTL ends, or
  • A scheduled job (for example cron) plus the Vault CLI or API client that logs in, fetches a new token, and updates the JSON the keyring reads—followed by ALTER INSTANCE RELOAD KEYRING or a controlled mysqld restart on that host so the running server picks up the file (see Applying an updated token on a running server).

Without one of those patterns (or a custom sidecar with the same effect), long-lived static tokens eventually expire and the cluster loses access to keys.

Applying an updated token on a running server

component_keyring_vault does not poll the JSON configuration file. Changing component_keyring_vault.cnf on disk does not, by itself, replace the token the running server already holds in memory from the last startup or reload.

After automation writes a new token, use one of the following:

  • Reload the keyring component without a full mysqld restart: on Percona Server for MySQL 8.4, run ALTER INSTANCE RELOAD KEYRING. The server instructs the installed keyring component to re-read its configuration file and reinitialize in-memory keyring data; a revised token value in the file takes effect after this statement succeeds. The account needs the ENCRYPTION_KEY_ADMIN privilege. The statement is not written to the binary log, so execute it on each PXC member when that member’s JSON file changes (or invoke it from per-host automation). See ALTER INSTANCE Statement in the MySQL Reference Manual.
ALTER INSTANCE RELOAD KEYRING;
Expected outcome (client)
Query OK, 0 rows affected (0.01 sec)
  • Restart mysqld: a stop/start cycle reloads component configuration from disk. Prefer this path when your runbook already uses restarts, or when troubleshooting a failed reload.

Write the updated JSON using an atomic replace when possible (for example write to a temporary file in the same directory, then rename into component_keyring_vault.cnf) so the server never reads a partially written file if automation and ALTER INSTANCE RELOAD KEYRING overlap.
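The atomic-replace step can be sketched as a short POSIX shell fragment. The directory, URL, CA path, and token value below are placeholders (the defaults write to a temporary directory so the sketch is safe to run as-is); rename(2) is atomic on the same filesystem, so a reader sees either the old file or the new file, never a partial write:

```shell
# Write the new keyring JSON to a temp file in the same directory, then
# rename it over component_keyring_vault.cnf.
CNF_DIR="${CNF_DIR:-$(mktemp -d)}"       # placeholder: real keyring config dir
CNF="$CNF_DIR/component_keyring_vault.cnf"
NEW_TOKEN="${NEW_TOKEN:-hvs.placeholder}" # placeholder: token from your automation

TMP="$(mktemp "$CNF_DIR/.cnf.XXXXXX")"
cat > "$TMP" <<EOF
{
 "timeout": 15,
 "vault_url": "https://vault.example.com:8200",
 "secret_mount_point": "secret",
 "secret_mount_point_version": "AUTO",
 "token": "$NEW_TOKEN",
 "vault_ca": "/etc/mysql/vault-ca.pem"
}
EOF
chmod 0640 "$TMP"
mv -f "$TMP" "$CNF"                      # atomic rename; no partial reads
```

Follow the rename with ALTER INSTANCE RELOAD KEYRING (or your restart policy) so the running server picks up the new token.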

Example: AppRole with Vault Agent (illustrative)

The following outlines a common pattern; adjust paths, mount names, and TTLs for production. Full AppRole hardening belongs in HashiCorp's AppRole documentation.

  1. On the Vault server, enable the AppRole auth method and create a policy that allows the keyring paths your secret_mount_point uses (KV read/write as required by the Percona Server keyring vault component documentation).

  2. Create an AppRole bound to that policy. Distribute role_id and secret_id to the database host using your secret distribution standard (never commit real values to configuration repositories). When Secret IDs expire or rotate, use the same standards—see AppRole Secret ID lifecycle.

  3. Run Vault Agent on the PXC host with auto_auth similar to:

pid_file = "/var/run/vault-agent-pid"

vault {
  address = "https://vault.example.com:8200"
  ca_cert = "/etc/mysql/vault-ca.pem"
}

auto_auth {
  method "approle" {
    config = {
      role_id_file_path   = "/etc/mysql/vault/role_id"
      secret_id_file_path = "/etc/mysql/vault/secret_id"
    }
  }
  sink "file" {
    config = {
      path = "/var/lib/mysql-vault/token"
      mode = 0640
    }
  }
}
  4. Point component_keyring_vault.cnf at the token the agent maintains. If the component on your build only accepts an inline token value, use an agent template block to render the full JSON file whenever the token rotates, or run a short wrapper script that copies the sink file into the token field and then runs ALTER INSTANCE RELOAD KEYRING (with ENCRYPTION_KEY_ADMIN) or signals a controlled mysqld restart. Confirm the exact integration path against the Percona Server version in use.

Equivalent one-off login from a shell session (useful for testing, not a substitute for renewal automation):

vault write -field=token auth/approle/login role_id="$ROLE_ID" secret_id="$SECRET_ID"
Expected outcome
hvs.CAESIJ3NmnVQN...

A single-line token is printed to standard output (no table). Paste the returned value into the token field only for short-lived tests; production still requires Vault Agent or cron-driven renewal before TTL expiry.

AppRole Secret ID lifecycle

Vault Agent’s AppRole auto_auth exchanges the role_id and secret_id read from disk for a Vault token, then maintains that token (renewal or re-authentication per agent settings). That path covers token TTL, not the lifecycle of the Secret ID file on disk. Whether a Secret ID expires or must be rotated deliberately is controlled on the Vault side (secret_id_ttl, bind_secret_id, and related AppRole settings—see HashiCorp’s AppRole documentation).

mysqld and component_keyring_vault never handle Secret IDs. When a new Secret ID must reach the PXC host, rely on the same secret-distribution layer you use for other machine credentials, for example:

  • Issue a new Secret ID from Vault with a tightly scoped operator or automation role, optionally using response wrapping or another one-time delivery pattern, and write the value to secret_id_file_path with strict ownership and mode.
  • Pull from your organization’s secrets store (cloud SM, Kubernetes Secret, configuration management with audit) into that path during provisioning or rotation windows.
  • After the file changes, restart Vault Agent (or use the reload procedure HashiCorp documents for your agent version) so the next authentication uses the updated Secret ID; then confirm the token sink still updates and, if applicable, ALTER INSTANCE RELOAD KEYRING or your restart policy still applies for mysqld.

Percona XtraDB Cluster documentation does not prescribe a single vendor workflow for Secret ID rotation; align with Vault operations and compliance requirements at your site.

Startup ordering: Vault Agent before mysqld

A PXC joiner (and any node where mysqld starts cold) needs a working Vault token as soon as the server opens encrypted tablespaces and as soon as SST or XtraBackup stages need the keyring. The following avoids a common race: mysqld starts while Vault Agent has not yet finished its first auto_auth and the sink or template has not written a usable token. These guidelines do not replace your own systemd units, Helm charts, or compose files—encode the ordering and timeouts there.

  • With systemd, run Vault Agent in its own service. Make the mysqld unit start after that service (After=vault-agent.service) and keep the dependency explicit (Requires=vault-agent.service or Wants=vault-agent.service, per your policy). A Type=simple agent is often considered “started” as soon as the process exists, which can be before the first token write completes. Add a guarded wait on the mysqld side—for example an ExecStartPre script or drop-in that loops until the token file (or rendered component_keyring_vault.cnf) exists and is non-empty, with a timeout and a clear log line if the wait fails.
  • With containers, use an init container or entrypoint step that blocks until the same condition is true before exec’ing mysqld, or run the agent as a sidecar and only start mysqld after a health check or wait script sees a valid token path. Orchestrator depends_on alone usually reflects container start order, not “first auth succeeded,” so pair it with an explicit wait or readiness probe wired to the token file.
  • A joiner undergoing SST follows the same rule: when the node’s service comes up for a join, the token path the keyring reads must already be populated so XtraBackup-based or Clone SST can use component_keyring_vault on that host.
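The guarded wait described above can be sketched as a small POSIX shell function. The function name, the sourcing path in the usage comment, and the default timeout are assumptions; wire the equivalent into your own unit files or entrypoints:

```shell
# wait_for_token FILE [TIMEOUT_SECONDS]: block until FILE exists and is
# non-empty, then return 0; log and return 1 after the timeout. Intended
# as an ExecStartPre guard (or a container entrypoint step) before mysqld.
wait_for_token() {
    file="$1"
    timeout="${2:-60}"
    i=0
    while [ "$i" -lt "$timeout" ]; do
        [ -s "$file" ] && return 0
        sleep 1
        i=$((i + 1))
    done
    echo "timed out waiting for $file after ${timeout}s" >&2
    return 1
}

# Example systemd drop-in usage (paths are illustrative):
#   ExecStartPre=/bin/sh -c '. /usr/local/lib/wait_for_token.sh && wait_for_token /var/lib/mysql-vault/token 60'
```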

For auto_auth, sinks, and templates, see HashiCorp’s Vault Agent documentation.

Prefer Vault Agent on each PXC host so Vault issues and renews tokens on the local machine. Integrate agent output with component_keyring_vault.cnf so the token value stays valid before the credential TTL expires, following your change policy and the Percona Server keyring vault component documentation.

Without renewal, expect a cluster-wide outage risk: every node that uses an expired token loses access to encryption keys when those nodes restart or when the keyring next contacts Vault.

Warning

SST Limitation: The rsync SST method does not support component_keyring_vault. Any rsync-based SST on a joiner is aborted if component_keyring_vault is configured.

Uniform Component Configuration: Percona XtraDB Cluster strongly recommends using the same keyring component type on all cluster nodes. Mixing keyring component types is only recommended during controlled transitions from component_keyring_file to component_keyring_vault or the reverse. Inconsistent keyring configurations can lead to data inconsistency and cluster instability.

Nodes are not required to use the same Vault server; whichever Vault server is used must be reachable from the respective node. Nodes are also not required to use the same mount point.

If the node cannot reach or connect to the Vault server, an error is reported during server restart, and the node refuses to start:

The warning message
2018-05-29T03:54:33.859613Z 0 [Warning] Component component_keyring_vault reported:
'There is no vault_ca specified in component_keyring_vault's configuration file.
Please make sure that Vault's CA certificate is trusted by the machine
from which you intend to connect to Vault.'
2018-05-29T03:54:33.977145Z 0 [ERROR] Component component_keyring_vault reported:
'CURL returned this error code: 7 with error message : Failed to connect
to 127.0.0.1 port 8200: Connection refused'

When vault server connectivity issues occur, only the affected nodes fail to start. For example, if node-1 can connect to the vault server but node-2 cannot, only node-2 will refuse to start.

If a server has encrypted objects but cannot connect to the vault server during restart, those encrypted objects become inaccessible.

When the vault server is reachable but authentication credentials are incorrect, the same behavior occurs:

The warning message
2018-05-29T03:58:54.461911Z 0 [Warning] Component component_keyring_vault reported:
'There is no vault_ca specified in component_keyring_vault's configuration file.
Please make sure that Vault's CA certificate is trusted by the machine
from which you intend to connect to Vault.'
2018-05-29T03:58:54.577477Z 0 [ERROR] Component component_keyring_vault reported:
'Could not retrieve list of keys from Vault. Vault has returned the
following error(s): ["permission denied"]'

If the Vault server is accessible but the mount point is wrong, no error is reported during server restart; instead, operations on encrypted objects fail:

CREATE TABLE t1 (c1 INT, PRIMARY KEY pk(c1)) ENCRYPTION='Y';
Expected output
ERROR 3185 (HY000): Can't find master key from keyring, please check keyring
component is loaded.
... [ERROR] Component component_keyring_vault reported: 'Could not write key to Vault. ...
... [ERROR] Component component_keyring_vault reported: 'Could not flush keys to keyring'

Vault unavailable at restart: no persistent local key cache

component_keyring_vault stores key material in HashiCorp Vault for the configured secret_mount_point. Vault Agent on the database host renews the bearer token; that addresses token expiry, not loss of network reachability to Vault when the server must load keys from scratch.

On a cold mysqld start, the component expects to reach vault_url with a valid token so encrypted tablespaces can obtain master key data from Vault. If Vault is down, firewalled, or partitioned away at that moment, startup typically fails or encrypted objects do not open; the log excerpts in this section show common connection and permission errors. State held in memory by a previous mysqld process does not survive the process exit, and there is no documented mode in which the Vault component keeps a durable on-disk copy of those keys purely so the instance can boot while Vault is temporarily unavailable (unlike component_keyring_file, where keys live in a local keyring file the server reads without contacting a remote service).

Operational implication: plan HA Vault, network paths, and restart sequencing so nodes that use component_keyring_vault can still reach Vault when they bootstrap. Running instances may continue to serve data that was already decrypted while memory is warm, until an operation forces a new Vault round-trip; that behavior does not replace Vault availability for restarts—see Vault endpoint unreachable while mysqld is still running.

Disaster recovery

Warning

Vault as a control plane for data access: With component_keyring_vault, the Vault service and the network path to Vault become a single point of failure for accessing encrypted data. If Vault is unavailable, nodes may refuse to start or cannot decrypt tablespaces even though data files on disk are intact. High availability, monitoring, and disaster-recovery planning for Vault are as important as planning for the database tier.

Backups and keyring metadata: Day-to-day backup scope for encrypted PXC is described in Backups and restore for encrypted clusters. A physical backup of the datadir alone is not sufficient to recover encrypted data if Vault secrets are lost or if the keyring cannot reach Vault. Your backup and DR procedures must include:

  • Vault data and policy: Back up Vault’s own storage (or rely on Vault’s supported replication and snapshot model) so secrets engine data and mounts can be restored. Document mount paths, KV version, and ACL policies used by component_keyring_vault.

  • Keyring configuration: Securely retain copies of component_keyring_vault.cnf (without long-lived plaintext tokens where avoidable), CA material (vault_ca), and any automation (for example Vault Agent or AppRole role IDs) needed to obtain a valid token after restore.

  • Percona Server / PXC configuration: Include manifest and my.cnf fragments that load component_keyring_vault so a restored host can load the same component configuration.

Restore drills should verify that a recovered node can authenticate to Vault and that encrypted tables open successfully before relying on the procedure in production.
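One way to script the "can authenticate to Vault" part of a restore drill is to inspect the body returned by Vault's token lookup-self endpoint, which reports an errors array for a rejected token. A minimal sketch; the helper name, token file path, and curl call are assumptions about your setup:

```shell
# Hypothetical drill check: decide from the lookup-self response body whether
# the token the keyring will use is still accepted by Vault.
token_lookup_ok() {
  # Reads the JSON body of GET $VAULT_ADDR/v1/auth/token/lookup-self on stdin.
  if grep -q '"errors"'; then
    echo "token rejected"
    return 1
  fi
  echo "token accepted"
}

# Example (commented out because it needs a live Vault; paths are assumptions):
# curl -s -H "X-Vault-Token: $(cat /etc/mysql/vault-token)" \
#   "$VAULT_ADDR/v1/auth/token/lookup-self" | token_lookup_ok
```

A drill that stops on "token rejected" catches the same "permission denied" failure mode shown in the log excerpts above, before encrypted tables are touched.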

Day 2 operations for highly available clusters

Highly available PXC depends on more than a healthy quorum: backups must be restorable while keys remain available, rotations must not surprise async replicas, and incidents involving Vault or tokens need a written path. Use the subsections below as runbook starters; fold them into your change-management templates.

Operational cadence (what to put on the calendar)

  • Daily or weekly: Automated backup success per policy (every node that takes a local backup, or the shared tooling you use). Alert on non-zero exit codes from xtrabackup jobs.
  • Monthly: Spot-check that backup artifacts and key material (keyring file or Vault path + token renewal) still line up with the restore runbook.
  • Quarterly: Execute a restore test on non-production hardware or a disposable instance; include component_keyring_vault token acquisition the same way production does.
  • Per security policy: Rolling ALTER INSTANCE ROTATE INNODB MASTER KEY on each member; if GCache encryption is enabled, plan ALTER INSTANCE ROTATE GCACHE MASTER KEY per GCache encryption.
  • Before Vault or OS upgrades: Reconfirm TTL versus Vault Agent or cron renewal; schedule a maintenance window if mysqld restarts are required after config changes.

Write down RPO/RTO targets for two cases: loss of a single PXC member (usually SST or restore from backup) versus loss of Vault storage (cluster data may exist on disk but stay unreadable until Vault returns).
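The daily backup-alert item in the cadence above can be implemented with a thin wrapper that surfaces the backup tool's exit code to the scheduler. A minimal sketch; the wrapper name is an assumption, and the xtrabackup command in the comment is illustrative, not prescriptive:

```shell
# Hypothetical wrapper: run the backup command, report failures on stderr,
# and pass the exit code through so cron or a job runner can alert on it.
run_and_alert() {
  "$@"
  local rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "backup job failed with exit code $rc" >&2
  fi
  return "$rc"
}

# e.g. run_and_alert xtrabackup --backup --target-dir=/backups/$(date +%F)
```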

HA-aware rotation sequencing

  1. Confirm cluster health (wsrep_cluster_status, wsrep_cluster_size, flow control off) before the first rotation:
SHOW STATUS LIKE 'wsrep_cluster_status';
SHOW STATUS LIKE 'wsrep_cluster_size';
Expected outcome (healthy cluster, illustrative)
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_cluster_size   | 3     |
+----------------------+-------+

Exact Value strings depend on Galera version and topology; treat anything other than a primary cluster with the expected member count as a blocker until the state is understood.

  2. Rotate one member at a time. Prefer starting with members that are not the sole source of read traffic your application relies on, and avoid piling heavy rotation I/O onto a host that is already a donor for an SST.
  3. After each member, run the checks under After rolling ALTER INSTANCE ROTATE INNODB MASTER KEY in the verification checklists before touching the next host.
  4. If rotation errors on a member, stop the rollout, capture the error log, restore Vault or file-keyring access on that host, and only continue once encrypted reads succeed there again.
  5. If asynchronous replicas consume the writer’s binary log, plan their rotation or relay log apply order as described under Example: asynchronous replica reading the PXC binary log.

There is no supported “undo” of a completed InnoDB master key rotation; forward-fix keyring and Vault issues instead of reverting the statement.
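Rollout automation can encode the health check from step 1 as a go/no-go gate fed by the two SHOW STATUS values. A minimal sketch under those assumptions; the helper name and the mysql invocation in the comment are illustrative:

```shell
# Hypothetical gate: refuse to rotate the next member unless the cluster is
# Primary with the expected member count.
rotation_gate() {
  local status="$1" size="$2" expected="$3"
  if [ "$status" = "Primary" ] && [ "$size" = "$expected" ]; then
    echo "gate open: rotate next member"
  else
    echo "gate closed: status=$status size=$size (expected $expected)"
    return 1
  fi
}

# Per member, a rollout loop might then run (illustrative invocation):
# mysql -h "$member" -e "ALTER INSTANCE ROTATE INNODB MASTER KEY;"
```

Failing closed on anything other than a primary cluster with the expected size matches the "treat as a blocker" guidance above.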

Backups and restore when the cluster is under stress

  • A failed xtrabackup job on one node does not by itself break Galera, but you lose redundancy in your backup history—treat repeat failures as a production risk.
  • Stagger heavy full backups across members when disk and network headroom are tight; each backup still needs the same keyring or Vault access as mysqld.
  • Restoring a single evicted member: follow your standard PXC procedure (often wipe the local datadir and rejoin for SST, or restore a prepared backup into an empty datadir) only after component_keyring_vault.cnf and Vault connectivity match production on that host.
  • Total cluster loss: bring Vault (or restored Vault storage) and valid tokens online first, then follow the backup, prepare, and restore commands under Backups and restore for encrypted clusters, then start members according to your bootstrap runbook.

Incident playbooks: Vault, tokens, and SST

Vault endpoint unreachable while mysqld is still running

Avoid restarting every node at once; running instances may continue serving cached paths until an operation needs a new Vault round-trip. Open a Sev-1 on the Vault or network path, validate TCP/TLS from one database host (curl to vault_url with CA), and restore HA Vault service. After recovery, use a single canary mysqld restart in staging before mass restarts in production.
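The "validate TCP/TLS from one database host" step can be scripted around curl's exit codes, which distinguish a refused connection from a TLS trust problem. A minimal sketch; the probe endpoint, CA path, VAULT_ADDR variable, and helper name are assumptions about your environment:

```shell
# Hypothetical triage helper: classify curl's exit code from a Vault probe.
interpret_curl_exit() {
  case "$1" in
    0)     echo "vault reachable (check HTTP status separately)" ;;
    7)     echo "connection refused: vault down or port blocked" ;;
    28)    echo "timeout: network path or firewall problem" ;;
    35|60) echo "TLS failure: check vault_ca and certificate chain" ;;
    *)     echo "curl exit $1: consult curl(1) for details" ;;
  esac
}

# Example probe (commented out because it needs network access):
# curl --silent --show-error --cacert /etc/vault/ca.pem \
#   "$VAULT_ADDR/v1/sys/health" > /dev/null
# interpret_curl_exit $?
```

Exit code 7 corresponds to the "Connection refused" error excerpt shown earlier in this chapter.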

permission denied or token expiry after a restart

Renew the token through Vault Agent or your cron workflow, update component_keyring_vault.cnf if your process writes the token there, then run ALTER INSTANCE RELOAD KEYRING on that member (with ENCRYPTION_KEY_ADMIN) or restart mysqld, confirm encrypted tables, and roll the same steps across the rest of the cluster.

Joiner aborts SST with component_keyring_vault

Confirm the joiner uses XtraBackup-based or Clone SST (not rsync). Verify the joiner reaches vault_url, uses the correct secret_mount_point, and carries a valid token before mysqld and SST stages need the keyring (see Startup ordering: Vault Agent before mysqld). Inspect donor and joiner SST logs; see Percona XtraBackup SST configuration.

Encrypted table errors only on one member

Compare SHOW COMPONENTS and keyring config paths with a healthy peer. For component_keyring_file, compare file permissions and inode. For Vault, compare effective policy and mount path. Do not delete Galera state files on a whim; follow your cluster recovery documentation before any destructive step.

Monitoring and alerts (minimum)

Point monitoring at:

  • MySQL error log lines containing component_keyring_vault, keyring, Could not retrieve, permission denied, or Failed to connect (the host and port come from vault_url).
  • Backup scheduler exit status and backup size trends (sudden shrink may mean an empty or failed run).
  • PXC variables such as wsrep_cluster_status, wsrep_local_state_comment, and flow control counters after maintenance.

Correlate Vault HA health checks with database alarms so on-call knows whether to page the Vault team or the DBA team first.
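The error-log patterns listed above can be wired into a scraper directly. A minimal sketch that simply filters the log; the helper name and the log path in the usage comment are assumptions:

```shell
# Hypothetical scraper: surface keyring/Vault trouble lines from the MySQL
# error log, using the alert strings suggested above. Reads files given as
# arguments, or stdin when none are given.
scan_keyring_errors() {
  grep -E 'component_keyring_vault|keyring|Could not retrieve|permission denied|Failed to connect' "$@"
}

# Usage: scan_keyring_errors /var/log/mysql/error.log | tail -n 20
```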

Verification checklists

Use these during maintenance windows or when validating automation. Record hostnames, timestamps, and log excerpts in the change ticket.

After rolling ALTER INSTANCE ROTATE INNODB MASTER KEY

On each cluster member where rotation ran:

  1. Scan the error log from a few minutes before and after the rotation for keyring, InnoDB, or ER_ messages; treat new errors as blocking until reviewed.
  2. Run a read query against at least one known encrypted table and compare a checksum or row count against another member if your policy requires symmetry:
SELECT COUNT(*) AS cnt FROM schema_name.encrypted_table;
Expected outcome
+-----+
| cnt |
+-----+
|  42 |
+-----+
  3. Optional: inspect keyring status (exact columns vary by release):
SELECT * FROM performance_schema.keyring_component_status;
Expected outcome (excerpt)
+---------------------+------------------------------------------+
| STATUS_KEY          | STATUS_VALUE                             |
+---------------------+------------------------------------------+
| Component_name      | component_keyring_vault                  |
| Component_status    | Active                                   |
| Author              | Percona Corporation                      |
...
+---------------------+------------------------------------------+

Row set depends on the loaded component; see The keyring_component_status table.

If any step fails on a node, pause the rollout, fix the keyring or Vault issue, then repeat rotation on that node after the root cause is cleared.

After a Vault or network drill

Following a controlled failover, token rotation, firewall change, or certificate renewal:

  1. Restart mysqld on a non-production canary host first when possible; confirm startup completes without keyring errors in the log.
  2. On a member allowed to accept DDL in your policy, create and drop a small encrypted table in a scratch schema and confirm no Vault permission errors appear in the log:
CREATE SCHEMA IF NOT EXISTS scratch;
CREATE TABLE scratch.t_vault_check (id INT PRIMARY KEY) ENCRYPTION='Y';
DROP TABLE scratch.t_vault_check;
Expected outcome (client)
Query OK, 1 row affected (0.01 sec)
Query OK, 0 rows affected (0.05 sec)
Query OK, 0 rows affected (0.02 sec)
  3. If your organization uses Vault audit devices, verify the keyring traffic during the test window matches expectation (paths, policies, HTTP result codes).

After XtraBackup restore (encrypted cluster)

  1. Start mysqld and confirm the keyring component:
SHOW COMPONENTS;
Expected outcome
+----------------------------------------+
| Component_id                           |
+----------------------------------------+
| file://component_keyring_file          |
+----------------------------------------+

When the restored instance uses Vault, expect file://component_keyring_vault in the Component_id column instead.

  2. Run the same encrypted-table SELECT checks used after master key rotation.
  3. Confirm replication or cluster membership rejoin steps from your platform runbook succeed before declaring the restore complete.
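For automated restore validation, the SHOW COMPONENTS check can be reduced to a string match over a machine-readable component list. A minimal sketch; the helper name is an assumption, and the mysql.component query in the comment is one way to produce that list, assuming sufficient privileges:

```shell
# Hypothetical check: confirm the expected keyring component URN appears in
# the component list read on stdin.
component_loaded() {
  if grep -qF "$1"; then
    echo "component present"
  else
    echo "component missing"
    return 1
  fi
}

# Example feed (illustrative; requires a running restored instance):
# mysql -N -e "SELECT component_urn FROM mysql.component" \
#   | component_loaded file://component_keyring_vault
```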

Compliance and audit evidence

Regulatory and internal risk frameworks differ; the list below maps common review questions to PXC-relevant artifacts. For algorithms, encrypted object types, and Percona Server semantics, use Percona Server for MySQL: InnoDB data encryption and related upstream manuals—this section does not restate that material.

Who can decrypt data (evidence to maintain):

  • A matrix of roles (database, platform, security, backup operations) against the controls they touch: manifest and my.cnf for the keyring component; path to component_keyring_file_data or to component_keyring_vault.cnf; Vault policies and mounts for secret_mount_point; OS users that run mysqld, Vault Agent, XtraBackup, or configuration management; MySQL accounts with ENCRYPTION_KEY_ADMIN, backup-related dynamic privileges, or table read access. Auditors usually want named responsibilities and change approval paths, not a second copy of InnoDB internals.

Where tokens and secrets live (evidence to maintain):

  • Written description of every path that can contain a Vault token, AppRole role_id / secret_id, or rendered JSON the keyring reads; which automation writes each file; file permissions and owning user or group; and policy that long-lived secrets do not live in application source repositories. Cross-check against Authentication lifecycle and AppRole Secret ID lifecycle.

Rotation and database-side logs (evidence to maintain):

  • Change records for rolling ALTER INSTANCE ROTATE INNODB MASTER KEY (and GCache rotation if used) listing each host, time window, executor, and outcome; excerpts from the MySQL error log around each rotation; optional query output from verification checklists. When binary logging is enabled on a writer, remember rotation can appear in the binary log for downstream consumers—your evidence set should match how you treat replicated DDL. See Replication and cluster behavior.

Vault audit and API access (evidence to maintain):

  • Whether Vault audit devices are enabled, where audit logs are stored, retention, and who can read them; sample correlation between a maintenance window and keyring traffic (HTTP paths, policies, status codes). The After a Vault or network drill checklist already asks teams to validate audit expectations after a controlled test.

Backups and restore custody (evidence to maintain):

  • Backup schedules, retention, locations, and which roles can read backup media plus any backup-only Vault token or component_keyring_vault.cnf used by XtraBackup—see Backups and restore for encrypted clusters.

Mix keyring component types

With XtraBackup introducing transition-key logic, you can now mix and match keyring components. For example, node-1 can be configured to use the component_keyring_file component while node-2 uses component_keyring_vault.

Warning

Percona strongly recommends the same keyring component configuration for all cluster nodes. Mixing keyring component types is only recommended during controlled transitions from one keyring type to another. Inconsistent configurations can cause data corruption and cluster failures.

Migrate keys between keyring keystores

Percona XtraDB Cluster supports key migration between keystores. The migration can be performed offline or online using a migration server with specific configuration options.

Offline migration

In offline migration, the node to migrate is shut down, and the migration server migrates that node’s keys to the new keystore.

For example, a cluster has three Percona XtraDB Cluster nodes, n1, n2, and n3. The nodes use the component_keyring_file. To migrate the n2 node to use component_keyring_vault, use the following procedure:

  1. Shut down the n2 node.

  2. Start the Migration Server (mysqld with a special option).

  3. The Migration Server copies the keys from the n2 keyring file and adds them to the vault server.

  4. Start the n2 node with the vault parameter, and the keys are available.

Run the migration server:

/dev/shm/pxc84/bin/mysqld --defaults-file=/dev/shm/pxc84/copy_mig.cnf \
--keyring-migration-source=component_keyring_file \
--component_keyring_file_data=/dev/shm/pxc84/node2/keyring \
--keyring-migration-destination=component_keyring_vault \
--component_keyring_vault_config=/dev/shm/pxc84/vault/component_keyring_vault.cnf &
Expected log output
... [Note] --secure-file-priv is set to NULL. Operations related to importing and
    exporting data are disabled
... [Warning] WSREP: Node is not a cluster node. Disabling pxc_strict_mode
... [Note] /dev/shm/pxc84/bin/mysqld (mysqld 8.4-debug) starting as process 5710 ...
... [Note] Keyring migration successful.

On a successful migration, the destination keystore receives additional migrated keys (pre-existing keys in the destination keystore are not touched or removed). The source keystore retains the keys as the migration performs a copy operation and not a move operation.

If the migration fails, the destination keystore is unchanged.
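Because migration copies keys rather than moving them, a post-migration sanity check can diff key ID lists exported from the source and destination keystores. A minimal sketch; the helper name is an assumption, and how you export the lists (for example via performance_schema.keyring_keys on a server loading each keystore) depends on your tooling:

```shell
# Hypothetical check: print key IDs present in the source list but absent
# from the destination list; empty output means every key was copied.
missing_keys() {
  # args: file of source key IDs, file of destination key IDs (one per line)
  sort "$1" > "/tmp/src_keys.$$"
  sort "$2" > "/tmp/dst_keys.$$"
  comm -23 "/tmp/src_keys.$$" "/tmp/dst_keys.$$"
  rm -f "/tmp/src_keys.$$" "/tmp/dst_keys.$$"
}

# Example export (illustrative):
# mysql -N -e "SELECT key_id FROM performance_schema.keyring_keys" > source.txt
```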

Online migration

In online migration, the node to migrate keeps running, and the migration server migrates its keys to the new keystore by connecting to the node.

For example, a cluster has three Percona XtraDB Cluster nodes, n1, n2, and n3. The nodes use the component_keyring_file. Migrate the n3 node to use component_keyring_vault using the following procedure:

  1. Start the Migration Server (mysqld with a special option).

  2. The Migration Server copies the keys from the n3 keyring file and adds them to the vault server.

  3. Restart the n3 node with the vault parameter, and the keys are available.

/dev/shm/pxc84/bin/mysqld --defaults-file=/dev/shm/pxc84/copy_mig.cnf \
--keyring-migration-source=component_keyring_vault \
--component_keyring_vault_config=/dev/shm/pxc84/component_keyring_vault3.cnf \
--keyring-migration-destination=component_keyring_file \
--component_keyring_file_data=/dev/shm/pxc84/node3/keyring \
--keyring-migration-host=localhost \
--keyring-migration-user=root \
--keyring-migration-port=16300 \
--keyring-migration-password='' &
Expected log output
... [Note] Keyring migration successful.

On a successful migration, the destination keystore receives the additional migrated keys. Any pre-existing keys in the destination keystore are unchanged. The source keystore retains the keys as the migration performs a copy operation and not a move operation.

If the migration fails, the destination keystore is not changed.

Migration server options

  • --keyring-migration-source: The source keyring component that manages the keys to be migrated.

  • --keyring-migration-destination: The destination keyring component to which the migrated keys are copied.

    Note

    For offline migration, no additional key migration options are needed.

  • --keyring-migration-host: The host of the running server. Key migration requires a local connection, so this is always the local host.

  • --keyring-migration-user, --keyring-migration-password: The username and password for the account used to connect to the running server.

  • --keyring-migration-port: For TCP/IP connections, the port number of the running server.

  • --keyring-migration-socket: For Unix socket file or Windows named pipe connections, the socket file or named pipe of the running server.

Prerequisite for migration:

Make sure to pass required keyring options and other configuration parameters for the two keyring components. For example, if component_keyring_file is one of the components, you must explicitly configure the component_keyring_file_data system variable in the my.cnf file.

Other non-keyring options may be required as well. One way to specify the non-keyring options is by using --defaults-file to name an option file that contains the required options.

[mysqld]
basedir=/dev/shm/pxc84
datadir=/dev/shm/pxc84/copy_mig
log-error=/dev/shm/pxc84/logs/copy_mig.err
socket=/tmp/copy_mig.sock
port=16400