Certificate Rotation Procedure
Audience: operators managing edge node identity certificates.
Background
Edge nodes use ECDSA P-256 leaf certificates signed by a local CA for mTLS transport.
The AutonomyOps autonomy cert commands inspect, issue, rotate, and revoke these
certificates using a locally managed CA and CRL.
Default validity: 90 days. Rotate certificates before they fall inside the expiry window
you monitor with autonomy cert list --expiring-within-days; the flag only filters output
and does not raise an alert by itself.
0. RBAC prerequisites
Cert mutation commands require cert:manage. Read-only inspection commands
(cert list, cert check-revocation) accept cert:read or cert:manage
when RBAC enforcement is active (the default since PR-29-followup-a).
On a fresh deployment, bootstrap with a predefined role first. Bootstrap mode
allows rbac role assign, not rbac role create.
# 1. Bootstrap an RBAC administrator using a predefined role.
export AUTONOMY_OPERATOR=bootstrap-admin@example.com
autonomy rbac role assign --role auditor --subject bootstrap-admin@example.com
# 2. Create cert-specific custom roles after bootstrap is complete.
autonomy rbac role create --name cert-reader --permissions cert:read
autonomy rbac role create --name cert-operator --permissions cert:manage
# 3. Assign the least-privilege role needed by each operator.
autonomy rbac role assign --role cert-reader --subject reviewer@example.com
autonomy rbac role assign --role cert-operator --subject alice@example.com
Set AUTONOMY_OPERATOR=<identity> before running any cert command so the RBAC
decision and any denial audit trail are correctly attributed.
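A wrapper script can refuse to run when the operator identity is missing. The following is a minimal sketch; require_operator is a hypothetical helper name and is not part of the autonomy CLI.

```shell
# Hypothetical guard (not part of the autonomy CLI): fail early when
# AUTONOMY_OPERATOR is unset or empty so no cert command runs unattributed.
require_operator() {
  if [ -z "${AUTONOMY_OPERATOR:-}" ]; then
    echo "AUTONOMY_OPERATOR is not set; refusing to run cert commands" >&2
    return 1
  fi
}

# Example use in a script:
#   require_operator && autonomy cert list --cert-file /etc/autonomy/edge.crt
```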
If enforcement is not yet configured, set AUTONOMY_RBAC_ENFORCEMENT=0 to disable
enforcement (not recommended in production).
Denial is audited. When a cert operation is denied, an auth.access.denied
record is emitted to the audit log before the error is returned. Successful
mutations continue to emit their native cert events such as cert.issued,
cert.rotated, cert.revoked, and cert.crl.synced.
Use these audit queries to review access decisions:
autonomy audit query --category auth
autonomy audit query --category cert
1. Inspect certificate status
autonomy cert list \
--cert-file /etc/autonomy/edge.crt \
[--cert-file /etc/autonomy/backup.crt] \
[--expiring-within-days 30]
Example output (healthy):
IDENTITY FILE NOT_AFTER DAYS_LEFT STATUS
edge-node-7 /etc/autonomy/edge.crt 2026-06-15T00:00:00Z 89 ok
Example output (expiring soon):
IDENTITY FILE NOT_AFTER DAYS_LEFT STATUS
edge-node-7 /etc/autonomy/edge.crt 2026-04-17T00:00:00Z 29 expiring
Example output (expired):
IDENTITY FILE NOT_AFTER DAYS_LEFT STATUS
edge-node-7 /etc/autonomy/edge.crt 2026-03-10T00:00:00Z -8 expired
--expiring-within-days is a display filter, not an alerting mode: it narrows the output
to matching certificates but still exits successfully even when nothing matches. Use the
rendered status text or downstream parsing in cron/CI checks.
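Because the exit code does not signal expiry, a check has to read the STATUS column. The sketch below assumes STATUS is the last whitespace-separated column, as in the example output above; cert_needs_attention is a hypothetical helper name.

```shell
# Hypothetical helper: reads `autonomy cert list` output on stdin and returns 0
# when at least one row has STATUS expiring or expired, otherwise returns 1.
# Assumes STATUS is the last column; the header row never matches.
cert_needs_attention() {
  awk '$NF == "expiring" || $NF == "expired" { found = 1 }
       END { exit (found ? 0 : 1) }'
}

# Example use:
#   autonomy cert list --cert-file /etc/autonomy/edge.crt | cert_needs_attention \
#     && echo "rotation needed"
```

Matching on the last field rather than grepping the whole line avoids false positives from file paths or identities that happen to contain the word expired.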
2. Rotate an existing certificate
Certificate rotation replaces the certificate and key files in place while preserving the same identity. The operation writes to a temporary file first, then atomically renames it into place — no partial writes.
autonomy cert rotate \
--cert-file /etc/autonomy/edge.crt \
--key-file /etc/autonomy/edge.key \
--ca-cert /etc/autonomy/ca.crt \
--ca-key /etc/autonomy/ca.key \
--identity edge-node-7 \
[--validity-days 90]
Expected output:
rotated identity=edge-node-7 cert=/etc/autonomy/edge.crt valid_days=90
Audit event emitted: cert.rotated.
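Scripts that log or assert on the result can pull individual fields out of the key=value output line. This is a sketch that assumes the output keeps the shape shown above and that values contain no spaces; kv_field is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of KEY from a "verb key=value ..." line
# read on stdin. Assumes space-separated tokens and values without spaces.
kv_field() {
  key=$1
  tr ' ' '\n' | awk -F= -v k="$key" '
    $1 == k && !seen { print substr($0, length(k) + 2); seen = 1 }'
}

# Example use:
#   autonomy cert rotate ... | kv_field valid_days
```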
Post-rotation verification
autonomy cert list --cert-file /etc/autonomy/edge.crt
# Confirm NOT_AFTER is approximately now + 90 days and STATUS is ok
# Confirm the running edged process will pick up the new cert.
# edged reads cert files at TLS handshake time in most Go TLS configurations, so a
# restart is usually not required; verify this for your deployment (see Known gaps).
# Check that the new certificate chains to the local CA:
openssl verify -CAfile /etc/autonomy/ca.crt /etc/autonomy/edge.crt
# Expect: /etc/autonomy/edge.crt: OK
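The expiry check can be scripted instead of read by eye. The sketch below compares a NOT_AFTER value against the expected validity with a tolerance in days. It assumes GNU date (the -d option) and timestamps in the UTC form shown in the example output; not_after_within is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when NOT_AFTER lies within TOLERANCE days of
# NOW + EXPECTED_DAYS. Requires GNU date for -d parsing.
# Usage: not_after_within NOT_AFTER EXPECTED_DAYS [NOW] [TOLERANCE]
not_after_within() {
  not_after=$1
  expected_days=$2
  now=${3:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  tolerance=${4:-1}
  end=$(date -u -d "$not_after" +%s) || return 2
  start=$(date -u -d "$now" +%s) || return 2
  days=$(( (end - start) / 86400 ))   # whole days remaining
  diff=$(( days - expected_days ))
  [ "${diff#-}" -le "$tolerance" ]    # ${diff#-} drops the sign
}

# Example use after a 90-day rotation:
#   not_after_within 2026-06-15T00:00:00Z 90 || echo "unexpected expiry date"
```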
3. Issue a new certificate (new identity)
Use issue when provisioning a new edge node or when the identity must change.
autonomy cert issue \
--cert-file /etc/autonomy/edge.crt \
--key-file /etc/autonomy/edge.key \
--ca-cert /etc/autonomy/ca.crt \
--ca-key /etc/autonomy/ca.key \
--identity edge-node-42 \
[--validity-days 90]
The issue operation is identical to rotate in implementation — both call the same
underlying runCertIssue function. The distinction is semantic: use issue for a new
identity, rotate for an in-place renewal of the same identity.
Expected output:
issued identity=edge-node-42 cert=/etc/autonomy/edge.crt valid_days=90
Audit event emitted: cert.issued.
4. Revoke a certificate and update the local CRL
Use revoke when a leaf certificate must no longer be trusted, for example after key
compromise or node decommissioning.
autonomy cert revoke \
--identity edge-node-7 \
--cert-file /etc/autonomy/edge.crt \
--ca-cert /etc/autonomy/ca.crt \
--ca-key /etc/autonomy/ca.key \
--crl-file /etc/autonomy/revoked.crl \
--reason key-compromise
Expected output:
revoked identity=edge-node-7 cert=/etc/autonomy/edge.crt crl=/etc/autonomy/revoked.crl serial=...
Notes:
--crl-file may be omitted when EDGE_CRL_FILE is already set in the environment.
Re-running revoke for the same certificate is idempotent; the command reports already_revoked and does not duplicate CRL entries.
The CRL is managed offline. Transport enforcement requires the control-plane server to be configured with CRLFile (see section 4b). A running control-plane now reloads the CRL on subsequent handshakes when the file changes on disk.
For multi-node control-plane deployments, distribute the canonical CRL with autonomy cert sync-crl or configure automatic pull refresh with autonomy-orchestrator serve --tls-crl-sync-url.
Audit event emitted: cert.revoked.
4a. Check revocation status
Use check-revocation to confirm a certificate’s serial appears (or does not appear) in
the local CRL before or after revoking.
autonomy cert check-revocation \
--cert-file /etc/autonomy/edge.crt \
--ca-cert /etc/autonomy/ca.crt \
--crl-file /etc/autonomy/revoked.crl
Expected output (not revoked):
not_revoked serial=4d2
Expected output (revoked):
revoked serial=4d2 reason=key-compromise revoked_at=2026-03-19T10:00:00Z
Exit code is 0 when not revoked, non-zero when revoked — suitable for use in scripts.
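A script can branch on that exit code directly. The sketch below wraps any check command and maps its exit status to a decision; revocation_gate is a hypothetical helper name. The source does not say whether operational errors (for example an unreadable CRL) use a distinct exit code, so this sketch treats every non-zero status as blocked, which fails closed.

```shell
# Hypothetical helper: run the given revocation check command and map its exit
# status to a decision. 0 means not revoked; any non-zero status is treated as
# blocked (assumption: errors are not distinguished from "revoked").
revocation_gate() {
  if "$@" >/dev/null 2>&1; then
    echo "trusted"
  else
    echo "blocked"
    return 1
  fi
}

# Example use (flags as in the example above):
#   revocation_gate autonomy cert check-revocation \
#     --cert-file /etc/autonomy/edge.crt \
#     --ca-cert /etc/autonomy/ca.crt \
#     --crl-file /etc/autonomy/revoked.crl || exit 1
```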
4b. Transport enforcement via --tls-crl-file
The control-plane server enforces revocation at the TLS handshake when started with a
CRLFile in its TLSConfig. On the supported CLI surface, this is exposed as
autonomy-orchestrator serve --tls-crl-file:
# Example control-plane startup with revocation enforcement enabled.
autonomy-orchestrator serve \
--listen 0.0.0.0:8443 \
--data-dir /var/lib/autonomy/orchestrator \
--tls-cert-file /etc/autonomy/server.crt \
--tls-key-file /etc/autonomy/server.key \
--tls-ca-file /etc/autonomy/ca.crt \
--tls-crl-file /etc/autonomy/revoked.crl
# 1. Revoke the certificate and update the CRL.
autonomy cert revoke \
--identity edge-node-7 \
--cert-file /etc/autonomy/edge.crt \
--ca-cert /etc/autonomy/ca.crt \
--ca-key /etc/autonomy/ca.key \
--crl-file /etc/autonomy/revoked.crl \
--reason key-compromise
# 2. Verify the revoked node is rejected.
# The running control-plane reloads the updated CRL on the next handshake;
# no restart is required.
# A connection attempt from the revoked node will fail with a TLS handshake error.
# The server logs: cert.revocation.rejected serial=<hex> subject=<cn>
Fail-closed guarantee: if CRLFile is set but the file is missing or has an invalid
CA signature, the server refuses to start rather than proceeding without CRL enforcement.
If the CRL later becomes unreadable or malformed on disk, subsequent client handshakes
fail closed until the CRL is corrected.
4c. CRL distribution across control-plane nodes
Use the control-plane CRL endpoint plus either the manual sync command or the built-in pull loop when more than one control-plane host must enforce the same revocation set.
Manual sync fallback:
autonomy cert sync-crl \
--min-sources 2 \
--source-url https://peer-a.example.internal:8443/v1/certs/crl \
--source-url https://leader.example.internal:8443/v1/certs/crl \
--source-url https://leader-b.example.internal:8443/v1/certs/crl \
--ca-cert /etc/autonomy/ca.crt \
--client-cert /etc/autonomy/server.crt \
--client-key /etc/autonomy/server.key \
--crl-file /etc/autonomy/revoked.crl
Expected output:
synced source=https://leader.example.internal:8443/v1/certs/crl matched=2 required=2 crl=/etc/autonomy/revoked.crl bytes=... sha256=...
Audit event emitted: cert.crl.synced.
With --min-sources 2, the command only accepts an update after two publishers
return the same CRL digest. Unreachable or mismatched publishers do not count
toward the threshold. The recorded source= value identifies the publisher
that completed the accepted quorum.
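The agreement rule can be illustrated with a small model. This is not the CLI's implementation, only a sketch of the behaviour described above: it reads "source digest" pairs in fetch order, skips sources that returned no digest, and accepts the first digest that reaches the threshold. crl_quorum is a hypothetical helper name.

```shell
# Illustrative model of the --min-sources rule (not the CLI's actual code).
# Reads "source digest" pairs on stdin in fetch order. A line with no digest
# stands for an unreachable publisher and does not count.
# Prints the accepted result and returns 0, or returns 1 if no digest reaches MIN.
crl_quorum() {
  min=$1
  awk -v min="$min" '
    NF >= 2 && !done {
      count[$2]++
      if (count[$2] >= min) {
        printf "accepted source=%s matched=%d required=%d sha256=%s\n", $1, count[$2], min, $2
        done = 1
      }
    }
    END { exit (done ? 0 : 1) }'
}

# Example use:
#   printf 'peer-a abc\nleader abc\nleader-b abc\n' | crl_quorum 2
```

In the example, the second matching publisher completes the quorum, which mirrors how the source= field in the command output names the publisher that completed the accepted quorum.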
Automatic pull distribution on a follower node:
autonomy-orchestrator serve \
--listen 0.0.0.0:8443 \
--data-dir /var/lib/autonomy/orchestrator \
--tls-cert-file /etc/autonomy/server.crt \
--tls-key-file /etc/autonomy/server.key \
--tls-ca-file /etc/autonomy/ca.crt \
--tls-crl-file /etc/autonomy/revoked.crl \
--tls-crl-sync-min-sources 2 \
--tls-crl-sync-url https://peer-a.example.internal:8443/v1/certs/crl \
--tls-crl-sync-url https://leader.example.internal:8443/v1/certs/crl \
--tls-crl-sync-url https://leader-b.example.internal:8443/v1/certs/crl \
--tls-crl-sync-interval 30s
Notes:
The source node serves the current CRL from GET /v1/certs/crl when it is started with --tls-crl-file.
The sync loop performs one fail-closed fetch before the follower begins serving, then refreshes the local CRL on the configured interval.
--tls-crl-sync-min-sources controls how many publishers must agree on the CRL digest before the follower accepts an update. Set it to 2 or greater when more than one authoritative publisher is available.
Repeated --tls-crl-sync-url values still provide publisher availability, but the follower now treats them as a publisher set rather than only a fallback list when the minimum source threshold is above 1.
The same local certificate, key, and CA flags are reused for mTLS when the source endpoint requires client authentication.
5. CA key management prerequisites
The issue, rotate, and revoke commands all require access to the CA private key (--ca-key).
The CA key should be:
Stored in a secrets manager or HSM in production.
Never placed on the edge node itself; certificate operations should run from a management workstation or CI pipeline.
Restricted by operating procedure and secret-management policy.
Protected by the dedicated CLI RBAC permission cert:manage, which applies while RBAC enforcement is enabled (the default).
6. Certificate rotation in the HA control-plane context
When rotating certificates for control-plane nodes in an HA cluster:
Rotate the certificate on the standby node first (it is not the current leader).
Verify the standby can still connect to the primary PostgreSQL:
psql "$POSTGRES_URL" -c "SELECT pg_is_in_recovery();"
Rotate the certificate on the leader node.
The advisory lock keepalive loop will re-establish the PostgreSQL connection using the new certificate within one keepalive interval.
Confirm the leader is still write-ready:
curl -sf "${AUTONOMY_ORCHESTRATOR_URL}/v1/health/write-ready" | jq .
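The ordering rule in the steps above (standby first, leader last) can be encoded so a rotation script cannot process nodes in the wrong order. This is a sketch; the host names and the "host role" input format are hypothetical, and the operator is assumed to supply each node's current role.

```shell
# Hypothetical helper: reads "host role" lines on stdin, where role is standby
# or leader, and prints hosts in rotation order: standbys first, leader last.
rotation_order() {
  awk '
    $2 == "standby" { print $1 }
    $2 == "leader"  { leaders = leaders $1 "\n" }
    END { printf "%s", leaders }'
}

# Example use:
#   printf 'cp-1 leader\ncp-2 standby\n' | rotation_order
```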
7. Automated rotation
To automate certificate rotation using a cron job:
# /etc/cron.d/autonomy-cert-rotate
# Run daily at 02:00; rotate if the certificate expires within 14 days.
# crontab entries cannot be split with a trailing backslash, so the command stays on one line.
0 2 * * * root sh -lc 'if autonomy cert list --cert-file /etc/autonomy/edge.crt --expiring-within-days 14 | tee /tmp/autonomy-cert-check.txt | grep -Eq "\\b(expiring|expired)\\b"; then autonomy cert rotate --cert-file /etc/autonomy/edge.crt --key-file /etc/autonomy/edge.key --ca-cert /etc/autonomy/ca.crt --ca-key /etc/autonomy/ca.key --identity "$(hostname)"; fi'
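The --expiring-within-days check only filters output, so the cron job must inspect the
printed status values rather than relying on the process exit code. Only rotate when the
output contains expiring or expired. Because the job needs --ca-key, run it only on a host
that is permitted to hold the CA key (see section 5), and set AUTONOMY_OPERATOR in the
job's environment so the RBAC decision is attributed correctly (see section 0).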
Known gaps
No CA certificate rotation: The CA itself cannot be rotated via the CLI. CA rotation requires manual key replacement and re-issuance of all leaf certificates.
No OCSP support: Online Certificate Status Protocol is not implemented. Revocation checking relies on CRL only; there is no live OCSP responder integration.
No external PKI / OCSP integration: The repo now supports repeated CRL publishers plus a configurable agreement threshold for pull-based sync, but it does not integrate with OCSP, enterprise PKI platforms, or externally managed revocation responders. Operators still need to choose and operate the authoritative publisher set for their deployment.
edged does not auto-reload certificates: The edged process reads certificate files at TLS handshake time for most Go TLS configurations, but this is not guaranteed. Verify in your deployment that the running process uses the new certificate without a restart.