Certificate Rotation Procedure

Audience: operators managing edge node identity certificates.

Background

Edge nodes use ECDSA P-256 leaf certificates signed by a local CA for mTLS transport. The AutonomyOps autonomy cert commands inspect, issue, rotate, and revoke these certificates using a locally managed CA and CRL.

Default validity: 90 days. Rotate certificates before they expire; autonomy cert list --expiring-within-days shows which certificates fall inside a chosen expiry window (see section 1).


0. RBAC prerequisites

Cert mutation commands require cert:manage. Read-only inspection commands (cert list, cert check-revocation) accept cert:read or cert:manage when RBAC enforcement is active (the default since PR-29-followup-a).

On a fresh deployment, bootstrap with a predefined role first. Bootstrap mode allows rbac role assign, not rbac role create.

# 1. Bootstrap an RBAC administrator using a predefined role.
export AUTONOMY_OPERATOR=bootstrap-admin@example.com
autonomy rbac role assign --role auditor --subject bootstrap-admin@example.com

# 2. Create cert-specific custom roles after bootstrap is complete.
autonomy rbac role create --name cert-reader --permissions cert:read
autonomy rbac role create --name cert-operator --permissions cert:manage

# 3. Assign the least-privilege role needed by each operator.
autonomy rbac role assign --role cert-reader --subject reviewer@example.com
autonomy rbac role assign --role cert-operator --subject alice@example.com

Set AUTONOMY_OPERATOR=<identity> before running any cert command so the RBAC decision and any denial audit trail are correctly attributed.
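One way to make this attribution step hard to skip is a small guard at the top of any wrapper script. The sketch below is an assumption about how an operator might script it, not part of the autonomy CLI; require_operator is a hypothetical helper name.

```shell
# Hypothetical guard (not part of the autonomy CLI): refuse to continue unless
# AUTONOMY_OPERATOR is set, so RBAC decisions and denial audit records are
# attributed to a real identity.
require_operator() {
  if [ -z "${AUTONOMY_OPERATOR:-}" ]; then
    echo "AUTONOMY_OPERATOR is not set; refusing to run cert commands" >&2
    return 1
  fi
  echo "running as operator: ${AUTONOMY_OPERATOR}"
}
```

A wrapper could then call require_operator && autonomy cert list --cert-file /etc/autonomy/edge.crt so that unattributed runs stop before reaching the CLI.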

If enforcement is not yet configured, set AUTONOMY_RBAC_ENFORCEMENT=0 to disable enforcement (not recommended in production).

Denial is audited. When a cert operation is denied, an auth.access.denied record is emitted to the audit log before the error is returned. Successful mutations continue to emit their native cert events such as cert.issued, cert.rotated, cert.revoked, and cert.crl.synced.

Use these audit queries to review access decisions:

autonomy audit query --category auth
autonomy audit query --category cert

1. Inspect certificate status

autonomy cert list \
  --cert-file /etc/autonomy/edge.crt \
  [--cert-file /etc/autonomy/backup.crt] \
  [--expiring-within-days 30]

Example output (healthy):

IDENTITY     FILE                    NOT_AFTER             DAYS_LEFT  STATUS
edge-node-7  /etc/autonomy/edge.crt  2026-06-15T00:00:00Z  89         ok

Example output (expiring soon):

IDENTITY     FILE                    NOT_AFTER             DAYS_LEFT  STATUS
edge-node-7  /etc/autonomy/edge.crt  2026-04-17T00:00:00Z  29         expiring

Example output (expired):

IDENTITY     FILE                    NOT_AFTER             DAYS_LEFT  STATUS
edge-node-7  /etc/autonomy/edge.crt  2026-03-10T00:00:00Z  -8         expired

--expiring-within-days is a display filter, not an alerting mode: it narrows the output to matching certificates but still exits successfully even when nothing matches. Use the rendered status text or downstream parsing in cron/CI checks.
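Because the exit code carries no expiry signal, a cron or CI check has to read the STATUS column itself. The following is a minimal sketch that assumes the table layout shown in the examples above (a header row starting with IDENTITY, and STATUS as the last whitespace-separated column); cert_needs_attention is a hypothetical helper name.

```shell
# Hypothetical helper: reads `autonomy cert list` table output on stdin and
# returns 0 if any data row has STATUS "expiring" or "expired", 1 otherwise.
# Assumes STATUS is the last column, as in the example output.
cert_needs_attention() {
  awk '
    $1 == "IDENTITY" || NF == 0 { next }               # skip header and blank lines
    $NF == "expiring" || $NF == "expired" { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For example, autonomy cert list --cert-file /etc/autonomy/edge.crt | cert_needs_attention succeeds only when at least one listed certificate needs rotation. Empty input (nothing matched the filter) is treated as "no action needed".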


2. Rotate an existing certificate

Certificate rotation replaces the certificate and key files in place while preserving the same identity. The operation writes to a temporary file first, then atomically renames it into place, so a reader never sees a partially written file.

autonomy cert rotate \
  --cert-file /etc/autonomy/edge.crt \
  --key-file  /etc/autonomy/edge.key \
  --ca-cert   /etc/autonomy/ca.crt \
  --ca-key    /etc/autonomy/ca.key \
  --identity  edge-node-7 \
  [--validity-days 90]

Expected output:

rotated  identity=edge-node-7 cert=/etc/autonomy/edge.crt valid_days=90

Audit event emitted: cert.rotated.

Post-rotation verification

autonomy cert list --cert-file /etc/autonomy/edge.crt
# Confirm NOT_AFTER is approximately now + 90 days and STATUS is ok.
# edged re-reads cert files on each new TLS connection, so no restart is needed
# unless the TLS library in your deployment caches the certificate; confirm that
# the running edged process picks up the new cert.
# Verify that the new certificate chains to the CA:
openssl verify -CAfile /etc/autonomy/ca.crt /etc/autonomy/edge.crt
# Expect: /etc/autonomy/edge.crt: OK
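Since rotation replaces both the certificate and the key, two further checks can be useful: that the new certificate's public key matches the new private key, and that the certificate remains valid for roughly the expected period. The sketch below uses only standard openssl subcommands; the function names are hypothetical and PEM-encoded files are assumed.

```shell
# Returns 0 when the certificate's public key matches the private key.
# Arguments: <cert-file> <key-file> (both assumed to be PEM).
cert_matches_key() {
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
  key_pub=$(openssl pkey -in "$2" -pubout) || return 2
  [ "$cert_pub" = "$key_pub" ]
}

# Returns 0 when the certificate is still valid at least <days> days from now.
# Arguments: <cert-file> <days>. Uses `openssl x509 -checkend <seconds>`.
cert_valid_for_days() {
  openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null
}
```

After a default 90-day rotation, cert_matches_key /etc/autonomy/edge.crt /etc/autonomy/edge.key and cert_valid_for_days /etc/autonomy/edge.crt 85 would both be expected to succeed.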

3. Issue a new certificate (new identity)

Use issue when provisioning a new edge node or when the identity must change.

autonomy cert issue \
  --cert-file /etc/autonomy/edge.crt \
  --key-file  /etc/autonomy/edge.key \
  --ca-cert   /etc/autonomy/ca.crt \
  --ca-key    /etc/autonomy/ca.key \
  --identity  edge-node-42 \
  [--validity-days 90]

issue and rotate share an implementation: both call the same underlying runCertIssue function. The distinction is semantic: use issue for a new identity and rotate for an in-place renewal of the same identity.

Expected output:

issued  identity=edge-node-42 cert=/etc/autonomy/edge.crt valid_days=90

Audit event emitted: cert.issued.


4. Revoke a certificate and update the local CRL

Use revoke when a leaf certificate must no longer be trusted, for example after key compromise or node decommissioning.

autonomy cert revoke \
  --identity edge-node-7 \
  --cert-file /etc/autonomy/edge.crt \
  --ca-cert /etc/autonomy/ca.crt \
  --ca-key /etc/autonomy/ca.key \
  --crl-file /etc/autonomy/revoked.crl \
  --reason key-compromise

Expected output:

revoked  identity=edge-node-7 cert=/etc/autonomy/edge.crt crl=/etc/autonomy/revoked.crl serial=...

Notes:

  • --crl-file may be omitted when EDGE_CRL_FILE is already set in the environment.

  • Re-running revoke for the same certificate is idempotent; the command reports already_revoked and does not duplicate CRL entries.

  • The CRL is managed offline. Transport enforcement requires the control-plane server to be configured with CRLFile (see section 4b). A running control-plane reloads the CRL on subsequent handshakes when the file changes on disk.

  • For multi-node control-plane deployments, distribute the canonical CRL with autonomy cert sync-crl or configure automatic pull refresh with autonomy-orchestrator serve --tls-crl-sync-url.

Audit event emitted: cert.revoked.


4a. Check revocation status

Use check-revocation to confirm a certificate’s serial appears (or does not appear) in the local CRL before or after revoking.

autonomy cert check-revocation \
  --cert-file /etc/autonomy/edge.crt \
  --ca-cert /etc/autonomy/ca.crt \
  --crl-file /etc/autonomy/revoked.crl

Expected output (not revoked):

not_revoked  serial=4d2

Expected output (revoked):

revoked  serial=4d2 reason=key-compromise revoked_at=2026-03-19T10:00:00Z

Exit code is 0 when not revoked and non-zero when revoked, which makes the command suitable for use in scripts.
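A deployment script can use that exit code as a gate. The following is a minimal sketch; gate_on_revocation is a hypothetical wrapper name, and it assumes that a non-zero exit may also come from a failed check (for example a missing CRL file), so any non-zero result is treated as "do not proceed".

```shell
# Hypothetical wrapper around `autonomy cert check-revocation`.
# Arguments: <cert-file> <ca-cert> <crl-file>
# Returns 0 only when the CLI reports the certificate as not revoked.
gate_on_revocation() {
  if autonomy cert check-revocation \
      --cert-file "$1" \
      --ca-cert "$2" \
      --crl-file "$3"; then
    echo "certificate is not revoked"
  else
    echo "certificate is revoked or the check failed; stopping" >&2
    return 1
  fi
}
```

For example: gate_on_revocation /etc/autonomy/edge.crt /etc/autonomy/ca.crt /etc/autonomy/revoked.crl || exit 1.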


4b. Transport enforcement via --tls-crl-file

The control-plane server enforces revocation at the TLS handshake when started with a CRLFile in its TLSConfig. On the supported CLI surface, this is exposed as autonomy-orchestrator serve --tls-crl-file:

# Example control-plane startup with revocation enforcement enabled.
autonomy-orchestrator serve \
  --listen 0.0.0.0:8443 \
  --data-dir /var/lib/autonomy/orchestrator \
  --tls-cert-file /etc/autonomy/server.crt \
  --tls-key-file /etc/autonomy/server.key \
  --tls-ca-file /etc/autonomy/ca.crt \
  --tls-crl-file /etc/autonomy/revoked.crl

# 1. Revoke the certificate and update the CRL.
autonomy cert revoke \
  --identity edge-node-7 \
  --cert-file /etc/autonomy/edge.crt \
  --ca-cert /etc/autonomy/ca.crt \
  --ca-key /etc/autonomy/ca.key \
  --crl-file /etc/autonomy/revoked.crl \
  --reason key-compromise

# 2. Verify the revoked node is rejected.
#    The running control-plane reloads the updated CRL on the next handshake;
#    no restart is required.
#    A connection attempt from the revoked node will fail with a TLS handshake error.
#    The server logs: cert.revocation.rejected  serial=<hex>  subject=<cn>

Fail-closed guarantee: if CRLFile is set but the file is missing or has an invalid CA signature, the server refuses to start rather than proceeding without CRL enforcement. If the CRL later becomes unreadable or malformed on disk, subsequent client handshakes fail closed until the CRL is corrected.


4c. CRL distribution across control-plane nodes

Use the control-plane CRL endpoint plus either the manual sync command or the built-in pull loop when more than one control-plane host must enforce the same revocation set.

Manual sync fallback:

autonomy cert sync-crl \
  --min-sources 2 \
  --source-url https://peer-a.example.internal:8443/v1/certs/crl \
  --source-url https://leader.example.internal:8443/v1/certs/crl \
  --source-url https://leader-b.example.internal:8443/v1/certs/crl \
  --ca-cert /etc/autonomy/ca.crt \
  --client-cert /etc/autonomy/server.crt \
  --client-key /etc/autonomy/server.key \
  --crl-file /etc/autonomy/revoked.crl

Expected output:

synced  source=https://leader.example.internal:8443/v1/certs/crl matched=2 required=2 crl=/etc/autonomy/revoked.crl bytes=... sha256=...

Audit event emitted: cert.crl.synced.

With --min-sources 2, the command only accepts an update after two publishers return the same CRL digest. Unreachable or mismatched publishers do not count toward the threshold. The recorded source= value identifies the publisher that completed the accepted quorum.
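The agreement rule can be illustrated with a few lines of shell. This is only a sketch of the idea (count identical digests and accept one that reaches the threshold), not the CLI's actual implementation; crl_quorum_digest is a hypothetical name, and the digests passed in stand for the values returned by reachable publishers.

```shell
# Illustration of a digest quorum, not the autonomy implementation.
# Arguments: <min-sources> <digest>...
# Prints the digest that at least <min-sources> publishers agree on and
# returns 0; returns 1 when no digest reaches the threshold.
crl_quorum_digest() {
  min_sources=$1
  shift
  # Unreachable publishers contribute no digest, so they never count.
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | sort | uniq -c | sort -rn |
    awk -v min="$min_sources" '
      NR == 1 && $1 >= min { print $2; ok = 1 }   # most common digest first
      END { exit (ok ? 0 : 1) }
    '
}
```

With a threshold of 2, crl_quorum_digest 2 aaa bbb aaa prints aaa, while crl_quorum_digest 2 aaa bbb fails because no two publishers agree.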

Automatic pull distribution on a follower node:

autonomy-orchestrator serve \
  --listen 0.0.0.0:8443 \
  --data-dir /var/lib/autonomy/orchestrator \
  --tls-cert-file /etc/autonomy/server.crt \
  --tls-key-file /etc/autonomy/server.key \
  --tls-ca-file /etc/autonomy/ca.crt \
  --tls-crl-file /etc/autonomy/revoked.crl \
  --tls-crl-sync-min-sources 2 \
  --tls-crl-sync-url https://peer-a.example.internal:8443/v1/certs/crl \
  --tls-crl-sync-url https://leader.example.internal:8443/v1/certs/crl \
  --tls-crl-sync-url https://leader-b.example.internal:8443/v1/certs/crl \
  --tls-crl-sync-interval 30s

Notes:

  • The source node serves the current CRL from GET /v1/certs/crl when it is started with --tls-crl-file.

  • The sync loop performs one fail-closed fetch before the follower begins serving, then refreshes the local CRL on the configured interval.

  • --tls-crl-sync-min-sources controls how many publishers must agree on the CRL digest before the follower accepts an update. Set it to 2 or greater when more than one authoritative publisher is available.

  • When --tls-crl-sync-min-sources is above 1, the follower treats repeated --tls-crl-sync-url values as a publisher set that must reach agreement, not only as a fallback list. The extra URLs still provide availability when some publishers are unreachable.

  • The same local certificate, key, and CA flags are reused for mTLS when the source endpoint requires client authentication.
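To confirm that control-plane nodes have converged on the same CRL, one option is to compare a digest of the local file on each host. The sketch below assumes that comparing a SHA-256 over the raw file bytes is meaningful for your deployment; whether it equals the sha256= value printed by sync-crl is not confirmed here. file_sha256 is a hypothetical helper name.

```shell
# Prints the SHA-256 hex digest of a file.
# Falls back to `openssl dgst` when sha256sum is not installed.
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    openssl dgst -sha256 -r "$1" | awk '{ print $1 }'
  fi
}
```

Running file_sha256 /etc/autonomy/revoked.crl on each node and comparing the results gives a quick check that every host enforces the same revocation set.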

5. CA key management prerequisites

The issue, rotate, and revoke commands all require access to the CA private key (--ca-key). The CA key should be:

  • Stored in a secrets manager or HSM in production.

  • Never placed on the edge node itself; certificate operations should run from a management workstation or CI pipeline.

  • Restricted by operating procedure and secret-management policy.

  • Protected by the dedicated CLI RBAC permission cert:manage, which applies whenever RBAC enforcement is active (the default).
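When the CA key has to be materialized as a file for --ca-key on a management workstation or CI runner, a pre-flight permission check is one possible safeguard. This is a sketch under that assumption and does not replace a secrets manager or HSM; key_is_owner_only is a hypothetical helper name.

```shell
# Returns 0 when the file grants no permissions to group or others
# (for example mode 0600 or 0400), 1 when it does, 2 when it is missing.
key_is_owner_only() {
  [ -e "$1" ] || return 2
  # Characters 5-10 of the ls mode string are the group and other bits.
  perms=$(ls -ld "$1" | cut -c5-10)
  [ "$perms" = "------" ]
}
```

A wrapper could run key_is_owner_only /etc/autonomy/ca.key || exit 1 before invoking any command that takes --ca-key.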


6. Certificate rotation in the HA control-plane context

When rotating certificates for control-plane nodes in an HA cluster:

  1. Rotate the certificate on the standby node first (it is not the current leader).

  2. Verify the standby can still connect to the primary PostgreSQL:

    psql "$POSTGRES_URL" -c "SELECT pg_is_in_recovery();"
    
  3. Rotate the certificate on the leader node.

  4. The advisory lock keepalive loop will re-establish the PostgreSQL connection using the new certificate within one keepalive interval.

  5. Confirm the leader is still write-ready:

    curl -sf "${AUTONOMY_ORCHESTRATOR_URL}/v1/health/write-ready" | jq .
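Because the leader may need up to one keepalive interval to re-establish its connection, a single health probe right after rotation can fail spuriously. The sketch below polls the same write-ready endpoint with retries; wait_write_ready and its default attempt count and delay are illustrative assumptions, not values taken from the product.

```shell
# Polls <base-url>/v1/health/write-ready until it succeeds.
# Arguments: <base-url> [attempts=10] [delay-seconds=3]
wait_write_ready() {
  base_url=$1
  attempts=${2:-10}
  delay=${3:-3}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -sf "${base_url}/v1/health/write-ready" >/dev/null; then
      echo "write-ready after ${i} attempt(s)"
      return 0
    fi
    # Wait between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "leader not write-ready after ${attempts} attempts" >&2
  return 1
}
```

For example: wait_write_ready "${AUTONOMY_ORCHESTRATOR_URL}" 10 3 || exit 1.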
    

7. Automated rotation

To automate certificate rotation using a cron job:

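# /etc/cron.d/autonomy-cert-rotate
# Run daily at 02:00; rotate if the certificate expires within 14 days.
# cron does not support backslash line continuations, so the entry stays on one
# line and the logic lives in a script (the script path below is an example).
0 2 * * * root sh -lc /usr/local/sbin/autonomy-cert-rotate.sh

Script referenced by the cron entry (example path /usr/local/sbin/autonomy-cert-rotate.sh; it must be executable):

#!/bin/sh
# Rotate only when the filtered listing reports an expiring or expired status.
if autonomy cert list --cert-file /etc/autonomy/edge.crt --expiring-within-days 14 \
    | tee /tmp/autonomy-cert-check.txt \
    | grep -Ewq "expiring|expired"; then
  autonomy cert rotate \
    --cert-file /etc/autonomy/edge.crt \
    --key-file  /etc/autonomy/edge.key \
    --ca-cert   /etc/autonomy/ca.crt \
    --ca-key    /etc/autonomy/ca.key \
    --identity  "$(hostname)"
fi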

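The --expiring-within-days check only filters output, so the cron job must inspect the printed status values rather than relying on the process exit code. Only rotate when the output contains expiring or expired. This example reads the CA key from local disk and takes the identity from hostname; given the section 5 guidance to keep the CA key off edge nodes, adapt --ca-key and --identity to the host where the job actually runs.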


Known gaps

  • No CA certificate rotation: The CA itself cannot be rotated via the CLI. CA rotation requires manual key replacement and re-issuance of all leaf certificates.

  • No OCSP support: Online Certificate Status Protocol is not implemented. Revocation checking relies on CRL only; there is no live OCSP responder integration.

  • No external PKI integration: The repo supports repeated CRL publishers plus a configurable agreement threshold for pull-based sync, but it does not integrate with enterprise PKI platforms or externally managed revocation responders. Operators still need to choose and operate the authoritative publisher set for their deployment.

  • edged has no explicit certificate reload: The edged process reads certificate files at TLS handshake time for most Go TLS configurations, so a rotated certificate is normally picked up on the next connection, but this is not guaranteed. Verify with your deployment that the running process will use the new certificate without restart.