Certificate Transparency: Retrieving a Precertificate by SHA-256 Hash

Discover the process behind accessing a precertificate published on a CT log

Key Insights

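Direct Retrieval via CT Log APIs: Logs that implement RFC 6962 expose HTTP endpoints for looking up an entry by its Merkle leaf hash (get-proof-by-hash) and for fetching entries by index (get-entries). Together these provide the inclusion proof and the logged certificate data.
Understanding the Precertificate Process: A precertificate carries a critical poison extension so that it cannot be used as a real certificate. The log records its to-be-signed (TBS) data with the poison extension removed, and the entry's SHA-256 leaf hash is computed over the structure that wraps that data, not over the precertificate file itself.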
Verification and Merkle Proofs: Even if raw precertificate extraction is not always straightforward, Merkle inclusion proofs verify a precertificate's presence in the log.

Overview of Certificate Transparency and Precertificates

Certificate Transparency (CT) is a security mechanism designed to make Certificate Authorities (CAs) accountable for the certificates they issue. It was first specified in IETF RFC 6962; RFC 9162 later defined version 2.0 of the protocol, although in practice the widely used logs still expose the RFC 6962 (v1) API. CT logs are public, append-only ledgers, and major browsers expect publicly trusted TLS certificates to be logged in them. A precertificate is a version of the certificate created by a CA that includes a “poison extension” to indicate that it is not the final signed certificate. When the CA submits it, the log returns a Signed Certificate Timestamp (SCT), and the CA then embeds that SCT in the final certificate.

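A SHA-256 hash can identify a logged precertificate, allowing auditors, security researchers, or even end users to verify its entry in CT logs, but it matters which hash you hold. The log's own lookup uses the Merkle leaf hash, which is SHA-256 over a 0x00 byte followed by the entry's MerkleTreeLeaf structure. The SHA-256 fingerprint of the DER-encoded precertificate is a different value that the log API does not index. The core question is whether one can retrieve the full precertificate from a CT log if the SHA-256 hash is known. The short answer is yes, in two steps, when the hash is the leaf hash: the log resolves the hash to an entry index, and the entry at that index contains the precertificate. With only a fingerprint, you need a third-party index of log contents or a scan of the log. Availability and rate limits vary by CT log operator.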

Retrieval Methods and Detailed Instructions

Understanding CT Logs and Their APIs

Each CT log is maintained by a specific operator (Google, Cloudflare, DigiCert, etc.) and supplies API endpoints for inspecting and retrieving log entries. The RFC 6962 interface includes:

Query by Leaf Hash: get-proof-by-hash takes the base64-encoded Merkle leaf hash and a tree size, and returns a JSON response with the entry's index (leaf_index) and a Merkle inclusion proof (audit_path). It does not return the certificate itself.
Merkle Inclusion Proof Retrieval: The proof is the list of sibling node hashes on the path from the leaf to the root of the log's Merkle tree. Combined with the leaf hash and the leaf index, it lets you recompute the root hash and compare it with the one the log has signed.
Entry Retrieval by Index: get-entries returns the stored entries for a range of indices, including the logged precertificate data.
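
To make the leaf hash concrete, here is a minimal Python sketch of how an RFC 6962 log derives it for a precertificate entry. The function name and the sample inputs are illustrative assumptions. A real computation needs the exact timestamp, issuer key hash, and TBSCertificate bytes that the log recorded, and this sketch assumes the entry's extensions field is empty unless you pass one.

```python
import base64
import hashlib
import struct

def precert_leaf_hash(timestamp_ms: int, issuer_key_hash: bytes,
                      tbs_der: bytes, extensions: bytes = b"") -> bytes:
    """Return SHA-256(0x00 || MerkleTreeLeaf) for a v1 precert entry."""
    if len(issuer_key_hash) != 32:
        raise ValueError("issuer_key_hash must be a 32-byte SHA-256 digest")
    leaf = b"".join([
        b"\x00",                             # version: v1
        b"\x00",                             # leaf_type: timestamped_entry
        struct.pack(">Q", timestamp_ms),     # log timestamp, ms since epoch
        struct.pack(">H", 1),                # entry_type: precert_entry
        issuer_key_hash,                     # SHA-256 of the issuer public key
        len(tbs_der).to_bytes(3, "big"),     # 24-bit length prefix
        tbs_der,                             # TBSCertificate, poison removed
        struct.pack(">H", len(extensions)),  # 16-bit extensions length
        extensions,
    ])
    # The leading 0x00 marks a leaf node in the Merkle tree.
    return hashlib.sha256(b"\x00" + leaf).digest()

# Placeholder inputs for illustration only; this is not a real certificate.
demo = precert_leaf_hash(1700000000000, bytes(32), b"\x30\x00")
print(base64.b64encode(demo).decode())  # base64 form used in the hash parameter
```

The base64 string printed at the end is the form that the hash parameter of get-proof-by-hash expects, before URL encoding.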

Step 1: Identify the Correct CT Log

Before beginning the query, it is essential to determine which CT log contains the precertificate you are targeting. CT logs are publicly listed, and you can access these lists on websites such as the official Certificate Transparency site: Certificate Transparency Known Logs.

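If you have the final certificate, its embedded SCTs name the logs directly: each SCT carries a log ID, which is the SHA-256 hash of the log's public key and can be matched against the published list. Otherwise, review the list and select the logs that the Certificate Authority (CA) in question is likely to use. The specific CT log determines the base URL for every request that follows.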
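
As a rough illustration of matching a log ID against a log list, the Python sketch below derives the ID from a log's public key and searches a list for it. The list layout used here (operators, each with a logs array holding log_id, key, and url) is an assumption modeled on the published JSON log lists, and the sample list and key bytes are made up so the example runs offline.

```python
import base64
import hashlib

def log_id_from_key(public_key_b64: str) -> str:
    """Log ID = base64(SHA-256(DER-encoded public key of the log))."""
    der = base64.b64decode(public_key_b64)
    return base64.b64encode(hashlib.sha256(der).digest()).decode()

def find_log(log_list: dict, log_id_b64: str):
    """Return (operator name, log URL) for a log ID, or None if unknown."""
    for operator in log_list.get("operators", []):
        for log in operator.get("logs", []):
            if log.get("log_id") == log_id_b64:
                return operator.get("name"), log.get("url")
    return None

# Hypothetical one-entry list in the assumed shape; the key is a stand-in.
fake_key = base64.b64encode(b"not-a-real-public-key").decode()
sample_list = {"operators": [{"name": "Example Operator", "logs": [
    {"log_id": log_id_from_key(fake_key), "key": fake_key,
     "url": "https://log.example.com/"}]}]}

print(find_log(sample_list, log_id_from_key(fake_key)))
```

With a real list, load the JSON file in place of sample_list and pass the base64 log ID taken from the certificate's SCT.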

Step 2: Accessing the CT Log API

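Once you have identified the appropriate CT log, review its API documentation. Logs that implement RFC 6962 expose a public HTTP API with endpoints such as the following:

# Example using curl to fetch an inclusion proof by leaf hash
curl -G "https://log.example.com/ct/v1/get-proof-by-hash" \
  --data-urlencode "hash=<base64_leaf_hash>" \
  --data-urlencode "tree_size=<tree_size>"

Replace <base64_leaf_hash> with the base64-encoded leaf hash of your precertificate entry and <tree_size> with the tree size from the log's current signed tree head. The endpoints relevant here are:

/ct/v1/get-sth: Retrieves the signed tree head, which contains the current tree size and root hash.
/ct/v1/get-proof-by-hash: Retrieves the leaf index and Merkle inclusion proof for the given leaf hash and tree size.
/ct/v1/get-entries: Retrieves the entries for a range of indices (start and end), including the precertificate data. RFC 6962 defines no endpoint that returns a certificate directly from a hash, so an endpoint named get-certificate-by-hash should not be expected to exist.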
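
The request URLs can be assembled as in the following Python sketch. The helper names are hypothetical and the base URL is a placeholder. The point of the example is the URL encoding: a base64 hash can contain +, /, and = characters, which must be escaped in a query string.

```python
from urllib.parse import urlencode

def proof_by_hash_url(log_url: str, leaf_hash_b64: str, tree_size: int) -> str:
    """Build a get-proof-by-hash URL with a properly escaped base64 hash."""
    query = urlencode({"hash": leaf_hash_b64, "tree_size": tree_size})
    return log_url.rstrip("/") + "/ct/v1/get-proof-by-hash?" + query

def entries_url(log_url: str, start: int, end: int) -> str:
    """Build a get-entries URL for the inclusive index range [start, end]."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    query = urlencode({"start": start, "end": end})
    return log_url.rstrip("/") + "/ct/v1/get-entries?" + query

# Placeholder log URL and hash, for illustration only.
print(proof_by_hash_url("https://log.example.com/", "a+b/c=", 10))
print(entries_url("https://log.example.com/", 7, 7))
```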

Step 3: Constructing and Sending the GET Request

Use an HTTP client or a command-line tool such as curl or Postman, or a programming language with an HTTP library (for example, Python's built-in urllib or the third-party requests library) to send the GET request.

# Example using curl:
curl "https://log.example.com/ct/v1/get-proof-by-hash?hash=YOUR_URL_ENCODED_LEAF_HASH&tree_size=TREE_SIZE"

Replace the placeholder URL with the base URL of the CT log you selected. Ensure that HTTPS is used for secure communication. A successful get-proof-by-hash response is a JSON object with two fields: leaf_index and audit_path. The certificate itself and its timestamp are not part of this response; they come from a get-entries request for that index.
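
A small Python helper for sending such a request might look like the sketch below. It uses only the standard library. The opener parameter is an assumption added for this example so that the function can be exercised without a network connection, and the canned response stands in for a real log.

```python
import io
import json
import urllib.error
import urllib.request

def fetch_json(url: str, opener=urllib.request.urlopen,
               timeout: float = 30.0) -> dict:
    """GET a URL and decode the JSON body; raise RuntimeError on failure."""
    try:
        with opener(url, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:   # must precede URLError (subclass)
        raise RuntimeError(f"log returned HTTP {err.code} for {url}") from err
    except urllib.error.URLError as err:
        raise RuntimeError(f"could not reach log: {err.reason}") from err

# Offline stand-in for a log, so the sketch runs without a network.
def fake_opener(url, timeout=None):
    return io.BytesIO(b'{"leaf_index": 7, "audit_path": []}')

proof = fetch_json("https://log.example.com/ct/v1/get-proof-by-hash",
                   opener=fake_opener)
print(proof["leaf_index"], len(proof["audit_path"]))
```

Against a real log, call fetch_json with the full request URL and leave opener at its default.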

Step 4: Analyzing the JSON Response

Upon a successful API call, the JSON response includes the following fields, as named in RFC 6962:

Endpoint	Field	Description
get-proof-by-hash	leaf_index	The zero-based index of the entry whose leaf hash matches the hash you queried.
get-proof-by-hash	audit_path	The inclusion proof: an array of base64-encoded node hashes used to recompute the tree root.
get-entries	leaf_input	The base64-encoded MerkleTreeLeaf, which holds the log timestamp and, for a precertificate entry, the issuer key hash and TBS data.
get-entries	extra_data	The base64-encoded chain data. For a precertificate entry it contains the complete precertificate in DER format followed by its issuer chain.

The proof response does not echo the hash back, so confirm the match yourself: hash the decoded leaf_input with a 0x00 prefix using SHA-256 and check that the result equals the leaf hash you supplied. The extra_data field then provides the complete precertificate.
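
In an RFC 6962 get-entries response, each entry carries a base64 leaf_input field holding the MerkleTreeLeaf structure. The Python sketch below decodes that structure to recover the timestamp, the entry type, and the DER payload. The function name is hypothetical, the sample bytes are synthetic, and the sketch ignores the trailing extensions field.

```python
import base64
import struct

def parse_leaf_input(leaf_input_b64: str) -> dict:
    """Decode a v1 MerkleTreeLeaf from a get-entries response."""
    raw = base64.b64decode(leaf_input_b64)
    version, leaf_type, timestamp_ms, entry_type = struct.unpack(">BBQH", raw[:12])
    if version != 0 or leaf_type != 0:
        raise ValueError("unsupported leaf version or type")
    body = raw[12:]
    result = {"timestamp_ms": timestamp_ms}
    if entry_type == 0:                       # x509_entry: payload is the cert
        result["entry_type"] = "x509_entry"
    elif entry_type == 1:                     # precert_entry: key hash, then TBS
        result["entry_type"] = "precert_entry"
        result["issuer_key_hash"] = body[:32]
        body = body[32:]
    else:
        raise ValueError(f"unknown entry type {entry_type}")
    length = int.from_bytes(body[:3], "big")  # 24-bit length prefix
    result["der"] = body[3:3 + length]
    return result

# Synthetic precert leaf for illustration; the 2-byte "TBS" is not real DER data.
sample = base64.b64encode(
    b"\x00\x00" + struct.pack(">Q", 1700000000000) + b"\x00\x01"
    + bytes(32) + b"\x00\x00\x02" + b"\x30\x00" + b"\x00\x00").decode()
parsed = parse_leaf_input(sample)
print(parsed["entry_type"], parsed["timestamp_ms"], parsed["der"].hex())
```

For a precertificate entry, the der value is the TBSCertificate rather than a complete certificate, so it cannot be passed to certificate parsers directly.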

Step 5: Confirming and Reconstructing the Precertificate

In RFC 6962 logs, the proof endpoint verifies inclusion and the entry endpoint supplies the certificate data. Once you have the precertificate from get-entries:

Verify Data Integrity: Confirm that the leaf hash recomputed from leaf_input matches the hash you initially used, and that the precertificate in extra_data corresponds to the TBS data in the leaf.
Utilize OpenSSL Tools: After base64-decoding extra_data and extracting the precertificate, which is in DER format, you can decode it using OpenSSL:
openssl x509 -in precertificate.der -inform DER -text -noout

If you stop at the inclusion proof, you have validated that the entry appears in the log at a specific index for a given tree size. The proof contains only hashes and cannot be used to reconstruct the precertificate; use the returned leaf_index with get-entries to obtain it.
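
For a precertificate entry, RFC 6962 places the submitted precertificate and its issuer chain in the extra_data field of the get-entries response. The Python sketch below splits that field into DER blobs. The function names are hypothetical and the sample bytes are stand-ins, not real certificates.

```python
import base64
import hashlib

def read_cert(buf: bytes, offset: int):
    """Read one 24-bit length-prefixed DER blob; return (der, next offset)."""
    length = int.from_bytes(buf[offset:offset + 3], "big")
    start = offset + 3
    if start + length > len(buf):
        raise ValueError("truncated certificate")
    return buf[start:start + length], start + length

def parse_precert_extra_data(extra_data_b64: str):
    """Split a PrecertChainEntry into (precertificate DER, [chain DER, ...])."""
    raw = base64.b64decode(extra_data_b64)
    precert, offset = read_cert(raw, 0)
    chain_len = int.from_bytes(raw[offset:offset + 3], "big")
    offset += 3
    end = offset + chain_len
    chain = []
    while offset < end:
        cert, offset = read_cert(raw, offset)
        chain.append(cert)
    return precert, chain

def _wrap(data: bytes) -> bytes:
    """Add the 24-bit length prefix used by the TLS-style encoding."""
    return len(data).to_bytes(3, "big") + data

# Synthetic entry: one fake precertificate followed by a one-element chain.
sample = base64.b64encode(
    _wrap(b"PRECERT-DER") + _wrap(_wrap(b"ISSUER-DER"))).decode()
precert, chain = parse_precert_extra_data(sample)
print(hashlib.sha256(precert).hexdigest(), len(chain))
```

Writing the precert bytes to a file such as precertificate.der gives OpenSSL something to decode, and the SHA-256 digest of those bytes is the fingerprint that certificate search services display.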

Additional Considerations and Limitations

Technical Limitations

Although the CT protocol supports public logging and verification, retrieving a complete precertificate from a hash is not always a single query. A few notable limitations include:

Availability of Endpoints: There is no standard “get-certificate-by-hash” endpoint. Retrieval goes through the leaf hash and entry index, and a lookup by certificate fingerprint requires an external index of log contents.
API Variability: Operators differ in base URLs, rate limits, and the maximum number of entries returned per get-entries call, and newer tile-based logs serve the tree as static files instead of answering these queries. It is crucial to consult the documentation for the specific log.
Access Policies: Log contents are public by design, so raw precertificates are not withheld, but operators may throttle or block clients that issue excessive requests.

Understand that the primary aim of Certificate Transparency is to ensure accountability and to provide a method to check that certificates are logged. Verifying inclusion is the common case for auditors, while bulk retrieval of entries is typically done by monitors that download and index entire logs.

Verification using Merkle Tree Proofs

If you have only a Merkle inclusion proof rather than the entire precertificate, you can still verify that the entry is part of the append-only log via the following steps:

Fetch the log's signed tree head (get-sth) to obtain the tree size and root hash that the log has committed to.
Starting from the leaf hash, combine it with each hash in the proof, in the order dictated by the leaf index, to recompute the Merkle root.
Compare the recomputed root with the root hash in the signed tree head, and verify the log's signature on that tree head.

This process reassures users that the certificate (or precertificate) has been immutably recorded. Tools and libraries are available that simplify Merkle Tree proof verification, particularly in systems where security auditing is critical.
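
The root recomputation can be sketched in Python as follows. The loop follows the inclusion proof verification procedure described in RFC 9162, which applies to the same SHA-256 Merkle tree that RFC 6962 logs use. The three-leaf tree at the bottom is synthetic test data, and verifying the log's signature on the tree head is outside the scope of this sketch.

```python
import hashlib

def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()          # 0x00 marks a leaf

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # 0x01 marks a node

def verify_inclusion(leaf: bytes, leaf_index: int, tree_size: int,
                     audit_path: list, root_hash: bytes) -> bool:
    """Recompute the root from a leaf hash and audit path; compare to root_hash."""
    if leaf_index < 0 or leaf_index >= tree_size:
        return False
    fn, sn = leaf_index, tree_size - 1
    r = leaf
    for p in audit_path:
        if sn == 0:
            return False                  # proof is longer than the tree allows
        if (fn & 1) or fn == sn:
            r = node_hash(p, r)           # sibling is on the left
            if not (fn & 1):
                # Skip levels where this node has no right sibling.
                while not (fn & 1) and fn != 0:
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)           # sibling is on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root_hash

# Synthetic three-leaf tree: root = node(node(h0, h1), h2).
h0, h1, h2 = (leaf_hash(x) for x in (b"a", b"b", b"c"))
root = node_hash(node_hash(h0, h1), h2)
print(verify_inclusion(h0, 0, 3, [h1, h2], root))
print(verify_inclusion(h2, 2, 3, [node_hash(h0, h1)], root))
```

When working with a real log, base64-decode each element of audit_path and the sha256_root_hash from get-sth before calling verify_inclusion.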

Structured Recap Table for the Retrieval Process

Step	Action	Notes
Identify Log	Access the list of known CT logs	Determine the appropriate log from the certificate's SCTs or by CA usage
Access API	Review API documentation	Locate endpoints such as /ct/v1/get-proof-by-hash and /ct/v1/get-entries
Send Request	Utilize GET requests (e.g., with curl)	Supply the base64 leaf hash, URL-encoded, together with tree_size
Process Response	Examine returned JSON for leaf_index and audit_path, then fetch the entry by index	Recompute the leaf hash from leaf_input and check that it matches
Verification	Use OpenSSL tools for DER decoding if needed	Confirm integrity of precertificate data

Additional Tools, Resources, and References

While the steps above provide a general framework, several online services and repositories facilitate interacting with CT logs. Tools such as crt.sh offer web interfaces for certificate lookups by domain or by certificate fingerprint, while open-source projects provide code to download and verify CT log entries. Moreover, RFC 6962 and RFC 9162 remain the foundational documents detailing the CT protocol.

Programming and Script Samples

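For security researchers and developers looking to automate the retrieval process, a short script in a language such as Python can query the API endpoints and process the JSON responses. The following sketch uses only the standard library and the RFC 6962 endpoints; the log URL and hash are placeholders:

# Sketch: resolve a leaf hash to an index, then fetch that entry (RFC 6962 API).
import json
import urllib.error
import urllib.parse
import urllib.request

leaf_hash_b64 = "YOUR_BASE64_LEAF_HASH"  # base64 of the SHA-256 Merkle leaf hash
base_url = "https://log.example.com"     # placeholder; use the real log URL

def get_json(endpoint, **params):
    # Build the query string with proper URL encoding and decode the JSON body.
    url = base_url + endpoint + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)

try:
    sth = get_json("/ct/v1/get-sth")  # current tree_size and root hash
    proof = get_json("/ct/v1/get-proof-by-hash",
                     hash=leaf_hash_b64, tree_size=sth["tree_size"])
    index = proof["leaf_index"]
    entry = get_json("/ct/v1/get-entries", start=index, end=index)["entries"][0]
    # entry["leaf_input"] and entry["extra_data"] are base64-encoded structures;
    # proof["audit_path"] holds the inclusion proof hashes.
    print("entry", index, "proof length", len(proof["audit_path"]))
except urllib.error.HTTPError as err:
    print("log returned HTTP", err.code)  # e.g. hash not found or rate limited
except urllib.error.URLError as err:
    print("could not reach log:", err.reason)

The example above outlines the flow. Adapt the logic based on the specific requirements and API documentation of the CT log you are querying.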

Legal and Ethical Reminder

While the CT log system is designed for transparency and public verification, ensure that your access and usage comply with the CT log provider’s policies and all relevant legal and ethical guidelines. Retrieval of certificate or precertificate data should only be performed for valid auditing, research, or operational purposes.