Problem
When you attempt to connect to Databricks using Open Delta Sharing at https://<region>.<cloud-specific-databricks-domain>/, the connection fails with an SSL certificate verification error.
requests.exceptions.SSLError: HTTPSConnectionPool(...): Max retries exceeded ... (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1000)')))
Note
The URL specifics vary depending on where your workspace is deployed. For example, a workspace deployed in Singapore on AWS can have the URL https://singapore.cloud.databricks.com.
- AWS: <region>.cloud.databricks.com
- Azure: <region>.azuredatabricks.net
- GCP: <region>.gcp.databricks.com
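As a minimal sketch, the per-cloud host formats listed above can be captured in a small lookup that builds the host name used later in the OpenSSL command. The function name `sharing_host` and the lowercase cloud keys are hypothetical; only the domains come from the list.

```python
# Hypothetical helper: builds "<region>.<cloud-specific-databricks-domain>"
# from the per-cloud domains listed in the note above.
CLOUD_DOMAINS = {
    "aws": "cloud.databricks.com",
    "azure": "azuredatabricks.net",
    "gcp": "gcp.databricks.com",
}


def sharing_host(region: str, cloud: str) -> str:
    """Return the Databricks host for a workspace region on the given cloud."""
    try:
        domain = CLOUD_DOMAINS[cloud.lower()]
    except KeyError:
        raise ValueError(
            f"unknown cloud {cloud!r}; expected one of {sorted(CLOUD_DOMAINS)}"
        ) from None
    return f"{region}.{domain}"
```

For example, `sharing_host("singapore", "aws")` returns `"singapore.cloud.databricks.com"`, matching the AWS example in the note.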
Cause
The PEM certificate file provided does not include the full chain of certificates required for verification. As a result, Python's requests library cannot validate the server certificate, leading to a connection failure.
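To confirm that the trust store is the problem before changing anything, the following sketch uses only Python's standard `ssl` and `socket` modules to attempt a TLS handshake and report whether the chain verifies. The function name `check_tls_chain` is hypothetical. When `cafile` is given, `ssl.create_default_context` trusts only the certificates in that file, which is the same situation as pointing requests at a single PEM bundle.

```python
import socket
import ssl


def check_tls_chain(host, port=443, cafile=None, timeout=10.0):
    """Attempt a TLS handshake with host and verify its certificate chain.

    cafile: optional PEM bundle. When given, only the certificates in it are
    trusted; when None, the system's default trust store is used.
    Returns (True, None) if the chain verifies, else (False, reason).
    """
    # Loading the bundle fails early (typically FileNotFoundError or
    # ssl.SSLError) if the file is missing or holds no usable certificates.
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            # server_hostname enables SNI and hostname checking.
            with context.wrap_socket(sock, server_hostname=host):
                return True, None
        except ssl.SSLCertVerificationError as exc:
            # The same failure that requests reports as CERTIFICATE_VERIFY_FAILED.
            return False, exc.verify_message
```

By default requests trusts the certifi bundle rather than the system store, so for a like-for-like comparison pass `cafile=requests.certs.where()` first, then `cafile="<file-name>.pem"` once the full-chain bundle exists, using `<region>.<cloud-specific-databricks-domain>` as the host in both calls.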
Solution
- Create a new PEM file that contains the complete chain of certificates (including any required intermediates).
- Use the following code to set the Python environment variable REQUESTS_CA_BUNDLE to the path of the updated PEM file. Set it before the Delta Sharing client makes any requests. This configuration allows the requests library to verify the SSL certificate chain and establish a secure connection to Databricks using Delta Sharing.
import os
os.environ["REQUESTS_CA_BUNDLE"] = "/path/to/file/<file-name>.pem"
- To create the PEM file, use the following OpenSSL command to fetch and save the full certificate chain for <region>.<cloud-specific-databricks-domain>. Ensure the host is the same as the host in the error message. This command connects to the Databricks endpoint, retrieves all certificates in the chain, and writes them to <file-name>.pem for use as a trusted certificate bundle.
openssl s_client -showcerts -servername <region>.<cloud-specific-databricks-domain> -connect <region>.<cloud-specific-databricks-domain>:443 </dev/null | sed -n -e '/-.BEGIN/,/-.END/ p' > <file-name>.pem
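After running the OpenSSL command, it can help to confirm that <file-name>.pem actually holds a chain (more than only the server's leaf certificate) before pointing requests at it. The following is a minimal standard-library sketch; `extract_pem_certs` and `use_ca_bundle` are hypothetical helper names, and the two-certificate minimum is an assumption about what a full chain looks like for your endpoint. The regular expression mirrors the sed filter by keeping only the BEGIN/END certificate blocks. It also assumes the Delta Sharing client leaves certificate verification at the requests default, so that REQUESTS_CA_BUNDLE takes effect.

```python
import os
import re

# One PEM certificate block, like the '/-.BEGIN/,/-.END/' range in the sed filter.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)


def extract_pem_certs(text: str) -> list:
    """Return every PEM certificate block found in text, in order."""
    return PEM_CERT_RE.findall(text)


def use_ca_bundle(path: str, min_certs: int = 2) -> int:
    """Point requests at the PEM bundle in path after a basic sanity check.

    Hypothetical helper: raises ValueError if the file holds fewer than
    min_certs certificates (a leaf-only file is not a full chain).
    Returns the number of certificates found.
    """
    with open(path, encoding="utf-8") as f:
        certs = extract_pem_certs(f.read())
    if len(certs) < min_certs:
        raise ValueError(
            f"{path} contains {len(certs)} certificate(s); expected a full chain"
        )
    # requests reads this variable on each request, so setting it here,
    # before the Delta Sharing client sends anything, is sufficient.
    os.environ["REQUESTS_CA_BUNDLE"] = os.path.abspath(path)
    return len(certs)
```

For example, call `use_ca_bundle("<file-name>.pem")` at the top of the script, then create the client as usual, such as `delta_sharing.SharingClient("<profile-file>.share")`, where the profile path is a placeholder. Checking the count fails fast on a leaf-only file instead of surfacing later as the same CERTIFICATE_VERIFY_FAILED error.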