Init Script
An init script (initialization script) is a shell script that runs during startup of each cluster node, before the Apache Spark driver or executor JVM starts.
Danger
Legacy Global and Legacy Cluster-Named init scripts run before other init scripts. These init scripts might be present in workspaces created before February 21, 2023.
Prerequisite
Suppose we want to run some setup on each node before the cluster starts, for example:
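#!/bin/bash
# Log which node the script is running on.
# DB_IS_DRIVER and DB_DRIVER_IP are environment variables set by Databricks.
echo "Start running init script: adb-default"
echo "Running on the driver? $DB_IS_DRIVER"
echo "Driver IP: $DB_DRIVER_IP"
# Set the node's time zone.
timedatectl set-timezone Asia/Bangkok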
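Because the same script runs on every node, it can be useful to branch on DB_IS_DRIVER. The following is a minimal sketch, assuming the variable holds the string TRUE on the driver node; the function name node_role is hypothetical and not part of Databricks.

```shell
#!/bin/bash
# Hypothetical helper: report which kind of node the script is running on.
# Assumes Databricks sets DB_IS_DRIVER to "TRUE" on the driver node.
node_role() {
  if [ "${DB_IS_DRIVER:-}" = "TRUE" ]; then
    echo "driver"
  else
    echo "worker"
  fi
}

echo "Init script running on: $(node_role)"
```

Driver-only setup could then be guarded with a check such as `[ "$(node_role)" = "driver" ]`.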
Getting Started
Cluster-Scoped init scripts
To use the UI to configure a cluster to run an init script, complete the following steps:
- On the cluster configuration page, click the Advanced Options toggle.
- At the bottom of the page, click the Init Scripts tab.
- In the Destination drop-down, select the Workspace type.
- Specify a path to the init script, such as SYS/init_script.sh.
- Click Add.
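The same setting can also be expressed in a cluster specification instead of the UI. The sketch below only prints a JSON fragment and does not call any service. It assumes the init_scripts field of the Databricks Clusters API with a workspace destination, and the helper name init_scripts_fragment is hypothetical. Check the field names against the API reference for your workspace before relying on them.

```shell
#!/bin/bash
# Hypothetical helper: print an init_scripts fragment for a cluster spec.
# $1 is the workspace path of the init script.
# Assumption: the path contains no characters that need JSON escaping.
init_scripts_fragment() {
  printf '{"init_scripts": [{"workspace": {"destination": "%s"}}]}\n' "$1"
}

init_scripts_fragment "SYS/init_script.sh"
```

The printed fragment would be merged into the full cluster definition alongside the other cluster settings.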
Note
Each user has a Home directory configured under the /Users directory in the workspace. If a user with the name user@databricks.com stored an init script called my-init.sh in their home directory, the configured path would be /Users/user@databricks.com/my-init.sh.
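As a small illustration of the path rule in the note, the following sketch composes the workspace path from a user name and a script file name. The helper name home_script_path is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: build the workspace path of a script stored
# in a user's home directory.
# $1 is the user name, $2 is the script file name.
home_script_path() {
  printf '/Users/%s/%s\n' "$1" "$2"
}

home_script_path "user@databricks.com" "my-init.sh"
```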
Cluster-Scoped with Shared Cluster
For shared access mode, you must add init scripts to the allowlist. See Allowlist libraries and init scripts on shared compute.
- In your Azure Databricks workspace, click Catalog.
- Click the gear icon to open the metastore details and permissions UI.
- Select Allowed JARs/Init Scripts.
- Click Add.
Warning
Init scripts use the identity of the cluster owner.
Global init scripts
- Go to the Admin Settings and click Global Init Scripts.
- Click Add.
- Name the script and enter it by typing, pasting, or dragging a text file into the Script field.
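A global init script can also be prepared for upload outside the UI. The sketch below only builds a request body and does not send it. It assumes the Global Init Scripts REST API accepts a name, a Base64-encoded script field, and an enabled flag; the helper name global_init_payload is hypothetical, and the field names should be verified against the API reference.

```shell
#!/bin/bash
# Hypothetical helper: build a request body for creating a global init script.
# $1 is the script name, $2 is the script text.
# Assumption: the API expects the script text Base64-encoded.
global_init_payload() {
  # tr -d '\n' removes the line wrapping that some base64 implementations add.
  printf '{"name": "%s", "script": "%s", "enabled": true}\n' \
    "$1" "$(printf '%s' "$2" | base64 | tr -d '\n')"
}

global_init_payload "adb-default" 'timedatectl set-timezone Asia/Bangkok'
```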