Workspace, Notebooks & Azure Integration

Workspace

The Databricks Workspace is the central hub for accessing all Databricks objects and features. It provides a unified interface to access, create, and collaborate on notebooks, files, folders, dashboards, queries, and libraries.

Workspace Folder Types

| Folder | Description |
|---|---|
| Users | Personal workspace, private notebooks and files per user |
| Shared | Collaborative space for team projects |
| Repos | Git-integrated folders (GitHub, Azure DevOps, GitLab) |

Workspace Objects

Objects you can create and manage inside a workspace include notebooks, files, folders, dashboards, queries, and libraries.

Each object has a unique identifier for programmatic access via APIs.
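
As a rough illustration of programmatic access, the sketch below builds a request for the Workspace API list endpoint (GET /api/2.0/workspace/list) and extracts the object_id of each entry from a response body. The host, token, and folder path are placeholders, and the helper names build_list_request and object_ids are made up for these notes.

```python
import json
import urllib.parse
import urllib.request


def build_list_request(host: str, token: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a Workspace API request that lists a folder."""
    query = urllib.parse.urlencode({"path": path})
    url = f"{host.rstrip('/')}/api/2.0/workspace/list?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def object_ids(response_body: str) -> dict:
    """Map each listed object's path to its object_id."""
    objects = json.loads(response_body).get("objects", [])
    return {obj["path"]: obj["object_id"] for obj in objects}
```

Sending the request is left out on purpose; with a real workspace URL and a personal access token, passing the request to urllib.request.urlopen and feeding the decoded body to object_ids should be enough.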

Workspace Types

There are two types of workspaces:

  1. Classic Workspace
  2. Serverless Workspace

| | Classic (Hybrid) Workspace | Serverless Workspace |
|---|---|---|
| Compute | Your cloud account + optional serverless | Fully Databricks-managed |
| Storage | Workspace storage in your cloud account | Default storage (Databricks-managed) |
| Setup | You configure VPC/VNet, storage, networking | Pre-configured, zero cloud setup |
| Azure Portal Name | Hybrid Workspace | Serverless Workspace |

Classic Workspace Architecture

Classic Databricks workspaces have an associated workspace storage account (Azure) or workspace storage bucket (AWS) in your cloud account.

This storage contains:

The compute resources (clusters, SQL warehouses) are deployed inside your virtual network, giving you full control and isolation.

Pasted image 20260516150933.png

Serverless Workspace Architecture

Serverless workspaces use default storage, a fully managed object storage platform provided by Databricks, so there is no need to configure external cloud storage or manage access credentials.

Default storage is used for:

You can still connect to your own cloud storage account to access your data.

Note: All interactions with default storage require serverless, Unity Catalog-enabled compute. Classic compute cannot directly access default storage assets.

Pasted image 20260516151008.png


Databricks Notebooks

Databricks Notebooks are interactive, cloud-based documents that combine code execution, visualization, and narrative text in a cell-based model.
A single notebook can mix Python, SQL, Scala, R, and Markdown cells.

Note: Notebooks support inline comments with @username mentions and email notifications.

You can chain notebooks in two ways:

  1. %run /path/to/the/nb runs the other notebook inline in the current context, so its variables and functions become available to the caller. It does not return an exit value.
  2. dbutils.notebook.run("path/to/other_nb", timeout_seconds=123, arguments={"key": "val"}) runs the other notebook in an isolated execution context and returns its exit value as a string.
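
Because the exit value is a single string, a common way to hand back structured results is to serialize them as JSON in the child and decode them in the parent. The sketch below assumes that pattern; run_child is a hypothetical helper, and dbutils is passed in explicitly so the function can also be exercised outside a notebook.

```python
import json


def run_child(dbutils, path: str, params: dict, timeout_seconds: int = 600) -> dict:
    """Run a child notebook in its own context and decode its JSON exit value."""
    # Argument values are passed as strings; the child reads them via widgets.
    string_params = {key: str(value) for key, value in params.items()}
    raw = dbutils.notebook.run(path, timeout_seconds, string_params)
    return json.loads(raw)


# In the child notebook, the last cell would end with something like:
#   dbutils.notebook.exit(json.dumps({"status": "ok", "rows": 42}))
```

In a notebook the call would look like run_child(dbutils, "path/to/other_nb", {"key": "val"}). This only works if the child really exits with a JSON string; a plain-text exit value would make json.loads raise.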

Notebook Permissions

Five permission levels (available in Premium plan):

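| Permission | View | Comment | Run | Edit | Manage Permissions |
|---|---|---|---|---|---|
| NO PERMISSIONS | | | | | |
| CAN READ | x | x | | | |
| CAN RUN | x | x | x | | |
| CAN EDIT | x | x | x | x | |
| CAN MANAGE | x | x | x | x | x |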
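
The levels are cumulative: each one grants everything the previous level does plus one more ability. A minimal sketch of the table above as a lookup, useful mainly as a memory aid (the names ABILITIES, LEVELS, and allowed are invented here and are not part of any Databricks API):

```python
# Abilities in the order each level adds them, per the table above.
ABILITIES = ["View", "Comment", "Run", "Edit", "Manage Permissions"]

# Number of leading abilities granted by each permission level.
LEVELS = {
    "NO PERMISSIONS": 0,
    "CAN READ": 2,
    "CAN RUN": 3,
    "CAN EDIT": 4,
    "CAN MANAGE": 5,
}


def allowed(level: str, ability: str) -> bool:
    """Return True if the permission level includes the given ability."""
    return ability in ABILITIES[: LEVELS[level]]
```

For example, allowed("CAN RUN", "Edit") is False while allowed("CAN EDIT", "Run") is True.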

Compute Attachment


Databricks Utilities (dbutils)

dbutils is Databricks' built-in utility library available in Python, R, and Scala notebooks. It provides modules for file management, secrets, widgets, notebook orchestration, and more.

Modules Overview

| Module | Description |
|---|---|
| dbutils.fs | File system operations on DBFS |
| dbutils.secrets | Secure secret management |
| dbutils.widgets | Interactive notebook parameterization |
| dbutils.notebook | Notebook orchestration and control flow |
| dbutils.jobs | Job-related features |
| dbutils.library | (Deprecated) Session-scoped library management |
| dbutils.data | (Experimental) Dataset understanding |
| dbutils.credentials | Credential passthrough interactions |

Key Commands

File System (dbutils.fs)

dbutils.fs.ls("/mnt/data/")              # List files
dbutils.fs.mkdirs("/mnt/data/new/")      # Create directory
dbutils.fs.cp("/src", "/dst")            # Copy
dbutils.fs.mv("/old", "/new")            # Move/rename
dbutils.fs.rm("/path", recurse=True)     # Delete
dbutils.fs.head("/path/file.txt")        # Preview file content
dbutils.fs.put("/path/file.txt", "content", overwrite=True)  # Write
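
dbutils.fs.ls lists a single directory level only. A small sketch of a recursive listing built on top of it, relying on the path attribute and isDir() method of the entries that ls returns; list_files is a hypothetical helper and takes dbutils as a parameter so it can be tested with a stand-in.

```python
def list_files(dbutils, root: str) -> list:
    """Return the paths of all files under root, descending into subdirectories."""
    files = []
    stack = [root]
    while stack:
        current = stack.pop()
        for info in dbutils.fs.ls(current):
            if info.isDir():
                stack.append(info.path)  # visit this directory later
            else:
                files.append(info.path)
    return sorted(files)
```

In a notebook this would be called as list_files(dbutils, "/mnt/data/"). On very large directory trees a driver-side walk like this can be slow, so treat it as a convenience for small folders.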

Mounting Storage (Legacy -- prefer Unity Catalog External Locations now)

dbutils.fs.mount(
  source="abfss://container@storage.dfs.core.windows.net/",
  mount_point="/mnt/storage",
  extra_configs={...}
)
dbutils.fs.unmount("/mnt/storage")
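
Without a mount, storage is addressed directly by its abfss URI, with access granted through Unity Catalog rather than mount-time credentials. A minimal sketch of building such a URI; abfss_uri is a hypothetical helper, and the container and account names are placeholders.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI that can be passed to readers without a mount."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"
```

Assuming an external location already covers the path, something like spark.read.format("delta").load(abfss_uri("container", "storage", "sales")) would then read it in a notebook.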

Secrets (dbutils.secrets)

dbutils.secrets.listScopes()               # List scopes
dbutils.secrets.list("my-scope")           # List keys in scope
dbutils.secrets.get("my-scope", "db-pass") # Get secret (redacted in output)
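
A typical use is to pull credentials at run time instead of hard-coding them. The sketch below assumes a scope holding a user name and a password; jdbc_properties is a hypothetical helper, and the key name db-user is invented for illustration (db-pass comes from the example above).

```python
def jdbc_properties(dbutils, scope: str, user_key: str, password_key: str) -> dict:
    """Build JDBC connection properties from values stored in a secret scope."""
    return {
        "user": dbutils.secrets.get(scope, user_key),
        "password": dbutils.secrets.get(scope, password_key),
    }
```

In a notebook, props = jdbc_properties(dbutils, "my-scope", "db-user", "db-pass") could then be passed as the properties argument of spark.read.jdbc. The values stay redacted if printed in cell output.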

Widgets (dbutils.widgets)

dbutils.widgets.text("env", "dev", "Environment")     # Text input
dbutils.widgets.dropdown("region", "us", ["us", "eu"]) # Dropdown
dbutils.widgets.get("env")                              # Get value
dbutils.widgets.removeAll()                             # Cleanup
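
Widget values always come back as strings, so it is worth validating them before they drive any logic. A minimal sketch, assuming an env widget like the one above; the set of allowed values and the read_env helper are hypothetical.

```python
ALLOWED_ENVS = {"dev", "staging", "prod"}  # hypothetical set of environments


def read_env(dbutils, name: str = "env") -> str:
    """Read a widget value and fail fast if it is not an expected environment."""
    value = dbutils.widgets.get(name).strip().lower()
    if value not in ALLOWED_ENVS:
        raise ValueError(f"Unexpected {name!r} value: {value!r}")
    return value
```

Calling read_env(dbutils) near the top of a notebook stops a run early when a job passes an unexpected parameter.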

Notebook Orchestration (dbutils.notebook)

result = dbutils.notebook.run("/path/notebook", timeout_seconds=120, arguments={"key": "val"})
dbutils.notebook.exit("success")  # Return value to parent
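
dbutils.notebook.run raises an exception when the child notebook fails or times out, which makes simple control flow such as retries possible. A sketch of one way to do it; run_with_retry and its max_retries parameter are illustrative names, not part of dbutils.

```python
def run_with_retry(dbutils, path: str, timeout_seconds: int, arguments: dict,
                   max_retries: int = 3) -> str:
    """Run a notebook, retrying when the run raises, and return its exit value."""
    attempt = 0
    while True:
        try:
            return dbutils.notebook.run(path, timeout_seconds, arguments)
        except Exception:
            attempt += 1
            if attempt > max_retries:
                raise  # give up and surface the last failure
```

With the default max_retries=3 the notebook is attempted at most four times. Retrying is only safe when the child notebook is idempotent.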

How Databricks Works with Azure

Azure Databricks is a first-party Azure service jointly developed by Databricks and Microsoft. It is fully integrated into the Azure ecosystem.

Azure Service Integration Map

| Category | Azure Service | Role |
|---|---|---|
| Storage | Azure Data Lake Storage Gen2 (ADLS) | Primary object storage for the lakehouse |
| Batch Ingestion | Azure Data Factory / Fabric Data Factory | Orchestrate batch data pipelines into ADLS |
| Streaming Ingestion | Azure Event Hubs / IoT Hub | Real-time event and IoT data ingestion |
| Identity & Access | Microsoft Entra ID (Azure AD) + SCIM | SSO, user provisioning, identity management |
| Secrets | Azure Key Vault | Store and manage secrets, keys, certificates |
| Governance | Microsoft Purview | Enterprise-wide data discovery, classification, lineage |
| BI & Reporting | Power BI | Dashboards and reporting via optimized Databricks connector |
| Monitoring | Azure Monitor | Telemetry, health monitoring, diagnostics |
| CI/CD | Azure DevOps / GitHub | Version control, deployment automation |
| Cost Management | Microsoft Cost Management | Track and optimize Databricks spend |
| Networking | Azure VNet, Private Link, NSGs | Secure network isolation for classic workspaces |
| Source Systems | Azure SQL DB, Cosmos DB, Synapse | Source data via Lakehouse Federation or ETL |
| AI Services | Azure OpenAI | GenAI integration via Mosaic AI |

Data Flow (Reference Architecture)

Pasted image 20260528133805.png

Key Azure-Specific Points for Interviews

Workspace Architecture

Databricks operates out of a control plane and a compute plane.


Like compute, workspaces come in two types: classic and serverless.

When to choose serverless workspace

Serverless workspaces are the best choice when:

When to choose classic workspaces

Classic workspaces are the best choice for the following use cases:

Interview Questions


#todo #databricks