About the role
We are seeking an experienced Databricks developer with strong expertise in Azure Databricks, Apache Spark, and modern data engineering practices. The ideal candidate will be responsible for designing, developing, and optimizing scalable data pipelines, data lakehouse solutions, and cloud-based data platforms. The role requires hands-on experience with Spark processing, Delta Lake, Unity Catalog, data modeling, and enterprise-grade data integration solutions.
Key Responsibilities

Data Engineering & Databricks Development
- Design, develop, and maintain scalable ETL/ELT pipelines using Databricks, Apache Spark, and SQL.
- Build and optimize batch and streaming data pipelines using PySpark, Spark Structured Streaming, and Auto Loader.
- Develop and support enterprise data lakehouse solutions using Delta Lake and Databricks technologies.
- Implement data ingestion, transformation, cleansing, and aggregation processes for large-scale datasets.
- Develop reusable frameworks and best practices for data engineering solutions.

Data Modeling & Performance Optimization
- Design and implement data models to support reporting, analytics, and business requirements.
- Build and maintain Slowly Changing Dimensions (SCD Type 1 & Type 2) for data warehousing solutions.
- Develop and optimize Change Data Capture (CDC) pipelines.
- Optimize Spark workloads through partitioning, clustering, caching, and performance tuning techniques.
- Ensure efficient query performance and scalability across large datasets.

Unity Catalog & Data Governance
- Configure and manage Databricks Unity Catalog environments.
- Create and manage catalogs, schemas, tables, materialized views, functions, and volumes.
- Implement enterprise data governance, security, access control, and compliance standards.
- Support metadata management and data lineage initiatives across the data platform.

Cloud & Integration
- Develop cloud-native data solutions on Microsoft Azure and related cloud services.
- Integrate data from multiple internal and external data sources.
- Implement Lakehouse Federation and foreign catalogs to access external data platforms.
- Collaborate with architects and stakeholders to design scalable cloud data solutions.

DevOps & Operational Excellence
- Support CI/CD implementation and automated deployment processes.
- Participate in code reviews, testing, and release activities.
- Monitor and troubleshoot data pipeline failures and performance issues.
- Ensure adherence to development standards, security policies, and operational best practices.
Required Qualifications
- Strong hands-on experience with Databricks and Apache Spark (PySpark and/or Scala).
- Extensive experience with SQL and complex data transformation techniques.
- Experience in ETL/ELT development and enterprise data pipeline implementation.
- Strong experience with Microsoft Azure and cloud-based data platforms.
- Hands-on experience with Azure Databricks and Delta Lake.
- Experience building batch processing pipelines using Auto Loader and real-time pipelines using Spark Structured Streaming.
- Strong understanding of data warehousing concepts and dimensional modeling.
- Experience implementing SCD Type 1, SCD Type 2, and CDC processes.
- Strong knowledge of Spark performance tuning, partitioning, and optimization techniques.
- Experience with CI/CD pipelines and DevOps practices.
- Strong analytical, troubleshooting, and problem-solving skills.