Part-time Data Engineer (Databricks and Azure)
Our client is unable to hire H-1B candidates at this time. The client is a small, growing consulting company focused on AI and data solutions. They are seeking a part-time Data Engineer (20 hours/week) with extensive experience with Databricks and Azure.
As a Data Engineer:
- You will be a critical player in building the foundational data infrastructure for a leading firm's data and AI strategy.
- Working primarily with Databricks on the Azure platform, you will design, develop, and maintain robust data pipelines, ingesting diverse data sources and transforming them into actionable insights.
- You will collaborate closely with the product team and other stakeholders to construct a data lakehouse that will power integrations, advanced analytics, and future AI-driven workflows, all while handling sensitive client data with the utmost care and responsibility.
Responsibilities:
- Partner closely with the Product Manager, Product Designer, and client stakeholders to understand data requirements and translate them into effective data solutions within Azure and Databricks.
- Design, build, and maintain scalable and reliable data pipelines in Databricks to ingest data from a variety of source types (e.g., business workflow systems, accounting systems, CRM, databases, APIs, flat files).
- Implement and manage a medallion architecture (Bronze, Silver, Gold layers) within Databricks, transforming raw data into curated, business-ready datasets tailored for specific use cases defined by the product team.
- Develop gold layer tables and views optimized for analytics, ensuring they meet the requirements for dashboards and reports, particularly for consumption via Power BI.
- Configure and optimize Databricks to connect seamlessly with BI tools like Tableau and Power BI, enabling self-service analytics for the customer.
- Work with potentially sensitive client data, implementing and adhering to strict data security, privacy, and governance protocols.
- Apply your Databricks skills, including familiarity with (or a strong willingness to quickly learn) features such as MLflow, Delta Lake, and Unity Catalog.
- Apply DevOps best practices to data pipeline development, including automation, monitoring, and CI/CD where applicable.
- Collaborate on the design and optimization of data models, ensuring they align with business needs, performance requirements, and future scalability.
- Implement robust automated testing procedures to validate data pipelines, ensure data quality, and maintain the accuracy of transformed data.
- Create and maintain comprehensive documentation for data pipelines, data models, architectural decisions, and operational procedures.
- Establish monitoring and alerting solutions to proactively identify and resolve issues in data pipelines, ensuring data availability and reliability.
- Communicate effectively with both technical and non-technical stakeholders, clearly explaining data engineering concepts, design choices, and progress.
- Contribute to a collaborative environment within a large, cross-functional consulting team.
Requirements:
- Proven experience as a Data Engineer, with a strong focus on designing and implementing solutions on the Databricks platform.
- Hands-on expertise in building and maintaining scalable Python data pipelines within Azure and Databricks.
- Demonstrable experience in implementing a medallion data architecture (Bronze, Silver, Gold layers) to support analytics and AI use cases.
- Proficiency in ingesting data from diverse source types (e.g., APIs, relational databases, NoSQL databases, flat files, streaming sources).
- Experience with BI tools and optimizing for maintainability and performance.
- Strong SQL skills and proficiency in data modeling techniques.
- Experience with Azure cloud services, particularly Azure Data Lake Storage Gen2 (ADLS Gen2), Azure Key Vault, Azure Data Factory, or other Azure data services.
- Familiarity with MLflow for managing the machine learning lifecycle is a strong plus; curiosity and the ability to quickly learn new Databricks features are essential.
- Understanding of DevOps principles and experience with tools for CI/CD, version control (e.g., Git), and infrastructure automation are advantageous.
- Experience working with sensitive data and a strong understanding of data security, data governance, and privacy-preserving techniques.
- Excellent problem-solving skills and the ability to troubleshoot complex data issues.
- Strong communication skills, with the ability to articulate technical details and decisions to product managers, client stakeholders, and other engineers.
- Ability to work effectively within a large, cross-functional consulting team in a dynamic, client-facing environment.
- A mindset that leans into data science concepts (understanding data needs for ML) or advanced DevOps practices.