[Remote] Lead Data Engineer (Databricks | Python | PySpark)
Note: This is a remote role open to candidates in the USA. Staffing Technologies is seeking a Lead Data Engineer to design, build, and lead modern cloud-based data platforms, with a strong focus on Databricks, Python, and PySpark. The role combines hands-on engineering with technical leadership: owning architecture decisions, setting delivery standards, and building scalable data solutions.
Responsibilities
- Lead the design and delivery of cloud-native data platforms using Databricks
- Architect and implement Lakehouse and Data Warehouse patterns
- Build and optimize ETL/ELT pipelines using Python and PySpark
- Establish engineering standards, reusable frameworks, and metadata-driven orchestration
- Review designs, vet solutions with the team, and lead demos and retros prior to deployment
- Enforce data quality, lineage, monitoring, and alerting across pipelines
- Mentor engineers and provide hands-on technical leadership
- Partner with analytics and business teams to align solutions with data and reporting needs
Skills
- Must be based in the Eastern (EST) time zone
- ~15 years of total experience in data or software engineering
- 3+ years in a technical lead role
- 5+ years building cloud-based data platforms
- Proven delivery of production-grade, scalable data systems
- Excellent communication skills (critical for this role)
- Hands-on experience with Databricks Notebooks, Jobs, and workload optimization
- Building pipelines with Lakeflow Declarative Pipelines
- Data ingestion via Databricks connectors
- Implementing data lineage, quality checks, monitoring, and alerting
- Table, compute, and performance optimization within Databricks
- Advanced Python with strong packaging and dependency management
- Expert PySpark for distributed data processing
- Clear understanding of Spark vs single-node execution
- Spark performance tuning and troubleshooting
- Strong SQL for mid-to-complex transformations
- Query and data model optimization to reduce compute and improve performance
- Strong adherence to SOLID and DRY principles
- Experience building parameterized, reusable frameworks
- Agile/Scrum delivery experience
- Git-based development workflows and code reviews
- Testing strategies: unit, integration, and end-to-end
- CI/CD pipelines and Infrastructure as Code (Terraform)
Company Overview