[Remote] Data Engineer
Note: This is a remote position open to candidates in the USA. Anika Systems is seeking a skilled Data Engineer to design, build, and optimize scalable data pipelines and platforms supporting federal clients. The role involves developing ETL/ELT pipelines, managing cloud data platforms, and implementing CI/CD processes to ensure reliable data delivery and governance.
Responsibilities
- Design, develop, and maintain robust ETL/ELT pipelines to ingest, transform, and deliver data across enterprise platforms
- Build scalable data ingestion frameworks for structured and semi-structured data, including XBRL filings and financial datasets
- Implement data transformation logic to support analytics, reporting, and regulatory use cases
- Ensure data pipelines are reliable, performant, and scalable in cloud environments
- Leverage AI-assisted development tools to accelerate pipeline development, testing, and optimization
- Develop and manage data solutions leveraging AWS services (e.g., S3, Glue, Lambda, Redshift) and Airflow DAGs
- Implement and optimize Apache Iceberg tables for large-scale, ACID-compliant data lakes
- Support lakehouse architectures that unify data lakes and data warehouses
- Optimize data storage and retrieval strategies for performance and cost efficiency
- Enable data platforms that support AI/ML workloads and downstream generative AI use cases
- Design and implement CI/CD pipelines for data pipelines, infrastructure, and analytics code using tools such as GitHub Actions, GitLab CI, Jenkins, or AWS-native services
- Automate build, test, and deployment processes for ETL pipelines and data platform components
- Implement DataOps best practices, including version control, automated testing, environment promotion, and rollback strategies
- Ensure reproducibility, reliability, and governance of data pipeline deployments across environments
- Integrate AI-driven testing and monitoring tools to improve pipeline quality and reduce operational risk
- Design and implement materialized views and other performance optimization techniques to improve query efficiency
- Tune data pipelines and queries for performance, scalability, and cost
- Implement partitioning, indexing, and caching strategies aligned to workload patterns
- Develop pipelines to ingest, parse, and normalize XBRL (eXtensible Business Reporting Language) data
- Support regulatory and financial data use cases requiring high accuracy and traceability
- Ensure alignment with data standards and validation rules for financial reporting datasets
- Apply context engineering principles to ensure data is enriched with meaningful metadata, lineage, and business context
- Collaborate with Data Architects to support data modeling, schema design, and entity relationships
- Enable downstream analytics and AI use cases by structuring data for usability, discoverability, and governance
- Integrate pipelines with enterprise data catalogs and metadata management systems
- Support automated metadata capture, lineage tracking, and data quality monitoring
- Ensure alignment with data governance frameworks and standards established by Office of the Chief Data Officer (OCDO) organizations, including AI data readiness and traceability
- Collaborate with data architects, analysts, and business stakeholders to understand data needs and deliver solutions
- Participate in stakeholder listening campaigns, workshops, and data discovery efforts
- Work in Agile teams to iteratively deliver data capabilities and enhancements
- Contribute to identifying and implementing AI-driven efficiencies and automation opportunities across the data lifecycle
Skills
- Bachelor's degree in Computer Science, Engineering, Data Science, or related field
- 5+ years of experience in data engineering, ETL development, or data platform engineering
- Strong hands-on experience with ETL/ELT tools and frameworks; AWS data services (S3, Glue, Lambda, Redshift, etc.); and Apache Iceberg and modern data lake architectures
- Experience designing and implementing CI/CD pipelines for data platforms and ETL workflows
- Demonstrated proficiency using AI tools and AI-assisted development workflows (e.g., LLM copilots, automated code generation, pipeline optimization tools)
- Experience processing XBRL or complex financial/regulatory datasets
- Proficiency in SQL and Python
- Experience implementing materialized views and query optimization techniques
- Understanding of data modeling concepts and metadata management
- Familiarity with data governance, data quality practices, and data readiness for AI/ML use cases
- Ability to work in Agile, DevOps-oriented environments
- U.S. Citizenship required; ability to obtain and maintain a federal clearance
- Experience supporting federal agencies such as SEC, DHS, Treasury, or Federal Reserve System
- Familiarity with data catalog tools (e.g., Collibra, Alation, ServiceNow)
- Experience with Apache Spark, Kafka, or other distributed data processing frameworks
- Experience enabling data pipelines for AI/ML or generative AI applications
- Knowledge of data maturity frameworks (e.g., EDM DCAM, TDWI)
- Exposure to context engineering or semantic data layer design
- AWS or data engineering certifications
- Experience with infrastructure-as-code (IaC) tools (e.g., Terraform, CloudFormation) in support of CI/CD pipelines
Company Overview