Data Platform Engineer
About Us
At Turquoise, we’re making healthcare pricing simpler, more transparent, and lower cost for everyone. We’ve already launched our consumer-facing website, which allows anyone to search and compare hospital insurance rates – something never before possible in the USA. We’re also building solutions that help insurance companies and hospitals negotiate better prices with a clearer understanding of market conditions, powered by petabyte-scale datasets.
We’re a Series B-stage startup backed by top VCs. More importantly, we’re a multi-talented group of folks with a big passion for improving healthcare. We’re eager to find ambitious yet well-rounded teammates to join us on this mission.
Our product is used by hospitals, health insurance companies, and other organizations that pay for healthcare in the US.
About the Role
We are looking for a Data Platform Engineer to join our Data Engineering & Infrastructure team. You will work alongside Data Ingestion, Data Transformation, and DevOps/SDET colleagues to architect and implement the frameworks that power our data ecosystem. In this role, you’ll identify bottlenecks and weak points in our current data flows and design scalable solutions that streamline monthly data pipelines and large-scale compute tasks.
You will have significant autonomy to shape our data platform: defining best practices for table naming and schema management, orchestrating complex workflows, developing compute frameworks, and more. This is an excellent opportunity if you enjoy solving data architecture challenges while staying hands-on with cutting-edge data technologies.
Key Responsibilities
- Architect & Automate: Build frameworks for general data automation (e.g., DAG-of-DAGs in Airflow – see the sketch after this list) to streamline monthly ingestion and transformation pipelines.
- Monitor & Improve: Continually assess how data flows throughout the company, identify weak points, and unify/automate these areas for better performance and reliability.
- Resource Management: Collaborate on scheduling and execution of compute-intensive tasks (e.g., Spark, Trino) to optimize resource utilization and performance.
- Collaboration & Mentoring: Partner with internal teams (Data Ingestion, Data Transformation, DevOps, SDET, etc.) to understand pain points and implement robust solutions. Provide technical guidance and set tasks for SDET and DevOps teams to achieve platform goals.
- Best Practices: Establish guidelines for naming conventions, schema management, data modeling, and overall data governance across our data lakes, warehouses, and databases.
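
For illustration, here is a minimal sketch of the DAG-of-DAGs pattern mentioned above, assuming Airflow 2.4+ and hypothetical DAG IDs (monthly_ingestion, monthly_transformation) rather than our actual pipeline definitions:

```python
# Minimal DAG-of-DAGs sketch: a master DAG that triggers downstream DAGs in order.
# DAG IDs here are hypothetical examples, not our real pipelines.
from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

with DAG(
    dag_id="monthly_master",
    start_date=datetime(2024, 1, 1),
    schedule="@monthly",
    catchup=False,
) as dag:
    # Trigger the ingestion DAG and wait for it to finish
    # before kicking off the transformation DAG.
    ingest = TriggerDagRunOperator(
        task_id="trigger_ingestion",
        trigger_dag_id="monthly_ingestion",
        wait_for_completion=True,
    )
    transform = TriggerDagRunOperator(
        task_id="trigger_transformation",
        trigger_dag_id="monthly_transformation",
        wait_for_completion=True,
    )
    ingest >> transform
```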
Required Skills & Experience
- SQL & Database Engines: Proficiency writing and debugging SQL that runs across petabyte-scale datasets (see the example after this list).
- Data Orchestration: Hands-on experience with Airflow (or similar scheduling/ETL tools) to manage complex workflows.
- Data Processing: Proficient with Spark and/or Trino for distributed data processing.
- Data Storage Technologies: Practical know-how of Redshift, PostgreSQL, and data lake formats (e.g., Iceberg).
- Programming: Strong Python skills; familiarity with Java is a plus (useful for debugging JVM-based tooling such as Spark and Trino).
- Cloud Expertise: Solid understanding of AWS services, including infrastructure management and best practices.
- Problem-Solving: Able to identify process gaps, architect solutions, and carry them through to implementation.
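
To give a flavor of the query work involved, here is a small illustrative PySpark sketch; the Iceberg catalog, table, and column names are invented for the example and are not our real schema:

```python
# Illustrative only: aggregate negotiated rates from a (hypothetical) Iceberg table
# partitioned by reporting_month, and inspect the plan before running at scale.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("rate-aggregation").getOrCreate()

rates = (
    spark.table("lake.rates.negotiated_rates")
    # Filtering on the partition column lets Iceberg prune files instead of
    # scanning the full dataset -- essential at petabyte scale.
    .filter(F.col("reporting_month") == "2024-01")
    .groupBy("hospital_id", "payer")
    .agg(F.avg("negotiated_rate").alias("avg_rate"))
)

# Review the physical plan to confirm partition/file pruning before execution.
rates.explain()
```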
Nice-to-Have
- Background in Data Governance: Experience establishing naming conventions, schema versioning, or data cataloging.
- Performance Tuning: Familiarity with optimizing queries and managing resource allocation.
- Previous Mentorship/Leadership: Ability to coach junior engineers or SDETs.
Tech Stack We’re Working With
AWS, Trino, Spark, ClickHouse, Airflow, Iceberg/Delta/Hive Metastore, schema management tooling (Bytebase).
Benefits
- Work From Home
- Flexible PTO
- Stock Option Plan
- Annual Learning & Development Benefit ($1200)
- Home office benefit ($700)