Senior Software Engineer (Big Data)
About Us
At Turquoise, we’re making healthcare pricing simpler, more transparent, and lower cost for everyone. We’ve already launched our consumer-facing website that allows anyone to search and compare hospital insurance rates – something never before possible in the USA. We’re also building solutions that help insurance companies and hospitals negotiate their prices with a clearer understanding of market conditions, using petabyte-scale datasets.
We’re a Series B-stage startup backed by top VCs. More importantly, we’re a multi-talented group of folks with a big passion for improving healthcare. We’re eager to find ambitious yet well-rounded teammates to join us on this mission.
Our product is used by hospitals, health insurers, and other companies that pay for healthcare in the US.
The Role
We are looking for a Senior Software Engineer (Big Data) to help design and build a new system that processes and transforms hundreds of terabytes of files on a monthly schedule and publishes the results to downstream systems. This role partners closely with Data Engineering and Infrastructure teams to deliver a reliable, scalable, and well-observed platform.
Key responsibilities
- Design and implement a scalable monthly batch-processing system (ingestion → validation → transformation → publishing).
- Build and maintain workflow orchestration for recurring pipelines (e.g., Airflow); a minimal orchestration sketch follows this list.
- Develop and improve ETL/ELT pipelines for large-scale datasets, including safe reruns and backfills.
- Improve reliability through monitoring, alerting, data quality checks, and operational playbooks.
- Tune performance and resource usage for distributed compute workloads.
- Contribute to best practices around naming conventions, schema management, and data governance.
- Collaborate across Data Ingestion, Data Transformation, DevOps, and SDET teams; provide technical guidance as needed.
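
To give a flavor of the orchestration work described above, here is a minimal sketch of how a monthly batch pipeline might be wired up in Airflow. The DAG id, task boundaries, and schedule are illustrative assumptions, not a description of Turquoise’s actual pipelines, and the sketch assumes Airflow 2.4+.

```python
# Illustrative only: a monthly ingestion → validation → transformation → publishing DAG.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest():
    # Placeholder: pull the month's raw files into a staging area.
    ...

def validate():
    # Placeholder: run schema and data-quality checks before transforming.
    ...

def transform():
    # Placeholder: kick off the distributed transformation job (e.g., Spark or Trino).
    ...

def publish():
    # Placeholder: publish curated outputs to downstream systems.
    ...


with DAG(
    dag_id="monthly_rate_pipeline",   # hypothetical name
    schedule="@monthly",              # recurring monthly batch run
    start_date=datetime(2024, 1, 1),
    catchup=False,                    # backfills are triggered explicitly instead
) as dag:
    (
        PythonOperator(task_id="ingest", python_callable=ingest)
        >> PythonOperator(task_id="validate", python_callable=validate)
        >> PythonOperator(task_id="transform", python_callable=transform)
        >> PythonOperator(task_id="publish", python_callable=publish)
    )
```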
Required Skills & Experience
- Senior-level experience building production software systems.
- Strong programming skills (Python preferred).
- Experience working with big data and large-scale ETL/ELT pipelines.
- Strong SQL and ability to troubleshoot data issues and performance bottlenecks.
- Strong experience working with cloud technologies (AWS, Azure, or Google Cloud).
- Quick learner who is eager to adopt new tools and technologies as needed.
- Proven problem-solving skills and ability to take ownership from design through delivery.
Nice to Have
- Experience with workflow orchestration (Airflow or similar).
- Hands-on experience with distributed processing engines (Spark, Trino, or similar).
- Familiarity with data lake/lakehouse patterns and table formats (Iceberg/Delta/Hudi or similar).
- Experience processing large volumes of files in Python using Dask, Polars, or similar frameworks (see the sketch after this list).
- Exposure to warehouses/databases such as ClickHouse, Snowflake, PostgreSQL.
- Experience with schema management and data governance practices.
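
As a small illustration of the file-processing work mentioned above, here is a sketch of lazily aggregating a batch of Parquet files with Polars. The file path, column names, and filter are hypothetical assumptions for illustration only.

```python
# Illustrative only: lazily scan and aggregate a directory of Parquet files.
import polars as pl

# Lazy scan: nothing is loaded until collect(), and only the columns the
# query needs are read from the Parquet files.
lazy = pl.scan_parquet("data/rates/2024-01/*.parquet")  # hypothetical path

summary = (
    lazy
    .filter(pl.col("negotiated_rate") > 0)              # hypothetical quality filter
    .group_by("payer_id", "billing_code")               # hypothetical grouping columns
    .agg(pl.col("negotiated_rate").median().alias("median_rate"))
    .collect()
)

# Write the (much smaller) aggregated result for downstream use.
summary.write_parquet("monthly_summary.parquet")
```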
Tech Stack (Current)
AWS, Trino, Spark, ClickHouse, Airflow, Iceberg/Delta/Hive Metastore, schema management tooling (Bytebase).
Benefits
- Work From Home
- Learning + Development Benefit: $1,200/year
- Home Office Benefit
- Flexible PTO
- Stock Option Plan
- Yearly US summit trips
Note: Direct experience with all technologies listed is not required—we’re happy to consider candidates who have worked with similar tools and can ramp up quickly.