YipitData is a market-leading data and analytics firm. We analyze billions of data points every day to provide accurate, detailed insights across industries, including consumer brands, technology, software, and healthcare.
Our insights team uses proprietary technology to identify, license, clean, and analyze the data that many of the world’s largest investment funds and corporations depend on. We raised $475M from The Carlyle Group at a valuation over $1B, further accelerating our growth and market impact.
We have been recognized multiple times as one of Inc’s Best Workplaces. As a fast-growing company backed by The Carlyle Group and Norwest Venture Partners, YipitData is driven by a people-first culture rooted in mastery, ownership, and transparency.
With offices in New York, Austin, Miami, Denver, Mountain View, Seattle, Hong Kong, Shanghai, Beijing, Guangzhou, and Singapore, we continue to expand our reach and impact across global markets.
We are seeking a highly skilled Data Pipeline Engineer to join our dynamic Data Engineering team. The ideal candidate has 3+ years of data engineering experience, a solid understanding of Spark and SQL, and hands-on experience building data pipelines. This individual will play a crucial role in supporting our strategic pipelines and optimizing them for reliability, efficiency, and performance.
In this role, you will help shape our ETL strategy as part of a rapidly growing team. You will be responsible for researching, planning, and implementing best practices that optimize data pipeline performance and reliability across our organization, and you will work cross-functionally with all of our product teams to ensure smooth, efficient, and scalable pipelines. This position is ideal for someone who loves tackling complex technical challenges, enjoys hands-on problem-solving, and thrives in a fast-paced environment where they can make a big impact.
Report directly to the Director of Data Engineering, who will provide significant, hands-on training on cutting-edge data tools and techniques.
Build end-to-end data pipelines.
Help set best practices for our data modeling and pipeline builds.
Create documentation, architecture diagrams, and other training materials for analysts.
Become an expert at solving complex data pipeline issues using PySpark and SQL.
Collaborate with stakeholders to incorporate business logic into our central pipelines.
Develop deep expertise in Databricks, Airflow, and other ETL tooling.
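To give candidates a flavor of the pipeline work described above, here is a minimal sketch of an ingest-clean-aggregate step. It uses Pandas for portability (our production pipelines use PySpark and SQL on Databricks), and all table, column, and metric names are hypothetical:

```python
# Hypothetical end-to-end pipeline step: ingest raw order records,
# clean them, and aggregate daily revenue per merchant for delivery.
import pandas as pd

def run_pipeline(raw: pd.DataFrame) -> pd.DataFrame:
    # Clean: drop rows missing required fields, normalize types.
    cleaned = raw.dropna(subset=["order_id", "amount"]).copy()
    cleaned["amount"] = cleaned["amount"].astype(float)
    # Aggregate: daily revenue per merchant (hypothetical schema).
    return (
        cleaned.groupby(["merchant", "date"], as_index=False)["amount"]
        .sum()
        .rename(columns={"amount": "daily_revenue"})
    )

# Tiny example input; the row with a missing order_id is dropped.
raw = pd.DataFrame({
    "merchant": ["A", "A", "B", "B"],
    "date": ["2024-01-01"] * 4,
    "order_id": [1, 2, 3, None],
    "amount": ["10.0", "5.0", "7.5", "2.0"],
})
result = run_pipeline(raw)
```

The same shape of logic (filter, cast, group, aggregate) carries over directly to PySpark DataFrames at billions-of-rows scale.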
You hold a Bachelor’s or Master’s degree in Computer Science, STEM, or a related technical discipline.
You have 3+ years of experience as a Data Engineer, or in other technical functions.
You are comfortable working with large-scale datasets using PySpark or Pandas.
You have a strong understanding of data, orchestration tools, and data pipelines.
You understand business needs and the rationale behind different data ingestion and delivery strategies.
You are eager to constantly learn new technologies.
You are a self-starter who enjoys working with both internal and external stakeholders.
You have exceptional verbal and written communication skills.
Nice to have: Experience with Kubernetes, Docker, or equivalent.
Candidates must have primary residency in, and be physically located in, the country of hire (Colombia) to be considered for the position. If hired, the worker must maintain primary residency in the country of hire to remain eligible for employment.
We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, marital status, disability, gender, gender identity or expression, or veteran status. We are proud to be an equal-opportunity employer.