Data Solutions Engineer
About YipitData:
YipitData is the leading market research firm for the disruptive economy and recently raised $475M from The Carlyle Group at a valuation of over $1B.
We analyze billions of data points every day to provide accurate, detailed insights on industries like ridesharing, e-commerce marketplaces, and payments. Our data team uses proprietary technology to identify, license, clean, and analyze the data that many of the world’s largest investment funds and corporations depend on.
For three years, we have been recognized as one of Inc’s Best Workplaces. We are a fast-growing technology company backed by Norwest Venture Partners and The Carlyle Group. We cultivate a strong people-centric culture focused on mastery, ownership, and transparency.
About the Role:
We are looking for a Data Solutions Engineer to join our Data Solutions team, focusing on building, optimizing, and maintaining large-scale data pipelines. The role is highly technical, requiring deep expertise in Spark optimization, and also involves direct collaboration with Databricks users across the company.
You will work closely with analysts, data engineers, and stakeholders to design, debug, and refine high-performance data pipelines in Databricks. Additionally, you will help train and support internal users of Databricks, creating documentation and materials to promote best practices across teams.
As a Data Solutions Engineer, you will:
- Optimize and maintain data pipelines:
  - Build and enhance high-performance ETL workflows in Databricks.
  - Identify and resolve performance bottlenecks in Spark jobs.
  - Ensure data processing reliability, scalability, and fault tolerance.
- Collaborate with cross-functional teams:
  - Work with analysts, engineers, and business teams to improve data workflows.
  - Help analysts transition legacy SQL-based workflows into more scalable PySpark solutions (see the sketch after this list).
  - Provide training and technical guidance on best practices in Databricks and Spark optimization.
- Drive continuous improvement and knowledge sharing:
  - Develop internal tooling and automation to streamline data workflows.
  - Create and maintain documentation, training materials, and best practices for Databricks users.
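To give a flavor of the SQL-to-PySpark transition mentioned above, here is a minimal sketch; the table and column names (orders, user_id, amount) are hypothetical placeholders, not a real schema:

```python
# A legacy SQL aggregation and its DataFrame-API equivalent (hypothetical schema):
#
#   SELECT user_id, SUM(amount) AS total_spend
#   FROM orders
#   GROUP BY user_id
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.table("orders")  # hypothetical table name

total_spend = (
    orders
    .groupBy("user_id")
    .agg(F.sum("amount").alias("total_spend"))
)
```

The DataFrame version runs through the same Catalyst optimizer as the SQL, but it is easier to parameterize, test, and compose with further PySpark steps.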
You Are Likely To Succeed If:
- You hold a Bachelor's or Master's degree in Computer Science or another STEM or related technical discipline.
- You have 3+ years of experience in Data Engineering or ETL, working with large-scale data processing.
- You have strong expertise in PySpark and Spark optimizations, with the ability to diagnose and resolve performance issues using the Spark UI.
- You understand how to efficiently process and optimize large datasets in a distributed environment.
- You have experience tuning Spark jobs for performance and scalability, including memory management, parallelism, and resource allocation (a brief illustration follows this list).
- You are comfortable working with Databricks, managing clusters, and troubleshooting job execution.
- You can read and interpret SQL queries to translate them into optimized PySpark workflows.
- You are proactive and enjoy collaborating with stakeholders to improve data workflows.
- You communicate clearly in English with both technical and non-technical audiences, as you will interact with Databricks users across the company.
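As a rough illustration of the tuning described above, here is a small sketch; the table name (orders) and the partition count are hypothetical placeholders, and the right values always depend on data volume and cluster size:

```python
# Hypothetical illustration of common Spark tuning knobs, not a recommendation.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Raise shuffle parallelism for a large aggregation or join
# (the default of 200 partitions is often too few at scale).
spark.conf.set("spark.sql.shuffle.partitions", "400")

orders = spark.table("orders")          # hypothetical table
orders = orders.repartition("user_id")  # co-locate rows by key before a wide operation
orders.cache()                          # reuse the result across multiple actions
```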
What We Offer:
- 30 days PTO for any reason, including sick days (no specified limits)
- Flexible work schedule
- Personal laptop
- Health and wellness package
- Remote work
Work Hour/Schedule Expectations:
- Candidates should be available and willing to work during East Coast business hours (approximately 10/11am – 6/7pm EST).
We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, marital status, disability, gender, gender identity or expression, or veteran status. We are proud to be an equal opportunity employer.