At H1, we believe access to the best healthcare information is a basic human right. Our mission is to provide a platform that can optimally inform every doctor interaction globally. This promotes health equity and builds needed trust in healthcare systems. To accomplish this, our teams harness the power of data and AI technology to unlock groundbreaking medical insights and convert those insights into actions that result in optimal patient outcomes and accelerate an equitable and inclusive drug development lifecycle. Visit h1.co to learn more about us.
Data Engineering is responsible for the development and delivery of our most important asset - our data. Drawing on thousands of data sources from around the globe, the data engineering team is responsible for making sense of that data to create the world’s most extensive and comprehensive knowledge base of healthcare stakeholders and the ecosystem they influence. It is our job to ensure that only accurate, normalized data flows through to our customers, and at a velocity that keeps up with changes in the real world. As we rapidly expand the markets we serve and the breadth and depth of data we collect for our customers, the team must grow and scale to meet that demand.
WHAT YOU’LL DO AT H1
As a Staff Software Engineer, you will be responsible for big data engineering, data wrangling, data analysis, and user support, primarily on the AWS platform. You will have direct founder-level interactions. You’ll not only learn about great technology and a great product, but you’ll also learn from decision-makers who have successfully built and exited multiple startups. You will work directly with stakeholders across our company to deliver the most scalable, stable, and high-quality healthcare data application on the market.
Roles & Responsibilities:
- Analyze business needs, profile large data sets, and build custom data models and applications that drive business decision-making and improve the customer experience.
- Work with the content team to make sure they understand the content and to make their day-to-day work as streamlined and efficient as possible.
- Work within a small team to keep deliverables on track, ensure the team understands the content, and serve as a Subject Matter Expert for the content you are responsible for.
- Maintain data quality while working across time zones and with different teams.
- Build workflows that empower analysts to efficiently validate large volumes of data
- Design optimized big data solutions for data ingestion, data processing, data wrangling, and data delivery
- Design, develop, and tune data products, streaming applications, and integrations on large-scale data platforms (Spark, Kafka/Kinesis streaming, SQL Server, data warehousing, big data, etc.) with an emphasis on performance, reliability, scalability, and, above all, quality.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for efficient extraction, transformation, and loading of data from a wide variety of data sources
- Explore new alternatives and options for solving data engineering and data mining issues, using a combination of industry best practices, innovation, and experience to get the job done.
- Build data tools for analytics and data science team members that help them build and optimize our product into an innovative industry leader.
- Peer-review code developed by team members
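To make the validation-workflow responsibility above concrete, here is a minimal sketch of the kind of tool that lets analysts triage large batches of records efficiently. This is a hypothetical illustration: the field names (`npi`, `name`, `country`) and rules are assumptions, not H1's actual schema.

```python
# Hypothetical record-validation helper: checks a batch of provider
# records against simple rules and splits it into valid and rejected
# records, with human-readable reasons attached for analyst review.

REQUIRED_FIELDS = {"npi", "name", "country"}  # assumed schema


def validate_record(record: dict) -> list:
    """Return a list of problems found; an empty list means valid."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    npi = record.get("npi", "")
    # US National Provider Identifiers are 10-digit numbers.
    if npi and (not npi.isdigit() or len(npi) != 10):
        problems.append(f"malformed NPI: {npi!r}")
    return problems


def validate_batch(records):
    """Split a batch into (valid, rejected), keeping failure reasons."""
    valid, rejected = [], []
    for rec in records:
        problems = validate_record(rec)
        if problems:
            rejected.append({"record": rec, "problems": problems})
        else:
            valid.append(rec)
    return valid, rejected
```

In practice a workflow like this would run as a step inside a Spark or Airflow pipeline, with the rejected records routed to an analyst queue rather than silently dropped.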
WHAT YOU’LL NEED
You have strong hands-on technical skills, including conventional ETL and SQL skills, experience with multiple programming languages such as Python, Java, or Scala, and streaming or other data processing techniques. You are a self-starter who can manage projects through all stages (requirements, design, coding, testing, implementation, and support).
- 7+ years of professional experience with big data systems, pipelines, data processing, and reporting
- 4+ years of experience working with big data technologies such as Spark or Hadoop, preferably on AWS EMR
- Practical hands-on experience with technologies such as Apache Spark, Apache Flink, and Apache Hudi
- Experience with data processing technologies such as Spark Streaming, Kafka Streams, KSQL, Spark SQL, or MapReduce
- Understanding of distributed file formats such as Apache Avro and Apache Parquet, and of common data transformation methods
- Experience in performing root cause analysis on internal and external data and processes to answer specific business questions and find opportunities for improvement
- Ability to isolate, deconstruct, and resolve complex data engineering challenges
- Experience with AWS cloud preferred
- Experience working with the ELK stack is a plus
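For candidates less familiar with the MapReduce model named in the requirements, the data flow can be sketched in a few lines of plain Python: map emits key/value pairs, shuffle groups them by key, and reduce aggregates each group. This is a toy word count for illustration only; real jobs would run distributed on Spark or Hadoop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    """Map: emit a (word, 1) pair for each word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(pairs))
```

The same three-phase structure underlies Spark's `map`/`groupByKey`/`reduceByKey` operations, just executed in parallel across a cluster.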