Member of the Technical Team - Data Engineer

Transfyr Bio

Software Engineering, IT, Data Science

Cambridge, MA, USA

Posted on May 12, 2026

About Transfyr

Transfyr is building physical AI for science, and the world’s largest commercial dataset on real-world scientific execution.

Why is it that a professional athlete has dramatically more information about every play they make than a scientist has about the cause of any experimental failure? At Transfyr, we are building the infrastructure to make real-world scientific work legible, transferable, and reproducible. Right now, this looks like sensors (vision, audio, environmental, etc.) in real laboratory environments and a platform that records and analyzes multimodal data about how scientific work is performed. This foundation is critical not only for driving elite human performance today, but for enabling meaningful automation tomorrow.

Want to learn more? You can read some of our writings here.

The Role

Data engineers at Transfyr design and build the core systems that make real-world science legible.

You will work on end-to-end data systems that ingest, organize, and serve multimodal data from complex laboratory workflows. Your solutions will touch physical environments, imperfect humans, ever-evolving protocols, and long-tail failure modes. The work demands strong engineering fundamentals, high agency, and comfort operating where requirements emerge from customers’ reality rather than specs.

This role is in-person in Cambridge, MA.

What you’ll accomplish with us:

  • Architect Scalable Pipelines: Lead the development of modern, observable data pipelines that manage multimodal and scientific data transformations, using orchestration tools such as Prefect or Temporal.

  • Modernize Data Models: Develop and manage the data models used across our stack, ensuring they support business, operational, and scientific requirements.

  • Streamline Scientific Data: Develop robust data contracts and versioning strategies to manage high-volume multimodal data streams from laboratory environments (a hypothetical contract sketch follows this list).

  • Implement Automated Data Quality Control (QC): Establish robust, automated QC processes and monitoring, ensuring the integrity, completeness, and correctness of multimodal data uploads (e.g., confirming video frame completeness, checking for dropouts, and validating deployment association; see the QC sketch after this list).

  • Design for Privacy and Compliance: Architect data pipelines and storage solutions that address data anonymization, Personally Identifiable Information (PII) handling, and security requirements necessary for both customer data and internal compliance.

  • Ship Tools People Trust: Develop internal and customer-facing tools that make complex systems understandable and usable for scientists and operators.

  • Enable What Comes Next: Evaluate and integrate external tools, open source software, and infrastructure components where they accelerate progress.
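
To make the data-contract bullet concrete, here is a minimal, hypothetical sketch of what a versioned contract for one multimodal upload could look like, written in Python with pydantic. The model name, fields, and versioning convention are illustrative assumptions, not Transfyr's actual schema.

    # Hypothetical data contract for one multimodal upload (illustrative only).
    from datetime import datetime
    from pydantic import BaseModel, Field

    class SensorUploadV1(BaseModel):
        """Contract for one upload from a lab sensor deployment."""
        schema_version: str = Field(default="1.0.0", frozen=True)
        upload_id: str
        deployment_id: str               # which sensor deployment produced this
        modality: str                    # e.g. "video", "audio", "environmental"
        started_at: datetime
        ended_at: datetime
        frame_count: int | None = None   # only meaningful for video
        byte_size: int

    # Producers validate before publishing; consumers reject payloads that fail
    # to parse, so schema drift surfaces at the boundary rather than downstream.
    upload = SensorUploadV1.model_validate({
        "upload_id": "u-001",
        "deployment_id": "lab-042",
        "modality": "video",
        "started_at": "2026-05-12T09:00:00Z",
        "ended_at": "2026-05-12T10:00:00Z",
        "frame_count": 108000,
        "byte_size": 7340032,
    })

Versioning by introducing a new model (say, a SensorUploadV2) rather than mutating fields in place is one common way to evolve such a contract without breaking existing consumers.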
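In the same spirit, here is a minimal sketch of the automated QC idea, using Prefect (one of the orchestrators named above). The specific checks, retry settings, and failure handling are assumptions for illustration, not a description of Transfyr's actual pipeline.

    # Illustrative Prefect 2.x flow for post-upload QC (assumptions throughout).
    from prefect import flow, task, get_run_logger

    @task(retries=2, retry_delay_seconds=30)
    def check_frame_completeness(expected: int, actual: int) -> bool:
        # Flag uploads whose decoded frame count falls short of what the
        # recording metadata promised (a proxy for dropped frames).
        return actual >= expected

    @task
    def check_deployment_association(deployment_id: str | None) -> bool:
        # Every upload must be attributable to a known sensor deployment.
        return deployment_id is not None

    @flow(name="upload-qc")
    def qc_upload(upload_id: str, expected_frames: int, actual_frames: int,
                  deployment_id: str | None) -> bool:
        logger = get_run_logger()
        ok = (check_frame_completeness(expected_frames, actual_frames)
              and check_deployment_association(deployment_id))
        if not ok:
            logger.warning("QC failed for upload %s", upload_id)
        return ok

    if __name__ == "__main__":
        qc_upload("u-001", expected_frames=3600, actual_frames=3590,
                  deployment_id="lab-042")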

Who you are:

  • High agency. You don’t wait for perfect specs, detailed tickets, or constant direction. You notice what needs to be done, anticipate downstream needs, and take ownership of pushing important work across the finish line.

  • Biased toward action. You value momentum, can move quickly without being careless, and know when to trade elegance for progress.

  • Successful in ambiguity. You can turn incomplete context into working software, ask the right questions when needed, and make reasonable assumptions when answers aren’t available yet.

  • Thoughtful. You can balance speed, correctness, and long-term maintainability, and you understand that the right tradeoff depends on the moment.

  • Clear, direct communicator. You close loops, surface issues early, and work effectively with colleagues who may not share your technical background.

  • Intense. You care deeply about the mission, are willing to work hard when it matters, and take pride in helping a small team do outsized work.

What you know:

  • Modern Tooling: Deep experience with pipeline orchestration tools (Prefect/Temporal or similar preferred), data streaming, and complex multi-cloud and edge architectures.

  • Reliable Systems: Fluency in Python, infrastructure-as-code tooling, and edge-to-cloud data transfers, with a proven track record of implementing data version control and data contracts to maintain high data quality.

  • Infrastructure: Ability to build and scale data-intensive backend systems that handle multimodal scientific outputs like live video and sensor metadata.

  • Cloud Data Storage and Compute: Hands-on expertise with cloud data services for high-volume data ingestion and serving (e.g., AWS S3 and ECS, or similar compute/storage managed via infrastructure-as-code).

Other things we like to see:

  • A passion for science

  • A passion for and experience with AI

  • Demonstrated experience working in fast-moving/ambiguous environments (like startups!)

The basics:

  • Competitive compensation (cash + equity)

  • Full benefits (low/no-cost health insurance options, HSA, 401(k) with matching, lunch subsidy, etc.)

  • Well-funded startup led by industry leaders: founders Anna Marie Wagner (Ginkgo Bioworks, Bain Capital) and Renee Wegrzyn (DARPA, Ginkgo Bioworks, ARPA-H).