Data Engineer
- Full-Time
- Pasadena, CA
- Terray Therapeutics
- Posted 3 years ago – Accepting applications
Company Overview: Terray Therapeutics is a venture-backed biotechnology company pioneering an automated, closed-loop, low-latency wet lab discovery platform, combined with powerful AI capabilities, to accelerate the discovery and development of small molecule therapeutics. Our internal development programs are focused on oncology and immunology. In addition to these programs, we work with leading pharmaceutical companies. Our platform is based on fluorescent imaging of ultra-dense microarrays: from thousands to millions of images, we can measure billions of target/molecule interactions in a day, generating hundreds of millions of dose-response binding curves.
Position Summary: Terray Therapeutics is seeking a motivated, creative, and experienced Data Engineer. As an integral member of our data team, the candidate will be responsible for building a reliable, distributed data pipeline to handle millions of raw fluorescence microscopy images and their extracted features, allowing our machine learning engineers and data scientists to fully leverage our data to accelerate internal drug discovery efforts. The position will report to the Head of Computational and Data Sciences.
The core responsibilities of this job will be:
- Manage and improve our data lake of millions of fluorescence microscopy images
- Work with our data scientists to incorporate our image processing workflow into the data pipeline
- Build and manage our databases of billions to trillions of chemical structures, intensities, affinities, and data from other biological assays
- Design and architect a data warehouse to support downstream analytics
Experience and Qualifications: Given the company’s size, anticipated growth, and fast-paced environment, the organization requires a data engineer who is thoughtful and high-energy and who can partner with the broader organization to further enhance our next-generation drug discovery capabilities.
Terray Therapeutics’ success is nurtured in part by a hands-on work environment where everyone is accountable, everyone is vested in a vision of excellence, and everyone actively takes part in the success of the business. Terray Therapeutics supports a positive work environment composed of engaged employees who feel appreciated, recognized, and free to be creative.
Qualifications include:
- Expertise in engineering big data pipelines using modern technologies and cloud infrastructures
- Expertise in building and managing scalable relational databases, preferably in the life sciences space
- Experience with cloud computing services, preferably AWS (EMR, Redshift)
- Experience with high-end distributed data processing environments (Spark, Hadoop, etc.)
- Proficiency in a Linux environment, experience with database languages (e.g., SQL), and experience with version control practices and tools (Git)
- Experience with pipeline/workflow managers (Luigi, Airflow, Nextflow, etc.)
- High proficiency in Python and the PyData stack (NumPy, pandas, SciPy, Dask, etc.)
The candidate will exhibit the ability to work well under pressure and deliver results in a short timeframe. The company is looking for a highly responsive, goal-oriented individual who will bring significant energy and drive to solve complex technical problems and help us achieve our mission to advance human health.