About the job
We are looking for a genomics bioinformatician to join the data science and genomics team and help us build the largest and most comprehensive biodiversity knowledge store.
The work will explore biodata collected from around the world, often from unexplored locations and biomes where most of the data will be unknown and unannotated, and will make up a large part of the undiscovered "microbial dark matter". Tapping into this undiscovered information at Basecamp Research is creating tremendous opportunities, both to expand our basic understanding of biology and to build tools (traditional and AI) that will enhance biodiscovery and shape the world of human health and industry. One example can be found in a blog post written by our collaborator NVIDIA (Basecamp Research is a member of NVIDIA Inception).
The successful candidate will take ownership of, expand and manage our sequencing and metagenomic data analysis production operations. They will have the opportunity to investigate new methods that maximize the curation and annotation of the microbial dark matter. This is a highly collaborative role, working closely with all teams at every point of data collection and analysis, including but not limited to our biodiversity partners, field scientists, sequencing operations, ML scientists and commercial stakeholders.
Responsibilities
Develop and run software to support the biodiversity and sample collection teams and genome sequencing operations. Our sequencing data stack includes second- and third-generation technologies.
Responsibilities include, in coordination with other team members:
- Taking ownership of building, improving and managing the in-house genome assembly and annotation pipeline. This will entail:
  - Managing and auditing the data workflow for our samples, from collection to data warehousing, including quality control of the relevant files and datasets
  - Collaborating with the Data Engineering team to build and manage the pipeline on our in-house infrastructure platform
  - Investigating, benchmarking and integrating novel analyses into the pipeline
  - Writing and documenting high-quality code and process methodology
- Developing methods that leverage our in-house sequencing datasets to produce complete, high-quality genomes for context analysis
- Contributing to problem-solving discussions within and across teams to generate ideas that benefit all parts of the organisation
- Leading from the front in bringing new ideas and approaches to the table
Required skills and experiences
- A graduate (MSc/PhD) degree in the life sciences, computer science or a related field
- At least three years of experience in a high-throughput sequencing environment, building and managing genomics and protein analysis workflows or pipelines
- Experience managing large datasets containing many samples from many sources (e.g. ecological, population)
- Knowledge of DNA sequencing platforms (second- and third-generation) and experience working with their data types
- Experience with and knowledge of Unix-based operating systems, libraries and tools
- Knowledge of and/or experience with bioinformatics tools for genomics, metagenomics and/or protein biology
- Experience with a scripting language (e.g. Python, Perl) and shell scripting
- Excellent analytical and problem-solving skills
- Excellent communication skills and ability to work closely with interdisciplinary teams
- Fluency in English
Advantageous skills and experiences
- Experience with metagenomic data and knowledge of microbial genomics and/or protein biology
- Experience with data workflow management systems such as Prefect, Airflow, Snakemake or Nextflow, and preferably Dagster
- Experience in a product-focused techbio/biotech setting (e.g. therapeutics, protein/drug discovery, CRO)
- Experience with building databases (relational and/or non-relational)
- Experience using cloud platforms (e.g. AWS, GCP)
- Experience with containerization (Docker, Singularity)
- Experience with Git and GitLab or GitHub
- Experience with Agile software development
Location
- London office-based (non-remote); candidates must live in London or be willing to relocate there.