
Research Projects
We developed GeneSpace, a web-based bioinformatics platform that lets NEB scientists without a bioinformatics background conduct genome mining for enzyme discovery. GeneSpace provides a graphical user interface to advanced research bioinformatics tools. Users can annotate genes and search for protein homologs and gene neighborhoods in popular protein databases, large metagenomics datasets, and in-house sequencing datasets, tasks that would otherwise be difficult without specialized training. Many of the tools in GeneSpace are powered by Domainator, a flexible, modular, open-source software suite that we developed for domain-based neighborhood and protein search, extraction, and clustering.
Leveraging machine learning for improved functional annotation and remote homology detection
In our genome mining projects for enzyme discovery, a central question is how we can improve our ability to predict protein function. We have been investigating how structure searches and neural-network-based sequence classifiers can improve remote homology searches. We recently found a way to use protein language models to dramatically increase the sensitivity of protein domain annotations while maintaining acceptable search speed. We look forward to working with other research groups to apply these new computational approaches to enzyme discovery at NEB.
Using deep learning models for enzyme engineering
Deep learning models, including protein language models, have recently shown great potential for generating novel protein sequences and for predicting the fitness of mutant and wild-type enzymes. We collaborate with experimental scientists at NEB to develop new deep learning methods and apply them in practical enzyme engineering projects.
Discovering novel deaminases and developing better enzyme-based technologies for genomics and epigenomics applications
We work closely with experimental scientists in the Research and Application Development Departments to discover novel enzymes and to develop new enzyme-based high-throughput methods and analytical tools that advance the frontiers of genomics and epigenomics.
Recently, we screened over 200 new cytosine deaminase variants drawn from various protein and metagenomics databases by integrating bioinformatics, in vitro protein synthesis, LC/MS-based analytical chemistry, and high-throughput sequencing. This work identified many new cytosine deaminases with interesting, previously unreported properties. Among them were enzymes with strong activity on both double- and single-stranded DNA without any apparent sequence constraints, enzymes that do not deaminate modified cytosines, and enzymes with a variety of sequence-context preferences, including a preference for CpG. These properties are highly desirable for turning deaminases into powerful, easy-to-use tools for detecting epigenetic modifications and for genome editing.