Natural product-based drug discovery has greatly benefited from genome mining, an approach that predicts the chemical potential of a sequenced organism based on its biosynthetic genes. For example, the number of biosynthetic gene clusters (BGCs) in a microbial genome usually correlates with chemical richness, and an unprecedented BGC architecture suggests that the corresponding compound is novel. Some of the most powerful predictive methods exist for nonribosomal peptide synthetases and cis-AT polyketide synthases (PKSs), for which the BGC architecture can suggest fairly detailed chemical structures that aid targeted natural product isolation. The current paper provides an automated tool for a third family, termed trans-AT PKSs, which have been notoriously challenging for genome mining because of their functional complexity.
Similar to the better-studied cis-AT PKSs, trans-AT systems are enzymatic assembly lines composed of giant, multimodular proteins with dozens of catalytic domains. Unique to trans-AT PKSs is their astonishing diversity of non-canonical domains, module types, and trans-acting components (i.e., enzymes not embedded in the main multifunctional proteins). This architectural diversity, which is among the highest known for any biosynthetic enzyme family, permits trans-AT PKSs to install a wide range of moieties into polyketides, but it also makes predictions of chemical structures much more challenging than for textbook PKSs. Such predictions, however, would be highly attractive, since trans-AT PKSs occur in many poorly explored bacterial groups and generate bioactive compounds in which novel chemical scaffolds are encountered at high frequency.
Combining insights into such polyketide structures, biochemical data, and extensive bioinformatic analyses, we have been able to decipher the biosynthetic principles of these unusual assembly lines. A straightforward correlation between the phylogeny of a single enzymatic domain within a module, the ketosynthase (KS) domain, and the chemical structure of the incoming substrate is sufficient for precise predictions of polyketide cores. However, these predictions required detailed biosynthetic knowledge, a reference dataset, and the tedious construction of large phylogenetic trees. To make the analysis of polyketide pathways available to the broader community, we developed TransATor, a bioinformatic platform that automatically analyzes polyketide pathways and predicts the corresponding polyketide core structures. These predictions enable efficient in silico dereplication and prioritization of bacterial strains, the recognition of chemical novelty, and the identification of culturable producers in cases where the original pathway stems from metagenomic studies. In the present study, we show how the tool can be used to predict novel structures and to guide the subsequent structure elucidation process. Moreover, we used TransATor to establish biosynthetic models even for highly aberrant polyketide structures.
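The correlation between KS phylogeny and incoming substrate can be illustrated with a minimal sketch. It assumes that each KS domain has already been assigned to a clade and simply translates the ordered clade calls into a list of predicted moieties. The clade labels, the moiety descriptions, and the function name `predict_core` are hypothetical placeholders chosen for illustration; they are not the reference dataset or the interface of TransATor.

```python
from typing import Dict, List, Optional

# Hypothetical clade labels and moiety descriptions, for illustration only.
# A real reference set would be derived from characterized pathways.
CLADE_TO_MOIETY: Dict[str, str] = {
    "clade_beta_OH": "beta-hydroxyl",
    "clade_double_bond": "alpha,beta-double bond",
    "clade_reduced": "fully reduced (methylene)",
    "clade_beta_keto": "beta-keto",
}


def predict_core(ks_clades: List[Optional[str]]) -> List[str]:
    """Return one predicted moiety per KS, in assembly-line order.

    Each KS clade is read as evidence for the intermediate that the KS
    receives, i.e. the moiety generated by the upstream module.
    KS domains without a clade call (None) or with an unlisted clade
    are reported as "unknown" rather than guessed.
    """
    core = []
    for clade in ks_clades:
        core.append(CLADE_TO_MOIETY.get(clade, "unknown"))
    return core


if __name__ == "__main__":
    # Ordered clade calls for a made-up three-module assembly line.
    calls = ["clade_beta_OH", None, "clade_reduced"]
    for position, moiety in enumerate(predict_core(calls), start=1):
        print(f"KS{position}: incoming intermediate carries {moiety}")
```

Keeping unassigned KS domains as explicit "unknown" entries preserves the module order, so gaps in the prediction stay visible when the predicted core is later compared with known structures.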
We believe that this tool will assist both the natural product community and scientists not familiar with natural product biosynthesis. Cost-efficient in silico pathway dereplication and prioritization holds great potential for biotechnological and pharmaceutical applications and will help to expand the chemical space of trans-AT PKS-derived natural products.
Figure 1. Outline of the TransATor pipeline for protein-based analysis of trans-AT PKS BGCs. Core PKS domains are annotated and KS substrate specificities predicted based on Hidden Markov Models.
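As a rough illustration of the specificity step named in the caption, the sketch below picks the best-scoring clade model for one KS sequence from a table of profile scores. It assumes the scores were computed beforehand with an external HMM search tool; the function name `assign_clade`, the clade labels, and the cutoff of 50.0 are hypothetical and are not taken from TransATor.

```python
from typing import Dict, Optional, Tuple


def assign_clade(
    scores: Dict[str, float], min_score: float = 50.0
) -> Tuple[Optional[str], float]:
    """Pick the clade whose profile scores highest for one KS sequence.

    `scores` maps a clade label to the score of that clade's profile
    against the sequence. The cutoff `min_score` is an assumed value;
    below it, no clade is assigned.
    """
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    best_score = scores[best]
    if best_score < min_score:
        return None, best_score
    return best, best_score


if __name__ == "__main__":
    # Made-up scores for a single KS domain against three clade profiles.
    example = {"clade_beta_OH": 212.4, "clade_reduced": 148.9, "clade_beta_keto": 97.3}
    print(assign_clade(example))
```

Returning the score together with the label lets weak assignments be flagged for manual inspection instead of being silently accepted.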