The leaderless communication peptide (LCP) class of quorum-sensing peptides is broadly distributed among Firmicutes
Leaderless communication peptide (LCP) system is broadly distributed
To assess the distribution of SIP-like, LCP-based qs systems across bacteria, we employed a large-scale search strategy comprising: (i) a search for RopB homologs across a dataset of 129,001 bacterial genomes and 9,421 reference metagenome-assembled genomes (MAGs); (ii) probing the genomic vicinity of RopB homologs for small ORFs (sORFs) preceded by a Shine-Dalgarno ribosome-binding site (RBS) motif indicative of likely translation; (iii) identification of clans of RopB homologs for which the most likely translated adjacent sORF is ultrasmall and encodes a SIP-like LCP; and (iv) functional validation by assessing the regulatory activity of a chosen subset of candidate LCPs.
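Step (ii) can be illustrated with a small scanner. This is a minimal sketch, not the pipeline used in the study: the Shine-Dalgarno core (`GGAG`), the 4 to 14 nt spacing window, the 30-codon size cap, and the function names are all assumptions, and only the forward strand is scanned.

```python
# Hypothetical scanning parameters; the study's actual thresholds are not given here.
SD_CORE = "GGAG"                  # assumed minimal Shine-Dalgarno core (subset of AGGAGG)
MIN_SPACER, MAX_SPACER = 4, 14    # assumed nt between the end of the SD core and the start codon
MAX_CODONS = 30                   # assumed upper size limit for a "small" ORF
STARTS = {"ATG", "GTG", "TTG"}
STOPS = {"TAA", "TAG", "TGA"}


def has_sd_upstream(seq: str, start: int) -> bool:
    """True if an exact SD core ends MIN_SPACER..MAX_SPACER nt before `start`."""
    for spacer in range(MIN_SPACER, MAX_SPACER + 1):
        core_end = start - spacer
        core_start = core_end - len(SD_CORE)
        if core_start >= 0 and seq[core_start:core_end] == SD_CORE:
            return True
    return False


def find_sorfs_with_rbs(seq: str) -> list[tuple[int, int, int]]:
    """Forward-strand sORFs with an upstream SD core.

    Returns (start, end, n_codons): 0-based start, exclusive end including the
    stop codon, and the number of sense codons.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] not in STARTS or not has_sd_upstream(seq, i):
            continue
        # Walk in frame until the first stop codon; ORFs without a stop are skipped.
        for j in range(i, len(seq) - 2, 3):
            if seq[j:j + 3] in STOPS:
                n_codons = (j - i) // 3
                if n_codons <= MAX_CODONS:
                    hits.append((i, j + 3, n_codons))
                break
    return hits


# Synthetic test locus: SD core, 7-nt spacer, then an 8-codon ORF.
locus = "TTTAAGGAGGTTTATTATGTGGCTGATTCTGCTGTTTCTGTAAGCC"
print(find_sorfs_with_rbs(locus))  # → [(16, 43, 8)]
```

A production search would also scan the reverse strand, tolerate mismatched or longer SD variants, and score RBS strength rather than return a yes/no match.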
A BLASTP search of RopB against the target dataset yielded 19,280 hits (sequence identity ≥25%, mutual length coverage ≥70%; Supplementary Data 1) encoded in 15,776 genomes and 39 MAGs distributed across 468 taxa. These 19,280 hits correspond to 975 unique protein sequences that are predicted to harbor a C-terminal tetratricopeptide repeat (TPR) domain, a hallmark structural element responsible for peptide recognition by the RRNPPA family of receptors7 (Supplementary Data 1). Importantly, 478 of the 975 RopB homologs (~49%) are flanked by at least one candidate sORF with a high-confidence upstream RBS motif (Supplementary Data 1)13,14.
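The two BLASTP thresholds can be applied to tabular output as sketched below. The column layout (BLAST+ `-outfmt 6` with custom fields), the reading of "mutual length coverage" as the minimum of query and subject coverage, and the example rows are assumptions for illustration.

```python
import csv
import io

# Assumed column layout, e.g. from BLAST+:
#   blastp ... -outfmt "6 qseqid sseqid pident qstart qend qlen sstart send slen"
FIELDS = ["qseqid", "sseqid", "pident", "qstart", "qend", "qlen", "sstart", "send", "slen"]


def coverage(start: str, end: str, length: str) -> float:
    """Fraction of a sequence covered by the aligned segment."""
    return (abs(int(end) - int(start)) + 1) / int(length)


def passes(row: dict, min_identity: float = 25.0, min_cov: float = 0.70) -> bool:
    qcov = coverage(row["qstart"], row["qend"], row["qlen"])
    scov = coverage(row["sstart"], row["send"], row["slen"])
    # Assumed "mutual" coverage: both query and subject must reach min_cov.
    return float(row["pident"]) >= min_identity and min(qcov, scov) >= min_cov


def filter_hits(tabular: str) -> list[str]:
    reader = csv.DictReader(io.StringIO(tabular), fieldnames=FIELDS, delimiter="\t")
    return [row["sseqid"] for row in reader if passes(row)]


# Made-up rows for illustration only.
example = (
    "RopB\thitA\t41.2\t1\t270\t280\t5\t275\t285\n"   # passes both thresholds
    "RopB\thitB\t22.0\t1\t270\t280\t1\t270\t280\n"   # identity below 25%
    "RopB\thitC\t60.0\t1\t100\t280\t1\t100\t110\n"   # query coverage ~36%
)
print(filter_hits(example))  # → ['hitA']
```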
To determine whether clans of RopB homologs are enriched for association with LCPs, we inferred the phylogeny of RopB-like receptors and mapped the amino acid sequence of the flanking sORF with the strongest RBS motif onto the corresponding receptor leaf of the tree (Fig. 1b, Supplementary Data 1). The two previously well-characterized clans of RopB homologs correspond to streptococcal Rgg regulators, which recognize adjacently encoded canonical propeptides that are post-translationally processed into mature short hydrophobic peptides (SHPs)15,16. Accordingly, the two Rgg clans appeared as well-delineated clans in the RopB tree and were correctly mapped to SHP propeptides, which supports the validity of our approach for identifying clans of qs systems (Fig. 1b and S1, Supplementary Data 1 and 2). Remarkably, RopB and its closest evolutionary relatives form a distinct clan of 183 receptors that are frequently associated with an ultrasmall peptide whose amino acid composition resembles that of SIP (Fig. 1b, Supplementary Data 1 and 2). We term this new clan the RopB clan; its receptors represent a new class of qs system that uses LCPs as signaling molecules.
Refined manual assessment of the candidate cognate LCP of each receptor within the RopB clan (Fig. 2 and Supplementary Data 2) revealed that the LCP system is broadly distributed in bacterial genomes of high taxonomic diversity, spanning the Streptococcaceae, Lactobacillaceae, Enterococcaceae, Carnobacteriaceae, Bacillaceae and Staphylococcaceae families (Fig. 1c). Most candidate receptor-LCP pairs are chromosomally encoded, whereas a non-negligible number are carried on plasmids or integrative and conjugative elements (ICEs) (Fig. 3). In contrast, no receptor-LCP pairs were detected in prophages or phages. The genetic elements encoding receptor-LCP pairs are found in various host-associated microbiomes, wastewater, and fermented food products. LCP systems are prevalent in the genomes of several clinically relevant human pathogens, such as S. pyogenes and Enterococcus casseliflavus, and of animal pathogens such as S. porcinus and S. pseudoporcinus (Fig. 3).
Most putative LCPs are 8 to 10 amino acids long, with a few exceptions such as the LCPs from Lysinibacillus sphaericus, Staphylococcus delphini, Enterococcus durans, and Granulicatella balaenopterae. The predicted LCPs are highly hydrophobic and composed mostly of aliphatic and aromatic amino acids (Fig. 2 and S1). Their unique amino acid sequences constitute distinct communication codes, suggesting that LCP-mediated communication is specific among closely related bacterial species or strains. Importantly, the characterized RopB-SIP system of S. pyogenes represents only a small subset of the identified LCP systems (Fig. 2), indicating that LCP systems are more broadly distributed than previously appreciated and may play unrecognized roles in bacterial pathophysiology.
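The hydrophobicity claim can be quantified, for example, with the Kyte-Doolittle GRAVY score and the fraction of aliphatic plus aromatic residues. This sketch uses the LCPss sequence reported later in this section; the residue groupings are an assumption, and this is not necessarily how the study characterized the peptides.

```python
# Kyte-Doolittle hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
ALIPHATIC = set("AVLI")   # assumed grouping; some schemes also include M or G
AROMATIC = set("FWY")


def gravy(peptide: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def aliphatic_aromatic_fraction(peptide: str) -> float:
    """Share of residues that are aliphatic or aromatic under the groupings above."""
    return sum(aa in ALIPHATIC or aa in AROMATIC for aa in peptide) / len(peptide)


lcp_ss = "MWLILLFL"  # LCPss sequence reported for S. salivarius
print(round(gravy(lcp_ss), 4), aliphatic_aromatic_fraction(lcp_ss))  # → 2.9375 0.875
```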
Structure-guided multiple sequence alignment (MSA) analyses of RopB clan receptors indicate that the LCP-binding C-terminal domain (CTD) diversified faster than the N-terminal DNA-binding domain (DBD) (Figs. S2 and S3a)7,11,17. Within the CTD, the α6 helix is highly conserved relative to the rest of the domain (Figs. S2 and S3a). The α6 helix is a critical structural element for LCP binding: it forms the floor of the LCP-binding pocket and engages in intramolecular interactions that scaffold the LCP-binding pocket of RopB (Supplementary Fig. 3b)7,11,17. Consistent with a functional constraint on the α6 helix to bind LCPs, site-wise dN/dS analyses indicate that the α6 helix of candidate LCP receptors is under strong purifying selection against amino acid substitutions (Supplementary Fig. 3a). Similarly, MSA analyses comparing the 12 LCP-contacting amino acids of RopB with the corresponding residues of candidate RopB clan receptors suggest that these residues evolve more slowly and face stronger purifying selection than the remainder of the CTD (Supplementary Fig. 4a, b). These observations suggest that the TPR motifs diversify faster than LCP-receptor specificity, likely owing to their innate degeneracy18,19. In agreement, pairwise comparisons of evolutionary distances between LCP receptors, their LCP-contacting residues, and candidate LCPs indicate that full-length receptor sequences diverged faster than either the LCP-contacting residues or the putative LCPs, and that LCPs and LCP-contacting residues diverge at similar evolutionary rates (Supplementary Fig. 5). Collectively, these observations suggest that RopB clan receptors and candidate LCPs function as qs receptor-signal pairs.
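The section does not state how the pairwise evolutionary distances were computed. As a simplified stand-in for model-based distances, the sketch below uses uncorrected p-distances on aligned sequences and a Pearson correlation to compare two sets of pairwise distances; the toy alignments for taxa `t1` to `t3` are hypothetical, not data from the study.

```python
from itertools import combinations
from math import sqrt


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in sites) / len(sites)


def pairwise_distances(aln: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of taxa, keyed by sorted name pairs."""
    return {(i, j): p_distance(aln[i], aln[j]) for i, j in combinations(sorted(aln), 2)}


def pearson(xs: list[float], ys: list[float]) -> float:
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Hypothetical toy alignments: receptor contact residues and cognate peptides.
contacts = {"t1": "LIFWLVAY", "t2": "LIFWLVAF", "t3": "MVFYLIAF"}
lcps = {"t1": "MWLILLFL", "t2": "MWLILLFI", "t3": "MFLIVLWI"}

d_contacts = pairwise_distances(contacts)
d_lcps = pairwise_distances(lcps)
pairs = sorted(d_contacts)
print(pearson([d_contacts[p] for p in pairs], [d_lcps[p] for p in pairs]))
```

Similar slopes when plotting one distance set against another would be consistent with similar divergence rates; a real analysis would use substitution-model distances and many more taxa.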
To understand the co-evolution of receptors and candidate LCPs, we aligned the 12 LCP-contacting amino acids of RopB with the equivalently positioned residues of RopB clan receptors and compared them with the physicochemical characteristics of the corresponding LCPs (Supplementary Fig. 4a, b). Consistent with the preponderance of aliphatic and aromatic amino acids in most candidate LCPs (Supplementary Fig. 1d), the peptide-contacting residues are relatively well conserved among RopB clan receptors (Supplementary Fig. 4). However, the peptide-contacting residues of RopB clan receptors from Ligilactobacillus muralis, Pediococcus acidilactici, Ligilactobacillus animalis, and Enterococcus casseliflavus are distinct from those of RopB (Supplementary Fig. 4). Accordingly, the corresponding LCPs also deviate from the typical LCP signature, as they contain charged, polar, and proline residues (Supplementary Fig. 4). These observations suggest that RopB clan receptors have a tropism for SIP-like LCPs, but that receptor-LCP co-evolution has produced divergent pairs with alternative LCP specificities. Finally, analyses of the RopB clan receptor-LCP pairs from Bacillus cereus and Lysinibacillus sphaericus revealed an evolutionary feature suggestive of peptidase-mediated processing of some LCPs, similar to canonical RRNPP propeptides (Supplementary Fig. 4). The predicted LCP-binding sites and the corresponding candidate LCPs of B. cereus and L. sphaericus have identical amino acid compositions. However, the L. sphaericus LCP carries an additional eight amino acids at its C-terminus compared with the B. cereus LCP (Supplementary Fig. 4). This observation suggests that some LCPs may exist as longer precursors and that the C-terminal extensions may be involved in peptidase-mediated cleavage of the precursor to release the mature LCP.
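Flagging LCPs that depart from the aliphatic/aromatic signature, as described for the L. muralis, P. acidilactici, L. animalis and E. casseliflavus peptides, reduces to checking for residues in other classes. The groupings below are assumed (schemes differ, for example on histidine), and `MKPLSLF` is a made-up peptide.

```python
# Assumed residue groupings (conventions vary between schemes).
CHARGED = set("DEKRH")
POLAR_UNCHARGED = set("STNQ")
PROLINE = {"P"}


def atypical_features(peptide: str) -> set[str]:
    """Residue classes that deviate from the aliphatic/aromatic-rich LCP signature."""
    found = set()
    for aa in peptide.upper():
        if aa in CHARGED:
            found.add("charged")
        elif aa in POLAR_UNCHARGED:
            found.add("polar")
        elif aa in PROLINE:
            found.add("proline")
    return found


print(atypical_features("MWLILLFL"))   # → set()
print(atypical_features("MKPLSLF"))    # hypothetical divergent peptide
```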
Analyses of the genomic context of representative putative LCP systems showed that the candidate LCP-regulated genes are predominantly oriented divergently relative to the receptor gene (Fig. 2) and fall into three major categories: (i) biosynthetic gene clusters (BGCs) predicted to produce ribosomally synthesized and post-translationally modified peptides (RiPPs) or non-ribosomally synthesized antimicrobials; (ii) ABC-type transporters; and (iii) type VII secretion systems, which are typically involved in the translocation of virulence factors (Fig. 3). These observations suggest broader and more diverse roles for LCP systems in bacterial pathogenesis, physiology, and microbial ecology.
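Orientation calls such as "divergent" depend on strand and relative position. This hypothetical helper uses the standard convention (divergent: 5' ends face each other; convergent: 3' ends face each other); the gene names and coordinates are invented.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int    # lower genomic coordinate
    end: int      # upper genomic coordinate
    strand: str   # "+" or "-"


def orientation(a: Gene, b: Gene) -> str:
    """Classify the relative orientation of two neighbouring genes."""
    left, right = sorted((a, b), key=lambda g: g.start)
    if left.strand == right.strand:
        return "co-directional"   # -> ->  or  <- <-
    if left.strand == "-" and right.strand == "+":
        return "divergent"        # <- ->  (5' ends face each other)
    return "convergent"           # -> <-  (3' ends face each other)


# Hypothetical coordinates for a receptor gene and an adjacent target gene.
receptor = Gene("ropB_like", 100, 950, "-")
target = Gene("target_1", 1200, 2400, "+")
print(orientation(receptor, target))  # → divergent
```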
LCP in S. salivarius mediates gene regulation
To investigate whether LCPs other than SIP also act as intercellular signals, we characterized the putative cytosolic receptor-LCP pair from S. salivarius (RopBss-LCPss). LCPss is encoded on a megaplasmid, located downstream of ropBss, and transcribed divergently from it (Fig. 4a). LCPss encodes an eight-amino-acid hydrophobic peptide with the predicted sequence MWLILLFL and no additional amino acids at either end (Fig. 4b). A 14-gene operon encoding a putative non-ribosomal peptide synthetase biosynthetic gene cluster (NRPS-BGC) lies immediately downstream of LCPss and is transcribed in the same direction as LCPss (Fig. 4a); this genetic proximity suggests that the NRPS-BGC is the regulatory target of the RopBss-LCPss pathway.
Consistent with this, inactivation of ropBss or LCPss abrogated NRPS-BGC expression, and cis-complementation of the ∆ropBss and LCPss* mutants with ropBss and LCPss, respectively, restored it (Fig. 4c). Similarly, addition of synthetic LCPss with the predicted amino acid sequence in native order (LCPss), but not in scrambled order (SCRA), restored WT-like NRPS-BGC expression in the LCPss* mutant (Fig. 4d). In contrast, synthetic LCPss peptides with staggered N- or C-terminal truncations failed to activate NRPS-BGC expression in the LCPss* mutant (Fig. 4d), demonstrating that LCPss encodes a mature LCP and lacks the hallmarks of canonical bacterial peptide signals. Furthermore, even a 20-fold molar excess of synthetic LCPss failed to activate NRPS-BGC expression in ∆ropBss (Fig. 4d), indicating that LCPss activity requires its cognate receptor RopBss. Despite lacking a secretion signal sequence, LCPss is secreted and reinternalized into the cytosol, where it acts as an intercellular signal. This was demonstrated by the detection of LCPss-associated regulatory activity only in the secreted fraction of peptide-producing strains (WT, LCPss*::LCPss, and ∆ropBss::ropBss) and by the internalization of exogenously added FITC-labeled synthetic LCPss (Fig. 4e, f and Supplementary Fig. 6). Finally, inactivation of the canonical bacterial peptide import machinery, the oligopeptide permease (∆opp), did not affect LCPss import (Fig. 4f)20, suggesting that an unknown mechanism mediates LCPss reimport.
To elucidate the molecular mechanism of LCPss-mediated signaling, we examined the sequence-specific recognition of LCPss by cytosolic RopBss using a fluorescence polarization (FP) assay with FITC-labeled LCPss. RopBss binds LCPss with high affinity (Kd ~8 nM) (Fig. 4g), and the pre-formed RopBss-FITC-LCPss complex was disrupted only by unlabeled LCPss, not by the non-specific SCRA peptide (Fig. 4h), indicating that RopBss recognizes LCPss in a sequence-specific manner. To understand the downstream consequences of RopBss-LCPss interactions, we hypothesized that LCPss facilitates RopBss interactions with target promoters and thereby promotes RopBss-dependent activation of NRPS-BGC expression. To map the RopBss operator sequences in the LCPss and NRPS-BGC promoters, we performed electrophoretic mobility shift assays (EMSAs) using DNA fragments spanning LCPss and the LCPss-NRPS-BGC intergenic region (Supplementary Fig. 7). RopBss bound only a 43-bp fragment located immediately upstream of the putative −35 hexamer of the LCPss promoter and did not interact with the NRPS-BGC promoter (Fig. 4i and Supplementary Fig. 7). These results indicate that the RopBss binding site lies in the LCPss promoter and that LCPss and NRPS-BGC are likely expressed as a polycistronic transcript (Supplementary Fig. 7). Probing the 43-bp fragment for putative palindromes revealed an inverted repeat of two 12-bp half sites separated by a 4-bp spacer that likely constitutes the RopBss binding site (Supplementary Fig. 7e, f). This site differs from the RopB-GAS binding site in motif arrangement, length, and nucleotide composition: RopB-GAS binds a palindrome of two 9-bp half sites separated by a 7-bp spacer (25 bp in total)10, whereas RopBss binds two 12-bp half sites separated by a 4-bp spacer (28 bp in total).
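Affinity values such as the Kd ~8 nM from the FP assay are commonly obtained by fitting a binding isotherm. The fitting procedure is not described here, so the following is a sketch under stated assumptions: a single-site model that accounts for tracer depletion (quadratic form), a signal that is linear in fraction bound (strictly true for anisotropy when binding does not change fluorophore brightness), a hypothetical 1 nM tracer concentration, and synthetic data.

```python
import math


def fraction_bound(receptor_total: float, kd: float, tracer_total: float) -> float:
    """Fraction of labelled tracer bound; single-site model with ligand depletion (quadratic)."""
    s = tracer_total + receptor_total + kd
    return (s - math.sqrt(s * s - 4.0 * tracer_total * receptor_total)) / (2.0 * tracer_total)


def fit_kd(conc, signal, tracer_total, kd_grid=None):
    """Grid search over Kd; baseline and amplitude are solved by linear least squares."""
    if kd_grid is None:
        kd_grid = [10 ** (-2 + 6 * i / 3000) for i in range(3001)]  # 0.01 nM to 10 uM
    y_mean = sum(signal) / len(signal)
    best = None
    for kd in kd_grid:
        f = [fraction_bound(r, kd, tracer_total) for r in conc]
        f_mean = sum(f) / len(f)
        var = sum((x - f_mean) ** 2 for x in f)
        if var == 0.0:
            continue
        # For a fixed Kd the model is linear: signal = base + amp * fraction_bound.
        amp = sum((x - f_mean) * (y - y_mean) for x, y in zip(f, signal)) / var
        base = y_mean - amp * f_mean
        sse = sum((y - (base + amp * x)) ** 2 for x, y in zip(f, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, base, amp)
    _, kd, base, amp = best
    return kd, base, amp


# Synthetic titration (nM receptor) generated with Kd = 8 nM and 1 nM tracer.
conc = [0, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
signal = [50 + 150 * fraction_bound(r, 8.0, 1.0) for r in conc]
kd, base, amp = fit_kd(conc, signal, tracer_total=1.0)
print(f"Kd ~ {kd:.1f} nM")  # → Kd ~ 8.0 nM
```

A grid search keeps the example dependency-free; a nonlinear least-squares routine would be the usual choice in practice.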
The RopB-GAS half site has the sequence GTTACGTNT10, whereas the RopBss half site has the sequence ATGTAACATATT (Supplementary Fig. 7f). These findings indicate that the two receptors recognize operator sequences of different lengths and nucleotide compositions in their target promoters. Nevertheless, consistent with the roles of RopBss and RopB-GAS as transcriptional activators10 and their likely role in recruiting RNA polymerase to defective promoters21, the binding sites for both receptors are located upstream of and around the −35 region of the LCP promoters.
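The palindrome search can be sketched as a sliding-window scan in which the right half site must match the reverse complement of the left one. The defaults mirror the 12-bp half sites and 4-bp spacer described for RopBss; the probe is synthetic, built from the reported half site rather than the native LCPss promoter, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, half: int = 12, spacer: int = 4, max_mismatch: int = 0):
    """Windows where the right half site is the reverse complement of the left one.

    Returns (start, left_half, right_half, mismatches) for each qualifying window.
    """
    seq = seq.upper()
    span = 2 * half + spacer
    hits = []
    for i in range(len(seq) - span + 1):
        left = seq[i:i + half]
        right = seq[i + half + spacer:i + span]
        mismatches = sum(a != b for a, b in zip(left, reverse_complement(right)))
        if mismatches <= max_mismatch:
            hits.append((i, left, right, mismatches))
    return hits


# Synthetic probe built from the reported RopBss half site, not the native promoter.
half_site = "ATGTAACATATT"
probe = "CCCC" + half_site + "GCGC" + reverse_complement(half_site) + "CCCC"
print(find_inverted_repeats(probe))  # → [(4, 'ATGTAACATATT', 'AATATGTTACAT', 0)]
```

Raising `max_mismatch` lets the scan report imperfect palindromes, which are common in bacterial operators; degenerate positions such as `N` are compared literally here.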
To delineate the influence of LCPss on RopBss-promoter interactions, we assessed RopBss-DNA interactions in the presence and absence of LCPss by FP assay using a FITC-labeled oligoduplex containing the identified RopBss binding site (Fig. 4i). LCPss-bound RopBss interacted with the cognate DNA with high affinity (Kd ~90 nM), whereas apo- or SCRA-bound RopBss bound it with much lower affinity (Kd >500 nM) (Fig. 4j and Supplementary Fig. 7g, h), indicating that LCPss binding promotes high-affinity interactions between RopBss and the LCPss promoter. Based on these observations, we propose a model for LCPss signaling and LCPss-dependent transcriptional activation of NRPS-BGC by RopBss (Fig. 4k).
LCP system regulates streptococcal virulence factor production
To assess the functionality of LCPs in other bacteria, we first examined the regulatory activity of the LCP from the swine pathogen S. porcinus. The amino acid sequence of the S. porcinus LCP (LCPsp) is identical to that of SIP (Fig. 5a, b). The LCPsp coding region is flanked upstream by ropB, which is transcribed divergently, and downstream by a gene encoding a cysteine protease (speBsp) that is transcribed in the same direction as LCPsp (Fig. 5a). Consistent with a role for LCPsp as an intercellular signal that controls speBsp expression, supplementation of S. porcinus with synthetic LCPsp triggered early induction of speBsp expression, whereas the scrambled control peptide SCRAsp had no effect on gene regulation (Fig. 5c). Because the secreted cysteine protease SpeB is critical for S. pyogenes virulence22,23, we reason that LCPsp-mediated activation of speBsp expression may influence the pathogenic traits of S. porcinus.
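The section does not specify how speBsp induction was quantified. If expression were measured by qRT-PCR, relative induction would commonly be computed with the 2^-ΔΔCt method, sketched here with made-up Ct values; the sketch assumes near-equal amplification efficiencies close to 100% for the target and reference genes.

```python
def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the 2^-ΔΔCt method (assumes ~100% amplification efficiency)."""
    delta_treated = ct_target_treated - ct_ref_treated      # normalise to reference gene
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: target gene vs a reference gene, peptide vs scrambled control.
print(fold_change_ddct(20.0, 15.0, 23.0, 15.0))  # → 8.0
```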
LCPs from different clades of the RopB clan receptor phylogeny mediate intercellular communication and gene regulation
To test whether an LCP from a non-streptococcal genus is functional, we characterized the regulatory activity of the LCP from Enterococcus malodoratus (LCPem) (Fig. 2). LCPem is nine amino acids long, and its sequence is distinct from that of SIP (Fig. 5d, e). The ropBem gene is transcribed divergently from LCPem. However, unlike the LCP systems characterized above, no genes downstream of LCPem are transcribed in the same direction as LCPem (Fig. 5d). Instead, two genes encoding a type VII secretion system-effector pair (T7SS) are located downstream of ropBem and transcribed in the same direction as ropBem, and two hypothetical genes are located downstream of LCPem but transcribed in the opposite direction (Fig. 5d). Because this gene arrangement differs from that of the other characterized LCP systems and the regulatory influence of LCPem on these genes was unknown, we examined the effect of synthetic LCPem on the expression of genes in both directions. Supplementation of E. malodoratus with LCPem induced only the genes encoding the T7SS and its effector, and this induction was specific to LCPem (Fig. 5f). In contrast, LCPem had no effect on the expression of the two hypothetical genes downstream of LCPem (data not shown). These findings demonstrate that LCPem, although dissimilar to SIP, acts as an intercellular signal and controls the production of a T7SS-effector system in E. malodoratus.
To investigate the functionality of a more distantly related LCP system, we assessed the regulatory activity of the LCP from Limosilactobacillus reuteri (LCPlr) (Figs. 2, 6). Unlike other L. reuteri strains, L. reuteri DSM32035 carries a naturally occurring stop codon at amino acid position 4 of the putative LCPlr (Fig. 6b). The predicted full-length (untruncated) LCPlr is eight amino acids long (Fig. 6b, c) and has the characteristic aliphatic and aromatic amino acid composition (Fig. 6a–c). The ropBlr gene is transcribed divergently from LCPlr (Fig. 6a). Two hypothetical genes encoding a putative ABC-type transporter are located downstream of LCPlr and transcribed in the same direction as LCPlr (Fig. 6a). Supplementation of exponentially growing L. reuteri with synthetic LCPlr activated the naturally silent LCP pathway and induced expression of the genes encoding the ABC-type transporter. The induction was specific to LCPlr, as SCRAlr failed to activate ABC transporter expression (Fig. 6c). These results indicate that LCPlr, an LCP from a distantly related species, functions effectively as a qs signal and mediates gene regulation.
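The effect of the DSM32035 premature stop codon can be illustrated by translating until the first stop. The ORFs below are hypothetical (the LCPlr nucleotide sequence is not given in this section): they reuse codons for an LCPss-like peptide, with codon 4 replaced by `TAA` in the truncated variant.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_until_stop(orf: str) -> str:
    """Translate codon by codon from the first base, stopping at the first stop codon.

    Assumes an ATG start; alternative start codons are not converted to Met.
    """
    peptide = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3].upper()]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Hypothetical ORFs for illustration only.
intact = "ATGTGGCTGATTCTGCTGTTTCTGTAA"
premature = intact[:9] + "TAA" + intact[12:]   # codon 4 replaced by a stop
print(translate_until_stop(intact), translate_until_stop(premature))  # → MWLILLFL MWL
```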