Bioinformatic and experimental methods to identify and validate bacterial RNA-human RNA interactions
  1. Kylie I Krohmaly1,2,
  2. Robert J Freishtat1,3,4,
  3. Andrea L Hahn1,4,5
  1. 1Center for Genetic Medicine Research, Children's National Research Institute, Washington, District of Columbia, USA
  2. 2Institute for Biomedical Sciences, The George Washington University School of Medicine and Health Sciences, Washington, District of Columbia, USA
  3. 3Division of Emergency Medicine, Children's National Hospital, Washington, District of Columbia, USA
  4. 4Department of Pediatrics, The George Washington University School of Medicine and Health Sciences, Washington, District of Columbia, USA
  5. 5Division of Infectious Diseases, Children's National Hospital, Washington, District of Columbia, USA
  1. Correspondence to Dr Andrea L Hahn, Children's National Hospital, Washington, District of Columbia, USA; alhahn{at}


Ample evidence supports the importance of the microbiota on human health and disease. Recent studies suggest that extracellular vesicles are an important means of bacterial-host communication, in part via the transport of small RNAs (sRNAs). Bacterial sRNAs have been shown to co-precipitate with human and mouse RNA-induced silencing complex, hinting that some may regulate gene expression as eukaryotic microRNAs do. Bioinformatic tools, including those that can incorporate an sRNA’s secondary structure, can be used to predict interactions between bacterial sRNAs and human messenger RNAs (mRNAs). Validation of these potential interactions using reproducible experimental methods is essential to move the field forward. This review will cover the evidence of interspecies communication via sRNAs, bioinformatic tools currently available to identify potential bacterial sRNA-host (specifically, human) mRNA interactions, and experimental methods to identify and validate those interactions.

  • microbiota
  • RNA
  • bacteria
  • research design
  • RNA, messenger

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, an indication of whether changes were made, and the use is non-commercial. See:

Section 1: background and evidence of bacteria-to-human communication via small RNAs

The function and dysfunction of microbiota have been implicated in numerous human conditions, including inflammatory bowel diseases, asthma, and even infertility.1–3 However, the mechanisms by which the microbiota that reside on and in humans contribute to human disease remain poorly understood. More recent evidence suggests that microbiota affect the host in part via molecular mechanisms induced by bacterial extracellular vesicles (EVs) and their contents.4

Bacterial EVs5 are lipid bilayer-bound products deriving from the outer and/or inner membranes of bacteria6 and can be between 10 and 500 nm in size depending on the species.7 EVs can be isolated through various methods including filtration, ultracentrifugation, precipitation kits, or size-exclusion chromatography, and characterized by size, protein content, lipid content, function, and more.8 There is no gold standard for bacterial EV isolation and characterization; however, the Minimal Information for Studies of Extracellular Vesicles guidelines, updated in 2018, provide recommended protocols and reporting criteria to enhance reproducibility of methods.8

Bacterial small RNAs are among the RNA cargo of bacterial EVs

Bacterial EVs carry a range of cargo, including RNA, DNA, and proteins.6 9 10 RNA sizes and types found within EVs include protein coding genes, transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), and short non-coding RNAs.9 11–13 Short non-coding RNAs are regulatory molecules that are able to modulate gene expression via base pairing with a potential target transcript.14 Eukaryotic short non-coding RNAs often take the form of microRNAs (miRNA) which are processed from miRNA genes,15 while in bacteria, short non-coding RNAs are called small RNAs (sRNAs) and are generally derived from intergenic regions.16,17 tRNA fragments are considered a type of sRNA,18,19 and are found in bacterial EVs.9 Additionally, both rRNA and tRNA fragments have been implicated in inducing apoptosis in human cells.20

Whereas eukaryotic miRNAs are around 20 nucleotides (nts) long, bacterial sRNAs can be as long as 500+ nt,21 but are generally between 50 and 400 nt. This extra length allows bacterial sRNAs to form complex secondary structures that affect the stability of the sRNA, how it regulates its target transcript, and whether and how it binds its target.22–2829 30 Bacteria also produce miRNA-sized RNAs,18 29–33 which may function similarly to human miRNAs. sRNA is a consistent and often large portion of EV cargo.9 11–13 Interestingly, bacterial EV RNA cargo is distinct from the intracellular RNA population,9 18 34 and from the non-EV extracellular RNA population,9 suggesting that RNA is selectively loaded into EVs for their target cell.

Bacterial sRNAs circulate in the bloodstream and affect mammalian gene expression

Bacterial sRNAs35–37 and particularly those encapsulated in bacterial EVs38 39 have been detected in the plasma of both healthy individuals and those that are affected by various illnesses. Mammalian cells internalize bacterial EVs11 40 and furthermore, EV cargo—specifically, their RNA cargo—localizes to both the cytoplasm33 40 and the nucleus.11 33 40 Exogenous bacterial EVs also traverse the blood-brain barrier.33 40

Bacterial sRNAs have been bioinformatically predicted and experimentally validated to affect certain pathways in human and murine cells. Using three periodontal pathogens’ EVs, Choi et al confirmed the presence of miRNA-sized sRNAs.32 Using bioinformatic tools, they predicted human immune system-related mRNA targets of several sRNAs.32 Jurkat T cells exposed to synthetic oligos of the most highly expressed sRNAs showed that these sRNAs downregulated several cytokines (interleukin (IL)-5, IL-13, and IL-15).32

Additionally, Han et al investigated the potential of EV-derived RNAs from Aggregatibacter actinomycetemcomitans (Aa), a periodontal pathogen, to induce an immune response.33 U937 cells exposed to both intact Aa EVs and lysed Aa EVs (both exposed to RNase) showed that intact EVs induced a greater tumor necrosis factor-alpha response than lysed EVs.33 Compared with RNase-treated lysed EVs, cells exposed to DNase-treated lysed EVs also upregulated nuclear factor kappa B (NF-κB) activation and toll-like receptor 8 expression (both of which sense single-stranded RNAs); the authors attributed the difference in outcome to the RNAs of the DNase-treated lysed EVs.33 The authors also found that BV2 (ie, murine microglial) cells activated NF-κB and upregulated IL-6 after exposure to DNase-treated lysed Aa EVs or intact EVs more so than after exposure to RNase-treated lysed EVs.33

Bacterial sRNAs interact with mammalian RISC

RNA-induced silencing complex (RISC) in mammalian cells comprises Argonaute (Ago) and a miRNA, which can then bind the miRNA’s target mRNA and cause its degradation.15 Han et al described bacterial sRNAs that interact with host RISC after exposing U937 cells to Aa EVs, using total RNA sequencing (RNA-Seq) and RNA immunoprecipitation and sequencing (RIP-Seq).33 While Aa-derived sRNAs accounted for only 0.02% of reads via RNA-Seq, 0.35% of reads originated from Aa when looking at RISC-associated sRNA through RIP-Seq.33
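As a rough worked calculation using only the two read fractions reported by Han et al, the relative enrichment of Aa-derived reads in the RISC-associated fraction can be written as a simple ratio. This assumes the two percentages are directly comparable, which may not hold exactly because library preparation differs between RNA-Seq and RIP-Seq.

```latex
\text{fold enrichment} \approx \frac{0.35\%}{0.02\%} = 17.5
```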

In a similar study, Furuse et al investigated five pathogenic intracellular bacteria which produced RISC-enriched RNAs (compared with total RNA levels) that were close to the same length as authentic eukaryotic miRNAs.29 Additionally, one bacterial species, Mycobacterium marinum (Mm), produced an RNA, MM-H, of the expected size (23 nt) and with characteristics similar to those of canonical miRNAs (ie, stem-loop structure prediction, discrete 5’ end consistent with Dicer trimming).29

These studies demonstrate that bacterial sRNAs have the potential to operate as human regulatory RNAs via RISC. Bacterial sRNAs—specifically, bacterial miRNA-sized RNAs—interacting with RISC are a likely mechanism by which some sRNAs exert their impact on host gene expression.29 33 While these findings taken together are very compelling, more studies are needed to confirm this potential mechanism and investigate any off-target effects (perhaps relating to the machinery itself), or the possibility that host miRNA may be targeting the bacterial sRNAs. Nonetheless, discovering that sRNAs interact with miRNA machinery is encouraging evidence that the interactions and resulting effects may be sequence-specific rather than due to foreign single-stranded RNA.

Although Furuse et al investigated whether MM-H interacted with a perfectly complementary target using a reporter assay,29 no human target prediction was performed for the sRNA. Future studies will need to investigate the potential specific targets of the discovered sRNAs more deeply. Identification and validation are needed to determine sRNA candidates for microbial-host communication in conditions in which host microbiota have been implicated (eg, inflammatory bowel disease or asthma).1 2 In the following two sections, we will discuss bioinformatic methods (table 1) and experimental methods (table 2) that could be employed to discover these interactions and identify and validate potential human targets of bacterial sRNAs.

Table 1

Summary of bioinformatic tools that can be used to predict bacterial RNA-human RNA targets

Table 2

Summary of experimental approaches that can be used to identify and validate bacterial RNA-human RNA interactions

Section 2: bioinformatic approaches for predicting bacterial sRNA-human mRNA interactions

Homology and alignment tools for sRNA target prediction in silico

As RNA-RNA interactions are mediated via base pairing, sequence complementarity for at least the seed region (ie, the perfectly complementary region that ‘seeds’ the interaction) is required. This region of the sRNA may be conserved across species, and a novel sRNA’s function in one species may be similar to that of a homologous or identical sRNA (ie, a perfect match after alignment) in another species.

Koeppen et al identified human targets for 10 Pseudomonas aeruginosa sRNAs in EVs via BLAST (and other tools discussed below), which they then used to guide their functional validation.18 Jorth and Whiteley, after identifying and characterizing a lysine riboswitch controlling a lysine transporter in Aa, identified a homologous transporter gene in Haemophilus influenzae (Hi) through BLASTp.42 They confirmed the riboswitch’s effect on the transporter in Hi, demonstrating conservation of the system across the two species.42 In other words, they aligned the sRNA target from one species against a second species in BLAST to determine whether the second species produced a similar target. This type of approach can be used to identify sRNA targets in different genomes: after identifying the transcript and its homologs in other genomes, one can work backwards to identify sRNAs that target the transcript. BLAST and other alignment tools (such as bowtie41), combined with appropriate databases, can also be useful in identifying sRNAs themselves through alignment to sRNA databases, using whole sequences or portions of sequences that may be conserved. For more tools on sRNA prediction and detection, refer to table 3.

Table 3

Summary of bioinformatic tools that detect bacterial small RNAs and their genes

The use of alignment and homology to annotate sRNAs with targets in different species (and even within a species) is currently limited in part because there may be species-specific sRNAs, and because there are still many sRNAs that need functional validation and many species left to investigate. For example, one group assigned functions to detected bacterial sRNAs using Rfam43 and other databases by aligning the detected sRNAs to the annotated sRNA sequences within the databases.44 However, only 19 of 422 sRNAs were assigned a function.44 These databases can contain predicted and experimentally validated sRNAs. sRNA alignment and homology may be more widely used after more sRNA targets have been identified and fully characterized.

Importantly, using alignment to determine sRNA targets directly (ie, searching for a reverse-complement match to an sRNA) may be less reliable, as alignment tools do not take the secondary structures of the sRNA and mRNA into account. This matters because, although a sequence match may be found between a pair, the sequences may not be able to interact; for example, the seed region may be occluded.
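To make the idea of a reverse-complement match concrete, the following is a minimal sketch that finds the longest stretch of perfect Watson-Crick complementarity between an sRNA and a candidate target. It is not the method of any tool discussed here; the function names and example sequences are hypothetical, G:U wobble pairs are ignored, and, as noted above, secondary structure is not considered at all.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def longest_complementary_run(srna: str, target: str) -> tuple[int, int]:
    """Return (length, target_start) of the longest perfectly complementary run.

    The sRNA is reverse-complemented so that a complementary site in the
    target appears as an identical substring; the longest shared substring
    is then found by dynamic programming. Returns (0, -1) if nothing pairs.
    """
    query = reverse_complement(to_rna(srna))
    tgt = to_rna(target)
    best_len, best_start = 0, -1
    prev = [0] * (len(tgt) + 1)
    for i in range(1, len(query) + 1):
        cur = [0] * (len(tgt) + 1)
        for j in range(1, len(tgt) + 1):
            if query[i - 1] == tgt[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending at (i-1, j-1)
                if cur[j] > best_len:
                    best_len, best_start = cur[j], j - cur[j]
        prev = cur
    return best_len, best_start


# Hypothetical sequences, for illustration only.
length, start = longest_complementary_run("GGAUCCAUGCA", "AAAUGGAUCCAAA")
print(length, start)
```

A hit from a scan like this only shows that pairing is possible in sequence terms; whether the site is accessible in the folded RNAs is a separate question.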

Energy-based and accessibility-based models consider secondary structure

Beyond homology-based approaches, energy-based models are RNA-RNA interaction prediction algorithms and pipelines that calculate interaction or hybridization energy; that is, the free energy change associated with two RNAs interacting via base pairing.22–24 An interaction is favorable when the energy is negative, which means that it is a spontaneous interaction,45 and the energy is used as a proxy for interaction stability.27 One such model is RNAhybrid.46 However, RNAhybrid does not consider the secondary structure of the two RNAs before they interact.46 Importantly, the binding site that is predicted in RNAhybrid may be concealed and unable to interact in its natural secondary structure, generating a false positive ‘hit’. Other tools take secondary structure into account by considering the ‘accessible energy’ of the interaction, which is the energy needed to make the binding site accessible or open,22 23 27 47 or the probability that the binding site is unpaired in the secondary structure.24 In these models, the reported energy is typically the sum of accessible energy and hybridization or interaction energy,22 23 27 with the lowest energy indicating the most favorable or plausible interaction.
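The sum described above can be written compactly. The symbols below are our own notation, chosen for illustration rather than taken from any one tool's documentation: the hybridization term is negative for a favorable duplex, while each accessibility term is a non-negative penalty for opening the binding site in one of the two RNAs.

```latex
E_{\text{total}} = E_{\text{hyb}} + ED_{\text{sRNA}} + ED_{\text{mRNA}},
\qquad E_{\text{hyb}} \le 0, \quad ED_{\text{sRNA}},\, ED_{\text{mRNA}} \ge 0
```

Under this formulation, a site with strong complementarity can still receive an unfavorable total energy if it is buried in stable secondary structure, which is how these models avoid the false positives described for hybridization-only tools.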

RNAup,23 RNAplex,24 and IntaRNA22 27 are designed for short non-coding RNA interaction prediction and consider both hybridization and accessible energy. Generally, at least one sRNA and one putative target are required as input. The targets may range from a few mRNAs to a whole transcriptome or genome, as in IntaRNA.22 27 RNAplex and IntaRNA can be applied to both bacterial sRNA and miRNA target prediction.24 27 Umu and Gardner performed a benchmark study of 15 RNA-RNA interaction prediction tools comparing datasets of eukaryotic, bacterial, and archaeal short RNAs along with their RNA targets.48 Overall, energy-based and accessibility-based tools performed the best in terms of true positive rates (sensitivity) and positive predictive values (precision).48 Specifically, RNAup performed the best overall, followed by IntaRNA and RNAplex.48
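For reference, the two benchmark metrics named above have the standard definitions below, where TP, FP, and FN are the numbers of true positive, false positive, and false negative predicted interactions.

```latex
\text{TPR (sensitivity)} = \frac{TP}{TP + FN},
\qquad
\text{PPV (precision)} = \frac{TP}{TP + FP}
```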

From the developers of GLASSgo, IntaRNA is an energy-based and accessibility-based RNA-RNA interaction prediction tool that predicts both prokaryotic49 and eukaryotic interactions.50 Recently, IntaRNA has been incorporated into ViRBase V.3.0, a virus-host interaction database, to aid in interspecies RNA-RNA interaction binding site prediction.51 Uniquely, in addition to outputting the potential binding sites and energy scores as RNAplex and RNAup do, IntaRNA provides minimal energy profiles that allow visual exploration of interaction patterns.27

RIsearch247 facilitates exploration of predicted sRNA targets by integrating experimental transcriptomic data. This pipeline assumes that there are target (intended) interactions and off-target interactions, and calculates targeting probabilities for each potential target of an sRNA using interaction probabilities derived from both energies.47 As output, RIsearch2 reports the predicted interaction target, the start and stop positions with a visualized binding site, and the energy of the interaction.52

Similarly, RNAnue53 is a bioinformatic pipeline that analyzes experimental RNA-RNA interaction data derived from crosslinking and ligation methods (discussed in section 3). RNAnue considers hybridization and accessible energy, and complementarity of RNA partners.53 The main outputs of RNAnue are alignments of the RNA partners (chimeras, discussed further in section 3) to the genome and complementarity and energy scores.53 Overall, when benchmarked using experimental data produced by the methods, the RNAnue pipeline outperformed the original bioinformatic pipelines.53

Bioinformatic tools are able to predict interkingdom RNA-RNA interactions

Dual RNA sequencing (dRNA-Seq) is the simultaneous sequencing of RNA from two or more organisms without enriching or depleting any RNA type.54 After sequencing, reads are aligned to human and bacterial genomes in parallel, and subsequent analyses (eg, differential expression, pathway, or network analysis) can be performed to detect sRNAs in the dataset and determine their targets.55 56 Westermann et al used dRNA-Seq and differential gene expression (DGE) and correlation analyses to identify a novel Salmonella enterica sRNA, PinT, after infecting HeLa cells.55 56 Through this strategy, they also determined that PinT regulates bacterial virulence factors which impact host signaling.55 56 Although this sRNA was not shown to have a direct human RNA target, this technique is still applicable to identifying direct interspecies RNA-RNA interactions.

Ingenuity pathway analysis (IPA)57 is a bioinformatic tool that allows transcriptome integration and pathway visualization, facilitating sRNA target exploration, and has been employed to predict bacterial sRNA-induced human gene expression changes.18 IPA was developed for eukaryotic transcriptome data (such as that generated from dRNA-Seq58), as well as metabolome and proteome data analysis. IPA integrates experimentally validated interactions from miRNA-mRNA interaction databases, and TargetScan, a miRNA-mRNA interaction prediction tool.59 Using IPA, Mika-Gospodorz et al identified differentially expressed genes and affected pathways in host cells infected with an intracellular bacterium,60 while Koeppen et al analyzed which human proteins were impacted by the addition of their synthetic sRNA of interest in human bronchial epithelial cells.18 Using a platform such as IPA to analyze dual RNA sequencing data54,58 61 would facilitate target prediction and contextualization within the biological system. TargetScan (currently at release V.8.0) predicts miRNA targets by scanning mRNAs for 6–8 nt long sites complementary to the miRNA.62 63 It requires the user to select a species supported by TargetScan and then to provide a gene or transcript ID and/or select a conserved miRNA family, select a miRBase annotation, or enter the name of a known miRNA.62 63
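The scan for 6–8 nt complementary sites can be illustrated with a small sketch. It uses the commonly cited canonical seed-site classes (6mer, 7mer-A1, 7mer-m8, 8mer) and is not TargetScan's implementation: it performs no conservation or context scoring, and the function names and example UTR fragments are hypothetical. The miRNA shown is the widely published let-7a sequence, used here only as a familiar example.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_seed_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Return (position, site_type) for each canonical seed site in the UTR.

    The miRNA must be at least 8 nt long and is given 5'->3'. The reported
    position is the 0-based start of the 6 nt core that pairs with miRNA
    positions 2-7.
    """
    mirna, utr = to_rna(mirna), to_rna(utr)
    core = reverse_complement(mirna[1:7])  # pairs with miRNA positions 2-7
    m8_base = COMPLEMENT[mirna[7]]         # pairs with miRNA position 8
    sites = []
    start = utr.find(core)
    while start != -1:
        has_m8 = start > 0 and utr[start - 1] == m8_base
        has_a1 = start + 6 < len(utr) and utr[start + 6] == "A"
        if has_m8 and has_a1:
            label = "8mer"
        elif has_m8:
            label = "7mer-m8"
        elif has_a1:
            label = "7mer-A1"
        else:
            label = "6mer"
        sites.append((start, label))
        start = utr.find(core, start + 1)
    return sites


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
print(classify_seed_sites(LET7A, "GGGCUACCUCAGGG"))  # hypothetical UTR fragment
```

For a bacterial sRNA, one assumption that would need justifying is which end of the molecule, if any, behaves as a seed; the sketch simply treats the input as if it were a miRNA.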

Koeppen et al also predicted interkingdom RNA-RNA interactions between bacterial sRNAs and human mRNAs using miRanda.18 miRanda computes miRNA targets by assessing complementarity of the two input sequences and then calculating the free energy of the interaction; the lowest (most favorable) energy of the resulting linked sequence structure is computed.64 miRanda also considers conservation of the miRNA by scanning it against 3’ UTR datasets.64 Additionally, Choi et al used DIANA-microT for the same purpose.32 By aligning the driver sequence (ie, the first 9 nt of the miRNA) to databases of targets’ 3’ UTRs65 and coding sequences,66 the algorithm identifies miRNA recognition elements, which are 7–9 consecutive nucleotides complementary to the driver sequence.65 DIANA-microT uses RNAhybrid to compute hybridization energy for sites with fewer than seven consecutive complementary nucleotides.65 Pairings are ranked based on conservation of the miRNA recognition elements in several species.65 Additionally, DIANA-microT contains integrated pipelines that can, for example, analyze expression data or perform enrichment analysis of miRNA-target pairings.66 While all three tools were designed to predict human miRNA targets, Koeppen et al and Choi et al have established their use in predicting interspecies (and, further, interkingdom) RNA-RNA interactions (ie, human targets of bacterial sRNAs).18 32

Section 3: experimental approaches for identifying and validating bacterial sRNA-human mRNA interactions

Use of two known interacting RNA partners to validate an interaction

Originally used to investigate nucleic acid-protein interactions, an electrophoretic mobility shift assay (EMSA) can validate the interaction of two suspected RNA partners. This first involves in vitro transcription of the two RNAs and labeling of one of them for detection, such as with a radiolabel49 67 68 or digoxigenin.69 The two RNAs are then incubated together and run on a gel; if the two RNAs interact and form a duplex, a second, slower-migrating labeled band will be visible.49 67–69

Reporter systems are also helpful in determining the interaction of two known RNAs. Furuse et al used a Renilla luciferase (Rluc) reporter assay to determine whether MM-H would bind and repress two perfectly complementary sequences.29 One modification to this approach is to clone a potential human target transcript into the Rluc vector, providing a clearer picture of whether MM-H can repress a human transcript. Also, in the lacZ reporter system used by Mai et al, derepression of the sRNA will decrease beta-galactosidase activity if the sRNA binds the candidate transcript’s 5’ UTR.49 Mai et al observed this decrease in beta-galactosidase activity after 6C sRNA was derepressed, indicating its binding to its target, the lacZ-fused panD transcript.49

Alteration of sequence and expression can similarly determine if two known RNAs interact. Zhang et al, interested in cotton-fungal interactions, identified several cotton miRNAs in infecting fungal hyphae through deep sequencing of sRNA.70 Zhang et al mutated fungal transcripts to eliminate base pairing with the identified miRNAs.70 After transformation into the fungus, expression and cotton infection, the resistant transcripts accumulated while the wild-type transcripts were silenced, demonstrating direct targeting by the cotton miRNAs.70 Similarly, Jiao and Peng overexpressed and knocked-down a particular wheat miRNA (using a viral-transformed bacterial infiltration method); miRNA overexpression resulted in downregulation of target fungal transcripts, while knocked-down miRNA allowed an accumulation of target transcripts.71

RNA pull-down assays can demonstrate direct RNA-RNA interaction between known RNAs.72 Here, both interacting partners must be known, because one known RNA is transfected and the end point is reverse transcription-quantitative PCR (RT-qPCR), which requires primers for both sequences for detection. In this technique, a biotin-labelled known RNA of interest is transfected into cells and, after lysis, the lysate is added to magnetic streptavidin beads.72 The biotinylated RNA and any partner base paired to it are pulled down using the beads, extracted, then amplified with RT-qPCR using primers for both sequences.72 For Yang et al, base pairing without crosslinking was enough to pull down the pair and allow detection of the target RNA via RT-qPCR.72 This method is helpful in validating a few RNA-RNA interactions at a time, but again the interacting pairs need to be known beforehand.

Targeted identification to enable identification of one or both RNA partners

‘MS2-affinity purification coupled with RNA sequencing’ (MAPS) requires knowledge of one of the two sequences.73–75 First, an MS2 aptamer (an RNA hairpin recognized by the coat protein of bacteriophage MS2) is fused to a sequence of interest and expressed in vivo.73–75 Lysates undergo affinity chromatography using the MS2 coat protein fused to maltose binding protein, which binds the MS2 aptamer with high affinity and specificity,75 to enrich RNAs fused to the MS2 aptamer and those base pairing with the fused RNAs.73–75 RNA is then extracted and sequenced.73–75 This method involves manipulation of the native RNA of interest, which could potentially affect the secondary structure of the RNA or its binding. In contrast to RNA pull-down assays, only one interacting partner needs to be known, as MAPS uses sequencing of the associated RNAs instead of RT-qPCR.

Furuse et al and Han et al used RIP-Seq to identify microRNA-sized bacterial sRNAs associated with mammalian RISC.29 33 After infection with intracellular bacteria29 or dosage with EVs,33 cells are lysed and incubated with beads coated with anti-Ago antibodies. RNA is then extracted and subjected to deep sRNA-Seq.29 33 This technique allows identification of mammalian RISC-associated bacterial sRNAs, and previously discussed bioinformatic tools may be employed to help with this.

Crosslinking and/or ligation methods to enable global or near-global identification of interacting RNA partners

As some RNA-RNA interactions are facilitated by certain proteins, crosslinking RNA to proteins and/or ligating the linked RNAs, with subsequent pull-down of specific anchor proteins known to facilitate or be involved with RNA-RNA interactions, can aid in identifying targets of sRNAs that use that protein. Generally, techniques that involve both crosslinking and ligation follow these steps: (1) cells are grown and nearby RNAs or proteins are crosslinked; (2) cells are lysed and RNA is extracted; (3) crosslinked RNAs are ligated at one or both ends, forming a chimera; (4) crosslinks are reversed; (5) RNA is purified, prepared for sequencing and sequenced (figure 1). Timing of RNA pull-down depends on the technique. Crosslinking can be achieved with several molecules, but the psoralen derivative 4’-aminomethyltrioxsalen (AMT) is commonly used.76–80 AMT intercalates into nucleic acids and, on irradiation with 365 nm UV light, covalently binds two nearby RNAs together (intermolecularly and intramolecularly), with a preference for thymine or uridine,81 82 which may introduce bias for regions containing more uridine.83 Formaldehyde crosslinks RNA to protein and protein to protein, while disuccinimidyl glutarate crosslinks protein to protein.80 Crosslinking RNA to protein, and protein to protein, enables inclusion of RNA whose direct interactions are facilitated by proteins, and also RNA indirectly involved through peripheral proteins.80 Conveniently, AMT crosslinks are reversed on irradiation with 254 nm UV light. To our knowledge, the methods discussed in this section have not been applied to identify direct RNA-RNA interactions between bacteria and humans.

Figure 1

Summary and comparison of crosslinking and ligation methods. Crosslinking, RNA extraction, ligation, crosslink reversal, and RNA purification and sequencing represent a core set of steps shared by many discussed methods and are indicated with images. For each method, the protocol is listed stepwise in descending order, starting at crosslinking. Differences between the methods are indicated with text as they appear in the protocol. RNA purification and sequencing (not depicted) is always the final step in the protocol. 2D, two-dimensional; AMT, 4’-aminomethyltrioxsalen; DSG, disuccinimidyl glutarate; FA, formaldehyde; LIGR-Seq, ligation of interacting RNA followed by high throughput sequencing; PARIS, psoralen analysis of RNA interactions and structures; RAP-RNA, RNA antisense purification to systemically map RNA-RNA interactions; RIC-Seq, RNA in situ conformation; RIL-Seq, RNA interaction by ligation and sequencing; rRNA, ribosomal RNA.

Gay et al debuted ‘quick cross-linking and sequencing of hybrids’ (qCLASH), which enables the pull-down of Ago-crosslinked RNAs.84 In this technique, UV irradiation alone crosslinks RNAs to proteins. Ago proteins, along with the crosslinked RNAs, are then immunoprecipitated on beads.84 RNA pairs are ligated while on the beads, Ago is digested, and the RNA is purified and sequenced.84 In a similar CLASH approach, Hfq, a bacterial chaperone that interacts with sRNAs, is pulled down together with its associated RNAs.85

‘RNA interaction by ligation and sequencing’ (RIL-Seq) similarly targets Hfq and has very similar general steps to qCLASH and the Hfq-CLASH.86 RIL-Seq, however, requires Hfq to carry a Flag tag to facilitate its precipitation.86 The computational portion of RIL-Seq (and CLASH) maps the ends of sequence fragments to the genome, as some of these reads are chimeras made of two RNA molecules that could be from different loci in the genome (or different genomes altogether, if looking at interspecies RNA-RNA interactions), and identifies chimeras that are statistically significantly over-represented.86
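One simple way to ask whether a chimera is over-represented is a one-sided hypergeometric test on read counts, sketched below. This is an illustrative assumption about how such a test could be set up, not a reproduction of the published RIL-Seq or CLASH pipelines; the function name, argument names, and counts are hypothetical, and a real analysis would also need multiple-testing correction.

```python
from math import comb


def chimera_enrichment_pvalue(pair_count: int, count_a: int,
                              count_b: int, total: int) -> float:
    """P(X >= pair_count) under a hypergeometric null.

    total      -- total number of chimeric reads in the library
    count_a    -- chimeric reads that contain RNA A
    count_b    -- chimeric reads that contain RNA B
    pair_count -- chimeric reads that contain both A and B

    The null assumes that RNA B is distributed among chimeras
    independently of RNA A.
    """
    upper = min(count_a, count_b)
    if pair_count < 0 or pair_count > upper or max(count_a, count_b) > total:
        raise ValueError("inconsistent counts")
    favorable = sum(
        comb(count_a, x) * comb(total - count_a, count_b - x)
        for x in range(pair_count, upper + 1)
    )
    return favorable / comb(total, count_b)


# Hypothetical counts: 12 A-B chimeras among 10,000 chimeric reads.
p_value = chimera_enrichment_pvalue(pair_count=12, count_a=40,
                                    count_b=60, total=10_000)
print(f"{p_value:.3g}")
```

Because the sum is done with exact integers before a single division, the function stays accurate for small p-values, at the cost of speed on very large libraries.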

‘RNA antisense purification to systemically map RNA-RNA interactions’ (RAP-RNA) uses biotinylated DNA probes after crosslinking alone, without ligation.80 These probes are antisense to an RNA of interest. There are three versions of RAP-RNA that differ mainly in the type of crosslinking molecule: RAP-RNA (FA), (FA+DSG) or (AMT).80 In RAP-RNA (AMT), RNA-RNA complexes are crosslinked with AMT, and the RNA complementary to the biotinylated probes, together with RNAs crosslinked to it, is captured and sequenced.80 In this technique, the crosslinks are not reversed, and reverse transcription is halted at the crosslinked RNA, which enables mapping of RNA-RNA interaction sites, as opposed to just the identification of interaction partners.

To positively select crosslinked interaction pairs, ‘sequencing of psoralen crosslinked, ligated, and selected hybrids’ (SPLASH) uses biotinylated psoralen to crosslink interacting RNAs, enabling pull-down of the RNA complexes by streptavidin-coated beads before ligation.79 Similarly, ‘RNA in situ conformation’ (RIC-Seq) labels interaction pairs with biotinylated cytidine (bis) phosphate (pCp) at the 3’ ends of transcripts before ligation of both ends of the pair, which are then positively selected for using biotin-pCp.87

For clear delineation between and enrichment for ligated interaction pairs after sequencing, ‘mapping RNA interactome in vivo’ (MARIO) ligates unambiguous biotin-tagged RNA linkers between the RNA pairs.88 First, all RNA-protein and protein-protein complexes are crosslinked via UV irradiation for capture of RNA-RNA interactions facilitated by any protein, and the proteins are denatured while the RNA is fragmented.88 Biotinylation occurs at cysteine residues and is used to immobilize the RNA-protein complexes on streptavidin beads. Next, a biotin-tagged RNA linker is ligated to the 5’ ends of RNAs, and then the interacting RNA pairs are ligated to form the chimeric product ‘RNA1-linker-RNA2’.88 Nguyen et al developed ‘MARIO tools’ (available at:, which is a bioinformatic pipeline to aid in the analysis of MARIO output and determine enriched interacting pairs from different genomic locations.88

Some techniques omit protein crosslinking and opt for only RNA-RNA crosslinking. These techniques use the same general crosslinking and ligation procedure summarized above, without pull-down of any kind and with some minor additions. In modified CLASH by Liu et al, extracted RNAs are size-selected before ligation.76 In ‘ligation of interacting RNA followed by high throughput sequencing’ (LIGR-Seq), RNase R digests uncrosslinked (likely non-interacting) RNA and thus enriches crosslinked duplexes after ligation.77 Additionally, in ‘psoralen analysis of RNA interactions and structures’ (PARIS), proteins are totally digested, RNA is partially digested, and crosslinked RNA is purified through gel electrophoresis.78 RNAnue, mentioned in the previous section, outperformed the original analysis pipelines for SPLASH-generated, LIGR-Seq-generated, and PARIS-generated datasets.53 This or a program like it would be well suited to the analysis of data from crosslinking/ligation methods that generate chimeras.

Various techniques use both photoactive crosslinking and subsequent ligation of RNA partners to determine direct RNA-RNA interactions without the use of anchors such as RNA-binding proteins or complementary oligonucleotide baits. Crosslinking RNAs to each other, instead of anchoring proteins, and ligation of those partners enables investigation into a global picture of RNA interaction, including protein-independent interaction.76 In contrast to some methods mentioned above, most of these methods do not require the use of genetically modified proteins or systems in order to investigate RNA-RNA interactions. Furthermore, these techniques do not require an RNA of interest, as they are able to map global or near-global RNA-RNA interactions. However, as the RNA ligation step is inefficient and ligation of non-interacting RNA occurs,83 the number of chimeric reads is typically low. Schönberger et al83 offer several improvements to these methods, including introducing non-complementary overhangs to improve ligation efficiency and a combinatorial approach to enriching crosslinked RNA partners.

These crosslinking and ligation techniques could be used to identify interspecies or interkingdom RNA-RNA interactions (in particular, bacterial sRNA-human mRNA interactions) if employed after exposure to either a single sRNA of interest or a mixed population of sRNAs (such as that inside bacterial EVs), as these techniques identify RNA-RNA interactions globally. While the main advantage of these techniques is the global or near-global identification of RNA pairing partners, one disadvantage is the potentially high number of pairing partners that need to be evaluated. However, excluding interactions that do not involve interspecies pairings lowers that number considerably, especially since interspecies interactions are expected to make up only a small proportion of the total.29 33
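
The exclusion step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each chimeric read is assumed to have already been split and aligned so that both arms carry a gene and species label, and the `Chimera` record, the species labels, and the gene names in the usage example are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Chimera(NamedTuple):
    """One ligated RNA pair after both arms have been aligned (hypothetical record)."""
    arm1_gene: str
    arm1_species: str
    arm2_gene: str
    arm2_species: str


def interspecies_pairs(
    chimeras: Iterable[Chimera],
    species_a: str = "bacteria",
    species_b: str = "human",
) -> List[Chimera]:
    """Keep only chimeras whose two arms map to the two different species."""
    wanted = {species_a, species_b}
    # Arm order is ignored: bacteria-human and human-bacteria are both kept.
    return [c for c in chimeras if {c.arm1_species, c.arm2_species} == wanted]
```

For example, given a human-human pair, a bacteria-human pair, and a bacteria-bacteria pair, only the bacteria-human pair is retained:

```python
reads = [
    Chimera("GENE_A", "human", "GENE_B", "human"),
    Chimera("sRNA_X", "bacteria", "GENE_A", "human"),
    Chimera("sRNA_X", "bacteria", "sRNA_Y", "bacteria"),
]
kept = interspecies_pairs(reads)
```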


Bacterial sRNAs are frequently found within EVs.9 11–13 These EVs can affect immune system-related gene expression,32 33 40 and bacterial sRNAs associate with eukaryotic miRNA machinery.29 33 While not a complete list of bioinformatic and experimental approaches that exist, this review summarizes a wide range of techniques that can be employed or modified for use to identify and validate the human targets of newly identified bacterial sRNAs (and more broadly, interspecies RNA-RNA interactions).

We discussed bioinformatic tools that allow annotation of sRNAs via alignment to known sRNAs (alignment/homology tools), and those that consider energy and accessibility to assess interaction favorability (summarized in table 1). Although the latter are more thorough and consider secondary structure, using a combination of bioinformatic techniques to identify sRNA targets is the most robust approach. Even when a particular target prediction tool is designed for a certain biological system, it may be applicable elsewhere; IntaRNA, for example, was originally intended for bacterial sRNA target prediction,22 but has been used for eukaryotic target prediction as well.50 Similarly, miRNA target prediction tools, such as TargetScan (IPA), DIANA microT, and miRanda, have been successfully used to predict human targets of bacterial sRNAs.18 32
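
One simple way to combine predictions from several tools is to keep only targets supported by more than one of them. The Python sketch below assumes that each tool's output has already been parsed into a set of predicted target gene names; the `consensus_targets` function, the `min_tools` threshold, and the gene names in the usage example are hypothetical, and the tools' own scoring schemes are deliberately ignored.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def consensus_targets(
    predictions: Dict[str, Set[str]], min_tools: int = 2
) -> List[Tuple[str, int]]:
    """Rank predicted targets by the number of tools that support them.

    predictions maps a tool name to the set of target genes it predicted
    for one sRNA. Targets supported by fewer than min_tools are dropped.
    """
    counts = Counter(
        target for targets in predictions.values() for target in targets
    )
    kept = [(target, n) for target, n in counts.items() if n >= min_tools]
    # Most-supported targets first; ties broken alphabetically for stable output.
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

A usage example with placeholder gene names:

```python
predictions = {
    "IntaRNA": {"GENE_A", "GENE_B", "GENE_C"},
    "miRanda": {"GENE_B", "GENE_C"},
    "TargetScan": {"GENE_C", "GENE_D"},
}
shortlist = consensus_targets(predictions)
```

A vote count of this kind only prioritizes candidates for experimental validation; it does not replace it.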

Experimental techniques to identify RNA-RNA interactions that require the use of an anchor (eg, a biotinylated probe) and those that employ crosslinking and ligation of interacting RNA pairs were also discussed (summarized in table 2). EMSA, reporter systems, altering expression, and RNA pull-down assays are useful for validation of interactions, whereas crosslinking and ligation methods are more global and will be useful for exploration and identification of RNA-RNA interaction networks. While experimental methods to verify RNA-RNA interactions have often been described for use in one cell type or species, they can be adapted in experimental design to allow identification and validation of interkingdom RNA-RNA interactions. For example, the aforementioned crosslinking and ligation methods could be employed after cell exposure to bacterial sRNAs or EVs.

To our knowledge, the vast majority of the bioinformatic and experimental methods discussed here have not been used to analyze interspecies interactions, particularly bacteria-human. Employing these methods, particularly the experimental methods, to determine bacterial-human RNA-RNA interactions will facilitate the important research that is needed to determine the potential impact microbiota exert on the host through these interspecies RNA-RNA interactions. It is only with improved understanding of these interactions that we can begin to effectively alter or adapt microbiota and positively impact human disease.

Ethics statements

Patient consent for publication

Ethics approval

Not applicable.



  • Contributors KIK, RJF, and ALH were all involved in the concept development. KIK wrote the first draft of the manuscript. RJF and ALH provided critical review for intellectual content. All authors approve of the final version as written.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Commissioned; externally peer reviewed.