POSTAR2

Lu Lab

GET STARTED

  • POSTAR2 is a comprehensive database to integrate CLIP-seq, Ribo-seq and RNA-seq data for exploring post-transcriptional regulation.

  • POSTAR2 curates ~40 million experimentally probed RBP binding sites from CLIP-seq and ~36 million ORFs with translation landscapes from Ribo-seq across 6 species.

  • POSTAR2 is a user-friendly platform that connects protein-RNA interactions with multi-layer information on post-transcriptional regulation, and provides a translation landscape of RNAs across various tissues, cell lines and conditions. It also helps biologists generate novel hypotheses about the regulatory mechanisms of phenotypes and diseases.

  • Reference genome versions for each species in POSTAR2

    Species                  | Human | Mouse | Fly        | Worm  | Arabidopsis | Yeast
    Reference genome version | hg38  | mm10  | dmel-r6.18 | ws235 | TAIR10      | R64-1-1

    Software used in POSTAR2

    Software | Description | Version | Reference
    CLIP-seq read mapping
    Bowtie | A fast alignment tool used for CLIP-seq read mapping | v1.0.0 | PMID 19261174
    Novoalign | A program used for HITS-CLIP and iCLIP read mapping | v3.00.05 | Link
    CLIP-seq peak calling
    Piranha | Peak calling algorithm for all types of CLIP-seq dataset | v1.2.0 | PMID 23024010
    PARalyzer | Peak calling algorithm for PAR-CLIP dataset only | v1.1 | PMID 21851591
    CIMS | Peak calling algorithm for HITS-CLIP dataset only | v1.0.4 | PMID 24407355
    CITS | Peak calling algorithm for iCLIP dataset only | v1.0.4 | PMID 24613350
    Predicting sequence motifs of RBP binding sites
    MEME | Identify sequence motifs of RBP binding sites | v4.9.1 | PMID 7584402
    HOMER | Identify sequence motifs of RBP binding sites | v4.8 | PMID 20513432
    WebLogo | Visualize sequence motifs | v2.8 | PMID 15173120
    Predicting structural preferences of RBP binding sites
    RNApromo | Identify enriched structural elements of RBP binding sites | v3 | PMID 18815376
    RNAcontext | Identify structural preferences of RBP binding sites | Jan 2016 | PMID 20617199
    Predicting miRNA binding sites
    miRanda | Predict miRNA targets by evaluating sequence complementarity and thermodynamic stability of RNA local alignment | v3.3a | PMID 14709173
    RNAhybrid | Predict miRNA targets by finding the minimum free energy hybridization of a long and a short RNA | v2.1.1 | PMID 16845047
    psRobot | Predict miRNA targets in plants by identifying smRNAs with stem-loop shaped precursors among batch input data and predicting their targets | v1.2 | PMID 22693224
    psRNAtarget | Predict miRNA targets in plants by integrating a predefined scoring schema and evaluating target site accessibility | v2 | PMID 29718424
    Transcriptome analysis
    TopHat | A fast splice junction mapper for RNA-seq reads | v2.0.10 | PMID 19289445
    Cufflinks | Assemble transcripts and estimate their abundances for RNA-seq data | v2.1.1 | PMID 20436464
    Translatome analysis
    RiboWave | A wavelet-based algorithm for identifying actively translated ORFs | v1.0 | in submission
    RiboTaper | A multitaper spectral-based approach for detecting actively translated ORFs | v1.3 | PMID 26657557
    ORFscore | A metric for exploring high-resolution footprinting with frame preference to identify actively translated ORFs | NA | PMID 24705786
    RibORF | A support vector machine-based classifier for identifying actively translated ORFs | v0.1 | PMID 26687005
    Other tools
    FASTX-Toolkit | A collection of tools for CLIP-seq data preprocessing | v0.0.13.2 | PMID 21278185
    BEDTools | Comparison, manipulation and annotation of genomic features in BED and GTF format | v2.17.0 | PMID 20110278

    Resources used in POSTAR2

    Resource | Description | Version | Reference
    RBP
    CLIPdb | Provide uniformly identified RBP binding sites of publicly available CLIP-seq datasets | v1.0 plus | PMID 25652745
    POSTAR | Provide uniformly identified RBP binding sites of publicly available CLIP-seq datasets for human and mouse | v1.0 | PMID 28053162
    ENCODE eCLIP | Provide RBP binding sites identified using eCLIP technology | Nov 2017 | PMID 27018577
    RBP annotation | Information about RBP gene symbol and RNA-binding domains | NA | PMID 25365966
    Gene Ontologies | Define concepts/classes used to describe gene function, and relationships between these concepts | Nov 2017 | PMID 10802651
    RNA: Binding sites & Translatome
    GENCODE | Human and mouse genome annotation | human v27; mouse v7 | PMID 22955987
    FlyBase | Fly gene annotation | dmel-r6.18 | PMID 18641940
    WormBase | Worm gene annotation | ws235 | PMID 19910365
    TAIR | Arabidopsis gene annotation | TAIR10 | PMID 22140109
    SGD | Yeast gene annotation | R64-1-1 | PMID 12519985
    PhastCons | Conservation scores for alignments of multiple genomes | UCSC database | PMID 16024819
    PhastCons (Arabidopsis) | Conservation scores for Arabidopsis | NA | PMID 23150631
    phyloP | Basewise conservation scores of multiple genomes | UCSC database | PMID 19858363
    Gene Ontologies | Define concepts/classes used to describe gene function, and relationships between these concepts | Nov 2017 | PMID 10802651
    RNA: Crosstalk
    miRBase | Published miRNA sequences and annotation | v21 | PMID 16957372
    RMBase2 | RNA modification sites identified from high-throughput sequencing datasets | v2.0 | PMID 29040692
    RADAR | A-to-I editing sites collected from published datasets and identified from high-throughput sequencing datasets | v2 | PMID 24163250
    DARNED | RNA editing sites collected from published datasets | NA | PMID 23074185
    RNA editome (worm) | RNA editomes of wild-type C. elegans | NA | PMID 25373143
    RNA: Variation
    dbSNP | Public archive for genetic variation | |
    RNA: Disease
    GWASdb | Comprehensive data curation and knowledge integration for GWASs | v2 | PMID 26615194
    ClinVar | Relationships between human variations and phenotypes, with supporting evidence | Oct 2015 | PMID 24234437
    COSMIC | Comprehensive resource for information on somatic mutations in human cancer | v76 | PMID 18428421
    TCGA whole-genome SNVs | Whole-genome somatic mutations in human cancer (TCGA) | NA | PMID 23945592
    TCGA whole-exome SNVs | Whole-exome somatic mutations in human cancer (TCGA) | NA | PMID 29596782

    Outline

    RBP module

    Here is an example showing how to search in the RBP module and read the results.

    1.1 Input:

     

    First, select a species (human, mouse, fly, worm, Arabidopsis or yeast) and enter an RBP name (e.g., TAF15).

     

     

    You can also enable the “Advanced” option to select a specific CLIP-seq technology and peak calling method.

     

    1.2 Result:

    1.2.1 RBP summary

     

    This section provides information on the query RBP. It includes:

    (1) RBP name

    (2) Detail annotation: Ensembl ID and a link to the Ensembl database.

    (3) Domain number: the total number of domains.

    (4) Domain [count]: domains and counts collected from Pfam.

    (5) RBP’s ontology: the GO term annotation of the RBP, including biological process, cellular component and molecular function.

    (6) Sequence motifs: enriched sequence motifs for the RBP binding region.

    (7) Structural preferences: enriched structural motifs for the RBP binding region.

     

    1.2.2 RBP binding sites

     

    Table view of the binding sites for the query RBP. It includes:

    (1) Target gene symbol

    (2) Target gene ID: Ensembl ID and a link to the Ensembl database.

    (3) Target gene type

    (4) Target gene exp.level: a link to the bar chart of the gene expression values across different tissues or cell lines.

    (5) Binding site records: a link to the table of all binding sites of the RBP and the target genes. You can also see details about transcript, genomic context, tissue type and more information by clicking on the “Details” button.

     

    1.2.3 Enriched Gene Ontologies of the RBP targets

     

    Table view of the GO terms enriched in the query RBP’s set of target genes.

    (1) Genomic context: CDS, intron, 3’UTR and 5’UTR are provided separately.

    (2) Ontology: BP (biological process), CC (cellular component), MF (molecular function)

    (3) GO ID

    (4) GO term

    (5) P-value

     

    RNA module

    Here is an example showing how to search in the RNA module and read the results.

    RNA: Binding sites

    Input:

     

    First, select a species (human, mouse, fly, worm, Arabidopsis or yeast) and enter a gene name.

     

     

    You can also enable the “Advanced” option to select a specific CLIP-seq technology and peak calling method.

     

    Result:

    Section 1. RNA: Binding sites

    This section provides RBP binding sites on the query gene with multiple visualization methods from different aspects.

    Gene summary

     

    This section provides information on the query gene. It includes:

    1 Gene Symbol

    2 Detail Annotation: gene ID and a link to the related database.

    3 Gene Type

    4 Genome Location

    5 Associated Disease: diseases associated with the input gene, collected from OMIM and DisGeNET.

    6 Associated Cancer: cancers associated with the input gene, collected from 60 publications.

    7 Targeted by Drugs: drugs targeting the input gene, collected from DGIdb.

    8 Expression pattern: the expression levels of input genes across multiple cell lines, tissue types, developmental stages or conditions are shown in a bar chart.

    RBP binding hotspots

     

    The number of binding proteins along each pre-mRNA. Each pre-mRNA is divided into windows of 20 nt.
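The windowing described above can be sketched as follows. This is a minimal illustration, not the POSTAR2 implementation: the function name, the 0-based half-open coordinate convention and the example sites (including the placeholder name RBP_B) are assumptions made for the sketch.

```python
from collections import defaultdict

WINDOW_NT = 20  # window size stated in the text


def rbp_counts_per_window(sites, window=WINDOW_NT):
    """Count distinct RBPs per fixed-size window along one pre-mRNA.

    sites: iterable of (rbp_name, start, end) in 0-based half-open
    coordinates relative to the pre-mRNA start (assumed convention).
    Returns {window_index: number_of_distinct_RBPs}.
    """
    rbps_by_window = defaultdict(set)
    for rbp, start, end in sites:
        if end <= start:
            continue  # skip empty or malformed intervals
        first = start // window
        last = (end - 1) // window
        for idx in range(first, last + 1):
            rbps_by_window[idx].add(rbp)
    return {idx: len(names) for idx, names in sorted(rbps_by_window.items())}


# Hypothetical binding sites, for illustration only.
example_sites = [
    ("TAF15", 5, 30),   # overlaps windows 0 and 1
    ("RBP_B", 18, 25),  # overlaps windows 0 and 1
    ("TAF15", 45, 55),  # overlaps window 2 only
]
hotspots = rbp_counts_per_window(example_sites)
```

Counting distinct RBP names per window, rather than binding sites, matches the "number of binding proteins" wording: two peaks of the same RBP in one window count once.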

    RBP binding sites

     

    Network view and table view of the binding sites for the query gene. It includes:

    1 Interaction network: the interactions between the query gene and multiple RBPs are visualized in a network.

    2 Visualize in browser (green button)

    You can select several RBPs among all bound RBPs in the pop-up window and click Submit.

    The RBP binding sites of selected RBPs are visualized simultaneously via the UCSC genome browser (red).

    3 Genomic context of RBP binding sites

    (1) RBP

    (2) RBP info: a link to RBP summary including basic information, sequence motifs and structural preferences.

    (3) Tissue type

    (4) Position: a link to the UCSC genome browser which will display any associated binding sites.

    (5) Strand

    (6) Score: the meaning depends on the peak calling method. Piranha score: peak heights from the CLIP-seq data. PARalyzer score: T-to-C transition ratios ranging from 0 to 1; ratios greater than 0.5 indicate protein binding, while ratios less than 0.5 indicate no protein binding. CIMS score: mismatch peak heights from the HITS-CLIP data or truncation peak heights from the iCLIP data. eCLIP score: -log10 P-value.

    (7) PhastCons score: the mean conservation scores for the RBP binding sites using genome-wide phastCons intensities.

    (8) PhyloP score: the mean conservation scores for the RBP binding sites using genome-wide phyloP intensities.

    (9) Data accession: data accession number and a link to the database.

    (10) Genomic context

    (11) Transcript ID
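The score and conservation columns above can be read with a few small helpers. The sketch below only restates the definitions in the column descriptions. The function names are hypothetical, and the treatment of a PARalyzer ratio of exactly 0.5 (not binding) is an assumption, since the text does not cover that case.

```python
def eclip_p_value(score):
    """Convert an eCLIP score (-log10 P-value) back to a P-value."""
    return 10.0 ** (-score)


def paralyzer_indicates_binding(ratio):
    """Apply the 0.5 cutoff on the T-to-C transition ratio.

    A ratio of exactly 0.5 is treated as not binding (assumption).
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("PARalyzer ratio must lie in [0, 1]")
    return ratio > 0.5


def mean_conservation(per_base_scores):
    """Mean of per-base phastCons or phyloP scores over one binding site."""
    scores = list(per_base_scores)
    if not scores:
        raise ValueError("binding site has no scored positions")
    return sum(scores) / len(scores)
```

For example, an eCLIP score of 3 corresponds to a P-value of about 0.001, and a PARalyzer ratio of 0.8 would be read as protein binding.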

    Section 2. RNA: Crosstalk

    This section provides the interactions between RBP binding sites and other post-transcriptional regulatory events, such as miRNA targeting, RNA modification and RNA editing.

    miRNA binding (exp.) within RBP binding sites

     

    This section provides interactions between RBP binding sites and miRNA targets identified experimentally by AGO CLIP-seq. AGO CLIP-seq datasets identify miRNA-target interactions experimentally in a genome-wide manner; we then used miRanda to predict the targeting miRNAs for the AGO protein binding sites.

    Here are brief explanations for each column of this table:

    1 RBP: name of RNA binding protein.

    2 RBP info: a link to RBP summary including basic information, sequence motifs and structural preferences.

    3 Tissue Type

    4 Position: RBP binding region on the RNA

    5 Strand

    6 CLIP-seq technology and peak calling method

    7 Score: RBP binding score calculated by the peak calling method.

    8 PhastCons score

    9 PhyloP score

    10 Data accession

    11 miRNA: the name of miRNA.

    12 miRNA binding sites: miRNA binding sites on the RNA.

    13 Binding energy: the unit of binding energy is kcal/mol.

    14 Confidence: larger score means stronger prediction confidence.

     

    miRNA binding (pred.) within RBP binding sites

     

    This section provides interactions of RBP binding sites and miRNA targets predicted by bioinformatics tools including miRanda, RNAhybrid, psRobot and psRNAtarget.

    Here are brief explanations for each column of this table:

    Columns 1-10 are the same as in the table above.

    11 miRNA: the name of miRNA.

    12 miRNA binding sites: miRNA binding sites on the RNA.

    13 Prediction tool: miRanda and RNAhybrid for human, mouse, fly, worm; psRobot and psRNAtarget for Arabidopsis.

    14 Binding energy: the unit of binding energy is kcal/mol.

    15 Confidence: For miRanda, a larger score means stronger prediction confidence. For RNAhybrid, the score is the P-value of the prediction. For psRobot and psRNAtarget, a smaller score means stronger prediction confidence.
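Because the direction of the Confidence column differs between tools, ranking predictions requires a per-tool rule. The sketch below is one way to encode it. The function name and row layout are hypothetical, and treating a smaller RNAhybrid P-value as stronger is an interpretation of "the score is the P-value", not something the text states outright.

```python
# Direction of the "Confidence" column per prediction tool.
# RNAhybrid reports a P-value, so smaller is treated as stronger (assumption).
HIGHER_IS_STRONGER = {
    "miRanda": True,
    "RNAhybrid": False,
    "psRobot": False,
    "psRNAtarget": False,
}


def sort_by_confidence(rows, tool):
    """Order predictions from one tool from strongest to weakest.

    rows: list of (mirna_name, confidence) tuples (illustrative layout).
    """
    return sorted(rows, key=lambda row: row[1], reverse=HIGHER_IS_STRONGER[tool])
```

Scores from different tools are on different scales, so this sketch only ranks rows that share a prediction tool.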

     

    RNA modification within RBP binding sites

     

    This section provides RNA modification sites located in the RBP binding regions. Here are brief explanations for each column of this table:

    Columns 1-10 are the same as in the table above.

    11 Modification: type of RNA modification, including m1A (N1-methyladenosine), m5C (5-methylcytosine), m6A (N6-methyladenosine), Nm (2’-O-methylation) and PseudoU (Pseudouridine).

    12 RNA modification sites: the coordinate of the RNA modification site.

    13 Support Number: the number of supporting experiments.

     

    RNA editing within RBP binding sites

     

    This section provides RNA editing sites located in the RBP binding regions. Here are brief explanations for each column of this table:

    Columns 1-10 are the same as in the table above.

    11 RNA editing sites: RNA editing position.

    12 Tissue type: the tissue type or cell line of editing event.

    13 Level: The score from 0 to 1 to show the editing level.

     

    Section 3. RNA: Variation

    This module provides the SNPs located in the RBP binding regions.

     

    Here are brief explanations for each column of this table:

    1 RBP: name of RNA binding protein.

    2 RBP info: a link to RBP summary including basic information, sequence motifs and structural preferences.

    3 Tissue Type

    4 Position: RBP binding region on the RNA

    5 Strand

    6 CLIP-seq technology and peak calling method

    7 Score: RBP binding score calculated by the peak calling method.

    8 PhastCons score

    9 PhyloP score

    10 Data accession

    11 SNP id

    12 Ref (+ strand): the reference genomic allele.

    13 Alt (+ strand): the corresponding alternative allele.

    14 SNP position: coordinate of the SNP site.
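The table above pairs each SNP with the RBP binding region that contains it. A minimal sketch of that containment test is shown below. The function name, the 1-based inclusive coordinates and the example records (snp_1, RBP_B and so on) are all hypothetical; for genome-scale data an interval tool such as BEDTools, listed among the software above, would be the more usual choice.

```python
def snps_in_binding_sites(sites, snps):
    """Pair each SNP with every binding site that contains it.

    sites: list of (rbp, chrom, start, end), 1-based inclusive (assumed).
    snps:  list of (snp_id, chrom, pos).
    Returns a list of (snp_id, rbp) pairs in input order.
    """
    hits = []
    for snp_id, chrom, pos in snps:
        for rbp, site_chrom, start, end in sites:
            if chrom == site_chrom and start <= pos <= end:
                hits.append((snp_id, rbp))
    return hits


# Hypothetical records, for illustration only.
sites = [
    ("TAF15", "chr1", 100, 150),
    ("RBP_B", "chr1", 140, 200),
    ("TAF15", "chr2", 100, 150),
]
snps = [
    ("snp_1", "chr1", 145),  # inside both chr1 sites
    ("snp_2", "chr1", 99),   # just outside the first site
    ("snp_3", "chr2", 150),  # on the last base of the chr2 site
]
hits = snps_in_binding_sites(sites, snps)
```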

     

    Section 4. RNA: Disease

    This module provides disease-associated SNVs located in the RBP binding regions, including GWAS SNPs, ClinVar SNPs, cancer TCGA whole-genome SNVs, cancer TCGA whole-exome SNVs and cancer COSMIC SNVs within RBP binding sites.

    GWAS SNPs within RBP binding sites

     

    Here are brief explanations for each column of this table:

    1 RBP: name of RNA binding protein.

    2 RBP info: a link to RBP summary including basic information, sequence motifs and structural preferences.

    3 Tissue Type

    4 Position: RBP binding region on the RNA

    5 Strand

    6 CLIP-seq technology and peak calling method

    7 Score: RBP binding score calculated by the peak calling method.

    8 PhastCons score

    9 PhyloP score

    10 Data accession

    11 Disease

    12 Ref (+ strand): the reference genomic allele.

    13 Alt (+ strand): the corresponding alternative allele.

    14 SNP position

    15 P value: confidence of SNP-trait associations (p-value < 1E-03).

    16 Text remark: if a conditional analysis of a SNP is reported and its p-value < 1E-03, the specific condition/method is given in this column. See the GWASdb2 paper for details.

     

    ClinVar SNPs within RBP binding sites

     

    Columns 1-10 are the same as in the table above. The other columns are:

    11 Disease

    12 Ref (+ strand): the reference genomic allele.

    13 Alt (+ strand): the corresponding alternative allele.

    14 SNP position: a link to the UCSC genome browser to visualize multiple RBP binding sites and their associated genomic variants.

     

    Cancer TCGA whole-genome SNVs within RBP binding sites

     

    Columns 1-10 are the same as in the table above. The other columns are:

    11 Cancer type

    12 Patient ID.

    13 Ref (+ strand): the reference genomic allele.

    14 Alt (+ strand): the corresponding alternative allele.

    15 Mutation position: a link to the UCSC genome browser to visualize multiple RBP binding sites and their associated genomic variants.

     

    Cancer TCGA whole-exome SNVs within RBP binding sites

     

    Columns 1-10 are the same as in the table above. The other columns are:

    11 Cancer type

    12 Patient ID.

    13 Ref (+ strand): the reference genomic allele.

    14 Alt (+ strand): the corresponding alternative allele.

    15 Mutation position: a link to the UCSC genome browser to visualize multiple RBP binding sites and their associated genomic variants.

     

    Cancer COSMIC SNVs within RBP binding sites

     

    Columns 1-10 are the same as in the table above. The other columns are:

    11 Cancer type

    12 Patient ID.

    13 Ref (+ strand): the reference genomic allele.

    14 Alt (+ strand): the corresponding alternative allele.

    15 Mutation position: a link to the UCSC genome browser to visualize multiple RBP binding sites and their associated genomic variants.

     

    Translatome module

    Here is an example showing how to search the translation landscape of one gene in the translatome module and read the results.

    Input:

     

    First, select a species (human, mouse, fly, worm, Arabidopsis or yeast) and enter a gene name (e.g., TP53). Then choose “Search a Gene or lncRNA”.

     

    Result:

    3.1 TP53 translatome summary

     

    This section is the general summary of the selected gene. It includes:

    (1) The ORF annotation summary shows all ORFs within the gene. Based on the given annotation, ORFs are further categorized into groups. The number on top of each bar is the number of ORFs in the corresponding ORF category.

    The ORF categories are as follows:

     

    Annotated ORFs (aORFs): ORFs that are annotated in the reference. aORFs are colored green in the diagram.

    Extended ORFs/Truncated ORFs: ORFs that share the stop codon of an aORF but have a different translation initiation site.

    Upstream ORFs (uORFs)/downstream ORFs (dORFs): ORFs that are located upstream/downstream of an aORF.

    Internal overlapped ORFs: ORFs that overlap an aORF but are in a different reading frame.

    Unannotated ORFs: ORFs without any annotation.

     

    (2) ORF density across samples shows the Ribo-seq density of each ORF across different samples.
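The ORF categories listed above can be expressed as a small decision procedure. The sketch below is an interpretation of those definitions, not the classification code used by POSTAR2. It assumes 0-based half-open transcript coordinates, a single aORF per transcript, that "extended" means the start lies upstream of the aORF start and "truncated" downstream, and that a transcript without an aORF yields "unannotated". The fallback label "other" is hypothetical.

```python
def classify_orf(orf, aorf=None):
    """Assign an ORF category relative to the annotated ORF (aORF).

    orf, aorf: (start, end) in 0-based half-open transcript coordinates,
    where end is the position just past the stop codon (assumed convention).
    """
    if aorf is None:
        return "unannotated"
    start, end = orf
    a_start, a_end = aorf
    if (start, end) == (a_start, a_end):
        return "annotated"
    if end == a_end:
        # Same stop codon, different translation initiation site.
        return "extended" if start < a_start else "truncated"
    if end <= a_start:
        return "uORF"
    if start >= a_end:
        return "dORF"
    if (start - a_start) % 3 != 0:
        # Overlaps the aORF in a different reading frame.
        return "internal overlapped"
    return "other"  # not covered by the definitions above
```

With an aORF at (100, 400), classify_orf((40, 400), (100, 400)) gives "extended" and classify_orf((152, 212), (100, 400)) gives "internal overlapped".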

    3.2 Annotated ORFs of TP53

     

    Table view of all aORFs for the query gene (e.g., TP53). It includes:

    (1) ORFID: the ORF ID is composed of the transcript, reading frame, translation start site and stop site.

    (2) Transcript

    (3) Category

    (4) Reading frame: the reading frame is counted relative to the transcript start site (TSS).

    (5) Start position: start position is numbered based on the relative position of translation initiation sites along the transcript.

    (6) End position: end position refers to the last nucleotide of ORF along the transcript.

    (7) Length: ORF length

     

    3.3 Extended/Truncated ORFs of TP53

     

    3.4 Other ORFs of TP53

     

    3.5 Detail information for a specific ORF:

     

    For each ORF, a detailed characterization is available by clicking the ORF's link; the server then displays the translation efficiency (TE), translation density and potential for active translation of this ORF.

    (1) Translation efficiency(TE):

     

    This table summarizes TE calculated from either the original Ribo-seq signal or the denoised periodic Ribo-seq signal across different samples. TE is defined as the ratio of Ribo-seq RPKM to RNA-seq RPKM (1).
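As a worked illustration of the TE definition above, the sketch below computes RPKM with the standard formula (reads per kilobase per million mapped reads) and then takes the ratio. The function names and the read counts are hypothetical, and returning None when the RNA-seq RPKM is zero is a choice made for this sketch.

```python
def rpkm(read_count, length_nt, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (length_nt * total_mapped_reads)


def translation_efficiency(ribo_rpkm, rna_rpkm):
    """TE = Ribo-seq RPKM / RNA-seq RPKM; None if RNA-seq RPKM is zero."""
    if rna_rpkm <= 0:
        return None
    return ribo_rpkm / rna_rpkm


# Hypothetical counts for one 1500-nt ORF in one sample.
ribo = rpkm(read_count=300, length_nt=1500, total_mapped_reads=20_000_000)
rna = rpkm(read_count=600, length_nt=1500, total_mapped_reads=30_000_000)
te = translation_efficiency(ribo, rna)
```

Here the Ribo-seq RPKM is 10 and the RNA-seq RPKM is about 13.3, giving a TE of about 0.75.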

     

    (2) Translation density:

     

    This table summarizes the Ribo-seq read density of the selected ORF across different samples. Translation density is defined as the average read intensity within the ORF.

     

    (3) Identify translated region:

     

    In this section, we used four published methods: RiboWave, RiboTaper (2), ORFscore (3) and RibORF (4). The number on top of each bar is the output of the corresponding method for each sample. For RiboWave and RiboTaper, the output is presented as -1*log(p value), where p value < 0.05 (i.e., -1*log(p value) > 2.996, using the natural logarithm) indicates potential active translation. Similarly, an ORFscore higher than 6.044 or a RibORF score higher than 0.7 indicates active translation.
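The cutoffs quoted above can be collected into one helper. The sketch below only encodes the thresholds stated in the text. The function name is hypothetical, and the logarithm is taken to be natural because -ln(0.05) is approximately 2.996.

```python
import math

P_CUTOFF = 0.05
NEG_LOG_P_CUTOFF = -math.log(P_CUTOFF)  # natural log, approximately 2.996
ORFSCORE_CUTOFF = 6.044
RIBORF_CUTOFF = 0.7


def is_actively_translated(method, value):
    """Apply the per-method cutoff to the value shown on the bar chart.

    For RiboWave and RiboTaper, value is -1*log(p value).
    """
    if method in ("RiboWave", "RiboTaper"):
        return value > NEG_LOG_P_CUTOFF
    if method == "ORFscore":
        return value > ORFSCORE_CUTOFF
    if method == "RibORF":
        return value > RIBORF_CUTOFF
    raise ValueError(f"unknown method: {method}")
```

For instance, a RiboTaper output of 4.6 (p value of about 0.01) passes the cutoff, while a RibORF score of 0.65 does not.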

     

    (4) Signal track demonstration:

     

    Finally, we provide the option to display the Ribo-seq signal track for either the original signal or the denoised periodic footprint. Users can select multiple datasets at the same time. The signal track is shown along the transcript, with the studied ORF highlighted in green at the bottom. Ribo-seq signals are colored blue.

     


    Tested web browsers

            | Chrome              | Firefox                                          | Safari                | IE
    Windows | Version 66 or above | Versions released after Aug. 2012 (Version 15.0) | NA                    | IE 10 or above
    Mac OS  | Version 66 or above | Versions released after Aug. 2012 (Version 15.0) | Version 11.1 or above | NA

    We used the JavaScript libraries plotly.js, jQuery, D3, Highcharts, DataTables and Knockout, as well as the HTML framework Bootstrap. The genome browser is based on the UCSC Genome Browser and JBrowse.