Quantification of reads
Both STAR and featureCounts can produce gene-level expression counts. We will use featureCounts here. Before carrying out quantification, it is important to know the strandedness of your samples. If you are unsure, ask your sequencing facility which library preparation protocol was used.
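If the sequencing facility cannot confirm the strandedness, it can sometimes be estimated from the data. The sketch below is only an illustration and assumes that STAR was run with gene counting enabled (`--quantMode GeneCounts` on the command line), which produces a ReadsPerGene.out.tab file whose columns are gene ID, unstranded counts, forward-stranded counts and reverse-stranded counts. The function name `guess_strandedness` and the 0.8 threshold are hypothetical choices made for this example, not part of STAR or featureCounts.

```python
def guess_strandedness(reads_per_gene_text, ratio_threshold=0.8):
    """Guess library strandedness from the text of a STAR ReadsPerGene.out.tab file.

    Column 3 holds counts assuming a forward-stranded library and
    column 4 holds counts assuming a reverse-stranded library.
    The threshold is an arbitrary illustrative cut-off.
    """
    forward = 0
    reverse = 0
    for line in reads_per_gene_text.splitlines():
        fields = line.split("\t")
        # Skip blank lines and the N_unmapped / N_multimapping /
        # N_noFeature / N_ambiguous summary rows at the top of the file.
        if len(fields) < 4 or fields[0].startswith("N_"):
            continue
        forward += int(fields[2])
        reverse += int(fields[3])

    total = forward + reverse
    if total == 0:
        return "undetermined"
    if forward / total >= ratio_threshold:
        return "forward"
    if reverse / total >= ratio_threshold:
        return "reverse"
    return "unstranded"


# Made-up example data: almost all gene counts fall in the reverse-stranded column.
example = (
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t50\t4\n"
    "N_ambiguous\t1\t0\t1\n"
    "geneA\t100\t2\t98\n"
    "geneB\t50\t1\t49\n"
)
print(guess_strandedness(example))  # reverse
```

A result of this kind is a hint rather than a substitute for the facility's protocol information, and a borderline ratio should be treated as inconclusive.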
For workflow users -
Expand the featureCounts part in the workflow
The default value under “Specify strand information” is “Unstranded” and can be changed by using the edit button next to “Specify strand information”
A gene annotation file is required and can be uploaded using the “Upload Data” button on the upper left-hand side of the page
The file should appear in the history and can then be selected from the gene annotation file dropdown menu
The default output format is “Gene-ID "\t" read-count (MultiQC/DESeq2/edgeR/limma-voom compatible)” and can be changed using the edit button
A gene-length file is also needed for downstream analysis, so “Create gene-length file” is set to true in the workflow
For users running each step -
On the left-hand side of the homepage, search for “featureCounts” and open up the tool
Select the output of RNA STAR (RNA STAR on collection N: mapped.bam) from the dropdown list under “Alignment file”
Under “Specify strand information”, select the option that matches the strandedness of your samples
Upload a gene annotation file for the genome you are working with using the “Upload Data” button on the upper left-hand side of the page
The file should appear in the history and can then be selected from the gene annotation file dropdown menu
Under “Output format”, specify the format as “Gene-ID "\t" read-count (MultiQC/DESeq2/edgeR/limma-voom compatible)”
Select “Yes” under “Create gene-length file”
Under “Options for paired-end reads”, select “Enabled; fragments (or templates) will be counted instead of reads” under “Count fragments instead of reads” (This is the format you need for downstream analysis)
Set “Minimum mapping quality per read” to 10 under “Read filtering options”
Under “Advanced options”, select “exon” under “GFF feature type filter”
Set “gene_id” as the “GFF gene identifier”
Select “Disabled; reads that align to multiple features or overlapping features are excluded” under “Allow reads to map to multiple features”
Click “Run” to run the tool
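For readers who want to relate the settings above to the featureCounts command line, the following sketch builds a roughly equivalent argument list. It assumes featureCounts 2.0.2 or later, where `-p` marks the input as paired-end and `--countReadPairs` counts fragments. It is not the exact command that Galaxy runs, and Galaxy-specific output options such as the gene-length file are not represented. The function name and file names are hypothetical.

```python
def build_featurecounts_command(bam_files, annotation, output,
                                strandedness="unstranded", paired_end=True):
    """Build a featureCounts argument list mirroring the Galaxy settings above."""
    # featureCounts encodes strandedness as 0 (unstranded), 1 (forward), 2 (reverse).
    strand_codes = {"unstranded": "0", "forward": "1", "reverse": "2"}

    cmd = [
        "featureCounts",
        "-a", annotation,        # gene annotation file (GTF/GFF)
        "-o", output,            # output counts table
        "-t", "exon",            # GFF feature type filter
        "-g", "gene_id",         # GFF gene identifier
        "-Q", "10",              # minimum mapping quality per read
        "-s", strand_codes[strandedness],
    ]
    if paired_end:
        # Count fragments (read pairs) instead of individual reads.
        cmd += ["-p", "--countReadPairs"]
    # -M (multi-mapping reads) and -O (reads overlapping several features)
    # are deliberately left out, so such reads are excluded.
    return cmd + list(bam_files)


# Hypothetical file names, for illustration only.
cmd = build_featurecounts_command(
    ["sample1.bam"], "genes.gtf", "counts.txt", strandedness="reverse"
)
print(" ".join(cmd))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` without shell quoting problems, should you choose to run it outside Galaxy.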
Running MultiQC on the featureCounts output is optional. To run MultiQC -
Under “Results” > “Insert Results” > select “featureCounts” for “Which tool was used to generate logs?”
Select “featureCounts on collection N: Summary” (the output of featureCounts) under “Output of FeatureCounts”
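The featureCounts summary that MultiQC reads can also be checked directly. The sketch below assumes a downloaded summary for a single sample: a tab-separated table with one header line followed by status rows such as Assigned and Unassigned_NoFeatures. The function names and the numbers in the example are made up for illustration.

```python
def parse_featurecounts_summary(text):
    """Parse a single-sample featureCounts summary into {status: read count}."""
    counts = {}
    for line in text.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue
        status, value = line.split("\t")[:2]
        counts[status] = int(value)
    return counts


def assigned_fraction(counts):
    """Fraction of all reads or fragments that were assigned to a feature."""
    total = sum(counts.values())
    return counts.get("Assigned", 0) / total if total else 0.0


# Made-up summary content, for illustration only.
example_summary = (
    "Status\tsample1.bam\n"
    "Assigned\t800\n"
    "Unassigned_NoFeatures\t150\n"
    "Unassigned_Ambiguity\t50\n"
)
summary = parse_featurecounts_summary(example_summary)
print(assigned_fraction(summary))  # 0.8
```

A low assigned fraction can point to a wrong strandedness setting or an annotation file that does not match the reference genome, so it is worth checking before moving on.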
Note
After running featureCounts, inspect the output it generated to check whether the run was successful, as shown below
Within Galaxy, you can assess whether featureCounts ran successfully by clicking the information button of your job
The output of MultiQC on the featureCounts results should include a webpage that can be opened from the history or downloaded for viewing. At this stage, the primary and secondary analysis workflow is complete, and the resulting expression table is ready for use in the tertiary analysis workflow.
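As one illustration of how the expression table and the gene-length file can be combined downstream, the sketch below computes transcripts per million (TPM). This is only one common normalisation and is not necessarily the method used in the tertiary analysis workflow. It assumes that both files are tab-separated with a single header line, the gene ID in the first column and the value in the second. The function names and example values are hypothetical.

```python
def read_two_column_table(text):
    """Read a tab-separated table with one header line into {gene_id: value}."""
    table = {}
    for line in text.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue
        gene_id, value = line.split("\t")[:2]
        table[gene_id] = float(value)
    return table


def counts_to_tpm(counts, lengths):
    """Convert raw counts to TPM using gene lengths given in base pairs."""
    # Reads per kilobase for every gene that has a positive length.
    rates = {
        gene: counts[gene] / (lengths[gene] / 1000.0)
        for gene in counts
        if lengths.get(gene, 0) > 0
    }
    scale = sum(rates.values())
    if scale == 0:
        return {gene: 0.0 for gene in rates}
    # Rescale so that the values sum to one million.
    return {gene: rate / scale * 1e6 for gene, rate in rates.items()}


# Made-up tables, for illustration only.
counts_table = read_two_column_table("Geneid\tsample1\ngeneA\t100\ngeneB\t300\n")
length_table = read_two_column_table("GeneID\tLength\ngeneA\t1000\ngeneB\t3000\n")
tpm = counts_to_tpm(counts_table, length_table)
print(tpm)
```

In this example geneB has three times the counts of geneA but is also three times as long, so the two genes end up with the same TPM value.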