GO pathway analysis

At this stage, we have extracted genes that are significantly differentially expressed and visualized them. We are now ready to carry out Gene Ontology (GO) analysis using goseq.

For users running the workflow, the Gene Ontology analysis is carried out by the following tools:

  • Prepare the first dataset for goseq

    1. Compute

    2. Cut

    3. Change case

  • Prepare the gene length file

    1. Extract dataset

    2. Change case

  • Run the GO analysis

    1. goseq

In goseq, make sure that the correct gene ID format is selected under “Select Gene ID format”. None of the other tools need to be adjusted.

Read on if you would like to know how each tool works and what it does (this also shows the default parameters), or go to the next page, which walks through the output generated by goseq.

For users running each step individually, the steps are as follows.

Preparing the first dataset with differentially expressed genes

The first tool is “Compute”:

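  • Under “Add expression”, enter “bool(c7<0.05)” (c7 is the adjusted p-value column of the DESeq2 result, so the new column is True for genes with an adjusted p-value below 0.05)

  • Under “as a new column to”, select the DESeq2 result file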

  • Click on “Execute”
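The logic of the Compute step can be sketched in a few lines of Python. This is a hypothetical illustration, not the Galaxy tool's code: it assumes each row is a list of strings read from the tab-separated DESeq2 result file with the adjusted p-value in column 7, and the sample rows and gene IDs are made up.

```python
# Hypothetical sketch of the Compute step: append bool(c7 < 0.05) as a new column.
# Assumes each row is a list of strings from the tab-separated DESeq2 result file,
# with the adjusted p-value in column 7 (c7).

def add_de_flag(rows, padj_col=7, threshold=0.05):
    """Return new rows with "True"/"False" appended, marking padj < threshold."""
    flagged = []
    for row in rows:
        value = row[padj_col - 1]  # Galaxy columns are 1-based; Python lists are 0-based
        try:
            is_de = float(value) < threshold
        except ValueError:
            # Non-numeric entries such as "NA" are treated as not significant here;
            # the Galaxy tool may handle them differently.
            is_de = False
        flagged.append(row + [str(is_de)])
    return flagged


# Made-up rows: gene ID, base mean, log2FC, SE, Wald statistic, p-value, adjusted p-value
rows = [
    ["ENSG_A", "120.5", "2.3", "0.4", "5.8", "1e-08", "3e-07"],
    ["ENSG_B", "80.1", "0.1", "0.3", "0.3", "0.74", "0.91"],
]
for row in add_de_flag(rows):
    print(row[0], row[-1])
```

The new True/False column becomes column 8, which is why the next step cuts “c1,c8”.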

The second tool is “Cut”:

  • Under “Cut columns”, enter “c1,c8” (the gene ID and the new True/False column)

  • Select “tab” under “Delimited by”

  • Under “From”, select the output of the “Compute” tool

  • Click on “Execute”
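What “Cut” with “c1,c8” does to each row can be sketched as follows. This is a hypothetical illustration that assumes the same list-of-strings rows as above; the helper names are made up for the example.

```python
# Hypothetical sketch of the Cut step with "c1,c8": keep only the listed columns.

def parse_column_spec(spec):
    """Turn a spec such as "c1,c8" into [1, 8]."""
    return [int(part.strip().lstrip("c")) for part in spec.split(",")]


def cut_columns(rows, columns):
    """Keep the given 1-based columns, in the order they are listed."""
    return [[row[c - 1] for c in columns] for row in rows]


# A made-up row as produced by the Compute step (8 columns)
flagged = [["ENSG_A", "120.5", "2.3", "0.4", "5.8", "1e-08", "3e-07", "True"]]
print(cut_columns(flagged, parse_column_spec("c1,c8")))
```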

The last tool in this step is “Change Case”:

  • Select the output of the “Cut” tool under “From”

  • Under “Change case of columns”, enter “c1”

  • Use “tab” as the delimiter

  • Select “Upper case” under “To”

  • Click on “Execute”

  • Rename the output to “Gene IDs and differential expression”
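A sketch of the Change Case step is shown below. The upper-casing is presumably there so that gene IDs are spelled identically in both goseq input files; a case-sensitive comparison would treat “ensg_a” and “ENSG_A” as different genes. Ensembl IDs are normally already upper case (human gene IDs start with “ENSG”), so for them the step mostly acts as a safeguard. The helper names and rows are hypothetical.

```python
# Hypothetical sketch of the Change Case step: upper-case column c1 only.

def upper_case_column(rows, column=1):
    """Return copies of the rows with the given 1-based column upper-cased."""
    result = []
    for row in rows:
        new_row = list(row)  # copy so the input rows are not modified
        new_row[column - 1] = new_row[column - 1].upper()
        result.append(new_row)
    return result


def to_tsv(rows):
    """Join rows into tab-delimited text, one line per row."""
    return "\n".join("\t".join(row) for row in rows)


print(to_tsv(upper_case_column([["ensg_a", "True"], ["ensg_b", "False"]])))
```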

Preparing the second dataset with the gene lengths

  • First, locate the feature lengths collection generated by featureCounts. If featureCounts was run in this history, the collection is already available; otherwise, retrieve it from the history in which featureCounts was run

  • Next, open the “Extract Dataset” tool

  • Select “featureCounts on collection N: Feature lengths” under “Input List”

  • Under “How should a dataset be selected?”, select “The first dataset”

  • Click on “Execute”
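Taking only the first dataset is presumably sufficient because featureCounts derives feature lengths from the annotation (GTF) rather than from the reads, so every sample in the collection should report the same lengths when the same annotation was used. The sketch below models a collection as a list of (element identifier, rows) pairs, which is a hypothetical stand-in for a Galaxy collection, and adds an optional consistency check that is not part of the Galaxy tool.

```python
# Hypothetical sketch of "Extract Dataset" with "The first dataset".
# A collection is modelled here as a list of (element_identifier, rows) pairs.

def extract_first(collection):
    """Return the rows of the first element in the collection."""
    if not collection:
        raise ValueError("the collection is empty")
    _identifier, rows = collection[0]
    return rows


def lengths_agree(collection):
    """Optional check (not part of the Galaxy tool): every element has the same table."""
    first = extract_first(collection)
    return all(rows == first for _identifier, rows in collection[1:])


# Made-up feature lengths for two samples
lengths = [
    ("sample1", [["ENSG_A", "1500"], ["ENSG_B", "820"]]),
    ("sample2", [["ENSG_A", "1500"], ["ENSG_B", "820"]]),
]
print(extract_first(lengths), lengths_agree(lengths))
```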

The last tool in this step is “Change Case”:

  • Under “From”, select the output of “Extract Dataset”

  • Under “Change case of columns”, enter “c1”

  • Use “tab” as the delimiter

  • Select “Upper case” under “To”

  • Click on “Execute”

  • Rename the output to “Gene IDs and length”
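Before running goseq, it can help to confirm that “Gene IDs and differential expression” and “Gene IDs and length” list the same gene IDs in the same case, since goseq presumably pairs each gene's DE status with its length by ID. The check below is a hypothetical helper, not part of the Galaxy workflow; the file contents are made up, and the lower-case “ensg_b” shows how a case mismatch would surface.

```python
# Hypothetical pre-flight check before goseq: compare gene IDs in the two prepared files.
# A header line, if present in only one file, also shows up as a mismatch.

def read_ids(tsv_text):
    """Collect column-1 IDs from tab-delimited text, skipping blank lines."""
    return {line.split("\t")[0] for line in tsv_text.splitlines() if line.strip()}


def compare_ids(de_text, length_text):
    """Return (IDs only in the DE file, IDs only in the lengths file), sorted."""
    de_ids = read_ids(de_text)
    length_ids = read_ids(length_text)
    return sorted(de_ids - length_ids), sorted(length_ids - de_ids)


de_file = "ENSG_A\tTrue\nENSG_B\tFalse\n"
length_file = "ENSG_A\t1500\nensg_b\t820\n"
print(compare_ids(de_file, length_file))
```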

The final step of the GO analysis is to run “goseq”:

  • Search for the tool under “Tools”

  • Select the file “Gene IDs and differential expression” under “Differentially expressed genes file”

  • Under “Gene lengths file”, select “Gene IDs and length”

  • Under “Gene categories”, select “Get categories”

  • Within “Get categories”, choose the genome for your data under “Select a genome to use”

  • Since Ensembl gene IDs were used, select “Ensembl Gene ID” under “Select Gene ID format”. Note: if your GTF file uses a different gene ID format, select that format from the dropdown menu instead

  • Select one or more categories, depending on which you are interested in: “GO: Cellular Component”, “GO: Biological Process” and/or “GO: Molecular Function”

  • In “Output Options”, select “Yes” to “Output Top GO terms plot”

  • Select “Yes” to “Extract the DE genes for the categories (GO/KEGG terms)?”

  • Click on “Execute”
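The core idea behind GO enrichment is an over-representation test: for each GO term, is the number of DE genes annotated to it larger than expected by chance? In RNA-seq, longer genes tend to be called DE more often, so goseq fits a probability weighting function from the gene lengths file and, by default, uses the Wallenius non-central hypergeometric distribution to correct for this bias. The goseq R package also offers a plain hypergeometric method without that correction. The sketch below implements only this simpler, uncorrected test for a single category to illustrate the idea; it is not what goseq computes by default, and the numbers are made up.

```python
# Hypothetical sketch of a basic GO over-representation test for one category.
# It uses the plain (central) hypergeometric distribution and does NOT correct for
# gene-length bias, unlike goseq's default Wallenius method.
from math import comb


def overrep_pvalue(n_genes, n_de, n_in_category, n_de_in_category):
    """P(at least n_de_in_category DE genes fall in the category by chance).

    n_genes: genes tested; n_de: DE genes among them;
    n_in_category: genes annotated to the GO term; n_de_in_category: DE genes in it.
    """
    total = comb(n_genes, n_de)
    upper = min(n_in_category, n_de)
    # Sum the hypergeometric probabilities of k, k+1, ..., upper DE genes in the category
    tail = sum(
        comb(n_in_category, k) * comb(n_genes - n_in_category, n_de - k)
        for k in range(n_de_in_category, upper + 1)
    )
    return tail / total


# Made-up numbers: 1000 genes, 100 DE, a 40-gene GO term with 12 DE members
# (about 4 would be expected by chance).
print(overrep_pvalue(1000, 100, 40, 12))
```

Because many GO terms are tested at once, the resulting p-values still need multiple-testing correction before being interpreted.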

The next page shows the output of goseq.