2014-06-16T01:30:49Z

Agnieszka: Created page

== What does SGSGeneLoss depend on? ==
SGSGeneLoss depends on the following:
* [http://www.java.com/en/ Java 1.6] or higher
* [http://www.r-project.org/ R/3.1.0]
* [http://sourceforge.net/projects/picard/files/picard-tools/ picard-tools]
* [http://ggplot2.org/ ggplot2]
* [http://www.bioconductor.org/packages/release/bioc/html/ggbio.html ggbio]

== Download ==
* Latest Version 0.1 (29/04/2014):
** [http://appliedbioinformatics.com.au/download/SGSGeneLoss.tar.gz SGSGeneLoss.tar.gz] should contain
*** four main programs: SGSGeneLoss.jar, graph_chromosomes.R, graph_main.R, graph_circles.R
*** readme file
*** sample_results folder with results for sample data

== How to install? ==
* Download SGSGeneLoss.tar.gz
* Unpack SGSGeneLoss.tar.gz and place SGSGeneLoss.jar and all the R scripts in a directory (or directories) of your choice, for example ./my_geneloss
* Move into ./my_geneloss and create a SGSGeneLoss_lib directory (on Linux: cd ./my_geneloss, then mkdir SGSGeneLoss_lib)
* Download picard-tools (SGSGeneLoss was tested with picard-tools 1.89)
* Place picard-1.89.jar and sam-1.89.jar in ./my_geneloss/SGSGeneLoss_lib
* Now you are ready to run SGSGeneLoss
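
The directory layout from the steps above can be sketched as a small shell function. This is only an illustration: setup_geneloss_dir is a hypothetical helper name, the picard-tools directory in the example is an assumption, and copying SGSGeneLoss.jar and the R scripts is left as a manual step because the internal layout of the archive is not described here.

```shell
# setup_geneloss_dir INSTALL_DIR [PICARD_DIR]   (hypothetical helper)
# Creates INSTALL_DIR/SGSGeneLoss_lib and, if PICARD_DIR is given,
# copies the two picard-tools 1.89 jars into it.
setup_geneloss_dir() {
    install_dir=$1
    picard_dir=${2:-}
    mkdir -p "$install_dir/SGSGeneLoss_lib" || return 1
    if [ -n "$picard_dir" ]; then
        cp "$picard_dir/picard-1.89.jar" "$picard_dir/sam-1.89.jar" \
           "$install_dir/SGSGeneLoss_lib/" || return 1
    fi
}

# Example (paths are assumptions, adjust to your system):
#   tar -xzf SGSGeneLoss.tar.gz
#   setup_geneloss_dir ./my_geneloss ./picard-tools-1.89
#   then copy SGSGeneLoss.jar and the R scripts into ./my_geneloss
```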

== Input and output files for SGSGeneLoss.jar ==
* Input files:
** Sorted, indexed .bam file with sequencing reads mapped to the reference genome sequence; multiple .bam files can be provided as a comma-separated list
** GFF3 file with the reference genome annotation; it has to contain gene, mRNA and exon features
* Output files:
** Result files for each chromosome separately
** File with overall stats - stats.txt
** File with summary for all the chromosomes used - chrs.txt (this file is used by one of the R scripts)
** File with list of genes lost for all the chromosomes - graph.txt (this file is used by one of the R scripts)
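
Before a long run it may be worth checking that the annotation contains the three required feature types. The sketch below assumes a standard tab-separated GFF3 file with the feature type in column 3; gff_has_required_features is a hypothetical helper and not part of SGSGeneLoss.

```shell
# gff_has_required_features FILE   (hypothetical helper)
# Succeeds only if column 3 of the file contains gene, mRNA and exon.
gff_has_required_features() {
    awk -F'\t' '
        /^#/         { next }        # skip comments and directives
        $3 == "gene" { gene = 1 }
        $3 == "mRNA" { mrna = 1 }
        $3 == "exon" { exon = 1 }
        END          { exit !(gene && mrna && exon) }
    ' "$1"
}

# Example:
#   gff_has_required_features /home/my_gffs/annot.gff3 || echo "missing feature types"
```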

== Command line options for SGSGeneLoss.jar ==

Required:

bamPath - path to the directory with your .bam file(s), has to end with / or \. Example: bamPath=/home/my_bams/

bamFileList - a single .bam file or a comma-separated list of file names only; the .bam files and their corresponding .bai files have to be in the directory given in bamPath. Example: bamFileList=bam1.bam,bam2.bam

gffFile - location of the GFF3 file. Example: gffFile=/home/my_gffs/annot.gff3

outDirPath - location of the output directory, has to end with / or \. Example: outDirPath=/home/my_results/

Optional:

minCov - minimal coverage threshold to consider a position covered [minCov=1]

chromosomeList - comma-separated list of chromosomes to be used for analysis; use all for all chromosomes [chromosomeList=all]

lostCutoff - coverage cutoff to consider a gene as lost when calculating stats [lostCutoff=0.0]

covCats - coverage categories for visualization [covCats=0,10,20,30,40,70]

extendedFmt - use the extended format, which includes additional info in the output files [default: regular format]

To see help run: java -jar SGSGeneLoss.jar help

== Sample command ==
* Move into directory where SGSGeneLoss.jar is
* Please make sure that all your supplied paths end with / or \

java -Xmx4g -jar SGSGeneLoss.jar bamPath=/home/uqagnieszka/bams/ bamFileList=arabidopsis.sorted.bam gffFile=/home/gff_files/Athaliana_167_gene_exons.gff3 outDirPath=/home/uqagnieszka/results/ chromosomeList=all

java -Xmx4g -jar SGSGeneLoss.jar bamPath=/home/uqagnieszka/bams/ bamFileList=arabidopsis.sorted.bam,arabidopsis2.sorted.bam gffFile=/home/gff_files/Athaliana_167_gene_exons.gff3 outDirPath=/home/uqagnieszka/results/ chromosomeList=Chr1,Chr2 minCov=2 lostCutoff=0.05 covCats=0,2,5,10,20 extendedFmt
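
Because a missing trailing / or \ on bamPath or outDirPath is an easy mistake, a wrapper script can normalise the paths before assembling the command. The following is a minimal sketch; with_trailing_slash is a hypothetical helper, it appends / only (not \), and the paths are the example paths from above.

```shell
# with_trailing_slash PATH   (hypothetical helper)
# Prints PATH unchanged if it already ends with / or \, otherwise appends /.
with_trailing_slash() {
    case $1 in
        */ | *\\) printf '%s\n' "$1" ;;
        *)        printf '%s/\n' "$1" ;;
    esac
}

bam_path=$(with_trailing_slash /home/uqagnieszka/bams)
out_path=$(with_trailing_slash /home/uqagnieszka/results/)

# Print the command for inspection; remove "echo" to actually run it.
echo java -Xmx4g -jar SGSGeneLoss.jar \
    "bamPath=$bam_path" \
    "bamFileList=arabidopsis.sorted.bam" \
    "gffFile=/home/gff_files/Athaliana_167_gene_exons.gff3" \
    "outDirPath=$out_path" \
    "chromosomeList=all"
```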

== Output files format ==

All the output files are comma-separated text files.
*.excov files - results for each chromosome (file names use the chromosome names from the .bam files); these files come in two formats, basic (default) or extended (extendedFmt)
**basic format: chromosome,ID,is_lost,start_position,end_postion,frac_exons_covered,frac_gene_covered,ave_cov_depth_exons,cov_cat,ave_cove_depth_gene
**extended format: contains additional columns with information about each of the exons
*stats.txt - file with summary information about all genes
*chrs.txt - file with summary information about chromosomes
**chr,start,end,len
*graph.txt - file with list of genes lost as determined by lostCutoff
**chr,id,start,end
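
Since the result files are plain comma-separated text, individual columns can be pulled out with standard tools. The sketch below assumes the basic format with the column order listed above (ID in column 2, frac_exons_covered in column 6, frac_gene_covered in column 7); whether the files start with a header line is not stated here, so a first line beginning with "chromosome" is skipped just in case. excov_summary is a hypothetical helper.

```shell
# excov_summary FILE   (hypothetical helper, basic format only)
# Prints ID,frac_exons_covered,frac_gene_covered for every gene in FILE.
excov_summary() {
    awk -F',' '
        NR == 1 && $1 == "chromosome" { next }   # skip a header line, if any
        { print $2 "," $6 "," $7 }
    ' "$1"
}

# Example:
#   excov_summary /home/uqagnieszka/results/Chr1.excov
```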

== Plotting results ==

Results are visualized using R scripts.

Two ways of visualization are possible:
*results per chromosome
*results for all chromosomes as a circular graph

'''Results per chromosome:'''

What you need:
*scripts graph_chromosomes.R, graph_main.R in the same directory
*.excov files (either basic or extended) with results from SGSGeneLoss.jar: Chr1.excov, Chr2.excov etc.
*directory (location) where files with results from SGSGeneLoss.jar: Chr1.excov, Chr2.excov etc. can be found
*file listing all the result files for which you want graphs drawn, one per line - for example graph_list.txt file which looks like this:
Chr1.excov
Chr2.excov
Chr3.excov
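
The list file can also be generated from the results directory instead of being typed by hand. This is a minimal sketch that writes the bare file name of every .excov file, one per line; make_graph_list is a hypothetical helper, and the order is simply the shell's glob order.

```shell
# make_graph_list RESULTS_DIR LIST_FILE   (hypothetical helper)
# Writes the file name of every .excov file in RESULTS_DIR to LIST_FILE.
make_graph_list() {
    for f in "$1"/*.excov; do
        [ -e "$f" ] || continue     # no .excov files: the glob stays literal
        basename "$f"
    done > "$2"
}

# Example:
#   make_graph_list /home/uqagnieszka/results /home/uqagnieszka/results/graph_list.txt
```

Edit the resulting file if you only want graphs for some of the chromosomes.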

graph_chromosomes.R takes three arguments in this order:

1. location of the directory where the .excov files are located

2. file listing all the result files for which you want graphs drawn

3. gene loss cutoff

Rscript --vanilla graph_chromosomes.R /home/uqagnieszka/results /home/uqagnieszka/results/graph_list.txt 0.0

'''Summary results for all chromosomes, possibly multiple samples:'''

What you need:
*script graph_circles.R
*graph.txt from SGSGeneLoss.jar run
*chrs.txt from SGSGeneLoss.jar run
*file assigning a numeric order to the chromosomes (this is done because some chromosomes have complicated names and sorting in ASCII order does not always work); the file should look like this, and chromosome names will be replaced by the corresponding numbers
chrs,no
chr1,1
chr2,2
chr10,10
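
A starting point for the order file can be derived from chrs.txt. The sketch below assumes the chromosome name is the first comma-separated column of chrs.txt and numbers the chromosomes in order of appearance, which may not be the order you want, so check the result by hand. A first line whose first field is "chr" is treated as a header. make_chr_order is a hypothetical helper.

```shell
# make_chr_order CHRS_FILE ORDER_FILE   (hypothetical helper)
# Writes a "chrs,no" table numbering chromosomes in order of appearance.
make_chr_order() {
    awk -F',' '
        BEGIN             { print "chrs,no" }
        $1 == "chr"       { next }                  # skip a header line, if any
        NF && !seen[$1]++ { n++; print $1 "," n }   # each chromosome once
    ' "$1" > "$2"
}

# Example:
#   make_chr_order chrs.txt chrs_order.txt
```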

graph_circles.R takes four arguments in this order:

1. file with chromosome info - chrs.txt from SGSGeneLoss.jar run

2. file with chromosome order

3. file with genes lost - graph.txt from SGSGeneLoss.jar run; it can be a comma-separated list of multiple files (for example multiple samples). Circles are drawn in the order of the files in the list, starting from the inside: the first file is the innermost circle, so graph1.txt,graph2.txt,graph3.txt puts graph1.txt innermost and graph3.txt outermost

4. output file

Rscript --vanilla graph_circles.R chrs.txt chrs_order.txt graph1.txt,graph2.txt,graph3.txt out.png
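
With several samples, the comma-separated third argument can be assembled in a script rather than typed. This is a small sketch; join_with_commas is a hypothetical helper, and the order of its arguments is the order of the circles, starting from the inside.

```shell
# join_with_commas ITEM...   (hypothetical helper)
# Prints its arguments as one comma-separated list, keeping their order.
join_with_commas() {
    printf '%s\n' "$@" | paste -sd, -
}

graphs=$(join_with_commas graph1.txt graph2.txt graph3.txt)

# Print the command for inspection; remove "echo" to actually run it.
echo Rscript --vanilla graph_circles.R chrs.txt chrs_order.txt "$graphs" out.png
```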

== FAQ ==
* If memory consumption is a problem, please consider increasing -Xmx or splitting your .bam files

Back to [[Main_Page]]