SGSSynteny

From Applied Bioinformatics Group

Latest revision as of 04:20, 18 August 2014

What does SGSSynteny depend on?

SGSSynteny depends on the following:

Download

  • Latest Version 0.1 (29/04/2014):
    • SGSSynteny.v0.1.tar.gz should contain
      • two main programs: SGSSynteny.v0.1.jar, graph_synteny.v0.1.R
      • readme file
      • folder with source code

From now on in this manual, SGSSynteny.v0.1.jar and graph_synteny.v0.1.R are referred to as SGSSynteny.jar and graph_synteny.R.

To run the programs, you have to use the full names SGSSynteny.v0.1.jar and graph_synteny.v0.1.R.

How to install?

  • Download SGSSynteny.tar.gz
  • Unpack SGSSynteny.tar.gz and place SGSSynteny.jar and all the R scripts in a directory of your choice, for example ./my_synteny
  • Move into ./my_synteny and create the SGSSynteny_lib directory (on Linux: cd ./my_synteny, then mkdir SGSSynteny_lib)
    • The name of the lib directory is the name of the .jar file without the .jar extension + _lib, so if you are using SGSSynteny.v0.1.jar the lib directory is SGSSynteny.v0.1_lib
    • The lib directory has to be in the same folder as the .jar file
  • Download picard-tools (SGSSynteny.jar was tested with picard-tools 1.89)
  • Place picard-1.89.jar and sam-1.89.jar in ./my_synteny/SGSSynteny_lib
  • Now you are ready to run SGSSynteny
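
The naming rule for the lib directory can be sketched in shell as follows. This is a minimal illustration, not part of the SGSSynteny distribution: the helper names lib_dir_for_jar and make_lib_dir are hypothetical, and the picard jars are assumed to be downloaded separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with SGSSynteny): derive the lib directory
# name from the .jar file name by dropping ".jar" and appending "_lib".
lib_dir_for_jar() {
    local jar_name
    jar_name=$(basename "$1")
    printf '%s_lib\n' "${jar_name%.jar}"
}

# Hypothetical helper: create the lib directory in the same folder as the
# given .jar file and print its path.
make_lib_dir() {
    local jar_path target
    jar_path=$1
    target="$(dirname "$jar_path")/$(lib_dir_for_jar "$jar_path")"
    mkdir -p "$target"
    printf '%s\n' "$target"
}
```

For example, make_lib_dir ./my_synteny/SGSSynteny.v0.1.jar would create ./my_synteny/SGSSynteny.v0.1_lib, into which picard-1.89.jar and sam-1.89.jar can then be copied.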

Input and output files for SGSSynteny.v0.1.jar

  • Input files:
    • Sorted, indexed .bam file with sequencing reads mapped to the reference genome sequence; multiple .bam files can be provided as a comma separated list
    • GFF3 file with the reference genome annotation; has to contain gene, mRNA and exon fields
  • Output files
    • Result files for each chromosome separately - .cluster files
    • File with overall stats - stats.txt

Command line options for SGSSynteny.jar

Required:

bamPath - path to the folder containing the bam files; give only the folder path, not the bam file names; the folder has to contain both .bam and .bai files; has to end with “/” or “\”

bamFileList - comma separated list of all the bam files to be used

gffFile - path to the .gff3 file, including the file name; the file has to contain at least gene and exon features

outDirPath - directory for the output files; has to end with “/” or “\”

Optional:

expectCov - expected coverage [null]

minFracHor - minimum horizontal coverage required to consider genes as syntenic [0.3]

minCovVer - minimum coverage depth required to consider genes as syntenic [2.0]

chromosomeList - comma separated list of chromosomes; use `all` for all the chromosomes in the .bam file [all]

DBepsilon - Eps value for DBSCAN (radius) [26]

DBmin - minPts value for DBSCAN (min cluster size) [24]

genesOrExons - use whole genes or exons for coverage calculations [exons]

mergeDistance - distance (no of genes) separating clusters for them to be merged [30]

esimateMinCovVer - estimate the minimum coverage depth used for clustering from the fraction x of points with the highest coverage depth; for example, esimateMinCovVer=0.45 uses the 45% of points with the highest coverage [null]

To see help run: java -jar SGSSynteny.jar help

Sample command

  • Please make sure that all the directory paths you supply end with / or \
java -Xmx16g -jar SGSSynteny.jar bamPath=/home/my_bams/ gffFile=/home/references/Bdistachyon_192_gene_exons.gff3 outDirPath=/home/results/ chromosomeList=Bd1,Bd2,Bd3,Bd4,Bd5  bamFileList=my_bam.sorted.bam  DBepsilon=30 DBmin=25 expectCov=500 minCovVer=2.0 minFracHor=0.4
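
Because bamPath and outDirPath have to end with a path separator, a small wrapper can normalise them before the call. The sketch below is only an illustration: the function names with_trailing_slash and sgs_command are hypothetical, it uses only the options listed above, and it prints the command instead of running it so that it can be checked first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "/" to a directory path unless it already ends
# with "/" or "\".
with_trailing_slash() {
    case "$1" in
        */|*\\) printf '%s\n' "$1" ;;
        *)      printf '%s/\n' "$1" ;;
    esac
}

# Hypothetical helper: print a SGSSynteny command line.
# Arguments: bam directory, comma separated bam list, gff3 file, output
# directory. Any further arguments (key=value options) are appended unchanged.
sgs_command() {
    local bam_dir bam_list gff out_dir opt
    bam_dir=$(with_trailing_slash "$1")
    bam_list=$2
    gff=$3
    out_dir=$(with_trailing_slash "$4")
    shift 4
    printf 'java -Xmx16g -jar SGSSynteny.v0.1.jar bamPath=%s bamFileList=%s gffFile=%s outDirPath=%s' \
        "$bam_dir" "$bam_list" "$gff" "$out_dir"
    # Append optional settings such as minFracHor=0.4
    for opt in "$@"; do
        printf ' %s' "$opt"
    done
    printf '\n'
}
```

Calling sgs_command /home/my_bams my_bam.sorted.bam /home/references/Bdistachyon_192_gene_exons.gff3 /home/results minFracHor=0.4 prints a command in which both directory paths end with /; the printed line can be reviewed and then pasted into the terminal.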

Output files format

All the output files are comma separated text files.

  • .cluster files - files with results for each chromosome (files use chromosome names as in .bam files)
  • stats.txt - file with summary information about all genes
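
Since the output files are comma separated, a quick sanity check is to count the fields on each line. The sketch below makes no assumption about what the columns mean, which is not described here; the helper name csv_field_counts is hypothetical, and the check assumes that fields do not contain quoted commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for a comma separated file, print one line per distinct
# field count in the form "<number of fields> <number of lines>".
# Assumes plain comma separated text without quoted commas.
csv_field_counts() {
    awk -F',' '{ counts[NF]++ } END { for (n in counts) print n, counts[n] }' "$1" | sort -n
}
```

A well-formed file should usually report a single field count; for example, csv_field_counts /home/results/stats.txt, assuming stats.txt was written to the outDirPath used in the sample command.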

Plotting results

Results are visualized using an R script.

Results per chromosome:

What you need:

  • script graph_synteny.R
  • .clusters files (either basic or extended) with results from SGSSynteny.jar: Chr1.clusters, Chr2.clusters etc.
  • directory (location) containing the result files from SGSSynteny.jar (Chr1.clusters, Chr2.clusters etc.)

graph_synteny.R takes three arguments in this order:

1. location of the directory where the .clusters files are located

2. lower limit of the Y axis

3. output path ending with /

Rscript --vanilla graph_synteny.R /home/uqagnieszka/results 0.4 /home/uqagnieszka/graphs/
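
The call above can be wrapped in a small check, sketched below under stated assumptions: the function name plot_command is hypothetical, the result files are assumed to carry the .clusters extension as in the list under "What you need", and the command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Rscript call for graph_synteny.v0.1.R after
# checking that the results directory contains at least one .clusters file.
# Arguments: results directory, lower limit of the Y axis, output directory.
plot_command() {
    local results_dir y_min out_dir
    results_dir=$1
    y_min=$2
    out_dir=$3
    # The script expects the output path to end with "/".
    case "$out_dir" in
        */) ;;
        *)  out_dir="$out_dir/" ;;
    esac
    # Stop when there is nothing to plot.
    if ! ls "$results_dir"/*.clusters >/dev/null 2>&1; then
        echo "no .clusters files found in $results_dir" >&2
        return 1
    fi
    printf 'Rscript --vanilla graph_synteny.v0.1.R %s %s %s\n' "$results_dir" "$y_min" "$out_dir"
}
```

For example, plot_command /home/uqagnieszka/results 0.4 /home/uqagnieszka/graphs prints the command with the output path ending in /, or returns an error when no .clusters files are present.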

FAQ

  • If memory consumption is a problem, please consider increasing -Xmx or splitting your .bam files

