Darmor Tapidor

From Applied Bioinformatics Group

This page collects the files for the Bayer et al. B. napus Darmor/Tapidor genome paper.


Collinearity_scripts.zip - collinearity analysis; parses MCScanX results and checks for missing genes in expected regions

LASTZSorter.py - sorts contigs based on their LASTZ alignment to the reference

contigPlacer - places contigs based on recombination patterns

R_plotting_scripts.zip - R scripts used for plotting (Venn diagrams, boxplots)
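The collinearity scripts start from MCScanX output. As a rough illustration of that parsing step, and of one possible "missing gene" check, the sketch below reads a .collinearity file into blocks of anchor gene pairs and then lists genes that sit between a block's first and last anchor without a partner. It assumes the usual MCScanX layout ("## Alignment N: ..." block headers followed by rows of the form "i- j:<TAB>geneA<TAB>geneB<TAB>evalue") and that the caller supplies genes in chromosome order. It is not the code in Collinearity_scripts.zip; both function names are hypothetical.

```python
import re
from typing import Iterable


def parse_collinearity(lines: Iterable[str]) -> dict[int, list[tuple[str, str]]]:
    """Group MCScanX anchor gene pairs by alignment block (hypothetical helper).

    Assumes headers like '## Alignment 0: score=... N=... chrA&chrB plus'
    and pair rows like '  0-  0:\tgeneA\tgeneB\t  1e-156'.
    """
    header = re.compile(r"^## Alignment (\d+):")
    blocks: dict[int, list[tuple[str, str]]] = {}
    current = None
    for line in lines:
        match = header.match(line)
        if match:
            current = int(match.group(1))
            blocks[current] = []
            continue
        if line.startswith("#") or not line.strip() or current is None:
            continue  # parameter/statistics comments and blank lines
        # Everything after the first ':' is gene A, gene B and the E-value.
        fields = line.split(":", 1)[1].split()
        blocks[current].append((fields[0], fields[1]))
    return blocks


def genes_without_partner(pairs: list[tuple[str, str]],
                          ordered_genes: list[str]) -> list[str]:
    """Genes between the first and last anchor of a block (on the first
    genome's chromosome) that have no partner in that block.

    ordered_genes: gene IDs of that chromosome in physical order (assumed input).
    """
    position = {gene: i for i, gene in enumerate(ordered_genes)}
    anchored = [position[a] for a, _ in pairs if a in position]
    if not anchored:
        return []
    lo, hi = min(anchored), max(anchored)
    anchor_set = {a for a, _ in pairs}
    return [g for g in ordered_genes[lo:hi + 1] if g not in anchor_set]
```

For example, if a block anchors geneA1, geneA2 and geneA4 on a chromosome whose genes run geneA1 to geneA5, genes_without_partner reports geneA3 as a candidate missing gene.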

The SkimGBS pipeline is available here: http://appliedbioinformatics.com.au/index.php/SkimGBS
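LASTZSorter.py, listed above, orders contigs by their alignment to a reference. One plausible way to do this is sketched below under stated assumptions; it is not taken from the script. It assumes LASTZ hits have already been parsed into (contig, reference chromosome, reference start, aligned length) records, assigns each contig to the chromosome with the most aligned bases, and sorts contigs by the start of their longest hit on that chromosome. The Hit record and order_contigs are hypothetical names, and strand is ignored for simplicity.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment of a contig to the reference (hypothetical record)."""
    contig: str
    ref_chrom: str
    ref_start: int
    aligned_bp: int


def order_contigs(hits: list[Hit]) -> dict[str, list[str]]:
    """Assign each contig to its best chromosome and order contigs along it."""
    # Total aligned bases per (contig, chromosome).
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for h in hits:
        totals[(h.contig, h.ref_chrom)] += h.aligned_bp

    # Best chromosome = the one with the most aligned bases for that contig.
    best_chrom: dict[str, str] = {}
    for (contig, chrom), bp in totals.items():
        if contig not in best_chrom or bp > totals[(contig, best_chrom[contig])]:
            best_chrom[contig] = chrom

    # Anchor each contig at its longest hit on the chosen chromosome.
    anchor: dict[str, Hit] = {}
    for h in hits:
        if h.ref_chrom != best_chrom[h.contig]:
            continue
        if h.contig not in anchor or h.aligned_bp > anchor[h.contig].aligned_bp:
            anchor[h.contig] = h

    # Sort by reference start and group contigs per chromosome.
    placed: dict[str, list[str]] = defaultdict(list)
    for contig in sorted(anchor, key=lambda c: anchor[c].ref_start):
        placed[best_chrom[contig]].append(contig)
    return dict(placed)
```

Summing aligned bases before choosing a chromosome keeps a single long spurious hit from outweighing many consistent shorter ones; a real implementation would also need to handle strand and contigs that span several regions.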


Tapidor genetic map from MSTMap (txt)

MSTMap_Input.zip - input file for MSTMap

Darmor SNPs anchored on Darmor v8.1 reference (gff3)

Tapidor SNPs anchored on Tapidor v6.3 reference (gff3)

Repetitive_Collapsed_Genes.zip - list of genes in repetitive and collapsed regions

Repetitive_Collapsed_Regions.zip - coordinates of repetitive and collapsed regions in Darmor and Tapidor (BED)

SwissProt_Pfam_hits_Repetitive_Collapsed_Genes.zip - Pfam and Swiss-Prot results for repetitive and collapsed genes
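Relating the gene annotation to the region coordinates above means mixing two coordinate conventions: GFF3 uses 1-based, inclusive coordinates, while BED uses 0-based, half-open ones, so a naive comparison is off by one at region starts. The sketch below shows an overlap test that converts BED regions first. It assumes genes and regions are already loaded as tuples; genes_in_regions is a hypothetical helper, not necessarily how the published gene list was produced.

```python
from collections import defaultdict


def genes_in_regions(genes, regions):
    """Return IDs of genes overlapping any region.

    genes:   (gene_id, chrom, start, end) in GFF3 1-based, inclusive coordinates
    regions: (chrom, start, end) in BED 0-based, half-open coordinates
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        # BED [start, end) covers 1-based bases start + 1 .. end.
        by_chrom[chrom].append((start + 1, end))

    hits = []
    for gene_id, chrom, g_start, g_end in genes:
        # Linear scan per gene; fine for a modest number of regions.
        if any(r_start <= g_end and g_start <= r_end
               for r_start, r_end in by_chrom.get(chrom, [])):
            hits.append(gene_id)
    return hits
```

For instance, a BED region "A01 99 200" covers bases 100 to 200, so a gene spanning 1 to 99 does not overlap it while a gene spanning 1 to 100 does.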


List of transposase-related Pfam IDs used for filtering

GO_Arabidopsis_Terms.zip - Swiss-Prot/Arabidopsis-based GO terms for the Darmor and Tapidor annotations

Tapidor v6.3

Tapidor_v63_assembly.fasta.gz - assembly as pseudomolecules

Tapidor_v6.3_contig_order.zip - contig positions in the assembly (GFF3)

Unfiltered annotation

Straight from AUGUSTUS, with MAKER's AED scores

Tapidor_v63_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz - annotation in GFF format

Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz - predicted proteins

Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz - predicted transcripts

Filtered annotation

No models with AED = 1, only transcripts longer than 100 bp, no models with transposase domains

Tapidor_v63_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz - filtered predicted annotation in GFF format

Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz - filtered predicted transcripts

Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz - filtered predicted proteins
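The three filters used for the filtered annotation (drop models with AED = 1, keep transcripts longer than 100 bp, drop models with transposase-related Pfam domains) combine into a single keep/drop predicate. This is a sketch only: it assumes each transcript has already been summarised as a record holding its AED score, length and Pfam hits, and the Transcript record and function names are hypothetical, not the published filtering code. MAKER's AED runs from 0 (full agreement with evidence) to 1 (no evidence support), which is why AED = 1 models are removed.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Per-transcript summary (hypothetical layout)."""
    transcript_id: str
    aed: float          # MAKER annotation edit distance, 0 (best) to 1 (no support)
    length_bp: int      # transcript length in base pairs
    pfam_ids: set[str] = field(default_factory=set)


def keep_transcript(t: Transcript, transposase_pfams: set[str]) -> bool:
    """Apply the three filters described for the filtered annotation."""
    return (
        t.aed < 1.0                                 # drop AED = 1 models
        and t.length_bp > 100                       # keep transcripts > 100 bp
        and not (t.pfam_ids & transposase_pfams)    # drop transposase domains
    )


def filter_transcripts(transcripts: list[Transcript],
                       transposase_pfams: set[str]) -> list[Transcript]:
    """Keep only transcripts that pass all three filters."""
    return [t for t in transcripts if keep_transcript(t, transposase_pfams)]
```

In practice transposase_pfams would be loaded from the list of transposase-related Pfam IDs provided above; the IDs used in any test here are placeholders.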

Darmor v8.1

WARNING - the Brassica community annotation standard numbers genes by their order on the pseudomolecules, and I've followed it here as well. Because we placed as many contigs as possible, the gene order shifted substantially compared with the v4.1 annotation, so you cannot simply look up the same gene numbers in both versions; use BLAST or a similar search to find your candidate genes.
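Since gene numbers cannot be carried over from v4.1, candidate genes have to be matched by sequence. Assuming the search was run with BLAST's tabular output (-outfmt 6, whose twelve standard columns end with evalue and bitscore), the sketch below keeps the highest-scoring v8.1 hit for each v4.1 query. best_hits is a hypothetical helper, and the best-bitscore rule is only one reasonable choice: in the allotetraploid B. napus, homeologous A- and C-genome copies can score almost equally, so close runners-up are worth checking by hand.

```python
import csv
from typing import Iterable


def best_hits(rows: Iterable[str]) -> dict[str, tuple[str, float]]:
    """Map each query ID to (subject ID, bitscore) of its highest-scoring hit.

    Expects BLAST tabular lines (-outfmt 6): qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    best: dict[str, tuple[str, float]] = {}
    for fields in csv.reader(rows, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best
```

Passing an open results file, e.g. best_hits(open("hits.tsv")), works because csv.reader accepts any iterable of lines; the file name here is only an example.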

Darmor_v81_assembly_fasta.gz - assembly as pseudomolecules

Darmor_v8.1_contig_order.zip - contig order in the assembly (GFF3 files)

Unfiltered annotation

Straight from AUGUSTUS, with MAKER's AED scores

Darmor_v81_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz - annotation in GFF format

Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz - predicted proteins (FASTA)

Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz - predicted transcripts (FASTA)

Filtered annotation

No models with AED = 1, only transcripts longer than 100 bp, no models with transposase domains

Darmor_v81_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz - filtered predicted annotation in GFF format

Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz - filtered predicted transcripts

Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz - filtered predicted proteins