The code and data provided on this page are described in the following papers:
Description | File(s) | Format | Comments |
---|---|---|---|
Code | |||
Archive | fpsac_1.0.tar.gz | Tarball (tar.gz) | Python and shell scripts |
Input Data | |||
Contigs | black_death_contigs_individual_8291.fa.gz | FASTA | Each contig is 20 bp longer than the length given in its contig id. |
Extant genomes | black_death_extant_genomes.fa.gz | FASTA | Names have been modified to avoid characters '.', ':' and '-'. Only chromosome sequences (no plasmid) are considered. |
Species tree | black_death_species_tree.nhx | NHX, augmented: the ancestral node of interest is marked by @ | Outgroups are not resolved (non-binary root). |
Intermediate results | |||
Megablast output | black_death_megablast_hits.txt | BLAST hit table | Obtained using NCBI Megablast with default parameters. |
Homologous marker families | black_death_homologous_markers_families.txt | family_header = family_id family_multiplicity; extant_occurrence_1 = extant_genome.Chr:start-end orientation(+/-) contig_id_and_length:start-end(,contig_id_and_length:start-end...); ...; extant_occurrence_k = extant_genome.Chr:start-end orientation(+/-) contig_id_and_length:start-end(,contig_id_and_length:start-end...) | Family 13 was excluded from further analysis. |
Adjacencies and repeat spanning intervals | black_death_adjacencies.txt black_death_repeat_spanning_intervals.txt | ANGES format augmented to include gap coordinates. Each adjacency or common interval (character) is represented by a single row in the file: character_id\|phylogenetic_weight;list_of_species_containing_character:list_of_markers_in_character list_of_gaps_coordinates | Markers were doubled to account for orientation: family X induces two families, with respective ids 2X (for the head of the marker) and 2X-1 (for the tail of the marker). Adjacencies (2X,2X-1) are weighted by 10000 to ensure markers are properly reconstructed. See there for further details. |
Selected subset of adjacencies of maximum weight and compatible with a circular structure | Kept adjacencies: black_death_kept_adjacencies.txt; Discarded adjacencies: black_death_discarded_adjacencies.txt | ANGES format augmented to include gap coordinates, as above. | Algorithm: Linearization of ancestral multichromosomal genomes. Markers are still doubled, as above. |
Selected subset of repeat spanning intervals of maximum weight and compatible with the selected adjacencies | Kept intervals: black_death_kept_repeat_spanning_intervals.txt; Discarded intervals: black_death_discarded_repeat_spanning_intervals.txt | ANGES format augmented to include gap coordinates, as above. | Markers are still doubled, as above. |
Markers circular order (without outgroup adjacencies) | black_death_markers_order.txt | | Undoubled markers. |
Outgroup-supported adjacencies | black_death_outgroup_adjacencies.txt | | |
Ancestral gaps | black_death_gaps.txt | | Undoubled markers. |
Extant gaps alignments | black_death_gaps_alignments.tar.gz | Default Muscle output: FASTA (see Muscle manual) | Computed using Muscle 3.8.31 |
Final results | |||
Ancestral genome: DNA sequence | black_death_DNA_sequence.fa.gz | | Gaps for outgroup adjacencies are replaced by a sequence of 50 Ns. |
Ancestral genome: sequence map with extant annotations | black_death_ancestral_sequence_map | | |
Annotation | black_death_ancestral_sequence_annotation_Basys.gbk | | Obtained with Basys. |
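
The marker doubling convention and the adjacency row layout described in the table can be illustrated with a short Python sketch. It is not taken from the fpsac archive: the function and class names are hypothetical, and the sample row in the test is invented. Only the outer delimiters named in the table (`|`, `;` and the first `:`) are parsed. The inner layout of the species list, marker list and gap coordinates is not assumed, so those fields are returned as raw strings.

```python
from typing import NamedTuple, Tuple

# Weight given to (2X, 2X-1) adjacencies, as stated in the table.
MARKER_ADJACENCY_WEIGHT = 10000


def double_marker(family_id: int) -> Tuple[int, int]:
    """Return (head_id, tail_id) for family X: the head is 2X, the tail is 2X-1."""
    return 2 * family_id, 2 * family_id - 1


def undouble_marker(doubled_id: int) -> Tuple[int, str]:
    """Map a doubled id back to (family_id, 'head' or 'tail')."""
    family_id = (doubled_id + 1) // 2
    extremity = "head" if doubled_id % 2 == 0 else "tail"
    return family_id, extremity


def is_marker_adjacency(a: int, b: int) -> bool:
    """True when a and b are the two extremities of the same marker."""
    return a != b and undouble_marker(a)[0] == undouble_marker(b)[0]


class Character(NamedTuple):
    character_id: str
    weight: float
    species: str  # raw species list; inner separator not assumed
    content: str  # raw markers and gap coordinates; layout not assumed


def parse_character_row(row: str) -> Character:
    """Split one row of the form id|weight;species:markers_and_gaps.

    Assumption: the first ':' in the row ends the species list, which
    relies on species names containing no ':' character.
    """
    head, content = row.rstrip("\n").split(":", 1)
    character_id, rest = head.split("|", 1)
    weight, species = rest.split(";", 1)
    return Character(character_id, float(weight), species, content)
```

For example, family 13 would induce doubled ids 26 (head) and 25 (tail), and `undouble_marker` recovers the family from either id, which is the step needed to go from the doubled adjacency files to the undoubled marker order.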