today has been another pretty good day! this morning, brad had left three comments on github, and i was able to incorporate all three suggestions relatively easily. these included getting the path to the folder of loci from the command line, stripping newlines from a list, and getting the length of a sequence object by accessing the first element of the list (something i banged my head against for about an hour yesterday).
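for anyone following along, here is a rough sketch of what those three suggestions amount to. it is not the actual code from my script: the function names are made up for illustration, and it uses plain strings where the real script works with sequence objects.

```python
def get_folder(argv):
    """Return the folder of loci passed as the first command-line argument.

    `argv` is expected to look like sys.argv (script name first).
    Hypothetical helper, named here only for illustration.
    """
    if len(argv) < 2:
        raise SystemExit("usage: inputLocusMDB.py <folder_of_loci>")
    return argv[1]


def strip_newlines(lines):
    """Strip the trailing newline from every item in a list of lines."""
    return [line.rstrip("\n") for line in lines]


def alignment_length(sequences):
    """Return the alignment length.

    In an aligned file every sequence has the same length, so looking
    at the first element of the list is enough.
    """
    return len(sequences[0])
```

in a script this would be called with something like get_folder(sys.argv) after an import sys at the top; the names file would go through strip_newlines right after readlines().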
after that, i switched my efforts to including a "SNP" category in the database. i found a python script online that tallies up variable sites from an aligned fasta file (seqlite.py, written by steve haddock); i made some modifications to the script and imported it as a function into my inputLocusMDB.py script. shockingly enough, it worked! so now each locus has its own document in a mongoDB collection, and each document contains the following fields: _id, locus, length, SNPs, path, individuals[]. script output (to the screen) looks like this:
-----------------------------------------------------------------
Got this folder: fasta
locus = RAILmatic_1_aln ; number alleles = 36 ; length = 301 ; path = fasta/RAILmatic_1_aln.fasta ; SNPs = 7
locus = RAILmatic_2_aln ; number alleles = 34 ; length = 287 ; path = fasta/RAILmatic_2_aln.fasta ; SNPs = 10
locus = RAILmatic_3_aln ; number alleles = 34 ; length = 322 ; path = fasta/RAILmatic_3_aln.fasta ; SNPs = 12
locus = RAILmatic_4_aln ; number alleles = 24 ; length = 330 ; path = fasta/RAILmatic_4_aln.fasta ; SNPs = 3
locus = RAILmatic_5_aln ; number alleles = 40 ; length = 276 ; path = fasta/RAILmatic_5_aln.fasta ; SNPs = 5
locus = RAILmatic_6_aln ; number alleles = 40 ; length = 323 ; path = fasta/RAILmatic_6_aln.fasta ; SNPs = 11
locus = RAILmatic_7_aln ; number alleles = 30 ; length = 365 ; path = fasta/RAILmatic_7_aln.fasta ; SNPs = 12
locus = RAILmatic_8_aln ; number alleles = 14 ; length = 317 ; path = fasta/RAILmatic_8_aln.fasta ; SNPs = 2
total of 8 loci
...reading the names file into a list...
name: R01 found in: 8 loci
name: R02 found in: 5 loci
name: R03 found in: 5 loci
name: R04 found in: 6 loci
name: R05 found in: 7 loci
name: R06 found in: 7 loci
name: R07 found in: 8 loci
name: R08 found in: 8 loci
name: R09 found in: 4 loci
name: R10 found in: 4 loci
name: R11 found in: 7 loci
name: R12 found in: 7 loci
name: R13 found in: 7 loci
name: R14 found in: 8 loci
name: R15 found in: 6 loci
name: R16 found in: 4 loci
name: R17 found in: 6 loci
name: R18 found in: 7 loci
name: R19 found in: 7 loci
name: R20 found in: 5 loci
-----------------------------------------------------------------
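to make the document layout concrete, here is a minimal sketch of how a variable-site count and a per-locus document could be put together. this is not seqlite.py or my modified version of it: the function names, the choice to skip gaps and ambiguous characters, and the idea that individuals[] holds plain sample names are all assumptions for illustration.

```python
def count_variable_sites(sequences, ignore="-N?"):
    """Count alignment columns that contain more than one distinct base.

    Characters in `ignore` (gaps, N, ?) are skipped; that choice is an
    assumption, not necessarily what seqlite.py does.
    """
    variable = 0
    for column in zip(*sequences):
        bases = {base.upper() for base in column} - set(ignore)
        if len(bases) > 1:
            variable += 1
    return variable


def build_locus_document(locus, path, sequences_by_name):
    """Build a dict with the fields listed above for one locus.

    _id is left out on purpose: MongoDB generates one on insert when
    the document does not supply it.
    """
    sequences = list(sequences_by_name.values())
    return {
        "locus": locus,
        "length": len(sequences[0]),  # aligned, so the first one is enough
        "SNPs": count_variable_sites(sequences),
        "path": path,
        "individuals": sorted(sequences_by_name),
    }


def loci_per_name(names, documents):
    """For each name, count how many locus documents list it."""
    return {
        name: sum(name in doc["individuals"] for doc in documents)
        for name in names
    }


# toy alignment, made up for this example (not real data)
toy = {"R01": "ACGTAC", "R02": "ACGTTC", "R03": "AC-TAC"}
doc = build_locus_document("toy_locus", "fasta/toy_locus.fasta", toy)
found = loci_per_name(["R01", "R04"], [doc])
```

with pymongo, a dict like doc could then be stored with collection.insert_one(doc), where collection is whatever collection object the script already has open.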
i had some trouble yesterday figuring out how to commit my code to github, so making sure i have a functioning code-committing protocol is next on my to-do list. i'm also going to think hard about whether the database is set up in the most functional way and how the metadata from the SAM/BAM files will be incorporated once i start on that.