Wednesday, May 25, 2011

day 2: SNP#/locus now in database!

today has been another pretty good day! brad left three comments on github overnight, and this morning i was able to incorporate all three suggestions relatively easily: getting the path to the folder of loci from the command line, stripping newlines from a list, and getting the length of a sequence object by accessing the first element of the list (something i banged my head against for about an hour yesterday).
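for anyone curious, here is a minimal sketch of what those three suggestions might look like in plain python. this is not the code from the repository; the function names (folder_from_argv, strip_newlines, alignment_length) are hypothetical, and the sequences are assumed to be plain strings rather than whatever sequence objects the real script uses.

```python
def folder_from_argv(argv):
    """Return the folder of loci given as the first command-line argument.

    Meant to be called as folder_from_argv(sys.argv) after `import sys`.
    """
    if len(argv) < 2:
        raise SystemExit("usage: python script.py <folder_of_loci>")
    return argv[1]


def strip_newlines(lines):
    """Return a new list with trailing newline characters removed."""
    return [line.rstrip("\n") for line in lines]


def alignment_length(sequences):
    """Length of an alignment, read off its first sequence.

    In an aligned file every sequence has the same length, so the
    first element of the list is enough.
    """
    return len(sequences[0])


# toy values, not real data
names = strip_newlines(["R01\n", "R02\n", "R03\n"])
length = alignment_length(["ACGTAC", "ACGTAT"])
folder = folder_from_argv(["script.py", "fasta"])
```

the point of the last function is that there is no need to loop over the whole list just to learn how long the alignment is.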

after that, i switched my efforts to including a "SNP" category in the database. i found a python script online that tallies up variable sites from an aligned fasta file (seqlite.py, written by steve haddock); i made some modifications to the script and now import it as a function into my inputLocusMDB.py script. shockingly enough, it worked! so now each locus has its own document in a mongoDB collection, and each document contains the following fields: _id, locus, length, SNPs, path, individuals[]. script output (to the screen) looks like this:

-----------------------------------------------------------------

Got this folder: fasta
locus =  RAILmatic_1_aln ; number alleles =  36 ; length =  301 ; path =  fasta/RAILmatic_1_aln.fasta ; SNPs =  7
locus =  RAILmatic_2_aln ; number alleles =  34 ; length =  287 ; path =  fasta/RAILmatic_2_aln.fasta ; SNPs =  10
locus =  RAILmatic_3_aln ; number alleles =  34 ; length =  322 ; path =  fasta/RAILmatic_3_aln.fasta ; SNPs =  12
locus =  RAILmatic_4_aln ; number alleles =  24 ; length =  330 ; path =  fasta/RAILmatic_4_aln.fasta ; SNPs =  3
locus =  RAILmatic_5_aln ; number alleles =  40 ; length =  276 ; path =  fasta/RAILmatic_5_aln.fasta ; SNPs =  5
locus =  RAILmatic_6_aln ; number alleles =  40 ; length =  323 ; path =  fasta/RAILmatic_6_aln.fasta ; SNPs =  11
locus =  RAILmatic_7_aln ; number alleles =  30 ; length =  365 ; path =  fasta/RAILmatic_7_aln.fasta ; SNPs =  12
locus =  RAILmatic_8_aln ; number alleles =  14 ; length =  317 ; path =  fasta/RAILmatic_8_aln.fasta ; SNPs =  2

total of  8 loci

...reading the names file into a list...

name:  R01 found in:  8 loci
name:  R02 found in:  5 loci
name:  R03 found in:  5 loci
name:  R04 found in:  6 loci
name:  R05 found in:  7 loci
name:  R06 found in:  7 loci
name:  R07 found in:  8 loci
name:  R08 found in:  8 loci
name:  R09 found in:  4 loci
name:  R10 found in:  4 loci
name:  R11 found in:  7 loci
name:  R12 found in:  7 loci
name:  R13 found in:  7 loci
name:  R14 found in:  8 loci
name:  R15 found in:  6 loci
name:  R16 found in:  4 loci
name:  R17 found in:  6 loci
name:  R18 found in:  7 loci
name:  R19 found in:  7 loci
name:  R20 found in:  5 loci

-----------------------------------------------------------------
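to make the idea concrete, here is a rough sketch of how variable sites could be tallied from an aligned fasta file and packed into a document with the fields listed above. it is not seqlite.py and not inputLocusMDB.py: every function name is hypothetical, the fasta headers are assumed to be the individual names, and gaps and ambiguous bases ("-", "N", "?") are assumed to be ignored, which may not match how the real script counts. the _id field is left out because mongoDB adds one when a document is inserted without it.

```python
def parse_fasta(text):
    """Parse fasta-formatted text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def count_variable_sites(seqs, ignore="-N?"):
    """Count alignment columns holding more than one distinct base."""
    count = 0
    for column in zip(*seqs):
        bases = set(column) - set(ignore)  # assumption: skip gaps/ambiguity
        if len(bases) > 1:
            count += 1
    return count


def locus_document(locus, path, records):
    """Build a dict shaped like one locus document."""
    seqs = list(records.values())
    return {
        "locus": locus,
        "length": len(seqs[0]),
        "SNPs": count_variable_sites(seqs),
        "path": path,
        "individuals": sorted(records),
    }


def loci_per_name(names, documents):
    """For each name, count the locus documents that list it."""
    return {
        name: sum(name in doc["individuals"] for doc in documents)
        for name in names
    }


# toy alignment, not real data
example = ">R01\nACGTAC\n>R02\nACGTAT\n>R03\nAC-TAC\n"
doc = locus_document("toy_locus", "fasta/toy_locus.fasta", parse_fasta(example))
tally = loci_per_name(["R01", "R04"], [doc])
```

a dict like doc is the kind of thing that would be handed to the mongoDB driver for insertion, and loci_per_name mirrors the "found in: N loci" lines in the output above.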

i had some trouble yesterday figuring out how to commit my code to github, so making sure i have a functioning code-committing protocol is next on my to-do list. i'm also going to think hard about whether the database is set up in the most functional way and how the metadata from the SAM/BAM files will be incorporated once i start on that.
