Tuesday, May 24, 2011

knock on wood

coding has been going relatively well this morning/afternoon (knock on wood). starting with a folder of multi-fasta locus files, i've been able to read each file into a SeqIO.index and store the locus name, plus an array of the individuals represented in that locus, in a MongoDB collection. i've also been able to output the length of each locus and how many of the loci each individual has. for my first python script ever, i don't feel too bad about it. here's the screen output when i run the script on a folder with 5 loci and ≤ 20 individuals (R01..R20; up to 40 alleles per locus, since these guys are diploid):

-----------------------------------------------------------------
locus =  RAILmatic_1 ; number alleles =  34 length =  301
locus =  RAILmatic_2 ; number alleles =  32 length =  287
locus =  RAILmatic_3 ; number alleles =  30 length =  322
locus =  RAILmatic_4 ; number alleles =  12 length =  330
locus =  RAILmatic_5 ; number alleles =  40 length =  274
total of  5 loci

...reading the names file into a list...

name:  R01 found in:  5 loci
name:  R02 found in:  3 loci
name:  R03 found in:  3 loci
name:  R04 found in:  4 loci
name:  R05 found in:  4 loci
name:  R06 found in:  5 loci
name:  R07 found in:  5 loci
name:  R08 found in:  4 loci
name:  R09 found in:  2 loci
name:  R10 found in:  2 loci
name:  R11 found in:  3 loci
name:  R12 found in:  5 loci
name:  R13 found in:  4 loci
name:  R14 found in:  5 loci
name:  R15 found in:  4 loci
name:  R16 found in:  2 loci
name:  R17 found in:  4 loci
name:  R18 found in:  4 loci
name:  R19 found in:  4 loci
name:  R20 found in:  2 loci

-----------------------------------------------------------------
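for anyone curious how a tally like the one above can be put together, here's a minimal sketch (not my actual script). it assumes each locus is available as a mapping of allele ID to sequence, which is roughly what the dict-like object from SeqIO.index gives you; the function names, the example locus names, and the assumption that all sequences in a locus have the same length are made up for illustration.

```python
from collections import Counter


def summarize_locus(records):
    """Return (allele_ids, length) for one locus.

    `records` is any mapping of allele ID -> sequence: a plain dict here,
    or (assumption) the dict-like object returned by Bio.SeqIO.index.
    """
    allele_ids = sorted(records)
    # Assumes every sequence in the locus has the same length,
    # so the first record is taken as representative.
    length = len(records[allele_ids[0]]) if allele_ids else 0
    return allele_ids, length


def loci_per_individual(loci):
    """Count how many loci each individual appears in.

    `loci` maps locus name -> list of allele IDs such as "R07.01".
    """
    counts = Counter()
    for allele_ids in loci.values():
        # Drop the ".01"/".02" allele suffix; using a set means a diploid
        # individual is counted once per locus, not once per allele.
        individuals = {allele_id.split(".")[0] for allele_id in allele_ids}
        counts.update(individuals)
    return counts


if __name__ == "__main__":
    # Hypothetical toy data, not the real RAILmatic loci.
    loci = {
        "locus_A": ["R01.01", "R01.02", "R07.01"],
        "locus_B": ["R07.01", "R07.02"],
    }
    counts = loci_per_individual(loci)
    for name in sorted(counts):
        print("name: ", name, "found in: ", counts[name], "loci")
```

keeping the counting separate from the file reading means the same function works whether the allele IDs come straight from the fasta files or from the MongoDB documents.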
the MongoDB collection has a document like this for each locus:

{
    "_id" : ObjectId("4ddc0f9a10d1b1a91c000003"),
    "length" : 330,
    "individuals" : [
        "R07.02",
        "R07.01",
        "R14.02",
        "R14.01",
        "R12.01",
        "R06.01",
        "R06.02",
        "R12.02",
        "R01.01",
        "R01.02",
        "R19.01",
        "R19.02"
    ],
    "locus" : "RAILmatic_4.fasta"
}

-----------------------------------------------------------------
(the .01 and .02 refer to the 2 alleles each individual has)
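a document with that shape could be built and saved along these lines. this is a sketch under assumptions rather than the code i ran: the helper names are hypothetical, and `collection` is assumed to be a pymongo Collection (anything with an `insert_one` method will do). MongoDB adds the "_id" field itself on insert.

```python
def build_locus_document(locus_file, allele_ids, length):
    """Build a dict with the same fields as the locus document shown above."""
    return {
        "locus": locus_file,
        "individuals": list(allele_ids),
        "length": length,
    }


def save_locus(collection, locus_file, allele_ids, length):
    """Insert one locus document and return the ID the database assigned.

    `collection` is assumed to be a pymongo Collection.
    """
    doc = build_locus_document(locus_file, allele_ids, length)
    result = collection.insert_one(doc)
    return result.inserted_id


# Usage against a local MongoDB server (database and collection names are made up):
#
#   from pymongo import MongoClient
#   loci = MongoClient()["hypothetical_db"]["loci"]
#   save_locus(loci, "RAILmatic_4.fasta", ["R07.01", "R07.02"], 330)
```

building the dict in its own function makes it easy to check the document's shape without a database running.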

next tasks = commit code to GitHub (not that it's particularly special, but want to get into the habit); work on updating documents in MongoDB; import SAM files and save metadata in appropriate MongoDB document

1 comment:

  1. Sarah -- great stuff. Really happy you are getting into this so quickly. I wrote a few notes with practical things on your GitHub commit:

    Commit notes
