Monday, May 30, 2011

bam read counts

i've been thinking about organization and the database a lot and i think i've come up with a working plan for not only getting the data i want from the sam/bam files, but putting it in the right place for future access. i've also been reading/researching how to query and update mongoDB and utilize pysam and pymongo, so that implementation of this plan is as smooth as possible.


PLAN OF ATTACK:


  • open BAMFILE (one per individual)
    • query mongoDB for all documents that contain BAMFILE in "individuals" value array
    • foreach mongoDB document returned
      • query BAMFILE for locus (samfile.fetch)
      • count reads that align to locus 
      • push count to Individual in locus in mongoDB
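
a minimal sketch of the plan above, not a tested implementation. it assumes a current pysam (`AlignmentFile`) and pymongo (`update_one`), an indexed BAM file, and that each locus name (e.g. "loc010") is also a reference sequence name in the BAM header. the database and collection names ("rad", "loci") and both function names are hypothetical. the third-party imports sit inside main() only so that count_reads() can be tried out with stand-in objects.

```python
def count_reads(samfile, loci, individual):
    """Store per-locus read counts for one individual.

    samfile    -- an open, indexed BAM file (pysam.AlignmentFile)
    loci       -- a pymongo collection holding the locus documents
    individual -- the ID used in each document's "individuals" array
    Returns the number of locus documents that were updated.
    """
    # Fetch all matching documents up front so the updates below
    # cannot interfere with an open cursor.
    docs = list(loci.find({"individuals": individual}))
    for doc in docs:
        # Assumption: the locus name doubles as the BAM reference name.
        n_reads = sum(1 for _ in samfile.fetch(doc["locus"]))
        # The positional "$" targets the array element matched by the
        # filter, replacing e.g. "R01" with {"R01": 3}.
        loci.update_one(
            {"_id": doc["_id"], "individuals": individual},
            {"$set": {"individuals.$": {individual: n_reads}}},
        )
    return len(docs)


def main(bam_path, individual):
    import pysam
    from pymongo import MongoClient

    # Hypothetical database and collection names.
    loci = MongoClient()["rad"]["loci"]
    with pysam.AlignmentFile(bam_path, "rb") as samfile:
        count_reads(samfile, loci, individual)
```

something like `main("R01.bam", "R01")` would then be called once per individual. because each update only replaces that individual's own array element, the other plain-string entries still match the array query when their BAM files are processed later.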


i.e., a document that looks like this:
{
"locus" : "loc010",
"individuals" : [
"R01",
"R02",
"R03"
   ],
"length" : 332
}

will be:
{
"locus" : "loc010",
"individuals" : [
{ "R01" : 3 },
{ "R02" : 15 },
{ "R03" : 6 }
   ],
"length" : 332
}

thus, each element of the individuals array is now a key/value pair. (right?) now i just need to figure out how to do each step. it's going to be a fun week!

