i've been thinking about organization and the database a lot and i think i've come up with a working plan for not only getting the data i want from the sam/bam files, but putting it in the right place for future access. i've also been reading/researching how to query and update mongoDB and utilize pysam and pymongo, so that implementation of this plan is as smooth as possible.
PLAN OF ATTACK:
- open BAMFILE (one per individual)
- query mongoDB for all documents that contain BAMFILE's individual in the "individuals" array
- foreach mongoDB document returned:
    - query BAMFILE for the locus (samfile.fetch)
    - count the reads that align to the locus
    - push the count to that individual in the locus's mongoDB document
i.e., a document that looks like this:
{
    "locus" : "loc010",
    "individuals" : [
        "R01",
        "R02",
        "R03"
    ],
    "length" : 332
}
will be:
{
    "locus" : "loc010",
    "individuals" : [
        { "R01" : 3 },
        { "R02" : 15 },
        { "R03" : 6 }
    ],
    "length" : 332
}
thus, each element of the individuals array is now a key/value pair, wrapped in its own braces so the array stays valid (a bare key/value pair can't sit directly inside an array). now i just need to figure out how to do each step. it's going to be a fun week!
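here's a rough first sketch of how the steps in the plan might fit together in python. it is untested against real data and rests on a few assumptions: the BAM file is sorted and indexed, each locus name (e.g. "loc010") is a reference sequence name in the BAM header, and each count gets stored as a one-key subdocument in place of the individual's name. the function names, and the database and collection names "rad" and "loci", are made up for illustration.

```python
def count_reads(bam, locus):
    """Count the reads aligned to `locus` by iterating over bam.fetch(locus)."""
    return sum(1 for _ in bam.fetch(locus))


def push_counts(collection, bam, individual):
    """For every locus document listing `individual`, store its read count.

    Returns the number of documents updated.
    """
    updated = 0
    # list() so the query results are fully read before any document changes
    for doc in list(collection.find({"individuals": individual})):
        n = count_reads(bam, doc["locus"])
        # "individuals.$" refers to the array element matched by the filter,
        # so the plain string "R01" is replaced by the subdocument {"R01": n}
        collection.update_one(
            {"_id": doc["_id"], "individuals": individual},
            {"$set": {"individuals.$": {individual: n}}},
        )
        updated += 1
    return updated


def run(bam_path, individual):
    """Wire the steps to a real BAM file and a local mongod (assumed setup)."""
    import pysam
    from pymongo import MongoClient

    loci = MongoClient()["rad"]["loci"]  # hypothetical database/collection
    bam = pysam.AlignmentFile(bam_path, "rb")  # fetch() needs a BAM index
    try:
        return push_counts(loci, bam, individual)
    finally:
        bam.close()
```

something like `run("R01.bam", "R01")` would then be called once per individual. one side effect of this layout: once "R01" has been replaced by `{"R01": 3}`, the query for the plain string "R01" no longer matches that document, so a second pass skips loci that have already been counted.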