Tuesday, June 7, 2011

brainstorm: counts

this morning i had a couple of brainstorms: storing the total number of (high quality) reads an individual has (i.e., present in the sam/bam file), storing the number of loci an individual has (i.e., how many locus files each individual is present in), and storing the bam directory path so that the user won't have to enter it more than once. so i've updated the affected scripts, and i'll commit them later today. (i also incorporated suggestions jeremy made and am in the process of adding more comments throughout the scripts.)
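
as a rough illustration of the loci count, here is a minimal sketch (not the actual lociNGS code). it assumes each locus document carries an `individuals` mapping keyed by individual name, and that the extra bam path document has no such field. the function name, the `bamPath` key and the sample data are made up for the example; with pymongo the documents would presumably come from something like `db.loci.find()`.

```python
from collections import Counter


def count_loci_per_individual(loci_docs):
    """Count how many locus documents each individual appears in."""
    counts = Counter()
    for doc in loci_docs:
        # Documents without an "individuals" field (e.g. the bam path
        # document) are skipped rather than treated as loci.
        individuals = doc.get("individuals")
        if not individuals:
            continue
        # Each locus counts once per individual, however many entries
        # that individual has at the locus.
        counts.update(individuals.keys())
    return dict(counts)


# Hypothetical sample data shaped like the loci collection.
sample_loci = [
    {"locusNumber": "719", "individuals": {"J01": 3, "J02": 2, "J19": 6}},
    {"locusNumber": "720", "individuals": {"J01": 1, "J19": 2}},
    {"bamPath": "/path/to/bam"},  # assumed shape of the path document
]

loci_counts = count_loci_per_individual(sample_loci)
print(loci_counts)
```

the resulting per-individual counts are the sort of value that could be written into each demographic document as `numLoci`.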

db.demographic has as many documents as there are individuals in the dataset and they now all look something like this:

{  "_id" : ObjectId("4dee6475318a120c74000000"),
 "Individual" : "J01",
 "Latitude" : "45.678",
 "Location" : "NoPlace, TX",
 "Longitude" : "-109.876",
 "Population" : "POP1",
 "Species" : "Junco hyemalis",
 "numLoci" : 148,
 "totalReads" : "6281" }

db.loci has as many documents as there are loci (plus one for the bamPath document, which contains just the path to the bam directory). they now all look something like this:

{  "_id" : ObjectId("4dee6482318a120c78000295"),
  "SNPs" : 10,
  "indInFasta" : [ "J01",
   "J19" ],
 "individuals" : { "J01" : 3,
  "J02" : 2,
  "J03" : 1,
  "J04" : 2,
  "J05" : 2,
  "J06" : 1,
  "J11" : 1,
  "J12" : 2,
  "J13" : 2,
  "J14" : 2,
  "J15" : 1,
  "J17" : 3,
  "J19" : 6 },
 "length" : 266,
 "locusFasta" : "JUNCOmatic_719_aln.fasta",
 "locusNumber" : "719",
 "path" : "/Users/shird/Documents/Dropbox/lociNGS1/juncoLoci/JUNCOmatic_719_aln.fasta" }

i'm quite satisfied with the state of the database right now, but that might be because i haven't done much with it. i've gotten reasonably good at adding/updating things, but haven't really retrieved/displayed much. i am a little scared about how well i'll be able to incorporate database info into the GUI and what's going to happen when i apply these scripts to datasets i haven't personally created. that's next, i suppose. ONWARD!
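
one low-stakes way to start on retrieval is to dump a few fields to the terminal before any GUI exists. the sketch below is only an illustration: it works on plain dictionaries shaped like the demographic documents, the function name is made up, and the J02 row is invented sample data. with pymongo the input would presumably be the result of `db.demographic.find()`.

```python
def format_demographic_table(docs):
    """Return a plain-text table of individuals, sorted by read count."""
    header = (
        f"{'Individual':<12}{'Population':<12}"
        f"{'numLoci':>8}{'totalReads':>12}"
    )
    # totalReads is stored as a string in the example document, so it is
    # converted to int here; sorting the raw strings would put "6281"
    # ahead of "10450".
    rows = sorted(docs, key=lambda d: int(d["totalReads"]), reverse=True)
    lines = [header]
    for d in rows:
        lines.append(
            f"{d['Individual']:<12}{d['Population']:<12}"
            f"{d['numLoci']:>8}{int(d['totalReads']):>12}"
        )
    return "\n".join(lines)


# Hypothetical sample data; only J01 mirrors the document shown above.
sample_docs = [
    {"Individual": "J01", "Population": "POP1",
     "numLoci": 148, "totalReads": "6281"},
    {"Individual": "J02", "Population": "POP1",
     "numLoci": 120, "totalReads": "10450"},
]

table = format_demographic_table(sample_docs)
print(table)
```

if the string-to-int conversion turns out to be needed in many places, that may be a hint to store `totalReads` as a number, the way `numLoci` already is.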

1 comment:

  1. Sarah;
    This is all looking great. One suggestion on the retrieval conundrum -- you might want to play with some scripts to pull out the information and just dump it to a terminal in a useful fashion that resembles what you plan to do in a more graphical/friendly interface. That could help you determine if your data model is as expected and works well for the types of queries you'll be trying to do. Great work so far,