regarding brad's advice that i should try to get desired output onto the terminal screen: i have made some progress and indeed feel better about the GUI-filled future. (cue homer simpson: "mmmm. gooey-filled future".)
outputting the demographic data in the db.demographic collection was easy. this data is strictly {key:value} pairs, so nothing fancy. with a new script, displayMDBdata.py, i can get out all the data that i put in and in the original format (which sounds slightly circular but is good). the beautiful script:
the beautiful output:
Setup cursor: <pymongo.cursor.Cursor object at 0x10155c250>
Individual: J01 , Location: NoPlace, TX , Total Number of Reads: 6281 , Number of Loci: 148
Individual: J02 , Location: NoPlace, TX , Total Number of Reads: 5570 , Number of Loci: 126
...etc...
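for anyone following along, here is a minimal sketch of the kind of loop that could produce lines like the ones above. it is not displayMDBdata.py itself: the key names are assumed to match the labels in the output, the function names are made up for illustration, and a plain list of dicts stands in for the cursor returned by find().

```python
def format_demographic(doc):
    """Build one output line from a demographic document (a plain dict).

    The key names are assumptions that mirror the labels in the output.
    """
    return ("Individual: {0} , Location: {1} , Total Number of Reads: {2} , "
            "Number of Loci: {3}").format(
        doc["Individual"], doc["Location"],
        doc["Total Number of Reads"], doc["Number of Loci"])


def display_demographics(cursor):
    """Print one line per document and return the lines.

    Any iterable of dicts works here; a pymongo cursor can be looped
    over directly in the same way.
    """
    lines = [format_demographic(doc) for doc in cursor]
    for line in lines:
        print(line)
    return lines


# Stand-in for db.demographic.find(); with a live database the cursor
# would be passed in instead of this list.
sample_docs = [
    {"Individual": "J01", "Location": "NoPlace, TX",
     "Total Number of Reads": 6281, "Number of Loci": 148},
]
display_demographics(sample_docs)
```

since the documents are flat {key:value} pairs, each value can go straight into the format string with no extra unpacking.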
outputting the locus information has revealed an intellectual hurdle, regarding cursors, that i failed to jump over (a la america's funniest home videos). i tried the formula that worked for outputting the demographic data: set up a cursor and print the value for each key. but since the list of individuals within each fasta locus is stored as an array, and the counts associated with each individual within a locus are stored as an embedded document, the not-as-beautiful output looks like this:
Locus: JUNCOmatic_100_aln.fasta , Individuals Present: [u'J09', u'J08', u'J10', u'J18', u'J19', u'J01', u'J17', u'J03', u'J06', u'J07', u'J11'] , Individuals with Read Counts: {u'J15': 1, u'J12': 2, u'J20': 2, u'J14': 5, u'J13': 2, u'J09': 10, u'J08': 4, u'J16': 1, u'J10': 8, u'J18': 9, u'J19': 4, u'J01': 4, u'J17': 3, u'J03': 5, u'J11': 3, u'J05': 1, u'J04': 2, u'J07': 4, u'J06': 8}
i don't know what i was expecting, but that isn't it. the u'data' format confuses me (unicode?) and i haven't been able to wrap my brain around json-formatted data and translating it with python (even with a module called simplejson!).
so near as i can tell, the "Individuals Present" array and "Individuals with Read Counts" documents are not "cursored" - whatever dictionary-creating magic happens with saving the results of the find() as a cursor doesn't apply to embedded data. something tells me this should be an easy fix, it's just a matter of finding it. so that's up next...
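one possible direction, sketched under assumptions: the array comes back as an ordinary python list and the embedded document as an ordinary dict, so each can be unpacked by hand before printing. the "Locus" key name and the format_locus function are hypothetical, the sample document is a trimmed-down version of the locus above, and none of this has been run against the real collection.

```python
def format_locus(doc):
    """Flatten one locus document into a single readable line.

    Assumes "Individuals Present" is a list of strings and
    "Individuals with Read Counts" is a dict of name -> count.
    """
    # Joining the list prints each name as plain text instead of
    # showing the quoted representation of the whole list.
    present = ", ".join(doc["Individuals Present"])

    # Walk the embedded document pair by pair, sorted by name so the
    # order is predictable.
    counts = ", ".join(
        "{0}={1}".format(name, n)
        for name, n in sorted(doc["Individuals with Read Counts"].items()))

    return "Locus: {0} , Individuals Present: {1} , Read Counts: {2}".format(
        doc["Locus"], present, counts)


# Trimmed sample standing in for one document from the locus collection.
sample_locus = {
    "Locus": "JUNCOmatic_100_aln.fasta",
    "Individuals Present": ["J09", "J08", "J10"],
    "Individuals with Read Counts": {"J09": 10, "J08": 4, "J10": 8},
}
print(format_locus(sample_locus))
```

printing a whole list or dict shows the representation of every element (which is where the quotes and the u prefix come from), while printing the strings one at a time shows just the text.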
Sarah;
Great work -- glad you are making some progress.
In the code you posted, you can just do 'for x in cursor' and avoid the 'iter()' function entirely.
For your locus information, could you push the code to GitHub? What it looks like is that the individual is a dictionary, and you are printing that out whole. So you should be able to get individual items with locus["Individuals Present"]. The u'data' is unicode, as you suspected, and totally normal. If you print out an individual item (locus["Individuals Present"][0]), the unicode part will "go away" since it gets converted into a nice string to display in the terminal.
Let us know if you get stuck with anything, happy to help with more specifics after taking a look at the code,
Brad