Visualizing bacteria described and named in the last century
http://home.utah.edu/~ks42/Visualization_assignment/nomenclature.html
download code
Instructions for the visualization.
Either use the link above to view the visualization on the web, or download and unpack the code into a directory.
The code files I wrote
bubble_nomenclature.js
timeline.js
nomenclature.html
If you download the code, go to the terminal and start a server using the following command:
$ python -m SimpleHTTPServer 8080
(SimpleHTTPServer is the Python 2 module; with Python 3 the equivalent command is: python3 -m http.server 8080)
Then, in your browser, open the html file in the unpacked directory from above,
e.g. localhost:8080/PATH_TO/Visualization_assignment/nomenclature.html
You can resize the visual in the browser using command -/+
The slider at the top allows you to select years. Once you have selected the slider handle, you can also use the arrow keys.
A mouseover on a bubble brings up a tooltip.
Proposal:
Background:
In the late 1800s bacterial taxonomy was proposed and over the last century
technological improvements have allowed us to better distinguish novel organisms. The
result is that today there are >12,000 named bacterial species.
My goal is to create a visualization that shows how the number of named organisms has grown over time. I will use D3 as the visual toolkit. The majority of the data can be compiled from the German culture collection website (www.dsmz.de). My background is in the domain of interest: I am a microbiologist who has largely worked on taxonomic matters and on identifying unknown microorganisms.
Main objective:
To create a bubble chart with each bubble representing a genus (~2000) and the size of the bubble representing the number of species that have been named under that genus. The bubbles will increase in size as new species are described. Time will most likely be shown in yearly increments.
[Example: Mycobacterium (genus); M. chelonae and M. abscessus (two Mycobacterium species).]
Secondary objective 1:
Include a timeline with important scientific/microbiology achievements to add context to the time points.
Secondary objective 2:
Test visual representations that could group bubbles based on their relatedness. Pack bubbles?
Project Notebook
I started the project back in early October in order to use it at a molecular pathology conference, where I gave a talk on microbiology sequence databases. The version of the visualization I showed at the conference was well received.
I decided to use D3 for the project because of its library of visualization types and its ability to transition between different values. This final project was also an opportunity to explore the toolkit and gain experience with JavaScript and web development.
The Data:
The data used for the visualization is from the German culture collection (DSMZ). They provided a text file that contains a list of taxa and the year each species was described.
The dsmz data file:
The most important columns for this project are GENUS, SPECIES and AUTHORS. The AUTHORS column indicates when the species was first described in a publication. If two or more dates were present, I always used the earliest one. Multiple dates often represent revisions to the taxa.
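The "use the earliest date" rule can be sketched as a small helper. This is only an illustration in JavaScript: the function name earliestYear and the sample AUTHORS strings are made up, and the exact layout of the AUTHORS column in the DSMZ file is an assumption.

```javascript
// Hypothetical helper: pull the earliest four-digit year out of an AUTHORS value.
// Assumes years appear as plain four-digit numbers somewhere in the string.
function earliestYear(authors) {
  // Match plausible publication years (1700-2099) as whole words.
  const matches = String(authors).match(/\b(1[7-9]\d{2}|20\d{2})\b/g);
  if (!matches) return null; // no date present
  return Math.min(...matches.map(Number));
}

// Made-up AUTHORS values, for illustration only.
console.log(earliestYear("(Author A 1901) Author B et al. 1992")); // 1901
console.log(earliestYear("no date given"));                         // null
```

Returning null for rows without a date keeps those rows visible to the caller, which can then decide whether to skip them.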
This data was transformed into the following format by parsing it with Python scripts. Each row represents a unique genus, and the columns are different years holding the cumulative number of species described up to that point in time. This was saved as a CSV file to be used in D3. I tested several versions and ultimately settled on a version using 10-year increments.
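The original Python scripts are not shown here. The same reshaping idea, sketched in JavaScript for consistency with the rest of the code, might look like the following. The record shape ({ genus, year }), the function names, and the sample rows are assumptions; the "_YEAR" column naming follows the d._1975 accessor used in the D3 code.

```javascript
// Build one row per genus with the cumulative species count at each cutoff year.
// `records` is assumed to look like [{ genus: "GenusA", year: 1900 }, ...].
function cumulativeByGenus(records, cutoffYears) {
  const rows = new Map();
  for (const { genus, year } of records) {
    if (!rows.has(genus)) {
      const row = { genus };
      for (const cutoff of cutoffYears) row["_" + cutoff] = 0;
      rows.set(genus, row);
    }
    const row = rows.get(genus);
    // A species described in `year` counts toward every cutoff at or after it.
    for (const cutoff of cutoffYears) {
      if (year <= cutoff) row["_" + cutoff] += 1;
    }
  }
  return Array.from(rows.values());
}

// Serialize the rows as CSV text (no quoting; assumes genus names contain no commas).
function toCsv(rows, cutoffYears) {
  const header = ["genus", ...cutoffYears.map((y) => "_" + y)];
  const lines = rows.map((row) => header.map((key) => row[key]).join(","));
  return [header.join(","), ...lines].join("\n");
}

// Toy records, not real DSMZ data.
const sampleRecords = [
  { genus: "GenusA", year: 1900 },
  { genus: "GenusA", year: 1960 },
  { genus: "GenusB", year: 1975 },
];
console.log(toCsv(cumulativeByGenus(sampleRecords, [1950, 1975]), [1950, 1975]));
// genus,_1950,_1975
// GenusA,1,2
// GenusB,0,1
```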
D3 illustration.
I used the static bubble chart located at http://bl.ocks.org/mbostock/4063269 as a template. What I was attempting to do with the chart, having the bubbles change in size over time, was something I had not yet seen done in D3. I found that this is actually a complicated issue for the following reason:
The pack layout in D3 takes care of the spacing. You pass it the value (size) of each bubble, and it calculates the space the bubbles should take. Therefore the bubble size is relative to the other bubbles at a given time point.
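To make the "relative size" problem concrete, here is a deliberately simplified model. It is not D3's packing algorithm; it only assumes that circle areas are proportional to their values and that the circles together fill a fixed fraction of the canvas. The function name and the fill fraction are hypothetical.

```javascript
// Simplified model (NOT the real D3 pack layout): areas are proportional to
// values, and all circles together cover `fill` of the canvas area.
function relativeRadii(values, canvasArea, fill = 0.5) {
  const total = values.reduce((sum, v) => sum + v, 0);
  return values.map((v) => Math.sqrt((fill * canvasArea * v) / (total * Math.PI)));
}

// The first bubble has the same value (10) at both time points,
// but its radius shrinks once the second bubble grows from 10 to 90.
const earlyRadii = relativeRadii([10, 10], 1000);
const lateRadii = relativeRadii([10, 90], 1000);
console.log(earlyRadii[0] > lateRadii[0]); // true
```

Under this model an unchanged genus would appear to shrink whenever other genera grow, which is the effect that makes year-to-year animation hard to read.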
I spent quite a lot of time testing different ways to overcome this. The primary thing that helped was to pass the largest value (year 2013) for each bubble to the layout and then use that value to resize. I was not pleased with this effect, because it left a large empty space around bubbles that would only get bigger in later years (pictured below). The good thing about this approach is that you can set it up so that no overlap occurs.
What I settled on was to give each bubble a location regardless of how big it would become. The problem with this is that overlap now occurs. Because I allowed overlap, I made the bubbles transparent. One reason I was sold on this representation is that it kind of looks like a petri dish with bacterial colonies spotted on it.
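One way to sketch "fixed location, size independent of the other bubbles" is shown below. Both pieces are assumptions for illustration: the square-root radius scale is a common choice for bubble charts rather than the mapping used in this project, and the grid placement is only a stand-in for however the positions were actually assigned.

```javascript
// Absolute radius scale: the radius depends only on a bubble's own count,
// so a genus keeps the same size for the same count at every time point.
function makeRadiusScale(maxCount, maxRadius) {
  return (count) => maxRadius * Math.sqrt(count / maxCount);
}

// Assign each row a fixed position once, on a simple grid (hypothetical layout).
function assignFixedPositions(rows, width, height) {
  const cols = Math.ceil(Math.sqrt(rows.length));
  const rowCount = Math.ceil(rows.length / cols);
  return rows.map((row, i) => ({
    ...row,
    x: ((i % cols) + 0.5) * (width / cols),
    y: (Math.floor(i / cols) + 0.5) * (height / rowCount),
  }));
}

const radiusFor = makeRadiusScale(100, 40);
console.log(radiusFor(25)); // 20
console.log(assignFixedPositions([{ genus: "GenusA" }, { genus: "GenusB" }], 100, 100));
```

Because the positions never change, neighbouring bubbles can overlap as they grow, which is why a fill opacity below 1 becomes necessary.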
Controls
I played around with radio, automatic, and slider controls to change between the years. First I tried radio buttons, which worked fine. It took some time to figure out how to update the values to get the transition to be visually appealing. The radio buttons seemed very old school, so I wanted to try alternatives.
I thought having no controls would be an interesting experiment. I was able to accomplish this by chaining transitions using the following structure:
....
    .duration(5000)
    .delay(500)
    .ease("linear")
    .each("end", function () {   // finished 1950
        d3.select(this)          // add the new values
            .transition()        // and transition
            .attr("r", function (d) { return d._1975 > 2 ? d._1975 / 2 : d._1975; })
            .duration(5000)
            .delay(500);         // delay before next step
    })...
I missed having control of the animation, and I did not feel I had the time to create a pausable visualization, since I could not find any D3 documentation or examples that implemented this.
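For reference, one possible way to get a pausable animation is to keep the "current year" outside of the transition chain and advance it with a timer. This is a sketch of that idea, not something taken from the project; createYearPlayer and its methods are hypothetical names, and the onYear callback is where a D3 transition to that year's radii would be started.

```javascript
// Hypothetical player: owns the index of the year on screen and can be paused.
// `onYear(year)` is called whenever a new year should be displayed.
function createYearPlayer(years, onYear) {
  let index = 0;
  let timer = null;

  function show(i) {
    index = Math.max(0, Math.min(years.length - 1, i));
    onYear(years[index]);
  }
  function pause() {
    if (timer !== null) {
      clearInterval(timer);
      timer = null;
    }
  }
  // Advance one step; returns false (and stops playback) at the last year.
  function step() {
    if (index >= years.length - 1) {
      pause();
      return false;
    }
    show(index + 1);
    return true;
  }
  function play(intervalMs) {
    if (timer === null) timer = setInterval(step, intervalMs);
  }

  return { show, step, play, pause, isPlaying: () => timer !== null };
}

// Manual stepping; play(5500) would advance automatically until the last year.
const shownYears = [];
const player = createYearPlayer([1950, 1975, 2013], (year) => shownYears.push(year));
player.show(0);
player.step();
console.log(shownYears); // [ 1950, 1975 ]
```

With this structure, each step only needs one ordinary transition to the radii for the new year, so no nested "end" callbacks are required.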
In the final version of the visualization I used a slider. The slider has a natural quantitative encoding, so I thought it was the most appropriate way to allow the user to change years.
Timeline (secondary objective 1)
The timeline was meant to supplement the visualization and add some historical context. I created a second D3 JavaScript file to implement the timeline. I obtained some historical footnotes from the American Society for Microbiology website to pepper the timeline.
The most difficult part of the timeline was adding a tooltip. I went through several versions of my own and several user-created libraries.
Cluster bubbles based on relationship (secondary objective 2)
I played around with ways to do this but never achieved a usable version. The bubble packing in D3 did not scale to my values. Here is an example of my initial test. The packing is based on the taxonomic lineage (e.g. phylum, class, order, family).
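Packing by lineage needs the flat genus rows turned into a nested tree first. The sketch below shows one way to do that; the field names (phylum, family, genus, value), the root label, and the sample rows are placeholders, not the project's real data. The { name, children, value } shape is what the D3 v3 pack layout reads by default.

```javascript
// Nest flat rows into { name, children } nodes, one level per taxonomic rank.
// Leaves carry the genus name and the value that sets the bubble size.
function buildLineageTree(rows, ranks) {
  const root = { name: "root", children: [] };
  for (const row of rows) {
    let node = root;
    for (const rank of ranks) {
      const name = row[rank];
      let child = node.children.find((c) => c.name === name);
      if (!child) {
        child = { name, children: [] };
        node.children.push(child);
      }
      node = child;
    }
    node.children.push({ name: row.genus, value: row.value });
  }
  return root;
}

// Placeholder lineage values, for illustration only.
const lineageRows = [
  { phylum: "PhylumA", family: "FamilyA1", genus: "GenusA", value: 12 },
  { phylum: "PhylumA", family: "FamilyA2", genus: "GenusB", value: 3 },
];
console.log(JSON.stringify(buildLineageTree(lineageRows, ["phylum", "family"]), null, 2));
```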
Summary:
I am pleased with the visualization. It conveys the information I intended to show, which is that species descriptions are growing at an accelerating rate. In my AMP talk, additional context was added concerning the sequence reference databases and clinical diagnostic testing.
I think what failed was having too much overlap. This affected the way the tooltip worked when hovering over certain bubbles. Maybe sorting the data could have kept smaller bubbles from ending up behind larger ones. I would also have liked it to look a little more professional, but since I was learning D3, JavaScript, HTML, and CSS all at the same time, I did not yet have the skills to do that.
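The sorting idea mentioned above could be sketched as follows. SVG paints elements in document order, so drawing the largest bubbles first leaves the smallest ones on top and reachable by the mouse. The function name is hypothetical, and sorting by the final-year column ("_2013") is an assumption.

```javascript
// Return a copy of the rows ordered largest-first by the given column,
// so that small bubbles are appended last and end up on top.
function sortLargestFirst(rows, key) {
  return rows.slice().sort((a, b) => b[key] - a[key]);
}

// Toy rows, not real counts.
const unsortedRows = [
  { genus: "GenusA", _2013: 4 },
  { genus: "GenusB", _2013: 150 },
  { genus: "GenusC", _2013: 20 },
];
console.log(sortLargestFirst(unsortedRows, "_2013").map((row) => row.genus));
// [ 'GenusB', 'GenusC', 'GenusA' ]
```

The sorted array would be passed to the data join before the circles are appended; with transparent bubbles the visual result stays the same, but mouseover events reach the small bubbles first.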
For me personally, I learned a lot about web development, and I feel I will be using D3 and Processing a lot more in my work. I also learned a lot in the class. Thanks for teaching it; it was very enjoyable.