Thursday, December 19, 2013

Final Project

Visualizing bacteria described and named in the last century

http://home.utah.edu/~ks42/Visualization_assignment/nomenclature.html

download code

Instructions for the visualization:
Either use the link above to view the visualization on the web, or download and unpack the code into a directory.

The code files I wrote
bubble_nomenclature.js
timeline.js 
nomenclature.html

If you download the code, go to the terminal and start a server in that directory using the following command
$ python -m SimpleHTTPServer 8080
then in your browser open the html file from the unpacked directory above

ex. (localhost:8080/PATH_TO/Visualization_assignment/nomenclature.html)

You can resize the visual in the browser using Command -/+.
The slider at the top allows you to select years; once you have grabbed the slider handle you can also use the arrow keys.
Mousing over a bubble brings up a tooltip.


Proposal:
Background:
In the late 1800s bacterial taxonomy was proposed, and over the last century technological improvements have allowed us to better distinguish novel organisms. The result is that today there are >12,000 named bacterial species.
My goal is to create a visualization that shows how the number of named organisms has grown over time. I will use D3 as the visual toolkit. The majority of the data can be compiled from the German culture collection website (www.dsmz.de). My background is in the domain of interest: I am a microbiologist who has largely worked on taxonomic matters and identifying unknown microorganisms.

Main objective:
To create a bubble chart with each bubble representing a genus (~2,000 of them) and the size of the bubble representing the number of species that have been named under that genus. The bubbles will increase in size as new species are described. Time will most likely advance in yearly increments.
[Example: Mycobacterium (genus); M. chelonae and M. abscessus (two Mycobacterium species).]

Secondary objective 1:
Include a time line with important scientific/microbiology achievements to add context to the time points.
Secondary objective 2:
Test visual representations that could group bubbles based on their relatedness. Pack bubbles?


Project Notebook

I started the project back in early October in order to use it at a molecular pathology conference, where I gave a talk regarding microbiology sequence databases. The version of the visualization I showed at the conference was well received.

I decided to use D3 for the project because of its library of visualization types and its ability to transition between different values. This final project was also an opportunity to explore the toolkit and gain experience with JavaScript and web development.

The Data:
The data used for the visualization is from the German culture collection (DSMZ). They provided a text file that contains a list of taxa and the year each species was described.

The dsmz data file:

The most important columns for this project are GENUS, SPECIES, and AUTHORS. The AUTHORS column indicates when the species was first described in a publication. If two or more dates were present I always used the lowest value; multiple dates often represent revisions to the taxon.
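The lowest-date rule can be sketched in Python. (The parsing scripts themselves are not part of the download, and the exact layout of the AUTHORS string shown in the test is an illustrative assumption.)

```python
import re

def first_description_year(authors_field):
    """Return the earliest 4-digit year found in an AUTHORS string.

    DSMZ-style author strings can carry one year per revision; the
    lowest year is taken as the original description. The regex accepts
    years from the 1800s through the 2000s.
    """
    years = [int(y) for y in re.findall(r"\b(1[89]\d{2}|20\d{2})\b", authors_field)]
    return min(years) if years else None
```

For example, a string like "Smith 1984 emend. Jones 2001" yields 1984.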

This data was transformed into the following format by parsing with Python scripts: each row represents a unique genus, and the columns are different years with the cumulative number of species described at that point in time. This was saved as a CSV to be used in D3. I tested several versions and ultimately settled on one using 10-year increments.
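A sketch of that pivot, assuming the raw rows have already been reduced to (genus, year) pairs; the start/end bounds here are illustrative stand-ins:

```python
from collections import defaultdict

def cumulative_by_decade(records, start=1880, end=2013, step=10):
    """records: iterable of (genus, year_described) pairs.

    Returns (marks, {genus: cumulative species count at each decade mark}),
    i.e. one row per genus and one column per 10-year increment, as in the
    CSV used by D3.
    """
    marks = list(range(start, end + 1, step))
    counts = defaultdict(lambda: [0] * len(marks))
    for genus, year in records:
        for i, mark in enumerate(marks):
            if year <= mark:
                counts[genus][i] += 1
    return marks, dict(counts)
```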


D3 illustration.
I used the static bubble chart located at http://bl.ocks.org/mbostock/4063269 as a template. What I was attempting to do with the chart, that is, have the bubbles change in size over time, I had not yet seen done in D3. I found that this is actually a complicated issue for the following reason:
The bubble pack layout in D3 takes care of the spacing. You pass it the value/size and number of bubbles, and it calculates the space they should take. Therefore, each bubble's size is relative to the other bubbles at a given time point.

I spent quite a lot of time testing different ways to overcome this. The approach that helped most was to pass the largest value (year 2013) for each bubble to the layout and then use that value to resize. I was not really pleased with the effect, because it created large empty space around bubbles that would only get bigger in later years (pictured below). The good thing about this approach is that you could set it so no overlap would occur.
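The resizing step can be sketched like this (Python for brevity; the real code does the equivalent inside a D3 attr callback, and the sqrt here is the area-true variant rather than a simple divide):

```python
import math

def radius_for_year(r_2013, value_year, value_2013):
    """Shrink a bubble packed at its final (2013) size down to an earlier year.

    The pack layout fixes r_2013 from the largest value; keeping area
    proportional to species count means the radius scales with the square
    root of the value ratio.
    """
    if value_2013 == 0:
        return 0.0
    return r_2013 * math.sqrt(value_year / value_2013)
```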

What I settled on was to give each bubble a location regardless of how big it would become. The problem with this is that overlap can now occur. Because I allowed overlap, I made the bubbles transparent. One reason I was sold on this representation is that it kind of looks like a petri dish with bacterial colonies spotted on it.


Controls
I played around with radio buttons, automatic animation, and slider controls to change between the years. First I tried radio buttons, which worked fine, though it took some time to figure out how to update the values so the transition was visually appealing. The radio buttons seemed very old school, so I wanted to try alternatives.


I thought no controls at all would be an interesting experiment. I was able to accomplish this by chaining transitions using the following structure:

....
    .duration(5000)
    .delay(500)
    .ease("linear")
    .each("end", function() {           // finished 1950
        d3.select(this)                 // add the new values
            .transition()               // and transition
            .attr("r", function (d) { return d._1975 > 2 ? d._1975 / 2 : d._1975; })
            .duration(5000)
            .delay(500)                 // delay before next step
    })...

I missed having control of the animation, and did not feel I had the time to create a pausable visualization, since I could not find any D3 documentation or examples that implemented one.

In the final version of the visualization I used a slider. A slider has a natural quantitative encoding, so I thought it was the most appropriate way to let the user change years.

Timeline (secondary objective 1)
The timeline was meant to supplement the visualization and add some historical context. I created a second D3 JavaScript file to implement the timeline, and obtained some historical footnotes from the American Society for Microbiology website to pepper it with.


The most difficult part of the timeline was adding a tooltip. I went through several versions, both my own and user-created libraries.

Cluster bubbles based on relationship(Secondary objective 2)
I played around with ways to do this but never achieved a usable version. The bubble packing in D3 did not scale to my values. Here is an example of my initial test; the packing is based on the taxonomic lineage (e.g. phylum, class, order, family).

Summary:
I am pleased with the visualization. It conveys the information I intended to show, which is that species descriptions are growing at an accelerating rate. In my AMP talk, additional context was added concerning sequence reference databases and clinical diagnostic testing.

I think what failed was having too much overlap. This affected the way the tooltip worked when hovering over certain bubbles. Sorting the data might have alleviated smaller bubbles being hidden behind larger ones. I would also have liked it to look a little more professional, but since I was learning D3, JavaScript, HTML, and CSS all at the same time, I did not yet have the skills to do that.

For me personally, I learned a lot about web development and feel I will be using D3 and Processing a lot more in my work. I also learned a lot in the class; thanks for teaching it. It was very enjoyable.






Tuesday, November 26, 2013

Transfer functions

DOWNLOAD CODE


Understanding Transfer Functions

I explored both the x-mas and engine example sets in ImageVis3D. It was more difficult to create a nice image with the x-mas set. Below is an image with the alpha adjusted to hide portions of the image that obscure the tree.
With the alpha adjusted, I attempted to modify the colors. Below is one of the more successful attempts; with the adjustment I was able to apply different colors to the tree itself and the ornaments.

  • What did you like about the transfer function editor?
The ability to drag your mouse around to adjust the channels made it easy to play around with different values. The histogram with the values was also nice.
  • What is difficult about this editor / widget?
It would have been nice to have a reset button. The checkboxes for the color components took some time to figure out.

  • How would you improve the 1D transfer function editor?
I would require each channel to be set separately. 


Running the Volume Renderer

Volume Visualization and Control Panel

I explored several datasets in the processing volume visualization application.  Below is the bucky ball dataset.
  • What were you able to find from your volume data set?
The bucky ball's isovalues lie between 128 and 233. Setting the center outside of this range limits the ability to view the volume rendering.
  • What is useful about the step function?
The major attribute that makes it possible to create a good view of any of the datasets is the CENTER value. 
  • What makes this particular function limited?
My assumption is that the CENTER essentially replaces the need to set an alpha function for the dataset.
With this type of input you cannot set the alpha value to include different isovalue ranges.
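A minimal sketch of how I understand the step function input; the half-width parameter here is hypothetical, since the app only exposes CENTER:

```python
def step_alpha(isovalue, center, width=16):
    """Single-band step function: full opacity inside the band, zero outside.

    Because there is only one band around CENTER, a second, disjoint
    isovalue range can never be made visible at the same time, which is
    exactly the limitation noted above.
    """
    return 1.0 if abs(isovalue - center) <= width else 0.0
```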

Code Structure 
Designing Your Own Transfer Function Widget 

It seems to me that the most useful feature of a transfer function widget is a histogram that provides some information about the data values. With the histogram present, visualizing how the four channels are mapped on top of it provides the most useful basis for adjustment.

Below are my sketches. My first idea was really simple: just add range sliders for each of the channels. These could be adjusted, and the resulting image change would allow the user to evaluate the correct values to use.


The second idea was to add a histogram and a range slider. This is very similar to the first simple widget; however, the histogram would display where the channels are currently mapped.


My final design choice, and the one I will proceed with, involves four histograms, one for each channel. Each histogram would have a vertical slider and a horizontal range slider that adjust the intensity and the mapped range, respectively. In addition, the user will have the ability to add more sliders and thus map additional value ranges to a single channel. This design is more limited than the widget provided in ImageVis3D, but it will provide similar functionality and allow me to explore the Controls library.






My final control panel is below. The panel has one histogram for each of the four channels. The color of the controls indicates which channel is being adjusted. Below each histogram is a range slider that can be adjusted at each end and moved by clicking the middle. The rectangle on the histogram indicates the range currently selected for that channel's values, and a slider on the histogram allows the intensity of the channel to be adjusted. Clicking the 'plus' button under each range slider lets the user set additional ranges for a single channel; this interaction can be seen in the alpha channel. The rectangles are numbered to make it easier to differentiate multiple settings for one channel.
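The rectangles-to-channel mapping can be sketched as a lookup table; the names and tuple layout here are illustrative, not the actual Processing classes I wrote:

```python
def channel_lut(rectangles, size=256):
    """Build a per-channel lookup table from range-slider rectangles.

    Each rectangle is (lo, hi, intensity): the horizontal range slider
    picks [lo, hi] and the vertical slider picks the intensity. Where
    rectangles overlap, the maximum intensity wins.
    """
    lut = [0.0] * size
    for lo, hi, intensity in rectangles:
        for v in range(max(0, lo), min(size - 1, hi) + 1):
            lut[v] = max(lut[v], intensity)
    return lut
```

During rendering, a voxel's value indexes the table to get that channel's contribution.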

The histogram represents the count of values from 1-255. The counts are log-transformed to compress the data.
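A sketch of that log transform, assuming raw 8-bit values:

```python
import math

def log_histogram(values, bins=255, lo=1, hi=255):
    """Count values into bins over [lo, hi] and log-transform the counts.

    log1p keeps empty bins at 0 while compressing the huge spike of
    low-end noise so the rest of the histogram stays visible.
    """
    counts = [0] * bins
    span = hi - lo + 1
    for v in values:
        if lo <= v <= hi:
            counts[(v - lo) * bins // span] += 1
    return [math.log1p(c) for c in counts]
```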
Finding Good Transfer Functions 

I first evaluated the bonsai tree using my transfer function widget; the image is below, generated with the control panel shown above. Like most of the datasets, the values at the low end are overrepresented noise. The leaves of the bonsai have values from 0-54, while the pot has values from 199-210. This allowed me to color them differently.
I then looked at the foot dataset. The flesh had a lower isovalue, which I was able to render as red. The bone isovalues ranged from 50-255, with the denser portions having higher isovalues.



  • What are the strengths and weaknesses of your design?
I like my design; I think it is easy to use and allows for a lot of adjustment. However, my choice of sliders does not let the user make nuanced adjustments: they are forced to use rectangles. This is a weakness, which I attempted to overcome by allowing multiple boxes.
  • What would you change to make your widget more effective?
I would allow for even more rectangles to be drawn. I would remove the vertical slider and instead let the user drag the highlighting rectangle itself: grabbing the middle of the top edge would move the box up and down, and grabbing a corner would move that corner up or down, creating an irregular shape.
  • What are the pros and cons for volume rendering as a technique? What are the challenges?
Volume rendering allows you to focus on different densities of the image and move through the image, which can lead to greater insight. The con is that it is hard to automate a good rendering, and finding a good representation is often done empirically.

Sunday, November 10, 2013

Scalar_data

DOWNLOAD CODE



DATA READER
I started the assignment by loading the data into a 1-D array and then mapping the lowest and highest pixel values to 1 and 255, respectively. This creates a grayscale image. I then displayed the image on the screen using the width and height values parsed from the NRRD file. Below is the image as it appeared on the screen, mapped to the same coordinates as the NRRD file.
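That normalization step, sketched in Python rather than Processing:

```python
def to_grayscale(values, out_lo=1, out_hi=255):
    """Linearly map raw scalar values onto the 1-255 gray range.

    The minimum value lands on out_lo and the maximum on out_hi;
    everything in between is scaled proportionally.
    """
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [out_lo] * len(values)
    scale = (out_hi - out_lo) / (v_max - v_min)
    return [out_lo + (v - v_min) * scale for v in values]
```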

COLOR MAP
I then changed the grayscale to a color-mapped image. I did this by using colorLerp() and choosing between two different saturation levels of the same color (below) and also two different colors, red and green (below).
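colorLerp() amounts to per-component linear interpolation; a sketch of the idea, assuming simple RGB-space blending:

```python
def color_lerp(c1, c2, t):
    """Interpolate two (r, g, b) tuples.

    t = 0 gives c1, t = 1 gives c2; each component is blended
    independently, which is plain RGB-space behavior (a perceptual
    color space would blend differently).
    """
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))
```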

Questions
  • Where did you get your color map?
    • I used the color sphere linked under the color lecture to choose my colors. For the red and green I just picked them to see what they looked like.
  • What makes it an appropriate color map for this data?
    • The colors need to show enough contrast to make the image details visible.

INTERPOLATE THE GRID
Next I adjusted the grid size to have a fixed height of 800 px. The major change in drawing the image was to use rect() instead of point(), so that I could fill in the white space created by stretching the image. Below is the stretched image.
To work out the bilinear interpolation I switched to the test set so that I would have a smaller dataset. Here is an image of the test set without interpolation.
While in the end I found the bilinear interpolation to be straightforward, I did have a difficult time conceptualizing it. To make it a little easier I modified my code and put the data into a 2-D array. Then I looped through each data point, getting the color values of the four surrounding corners. I used the map() function to convert them to values between 1-255. Then I used lerp() to get the two intermediate x values, and those two values were used to get the corresponding y value. This was done for every pixel position. The following image was produced after implementing the bilinear interpolation.
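The two-step lerp described above can be sketched in Python (my actual implementation is in Processing):

```python
def lerp(a, b, t):
    # linear interpolation between a and b
    return a + (b - a) * t

def bilinear(grid, x, y):
    """Bilinear interpolation in a 2-D array (grid[row][col]).

    Interpolates along x between the two top corners and the two
    bottom corners, then along y between those two results.
    """
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    tx, ty = x - x0, y - y0
    top = lerp(grid[y0][x0], grid[y0][x1], tx)
    bottom = lerp(grid[y1][x0], grid[y1][x1], tx)
    return lerp(top, bottom, ty)
```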
Then I looked at the brain data and produced the following interpolated image. The image does look smoother than the first stretched image.

Questions
  • What, if anything, makes interpolation of your data tricky?
    • The interpolation is made a little tricky by the sample positions not being integers.
  • Do you notice anything odd about the data? Do any values stick out? 
    • I did not notice anything odd; the data seemed to work fine once the algorithm was implemented.

ISOCONTOURS - MARCHING SQUARES
I went back to the test data to test the marching squares algorithm. I first implemented it without interpolating the data, just by figuring out each cell's binary case value and then using those values to draw the possible segments. I drew the following image using this approach.
This matched the example on the assignment page, with the exception that some of my ambiguous cell cases were flipped. I then used the map() function to help map the values to the grid range. The following image was achieved after this adjustment.
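The per-cell binary case computation can be sketched as follows; the bit ordering is a convention, and other write-ups number the corners differently:

```python
def cell_case(tl, tr, br, bl, isovalue):
    """4-bit marching-squares case index for one cell.

    Corners at or above the isovalue set their bit (top-left = 8,
    top-right = 4, bottom-right = 2, bottom-left = 1). The 16 possible
    indices select which edge segments to draw; cases 5 and 10 are the
    ambiguous ones mentioned above.
    """
    case = 0
    if tl >= isovalue: case |= 8
    if tr >= isovalue: case |= 4
    if br >= isovalue: case |= 2
    if bl >= isovalue: case |= 1
    return case
```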
Next I loaded the brain data set with an isovalue of 176.
To explore the isovalues, I added an up/down arrow interaction which increments the isovalue by 2. On the low end the image disappears when the isovalue is set around 100; on the high end it disappears around 230. Below are images with isovalues of 210 and 144.


I also implemented the 'c' keystroke to switch between the marching squares and bilinear interpolation images.

Questions
  • Are there any problems with your marching squares algorithm?
    • It works as described.
  • What is an interesting isovalue on the brain data set? Why?
    • The isovalue 210 is interesting because it highlights the skull rather than the brain.
  • Compared to a color map, are there any tasks that isocontours seem more effective for? Why or why not? Which technique do you think is better?
    • I don't think one technique is better than the other. The ability to change the isovalue is useful for isocontours; this is helpful for focusing attention on a particular feature.
DATA EXPLORATION
Below is the Mt. Hood data with the color values I used for the brain data.
The following is the Mt. Hood data with a three-toned color map. Negative values in the file make a clear demarcation between the high-contrast areas.
For marching squares, the image did not show up until the isovalue was set to 126 or lower. The highest peaks are not seen until the isovalue is less than zero.

Questions
  • How did you adjust your color map for the Mt. Hood data set?
    • I added a third tone to the color map to make the peak area less saturated.
  • Did the isocontours in the Mt. Hood dataset differ from the brain data set? Why?
    • Yes, there are far fewer points with similar values. The effect is that only a few parts of the image are contoured at any one isovalue.
  • For the brain and Mt. Hood data sets, were either color maps or isocontours more effective? Why?
    • I think the brain set worked much better with the contours because of the issue mentioned in the previous question. The color map that I had set for the brain was not as effective with the Mt. Hood data; once adjusted, the color maps seemed similar for both.
Below is the Mt. Hood-adjusted color map applied to the brain data.

Tuesday, October 22, 2013

Parallel Coordinates


DOWNLOAD CODE


APPLICATION INSTRUCTIONS:
The code downloaded from the link above is an interactive parallel coordinates visual. The application will display car (default) or camera data.
Select Data Source:
Change the data source by clicking the 'Data Source' button in the control panel on the right.
Order of Lines:
Lines can be moved by clicking and dragging the line handle at the bottom of each line. When the mouse is released, the lines are reordered based on the current position.
Sorting:
Each line can be toggled between descending (default) and ascending order by clicking the arrow next to the line handle. The arrow indicates the current sort direction.
Selection:
To select data points, move the mouse near a line and click and drag up or down. Release to set the selection. Data points not selected are colored in a light gray hue.
Deleting Lines:
A line can be deleted by clicking the x icon near the line handle.
Clustering:
To perform k-means clustering, click 'perform k-means' in the control panel. To change the number of clusters, use the arrows next to the button. The allowed range is 2-8 clusters.
Cluster coloring:
To modify cluster coloring, drag a color from the color panel onto a cluster and release.
Reset:
To remove any ordering, selections, deletions, or clustering, click the reset button.



NOTEBOOK

Planning Phase:
Started the project.
I have read through the assignment and downloaded the data set.

1) Cleaning the data.
I inspected the dataset and cleaned the file in Excel. To clean the data I created a header row with the column names, similar to the headings in Jason Davies' example. I noticed that a column listed as 'Origin' is excluded from Davies' example; I went ahead and kept the column in place. This cleaned set was exported as a tab-separated file. Some values are blank; it appears that Davies simply recorded these values as zero, and I may do the same.

2) Import data into processing.
I went ahead and repurposed FloatTable.pde from the previous assignment, then imported the data to make sure it loaded fine. A couple of print statements, as shown below, indicated that the data was imported correctly and I am ready to proceed.
3) Sketching out plan.
I looked at the assignment in more detail and came up with the following types of interactions.

As per the example instructions:
- The lines will be moveable and dragged into position between other lines.  I want a handle at the bottom of the line to achieve this.
- Each line can be sorted in ascending or descending order. I will have a clickable arrow next to each line that will allow the user to control sorting.
- To filter the data I will include a 'mouse dragged' selection on the line so that certain data points can be highlighted. The Davies example had this type of selection.
- I would also like the user to be able to delete a column.


4) Programming.
First goal - draw the lines with the datapoints represented as dots.
This requires scaling the data to a line of fixed length. I used the map() function to translate the points onto the plot. The screenshot shows data from the first column of the car data.
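The mapping can be sketched like this; map_value is a Python stand-in for Processing's map(), and the descending default matches the sort behavior described in the instructions above:

```python
def map_value(v, in_lo, in_hi, out_lo, out_hi):
    """Processing-style map(): rescale v from [in_lo, in_hi] to [out_lo, out_hi]."""
    return out_lo + (v - in_lo) * (out_hi - out_lo) / (in_hi - in_lo)

def axis_y(value, col_min, col_max, axis_top, axis_bottom):
    """Pixel y-position of a data point on a vertical axis line.

    Larger values sit higher (smaller y); swapping axis_top and
    axis_bottom in the call flips the axis to ascending order.
    """
    return map_value(value, col_min, col_max, axis_bottom, axis_top)
```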

In this sketch I created the line for the first column of data. The red dots are the data points mapped onto the line; the circle will be my handle for moving the columns.


After making the initial line representation I modified the code to make a line class. Spacing is determined by the plot area and the number of lines. Below, each column of the car data is displayed with red points encoding the data values.
Prior to working on the labels and scale, I wanted to play around with the interaction. Having less than optimal results with mouseDragged(), I instead used a 'click to select' method: click once on the circle below the line and the line moves based on the mouseX position; click again and the line is released. Lines currently do not reorder.

The movement works well. Next I wanted to draw the connecting lines between each axis. Below is the result.
With the lines in place I coded more of the interaction. The visual will now move lines into position based on their x-value. If a click is made to release the line, it is placed between the lines it was moved to. I was then able to re-tool the interaction so a line can be clicked and dragged; if it is released between two other lines it snaps to that location.
Prior to adding more interactions I wanted to add labels and scales; these are shown below. I rounded values to be more visually appealing and adjusted the mapping to match the updated scale.
For the next step I wanted to allow sorting along each axis. If I planned everything right, this should only require swapping the coordinates in the map() call. To keep it simple I added a clickable arrow to each line to trigger the reordering.
Adding selection: here I created a selection box that appears when the mouse is near a line. The selected lines are highlighted.
5) k-means clustering
At this point the major tasks outlined in the assignment have been achieved. For the next step I will implement k-means clustering.

k-means: I went to the Wikipedia page and also found a good tutorial at http://mnemstudio.org/clustering-k-means-example-1.htm, which had both Python and Java code examples. I used the Python code as a skeleton and modified it to work with the car data in Processing. After a few days of modification I had the clustering working and used color to categorize the clusters. The clustering class that I made allows many dimensions and lets the user choose the number of clusters. The screenshot below shows four different clusters; two or three clusters work best for this dataset.
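A stripped-down sketch of the algorithm in Python, mirroring the skeleton I started from rather than the Processing class itself:

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on equal-length numeric tuples.

    Assign each point to the nearest centroid (squared Euclidean
    distance), recompute each centroid as the mean of its cluster,
    and repeat for a fixed number of iterations.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters
```

The real application replaces the fixed iteration count with a convergence check and colors each cluster's polylines.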
6) Controls and Selection.
k-means clustering: I will allow the user to choose the number of clusters and to start the clustering. When data points are selected, the k-means encoding is removed. Below is a screenshot.
The second control allows the selection of color. I used ColorBrewer to pick colors and then added two different saturation levels for each hue. Using the lower saturation allows better visualization of lines that lie on top of one another, since they become darker. The cluster colors for k-means can now be modified by dragging new colors onto the legend. How many can we actually distinguish? We are only good at distinguishing 6-12 colors simultaneously.
7) Viewing the second dataset.
While I was implementing the controls above I was also exploring the camera dataset. While doing this I needed to slightly adjust the plot size, and I staggered the line labels to prevent overlap. Below is a screenshot of the camera data.
8) The one addition I made to better explore the data was a delete-line button, which allows users to remove a line that may not be informative. Clustering can then be performed on a subset of the data, so a user can exclude less informative data from the clustering. The image below shows clustering with the year and origin removed from the car data.

Critique:
The parallel coordinates representation is great for visualizing many quantitative values at once. While I had difficulty with the view at first, I have grown to appreciate it. I think that without interaction the view has limited utility, but with interaction it becomes a very powerful tool for exploring the data. The interactions with the most utility are repositioning the lines, sorting the data in ascending or descending order, and highlighting specific data points. Deleting lines is also helpful for removing data that is less interesting. The ability to include color-based data categories would also be useful; in my application I did not fully implement this last interaction.

With the interaction it was much easier to spot associations in the data. For example, highlighting the Japanese and European car manufacturers illustrates that no cars in the highlighted subset have 8 cylinders (below).

Similarly, the time-to-60 mph is correlated with the number of cylinders. Obviously this dataset does not include Ferraris.
While parallel coordinates is most powerful with quantitative data, it can display coded categorical data as well. The 'origin' column in the car data is an example: the values are encoded 1-3 to display in the view, but they are not ordered. It is still valuable to have them, because they allow the user to select that data directly from the graph.

I like the data view and the interactions that my application provides. I would have liked to include more color coding based on categories; however, all the items I outlined during my sketching process were incorporated into the application. I think the sketching process was useful, because it required thinking about the problem and designing the code before writing it. Ultimately the sketching led me to represent the data as a line class, which made all the other interactions much easier to implement. If given more time I would have liked to make the visual and the interaction buttons more visually appealing. Is there a library for nice Processing buttons?