Visualization_assignments: Parallel Coordinates

Parallel Coordinates

APPLICATION INSTRUCTIONS:
The code downloaded from the link above is an interactive parallel coordinates visualization. The application displays car (default) or camera data.
Select Data Source:
Change the data source by clicking the 'Data Source' button in the control panel on the right.
Order of Lines:
Lines can be moved by clicking and dragging the line handle at the bottom of each line. When the mouse is released, the lines are ordered based on their current positions.
Sorting:
Each line can be toggled into descending (default) or ascending order by clicking the arrow next to the line handle. The arrow indicates the current direction of sorting.
Selection:
To select data points, move the mouse near a line, then click and drag down or up. Release to set the selection. Data points that are not selected are colored light gray.
Deleting Lines:
A line can be deleted by clicking the x icon near the line handle.
Clustering:
To perform k-means clustering, click the 'perform k-means' button in the control panel. To change the number of clusters, use the arrows next to the button. The allowed range is 2-8 clusters.
Cluster coloring:
To modify cluster coloring, drag a color from the color panel to a cluster and release.
Reset:
To remove any ordering, selections, deletions, or clustering, click the reset button.

NOTEBOOK

Planning Phase:
Started the project.
I have read through the assignment and downloaded the data set.

1) Cleaning the data.
I inspected the dataset and cleaned the file in Excel. To clean the data I created a header row that had the column names, similar to the headings in Jason Davies' example. I noticed that there is a column listed as 'Origin' that is excluded from Davies' example. I went ahead and kept the column in place. This cleaned set was exported as a tab-separated file. Some values are blank; it appears that Davies simply recorded these values as zero. I may do the same.
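A minimal sketch of the "record blanks as zero" idea, written as plain Java so it compiles outside the Processing IDE. The class and method names (TsvRowParser, parseRow) are hypothetical and are not taken from FloatTable.pde; the sketch also assumes every field in the row is numeric.

```java
// Hypothetical helper: parse one tab-separated row of numeric fields,
// recording blank fields as 0.
final class TsvRowParser {
    static float[] parseRow(String line) {
        // A limit of -1 keeps trailing empty fields, so every row
        // yields the same number of columns as the header.
        String[] fields = line.split("\t", -1);
        float[] values = new float[fields.length];
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i].trim();
            // Assumption: a blank cell is a missing value and becomes 0.
            values[i] = field.isEmpty() ? 0f : Float.parseFloat(field);
        }
        return values;
    }
}
```

One consequence of this choice is that a missing value cannot be told apart from a genuine zero once the data is plotted.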

2) Import data into Processing.
I went ahead and repurposed the FloatTable.pde from the previous assignment. I then imported the data to make sure it loaded correctly. A couple of print statements, shown below, indicated that the data imported correctly and I am ready to proceed.

3) Sketching out plan.
I looked at the assignment in more detail and came up with the following types of interactions.

As per the example instructions:
- The lines will be movable and can be dragged into position between other lines. I want a handle at the bottom of the line to achieve this.
- Each line can be sorted in ascending or descending order. I will have a clickable arrow next to each line that will allow the user to control sorting.
- To filter the data I will include a 'mouse dragged' selection on the line so that certain data points can be highlighted. Davies' example had this type of selection.
- I would also like the user to be able to delete a column.

4) Programming.

First goal - draw the lines with the datapoints represented as dots.

This required scaling the data to a line of fixed length. I used the map function to translate the points to the plot. The screenshot shows data from the first column of the Car data.
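The scaling step can be sketched as follows. The map method below uses the same arithmetic as Processing's built-in map(value, start1, stop1, start2, stop2), rewritten in plain Java so it runs outside a sketch. The AxisScale class, the toScreenY helper and its parameter names are hypothetical.

```java
// Hypothetical helper showing how a data value lands on a fixed-length line.
final class AxisScale {
    // Same arithmetic as Processing's map(): re-maps value from the range
    // [start1, stop1] to the range [start2, stop2].
    static float map(float value, float start1, float stop1,
                     float start2, float stop2) {
        return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1));
    }

    // Screen y for a data value. yBottom and yTop are pixel coordinates of
    // the two ends of the line (y grows downward on screen, so yBottom > yTop).
    static float toScreenY(float value, float dataMin, float dataMax,
                           float yBottom, float yTop) {
        return map(value, dataMin, dataMax, yBottom, yTop);
    }
}
```

For example, with a column ranging from 10 to 50 and a line running from y = 400 (bottom) to y = 100 (top), AxisScale.toScreenY(30, 10, 50, 400, 100) gives 250, the midpoint of the line.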

In this sketch I created the line for the first column of data. The red dots are the data points applied to the line. The circle will be my handle for moving the columns.

After making the initial line representation I decided to make a line class. Spacing is determined by the plot area and number of lines. Below, each column of the car data is displayed, with the red points encoding the data points.
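A rough sketch of how spacing could follow from the plot area and the number of lines. It assumes the lines are spread evenly from the left edge of the plot to the right edge, which the post does not state; the AxisLayout class and its parameters are hypothetical.

```java
// Hypothetical layout helper: evenly spaced x positions for the lines.
final class AxisLayout {
    static float[] axisPositions(float plotLeft, float plotRight, int axisCount) {
        float[] xs = new float[axisCount];
        if (axisCount == 1) {
            // A single line is centered in the plot area.
            xs[0] = (plotLeft + plotRight) / 2f;
            return xs;
        }
        // Assumption: first line on the left edge, last line on the right edge.
        float spacing = (plotRight - plotLeft) / (axisCount - 1);
        for (int i = 0; i < axisCount; i++) {
            xs[i] = plotLeft + i * spacing;
        }
        return xs;
    }
}
```

Recomputing these positions whenever a line is deleted keeps the remaining lines filling the plot area.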

I modified the code to create a line class. The red dots represent the data.

Prior to working on the labels and scale, I wanted to play around with the interaction. After getting less-than-optimal results with mouseDragged, I instead used a 'click to select' method. Click once on the circle below the line and the line moves with the mouseX position. Click again and the line is released. Lines currently do not reorient.

The movement works well. I next wanted to draw the connecting lines between the axis lines. Below are the results of drawing the lines.

With the lines in place I coded more of the interaction. The visual now moves lines into position based on their x-value: if a click is made to release the line, it is placed between the lines it was moved to. I was then able to re-tool the interaction so that the line can be clicked and dragged; if it is released between two other lines, it snaps to that location.
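The snap-on-release behavior can be sketched as sorting the lines by their current x position and then assigning each one to a fixed slot. This is an illustration under that assumption, not the code from the application; AxisReorder, orderByX and snapToSlots are hypothetical names.

```java
// Hypothetical helper: reorder lines by x position when the mouse is released.
final class AxisReorder {
    // Returns column indices sorted left to right by current x position.
    static int[] orderByX(float[] currentX) {
        int n = currentX.length;
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Insertion sort is enough here: the number of lines is small.
        for (int i = 1; i < n; i++) {
            int col = order[i];
            int j = i - 1;
            while (j >= 0 && currentX[order[j]] > currentX[col]) {
                order[j + 1] = order[j];
                j--;
            }
            order[j + 1] = col;
        }
        return order;
    }

    // Returns the snapped x for every column: the leftmost line takes
    // slotX[0], the next one slotX[1], and so on.
    static float[] snapToSlots(float[] currentX, float[] slotX) {
        int[] order = orderByX(currentX);
        float[] snapped = new float[currentX.length];
        for (int rank = 0; rank < order.length; rank++) {
            snapped[order[rank]] = slotX[rank];
        }
        return snapped;
    }
}
```

If the second of four lines is dragged from x = 200 to x = 350 and released, it ends up in the third slot and the line it passed shifts left into the second slot.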

Prior to adding more interactions I wanted to add labels and scales. Below is the addition of labels and scales. I rounded values to be more visually appealing and adjusted the mapping to correlate with the updated scale.

For the next step I wanted to allow sorting of each axis. If I planned everything right, this should only require swapping the coordinates in the map system. To keep it simple, I added a clickable arrow to each line that reverses its order.
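One way to read "swapping the coordinates in the map system" is to swap the two target coordinates of the mapping, so the same data range is drawn in the opposite direction. The sketch below assumes that reading; the AxisDirection class and the flipped flag are hypothetical.

```java
// Hypothetical helper: reverse an axis by swapping the mapping's target ends.
final class AxisDirection {
    static float toScreenY(float value, float dataMin, float dataMax,
                           float yTop, float yBottom, boolean flipped) {
        // Not flipped: dataMin is drawn at the bottom, dataMax at the top.
        // Flipped: the two target coordinates trade places.
        float from = flipped ? yTop : yBottom;
        float to = flipped ? yBottom : yTop;
        return from + (to - from) * ((value - dataMin) / (dataMax - dataMin));
    }
}
```

Because only the target range changes, the data and the scale computation stay untouched, which is why clicking the arrow can be cheap.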

Adding selection - Here I created a selection box that can be drawn when the mouse is near a line. The data lines that are selected will be highlighted.
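A minimal sketch of the selection test, assuming each row already has a screen y position on the line being brushed. The BrushSelection class and its parameters are hypothetical and only illustrate the idea.

```java
// Hypothetical helper: which rows fall inside a dragged selection box?
final class BrushSelection {
    // screenY holds each row's y position on the brushed line.
    // dragStartY and dragEndY are the mouse y at press and at release.
    static boolean[] selectRows(float[] screenY, float dragStartY, float dragEndY) {
        // The user may drag up or down, so normalize the two ends first.
        float lo = Math.min(dragStartY, dragEndY);
        float hi = Math.max(dragStartY, dragEndY);
        boolean[] selected = new boolean[screenY.length];
        for (int i = 0; i < screenY.length; i++) {
            selected[i] = screenY[i] >= lo && screenY[i] <= hi;
        }
        return selected;
    }
}
```

When drawing, rows with selected[i] == false would be drawn in light gray and the rest in their normal color.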

5) k-means clustering
At this point the major tasks outlined in the assignment have been achieved. For the next step I will implement the k-means clustering.

k-means: I went to the Wikipedia page and also found a good tutorial at http://mnemstudio.org/clustering-k-means-example-1.htm. This site had both Python and Java code examples. I used the Python code as a skeleton and modified it to work with the car data in Processing. After a few days of modification I had the clustering working and used color to categorize the clusters. The clustering class that I made allows many dimensions and a choice of the number of clusters. The screenshot below shows four different clusters. Two or three clusters work best for this dataset.
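For reference, the basic k-means loop can be sketched in a few lines of plain Java. This is a generic version of the standard algorithm (assign each point to its nearest centroid, then move each centroid to the mean of its points), not the clustering class from the application. The KMeansSketch name, the choice to seed centroids with the first k rows, and the requirement that k is no larger than the number of rows are all assumptions.

```java
// Hypothetical minimal k-means for rows of any dimension.
final class KMeansSketch {
    // Returns a cluster index (0..k-1) for every row of data.
    static int[] cluster(float[][] data, int k, int maxIterations) {
        int n = data.length;
        int dims = data[0].length;
        float[][] centroids = new float[k][];
        // Assumption: seed the centroids with the first k rows.
        for (int c = 0; c < k; c++) centroids[c] = data[c].clone();
        int[] assignment = new int[n];
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: nearest centroid by squared Euclidean distance.
            for (int i = 0; i < n; i++) {
                int best = 0;
                float bestDist = Float.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    float dist = 0f;
                    for (int j = 0; j < dims; j++) {
                        float diff = data[i][j] - centroids[c][j];
                        dist += diff * diff;
                    }
                    if (dist < bestDist) { bestDist = dist; best = c; }
                }
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            // Stop once a full pass changes nothing (after the first pass).
            if (!changed && iter > 0) break;
            // Update step: move each centroid to the mean of its rows.
            float[][] sums = new float[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dims; j++) sums[assignment[i]][j] += data[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster keeps its old centroid
                for (int j = 0; j < dims; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return assignment;
    }
}
```

With raw values, columns that span large numeric ranges dominate the distance, so scaling each column to a common range before clustering is a common precaution. The post does not say whether the application does this.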

6) Controls and Selection.
k-means clustering. I will allow the user to choose the number of clusters and to start the clustering. When data points are selected, the k-means encoding will be removed. Below is a screenshot.

The second control will allow the selection of color. I used ColorBrewer to pick colors and then added two different saturation levels for each hue. Using the lower saturation allows better visualization of lines that lie on top of one another, since they become darker. The clusters for k-means can now be modified by dragging new colors to the legend. How many can we actually distinguish? We are only good at distinguishing 6-12 colors simultaneously.

7) Viewing the second dataset.
While I was implementing the controls above I was also exploring the Camera dataset. While doing this I needed to slightly adjust the plot size, and I staggered the line labels to prevent overlap. Below is a screenshot of the camera data.

8) The one addition that I made to better explore the data was a delete line button, which allows users to remove a line that may not be informative. Clustering can then be performed on a subset of the data, so a user can exclude data that is less informative for the clustering. The image below shows clustering with the year and origin removed from the car data.

Critique:
The parallel coordinates representation is great for visualizing many quantitative values at once. While I had difficulty at first with the view, I have grown to appreciate it. I think without interactions the view has limited utility, but with interaction it becomes a very powerful tool to explore the data. The interactions that have the most utility are repositioning the line, sorting the data in ascending or descending order, and highlighting specific data points. Deleting lines is also helpful to remove data that is less interesting. The ability to include color-based data categories would also be useful. In my application I did not fully implement the last interaction.

With the interaction it was much easier to spot associations in the data. For example, highlighting Japanese and European car manufacturers shows that no cars in the highlighted subset have 8 cylinders (below).

Similarly, the time to 60 mph is correlated with the number of cylinders. Obviously this dataset does not include Ferraris.

While parallel coordinates has the most power with quantitative data, it can display coded categorical data as well. The 'origin' in the car data is an example of this. The values are encoded as 1-3 for display on the view; however, they are not ordered. It is valuable to have them because they allow the user to select that data directly from the graph.

I like the data view and the interactions that my application provides. I would have liked to include more color coding based on categories; however, all items I outlined during my sketching process were incorporated into the application. I think the sketching process was useful, because it required thinking about the problem and then designing the code prior to coding. Ultimately the sketching made me represent the data as a line class, which made all other interactions much easier to implement. If given more time I would have liked to make the visual and the interaction buttons more visually appealing. Is there a library for nice Processing buttons?

Tuesday, October 22, 2013
