Loch Prospector: MetaData Visualization for Lakes of Open Data

Neha Makhija, Mansi Jain, Nikolaos Tziavelis, Laura Di Rocco, Sara Di Bartolomeo, Cody Dunne

Motivation

Research on managing open data requires an appropriate collection of open datasets on which to test novel algorithms and techniques. Insight into the properties of these datasets can also inform the design of algorithms or benchmarks. Currently, the interfaces of most open data portals offer limited support for exploring such properties.

Visualization

Here we provide a visualization that uses Multidimensional Scaling (MDS) to depict datasets from https://www.data.gov/. We focus on four structural (metadata) attributes of the datasets:

The number of rows
The number of columns
The percentage of null values
The percentage of unique values (those that appear only once in the column)

On the right, we also show the distribution of these four attributes for each selection of datasets.
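As a rough illustration of the four attributes listed above, the following sketch computes them for a small in-memory table. The function name `metadata_attributes`, the use of `None` to mark null cells, and the choice to express both percentages relative to the total number of cells are assumptions made for this example; the source does not state how the percentages are normalized.

```python
from collections import Counter


def metadata_attributes(table):
    """Compute rows, cols, nulls (%) and unqs (%) for a list of equal-length rows.

    Assumptions (not confirmed by the source): nulls are represented by None,
    and both percentages are taken over the total number of cells.
    """
    n_rows = len(table)
    n_cols = len(table[0]) if table else 0
    total_cells = n_rows * n_cols

    null_cells = 0
    unique_cells = 0
    for col in range(n_cols):
        values = [row[col] for row in table]
        null_cells += sum(1 for v in values if v is None)
        # A value is "unique" if it appears exactly once in its column.
        counts = Counter(v for v in values if v is not None)
        unique_cells += sum(1 for c in counts.values() if c == 1)

    def percent(part):
        return 100.0 * part / total_cells if total_cells else 0.0

    return {
        "rows": n_rows,
        "cols": n_cols,
        "nulls": percent(null_cells),
        "unqs": percent(unique_cells),
    }
```

Each dataset is thereby reduced to one small record of four numbers, which is the kind of input the distance computation needs.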

How MDS works

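Multidimensional Scaling lets us encode many attributes in a single visualization. For each pair of datasets d₁, d₂ we calculate their weighted Euclidean distance, where the weights w_r, w_c, w_n, w_u control how much the rows, columns, null values, and unique values contribute: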

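\[ \begin{aligned} \mathrm{dist}(d_1, d_2) = \big[ & w_r (d_1.\mathrm{rows} - d_2.\mathrm{rows})^2 + w_c (d_1.\mathrm{cols} - d_2.\mathrm{cols})^2 \\ & + w_n (d_1.\mathrm{nulls} - d_2.\mathrm{nulls})^2 + w_u (d_1.\mathrm{unqs} - d_2.\mathrm{unqs})^2 \big]^{1/2} \end{aligned} \]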
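The distance above can be sketched in a few lines of Python. The function name `weighted_distance` and the dictionary representation of a dataset and of the weights are hypothetical choices for this example; only the formula itself comes from the text.

```python
import math

# Attribute keys mirror the symbols in the formula (rows, cols, nulls, unqs).
ATTRIBUTES = ("rows", "cols", "nulls", "unqs")


def weighted_distance(d1, d2, weights):
    """Weighted Euclidean distance between two datasets' metadata records."""
    return math.sqrt(
        sum(weights[a] * (d1[a] - d2[a]) ** 2 for a in ATTRIBUTES)
    )


# Two made-up metadata records that differ only in rows and columns.
a = {"rows": 10, "cols": 5, "nulls": 0.0, "unqs": 50.0}
b = {"rows": 13, "cols": 1, "nulls": 0.0, "unqs": 50.0}
unit = {"rows": 1.0, "cols": 1.0, "nulls": 1.0, "unqs": 1.0}
print(weighted_distance(a, b, unit))  # sqrt(3^2 + 4^2) = 5.0
```

Raising one weight makes differences in that attribute count for more, while a weight of zero removes the attribute from the comparison entirely.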

An MDS algorithm then places the points (i.e., datasets) in a 2-dimensional space such that these distances are preserved as much as possible.
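To make the placement step concrete, here is a minimal sketch of classical (Torgerson) MDS in plain Python. It is one common MDS variant and is not necessarily the one used by this visualization; the function names, the power-iteration eigensolver, and the iteration count are all assumptions chosen to keep the example free of dependencies.

```python
import math
import random


def _matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def _top_eigenpair(matrix, iterations=500, seed=0):
    """Approximate the dominant eigenpair of a symmetric PSD matrix
    by power iteration from a seeded random start vector."""
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in matrix]
    for _ in range(iterations):
        nxt = _matvec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:  # nothing left to extract in this direction
            return 0.0, [0.0] * len(matrix)
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the (unit-length) converged vector.
    value = sum(v * w for v, w in zip(vec, _matvec(matrix, vec)))
    return value, vec


def classical_mds(dist, dims=2):
    """Embed n points in `dims` dimensions from an n x n distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (D is symmetric, so
    # column means equal row means).
    b = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
         for j in range(n)]
        for i in range(n)
    ]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        value, vec = _top_eigenpair(b, seed=k)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i][k] = scale * vec[i]
        # Deflate: remove the component just found before the next pass.
        b = [
            [b[i][j] - value * vec[i] * vec[j] for j in range(n)]
            for i in range(n)
        ]
    return coords
```

When the input distances come from points that already lie in a plane, this procedure recovers them up to rotation, reflection, and translation. With four attributes the 2-dimensional layout is in general only an approximation, which is why the distances are preserved "as much as possible" rather than exactly.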

Acknowledgments

D3: Data-Driven Documents by Mike Bostock.
MathJax
Pure CSS responsive "Fork me on GitHub" ribbon by Chris Heilmann.
An implementation of MDS by Ben Frederickson.
Reusable Charts for Scatterplot by Mike Bostock.
Sliders for filters.
Histogram chart by Yan Holtz and also by Mike Bostock.
Cooperative brush and tooltip by Matthieu Viry.

IEEE Vis 2020

Paper and Supplementary Materials can be found at: https://osf.io/zkxv9/

A short 30-second video teaser and a 7-minute demo video are also available.

Number of rows:
Number of columns (Total):
Null Values (Total):
Unique Values (Total):

Select the data type that you are interested in:

Adjust the weights to control when two datasets count as similar
(a higher weight places more importance on an attribute)

Filters

Distribution Summary