Research on managing open data requires an appropriate collection of datasets on which to test novel algorithms and techniques. Insight into the properties of these datasets can also inform the design of algorithms and benchmarks. Currently, the interfaces of most open data portals offer little support for exploring such properties.

Here we provide a visualization that uses Multidimensional Scaling (MDS) to depict datasets from https://www.data.gov/.
We focus on four *structural* (metadata) attributes of the datasets:

- The number of rows
- The number of columns
- The percentage of null values
- The percentage of unique values (those that appear only once in the column)
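As a rough illustration of the four attributes listed above, the following sketch computes them for a small in-memory table. The function name `structural_attributes`, the treatment of `None` and empty strings as nulls, and the choice to compute both percentages over all cells of the table are assumptions made for this example; the visualization itself may aggregate differently.

```python
from collections import Counter


def structural_attributes(header, rows):
    """Return row count, column count, % nulls and % unique values of a table.

    Assumptions (not taken from the source): a null is None or an empty
    string, and both percentages are computed over all cells of the table.
    """
    n_rows, n_cols = len(rows), len(header)
    total_cells = n_rows * n_cols
    null_cells = 0
    unique_cells = 0
    for j in range(n_cols):
        column = [row[j] for row in rows]
        non_null = [v for v in column if v is not None and v != ""]
        null_cells += len(column) - len(non_null)
        # A value counts as unique if it appears exactly once in its column.
        counts = Counter(non_null)
        unique_cells += sum(1 for c in counts.values() if c == 1)

    def percentage(k):
        return 100.0 * k / total_cells if total_cells else 0.0

    return {
        "rows": n_rows,
        "cols": n_cols,
        "nulls": percentage(null_cells),
        "unqs": percentage(unique_cells),
    }
```

For a table with header `["city", "pop"]` and rows `["A", 1]`, `["A", None]`, `["B", 3]`, `["C", 3]`, one of eight cells is null and three values ("B", "C" and 1) appear exactly once in their column.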

We also show (on the right) the distribution of these four attributes for each selection of datasets.

Multidimensional Scaling can help us encode many attributes in the same visualization. For each pair of datasets \(d_1, d_2\) we calculate their (weighted) Euclidean distance:

\[ \begin{aligned} dist(d_1, d_2) = {} & \big[ w_r (d_1.rows - d_2.rows)^2 + w_c (d_1.cols - d_2.cols)^2 \\ & + w_n (d_1.nulls - d_2.nulls)^2 + w_u (d_1.unqs - d_2.unqs)^2 \big]^{1/2} \end{aligned} \]
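The weighted distance can be sketched in a few lines of Python. Representing each dataset as a dictionary with the keys `rows`, `cols`, `nulls` and `unqs`, and the name `weighted_distance`, are assumptions made for this example. The sketch does not rescale the attributes before comparing them.

```python
import math

# Attribute names follow the symbols used in the distance formula.
ATTRIBUTES = ("rows", "cols", "nulls", "unqs")


def weighted_distance(d1, d2, weights):
    """Weighted Euclidean distance between two datasets.

    d1, d2 and weights are dictionaries keyed by the names in ATTRIBUTES
    (a hypothetical representation chosen for this sketch).
    """
    return math.sqrt(
        sum(weights[a] * (d1[a] - d2[a]) ** 2 for a in ATTRIBUTES)
    )
```

Setting a weight to zero removes the corresponding attribute from the comparison, so two datasets that differ only in that attribute end up at distance zero.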

An MDS algorithm then places the points (i.e., datasets) in a 2-dimensional space such that these distances are preserved as much as possible.
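One common way to compute such a placement is classical (Torgerson) MDS, sketched below with NumPy. This is an illustrative assumption: the page does not state which MDS variant it uses, and the function name `classical_mds` is chosen only for this example.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Embed points in n_components dimensions from a pairwise distance matrix.

    Classical (Torgerson) MDS; a sketch, not necessarily the variant used
    by the visualization.
    """
    D = np.asarray(distances, dtype=float)
    n = D.shape[0]
    # Double-center the squared distances: B = -1/2 * J * D^2 * J.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues caused by round-off or non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

When the input distances come from points that already lie in a plane, the two-dimensional embedding reproduces those distances up to rotation, reflection and translation. For higher-dimensional input, such as the four attributes here, the result is only an approximation.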

In the distance above, a higher weight gives the corresponding attribute more importance.

Attribute panels:

- Number of Rows
- Number of Categorical Columns
- Number of Numerical Columns
- Percentage of Unique Values (Categorical)
- Percentage of Null Values (Categorical)
- Percentage of Unique Values (Numerical)
- Percentage of Null Values (Numerical)

MDS Plot: Datasets are embedded in 2D space so that datasets placed closer together are more similar.

- D3: Data-Driven Documents by Mike Bostock.
- MathJax
- Pure CSS responsive "Fork me on GitHub" ribbon by Chris Heilmann.
- An implementation of MDS by Ben Frederickson.
- Reusable Charts for Scatterplot by Mike Bostock.
- Sliders for filters.
- Histogram chart by Yan Holtz and also by Mike Bostock.
- Cooperative brush and tooltip by Matthieu Viry.

Paper and Supplementary Materials can be found at: https://osf.io/zkxv9/

Watch a short 30-second video teaser:

or a 7-minute demo video: