
# Motivation

Research on managing open data requires an appropriate collection of open datasets on which to test novel algorithms and techniques. Insights into the properties of these datasets can also inform the design of algorithms and benchmarks. Currently, the interfaces of most open data portals offer little support for this kind of exploration.

# Visualization

Here we provide a visualization that uses Multidimensional Scaling (MDS) to depict datasets from https://www.data.gov/. We focus on four structural (metadata) attributes of the datasets:

• The number of rows
• The number of columns
• The percentage of null values
• The percentage of unique values (those that appear only once in the column)
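As a minimal sketch of how these four attributes could be computed for a single table, the Python below assumes a table is a dict mapping column names to lists of cell values, with `None` marking a null. It also assumes both percentages are taken over all cells of the table, and that a "unique" value is a non-null value occurring exactly once in its column. The page does not specify these details, and the function name `structural_attributes` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical table model: column name -> list of cell values, None = null cell.
Table = Dict[str, List[Optional[object]]]


def structural_attributes(table: Table) -> Dict[str, float]:
    """Return rows, columns, % null cells and % unique cells for one table."""
    columns = list(table.values())
    n_cols = len(columns)
    n_rows = len(columns[0]) if columns else 0
    n_cells = n_rows * n_cols

    nulls = uniques = 0
    for col in columns:
        nulls += sum(1 for v in col if v is None)
        # A value counts as unique if it is non-null and occurs exactly once in its column.
        counts = Counter(v for v in col if v is not None)
        uniques += sum(1 for c in counts.values() if c == 1)

    return {
        "rows": n_rows,
        "cols": n_cols,
        "nulls": 100.0 * nulls / n_cells if n_cells else 0.0,
        "unqs": 100.0 * uniques / n_cells if n_cells else 0.0,
    }


example = {
    "city": ["Boston", "Denver", "Boston", None],
    "population": [675, 715, 675, 100],
}
print(structural_attributes(example))  # {'rows': 4, 'cols': 2, 'nulls': 12.5, 'unqs': 37.5}
```

In the example, one of the 8 cells is null, and `"Denver"`, `715` and `100` each occur exactly once in their column, giving 12.5% nulls and 37.5% unique values.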

We also show (on the right) the distribution of these four attributes for the currently selected datasets.
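The page does not say how these distributions are binned. One plausible sketch, assuming equal-width bins over the range of the selected values, is a plain histogram; the function `histogram` and the sample values are hypothetical.

```python
from typing import List, Sequence


def histogram(values: Sequence[float], n_bins: int = 10) -> List[int]:
    """Count how many values fall into each of n_bins equal-width bins."""
    if not values:
        return [0] * n_bins
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero when all values are equal
    counts = [0] * n_bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin instead of overflowing.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical null percentages of five selected datasets.
null_pcts = [0.0, 2.5, 5.0, 40.0, 95.0]
print(histogram(null_pcts, n_bins=4))  # [3, 1, 0, 1]
```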

# How MDS works

Multidimensional Scaling can encode several attributes in a single two-dimensional view. For each pair of datasets $d_1$ and $d_2$, we calculate their (weighted) Euclidean distance:

\begin{align} dist(d_1, d_2) = & [w_r(d_1.rows-d_2.rows)^2 + w_c(d_1.cols-d_2.cols)^2 + \\& w_n(d_1.nulls-d_2.nulls)^2+ w_u(d_1.unqs-d_2.unqs)^2]^{1/2} \end{align}
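The formula translates directly into code. In this sketch (an illustration, not the tool's implementation), each dataset is a dict keyed by the hypothetical names `rows`, `cols`, `nulls` and `unqs`, and the `weights` dict plays the role of $w_r, w_c, w_n, w_u$. As written, the formula uses raw attribute values, so an attribute with a large range such as the row count can dominate unless its weight is small or the values are normalized first; whether the tool normalizes them is not stated here.

```python
import math
from typing import Dict

# Hypothetical attribute keys, matching rows, cols, nulls, unqs in the formula.
ATTRS = ("rows", "cols", "nulls", "unqs")


def weighted_distance(d1: Dict[str, float], d2: Dict[str, float],
                      w: Dict[str, float]) -> float:
    """Weighted Euclidean distance between two datasets' attribute values."""
    return math.sqrt(sum(w[a] * (d1[a] - d2[a]) ** 2 for a in ATTRS))


a = {"rows": 4.0, "cols": 2.0, "nulls": 12.5, "unqs": 37.5}
b = {"rows": 4.0, "cols": 5.0, "nulls": 12.5, "unqs": 33.5}
weights = {"rows": 1.0, "cols": 1.0, "nulls": 1.0, "unqs": 1.0}
print(weighted_distance(a, b, weights))  # sqrt(3**2 + 4**2) = 5.0
```

Setting a weight to zero removes that attribute from the comparison: with `weights["cols"] = 0.0`, the same pair is at distance 4.0.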

An MDS algorithm then places the points (i.e., datasets) in a 2-dimensional space such that these distances are preserved as much as possible.
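The page does not name the specific MDS algorithm it uses. As one concrete example, the sketch below implements classical (Torgerson) MDS in dependency-free Python: it double-centers the squared distance matrix and extracts the top two eigenvectors by power iteration with deflation. The function names are hypothetical, and power iteration is chosen only to avoid a NumPy dependency; a production version would more likely call a library eigensolver or an iterative stress-minimizing method such as SMACOF.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def _matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a square matrix by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def _top_eigenpair(m: Matrix, iters: int, rng: random.Random) -> Tuple[float, List[float]]:
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    v = [rng.random() - 0.5 for _ in range(len(m))]
    for _ in range(iters):
        w = _matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no positive spectrum left in m
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    lam = sum(vi * wi for vi, wi in zip(v, _matvec(m, v)))
    return lam, v


def classical_mds(dist: Matrix, dims: int = 2, iters: int = 500, seed: int = 0) -> Matrix:
    """Embed n points in `dims` dimensions from an n x n Euclidean distance matrix.

    Assumes Euclidean input distances, so the centered matrix B is positive
    semidefinite and power iteration finds its largest eigenvalues.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals column means: dist is symmetric
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J with J = I - (1/n) * 1 1^T.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        lam, v = _top_eigenpair(b, iters, rng)
        scale = math.sqrt(max(lam, 0.0))  # axis k = eigenvector * sqrt(eigenvalue)
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next-largest eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Four points on a 3 x 1 rectangle; MDS should reproduce their pairwise distances.
pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0), (3.0, 1.0)]
D = [[math.dist(p, q) for q in pts] for p in pts]
xy = classical_mds(D)
```

The recovered coordinates may differ from the input points by a rotation or reflection, which is expected: MDS preserves distances, not absolute positions.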

#### Adjust the weights to reflect what makes two datasets similar (a higher weight places more importance on an attribute)

• Number of rows
• Number of columns (Total)
• Null Values (Total)
• Unique Values (Total)

### Filters

• Number of Rows
• Number of Categorical Columns
• Number of Numerical Columns
• Percentage of Unique Values (Categorical)
• Percentage of Null Values (Categorical)
• Percentage of Unique Values (Numerical)
• Percentage of Null Values (Numerical)
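Assuming each filter is an inclusive [min, max] range on one metadata attribute (the page lists the filters but not their exact semantics), applying the active filters reduces to keeping the datasets that satisfy every range. The function `apply_filters` and attribute names such as `cat_nulls` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a dataset is a dict of metadata attributes, and each active
# filter maps an attribute name to an inclusive (min, max) range.
Dataset = Dict[str, float]
Filters = Dict[str, Tuple[float, float]]


def apply_filters(datasets: List[Dataset], filters: Filters) -> List[Dataset]:
    """Keep only datasets whose filtered attributes all fall inside their ranges."""
    return [
        d for d in datasets
        if all(lo <= d[attr] <= hi for attr, (lo, hi) in filters.items())
    ]


catalog = [
    {"rows": 120, "cat_cols": 3, "num_cols": 5, "cat_nulls": 0.0},
    {"rows": 50_000, "cat_cols": 1, "num_cols": 12, "cat_nulls": 4.2},
    {"rows": 900, "cat_cols": 7, "num_cols": 0, "cat_nulls": 31.0},
]
selected = apply_filters(catalog, {"rows": (100, 10_000), "cat_nulls": (0.0, 10.0)})
print([d["rows"] for d in selected])  # [120]
```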

### Distribution Summary

MDS Plot: Datasets are embedded in a 2D space so that datasets closer together are more similar.
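How faithfully the plot supports this reading depends on how well the 2D layout preserves the weighted distances. One common way to quantify that, not necessarily what the tool reports, is a normalized stress value: 0 means every pairwise distance is reproduced exactly, and larger values mean more distortion. The function name `normalized_stress` is hypothetical.

```python
import math
from typing import List, Sequence


def normalized_stress(dist: List[List[float]], coords: Sequence[Sequence[float]]) -> float:
    """sqrt(sum (dist_ij - ||x_i - x_j||)^2 / sum dist_ij^2) over pairs i < j."""
    num = den = 0.0
    n = len(dist)
    for i in range(n):
        for j in range(i + 1, n):
            embedded = math.dist(coords[i], coords[j])
            num += (dist[i][j] - embedded) ** 2
            den += dist[i][j] ** 2
    return math.sqrt(num / den) if den else 0.0


# Three datasets whose original distances place them on a line.
D = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # exact layout
bent = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]  # shortens the 0-2 distance to sqrt(2)
print(normalized_stress(D, line))  # 0.0
print(normalized_stress(D, bent))
```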

# IEEE Vis 2020

The paper and supplementary materials are available at: https://osf.io/zkxv9/

Watch a short 30-second video teaser or a 7-minute demo video.