Shiny manifolds

A friend of mine made a pretty slick app the other day using Shiny, a web app platform for data visualization made by the makers of RStudio. I’ve wanted to play around more with web dev and data viz, so I decided to try my hand at Shiny. But I needed something to work with.

I decided to go with the diffusion maps manifold learning algorithm, which we looked at in depth in an awesome course I took last year called Elements of Mathematical Machine Learning. With the general objective of learning the underlying parameters of data, we started by looking at the simple image dataset below.

Spinning bunny data set. Han Solo.

The underlying parameter, the thing generating differences between the image data points, is simple: the rotation of the bunny. So, what we ideally want is to represent each image by the bunny's orientation. But that seems crazy; we would need to map our RGB image data from 32,400-dimensional space (linearizing the pixels in each RGB channel) to a point on the unit circle in 2-dimensional space. Well, diffusion maps can do that and a whole lot more.
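To make the dimensions concrete, here's a minimal NumPy sketch of the setup. The frame size is a guess on my part (90×120 RGB pixels gives 90 × 120 × 3 = 32,400 linearized values; the real bunny frames may differ), and the random array just stands in for the actual images:

```python
import numpy as np

# Hypothetical stand-in for the bunny frames: n images of 90x120 RGB pixels.
# (90 * 120 * 3 = 32,400 -- the actual frame dimensions may differ.)
n_frames = 72
images = np.random.rand(n_frames, 90, 120, 3)

# Linearize each frame's pixels across the three channels into one long vector.
X = images.reshape(n_frames, -1)
print(X.shape)  # (72, 32400)

# The ideal low-dimensional representation: one angle per frame,
# i.e. a point on the unit circle in 2-D.
angles = np.linspace(0, 2 * np.pi, n_frames, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
print(circle.shape)  # (72, 2)
```

So the learning problem is a map from the 32,400-dimensional rows of `X` down to something like the rows of `circle`.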

The app I made lets you pick a manifold dataset and see the representation of the data by the first two diffusion maps under different parameter settings. Now is a good time to actually post the link so you can check out the Shiny app: go here! The code lives in my shiny-manifold repo.

A bit about the Shiny app

There are three datasets currently available:

If you want to see the manifolds, you can use the Plot View menu. The bunny images are shown as an animation, while the others are scatter plots. Supposedly, shinyRGL is a neat package to get 3D plots into Shiny. However, the current build appears to be broken and all I can do right now is throw in some static images. When/if it’s fixed, I’ll update the app.

Once you decide on a dataset, you can fiddle with the diffusion map parameters to see how well you can fit the data. Ideally, you can represent all of the manifolds in two dimensions, so we're only looking at the first two diffusion maps (analogous to the first two principal components in PCA). Here are some things to try out and think about. If you want to brush up on manifold learning first, skip to the next section.

A bit about diffusion maps

The diffusion map algorithm is very cool, and its theoretical backing is really interesting. If you really want to understand it, check out Stephane Lafon’s dissertation. I’ll give a small taste of it here, which is hopefully enough to help you think about what’s happening as you play with the app.

Most spectral manifold learning algorithms (like Isomap or locally linear embedding) boil down to constructing a kernel which preserves some property of the data, and then performing kernel principal component analysis. Diffusion maps almost falls into this paradigm, up to some simple transforms. At the heart of the diffusion map algorithm is the decomposition of the diffusion kernel. This kernel gives pairwise diffusion distances, which are random-walk distances on a graph formed by connecting the data points. The idea here is to approximate the Laplace-Beltrami operator on the manifold. Its spectral properties give the intrinsic geometry of the manifold and thus can be used for dimensionality reduction. And remember, that's exactly what we're doing: defining a kernel and taking its singular value decomposition.
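To make the pipeline concrete, here's a bare-bones NumPy sketch of that recipe on a toy circle dataset: build a Gaussian kernel, row-normalize it into a random-walk (Markov) matrix, and take its top eigenvectors as coordinates. This is an illustration of the basic idea, not the app's actual code (that lives in the shiny-manifold repo), and it skips the density normalization discussed below:

```python
import numpy as np

# Toy manifold: noisy samples of a circle embedded in 3-D.
rng = np.random.default_rng(0)
n = 200
t = rng.uniform(0, 2 * np.pi, n)
data = np.column_stack([np.cos(t), np.sin(t), 0.1 * rng.standard_normal(n)])

# Pairwise squared distances and a Gaussian kernel of width eps.
eps = 0.5
sq_dists = ((data[:, None, :] - data[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / eps)

# Row-normalize into a Markov matrix: one step of a random walk on the
# graph whose edge weights are the kernel values.
P = K / K.sum(axis=1, keepdims=True)

# Eigendecompose and sort by decreasing eigenvalue. P is similar to a
# symmetric matrix, so its eigenvalues are real.
vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
vals, vecs = vals.real[order], vecs.real[:, order]

# The top eigenvalue is 1 with a constant eigenvector; the next two
# eigenvectors, scaled by their eigenvalues, are the first two diffusion maps.
embedding = vecs[:, 1:3] * vals[1:3]
print(embedding.shape)  # (200, 2)
```

Plotting the two columns of `embedding` against each other should recover something close to the circle the data was sampled from.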

On to the diffusion map parameters. Some clever spectral graph theory allows us to construct the diffusion kernel as a weighted transform of a standard Gaussian kernel matrix. The kernel width is one of the parameters that needs to be set. As we increase the kernel width, we can think of the graph becoming more and more connected, so that a walk can diffuse quickly between faraway points. In the kernel construction, we also need to adjust for the non-uniform sampling of the manifold in any real dataset. Thus, we need to set the sampling density influence. The changes here are more subtle and more technical. If you aren't familiar with diffusion maps, see if you can get a sense of what this parameter does by playing with the app.
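For the curious, here's a sketch of how the two parameters enter the kernel construction, assuming the app's sampling density influence corresponds to the α in Lafon's normalization family (divide the Gaussian kernel by the estimated density at each endpoint raised to the power α before row-normalizing). Again, this is an illustrative NumPy version under that assumption, not the app's code:

```python
import numpy as np

def diffusion_coords(data, eps=0.5, alpha=1.0, n_coords=2):
    """Sketch of an alpha-normalized diffusion map.

    eps   -- kernel width: larger values connect faraway points.
    alpha -- sampling density influence: 0 leaves the kernel as-is
             (density shapes the embedding); 1 divides the density out,
             aiming at the Laplace-Beltrami operator regardless of how
             the manifold was sampled.
    """
    sq = ((data[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / eps)

    # Density correction: the row sums of K estimate the sampling density.
    d = K.sum(axis=1)
    K_alpha = K / np.outer(d ** alpha, d ** alpha)

    # Row-normalize the corrected kernel into a Markov matrix and decompose.
    P = K_alpha / K_alpha.sum(axis=1, keepdims=True)
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    return vecs[:, 1:1 + n_coords] * vals[1:1 + n_coords]

# Non-uniformly sampled circle: points bunch up near angle 0.
rng = np.random.default_rng(1)
t = rng.beta(2, 5, 300) * 2 * np.pi
data = np.column_stack([np.cos(t), np.sin(t)])

density_driven = diffusion_coords(data, alpha=0.0)
geometry_driven = diffusion_coords(data, alpha=1.0)
print(density_driven.shape, geometry_driven.shape)
```

Comparing the two embeddings on a non-uniformly sampled manifold like this one is a decent way to build intuition for what the density parameter is doing in the app.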