Last week, my buddies and I had a few hours between the last football game of the day and that evening’s entertainment, so we decided to make some entertainment for ourselves in the form of a drinking game called ride the bus. A quick Google search suggests there are maybe 500 different games called “ride the bus,” but I’ll explain the version we played. A few initial rounds produce a loser, who then has to ride the bus: four rows of four cards each are dealt face down, and there’s a task for each row.

Row 1 (R1): Select a card, guess whether it’s black or red, and turn it over. Pretty straightforward.

Row 2 (R2): Select a card, guess whether it’s higher or lower than the card turned over in R1, and turn that card over. No suit ordering, and ties go to the runner (if the first card was an 8 and you turn over an 8, you win either way).

Row 3 (R3): Select a card, guess whether it’s between or outside of the first two cards you turn over, and turn that card over. Again, no suit ordering and ties go to the runner.

Row 4 (R4): Select a card, guess its suit, and turn it over. That’s it.

Here’s why it’s a game. If your guess is wrong on R1, the card you turned over is discarded, a new one is dealt in its place, you take 2 drinks, and then you start over. If you’re wrong on R2, you replace both cards you turned over, take 4 drinks, and start over. If you’re wrong on R3, the same thing happens but you take 6 drinks. If you’re wrong on R4, the same thing happens but you take 8 drinks. So if you get all four rows right on the first round, you don’t have to drink anything. In the worst case, you never get four in a row right and take 2 drinks per card, for a nice total of 104 sips of awful beer. The mercy rule usually gets called before then.

### Statistics of beer drinking

So the first question that came to mind was, “how many drinks can I expect to take?” You could start by computing the probability of succeeding in each row on a random guess: 0.5 for R1, a little better than 0.5 for R2 and R3 (since ties go to the runner), and 0.25 for R4. But then I stopped, because that’s a very homework-y way of doing this.
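Instead of grinding through the algebra, you can just simulate it. Here’s a minimal Monte Carlo sketch of the rules above; it assumes cards are drawn with replacement (the real game redeals from a shrinking deck) and that the rider guesses every row at random, so treat the number it spits out as a ballpark, not gospel.

```python
import random

RANKS = list(range(2, 15))  # ranks 2..14 (ace high); suits never break ties

def one_ride(rng):
    """Simulate one full ride of the bus with random guesses on every row.
    Approximation: cards are drawn with replacement, rather than redealt
    from the shrinking deck the real game uses."""
    drinks = 0
    while True:
        # R1: red or black -- right half the time no matter what
        if rng.random() < 0.5:
            drinks += 2
            continue
        r1 = rng.choice(RANKS)
        # R2: higher or lower than r1, ties go to the runner
        r2 = rng.choice(RANKS)
        ok = (r2 >= r1) if rng.random() < 0.5 else (r2 <= r1)
        if not ok:
            drinks += 4
            continue
        # R3: between or outside r1 and r2, ties go to the runner
        lo, hi = min(r1, r2), max(r1, r2)
        r3 = rng.choice(RANKS)
        ok = (lo <= r3 <= hi) if rng.random() < 0.5 else (r3 <= lo or r3 >= hi)
        if not ok:
            drinks += 6
            continue
        # R4: name the suit -- right a quarter of the time
        if rng.randrange(4) != 0:
            drinks += 8
            continue
        return drinks

rng = random.Random(0)
trials = 20_000
avg = sum(one_ride(rng) for _ in range(trials)) / trials
print(f"expected drinks per ride: ~{avg:.0f}")
```

With random guessing the average lands around 90 drinks per ride, which explains the mercy rule. Guessing intelligently on R2 and R3 (e.g. calling “higher” on a low card) brings it down quite a bit.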

People are really into visualizing lots of data. No big surprise there. One algorithm that pops up a lot no matter what field you’re in is t-SNE (t-Distributed Stochastic Neighbor Embedding). For mapping down to 2 or 3 dimensions (i.e. the ones you can actually visualize), t-SNE handles grouping better than just about any other algorithm I’ve seen. But while I’ve seen a lot of demonstrations of visualizations made with t-SNE, I wanted a better idea of what was going on under the hood. A lot of people describe it as trying to preserve the pairwise distances between points in the original data and the embedded data. But just glancing at the algorithm suggests that’s not really what’s happening at all.

So just for myself, I made this animation comparing t-SNE to the random projection method. The code is here (you need to download the Python t-SNE implementation and the MNIST data), and I put descriptions of the algorithms further down. Random projections and t-SNE are about as different as you can get (among dimensionality reduction techniques, anyway). Random projection is linear, t-SNE is non-linear. Random projection sits back and lets the Johnson-Lindenstrauss lemma do its thing, t-SNE aggressively optimizes a KL divergence cost function. But I didn’t realize they’d be this different.
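If you just want the side-by-side embeddings without the animation machinery, scikit-learn has both algorithms built in (that’s an assumption on my part; the animation code above uses a standalone Python t-SNE implementation on MNIST, and here its small digits dataset stands in):

```python
# Side-by-side 2-D embeddings: a random linear projection vs. t-SNE.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.random_projection import GaussianRandomProjection

X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]          # small subset so this runs quickly

# Linear: multiply by a random Gaussian matrix and hope J-L does its thing
rp = GaussianRandomProjection(n_components=2, random_state=0).fit_transform(X)

# Non-linear: iteratively optimize a KL divergence between neighbor
# distributions in the original space and the embedding
ts = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)

print(rp.shape, ts.shape)        # both (500, 2)
```

Plot `rp` and `ts` colored by `y` and the difference is immediate: the projection smears the classes together, while t-SNE pulls them into tight islands.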

Here’s a “quick” post on looking for structure in a data set. There are a lot of different approaches people are taking these days, so I figured I could do a tour of some diverse ones that give relatively interpretable results. Specifically, let’s see what three people would do:

- Mathilda, an applied mathematician’s applied mathematician,
- Griff, a graph theorist, and
- Deepak, a deep learning person.

To make it a little more interesting, let’s use some political data. Specifically, let’s take all of the 400+ House representatives as our observations and see how they voted on the ten major bills (our features) as defined by the New York Times (as of September, 2015). The observations have some kind of labeling: Democrat or Republican. But I’m really not sure how good it is. Does it represent the structure of the data very well? Are we really a two party system, or is there something else going on?

I don’t know the answers to these questions, so I’m going to write this as a procedural type of thing. I’ll show the code for how I’m trying things as I go. You can also find it here.
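To fix ideas before we start, here’s the shape of the problem. A plausible first move for Mathilda is to code each vote numerically and run PCA via an SVD; the vote matrix below is a random stand-in, not the actual NYT data, and the ±1/0 coding is just one reasonable convention.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the real data: 435 representatives x 10 bills,
# coded yes = +1, no = -1, not voting = 0
votes = rng.choice([-1, 0, 1], size=(435, 10), p=[0.45, 0.10, 0.45])

# PCA by hand: center the columns, take an SVD, keep two components
X = votes - votes.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
coords = U[:, :2] * S[:2]    # each representative as a point in 2-D

print(coords.shape)          # (435, 2)
```

On the real votes, a scatter of `coords` colored by party is the quickest sanity check on the two-party question: two clean blobs supports the labels, anything messier is worth a closer look.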

This post is on my undergrad research with the Image Processing and Analysis Group at Yale, which I’m presenting at SPIE Medical Imaging 2016. The goal is to predict the location of cancer within the prostate using multiparametric magnetic resonance imaging.

### A crash course in prostate cancer diagnosis

The bad: The current, run-of-the-mill way to diagnose prostate cancer at the vast majority of hospitals is an ultrasound-guided needle biopsy. This is not a great system. Tumors don’t really show up in ultrasound, so the biopsy is performed in an undirected fashion using a grid over the prostate. Moreover, when we do get back a positive core, it tells us nothing about the size or shape of the tumor.

The good: By changing the way the magnets behave during an MRI scan, we get multiparametric MRI (mpMRI): images of the same region of the body, each with a different medical significance. We call these different image types parameter maps or channels, and an example is down below. Unlike ultrasound, we can see tumors using mpMRI. And some really awesome research on ultrasound/MRI fusion imaging allows us to line up incoming ultrasound images in real-time during a procedure with MR images we’ve already taken (since MRI can’t be done in real-time). That means we can use regions suspected of being cancer (ROIs) on an MR image as targets for the biopsy procedure. This leads to intelligent biopsies and intelligent treatment, since we know something about the tumors ahead of time.

The goal: To identify ROIs on MR images, highly skilled radiologists need to analyze the images in depth. MRI is done in 3D by taking 2D image slices at different heights through the pelvic region (where the prostate is), so we may have 80 images per mpMRI channel. Most hospitals don’t have the resources for that kind of analysis. But using supervised learning, we might be able to identify the ROIs automatically.
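To sketch what that supervised setup looks like: each voxel becomes a feature vector of its mpMRI channel intensities, labeled cancer or not from biopsy-confirmed ground truth. The data below is synthetic and the random forest is just a generic stand-in for a classifier, not the method from the actual paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_voxels, n_channels = 5000, 4                 # hypothetical channel count
X = rng.normal(size=(n_voxels, n_channels))    # fake per-voxel intensities
# Synthetic ground truth: "cancer" wherever two channels are jointly high
y = (X[:, 1] + 0.5 * X[:, 2] > 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"held-out accuracy: {acc:.2f}")
```

The real problem is much harder than this toy, of course: real channels are spatially correlated, labels are noisy, and you have to split train/test by patient rather than by voxel to avoid leakage.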

A friend of mine made a pretty slick app the other day using Shiny, a web app platform for data visualization made by the makers of RStudio. I’ve wanted to play around more with web dev and data viz, so I decided to try my hand at Shiny. But I needed something to work with.

I decided to go with the diffusion maps manifold learning algorithm, which we looked at in-depth in an awesome course I took last year called Elements of Mathematical Machine Learning. With the general objective of learning the underlying parameters of data, we started by looking at the simple image dataset below.

The underlying parameter, the thing generating differences between the image data points, is simple: the rotation of the bunny. So what we ideally want is to represent each image by which way the bunny is oriented. But that seems crazy; we would need to map our RGB image data from 32,400-dimensional space (linearizing the pixels in each RGB channel) down to a point on the unit circle in 2-dimensional space. Well, diffusion maps can do that and a whole lot more.
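To make that less magical, here’s a bare-bones diffusion maps computation in numpy. A noisy circle lifted into 50 dimensions stands in for the bunny images, and the kernel bandwidth choice (median pairwise distance) is just a common heuristic, not the course’s recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: points on a circle, linearly lifted into 50-D with noise
n = 200
theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
circle = np.c_[np.cos(theta), np.sin(theta)]
X = circle @ rng.normal(size=(2, 50)) + 0.01 * rng.normal(size=(n, 50))

# Gaussian kernel on squared distances, row-normalized to a Markov matrix
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / np.median(d2))
P = K / K.sum(axis=1, keepdims=True)

# The top nontrivial eigenvectors of P are the diffusion coordinates
vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
psi = vecs.real[:, order[1:3]]   # skip the constant first eigenvector
print(psi.shape)                 # (200, 2)
```

Scatter-plot `psi` and the 50-dimensional point cloud should come back as (roughly) the unit circle, one point per angle, which is exactly the bunny story in miniature.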

The app I made lets you pick a manifold dataset and see the representation of the data by the first two diffusion map coordinates under different parameter settings. Now is a good time to actually post the link so you can check out the Shiny app: go here! The code lives in my shiny-manifold repo.