The Physics of PageRank

Some of you may know that the physicist Kenneth Wilson died on June 15^{th}. His work made a huge impact in physics. Not just particle physics, but condensed matter physics as well. No part of physics which is formulated as a quantum field theory was left untouched. So, what did he do? And what does it have to do with PageRank?

Wilson’s Magnum Opus

In order for theorists to actually calculate anything using quantum field theory (QFT), renormalization theory was needed. Feynman, Tomonoga, and Schwinger each independently introduced renormalization into QFT. Freeman Dyson came along afterwards and showed that the three formalisms mentioned above were actually the same thing. But the theory wasn’t on firm theoretical ground before Wilson’s work. Kenneth Wilson introduced the Renormalization Group (RG). And with the RG, we could calculate things with QFT in a systematic, conceptually coherent manner.

(For more on the history, you can read a great blog post about it at Quantum Frontiers.)

When I was in graduate school, the RG was easily one of the most interesting aspects of theoretical physics. I loved that it provided us with a clear theory of critical phenomena and phase transitions as well.

What does this have to do with PageRank though? Here’s a rough outline of the relationship…

The Renormalization Group

Imagine a system described by a vector of parameters, K, and consider a transformation that shifts the parameters to some K'. We can describe this situation with the transformation equation


where \cal{R} is the RG operator. If the transformation operator is applied repeatedly, then we end up with a sequence of parameter vectors K, K', K'',\ldots:


where n=0,1,2,\dots and K^{(0)} is our initial K.

If we plotted the initial K^{(0)} and plotted results of the successive transformations, we would have a sequence of points connecting K^{(0)} and K^{(n)}. The sequence of points is the RG flow and is represented by the above equation.

It’s possible that we may have fixed points in our parameter space. A fixed point K^{\ast} that remains unchanged when the transformation is applied:

Note that I’m glossing over a lot of detail.

The PageRank Algorithm

At its core, PageRank is an algorithm which answers the following question:

If we select a site in a network or graph at random, what is the likelihood of reaching any other given site?

PageRank determines the answer in an iterative fashion, using equations that look a lot like the RG equations. The iterative algorithm proceeds until the probabilities on the sites stop changing. This is our RG fixed-point.

You can learn more about PageRank and the iterative ranking algorithms which came before in a 2010 arXiv article PageRank: Standing on the shoulders of giants.

I’ll really have to look into this in more depth sometime, especially where the fixed-points are concerned.

Clojure and Prismatic’s Graph Library

I’ve been a fan and user of Prismatic for a while now and have recently started to get acquainted with Clojure (I learned Common Lisp in 2007/8 but haven’t touched it in years). So, I finally read Prismatic’s blog post about their Graph library, which allowed them to decrease the complexity of their backend. Graph sounds pretty darned cool and I think the same kind of thinking could be useful in some of my work at Digital Reasoning, since we’ve got some pretty complex software as well. Unfortunately, I’ll have to learn a bit more Clojure before I can really grok whether my instinct is correct or not. But, at least it should be a fun journey. Though, since Scala is also functional, it would be instructive to see about applying the same thinking there.

One-dimensional Particle Physics and BubbleSort

I recently perused Donald Knuth’s website and saw that he once had a lecture called Constructing bubblesort at random: one-dimensional particle physics. As a physicist and software engineer, I naturally found this interesting. So, I thought I’d play with it.

The code

One of the papers Knuth links to on the site for his talk gives the following instructions:

Start with infinitely many 1s followed by infinitely many 0s; then randomly interchange adjacent elements that are out of order.

The code in my Github repo is modeled after those instructions and what is in Knuth’s lecture. Basically, start with a chain of 1s and 0s, as specified below, sort them at random. It also tells you:

  • How many random swaps it takes to sort the list
  • The distribution of 1s and 0s (histograms), in case you want to plot them

Ok, great. But how does this relate to particle physics?

The physics connection

The physics connection becomes a bit more clear if you imagine a 1D lattice, where the 1s are particles and the 0s are vacancies. Now, pick any site k and ask how many particles are to the right of site k. Call the result R_k. The number of 0s to the left of site k is L_k. Note that if k=N/2, then R_k=L_k.

It might be interesting to track how R_k and L_k change with “time”. Here, time means the number of random swaps needed to sort the list. For a long lattice, it might also be interesting to compute the distribution of 1s or 0s as a function of lattice site index. Then we could watch it change from all 1s on the left to all 1s on the right. Might look like a Heaviside Step Function inverting bit by bit.

Knuth specifically connects this toy problem to Young Tableaux, which are a way of describing the representations of certain groups (some of which are relevant to particle physics). My code doesn’t really go into any of this. I thought the things in the above paragraphs were more interesting.