Comments on the Apple Watch

My Apple Watch arrived yesterday. I chose the 42mm stainless steel model, with the black leather loop. It looks great and feels comfortable on my wrist. The only thing that I find a bit odd is that I can sometimes hear the magnets in the band moving against each other when I move my wrist. Not a deal-breaker, just a minor oddity.

I wanted the Apple Watch primarily for its fitness-tracking features. I was thrilled when I heard it had a gyroscope, accelerometer, and heart rate sensor. I was less thrilled when I learned that Apple hasn’t opened up sensor access in WatchKit. I hope they’ll change their minds about that in the future. I have (or had) an app in mind that needs such data. Also, it would be interesting to look at gyroscope or accelerometer data from a session of Angamardana, which is the most intense workout I’ve ever encountered.

I do wish the watch’s Workout app would let me rename the “Other” exercise to something of my choosing, or at least let me rename it in the Health app on my iPhone. The ironic thing is that if you manually add workout data in the workout section of the Health app, there’s a myriad of workout types to choose from. The Health app does allow data exports, though, so that’s nice.

Markov Chain Text Generation – Part 1

This is the first in a series of posts about random text generation algorithms. We’ll cover Markov chain text generators and context-free grammars, which I’ve wanted to experiment with for some time. If you’re not familiar with random text generators, then you may be wondering what I’m talking about and why the heck I want to build one.

Why bother?

I’m interested in this for a few reasons: it forces me to go deeper into Clojure (my language of choice for this project) and to learn ClojureScript (for the webapp portion), I’ll learn a lot, and it will be loads of fun. Also, random text generators keep popping up in the news because they’ve been used to generate fake research papers in computer science, math, and physics, some of which have been published, often in conference proceedings and on the e-print archive. Some fake papers have even made it into peer-reviewed journals, to the chagrin of academia and the publishing industry. This has highlighted flaws in the peer-review process.

Also, there’s a random text generator based on Deepak Chopra’s Twitter stream. The results are pretty interesting.

What’s a random text generator?

A random text generator is a program that spits out random snippets of text, whether phrases, sentences, paragraphs, or longer passages. Depending on how they’re designed, they can choose individual letters, words, or phrases at random. Since we’ll focus on Markov chain text generators, this post starts at the beginning, with Markov chains, also called Markov models.

What’s a Markov chain?

We want to generate reasonable, sensible sentences at random. To do this, we need to know, given an initial word or phrase, which words or phrases are likely to follow. We can model these probabilities by studying existing text.

Consider a sentence as a sequence of N words, w_1, \ldots, w_N. The Markov assumption says that the current position in the sequence carries all the information we need to predict what comes next; in other words, each word depends only on the word immediately before it, not on the entire history before that. Or, in mathematical terms, we can write the probability of the sequence as:

P(w_1, \ldots, w_N)=P(w_1)\prod_{n=2}^N P(w_n|w_{n-1})
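For example, for the short three-word sentence “the cat sat” (a toy example), the factorization reads:

P(\text{the}, \text{cat}, \text{sat}) = P(\text{the}) \, P(\text{cat}|\text{the}) \, P(\text{sat}|\text{cat})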

Each factor P(w_n|w_{n-1}) on the right-hand side of the above equation tells us the probability that the n^{th} word follows the (n-1)^{st} word. In Markov chain language, this is called a transition probability. This terminology comes from viewing a Markov chain as a sequence of transitions between states of a system. These probabilities are what we need to calculate.

Calculating Transition Probabilities

To predict which word is likely to follow a given word, we need to know the transition probabilities, which we’ll write as P_{n,n-1} for short. There are several ways to calculate them, and this is where language modeling comes into play: each way of calculating the probabilities corresponds to a different set of assumptions about how words depend on the words that come before them. We’ll look at three such models.

Unigram model

The simplest assumption is that the next word is independent of the previous word, or P_{n,n-1}=P(w_n). This amounts to selecting the next word at random, in proportion to how often it appears in the text, with no regard for the word that came before it.
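Just for intuition, here is a tiny sketch of this in Clojure (the language I’m using for this project); the function name is mine, purely for illustration. Picking a random element of the corpus’s word list samples each word in proportion to how often it occurs there:

;; Unigram sketch: draw a word in proportion to its corpus frequency,
;; ignoring whatever word came before it.
(defn unigram-word [words]
  (rand-nth words))

;; (unigram-word ["the" "cat" "sat" "on" "the" "mat"]) ;=> "the" about a third of the time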

Bigram model

The next simplest assumption is that the next word depends only on the previous word, or P_{n,n-1}=P(w_n|w_{n-1}). This is an example of a first-order Markov model.
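One standard way to estimate these bigram probabilities (a common approach, not necessarily the one we’ll settle on later in the series) is to count adjacent word pairs in a training corpus and take relative frequencies:

P(w_n|w_{n-1}) \approx \frac{C(w_{n-1}, w_n)}{C(w_{n-1})}

where C(\cdot) counts occurrences in the corpus. Here is a minimal Clojure sketch of that idea; the function names are mine, purely for illustration:

;; Bigram sketch: map each word to a frequency map of the words that follow it,
;; then sample a successor weighted by those counts.
(defn build-bigrams [words]
  (reduce (fn [m [w1 w2]]
            (update-in m [w1 w2] (fnil inc 0)))
          {}
          (partition 2 1 words)))   ; sliding window of adjacent word pairs

(defn next-word [bigrams w]
  (when-let [followers (get bigrams w)]
    ;; expand the counts into a weighted pool and pick one entry at random
    (rand-nth (mapcat (fn [[word n]] (repeat n word)) followers))))

;; Example:
;; (def b (build-bigrams ["the" "cat" "sat" "on" "the" "mat"]))
;; (next-word b "the") ;=> "cat" or "mat", each about half the time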

Trigram model

Also called a second-order Markov model, here we assume the next word depends on the previous two words, or P_{n,n-1}=P(w_n|w_{n-1}, w_{n-2}).
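Extending the same counting idea to a trigram model just means sliding a window of three words over the corpus and keying the counts on the pair of preceding words. Again, a rough Clojure sketch with illustrative names:

;; Trigram sketch: key each frequency map on the pair of preceding words.
(defn build-trigrams [words]
  (reduce (fn [m [w1 w2 w3]]
            (update-in m [[w1 w2] w3] (fnil inc 0)))
          {}
          (partition 3 1 words)))   ; sliding window of three consecutive words

;; (build-trigrams ["the" "cat" "sat" "on" "the" "mat"])
;; ;=> {["the" "cat"] {"sat" 1}, ["cat" "sat"] {"on" 1}, ["sat" "on"] {"the" 1}, ["on" "the"] {"mat" 1}}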

We can illustrate these or any Markov models as directed acyclic graphs. The bigram and trigram models are shown below.

DAG representation of a first- and second-order Markov model.

What does a third-order Markov model assume? Looking at Markov models from the state transition perspective, what do these assumptions say about how much the current state depends on the past?

Markov chains have many applications. You’ve already heard of random text generation, but two more that you’ve definitely come across are sentence completion and word completion; think predictive typing apps and search query suggestions. PageRank is another application built on Markov chains (see The Physics of PageRank for more on this).