Tail-recursion in Scala and Clojure

I recently read a blog post about issues with writing purely functional code in Scala. The post talked a bit about a paper addressing tail recursion in Scala. The code example (taken from the paper) was a function, zipIndex, that pairs each element of a list with its index; it was not properly tail-recursive and would overflow the stack for relatively small inputs. A later post from the author will look at ways of addressing the problem, and I'm looking forward to reading it.

I’m assuming the next post will do something similar to the tail-recursive factorial function.


def factorial(n: BigInt): BigInt = {
  @scala.annotation.tailrec
  def fact(n: BigInt, result: BigInt): BigInt = {
    if (n == 0) result
    else fact(n - 1, result * n)
  }
  fact(n, 1)
}
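The accumulator is what makes this safe: the recursive call is the very last thing fact does, so the compiler can turn it into a loop. Without the accumulator, each frame has pending work (the multiply) and the stack grows with n. A quick illustration of the failure mode in Python, which, like the JVM, never eliminates tail calls (naive_factorial is my own name for the sketch):

```python
import sys

def naive_factorial(n):
    # Each frame has pending work (the multiply), so the stack grows with n.
    return 1 if n == 0 else n * naive_factorial(n - 1)

sys.setrecursionlimit(100)       # make the failure quick to demonstrate
try:
    naive_factorial(1000)
except RecursionError:
    print("stack overflow, just like the non-tail-recursive zipIndex")
```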

I’d write a zip list in much the same way:


// Scala analog to Clojure's loop-recur construct
def loopRecur[A](index: Int, coll: Seq[A], zippedList: List[(Int, A)]): List[(Int, A)] = {
  if (coll.isEmpty) zippedList
  else loopRecur(index + 1, coll.tail, zippedList ++ List((index, coll.head)))
}

// Given a sequence of items, returns a List of tuples of the form (item index, item)
def zipIndex[A](coll: Seq[A]): List[(Int, A)] = {
  loopRecur(0, coll, List.empty[(Int, A)])
}

The recursion is handled by a function called loopRecur, whose name was inspired by Clojure's loop and recur forms. I've tested the above implementation of zipIndex with inputs of up to 100,000 elements without a stack overflow. If this were in Clojure, though, the code would run much faster:


(defn zipIndex [coll]
  (loop [list coll
         n 0
         result []]
    (if (empty? list)
      result
      (recur (rest list) (inc n) (conj result [n (first list)])))))

To test the Clojure version, assuming you have access to a Clojure REPL, you can run (zipIndex (range 100000)) and then be amazed at how much faster the Clojure version runs compared to the Scala.
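For what it's worth, loop/recur is really just iteration with explicit rebinding. A rough Python sketch of the same shape (CPython has no tail-call elimination, so an explicit loop stands in for recur; the name zip_index is mine):

```python
def zip_index(coll):
    """Pair each item with its index, like the Scala/Clojure zipIndex."""
    coll = list(coll)
    result, n = [], 0
    while n < len(coll):       # Clojure's loop/recur is, in effect, this while loop
        result.append((n, coll[n]))
        n += 1
    return result

print(zip_index("abc"))   # [(0, 'a'), (1, 'b'), (2, 'c')]
```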

Probability Monad in Swift

I read about the probability monad in The Frequentist Approach to Probability a couple of years ago and thought it was pretty neat. I decided to make one in Swift, as an exercise in learning the language, after having done the same in Clojure and Java 8.

If you don’t know what a monad is, don’t worry, it’s not important for this post. Just know that it lets you do neat stuff with probability distributions, programmatically speaking. Ok… on to the code.

Protocols for Probabilities

I initially tried to do this with protocols. I had two. One that let us randomly sample values from a probability distribution


protocol Stochastic {
    // The type of value stored in the distribution
    associatedtype ValueType
    // Sample a single value from the distribution
    func get() -> ValueType
    // Sample n values from the distribution
    func sample(n: Int) -> [ValueType]
}

and another one that allowed for parameterization, like for the Poisson distribution.


protocol Parameterized {
    associatedtype ParameterType
    var p: ParameterType { get }
}

You may have noticed that there's no protocol that allows you to map one distribution into another, which is what would make this a monad. That's because I hadn't yet figured out how to do it with structs or with classes. It's easy to map one set of values drawn from a distribution into another set of values according to a function, but what I really needed was to create a new struct with a specific get() function. And then I remembered that functions are first-class values in Swift!

Turns out you don’t need protocols or classes for this at all. You can do it all with a pretty simple struct!

Probability Distributions via Closures

With a single generic struct, we have everything we need for the probability monad. To convert one distribution into another, we need only pass in a function that maps elements of one distribution into elements of the other.


struct Distribution<A> {
    var get: () -> A?

    func sample(n: Int) -> [A] {
        return (1...n).map { _ in get()! }
    }

    func map<B>(f: A -> B) -> Distribution<B> {
        var d = Distribution<B>(get: { () -> B? in return nil })
        d.get = { () -> B? in return f(self.get()!) }
        return d
    }
    // ...
}

Let’s see what we can do if we start from the uniform distribution.


func randomDouble() -> Double {
    return Double(arc4random()) / Double(UInt32.max)
}

let uniform = Distribution<Double>(get: randomDouble)
uniform.sample(5)

For starters, we can easily generate the true-false distribution by mapping the Uniform distribution with a function that generates the booleans from a double. From there, it’s straightforward to transform the true-false distribution into the Bernoulli distribution


let tf = uniform.map({ $0 < 0.7 })
tf.sample(10)
let bernoulli = tf.map({ $0 ? 1 : 0 })
bernoulli.sample(10)

By passing the appropriate transformation function or closure to the map function, one type of distribution can be converted into another.

BTW, to use the random number generators you’ll need to import Foundation. If you’re on a Mac, you could also import GameplayKit’s GKRandomSource. Or you can always use something from C.

Composing Distributions

If you want to compose distributions, our struct needs one more function: flatMap.


func flatMap<B>(f: A -> Distribution<B>) -> Distribution<B> {
    var d = Distribution<B>(get: { () -> B? in return nil })
    d.get = { () -> B? in return f(self.get()!).get()! }
    return d
}


A standard example is the distribution you get from combining a pair of six-sided dice. We can start with a single die:


func nextInt(min min: Int, max: Int) -> () -> Int {
    assert(max > min)
    return { () in return Int(arc4random_uniform(UInt32((max - min) + 1))) + min }
}

let die6 = Distribution<Int>(get: nextInt(min: 1, max: 6))
die6.sample(10)

Next, we can use the flatMap function to compose the distributions of a pair of six-sided dice by passing in a function that provides the behavior we need.


let dice = die6.flatMap({
    (d1: Int) in return die6.map({ (d2: Int) in return d1 + d2 })
})
dice.sample(10)
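For readers following along without Xcode, the whole closure trick fits in a few lines of Python (my own illustrative translation, not the post's Swift):

```python
import random

class Distribution:
    """A distribution is just a thunk that produces one sample per call."""
    def __init__(self, get):
        self.get = get

    def sample(self, n):
        return [self.get() for _ in range(n)]

    def map(self, f):
        # Sample self, then transform the value.
        return Distribution(lambda: f(self.get()))

    def flat_map(self, f):
        # Sample self, build a distribution from that value, sample it.
        return Distribution(lambda: f(self.get()).get())

die6 = Distribution(lambda: random.randint(1, 6))
dice = die6.flat_map(lambda d1: die6.map(lambda d2: d1 + d2))
print(dice.sample(5))   # five samples between 2 and 12
```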

Now that you’ve seen all the pieces, here’s the final form of the probability distribution struct:


struct Distribution<A> {
    var get: () -> A?

    func sample(n: Int) -> [A] {
        return (1...n).map { _ in get()! }
    }

    func map<B>(f: A -> B) -> Distribution<B> {
        var d = Distribution<B>(get: { () -> B? in return nil })
        d.get = { () -> B? in return f(self.get()!) }
        return d
    }

    // FlatMaps one Distribution into another
    func flatMap<B>(f: A -> Distribution<B>) -> Distribution<B> {
        var d = Distribution<B>(get: { () -> B? in return nil })
        d.get = { () -> B? in return f(self.get()!).get()! }
        return d
    }

    let N = 10000

    // Probability of the predicate being true
    func prob(predicate: A -> Bool) -> Double {
        return Double(sample(N).filter(predicate).count) / Double(N)
    }

    // TODO: This function doesn't work just yet. It's returning the same number.
    // Needs tail recursion too. Perhaps this is not possible in Swift?
    // Samples from the new distribution so that the result matches the predicate
    // func given(predicate: A -> Bool) -> Distribution<A> {
    //     var d: Distribution<A> = self
    //     let a = d.get()!
    //     d.get = { () -> A in return predicate(a) ? a : d.get()! }
    //     return d
    // }

    func mean() -> Double {
        return sample(N).reduce(0, combine: { $0 + Double(String($1))! }) / Double(N)
    }

    func variance() -> Double {
        var sum: Double = 0
        var sqrSum: Double = 0
        for x in sample(N) {
            let xx = Double(String(x))!
            sum += xx
            sqrSum += xx * xx
        }
        return (sqrSum - sum * sum / Double(N)) / Double(N - 1)
    }

    func stdDev() -> Double {
        return sqrt(self.variance())
    }
}

Computing Statistics for a Distribution

You may have noticed a few functions for summary statistics (mean, etc) and probability computation. The most important function is prob, which lets you use predicates to ask questions of the distribution. There are a few basic examples of what you can do below.


uniform.mean() // Approximately 0.5
uniform.prob({ $0 > 0.7 }) // Approximately 0.3
dice.prob({ $0 == 7 }) // Approximately 1/6
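As a sanity check on that last number: two fair dice sum to 7 in 6 of the 36 equally likely outcomes, so prob should hover near 1/6 ≈ 0.167. A quick standalone Monte Carlo check in plain Python, independent of the Swift code:

```python
import random

random.seed(1)                      # fixed seed, for reproducibility
N = 100_000
hits = sum(1 for _ in range(N)
           if random.randint(1, 6) + random.randint(1, 6) == 7)
print(hits / N)                     # close to 1/6
```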

If I can figure out how to properly implement the given() function, I'll add that in a future post. I also want to be able to handle more interesting distributions, like those seen in probabilistic graphical models.
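One standard way to make given() behave is rejection sampling: keep drawing until the predicate holds, on every call. (The commented-out Swift appears to draw `a` once, outside the closure, which would explain it returning the same number each time.) A Python sketch of the idea, with names of my own choosing, not the author's eventual solution:

```python
import random

def given(get, predicate, max_tries=10_000):
    """Condition a sampler on a predicate by rejection sampling."""
    def conditioned():
        for _ in range(max_tries):
            x = get()
            if predicate(x):
                return x
        raise RuntimeError("predicate never satisfied; is it too unlikely?")
    return conditioned

die = lambda: random.randint(1, 6)
even_die = given(die, lambda x: x % 2 == 0)
print([even_die() for _ in range(5)])   # only 2s, 4s, and 6s
```

The max_tries guard matters: if the predicate has (near-)zero probability, pure rejection sampling would loop forever.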

All the code is on GitHub.

2015 Goals

Image courtesy of BOSS FIGHT.

As part of the 10 Days to a Better Blog workshop, I wanted to revisit my goals for the year. I've never before thought concretely about what I wanted to accomplish in a year. A few weeks ago, I posted about what I want from 2015, but writing about the goals for this blog a few days ago left me wanting to look at my goals in more depth: not just the what, but the why.

I have four priorities for the year and several other things that I want to do. The priorities are to spend more time with my wife, intensify my practice of yoga, teach at least one public yoga class every month and a half, and write even better code at work.

Priorities

Spend more time with my wife

Spending more one-on-one time with my wife is one of the most important things that I want to do this year. Sharing our lives together has been such a beautiful experience that I want to make sure I don't shortchange our time together as I march towards making my goals a reality.

My wife and I do date nights sometimes, but we have a tendency to let the regularity of those taper off as our schedules fill up. I want us to do a date night every two weeks. I also want us to take a small trip at least once a year. Just us, not visiting family or friends. This is more important than usual since we’re both aiming to accomplish and do more this year than ever before.

Intensify my yoga practices

I practice for a few hours each day, which makes juggling yoga, work, play, and spending time with my wife challenging at times. Now that I’m teaching Isha Hatha Yoga, deepening my own practice is more important than ever. To that end, every three months I want to take three or four days off to focus exclusively on my own practice. It will make a big difference in all aspects of my life.

Teach at least one public yoga class every six weeks

Practicing Isha Yoga transformed how I experience life and led me to my wife. (Rhyming is unintentional.) Practicing Isha Hatha Yoga has been transforming my body and is steadily bringing it to a state of phenomenal ease. My body isn't an obstacle anymore; it cooperates with me no matter what I wish to do. Since it's worked so well for me and many others, I'm willing to share it with whomever wishes to learn. I figure that teaching a public class every six weeks is sustainable while I'm working a full-time job.

Other goals

Do something fun and challenging with Clojure

In 2013, I started working on a Clojure library for learning probabilistic graphical models (PGMs), Watershed. I’d like to work on it some more and get it to the point where it can be used for a variety of PGMs. Also, working on it was so much fun. I’ve got a list of things to add to it, including a few additional probability distributions and Bayesian Networks for a user to play with out of the box. I also want to make it easy for a user to roll their own PGM.

I also want to play with Onyx, which is a distributed computation system in Clojure. It looks interesting and powerful and could be quite useful at work.

Learn deep and shallow learning

I really want to dive into these topics but haven’t been able to make the time. It’s been very much on the back burner for the past year.

Spend more time outdoors

Mustn't spend too much time with my computer. I've lost count of the number of times I've gone to the office and not stepped outside until it was time to go home. So I want to get outdoors a little more frequently.

Lastly, I want to keep in better contact with out-of-town friends, finish learning Swift, and work on my iOS app.

Now, how to track progress? Most of these goals are pass/fail. I either do these things or I don’t. I’ll keep a checklist of them all though. It will be fun to check things off as the days pass.

Clojure/conj 2013: Debriefing

So, I’ve been using/learning Clojure for about six months now and attended my first Clojure/conj. It’s been a lot of fun. Since I’m so new to Clojure, there was plenty that I wasn’t able to grok. But even in those cases I feel like I could see there was a lot of potential in whatever thing the speaker was sharing with us.

Now that I’m home I wanted to debrief and make note of the things I want to look at more closely. Here goes:

  • data.fressian – a Clojure wrapper around Fressian, which is an extensible binary data notation (not sure what the difference is between a format and a notation), with a really simple API. Stuart Halloway presented this. It sounds really easy to use and is designed to be language-agnostic. It would help make sure that data generated from Clojure code would not require that someone use Clojure to read it. I’ll have to look into this.

  • Prismatic’s Schema – Aria Haghighi talked about this. It seems that Schema doesn’t explicitly add types, but it does allow one to do a sort of type checking. And it can do a whole lot more than that, like compile down to Objective-C and make it easier to see what the inputs to a function actually are. This doesn’t appear to be a competitor to core.typed, but seems like it could be used as a complement, since it allows you to add much more information than just the types. It could be interesting to have some kind of documentation generated from the schema and existing docstrings.

  • core.logic and core.match – core.logic allows you to do logic programming in Clojure and core.match adds pattern matching (exciting!). I don’t know much about logic programming, but after looking at the examples on the core.logic wiki I’d like to learn more. I wonder whether core.logic could be used to implement a Markov logic network. Pattern matching in Scala has been wonderful, so I’m glad to see that it’s available in Clojure as well.

  • core.async – We got a walkthrough of core.async. I won’t pretend to understand it. That said, the examples and use cases in the talk were pretty clear. If/As we get into more real-time analytics at DR it could be pretty useful. We’ll have to see. Want to learn more about it though, especially since I’m learning about web development.

  • Cascalog 2.0 – Sam Ritchie gave a great talk on Cascalog 2.0 (slides), which is a refactoring of the codebase that lets you run Cascalog on more than just Hadoop and for more than just “big data”. It could run on Storm, Spark (think you’ll have to roll your own generator?), or more. I will definitely be using this.

  • ClojureScript – This wasn’t the topic of a talk, but it came up in several talks. I’ve been wanting to try the JavaScript visualization library D3. If I learn some ClojureScript then this shouldn’t be too difficult, especially given that there are at least two libraries that have looked at this issue: Strokes and C2. Strokes lets you use D3 from ClojureScript, while C2 was inspired by D3. Clojure/conj let each attendee

So, yeah, it was pretty interesting.

Clojure and Prismatic’s Graph Library

I’ve been a fan and user of Prismatic for a while now and have recently started to get acquainted with Clojure (I learned Common Lisp in 2007/8 but haven’t touched it in years). So, I finally read Prismatic’s blog post about their Graph library, which allowed them to decrease the complexity of their backend. Graph sounds pretty darned cool and I think the same kind of thinking could be useful in some of my work at Digital Reasoning, since we’ve got some pretty complex software as well. Unfortunately, I’ll have to learn a bit more Clojure before I can really grok whether my instinct is correct or not. But, at least it should be a fun journey. Though, since Scala is also functional, it would be instructive to see about applying the same thinking there.

One-dimensional Particle Physics and BubbleSort

I recently perused Donald Knuth’s website and saw that he once had a lecture called Constructing bubblesort at random: one-dimensional particle physics. As a physicist and software engineer, I naturally found this interesting. So, I thought I’d play with it.

The code

One of the papers Knuth links to on the site for his talk gives the following instructions:

Start with infinitely many 1s followed by infinitely many 0s; then randomly interchange adjacent elements that are out of order.

The code in my GitHub repo is modeled after those instructions and what is in Knuth’s lecture. Basically, it starts with a chain of 1s followed by 0s and sorts the chain by randomly swapping adjacent out-of-order elements. It also tells you:

  • How many random swaps it takes to sort the list
  • The distribution of 1s and 0s (histograms), in case you want to plot them
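A minimal sketch of the experiment (my own Python; the repo's actual code may differ): use a finite chain of n 1s followed by n 0s, repeatedly pick a random adjacent pair, and swap it if out of order, until the chain is sorted.

```python
import random

def random_bubble_sort(n, seed=None):
    """Return (swaps, attempts) for sorting n 1s followed by n 0s."""
    rng = random.Random(seed)
    chain = [1] * n + [0] * n
    swaps = attempts = 0
    while any(chain[i] > chain[i + 1] for i in range(len(chain) - 1)):
        i = rng.randrange(len(chain) - 1)   # pick a random adjacent pair (i, i+1)
        attempts += 1
        if chain[i] > chain[i + 1]:         # out of order: swap it
            chain[i], chain[i + 1] = chain[i + 1], chain[i]
            swaps += 1
    return swaps, attempts

print(random_bubble_sort(5, seed=42))
```

Amusingly, the count of successful swaps is always exactly n squared, since each swap removes exactly one inversion and the initial chain has n·n of them; only the number of attempts is random.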

Ok, great. But how does this relate to particle physics?

The physics connection

The physics connection becomes a bit more clear if you imagine a 1D lattice, where the 1s are particles and the 0s are vacancies. Now, pick any site k and count the particles to the right of site k; call the result R_k. Similarly, let L_k be the number of 0s (vacancies) to the left of site k. Note that if k = N/2, with equal numbers of 1s and 0s, then R_k = L_k.
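The k = N/2 identity follows from counting: the left half has N/2 sites, so its 1s and 0s sum to N/2, which is also the total number of 1s; the 1s missing from the left half are exactly the 1s sitting to the right. A quick check on a made-up chain of my choosing:

```python
chain = [1, 0, 1, 1, 0, 0, 1, 0]   # 4 particles (1s), 4 vacancies (0s), N = 8
k = len(chain) // 2
R_k = sum(chain[k:])               # particles to the right of site k
L_k = chain[:k].count(0)           # vacancies to the left of site k
print(R_k, L_k)                    # equal: 1 1
```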

It might be interesting to track how R_k and L_k change with “time”. Here, time means the number of random swaps needed to sort the list. For a long lattice, it might also be interesting to compute the distribution of 1s or 0s as a function of lattice site index. Then we could watch it change from all 1s on the left to all 1s on the right. Might look like a Heaviside Step Function inverting bit by bit.

Knuth specifically connects this toy problem to Young Tableaux, which are a way of describing the representations of certain groups (some of which are relevant to particle physics). My code doesn’t really go into any of this. I thought the things in the above paragraphs were more interesting.