Functional Patterns

  • Collections
  • Combinators
  • Compositions
  • Laziness
  • Performance

Collections

Overview

  • Under package scala.collection
  • Sub-packages mutable and immutable
  • Immutable List, Set, Map, ... imported by default
import scala.collection.mutable

val s1 = Set("A", "B")          // default immutable version
val s2 = mutable.Set("A", "B")  // imported mutable version

s1.add("C")  // error, no such method
s2.add("C")  // true, success!

Collections

Abstract class hierarchy

[diagram: collection class hierarchy]
  • Set/Map → HashSet/HashMap
  • SortedSet/SortedMap → TreeSet/TreeMap
  • IndexedSeq → Vector, Array, String, ...
  • LinearSeq → List, Stream, Queue, Stack, ...
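A quick sketch of these relationships, using type ascriptions that compile only because the subtype relations hold:
import scala.collection.immutable._

val xs: IndexedSeq[Int] = Vector(1, 2, 3)           // Vector is an IndexedSeq
val ys: LinearSeq[Int] = List(1, 2, 3)              // List is a LinearSeq
val sm: SortedMap[String, Int] = TreeMap("a" -> 1)  // TreeMap is a SortedMap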

Collections

Unified API

import scala.collection.immutable._

// some common collection interfaces
// can traverse, iterate, access linearly (head/tail)
Traversable(1, 2, 3)      // creates a List by default
Iterable("x", "y", "z")   // also List
LinearSeq(1.0, 2.0, 3.0)  // List again
// can also random access
IndexedSeq(1.0, 2.0)      // creates a Vector by default

Set("a", "b", "c")
Map("a" -> 1, "b" -> 2, "c" -> 3)
// more on these later
List(1, 2, 3).map(_ + 1)
Set(1, 2, 3).map(_ + 1)

Collections

Lists

  • Fundamental data structure in FP
  • Linked list with head (item) and tail (rest)
val list = List(1, 2, 3)
list.head               // element, 1
list.tail               // rest, List(2, 3)
list.tail.tail.tail     // List()
// Haskell style syntax
1 :: 2 :: Nil           // head :: (head :: tail), right associative
def getHead(list: List[Int]) = list match {
  case 1 :: _ => "one"  // decompose list in pattern matching
  case 2 :: _ => "two"
  case _ => "many"
}

getHead(list)
getHead(list.tail)
getHead(list.tail.tail)

Collections

Concatenation

// ++ for 2 collections
List(1, 2) ++ List(3, 4)
Set(1, 2) ++ Set(2, 3)
Map("A" -> 1, "B" -> 2) ++ Map("C" -> 3)
// +:/:+ for prepending/appending an element in a Seq (List, Vector, ...)
val list = List(1, 2, 3)
0 +: list  // colon facing the collection
list :+ 4  // linked list, append is O(n)!
// +/- for adding/removing an element in a Set/Map
val set = Set(1, 2, 3)
set + 4 - 3

val map = Map("A" -> 1, "B" -> 2, "C" -> 3)
map + ("D" -> 4) - "A"

map + ("A" -> 10)  // overwrites existing key

Collections

Set/Map as functions

val s = Set(1, 3, 5)
(s(1), s(2))     // (true, false), actually s.apply(1) & s.apply(2)
(1 to 5).map(s)  // Vector(true, false, true, false, true)
val m = Map(1 -> "a", 2 -> "b").withDefaultValue(null)
(m(1), m(3))     // (a, null), actually m.apply(1) & m.apply(3)
(1 to 5).map(m)  // Vector(a, b, null, null, null)

Collections

Ranges

Range(0, 10)     // Range(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
Range(0, 10, 2)  // Range(0, 2, 4, 6, 8)

0 until 10       // Range(0, 10)
0 until 10 by 2  // Range(0, 10, 2)

1 to 10          // Range(1, 11)
1 to 10 by 2     // Range(1, 11, 2)

(1 to 10).toList
Commonly used in for comprehensions
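For example, a minimal sketch of a Range driving a for comprehension:
for (i <- 1 to 3; j <- 1 to i) yield (i, j)
// Vector((1,1), (2,1), (2,2), (3,1), (3,2), (3,3))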

Collections

Java interoperability

val s = List(1, 2, 3)
val j = java.util.Collections.singletonList(1)
// implicits for pimp my library pattern, more later
import scala.collection.JavaConverters._
s.asJava   // extra pimped method for scala.collection.List
j.asScala  // extra pimped method for java.util.List
// implicit conversion methods
import scala.collection.JavaConversions._

def countS(s: Seq[Int]) = s.size
def countJ(j: java.util.List[Int]) = j.size
countS(j)  // java.util.List -> Seq
countJ(s)  // List -> java.util.List

Collections

Exercises

  • Write a recursive function to print elements in a List[Int]
  • Concatenate a List[Int] and Set[Int]
  • How about a Set[Int] and the keys of a Map[Int, String]?
  • Given s: Set[String] and m: Map[String, Int], build a new Map with only keys from s

Combinators

Basics

  • Takes a function as argument
  • Applies on every element in the collection
  • Returns a new collection/single value
val l1 = List(1, 2, 3)
val l2 = l1.map(_ + 1)  // new List with +1 applied to every element
// break down
l1.map { x =>
  val r = x + 1
  println(s"$x + 1 = $r")  // side-effect
  r
}
// like map, but for side-effects only
l1.foreach(println)  // returns Unit
Also see Lambdas and Streams in Java 8 Libraries

Combinators

Slicing

val l1 = List(1, 2, 3, 4, 5)
val l2 = l1.filter(_ > 2)

l1.filter { x =>
  val r = x > 2
  println(x + " " + (if (r) "keep" else "drop"))
  r
}
l1.drop(1)  // drop first item
l1.drop(2)  // drop first 2 items
l1.dropWhile(_ % 2 == 1)  // drop leading consecutive odd numbers
l1.take(2)
l1.takeWhile(_ % 2 == 1)

// also see takeRight and dropRight

Combinators

Combining/splitting

val l1 = List("a", "b", "c")
val l2 = List(1, 2, 3)

l1.zip(l2)        // -> List[(String, Int)]
l1.zipWithIndex   // -> List[(String, Int)], i.e. Python enumerate
// Map entries are Tuple2s (pairs)
l1.zip(l2).toMap                 // -> Map[String, Int]
Map("a" -> 1, "b" -> 2).toList   // -> List[(String, Int)]
(1 to 10).partition(_ % 2 == 0)  // -> (IndexedSeq[Int], IndexedSeq[Int])

Combinators

Flatten out

// List[List[Int]] -> List[Int]
List(List(1, 2, 3), List(4, 5, 6), List()).flatten  // the empty List disappears
// List[Map[String, Int]] -> List[(String, Int)]
List(Map("A" -> 1, "B"-> 2), Map("C" -> 3, "D" -> 4)).flatten
val lyrics = List("We all live in Amerika", "Amerika ist wunderbar")

lyrics.flatMap(_.split(" "))       // flatMap = map then flatten
lyrics.map(_.split(" ")).flatten   // equivalent
// break down
lyrics.flatMap { line =>
  val tokens = line.split(" ")     // split returns Array[String]
  println(tokens.mkString(" | "))  // and Java Array has ugly .toString()
  tokens
}

Combinators

Reduce

// reduce function is (T, T) => T for List[T]
List(2.0, 3.0, 4.0).reduce(math.pow)
// ((2.0 ^ 3.0) ^ 4.0)
Combining values into a single result of the same type
// break down
List(2.0, 3.0, 4.0).reduce { (x, y) =>
  val r = math.pow(x, y)
  println(s"$x ^ $y = $r")
  r
}
List("A", "B", "C").reduce("(%s, %s)".format(_, _))
List("A", "B", "C").reduceRight("(%s, %s)".format(_, _))

Combinators

Reduce visualized

[visualization: stepping through a reduction of a list]
Parallelizable if fn is associative and commutative (Monoid!)
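For example, with parallel collections (built in up to Scala 2.12, a separate module later):
(1 to 1000).reduce(_ + _)      // 500500
(1 to 1000).par.reduce(_ + _)  // 500500, addition is associative and commutative
(1 to 1000).par.reduce(_ - _)  // subtraction is not: result may vary between runs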
More on this later...

Combinators

Fold

List(1, 1, 1, 2, 2, 3, 4).foldLeft(Set[Int]())(_ + _)  // Set[Int] + Int
Combining values into an initial value (of possibly a different type)
val bytes = List(222, 173, 190, 239)  // List[Int]

// 2 arguments, start value for the accumulator: String
// and binary operator: (String, Int) => String
// operator folds each item into accumulator
bytes.foldLeft("0x")(_ + _.toHexString.toUpperCase)
// break down
bytes.foldLeft("0x") { (str, byte) =>
  println("byte = \"%d\", str = %s".format(byte, str))
  str + byte.toHexString.toUpperCase
}
  • Aggregate elements into single value of a different type
  • e.g. Strings → Bloom filter, feature vectors → SGD
  • Also see foldRight

Combinators

Fold visualized

[diagrams: foldl and foldr]
Folding a linked-list of 1 → 2 → 3 → 4 → 5 → []
Reduce is a special form of fold
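A minimal sketch of that relationship: reduceLeft is a foldLeft seeded with the first element.
val l = List(1, 2, 3, 4)
l.reduceLeft(_ + _)             // 10
l.tail.foldLeft(l.head)(_ + _)  // 10, reduceLeft expressed as foldLeft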

Combinators

Scan

val range = (1 to 10)
val partials = range.scanLeft("List")(_ + ":" + _)
partials.foreach(println)
  • Similar to fold but retains partial results
  • Also see scanRight

Combinators

Grouping

val lyrics = List("We all live in Amerika", "Amerika ist wunderbar")
val tokens = lyrics.flatMap(_.split(" "))  // List[String]

// use :paste mode in Scala console
tokens
  .groupBy(identity)            // Map[String, List[String]]
  .map(p => (p._1, p._2.size))  // Map[String, Int]
  .toVector                     // Vector[(String, Int)]
  .sortBy(_._2)                 // sort by second item
  .reverse
  .take(3)

// longest token
tokens.groupBy(_.length).toVector.sortBy(_._1).reverse.take(1)

Combinators

Exercises

  • Implement factorial with Range and reduce
  • Find which line in lyrics has the most tokens
  • Count tokens by folding into Map[String, Int]
  • Build a histogram of token length (Map[Int, Int])

Compositions

Functions are objects

object addOne extends Function1[Int, Int] {
  def apply(x: Int): Int = x + 1
}

object add extends Function2[Int, Int, Int] {
  def apply(x: Int, y: Int): Int = x + y
}

object mul extends ((Double, Double) => Double) {
  def apply(x: Double, y: Double): Double = x * y
}

val div = (x: Double, y: Double) => x / y

Compositions

Chaining functions

def sqSum(v: Iterable[Double]) = v.map(math.pow(_, 2.0)).reduce(_ + _)

val l2norm1 = math.sqrt _ compose sqSum _  // l^2norm = sqrt(sqSum(v))
val l2norm2 = sqSum _ andThen math.sqrt _  // same as above

l2norm1(List(3.0, 4.0))
l2norm2(List(3.0, 4.0))

Compositions

Functions are data

val square = math.pow(_: Double, 2.0)  // partially applied function
square.getClass                        // Double => Double

// math.* are class methods (JVM), _ converts them to functions (object)
val functions = List(square, math.sqrt _, math.log _, math.log10 _)
// List[Double => Double]
functions.map(_(10.0))  // apply same argument to all functions
// chaining rightward
functions.reduce(_ compose _)(10.0)  // square(sqrt(log(log10(10.0))))
// chaining leftward
functions.reduce(_ andThen _)(10.0)  // log10(log(sqrt(square(10.0))))

Compositions

Predicates

// Int => Boolean
val isEven = { x: Int => x % 2 == 0 }
val isSquare = { x: Int => math.pow(math.sqrt(x).toInt, 2.0) == x }
val range = 1 to 30
range.filter(isEven)
range.filter(isSquare)
// take 2 predicates, create a new one
// more on templates later
def and[A](fn1: A => Boolean, fn2: A => Boolean) = {
  x: A => fn1(x) && fn2(x)
}
range.filter(and(isEven, isSquare))
See also Guava function explained

Compositions

Composing predicates

// function returning function
def not[A](fn: A => Boolean) = { x: A => !fn(x) }
range.filter(not(isEven))
range.filter(not(isSquare))
// function with variable number of arguments
def and[A](fns: (A => Boolean)*) = { x: A => fns.forall(fn => fn(x)) }
def or[A](fns: (A => Boolean)*) = { x: A => fns.exists(fn => fn(x)) }

range.filter(and(isEven, isSquare))
range.filter(or(isEven, isSquare))
Now you can programmatically control predicates, e.g. query parameter, streams
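A hypothetical sketch of that idea, where params stands in for parsed query parameters:
// hypothetical flags, e.g. parsed from a query string
val params = Map("even" -> true, "square" -> true)
val active = Seq(
  if (params.getOrElse("even", false)) Some(isEven) else None,
  if (params.getOrElse("square", false)) Some(isSquare) else None).flatten
range.filter(and(active: _*))  // expand the active predicates as varargs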

Compositions

Closure

// a function that creates new functions
def makePowFn(n: Double): Double => Double = {
  x: Double => math.pow(x, n)  // n is from enclosing scope
}

val square = makePowFn(2.0)  // n = 2.0 in square's closure
val cube = makePowFn(3.0)    // n = 3.0 in cube's closure

square(10.0)
cube(10.0)

Compositions

Partial functions

val one: PartialFunction[Int, String] = { case 1 => "one" }

one.isDefinedAt(1)  // true
one.isDefinedAt(2)  // false

val two: PartialFunction[Int, String] = { case 2 => "two" }
val three: PartialFunction[Int, String] = { case 3 => "three" }
val wildcard: PartialFunction[Int, String] = { case _ => "something else" }

// chaining partial functions
val oneTwoThree = one orElse two orElse three
val number = one orElse two orElse three orElse wildcard

List(1, 2, 3, 4, 5).map(oneTwoThree.isDefinedAt)
List(1, 2, 3, 4, 5).map(number.isDefinedAt)

Compositions

Lifting partial functions

val one: PartialFunction[Int, String] = { case 1 => "one" }
one(1)         // defined
one(2)         // error!

val plainOne = one.lift  // plain function that returns Option
plainOne(1)    // Some(one)
plainOne(2)    // None

val partialOne = Function.unlift(plainOne)
partialOne(1)  // defined
partialOne(2)  // error!

Compositions

Anonymous partial functions

val lyrics = List("We all live in Amerika", "Amerika ist wunderbar")

// use :paste mode in Scala console
lyrics
  .flatMap(_.split(" "))  // List[String]
  .groupBy(identity)      // Map[String, List[String]]
  .map { case (token, list) => (token, list.size) }
  // more readable than .map(p => (p._1, p._2.size))
case class Band(name: String, members: Int)
val bands = List(
  Band("Pungent Stench", 3),
  Band("Rammstein", 6),
  Band("Haggard", 18))

// don't care about band name
bands.filter { case Band(_, members) => members > 10 }

Compositions

Exercises

  • Given \(\log_b(x) = \frac{\log(x)}{\log(b)}\), implement
    mkLogWithBase(b: Double): Double => Double
  • Using and, find capitalized tokens that have 3+ characters and contain 'i'
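One possible sketch for the first exercise, mirroring makePowFn above:
def mkLogWithBase(b: Double): Double => Double = {
  x: Double => math.log(x) / math.log(b)  // log_b(x) = log(x) / log(b)
}

val log2 = mkLogWithBase(2.0)
log2(8.0)  // ≈ 3.0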

Laziness


Laziness

Lazy val and views

def plog(x: Int) = {
  println(x)
  math.log(x)
}
lazy val data = plog(10)

println(data)  // first access: plog(10) is evaluated and cached
println(data)  // cached, not re-evaluated
// Scala collections are strictly (eagerly) evaluated
(1 to 100).map(plog).take(5)  // waste of CPU cycles

// .view to convert to lazy view, .force to strictly evaluate
(1 to 100).view.map(plog).take(5).force
Haskell is lazy by default; Clojure has lazy sequences
So are Scalding pipes and Spark RDDs

Laziness

Streams

// recursive, lazy, and infinite, n, n+1, n+2, ...
def intsFrom(n: BigInt): Stream[BigInt] = n #:: intsFrom(n + 1)

intsFrom(10).take(5).force     // Stream(10, 11, 12, 13, 14)
Stream.from(10).take(5).force  // same as above

// use :paste mode in Scala console
val factorial = Stream
  .from(2)             // 2, 3, ...
  .scanLeft(1)(_ * _)  // 1, 2 * 1, 3 * 2 * 1, ...

factorial.take(10).force

Laziness

Examples

// Fibonacci take 1
val fibs1: Stream[Int] = 0 #:: 1 #:: fibs1.zip(fibs1.tail).map { n => n._1 + n._2 }
fibs1.take(10).force

// Fibonacci take 2
val fibs2: Stream[Int] = 0 #:: fibs2.scanLeft(1)(_ + _)
fibs2.take(10).force

// prime numbers
def sieve(s: Stream[Int]): Stream[Int] = {
  s.head #:: sieve(s.tail.filter(_ % s.head != 0))
}
val primes: Stream[Int] = sieve(Stream.from(2))
primes.take(10).force
See Project Euler for more challenges

Performance

Choose the right data structures

  • Built-in operations are usually performant
  • Immutable collections avoid copying when possible
  • But still slower than their mutable counterparts (see the sketch below)
  • Both are slower than Java counterparts
  • Beware of small, temporary objects and GC
  • Trade off between readability and performance
Scala collections performance characteristics
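As a sketch of that trade-off (helper names are made up): appending to an immutable List in a loop copies it every time, while a mutable builder does not.
// O(n^2) overall: each :+ copies the whole list
def slowBuild(n: Int): List[Int] =
  (1 to n).foldLeft(List.empty[Int])((acc, i) => acc :+ i)

// O(n): collect into a mutable ListBuffer, convert once at the end
def fastBuild(n: Int): List[Int] = {
  val buf = scala.collection.mutable.ListBuffer.empty[Int]
  (1 to n).foreach(buf += _)
  buf.toList
}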

Performance

Unnecessary copies

val m = Map("A" -> 1, "B" -> 2, "C" -> 3)
m.toList.map(t => (t._1, t._2 + 1)).toMap
2 temp lists and 2 * n temp tuples
for ((k, v) <- m) yield (k, v + 1)
m.map { case (k, v) => (k, v + 1) }
m.mapValues(_ + 1)
n temp tuples

Performance

How many copies now?

val m1 = Map("A" -> 1.0, "B" -> 2.0, "C" -> 3.0)
val m2 = Map("A" -> 1.5, "B" -> 2.5, "D" -> 3.5)

def addMaps(m1: Map[String, Double], m2: Map[String, Double]) = {
  val i = m1.keySet intersect m2.keySet        // Set[String]
  val m = i.map { k => k -> (m1(k) + m2(k)) }  // Set[(String, Double)]
  (m1 -- i) ++ (m2 -- i) ++ m                  // better: m1 ++ m2 ++ m
}
// 50 million Map:s
val pipe = FancyBigDataPipe[Map[String, Double]]("hdfs://...")

// how many copies now?
pipe.foldLeft(Map[String, Double]())(addMaps)

Performance

Slightly fewer

val m1 = Map("A" -> 1.0, "B" -> 2.0, "C" -> 3.0)
val m2 = Map("A" -> 1.5, "B" -> 2.5, "D" -> 3.5)

// how many copies?
(m1.keySet ++ m2.keySet) map { k =>
  k -> (m1.getOrElse(k, 0.0) + m2.getOrElse(k, 0.0))
}

Harder Faster Better Shorter

m1 ++ m2.map { case (k, v) => k -> (v + m1.getOrElse(k, 0.0)) }

Performance

Cheat with mutable collections

import scala.collection.mutable.{Map => MMap}

def addMaps(m1: MMap[String, Double], m2: MMap[String, Double]) = {
  m2.foreach { case (k, v) => m1(k) = v + m1.getOrElse(k, 0.0) }
  m1
}

pipe.foldLeft(MMap[String, Double]())(addMaps)
Impure, but it gets the job done; beware of side-effects!

Performance

Third party libraries

  • Be aware of memory characteristics
  • Bloom filter - Algebird vs. Guava
  • Linear algebra - Breeze vs. JBLAS
  • Benchmark!
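A crude timing sketch (the time helper is made up; a real benchmark would use a harness such as JMH or ScalaMeter to handle JIT warm-up and GC):
def time[A](label: String)(body: => A): A = {
  val start = System.nanoTime()
  val result = body
  println(f"$label: ${(System.nanoTime() - start) / 1e6}%.1f ms")
  result
}

time("sum via foldLeft") {
  (1 to 1000000).foldLeft(0L)(_ + _)
}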

Performance

vs. agility


That's It

Further reading

  • What every hipster should know about
    functional programming video/slides
  • Option[T], Some and None
  • Parallel collections
  • Guava functional idioms and futures

⇒ Day 3