Who am I? frederic.peschanski@lip6.fr — fredokun @ github
- associate professor at Sorbonne University (ex-UPMC)
- researcher at the Lip6 computer science laboratory
- (live) programming & maths geek
- long-time Lisper (scheme, CL, clojure(script))
(ns my.m$macros)
(require '[reagent.core :as r])
(require '[reagent.ratom :as ratom])
;; A live-coding presentation made with klipse
;; (thank you Yehonathan/viebel!)
(defn showme [s] [:h3 (str s)])
[:div (showme (js/Date.))]
Want to play with the presentation ?
⇒ https://fredokun.github.io/talk-clojureD-2018
(repo: https://github.com/fredokun/talk-clojureD-2018)
Who am I? frederic.peschanski@lip6.fr — fredokun @ github
Goals:
⇒ no greek letters !
Non-goals:
Thanks to:
Warning! this presentation is code heavy! The whole source code is in the slides
⇒ the final generator is about 100 lines of Clojure …
we’ll need to skip some parts (but you can play with the whole bunch online…DIY)
Emily wants to generate some objects at random
Important questions for Emily:
It all starts with a good random source. For data generation we’ll
go a long way with a good Pseudo Random Number Generator (PRNG).
Question: a good (portable) PRNG for clojure/script?
(goodness: uniformity, reproducibilty, efficiency
⇒ cf. https://www.gigabytes.it/data/prng/slides.pdf)
⇒ (defn xkcd-random-number [] 4)
?
⇒ standard library (rand
, etc.)?
⇒ rand-cljc?
⇒ test.check?
⇒ a better source?
(require '[clojure.test.check.random
:refer [make-random ;; create source with seed
split ;; two generators from one
rand-double ;; uniform double in range [0.0;1.0[
]])
(def src (make-random 424242))
(let [[src' src''] (split src)] ;; XXX: why two? we need only one ...
[(rand-double src) (rand-double src') (rand-double src'')])
""
Our source: an incremental PRNG
;; generate a double between 0.0 (inclusive) and 1.0 (exclusive)
(defn next-double [src]
(let [[src' src''] (split src)] ;; XXX: throw src?
[(rand-double src') src'']))
(next-double src)
""
Why ?
- trees everywhere:
→ elements/compounds (files/directories, shapes/groups,…)
→ structured documents (sections/subsections,…)
→ tree-shaped datastructures
→ expression trees (generating programs?),
→ etc.
- a non-trivial case study for uniform random generation
- beautiful maths and algorithms
How ?
- "simple" ad hoc algorithms for specific cases: binary trees and general trees
(interactive presentation available as a complement)
- more complex generic algorithms:
→ recursive method (uniform, exact size, ≅ 10.000 nodes, not beautiful)
→ boltzmann sampling (uniform, approx size, ≅ 1.000.000 nodes, beautiful)
(require '[clojure.spec :as s])
;; a spec for binary trees (with int labels)
(s/def ::bintree
(s/or :tip nil?
:node (s/tuple ::label ::bintree ::bintree)))
(s/def ::label int?)
;; example
(def ex-btree [1,
[2 nil nil],
[3 [4 nil,
[5 nil nil]],
[6 nil nil]]])
(s/valid? ::bintree ex-btree)
Random generation from spec (via test.check):
(s/exercise ::bintree 1)
""
Observations
- non-uniform generation (it’s biased but don’t know how)
- lack of control: biased towards small-ish trees
Let’s try the dedicated support for recursive structures
(require '[clojure.test.check.generators :as gen])
(def node-gen (fn [inner-gen]
(gen/tuple gen/int inner-gen inner-gen)))
(def bt-gen (gen/recursive-gen node-gen (gen/return nil)))
(gen/generate bt-gen 10)
""
Observations
- non-uniform generation (it’s biased but don’t know how)
- lack of control: small-ish trees
Unbiased sampling means sampling in the uniform distribution.
Defined for a combinatorial class:
n
Cn
of objects of size n
Uniform distribution: the probability of sampling an object
of size n
is (/ 1.0 Cn)
Example: binary trees as a combinatorial class:
n
of a tree is its number of (internal) nodes[:a nil [:b [:c nil nil] nil]]
is of size 3Cn
?;; a point = a node or a tip (a `nil`)
(defn nb-points [n] (+ (* 2 n) 1))
;; a tip = a `nil` value
(defn nb-tips [n] (inc n))
;; counting binary trees (https://oeis.org/A000108)
(defn catalans
([] (cons 1 (cons 1 (catalans 1 1))))
([n Cn] (lazy-seq (let [Cn+1 (* (/ (* 2 (nb-points n))
(nb-tips (inc n)))
Cn)]
(cons Cn+1 (catalans (inc n) Cn+1))))))
(take 15 (catalans))
""
nil ;; tree of size 0
""
[:a nil nil] ;; tree of size 1
""
[:a [:b nil nil] nil] ;; trees of size 2
[:a nil [:b nil nil]]
""
[:a [:b nil nil] [:c nil nil]] ;; trees of size 3
[:a [:b [:c nil nil] nil] nil]
[:a [:b nil [:c nil nil]] nil]
[:a nil [:b [:c nil nil] nil]]
[:a nil [:b nil [:c nil nil]]]
""
⇒ this recursive definition leads to a beautiful algorithm for
generating binary trees uniformly at random (cf. complement).
We spec a simple (map-based) DSL for tree grammars.
(s/def ::grammar (s/map-of keyword? ::elem))
(s/def ::elem (s/or :neutral ::neutral
:atom ::atom
:sum ::sum
:prod ::prod
:ref ::ref))
(s/def ::neutral #{1}) ;; ≅ empty
(s/def ::atom #{'z}) ;; an atom has size 1
;; (+ <e1> <e2> ...) either ... or ...
(s/def ::sum (s/cat :sum #{'+} :elems (s/+ ::elem)))
;; (* <e1> <e2> ...) tupling
(s/def ::prod (s/cat :prod #{'*} :elems (s/+ ::elem)))
;; recursion
(s/def ::ref keyword?)
""
Example 1: binary trees
(def bt-gram '{:btree (+ :tip :node)
:node (* z :btree :btree)
:tip (* 1)})
(s/valid? ::grammar bt-gram)
Example 2: general trees
(def gt-gram '{:gtree (* z :gtrees)
:gtrees (+ 1 :forest)
:forest (* z :gtree :gtrees)})
(s/valid? ::grammar gt-gram)
Example:
(def tt-gram '{:ttree (+ :one :two :three)
:one (* z)
:two (* z z :ttree :ttree)
:three (* z z z :ttree :ttree :ttree)})
(s/valid? ::grammar tt-gram)
In english: trees with internal nodes of arity either 2 and 3,
such that the leaves have size 1, the binary nodes have size 2
and the ternary nodes size 3.
⇒ how to generate such trees uniformly at random?
The tree grammars look suspiciously like equations…
'{:btree (+ :tip (* z :btree :btree))
:tip (* 1)}
""
Functional equation: btree(z) = 1 + z * btree(z)²
btree(z)
is called a generating function, it is a power series of the form:
(reduce + (for [n (range)] (* (coef n) (Math/pow z n))) ;; Algebraic Clojure, someday...
where n
is the size of the generated objects, and (coef n)
the number
of such objects. For binary trees it’s the Catalan numbers.
(take 15 (catalans))
What the underlying maths tell us about the generating functions
corresponding to tree grammars:
- they converge (i.e. are analytic) in 0
- there exists a radius of convergence between 0 (inclusive) and 1.0 (exclusive)
the limit point is called the (main) singularity
Suppose we have a tree grammar C
The Boltzmann probabilty of an object c
generated by this grammar is:
(/ (Math/pow z (size-of c)) (C z))
z
is the Boltzmann parameter(C z)
is the evaluation of the functional equation at that parameter.Remark: because of recursion, we need to bootstrap the evaluation somehow…
(defn eval-elem [elem z prev]
(cond
(= elem 1) 1.0
(= elem 'z) z
(keyword? elem) (get prev elem)
:else (let [[op & args] elem]
(case op
+ (apply + (map #(eval-elem % z prev) args))
* (apply * (map #(eval-elem % z prev) args))))))
(eval-elem '(* z :btree :btree) 0.25 {:btree 2.0, :tip 1.0, :node 2.0})
""
(defn mapkv [f m]
(into {} (map (fn [[k v]] (f k v)) m)))
(defn eval-grammar [grammar z prev]
(mapkv (fn [ref elem] [ref (eval-elem elem z prev)]) grammar))
(eval-grammar bt-gram 0.25 {:btree 2.0 :tip 1.0 :node 2.0})
""
Sampling of an object with a Boltmann model C
of parameter z
:
C
is 1
(a constant) then return the empty object (of size 0)C
is z
(an atom) then return the corresponding object of size 1C
is (+ A B)
(disjoint sum) then:A
with probability (/ (A z) (B z))
B
C
is (* A B)
generate a pair [a b]
with a
an object of A
and b
an object of B
Guarantee: the sampling is uniform
Questions:
- what should be the value of z
?
- how to compute efficiently the probabilities for disjoint sums?
The singularity is the limit of convergence for the generating functions
Theorem: By choosing the singularity as the Boltzmann parameter,
the expected size of an object generated by a Boltmann sampler is infinite.
Singular Boltzmann sampling = generating objects near or at the singularity.
Question: how to find the singularity for a given tree grammar?
⇒ numerical computation
(declare iter) ;; newton iteration
(declare diverge?) ;; divergence (we went too far)
;; Mr Oracle, please find the singularity
(defn oracle [class zmin zmax eps-iter eps-div]
;; (println "[search] zmin=" zmin "zmax=" zmax)
(if (< (- zmax zmin) eps-iter)
[zmin (iter class zmin eps-div true)]
(let [z (/ (+ zmin zmax)
2.0)
v (iter class z eps-div false)]
(if (diverge? v eps-div)
(recur class zmin z eps-iter eps-div)
(recur class z zmax eps-iter eps-div)))))
;; (oracle tt-gram 0.0 1.0 0.00001 0.00000001)
;; (oracle gt-gram 0.0 1.0 0.001 0.000001)
;; (oracle bt-gram 0.0 1.0 0.001 0.000001)
""
;; distance between vectors v1 and v2
(defn norm [v1 v2]
(reduce-kv (fn [norm elem y1]
(let [y2 (get v2 elem)
y (Math/abs (- y1 y2))]
(if (> y norm) y norm))) 0.0 v1))
(norm {:a 0.1 :b 0.3} {:a 0.2 :b -0.2})
""
;; iteration until distance is less than `eps` (thank you Mr. Newton)
(defn iter [gram z eps debug]
(loop [v1 (mapkv (fn [k _] [k 0.0]) gram)]
(let [v2 (eval-grammar gram z v1)]
;; (when debug (println "[iter] v2=" v2 "norm=" (norm v1 v2)))
(if (<= (norm v1 v2) eps)
v2
(recur v2)))))
;; (oracle bt-gram 0.0 1.0 0.001 0.01)
""
(defn some-kv [pred m]
(reduce-kv (fn [res k v]
(let [r (pred k v)]
(if r
(reduced r)
res))) nil m))
;; vector has diverged wrt. eps?
(defn diverge? [v eps]
(some-kv (fn [_ w] (or (< w 0.0) (> w (/ 1.0 eps)))) v))
""
An interesting pre-computation is to calculate the weights of
the operands of disjoint sums (+
operator)
(defn weighted-args [args z weights]
(let [eargs (mapv (fn [arg] [arg (eval-elem arg z weights)]) args)
total (apply + (map second eargs))]
(loop [eargs eargs, acc 0.0, wargs []]
(if (seq eargs)
(let [[arg weight] (first eargs)
acc' (+ acc weight)]
(recur (rest eargs) acc' (conj wargs [arg (/ acc' total)])))
;; no more arg
wargs))))
(defn weighted-elem [elem z weights]
(cond
(#{1 'z} elem) elem
(keyword? elem) elem
:else (let [[kind & args] elem]
(case kind
* elem
+ (cons '+ (weighted-args args z weights))))))
(defn weighted-gram [class z weights]
(mapkv (fn [ref elem] [ref (weighted-elem elem z weights)]) class))
""
;; binary trees
(let [[z v] (oracle bt-gram 0.0 1.0 0.00001 0.000001)]
(def bt-sing z)
(def bt-wgram (weighted-gram bt-gram z v)))
;; general trees
(let [[z v] (oracle gt-gram 0.0 1.0 0.00001 0.000001)]
(def gt-sing z)
(def gt-wgram (weighted-gram gt-gram z v)))
;; one-two-tree
(let [[z v] (oracle tt-gram 0.0 1.0 0.00001 0.000001)]
(def tt-sing z)
(def tt-wgram (weighted-gram tt-gram z v)))
""
tt-gram
(defn choose [src choices]
(let [[x src'] (next-double src)]
(some (fn [[elem proba]]
(and (<= x proba) [src' elem]))
choices)))
(choose (make-random) [[1 0.5] [:two 0.8] [:three 1.0]])
""
Based on the weighted grammer, the random generator is rather simple
(but there’s an issue lurking, can you find it?)
(declare product)
(defn gentree [src wgram label elem]
(cond
(= elem 1) [src nil]
(= elem 'z) [src 'z]
(keyword? elem) (recur src wgram elem (get wgram elem))
:else (let [[op & args] elem]
(case op
+ (let [[src' choice] (choose src args)]
(gentree src' wgram label choice))
* (reduce (fn [[src v] arg]
(let [[src' res] (gentree src wgram label arg)]
(if (#{nil 'z} res)
[src' v]
[src' (conj v res)]))) [src [label]] args)))))
;; (gentree (make-random) tt-wgram :ttree :ttree)
""
Problem: the generated tree is very often very small (e.g. size 1)
and could be very (very very …) large because:
- the Boltzmann distribution is largely biased ("peaked") towards small trees
- the expected size is infinite
⇒ how to control the size?
but be careful with (very very …) large trees
;; (tail recursive) generation of size
(defn gensize
[src wgram maxsize elem]
(loop [src src, elem elem, size 0, cont '()]
(if (>= size maxsize) ;; <== important!
[-1 src]
(cond
(nil? elem) (if (seq cont)
(recur src (first cont) size (rest cont))
[size src])
(= elem 1) (recur src nil size cont)
(= elem 'z) (recur src nil (inc size) cont)
(keyword? elem) (recur src (get wgram elem) size cont)
:else
(let [[op & args] elem]
(case op
+ (let [[src' elem'] (choose src args)]
(recur src' elem' size cont))
* (recur src (first args) size (concat (rest args) cont))))))))
;; (gensize (make-random) tt-wgram 1000 :ttree)
""
Problem: many trees are small, the random source must be "heated" to
obtain better results
(defn gensizes [src wgram maxsize elem]
(let [[size src'] (gensize src wgram maxsize elem)]
(if (> size 0)
(lazy-seq (cons [size src] (gensizes src' wgram maxsize elem)))
(recur src' wgram maxsize elem))))
(take 20 (map first
(gensizes (make-random) tt-wgram 1000 :ttree)))
""
(take 15 (map first (filter (fn [[size _]] (> size 100))
(gensizes (make-random) tt-wgram 1000 :ttree))))
""
(defn boltzmann [src wgram sing elem min-size max-size]
(let [[size src'] (first (filter (fn [[size _]] (>= size min-size))
(gensizes src wgram max-size elem)))
[src'' tree] (gentree src' wgram elem elem)]
[src'' size tree]))
""
but wait! There’s something’s fishy …
⇒ Our generator cannot generate a tree with one million nodes because …
because ???
because
gentree
is not tail-recursive!
(bummer!)
Looking for a Clojure Kata?
(defn gentree' [src wgram maxsize elem]
(loop [src src, label elem, elem elem, size 0, cont '()]
(cond
(>= size maxsize) [nil src]
(keyword? elem) (recur src elem (get wgram elem) size cont)
(= elem 1) (recur src nil nil size cont)
(seq? elem)
(let [[op & args] elem]
(case op
+ (let [[src' elem'] (choose src args)]
(recur src' label elem' size cont))
* (recur src label (first args) size (cons [label (rest args)] cont))))
:else
(let [[[node rst] & cont'] cont]
(cond
(= elem 'z) (if (vector? node)
(recur src nil nil (inc size) cont)
(recur src nil nil (inc size) (cons [[node] rst] cont')))
(nil? elem)
(cond
(seq rst) (recur src nil (first rst) size (cons [node (rest rst)] cont'))
(seq cont') (let [[[node' rst'] & cont''] cont']
(recur src nil nil size (cons [(conj node' node) rst'] cont'')))
:else [src node]))))))
(defn boltzmann' [src wgram sing elem min-size max-size]
(let [[size src'] (first (filter (fn [[size _]] (>= size min-size))
(gensizes src wgram max-size elem)))
[src'' tree] (gentree' src' wgram elem elem)]
[src'' size tree]))
""
Generating one-two-three’s
;; (boltzmann' (make-random) tt-wgram tt-sing :ttree 9 10)
""
Generating binary trees
;; (boltzmann' (make-random) bt-wgram bt-sing :btree 10 30)
""
Generating general trees
;; (boltzmann' (make-random) gt-wgram gt-sing :gtree 4 100)
""
(a lot) More on the topic?
Boltzmann Samplers for the Random Generation of Combinatorial Structures
Duchon, Flajollet, Louchard and Shaeffer, 2004
Thank you!
powered by Klipse /