Who am I? frederic.peschanski@lip6.fr — fredokun @ github
- associate professor at Sorbonne University (ex-UPMC)
- researcher at the Lip6 computer science laboratory
- (live) programming & maths geek
- long-time Lisper (scheme, CL, clojure(script))
(ns my.m$macros)
(require '[reagent.core :as r])
(require '[reagent.ratom :as ratom])
;; A live-coding presentation made with klipse
;; (thank you Yehonathan/viebel!)
(defn showme [s] [:h3 (str s)])
[:div (showme (js/Date.))]
Want to play with the presentation ?
⇒ https://fredokun.github.io/talk-clojureD-2018
(repo: https://github.com/fredokun/talk-clojureD-2018)
Who am I? frederic.peschanski@lip6.fr — fredokun @ github
Warning! this presentation is code heavy! The whole source code is in the slides
⇒ the final generator is about 100 lines of Clojure …
Our goal is to generate (large) trees…
It all starts with a good random source.
(goodness: uniformity, reproducibilty, efficiency
⇒ cf. https://www.gigabytes.it/data/prng/slides.pdf)
For data generation we’ll go a long way with a good Pseudo Random Number Generator (PRNG).
⇒ there’s a good one in test.check
(require '[clojure.test.check.random
:refer [make-random ;; create source with seed
split ;; two generators from one
rand-double ;; uniform double in range [0.0;1.0[
]])
""
Our source: an immutable PRNG
;; generate a double between 0.0 (inclusive) and 1.0 (exclusive)
(defn next-double [src]
(let [[src' src''] (split src)] ;; XXX: throw src?
[(rand-double src') src'']))
(next-double (make-random 424242))
""
Why ?
- trees everywhere:
→ elements/compounds (files/directories, shapes/groups,…)
→ structured documents (sections/subsections,…)
→ tree-shaped datastructures
→ expression trees (generating programs?),
→ etc.
- a non-trivial case study for uniform random generation
- beautiful maths and algorithms
How ?
- use spec
and test.check
?
- ad-hoc algorithms for specific cases: binary trees and general trees
- generic algorithms: recursive method, boltzmann sampling, …
A spec of binary trees …
(require '[clojure.spec :as s])
;; a spec for binary trees (with int labels)
(s/def ::bintree
(s/or :tip nil?
:node (s/tuple ::label ::bintree ::bintree)))
(s/def ::label int?)
;; example
(def ex-btree [1,
[2 nil nil],
[3 [4 nil,
[5 nil nil]],
[6 nil nil]]])
(s/valid? ::bintree ex-btree)
Random generation with spec (via test.check):
(s/exercise ::bintree 1)
""
Observations
- non-uniform generation: seems to be biased for small-ish trees
- lack of control: what if I want a bigger tree?
test.check has dedicated support for recursive structures
(require '[clojure.test.check.generators :as gen])
(def node-gen (fn [inner-gen]
(gen/tuple gen/int inner-gen inner-gen)))
(def bt-gen (gen/recursive-gen node-gen (gen/return nil)))
(gen/generate bt-gen 10)
""
Observations
- non-uniform generation: also small-ish trees
- lack of control: the size parameter does not seem to "work" (for trees)…
Unbiased sampling means sampling in the uniform distribution.
Defined for a combinatorial class ⇒ e.g. the integers in interval [1;100]
n
⇒ all integers in the interval have size 1Cn
of objects of size n
⇒ there are 100 integers of size 1Uniform distribution: the probability of sampling an object
of size n
is (/ 1 Cn)
⇒ the probability of sampling an integer in the interval is (/ 1 100)
We spec a simple (map-based) DSL for tree grammars.
(s/def ::grammar (s/map-of keyword? ::elem))
(s/def ::elem (s/or :neutral ::neutral
:atom ::atom
:sum ::sum
:prod ::prod
:ref ::ref))
(s/def ::neutral #{1}) ;; ≅ empty
(s/def ::atom #{'z}) ;; an atom has size 1
;; (+ <e1> <e2> ...) either ... or ...
(s/def ::sum (s/cat :sum #{'+} :elems (s/+ ::elem)))
;; (* <e1> <e2> ...) tupling
(s/def ::prod (s/cat :prod #{'*} :elems (s/+ ::elem)))
;; recursion
(s/def ::ref keyword?)
""
Example 1: binary trees
(def bt-gram '{:btree (+ :tip :node)
:node (* z :btree :btree)
:tip (* 1)})
(s/valid? ::grammar bt-gram)
Example 2: general trees
(def gt-gram '{:gtree (* z :gtrees)
:gtrees (+ 1 :forest)
:forest (* z :gtree :gtrees)})
(s/valid? ::grammar gt-gram)
Example:
(def tt-gram '{:ttree (+ :one :two :three)
:one (* z)
:two (* z z :ttree :ttree)
:three (* z z z :ttree :ttree :ttree)})
(s/valid? ::grammar tt-gram)
Remark: z
is "zed" … power of z
= size
thus in english: trees with internal nodes of arity either 2 and 3
such that:
- the leaves have size 1,
- the binary nodes have size 2
- and the ternary nodes size 3.
The tree grammars look suspiciously like equations…
'{:btree (+ :tip (* z :btree :btree))
:tip (* 1)}
""
… something like btree(z) = 1 + z * btree(z)²
Incremental evaluation of tree expressions …
(defn eval-elem [elem z prev]
(cond
(= elem 1) 1.0
(= elem 'z) z
(keyword? elem) (get prev elem)
:else (let [[op & args] elem]
(case op
+ (apply + (map #(eval-elem % z prev) args))
* (apply * (map #(eval-elem % z prev) args))))))
(eval-elem '(* z :btree :btree) 0.25 {:btree 2.0, :tip 1.0, :node 2.0})
""
… and tree grammars
(defn mapkv [f m]
(into {} (map (fn [[k v]] (f k v)) m)))
(defn eval-grammar [grammar z prev]
(mapkv (fn [ref elem] [ref (eval-elem elem z prev)]) grammar))
(eval-grammar bt-gram 0.25 {:btree 2.0 :tip 1.0 :node 2.0})
""
Sampling of an object with a Boltmann model (tree expression) C
of parameter z
:
C
is 1
(a constant) then return the empty object (of size 0)C
is z
(an atom) then return the corresponding object of size 1C
is (+ A B)
(disjoint sum) then:A
with probability (/ (A z) (B z))
B
C
is (* A B)
generate a pair [a b]
with a
an object of A
and b
an object of B
Guarantee: the sampling is uniform
Remark: the control parameter is z
, not (directly) the size of the tree.
Questions:
- what should be the value of z
?
- how to compute efficiently the probabilities for disjoint sums?
What the underlying maths tell us about the functional equations
corresponding to tree grammars:
- they converge (i.e. are analytic) in 0
- there exists a radius of convergence between 0 (inclusive) and 1.0 (exclusive)
⇒ The singularity is the limit of convergence
Theorem: By choosing the singularity as the Boltzmann parameter,
the expected size of an object generated by a Boltmann sampler is infinite.
Singular Boltzmann sampling = generating objects near or at the singularity.
Question: how to find the singularity for a given tree grammar?
⇒ numerical computation
(declare iter) ;; newton iteration
(declare diverge?) ;; divergence (we went too far)
;; Mr Oracle, please find the singularity
(defn oracle [class zmin zmax eps-iter eps-div debug?]
(when debug? (println "[search] zmin=" zmin "zmax=" zmax))
(if (< (- zmax zmin) eps-iter)
[zmin (iter class zmin eps-div true)]
(let [z (/ (+ zmin zmax)
2.0)
v (iter class z eps-div false)]
(when debug? (println " => z=" z "v=" v))
(if (diverge? v eps-div)
(recur class zmin z eps-iter eps-div debug?)
(recur class z zmax eps-iter eps-div debug?)))))
;; (oracle tt-gram 0.0 1.0 0.00001 0.00000001 true)
;; (oracle gt-gram 0.0 1.0 0.001 0.000001 true)
;; (oracle bt-gram 0.0 1.0 0.001 0.000001 true)
""
⇒ Newton iteration, cf. SICP section 1.1.7
;; distance between vectors v1 and v2
(defn norm [v1 v2]
(reduce-kv (fn [norm elem y1]
(let [y2 (get v2 elem)
y (Math/abs (- y1 y2))]
(if (> y norm) y norm))) 0.0 v1))
(norm {:a 0.1 :b 0.3} {:a 0.2 :b -0.2})
""
;; iteration until distance is less than `eps` (thank you Mr. Newton)
(defn iter [gram z eps debug]
(loop [v1 (mapkv (fn [k _] [k 0.0]) gram)]
(let [v2 (eval-grammar gram z v1)]
;; (when debug (println "[iter] v2=" v2 "norm=" (norm v1 v2)))
(if (<= (norm v1 v2) eps)
v2
(recur v2)))))
;; (oracle bt-gram 0.0 1.0 0.001 0.01 false)
""
(defn some-kv [pred m]
(reduce-kv (fn [res k v]
(let [r (pred k v)]
(if r
(reduced r)
res))) nil m))
;; vector has diverged wrt. eps?
(defn diverge? [v eps]
(some-kv (fn [_ w] (or (< w 0.0) (> w (/ 1.0 eps)))) v))
""
An interesting pre-computation is to calculate the weights of
the operands of disjoint sums (+
operator)
(defn weighted-args [args z weights]
(let [eargs (mapv (fn [arg] [arg (eval-elem arg z weights)]) args)
total (apply + (map second eargs))]
(loop [eargs eargs, acc 0.0, wargs []]
(if (seq eargs)
(let [[arg weight] (first eargs)
acc' (+ acc weight)]
(recur (rest eargs) acc' (conj wargs [arg (/ acc' total)])))
;; no more arg
wargs))))
(defn weighted-elem [elem z weights]
(cond
(#{1 'z} elem) elem
(keyword? elem) elem
:else (let [[kind & args] elem]
(case kind
* elem
+ (cons '+ (weighted-args args z weights))))))
(defn weighted-gram [class z weights]
(mapkv (fn [ref elem] [ref (weighted-elem elem z weights)]) class))
""
;; binary trees
(let [[z v] (oracle bt-gram 0.0 1.0 0.00001 0.000001 false)]
(def bt-sing z)
(def bt-wgram (weighted-gram bt-gram z v)))
;; general trees
(let [[z v] (oracle gt-gram 0.0 1.0 0.00001 0.000001 false)]
(def gt-sing z)
(def gt-wgram (weighted-gram gt-gram z v)))
;; one-two-three trees
(let [[z v] (oracle tt-gram 0.0 1.0 0.00001 0.000001 false)]
(def tt-sing z)
(def tt-wgram (weighted-gram tt-gram z v)))
""
tt-gram
(defn choose [src choices]
(let [[x src'] (next-double src)]
(some (fn [[elem proba]]
(and (<= x proba) [src' elem]))
choices)))
(choose (make-random) [[:one 0.5] [:two 0.8] [:three 1.0]])
""
Based on the weighted grammer, the random generator is rather simple
(but there’s an issue lurking, can you find it?)
(defn gentree [src wgram label elem]
(cond
(= elem 1) [src nil]
(= elem 'z) [src 'z]
(keyword? elem) (recur src wgram elem (get wgram elem))
:else (let [[op & args] elem]
(case op
+ (let [[src' choice] (choose src args)]
(recur src' wgram label choice))
* (reduce (fn [[src v] arg]
(let [[src' res] (gentree src wgram label arg)]
(if (#{nil 'z} res)
[src' v]
[src' (conj v res)]))) [src [label]] args)))))
;; (gentree (make-random) tt-wgram :ttree :ttree)
""
Problem: the generated tree is very often very small (e.g. size 1)
and could be rather infrequently very (very very …) large because:
- the Boltzmann distribution of sizes is largely biased ("peaked") towards small trees
- the expected size is infinite
⇒ how to control the size?
but be careful with (very very …) large trees
;; (tail recursive) generation of size
(defn gensize
[src wgram maxsize elem]
(loop [src src, elem elem, size 0, cont '()]
(if (>= size maxsize) ;; <== important!
[-1 src]
(cond
(nil? elem) (if (seq cont)
(recur src (first cont) size (rest cont))
[size src])
(= elem 1) (recur src nil size cont)
(= elem 'z) (recur src nil (inc size) cont)
(keyword? elem) (recur src (get wgram elem) size cont)
:else
(let [[op & args] elem]
(case op
+ (let [[src' elem'] (choose src args)]
(recur src' elem' size cont))
* (recur src (first args) size (concat (rest args) cont))))))))
;; (gensize (make-random) tt-wgram 1000 :ttree)
""
Problem: many trees are small, the random source must be "heated" to
obtain better results
(defn gensizes [src wgram maxsize elem]
(let [[size src'] (gensize src wgram maxsize elem)]
(if (> size 0)
(lazy-seq (cons [size src] (gensizes src' wgram maxsize elem)))
(recur src' wgram maxsize elem))))
(take 20 (map first
(gensizes (make-random) tt-wgram 1000 :ttree)))
""
⇒ this is called rejection sampling … but that’s just filter
!
(take 15 (map first (filter (fn [[size _]] (> size 100))
(gensizes (make-random) tt-wgram 1000 :ttree))))
""
(defn boltzmann [src wgram sing elem min-size max-size]
(let [[size src'] (first (filter (fn [[size _]] (>= size min-size))
(gensizes src wgram max-size elem)))
[src'' tree] (gentree src' wgram elem elem)]
[src'' size tree]))
""
but wait! There’s something’s fishy …
⇒ Our generator cannot generate a tree with one million nodes because …
because ???
because
gentree
is not tail-recursive!
(bummer!)
Looking for a Clojure Kata?
(defn gentree' [src wgram maxsize elem]
(loop [src src, label elem, elem elem, size 0, cont '()]
(cond
(>= size maxsize) [nil src]
(keyword? elem) (recur src elem (get wgram elem) size cont)
(= elem 1) (recur src nil nil size cont)
(seq? elem)
(let [[op & args] elem]
(case op
+ (let [[src' elem'] (choose src args)]
(recur src' label elem' size cont))
* (recur src label (first args) size (cons [label (rest args)] cont))))
:else
(let [[[node rst] & cont'] cont]
(cond
(= elem 'z) (if (vector? node)
(recur src nil nil (inc size) cont)
(recur src nil nil (inc size) (cons [[node] rst] cont')))
(nil? elem)
(cond
(seq rst) (recur src nil (first rst) size (cons [node (rest rst)] cont'))
(seq cont') (let [[[node' rst'] & cont''] cont']
(recur src nil nil size (cons [(conj node' node) rst'] cont'')))
:else [src node]))))))
""
(defn boltzmann' [src wgram sing elem min-size max-size]
(let [[size src'] (first (filter (fn [[size _]] (>= size min-size))
(gensizes src wgram max-size elem)))
[src'' tree] (gentree' src' wgram elem elem)]
[src'' size tree]))
""
Generating one-two-three’s
;; (boltzmann' (make-random) tt-wgram tt-sing :ttree 9 10)
""
Generating binary trees
;; (boltzmann' (make-random) bt-wgram bt-sing :btree 10 30)
""
Generating general trees
;; (boltzmann' (make-random) gt-wgram gt-sing :gtree 4 100)
""
(a lot) More on the topic?
Boltzmann Samplers for the Random Generation of Combinatorial Structures
Duchon, Flajollet, Louchard and Shaeffer, 2004
Thank you!
powered by Klipse /