lw’s blog

Counting Strings and the Goulden-Jackson Cluster Method

2024-02-10T10:00:00+00:00

A lot of this post was inspired by the papers of John Noonan & Doron Zeilberger and Zhuang (you can also check out the original paper, but it's a bit harder to follow).

Intro

Suppose we want to find the number of strings of length \( n \) (sourced from some given alphabet \( V \)), which don't contain any of a given set of strings \( B \) as a substring. Is there a fast way to do this?

The most basic case is excluding a single one-character string, in which case there are \( (\left|V\right| - 1)^n \) strings in total. Past single-character strings, the reasoning becomes more difficult. It's always true (and we will show) that the number of such strings follows a linear recurrence, so calculating the first few values using DP and then applying Berlekamp-Massey gives a fast way, though we will show a way to compute a generating function directly.
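The DP mentioned above can be sketched as follows. This is a minimal illustration rather than an optimised routine: the function name `count_avoiding` and the choice to key the state on the last few characters are assumptions made here for the example.

```python
def count_avoiding(alphabet, bad_words, n_max):
    """Return [a_0, ..., a_n_max], where a_n counts length-n strings avoiding bad_words."""
    # Only the last (longest bad word - 1) characters matter for future matches.
    keep = max(len(b) for b in bad_words) - 1
    states = {"": 1}  # remembered suffix -> number of valid strings ending in it
    counts = [1]      # the empty string
    for _ in range(n_max):
        nxt = {}
        for suffix, ways in states.items():
            for c in alphabet:
                s = suffix + c
                # Any newly created bad word must end at the character just added.
                if any(s.endswith(b) for b in bad_words):
                    continue
                key = s[-keep:] if keep else ""
                nxt[key] = nxt.get(key, 0) + ways
        states = nxt
        counts.append(sum(states.values()))
    return counts
```

For example, `count_avoiding("01", ["111"], 6)` counts binary strings with no three consecutive ones, and a one-character bad word reproduces the \( (\left|V\right| - 1)^n \) count.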

A Derivation

Let's first define the weight \( W_R \) of some word \( w = w_1 \dots w_n \in V^* \) of length \( n \), and \( R \in \mathbb{Z}^+ \). We will define it using the set of variables \( x\left[w'\right] \) for all \( w' \in V^* \) of length \( R \) or less as follows:

\[ W_R(w) = \prod_{k = 1}^n \prod_{m = k}^{\min(k + R - 1, n)} x\left[w_k \dots w_m\right] \]

Note some factors may appear more than once, for example:

\[ W_2(HELL) = x\left[H\right]x\left[E\right]x\left[L\right]^2x\left[HE\right]x\left[EL\right]x\left[LL\right] \]
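To make the weight concrete, here is a small sketch that lists the factors of \( W_R(w) \) as a multiset. It assumes, as in the \( HELL \) example, that only substrings of length at most \( R \) contribute; the function name `weight` is just for illustration.

```python
from collections import Counter


def weight(word, R):
    """Multiset of substrings of `word` of length at most R, i.e. the factors x[...] of W_R."""
    n = len(word)
    return Counter(
        word[k:m]
        for k in range(n)                          # start of the substring (0-indexed)
        for m in range(k + 1, min(k + R, n) + 1)   # exclusive end, at most R characters
    )
```

Here `weight("HELL", 2)` has `L` with multiplicity two and every other factor once, matching the expansion above.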

Now, we define the generating function over \( x[w] \) where \( w \) has length \( \le R \) as:

\[ \Phi_R = \sum_{w \in V^*} W_R(w) \]

Our strategy will be to perform substitutions on \( \Phi_R \) in order to recover the generating functions we want. For example the mapping:

\begin{equation} x[w] \mapsto \left\{ \begin{array}{ll} 0, & \text{if } w \in B \text{, i.e. w is a string we want to exclude}\\ x, & \text{if } w \text{ is a single character string}\\ 1, & \text{otherwise} \end{array}\right. \end{equation}

Will give us the generating function \( \sum a_n x^n \) where \( a_n \) is the number of words of length \( n \) not containing any \( w \in B \) as a substring. We'll denote this generating function by \( f_B(x) \).

Computing \( \Phi_R \)

Let's define:

\[ Suff(w) = \{ w' \in V^* : \text{w' ends in w} \} \]

Now, all words in \( V^* \) must either be of length less than \( R \) or end in some string of length \( R \). Define:

\[ \Phi_{R, w} = \sum_{w' \in Suff(w)} W_R(w') \]

Then \( \Phi_R \) is the sum of \( \Phi_{R, w} \) for all words of length \( R \) plus the sum of \( W_R(w) \) for all words of length less than \( R \). Next, we see that our set of \( \Phi_{R, w} \) form a set of simultaneous equations:

\[ \Phi_{R, w_1 \dots w_R} = W_R(w_1 \dots w_R) + \left(\prod_{i = 1}^{R} x\left[w_i \dots w_R \right] \right) \sum_{c \in V} \Phi_{R, cw_1 \dots w_{R - 1}} \]

Each equation says, in essence, that if a word \( w' \) ends in \( w \), then it must either be \( w \) itself, or else we can drop the last character of \( w' \) and we are left with another word with a suffix of length \( R \). And so calculating \( \Phi_R \) reduces to solving these equations. We can also glean from this that \( \Phi_R \) is rational in its variables, which implies, as we stated in our introduction, that \( f_B(x) \) is the GF of a linear recurrence.

Note, making the substitution (1) prior to solving the system simplifies computing \( f_B(x) \). We'll denote (1) applied to \( \Phi_{R, w} \) as \( \Phi_{R, w}(x) \).

Example

Consider binary strings of length \( n \) not containing the substring \( 111 \), taking \( R = 3 \). Making our substitution ahead of time, every term of the equation for \( \Phi_{3, 111} \) picks up the factor \( x[111] \mapsto 0 \), so we see:

\begin{align*} \Phi_{3, 000}(x) &= x^3 + x \left(\Phi_{3, 100}(x) + \Phi_{3, 000}(x) \right)\\ \Phi_{3, 001}(x) &= x^3 + x \left(\Phi_{3, 100}(x) + \Phi_{3, 000}(x) \right)\\ \Phi_{3, 010}(x) &= x^3 + x \left(\Phi_{3, 101}(x) + \Phi_{3, 001}(x) \right)\\ \Phi_{3, 011}(x) &= x^3 + x \left(\Phi_{3, 101}(x) + \Phi_{3, 001}(x) \right)\\ \Phi_{3, 100}(x) &= x^3 + x \left(\Phi_{3, 110}(x) + \Phi_{3, 010}(x) \right)\\ \Phi_{3, 101}(x) &= x^3 + x \left(\Phi_{3, 110}(x) + \Phi_{3, 010}(x) \right)\\ \Phi_{3, 110}(x) &= x^3 + x \left(\Phi_{3, 111}(x) + \Phi_{3, 011}(x) \right)\\ \Phi_{3, 111}(x) &= 0\\ \end{align*}

Solving, we find that:

\begin{align*} \Phi_{3, 000}(x) &= \Phi_{3, 001}(x) = \Phi_{3, 010}(x) = \Phi_{3, 011}(x) = -\frac{x^5 + x^4 + x^3}{x^3 + x^2 + x - 1}\\ \Phi_{3, 100}(x) &= \Phi_{3, 101}(x) = -\frac{x^4 + x^3}{x^3 + x^2 + x - 1}\\ \Phi_{3, 110}(x) &= -\frac{x^3}{x^3 + x^2 + x - 1}\\ \Phi_{3, 111}(x) &= 0 \end{align*}

And thus:

\begin{align*} f_{\{111\}}(x) &= 1 + 2x + 4x^2 + \frac{4x^5 + 6x^4 + 7x^3}{1 - x^3 - x^2 - x}\\ &= \frac{x^2 + x + 1}{1 - x^3 - x^2 - x}\\ \end{align*}

AKA the (shifted) Tribonacci numbers.
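A quick way to sanity check a rational generating function like this one is to expand it as a power series. The sketch below is one way to do that with plain integer arithmetic; the helper name `series` is made up for this example, and it assumes the denominator has constant term 1.

```python
def series(num, den, n_terms):
    """First n_terms coefficients of num(x) / den(x), coefficient lists lowest degree first.

    Assumes den[0] == 1, so every coefficient stays an integer.
    """
    out = []
    for n in range(n_terms):
        acc = num[n] if n < len(num) else 0
        # From den * out = num: out[n] = num[n] - sum_{j >= 1} den[j] * out[n - j]
        for j in range(1, min(n, len(den) - 1) + 1):
            acc -= den[j] * out[n - j]
        out.append(acc)
    return out
```

Calling `series([1, 1, 1], [1, -1, -1, -1], 7)` expands \( \frac{x^2 + x + 1}{1 - x^3 - x^2 - x} \), and each coefficient from the fourth onwards is the sum of the previous three.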

The Goulden-Jackson Cluster Method

For large alphabets and \( R \) the method above will result in a lot of computational effort; our system alone will be of size \( \left|V\right|^R \). We'll introduce the Goulden-Jackson Cluster method as a means of reducing our work.

In this section we'll add a couple of restrictions on our set of bad words \( B \). Firstly, no bad word should appear as a substring of any other bad word - the bigger bad word can be removed from \( B \) for the same end result. Secondly, all \( b \in B \) should be of length at least two. If this is not true, we can equivalently remove \( v \in B \) from our alphabet \( V \).

Clusters

Given \( w \) and a set of words \( B \), we define a marked word as a pair \( (w, \{ (b_1, i_1), (b_2, i_2) \dots (b_l, i_l) : w_{i_k} \dots w_{i_k + length(b_k) - 1} = b_k \in B \}) \). For example, for \( B = \{HE, EL, LO \} \), the following is a marked word:

\[ (HELLO, \{ (HE, 1), (EL, 2), (LO, 4) \}) \]

And we define a cluster as a nonempty marked word for which every letter in \( w \) belongs to at least one bad word, and neighbouring bad words appearing in \( w \) always overlap:

\[ (HEL, \{ (HE, 1), (EL, 2) \}) \]

Note, every subword of \( B \) in \( w \) needn't be included in the marked word, for example:

\[ (HELLO, \{ (HE, 1) \}) \]

Is a completely valid marked word.

Given \( B \), we'll define \( C_B(w) \) as the set of all clusters on \( w \) (exercise: find \( w, B \) such that this set has size greater than one), \( M_B \) as the set of all marked words, and \( C_B \) as the set of all clusters.

The concatenation of two marked words \( m_1, \ m_2 \) will be denoted \( m_1m_2 \), and defined how you would expect.

A Formula

First of all, we'll give an equivalent definition of \( f_B(x) \):

\[ f_B(x) = \sum_{w \in L_B} x^{length(w)} \]

Where \( L_B \) is the set of all words in \( V^* \), not containing any word in \( B \) as a substring. We'll focus on calculating \( f_B(x) \) from here on, but other substitutions on \( \Phi_R \) act similarly (and are in examples).

Further define the auxiliary generating functions:

\begin{align*} F_B(x, t) &= \sum_{(w, S) = m \in M_B} x^{length(w)} t^{\left|S\right|}\\ C_B(x, t) &= \sum_{(w, S) = c \in C_B} x^{length(w)} t^{\left|S\right|} \end{align*}

And define \( Q(m = (w, S)) = x^{length(w)}t^{\left|S\right|} \) for brevity (it should be clear that \( Q(m_1m_2) = Q(m_1)Q(m_2) \)). Next, we see that every marked word \( m = (w, S) \) either ends in a character not present in any bad word in \( S \), or otherwise the last character is part of the last bad word in \( S \) (which itself must be part of a cluster):

\[ M_B = \{ e \} \cup \{ mc : m \in M_B, \ c \in C_B \} \cup \{ mv : m \in M_B, \ v \in V \}\\ \]

\begin{align*} \Rightarrow F_B(x, t) &= 1 + \sum_{m \in M_B} \sum_{c \in C_B} Q(mc) + \sum_{m \in M_B} \sum_{v \in V} Q(mv)\\ &= 1 + \sum_{m \in M_B} \sum_{c \in C_B} Q(m)Q(c) + \sum_{m \in M_B} \sum_{v \in V} Q(m)Q(v)\\ &= 1 + F_B(x, t)C_B(x, t) + \left|V\right|x \left(F_B(x, t)\right)\\ &= \frac{1}{1 - \left|V\right|x - C_B(x, t)} \end{align*}

Where \( e \) is the (unique) empty marked word; note also the union is disjoint. We also wave hands a bit for \( v \in V \), these always correspond to exactly one marked word given all elements of \( B \) have length greater than one.

Thus, calculating \( F_B(x, t) \) reduces to calculating \( C_B(x, t) \). We can group clusters according to their last bad word \( b \). For some cluster \( c = (w, S) \), the cluster must then either consist solely of \( b \) (which implies \( w = b \)), or else we can remove \( b \) along with some suffix of \( w \) to produce a smaller cluster.

For each \( b \in B \) let \( C_B[b] \) denote the set of clusters ending in \( b \), with \( C_B[b](x, t) \) defined similarly. Then \( C_B[b](x, t) \) form a SLE, for example for \( B = \{HELE, ELEM\} \), we have:

\begin{align*} C_B[ELEM](x, t) &= C_B[HELE](x, t)xt + C_B[HELE](x, t)x^3t + x^4t\\ C_B[HELE](x, t) &= x^4t \end{align*}

Which results in:

\begin{align*} C_B(x, t) &= x^4t + x^4t(xt + x^3t + 1)\\ F_B(x, t) &= \frac{1}{(1 - 26x) - (x^4t + x^4t(xt + x^3t + 1))} \end{align*}
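The coefficients of the system come purely from how bad words overlap. As a rough sketch (the names `overlap_shifts` and `cluster_system` are invented here), the following lists, for each pair of bad words, how many new letters are added when one is glued onto a cluster ending in the other. It assumes no bad word is a substring of another, as required above.

```python
def overlap_shifts(source, end):
    """Numbers of new letters added when `end` overlaps the tail of `source`.

    An overlap of length i means source ends with end[:i]; gluing then adds
    len(end) - i letters, i.e. a term x^(len(end) - i) * t in the system.
    """
    return [
        len(end) - i
        for i in range(1, min(len(source), len(end)))
        if source.endswith(end[:i])
    ]


def cluster_system(bad_words):
    """Map each end word to {source word: list of shifts}, omitting pairs with no overlap."""
    system = {}
    for end in bad_words:
        system[end] = {}
        for source in bad_words:
            shifts = overlap_shifts(source, end)
            if shifts:
                system[end][source] = shifts
    return system
```

For \( B = \{HELE, ELEM\} \) the only overlaps are \( HELE \) followed by \( ELEM \), with shifts 3 and 1, which are the \( x^3t \) and \( xt \) terms above.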

Now, recovering \( f_B(x) \) from \( F_B(x, t) \) is equivalent to substituting \( t = -1 \) (exercise!), resulting in:

\[ f_B(x) = \frac{1}{1 - 26x + 2x^4 - x^5 - x^7} \]

Sample Sage implementation:

import string


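def goulden_jackson(bad_words, alphabet=string.ascii_uppercase):
    """Return f_B as a rational function in s (the substitution t = -1 is baked in)."""
    # One unknown G_w per bad word w: the GF of the clusters ending in w.
    s, gfvs = var("s"), {w: var(f"G_{w}") for w in bad_words}
    eqns = []
    for end_word in bad_words:
        # The cluster consisting of end_word alone contributes t*s^len = -s^len.
        eq = -s^(len(end_word))
        for i in range(1, len(end_word) + 1):
            sub = end_word[:i]
            for source_word in bad_words:
                # i == len(end_word) only matches end_word itself and supplies the
                # -G_{end_word} term; smaller i are genuine overlaps.
                if source_word.endswith(sub):
                    eq += -s^(len(end_word) - len(sub))*gfvs[source_word]
        eqns.append(eq == 0)

    soln = solve(eqns, *gfvs.values())
    # solve returns a flat list for one unknown, and a list of solutions otherwise.
    CB = sum(eq.right() for eq in (soln[0] if len(bad_words) > 1 else soln))
    G = 1 / (1 - len(alphabet)*s - CB)
    return G.numerator() / G.denominator()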

Our overlap checking is not optimised; we could do better with a suffix tree when bad_words is large. We could also further exploit symmetry, e.g. we must always have \( C_B[abb](x, t) = C_B[cbb](x, t) \).

Examples

PGF for the First Occurrence of a Binary String

For some binary string \( w = w_1 \dots w_l \), let \( G(x) = \sum_{n \ge 1} p_n x^n \) where \( p_n \) is defined as the probability that the first occurrence of the string \( w \) in a random infinite binary string starts at \( n \).

Let \( c_n = 2^n - \left[x^n\right]f_{\{w\}}(x) \) be the number of binary strings of length \( n \) which contain \( w \) as a substring. A string of length \( n \) has its first occurrence of \( w \) at the last \( l \) characters exactly when it contains \( w \) but its first \( n - 1 \) characters do not. There are \( 2c_{n - 1} \) strings of length \( n \) whose first \( n - 1 \) characters contain \( w \) (one for each choice of final character), so the number we want is:

\begin{align*} c_n - 2c_{n - 1} = (2^n - \left[x^n\right]f_{\{w\}}(x)) - 2(2^{n - 1} - \left[x^{n - 1}\right]f_{\{w\}}(x)) \end{align*}

An occurrence starting at character \( n \) ends at character \( n + l - 1 \), so adjusting by \( l - 1 \) and dividing by the number of strings of that length:

\begin{align*} p_n = \frac{c_{n + l - 1} - 2c_{n + l - 2}}{2^{n + l - 1}} \end{align*}

Summing, using \( \sum_m c_m x^m = \frac{1}{1 - 2x} - f_{\{w\}}(x) \) and substituting \( x \mapsto \frac{x}{2} \) to absorb the powers of two:

\begin{align*} G(x) &= x^{1 - l}\sum_{m} (c_m - 2c_{m - 1})\left(\frac{x}{2}\right)^m = x^{1 - l}(1 - x)\left(\frac{1}{1 - x} - f_{\{w\}}\left(\frac{x}{2}\right)\right)\\ &= x^{1 - l}\left(1 - (1 - x)f_{\{w\}}\left(\frac{x}{2}\right)\right) \end{align*}

As a check, for \( w = 1 \) we have \( f_{\{1\}}(x) = \frac{1}{1 - x} \), which gives \( G(x) = \frac{x/2}{1 - x/2} \), i.e. \( p_n = 2^{-n} \) as expected.
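For small cases the probabilities \( p_n \) can be computed exactly by enumeration, which is handy for checking a candidate \( G(x) \). This is only a brute-force sketch (exponential in \( n \)); the function name is an assumption of this example.

```python
from fractions import Fraction
from itertools import product


def first_occurrence_probs(w, n_max):
    """Return [p_1, ..., p_n_max]: p_n is the probability the first occurrence of w starts at n."""
    l = len(w)
    probs = []
    for n in range(1, n_max + 1):
        length = n + l - 1  # only the first n + l - 1 characters matter
        hits = sum(
            1
            for bits in product("01", repeat=length)
            if "".join(bits).find(w) == n - 1  # str.find is 0-indexed
        )
        probs.append(Fraction(hits, 2 ** length))
    return probs
```

For instance, with `w = "1"` the values are \( \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \dots \), since the first \( n - 1 \) characters must all be zero.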

A Weighted Penney's Game

Consider a game between two players which consists of a series of rounds, each a toss of an unfair coin (say heads comes up with probability \( p \)). Player 1 wins the round if the result is heads, and player 2 if it is tails. A player wins the game once they reach \( k \) consecutive round wins. What is the probability player 1 wins? Markov chains may be a bit cleaner for this example, but we'll show how GJ can be applied anyway!

We will first calculate the number of strings of length \( n \) containing a given string \( w_1 \), and not containing a second string \( w_2 \) (denote this criterion \( C_1 \)). This corresponds to a sequence of tosses containing \( k \) \( H \)s in a row, but not \( k \) \( T \)s. Then we will see how this allows us to calculate the number of strings of length \( n \) containing \( w_1 \) only as a suffix, and not containing \( w_2 \) (\( C_2 \)), which will give us our result.

Let \( S_n \) be the set of strings of length \( n \) satisfying \( C_1 \). Then if \( V_n \) is the set of strings of length \( n \) not containing \( w_2 \), and \( U_n \) is the set of strings of length \( n \) not containing \( w_1 \) or \( w_2 \), what is \( W_n = V_n - U_n \)?

We'll show by the classic subset argument \( W_n = S_n \). Suppose \( w \in S_n \), i.e. it contains \( w_1 \) and not \( w_2 \). Then \( w \in V_n \) since it doesn't contain \( w_2 \), and also \( w \not \in U_n \) since it contains \( w_1 \), which implies \( S_n \subseteq W_n \). Conversely, if \( w \in W_n \) then \( w \in V_n \), so \( w \) does not contain \( w_2 \), and \( w \not \in U_n \) then implies \( w \) must contain \( w_1 \) since we know it doesn't contain \( w_2 \); thus \( W_n \subseteq S_n \).

Since \( U_n \subseteq V_n \) we must have:

\[ \left|S_n\right| = \left|V_n\right| - \left|U_n\right| \]

For brevity let \( x_n = \left|X_n\right| \) for each of the sets above. Since \( v_n, u_n \) are just substring exclusion problems, we can use our methods to calculate \( s_n = v_n - u_n \). But now how do we calculate the number of strings of length \( n \) which don't contain \( w_2 \), and contain \( w_1 \) only as a suffix? Like in the previous example, we may be tempted to say "subtract \( 2s_{n - 1} \) from \( s_n \) to account for adding \( H \) or \( T \) to any \( w \in S_{n - 1} \)", but this is not correct since appending \( T \) to \( w \in S_{n - 1} \) may result in a string ending in \( k\ T\)s.

Let \( S(x, y) = \sum_{n, m} s_{n, m} x^ny^m \) where \( s_{n, m} \) is the number of strings with \( n \) \( H \)s and \( m \) \( T \)s satisfying \( C_1 \) (we know this one). Further let \( T(x, y) = \sum_{n, m} t_{n, m} x^ny^m \) where \( t_{n,m} \) is defined like \( s_{n, m} \), but with the added condition that the string ends in \( T \); and let \( K(x, y) = \sum_{n, m} k_{n, m} x^ny^m \) where \( k_{n,m} \) is defined like \( s_{n, m} \), but with the added condition that the string ends in \( k - 1 \) lots of \( T \)s. Then the following holds:

\begin{align*} T(x, y) &= yS(x, y) - yK(x, y)\\ K(x, y) &= y^{k - 1}S(x, y) - y^{k - 1}T(x, y)\\ P(x, y) &= (1 - x - y)S(x, y) + yK(x, y) \end{align*}

Where \( P(x, y) \) is the generating function we want (i.e. satisfying \( C_2 \)), which results in the following:

\[ P(x, y) = \left(1 - x - y + y \frac{y^{k - 1} - y^k}{1 - y^k}\right)S(x, y) \]

Now the probability of player 1 winning is just \( P(p, (1 - p)) \).
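As a cross-check on the GJ route, here is a sketch of the Markov chain alternative mentioned at the start of this example. It is not the cluster method: it conditions on the most recent toss and solves the resulting two-unknown system directly. The function name and the use of `Fraction` for exact arithmetic are choices made for this illustration.

```python
from fractions import Fraction


def player_one_wins(p, k):
    """Probability that a run of k heads appears before a run of k tails.

    Let A (resp. B) be the probability player 1 wins given a run of exactly one
    head (resp. one tail) has just started. Then
        A = p^(k-1) + (1 - p^(k-1)) * B   # finish the head run, or a tail interrupts it
        B = (1 - q^(k-1)) * A             # a head interrupts the tail run
    """
    p = Fraction(p)
    q = 1 - p
    heads_finish = p ** (k - 1)
    tails_finish = q ** (k - 1)
    a = heads_finish / (1 - (1 - heads_finish) * (1 - tails_finish))
    b = (1 - tails_finish) * a
    return p * a + q * b  # condition on the first toss
```

With a fair coin this returns \( \frac{1}{2} \) for any \( k \) by symmetry, and with \( k = 1 \) it returns \( p \), which makes for two easy sanity checks against \( P(p, 1 - p) \).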

Questions

Is the following equivalent to our definition of a cluster: "Define \( (w, S) \) as a cluster if it cannot be decomposed as the concatenation (defined how you would expect) of two nonempty marked words"?
How may we find the generating function of the number of words of length \( n \) in which every letter must be contained in a bad word?

Exploring Proced

2022-12-26T10:00:00+00:00

I was searching GitHub in vain for a tool which I could use as a process monitor, until I found that such a tool already exists, and is in fact shipped with Emacs: proced.el (written by Roland Winkler). To start, we can kick off a Proced buffer with M-x proced, and by default we'll be greeted by something like:

Within this buffer, we can perform many useful process management operations:

Key	Action
`k`, `x`	Send a signal to the process under point
`f`	Filter processes (for example, `user-running` will only show processes owned by you which are running)
`F`	Choose between a collection of preset and user-defined attributes to show for each process (called formats)
`RET`	Refine the current list of processes according to the attribute of the process under point (see `proced-grammar-alist` for some more information on how this works - for example pressing `RET` on the memory column of a given process will change it so that only processes with memory `>=` that of the given process will be shown)
`m` / `u`	Mark/unmark the process at point, `M` / `U` mark/unmark all processes
`P`	Mark a process and its parents
`t`	Toggles marks
`r`	Renice process at point

Many of these commands will use marked processes instead of the process at point if any marked processes exist.

By default, processes will be sorted by CPU usage. This can be changed using s, followed by one of c to sort by CPU, m to sort by memory, p to sort by process ID, s to sort by start time, t to sort by time (= system time + user time), u to sort by user, and finally S, which will prompt you to choose a sort key from all process attributes (even if they aren't present in the current format).

Customisation

Off the bat, by default the Proced buffer will not update automatically. An update can be manually triggered via g, but to emulate something similar to top / htop behaviour we can set:

(setq-default proced-auto-update-flag t)
(setq proced-auto-update-interval 1)

proced-auto-update-flag enables auto updating the Proced buffer (by default) every five seconds, and we use proced-auto-update-interval to shorten this to every second. We need setq-default for the first of these rather than setq since proced-auto-update-flag is a buffer-local variable (we can make use of this by calling proced-toggle-auto-update within a Proced buffer which will toggle auto-update without changing the global value of proced-auto-update-flag). I'm also not a fan of the default formats, but it's easy to define one yourself and set this as the default:

(add-to-list
 'proced-format-alist
 '(custom user pid ppid sess tree pcpu pmem rss start time state (args comm)))
(setq-default proced-format 'custom)

The car of the value you're adding is the name of the new format, and the other symbols name process attributes (for more information see proced-format-alist). The docstring of process-attributes gives a nice rundown on the meaning of most of them. You can also add your own custom attributes, here's a great example I found in legoscia's Emacs config.

Something else you may notice is that moving down a row also sets the column you're in to args. Personally I find this annoying, but you can turn it off:

(setq proced-goal-attribute nil)

The final cherry on top is that from Emacs 29 onwards, you can enable colouring for various attributes:

(setq proced-enable-color-flag t)

Which, using our new default format, leaves us with:

Full customisation (using use-package's :custom to handle the vagaries of global and buffer-local variable customisation, thanks to u/deaddyfreddy from reddit for this)

(use-package proced
  :ensure nil
  :commands proced
  :bind (("C-M-p" . proced))
  :custom
  (proced-auto-update-flag t)
  (proced-goal-attribute nil)
  (proced-show-remote-processes t)
  (proced-enable-color-flag t)
  (proced-format 'custom)
  :config
  (add-to-list
   'proced-format-alist
   '(custom user pid ppid sess tree pcpu pmem rss start time state (args comm))))

Rolling Your Own Formatting for Attributes

proced-grammar-alist opens the door for a lot of control over how attributes are shown in Proced buffers. The documentation goes into a lot of detail, but I'll provide a quick example here to give an idea.

Suppose our goal is to set the colour of Java executables in the args column to that strange orangey-brown colour that everyone seems to associate with Java. We can start by first defining our format function:

(defun my-format-java-args (args)
  (pcase-let* ((base (proced-format-args args))
               (`(,exe . ,rest) (split-string base))
               (exe-prop
                (if (string= exe "java")
                    (propertize exe 'font-lock-face '((t (:foreground "#f89820"))))
                  exe)))
    (mapconcat #'identity (cons exe-prop rest) " ")))

Now, we just need to tell proced-grammar-alist to use this function for the args attribute:

(setf (alist-get 'args proced-grammar-alist)
      '("Args"               ; name of the column
        my-format-java-args  ; format function
        left                 ; alignment within column
        proced-string-lessp  ; defines the sort method (ascending)
        nil                  ; non-nil reverses sort order
       (args pid)            ; sort scheme
       (nil t nil)))         ; refiner for custom refinement logic - see proced-refine

And you should see the results straight away:

Remote Systems

Thanks to Michael Albinus, from Emacs 29 onwards invoking proced when default-directory is remote (for example, your current buffer points to a remote file) and proced-show-remote-processes is non-nil, will prompt Proced to show processes from the remote system instead of your local machine, which can make proced a lot more useful when working with tramp.

Ponder This Nov 22

2022-11-09T10:18:00+00:00

The Question

You can view the question here: https://research.ibm.com/haifa/ponderthis/challenges/November2022.html. It asks: suppose we are given a drawer of \( b \) socks, of which \( a \) are comfortable and the remaining \( b - a \) uncomfortable. What is the smallest value of \( b \) of at least 100 digits such that the probability of drawing two comfortable socks is exactly \( \frac{1}{974170} \)?

Solution

For brevity, let \( k = 974170 \). Now, the first part of the question is equivalent to finding \( a,b \) such that: \[ \frac{a}{b} \cdot \frac{a - 1}{b - 1} = \frac{1}{k} \]

Multiplying out:

\begin{align*} ka(a - 1) = b(b - 1) &\iff b^2 + (-b) + (-ka^2 + ka) = 0 \\ &\iff b = \frac{1 \pm \sqrt{1 + 4ka^2 -4ka}}{2} \end{align*}

Where the second step just follows from the quadratic formula. Clearly, if \( b \) is positive we can disregard the negative sign. Note also that the radicand is always odd, hence if it's a square then its root will also be odd and thus the numerator even, and so \( b \) will be a whole number.

So our problem reduces to finding \( a \) values such that the expression \( 1 + 4ka^2 -4ka \) is itself a square. Now suppose \( 1 + 4ka^2 - 4ka = x^2 \) for some \( x \). Then we can complete the square and rearrange to obtain:

\begin{align} x^2 - k(2a - 1)^2 = -k + 1 \end{align}

Which is of the form of a generalised Pell's equation (letting \( y = 2a - 1, -k + 1 = N \)), which are quite tricky to solve. Solutions of generalised Pell's equations can be grouped into separate classes, and the fundamental solution of each class can then be used to generate all solutions for that particular class. A fast method for solving these equations can be found here.

We find the fundamental solutions for each class in our case to be:

\[ (x, y) \in \{(-229969, 233), (-1, 1), (1, 1), (229969, 233), (974169, 987)\} \]

In order to determine all solutions for each class, consider \( r = x + y\sqrt{974170} \). We see \( (x, y) \) is a solution to (1) iff \( M(r) := r \bar{r} = (x + y\sqrt{974170})(x - y\sqrt{974170}) = -974169 \). Now it is straightforward to show \( M: \mathbb{Z}[\sqrt{974170}] \to \mathbb{Z} \) is multiplicative, and so if we can find an \( r' = a + b\sqrt{974170} \in \mathbb{Z}[\sqrt{974170}] \) such that \( M(r') = 1 \) we can generate an infinite stream of solutions.

However, \( M(r') = 1 \iff (a + b\sqrt{974170})(a - b\sqrt{974170}) = 1 \iff a^2 -974170b^2 = 1 \) which is just a regular Pell's equation! This has fundamental solution \( (a, b) = (1948339, 1974) \) and so we can formulate new solutions to (1) using \( x_n + y_n\sqrt{974170} = (1948339 + 1974\sqrt{974170})^n(x + y\sqrt{974170}) \) and it turns out all solutions are of this form when \( (x, y) \) are the fundamental solutions to (1).
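The generation step is easy to sketch with exact integer arithmetic. The constants below are the ones quoted in this post; the function names are invented for the example, and the code simply trusts the claim that these fundamental solutions cover every class.

```python
K = 974170
UNIT = (1948339, 1974)  # fundamental solution of a^2 - K*b^2 = 1
FUNDAMENTAL = [(-229969, 233), (-1, 1), (1, 1), (229969, 233), (974169, 987)]


def step(x, y):
    """Multiply x + y*sqrt(K) by the unit 1948339 + 1974*sqrt(K)."""
    a, b = UNIT
    return a * x + b * K * y, b * x + a * y


def solutions(n_steps):
    """Yield n_steps solutions (x, y) of x^2 - K*y^2 = 1 - K from each class."""
    for x, y in FUNDAMENTAL:
        for _ in range(n_steps):
            yield x, y
            x, y = step(x, y)


def socks(x, y):
    """Recover (a, b) from a solution via y = 2a - 1 and b = (1 + |x|) / 2."""
    x, y = abs(x), abs(y)
    if x % 2 == 0 or y % 2 == 0:
        return None  # a or b would not be an integer
    return (y + 1) // 2, (x + 1) // 2
```

For example `socks(974169, 987)` gives \( a = 494, b = 487085 \), and one can confirm \( ka(a - 1) = b(b - 1) \) directly. Iterating `step` until \( b \) is large enough is then just a loop.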

\( (x_n, y_n) \) grow exponentially in relation to \( n \), so computing \( y \) as to make \( b \) of at least 100 digits can be done extremely quickly. The smallest such \( (x, y) \) are:

\begin{align*} x &= 358215987739182004690086378181625469679573777244481977174053689999534311804982937308746736066326142199 \\ y &= 362933945169315204684486536809603899412307756418124003349633249331941231267453010603520849884063709 \end{align*}

Finding the corresponding \( a \) and \( b \) is left to the reader (:

Text Wrap Hacks for Markdown

2022-06-18T00:00:00+00:00

The `textwrap` Module

The Python standard library ships a neat little module for line wrapping text:

>>> import textwrap
>>> textwrap.wrap("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.")
['Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do',
 'eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad',
 'minim veniam, quis nostrud exercitation ullamco laboris nisi ut',
 'aliquip ex ea commodo consequat. Duis aute irure dolor in',
 'reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla',
 'pariatur. Excepteur sint occaecat cupidatat non proident, sunt in',
 'culpa qui officia deserunt mollit anim id est laborum.']

And it's pretty extensible too: you can subclass textwrap.TextWrapper to control how words are split, amongst other things.
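Before reaching for a subclass, it's worth knowing that a few keyword arguments already cover common cases. A small sketch using only documented `textwrap` options (`width`, `initial_indent`, `subsequent_indent`); the sample text is arbitrary:

```python
import textwrap

text = "The quick brown fox jumps over the lazy dog"

# wrap returns a list of lines, each at most `width` characters long
lines = textwrap.wrap(text, width=15)

# fill joins the wrapped lines with newlines; the indents count towards the width
quoted = textwrap.fill(text, width=20, initial_indent="> ", subsequent_indent="> ")
```

Here `lines` is `['The quick brown', 'fox jumps over', 'the lazy dog']`, and every line of `quoted` starts with `> ` while staying within 20 characters.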

Hacking: Markdown

This is nice and all, but what if I was wrapping some kind of markdown, specifically GitHub Markdown? The biggest problem here would be wrapping links, whose character length far exceeds the length of the link description (which is what would be rendered):

>>> import textwrap
>>> textwrap.wrap("`avy` is a GNU Emacs package for jumping to visible text using a char-based decision tree.  See also [ace-jump-mode](https://github.com/winterTTr/ace-jump-mode) and [vim-easymotion](https://github.com/Lokaltog/vim-easymotion) - `avy` uses the same idea.")
['`avy` is a GNU Emacs package for jumping to visible text using a',
 'char-based decision tree.  See also',
 '[ace-jump-mode](https://github.com/winterTTr/ace-jump-mode) and',
 '[vim-easymotion](https://github.com/Lokaltog/vim-easymotion) - `avy`',
 'uses the same idea.']

But if we were to use this, it would be rendered as:

`avy` is a GNU Emacs package for jumping to visible text using a
char-based decision tree.  See also
ace-jump-mode and
vim-easymotion - `avy`
uses the same idea.

Essentially, we want the length of Github links to be taken from the length of their descriptions alone. It isn't immediately clear how this can be achieved looking at textwrap.TextWrapper, but digging a little deeper in the textwrap module source code we see:

while chunks:
    l = len(chunks[-1])

    # Can at least squeeze this chunk onto the current line.
    if cur_len + l <= width:
        cur_line.append(chunks.pop())
        cur_len += l

    # Nope, this line is full.
    else:
        break

And further, chunks is set from a TextWrapper._split call. This suggests if we create our own TextWrapper subclass and identify links in our _split method, we can set the length by somehow jumbling link strings so they return the length of their description rather than their true length on a len call. We can do this by subclassing str:

import re
from textwrap import TextWrapper
from typing import List


class MarkdownLink(str):
    def __new__(cls, url, description):
        obj = str.__new__(cls, f"[{description}]({url})")
        obj.url = url
        obj.description = description
        return obj

    def __len__(self):
        return len(self.description)


class MarkdownTextWrapper(TextWrapper):
    """A TextWrapper which handles markdown links."""

    LINK_REGEX = re.compile(r"(\[.*?\]\(\S+\))")
    LINK_PARTS_REGEX = re.compile(r"^\[(.*?)\]\((\S+)\)$")

    def _split(self, text):
        split = re.split(self.LINK_REGEX, text)
        chunks: List[str] = []
        for item in split:
            match = re.match(self.LINK_PARTS_REGEX, item)
            if match:
                chunks.append(MarkdownLink(match.group(2), match.group(1)))
            else:
                chunks.extend(super()._split(item))
        return chunks

Let's use it:

>>> wrapper = MarkdownTextWrapper(width=80)
>>> wrapper.wrap("`avy` is a GNU Emacs package for jumping to visible text using a char-based decision tree.  See also [ace-jump-mode](https://github.com/winterTTr/ace-jump-mode) and [vim-easymotion](https://github.com/Lokaltog/vim-easymotion) - `avy` uses the same idea.")
['`avy` is a GNU Emacs package for jumping to visible text using a char-based',
 'decision tree.  See also [ace-jump-mode](https://github.com/winterTTr/ace-jump-mode) and [vim-easymotion](https://github.com/Lokaltog/vim-easymotion) - `avy` uses the same',
 'idea.']

Which renders as:

`avy` is a GNU Emacs package for jumping to visible text using a char-based
decision tree.  See also ace-jump-mode and vim-easymotion - `avy` uses the same
idea.

Nice! 👌

Row-Spanning for ImageMagick montage, Sort of…

2021-11-05T14:24:00+00:00

Montage

ImageMagick ships the montage command as a way of creating composite images. For example (stolen from their site):

For libro-finito, I'm interested in stitching several images together on a grid to try and replicate something like the image from Goodreads' Year in Books:

Essentially books tiled together with some appearing larger according to some heuristic (rating would be a good choice!).

Let's Start 🔨

Given some images, if we want three images per row, something like:

montage -geometry +0+0 starship-troopers.jpeg a-spell-for-a-chameleon-l.jpeg the-caves-of-steel.jpeg the-count-of-monte-cristo.jpeg a-wizard-of-earthsea.jpeg tmp.jpeg

Gives us:

ImageMagick has done its best to set the distance between images to zero, but this appears to have led to inconsistent column sizes. Generally, differently sized images aren't going to work too nicely with one another.

In the example above I've made it so that larger images are 4x the size of the smaller ones (ie 2x width and 2x height). A "nice" tiling is possible if we make all of the images the same size… by splitting up the larger images into quarters.

We split larger images into four smaller ones, taking care that we always align the bottom two images on the same columns below the top two images. I've offloaded this tiling specification of our "split" images to a script.

But does this work? Yes!

Integral Polynomial Interpolation

2021-07-30T19:48:00+00:00

Question

It's well known that given a sequence of \( n \) points \( (x_i, y_i) \) with the \( x_i \) distinct, there exists a unique polynomial \( P \) of degree \( d < n \) such that \( P(x_i) = y_i \) for all \( i \): the Lagrange interpolating polynomial.

If all \( x_i, y_i \) are integers, we can glean from the Lagrange Interpolation Formula that it is guaranteed that \( P(X) \in \mathbb{Q}[X] \). But does there exist a polynomial \( F(X) \) of higher degree satisfying the same property (\( F(x_i) = y_i \)) such that its coefficients are all integers?

Solution

Strangely enough there exists no such polynomial if \( P \) has some non integer coefficient. Proof is thanks to this mathoverflow answer:

First define \( D(X) \in \mathbb{Z}[X] \) as \( D(X) = \prod (X - x_i) \), ie the monic polynomial of degree \( n \) whose roots are the \( x \) coordinates of our points.

Next, suppose \( F(X) \) has coefficients all integers. Since \( D(X) \) is monic we can write:

\[ F(X) = D(X)Q(X) + R(X) \]

Where \( Q(X), R(X) \in \mathbb{Z}[X] \), and also \( deg(R(X)) < deg(D(X)) \).

Now we must have \( F(x_i) = R(x_i) \), but this implies that \( R(X) \) is the Lagrange interpolating polynomial since it has degree less than \( n \) and satisfies \( R(x_i) = P(x_i) \) for all \( i \). Thus we can re-write:

\[ P(X) = F(X) - D(X)Q(X) \]

Which implies that \( P(X) \in \mathbb{Z}[X] \), since \( \mathbb{Z}[X] \) is closed under addition and multiplication.
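The result gives a simple test: an integer polynomial through the points exists exactly when the Lagrange polynomial already has integer coefficients. Here's a minimal sketch of that test using exact rationals; the function names are made up for this example.

```python
from fractions import Fraction


def lagrange_coefficients(points):
    """Coefficients (constant term first) of the Lagrange interpolating polynomial."""
    coeffs = [Fraction(0)] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis = [Fraction(1)]  # running product of (X - x_j) over j != i
        denom = Fraction(1)    # running product of (x_i - x_j) over j != i
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            # Multiply basis by (X - xj): new[k] = basis[k - 1] - xj * basis[k]
            basis = [
                (basis[k - 1] if k > 0 else 0) - xj * (basis[k] if k < len(basis) else 0)
                for k in range(len(basis) + 1)
            ]
            denom *= xi - xj
        for k, c in enumerate(basis):
            coeffs[k] += yi * c / denom
    return coeffs


def has_integer_interpolant(points):
    """True iff some polynomial with integer coefficients passes through all points."""
    return all(c.denominator == 1 for c in lagrange_coefficients(points))
```

For instance the points \( (0, 0), (2, 1) \) interpolate to \( \frac{X}{2} \), so no integer polynomial of any degree passes through them (which also follows from \( 2 \mid F(2) - F(0) \)).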

An Accumulated Sequence

2020-04-08T00:00:00+00:00

Question

Define the sequence \( (a_n)_n \) by \( a_n = 1 \) for all \( n \). Now consider taking the first 4 elements of the sequence: \[ 1 \ \ 1 \ \ 1 \ \ 1 \] Now throw away the first element to get: \[ 1 \ \ 1 \ \ 1 \] Set the new second element to the sum of the first element and old second element, and similarly set the new third element to the sum of the old third element and new second element and so on to get: \[ 1 \ \ 2 \ \ 3 \] If we then apply the same process to this new sequence we get: \[ 2 \ \ 5 \] And so if we apply the process until we are left with one element we get 5. This can easily be generalised to starting with \( n \) lots of ones, and applying the process \( n - 1 \) times to be left with one element, \( f(n) \). Can we find a closed form for \( f(n) \)?
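To make the process concrete, here is a small simulation. It is only a sketch of the description above (drop the first element, then take running sums of what is left); the names `accumulate_once` and `f_by_simulation` are made up for illustration.

```python
from itertools import accumulate


def accumulate_once(seq):
    """One pass of the process: throw away the first element, then replace
    the remaining elements by their running sums."""
    return list(accumulate(seq[1:]))


def f_by_simulation(n):
    """Start with n ones and apply the process n - 1 times."""
    seq = [1] * n
    for _ in range(n - 1):
        seq = accumulate_once(seq)
    return seq[0]
```

Running `accumulate_once` on `[1, 1, 1, 1]` gives `[1, 2, 3]`, then `[2, 5]`, then `[5]`, matching the worked example.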

Recursions

If we stack each pass of the process on top of one another, we obtain the diagram:

\begin{array}{|c|c|c|c|} \hline 1 & 1 & 1 & 1 \\\hline & 1 & 2 & 3 \\\hline & & 2 & 5 \\\hline & & & 5 \\\hline \end{array}

Note starting with more 1s in the first row simply extends the diagram to the right and does not change the triangle left of the new column. It can be seen from our definition above, each element in the diagram (above the first row) is equal to the sum of the element above it and the element immediately to the left of it, save for the elements we are interested in, which are equal solely to the elements above them.

However, we can incorporate these elements into the relation by making the following edit to our diagram:

\begin{array}{|c|c|c|c|} \hline 1 & 1 & 1 & 1 \\\hline 0 & 1 & 2 & 3 \\\hline & 0 & 2 & 5 \\\hline & & 0 & 5 \\\hline \end{array}

Let \( n \) span across the columns, and \( k \) span across the rows (like Pascal's triangle), ie so that \( f(3, 1) = 3 \), and consider the diagram again. We obtain:

Formally:

\begin{align*} &f(n, 0) = 1 &\forall n \\ &f(n, n + 1) = 0 &\forall n \ge 0 \\ &f(n, k) = f(n - 1, k) + f(n, k - 1) &1 \le k \le n \\ \end{align*}
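The two-index recursion can be checked directly against the diagram. This is a sketch only: `table` is a hypothetical name for the two-argument \( f(n, k) \), and I take the zero boundary to include \( n = 0 \), as the 0 at the start of the second row of the diagram suggests.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def table(n, k):
    """Entry in column n, row k of the padded diagram."""
    if k == 0:
        return 1              # first row is all ones
    if k == n + 1:
        return 0              # the padding zero left of each row
    return table(n - 1, k) + table(n, k - 1)   # left neighbour + neighbour above


# The entries we are interested in sit on the diagonal k = n.
diagonal = [table(n, n) for n in range(5)]
```

Here `table(3, 1)` is 3 as in the text, and the diagonal starts 1, 1, 2, 5, 14, so with 4 ones in the first row the final element is `table(3, 3)`, which is 5.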

We can reason from the above recursion that each number in the diagram can be written as some sum of the leftmost nonzero elements on each row, added to some sum of elements in the first row (\( k = 0 \)), which also happen to be equal to the leftmost element of the first row (\( f(0, 0) = 1 \)). These are the numbers we are interested in. ie:

\[ f(n) = \sum_{r=1}^{n-1} A_r * f(r) \]

Where \( A_r \) is some constant to be determined. Now we can consider how each \( f(r) \) contributes to \( f(n) \) via the following diagram:

\begin{array}{|c|c|c|c|} \hline f(r) \rightarrow & f(r) + B_1 \rightarrow \downarrow & f(r) + C_1 \rightarrow \downarrow & f(r) + D_1 \rightarrow \downarrow \\\hline 0 & f(r + 1) & f(r) + B_2 \rightarrow & 2f(r) + C_2 \rightarrow \downarrow \\\hline & 0 & f(r + 2) & 2f(r) + B_3 \rightarrow \\\hline \end{array}

Where \( C, B, D \) are constants consisting of sums of other \( f(s) \). Note we do not draw arrows for contributions to \( f(r+1) \) as these are the source of the contributions and we need to keep these "atomic". We can observe from this diagram that:

\[ A_r = f(n - r) \]

This also works for the base row, as we can just rewrite \( 1 = f(1) \) (thus all constants are 0), and hence:

\[ f(n) = \sum_{r=1}^{n - 1} f(r)*f(n - r) \]
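As a quick sanity check, the convolution recurrence can be iterated from \( f(1) = 1 \). A minimal sketch (the name `f_by_convolution` is mine):

```python
def f_by_convolution(limit):
    """Return [f(0), f(1), ..., f(limit)] using
    f(n) = sum_{r=1}^{n-1} f(r) * f(n - r), with f(1) = 1.
    f(0) is stored as 0 purely as a placeholder so that list indices
    line up with n."""
    f = [0, 1]
    for n in range(2, limit + 1):
        f.append(sum(f[r] * f[n - r] for r in range(1, n)))
    return f
```

The values for \( n = 1, \dots, 6 \) come out as 1, 1, 2, 5, 14, 42, agreeing with \( f(4) = 5 \) from the worked example.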

To Closed Form

First of all let's set \( f(0) = 0 \), which will make things a little simpler. Now let's define the generating function:

\[ F(x) = \sum_{n=0} f(n)x^n \]

And now sum our recursion over all \( n \) where it's valid (\( n \ge 2 \)):

\[ \sum_{n=2} f(n)x^n = \sum_{n=2}\sum_{r=0}^{n} f(r)*f(n - r)x^n \]

Simplifying the LHS and using the definition of multiplication in the ring of formal power series on the RHS:

\[ F(x) - x = \sum_{n=0}f(n)x^n \sum_{n=0}f(n)x^n \]

Simplifying and collecting terms:

\[ F(x)^2 - F(x) + x = 0 \]

Quadratic formula (neglecting the + root, since it would give us \( f(0) = 1 \) rather than \( f(0) = 0 \)):

\[ F(x) = \frac{1 - \sqrt{1 - 4x}}{2} \]

Attempt to extract \( x \) coefficients using binomial expansion:

\[ F(x) = \frac{-1}{2}\left( \frac{\frac{1}{2}}{1!}\left(-4x\right) + \frac{\frac{1}{2}\frac{-1}{2}}{2!}\left(-4x\right)^2 + ... \right) \]

After some arduous simplifications, we find the general \( x \) coefficient is given by:

\[ \frac{(2n - 2)!}{n! * (n - 1)!} \]

Which is the (n - 1)th Catalan number, ie \( f(n) = C_{n - 1} \).
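The "arduous simplifications" can at least be checked mechanically. The sketch below expands \( \frac{1 - \sqrt{1 - 4x}}{2} \) with the generalised binomial series in exact arithmetic and compares each coefficient with the closed form; the function names are my own.

```python
from fractions import Fraction
from math import factorial


def series_coefficient(n):
    """Coefficient of x^n (n >= 1) in (1 - sqrt(1 - 4x)) / 2, from the
    binomial series of (1 + u)^(1/2) with u = -4x."""
    binom = Fraction(1)
    for i in range(n):
        binom *= Fraction(1, 2) - i      # (1/2)(1/2 - 1)...(1/2 - n + 1)
    binom /= factorial(n)
    return Fraction(-1, 2) * binom * (-4) ** n


def closed_form(n):
    """(2n - 2)! / (n! (n - 1)!), the claimed general coefficient."""
    return Fraction(factorial(2 * n - 2), factorial(n) * factorial(n - 1))
```

Both functions agree for every \( n \) I tried, and the first few values are 1, 1, 2, 5, 14.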

Splitting Up a Number

2020-04-06T00:00:00+00:00

Question

Given some \( n \in N \), how can I partition \( n \) into natural numbers which sum to \( n \) such that the product of all elements in the partition is maximised?

Solution

Consider some valid partition \( P \) of \( n \):

\[ P = (a_1, a_2, ...) \]

And let:

\[ f(P) = a_1 * a_2 * ... \]

Note that all natural numbers > 1 can be written as a sum of 2s and 3s exclusively, ie

\[ a_1 = 2 + 2 + ... + 3 + 3 + ... \]

What we will now show is that taking all the elements in the partition and breaking them up into 2s and 3s, will give a better partition.

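Lemma 1: The product of some sequence of 2s and 3s is always greater than or equal to the corresponding sum, with equality holding iff the sequence is (2, 2), (2,) or (3,). Proof:

\[ 2 * 2 = 2 + 2 \] \[ 3 * 2 > 3 + 2 \] \[ 3 * 3 > 3 + 3 \]

Appending a further 2 or 3 to a sequence of length at least two, with product \( p \ge 4 \) and sum \( s \le p \), gives a product of at least \( 2p = p + p > s + 3 \), so the inequality is strict for every longer sequence.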

Hence if:

\[ P_{a_1} = (2, 2, ..., 3, 3, ...) \]

Then

\[ ka_1 \leq k f(P_{a_1}) \ \forall k \in R^+ \]

With equality holding iff \( a_1 < 5 \). From this it is clear that the optimal partition will consist of some sequence of 2s and 3s. For most numbers however there are many ways to split a number into sums of 2s and 3s. We will now show how to find the optimal partition. Note:

\[ 3 + 3 = 2 + 2 + 2 \] \[ 3 * 3 > 2 * 2 * 2 \]

This implies that there will be no more than two 2s in the optimal partition, since any three 2s can be swapped for two 3s. If \( 3 | n \) then the optimal partition is \( \frac{n}{3} \) lots of 3. If we consider trying to improve upon this partition by breaking up a 3 into a 2 and a 1, we decrease the product. And if we try to improve upon the partition by combining 3s we decrease the product by lemma 1. Therefore the partition is optimal in this case. Now consider:

\[ 1) \ n \equiv 1 \pmod 3 \Rightarrow \exists k \in N \ \text{s.t.} \ n = 3k + 1 \] \[ 2) \ n \equiv 2 \pmod 3 \Rightarrow \exists k \in N \ \text{s.t.} \ n = 3k + 2 \]

For 1) we can choose from (1, then 3 \( k \) times) or (2, 2, then 3 \( k - 1 \) times). Given \( 2 * 2 > 3 * 1 \) we can identify the latter case as optimal.

For 2) the only option containing no ones and fewer than three 2s is (2, then 3 \( k \) times).
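The case analysis translates directly into code, and a brute-force search is a cheap way to check it. A sketch, assuming \( n \ge 2 \) and that the single-element partition \( (n) \) counts as valid; `best_partition` and `max_product_bruteforce` are illustrative names.

```python
from math import prod


def best_partition(n):
    """Partition of n >= 2 into 2s and 3s chosen by n mod 3."""
    k, remainder = divmod(n, 3)
    if remainder == 0:
        return [3] * k
    if remainder == 1:
        return [2, 2] + [3] * (k - 1)
    return [2] + [3] * k


def max_product_bruteforce(n):
    """Largest product over all partitions of n, by dynamic programming:
    a partition is either n alone, or a first part a followed by a
    partition of n - a."""
    best = [0, 1]
    for m in range(2, n + 1):
        best.append(max([m] + [a * best[m - a] for a in range(1, m)]))
    return best[n]
```

For instance `best_partition(7)` is `[2, 2, 3]` with product 12, which is also what the brute-force search finds.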

Question Extension

What if \( n \in R^+ \) and the elements of the partition can be any positive real?

Extension Solution

Unlike the previous solution we will use some calculus. Consider:

\[ (x, x), \ (x - a, x + a) \]

Where \( x, a \in R^+ , \ x > a \).

Note the two tuples have the same sum but:

\[ x^2 > x^2 - a^2 \]

From this it is apparent that all elements of the optimal partition must be the same number. Let this number be \( k_n \). Consider the function:

\[ f(x) = x^\frac{n}{x} \]

For some constant \( n \). Then \( k_n \) is the value of \( x \) maximising this function, with the added restriction that \( x | n \), ie that \( \frac{n}{x} \) is a whole number of parts.

Rewrite \( f(x) \):

\[ f(x) = e^\frac{n\ln{x}}{x} \]

We find the stationary point by differentiating and equating to 0:

\[ \left(\frac{n}{x^2} - \frac{n\ln{x}}{x^2}\right) e^\frac{n\ln{x}}{x} = 0 \]

\( f(x) \neq 0 \ \forall x \in R^+ \) hence we have:

\[ \left(\frac{n}{x^2} - \frac{n\ln{x}}{x^2}\right) = 0 \]

\[ \Rightarrow \ln{x} = 1 \]

\[ \Rightarrow x = e \]

Hence the sole stationary point of the function occurs at \( x = e \) and is independent of \( n \). If we note that:

\[ \lim_{x \to 0+}f(x) = 0, \quad \lim_{x \to \infty}f(x) = 1 < e^\frac{n}{e} = f(e) \]

Then we can identify the stationary point as a global maximum of the function, and reason that our desired \( k_n \) lies somewhere around \( e \).
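To see how close to \( e \) the parts actually land, we can search over the number of equal parts \( m \) and take \( k_n = \frac{n}{m} \). This is only a numerical sketch: `best_equal_split` is a made-up name, and it compares \( m \ln\frac{n}{m} \) (the logarithm of the product) to avoid huge numbers.

```python
import math


def best_equal_split(n):
    """Return (m, part): the number of equal parts m, and the part size
    n / m, that maximise the product (n / m) ** m for real n > 0."""
    # Parts smaller than 1 only shrink the product, so m never needs to
    # exceed ceil(n) + 1.
    candidates = range(1, math.ceil(n) + 2)
    m = max(candidates, key=lambda c: c * math.log(n / c))
    return m, n / m
```

For \( n = 10 \) this picks 4 parts of size 2.5 (product about 39.06, against about 37.04 for three parts of \( \frac{10}{3} \)), and in the cases I tried the best \( m \) is always one of the two integers either side of \( \frac{n}{e} \).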

lw’s blog

Counting Strings and the Goulden-Jackson Cluster Method

Intro

A Derivation

Computing \( \Phi_R \)

Example

The Goulden-Jackson Cluster Method

Clusters

A Formula

Examples

PGF for the First Occurrence of a Binary String

A Weighted Penney's Game

Questions

Links

Exploring Proced

Customisation

Rolling Your Own Formatting for Attributes

Remote Systems

See Also

Ponder This Nov 22

The Question

Solution

Text Wrap Hacks for Markdown

The `textwrap` Module

Hacking: Markdown

Row-Spanning for ImageMagick montage, Sort of…

Montage

Let's Start 🔨

Integral Polynomial Interpolation

Question

Solution

An Accumulated Sequence

Question

Recursions

To Closed Form

Splitting Up a Number

Question

Solution

Question Extension

Extension Solution
