## Forbidden Words

Let $\al$ denote a binary string, and let $L$ denote the set of binary of binary strings that do not contain $\al$ as a substring. How many strings are there in $L$ with length $n$?

Let $M$ denote the binary strings that contain exactly one copy of $\al$, as a suffix. If $\eps$ denotes the empty string, then $$\label{eq:eL01} \eps+L(0+1) = L+M.$$ Also $L\al$ clearly contains $M$ as a subset, but for example if $\al=111$, then $L\al =M+M1+M11.$ We develop a precise description of $L\al$. Let $\al\backslash\be$ denote the set of suffixes $\be_1$ such that $\be_0\be_1=\be$ and $\be_0$ is a suffix of $\al$. For example if $\al =100110,\quad \be=1011,$ then $\al\backslash\be = \{11\},\quad \al\backslash\al = \{\eps,0110\}$ Lemma $L\al = M\al\backslash\al.$

Note that $\al\backslash\al$ is a finite language, i.e., a finite set of strings.

If $N$ is a set of binary strings, let $N(t)$ denote its generating series. This is what we get when we substitute $t$ for $0$ and $1$, a string of length $\ell$ maps to $t^\ell$ and the language maps to its generating series. Now \eqref{eq:eL01} yields $1+2t L(t) = L(t)+M(t)$ and, if $D=\al\backslash\al$ and $|\al|$ denotes the length of $\al$, the lemma yields that $L(t)t^{|\al|} = M(t) D(t).$ Lemma $L(t) =\left(1-2t+\frac{t^{|\al|}}{D(t)}\right)^{\!-1}.$

Let's use this to compute some numbers. We first set up the ring of formal power series over the rationals.

Rt.<t> = QQ[[]]; Rt

Power Series Ring in t over Rational Field


If $\al=111$, then $D(t)=1+t+t^2$ and $L(t)$ is

L = ((1-2*t+ t^3/(1+t+t^2))^(-1)).O(9); L

1 + 2*t + 4*t^2 + 7*t^3 + 13*t^4 + 24*t^5 + 44*t^6 + 81*t^7 + 149*t^8 + O(t^9)


As an exercise, write a program which takes a binary string $\al$ and computes the generating series for $\al\backslash\al$. (The string arrives as a Python string, there is no reason why your procedure should not work for strings over an arbitrary alphabet.)

The square root of the series $L(2t)$ has non-negative integer coefficients.

L.subs(t=2*t).sqrt()

1 + 2*t + 6*t^2 + 16*t^3 + 54*t^4 + 180*t^5 + 596*t^6 + 2048*t^7 + 7062*t^8 + O(t^9)


Does this hold for other forbidden substrings?

For a second variant, suppose $\cF=\{\seq\al1d\}$ is a set of binary strings. Let $L$ be the set of strings that contain no element of $\cF$ and let $M_i$ denote the set of strings that contain one copy of $\al_i$, as a suffix, bit do not contain any other copies of strings in $\cF$. If $|\cF|=2$, we have equations \begin{align*} \eps+L(0+1) &= L +M_1 +M_2\\ L\al_1 &= M_1(\al_1\backslash\al_1) +M_2(\al_1\backslash\al_2)\\ L\al_2 &= M_1(\al_2\backslash\al_1) +M_2(\al_2\backslash\al_2). \end{align*} Write a program that, given $\al_1$ and $\al_2$, computes the generating series for $L$, $M_1$ and $M_2$.