# Documentation for the structure function calculator

$h^*_x(m)=q$ means that in order to have at most $\text{(alphabet size)}^{m}$ accepting paths of length $n=|x|$, we need $q$ states.

#### Rules

• We are considering single-run complexity where the automaton must proceed to a fresh state except
that there is one state that may be repeated.
• Thus: the automaton need not be deterministic; a run may be abandoned midway;
• the run can start before we get to the run-state;
• runs that are of less than maximal length of their “flavor” (such as 322023 which is shorter than 20322023 in the given example) do count (they are in any case insignificant for statistical model selection purposes); runs of less than maximal length for their size-of-flavor (such as 2311322 for size 3) do count; in general any subrun of a run counts.

The provided example comes from a series of codons in part of a messenger RNA (mRNA) molecule whose genetic code is $$AUG\, ACG\, GAG\, CUU\, CGG\, AGC\, UAG$$
encoded as
$$x=012032202311322023102$$
which has length $n=21$ and is a string from the alphabet $\{0,1,2,3\}$ which has size $b=4$. There is a run of length $r=8$ from the subalphabet $j=\{0,2,3\}$, namely an occurrence of 20322023. The corresponding automaton has $q=n+1-r=21+1-8=14$ many states and accepts $|j|^r=3^8$ many strings of length $n$. When $3^8\le b^m$ we can then assert that $h^*(m)\le q$. Thus
$$6561 = 3^8\le 4^m$$
Note
$$(4^2,4^3,4^4,4^5,4^6,4^7)= (16, 64, 256, 1024, 4096, 16384)$$
So $h^*(7)\le 14$.

The longest unary ($|j|=1$) run is of length $r=2$. This gives $q=n+1-r=20$ and only one string is accepted so $h^*(m)\le 20$ for all $m$.

The longest binary ($|j|=2$) run is of length $r=4$. This gives $q=n+1-r=18$ and when $2^4\le 4^m$ (i.e., $2\le m$), we can assert $h^*(m)\le q$. So $h^*(2)\le 18$.

What can we say about $h^*(6)$? $4^6=4096$ and
$(3^4,3^5,3^6,3^7,3^8)=(81, 243, 729, 2187, 6561)$. So
$$3^8\le 4^7\Longrightarrow h^*(7)\le 22-8=14$$
$$3^7=2187\le 4^6=4096\Longrightarrow h^*(6)\le 22-7=15$$
$$3^6= 729\le 4^5 = 1024\Longrightarrow h^*(5)\le 22-6=16$$
$$3^5= 243\le 4^4 = 256\Longrightarrow h^*(4)\le 22-5=17$$
$$3^4= 81 \le 4^4 = 256(\Longrightarrow h^*(4)\le 22-4=18)$$
$$3^3= 27\le 4^3 = 64\Longrightarrow h^*(3)\le 22-3=19$$
So we cannot obtain $h^*(3)\le 18$ by a partial ternary run, but we can obtain it because $h^*(2)\le 18$ because of the binary run.

Note
$$3^r\le 4^1\Longrightarrow r\le 1$$
$$2^r\le 4^1\Longrightarrow r\le 2$$
So a binary run of length 2 narrows it down to $4^1$.
And this corresponds to
$$q=n+1-r=21+1-2=20,$$
so $h^*(1)\le 20$.

Some examples of “inversion” (different explanation for the string depending on whether the subalphabet, or the size of the subalphabet, is taken into account):
21112
212220
020001
211120
1333030
32322131
201011311
1010113303
01210312303