Category Archives: research


A computable measure of irregularity

My book published by de Gruyter is scheduled for March 12, 2024.


Theorem 2.28 (The Thue word is overlap-free) is also Theorem 1.3.1 in Shallit’s new book “The Logical Approach to Automatic Sequences”. It is proved in Walnut as follows:

[Walnut]$ eval test "Ei,n (n >= 1) & Aj (j<=n) => T[i+j]=T[i+j+n]";
computed ~:1 states - 11ms
computed ~:2 states - 1ms
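As a quick sanity check outside Walnut, we can brute-force search a finite prefix of the Thue-Morse word in Python (a finite check on 300 letters, not a proof; the prefix length is an illustrative choice):

```python
# Brute-force sanity check (not a proof): no overlap axaxa occurs in the
# first N letters of the Thue-Morse word, i.e. no i and n >= 1 with
# T[i+j] = T[i+j+n] for all 0 <= j <= n.
def t(n):
    """n-th Thue-Morse letter: parity of the number of 1 bits of n."""
    return bin(n).count("1") % 2

N = 300  # prefix length, an illustrative choice
T = [t(n) for n in range(N)]

overlaps = [(i, n)
            for n in range(1, N // 2)
            for i in range(N - 2 * n)
            if all(T[i + j] == T[i + j + n] for j in range(n + 1))]
print(overlaps)  # [] -- no overlap found in the prefix
```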


Theorem 2.36 (overlap-free words have an $A_N$-deficiency bound of 1) uses Theorem 2.25, one direction of which is that overlap-free words can only contain squares whose lengths are of the form $2^a$ or $3\cdot 2^a$, $a\ge 1$.
If that is true, then the Thue word had better contain squares only of such lengths.
After running:

[Walnut]$ eval thuesquare "n >= 1 & Ei Aj (j<n) => T[i+j]=T[i+j+n]";


the output file thuesquare.gv contains Graphviz code that we further process with

brew install graphviz

(which takes a long time and produces many error messages, but not to worry)
and then, from the macOS command line,

$ dot -Tpdf thuesquare.gv > thuesquare.pdf

The PDF shows the resulting automaton.
We can ignore the self-loop at state 0, because it corresponds to leading zeros in the binary representation of the length $n$. The automaton then accepts exactly the words of the form $10^a$ and $110^a$, which are the binary representations of $2^a$ and $3\cdot 2^a$, respectively. Phew!
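We can corroborate this by brute force in Python: list the periods $n$ of all squares $xx$ (with $|x|=n$) occurring in a prefix of the Thue word, and confirm each has odd part 1 or 3 (a finite sanity check, not a proof; the prefix length is an illustrative choice):

```python
# Collect all square periods n (the square is xx with |x| = n) occurring
# in a prefix of the Thue-Morse word, and check each n is 2^a or 3*2^a.
def t(n):
    """n-th Thue-Morse letter: parity of the number of 1 bits of n."""
    return bin(n).count("1") % 2

N = 400
T = [t(n) for n in range(N)]

square_periods = sorted({n for n in range(1, N // 2)
                         for i in range(N - 2 * n)
                         if all(T[i + j] == T[i + j + n] for j in range(n))})

def odd_part(n):
    """Remove all factors of 2 from n."""
    while n % 2 == 0:
        n //= 2
    return n

print(square_periods)
```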

In Lemma 2.29 it is mentioned that the Thue-Morse sequence has no infinite arithmetic progressions.
An infinite arithmetic progression of 1s would be given by $a$ and $b$ such that
$T[a\cdot k+b]=1$ for all sufficiently large $k$. Since multiplication of two variables is not included in the Walnut language, it is not clear that we can express this in general. But we can express it for fixed $a$, for instance asking whether $T[2k+n]$ is eventually constant:

eval test "Ei Aj (j>=i) => T[j+j+n]=T[j+j+2+n]"; 

Indeed, Walnut reports that this cannot happen. It is even easier to quantify over $n$ inside the formula:

[Walnut]$ eval test "En Ei Aj (j>=i) => T[j+j+n]=T[j+j+2+n]";   
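The same claim can be sanity-checked by brute force in Python: on a long prefix, $T[2k+b]$ takes both values for every offset $b$, even restricting to large $k$ (a finite check, not a proof; the sample ranges are illustrative choices):

```python
def t(n):
    """n-th Thue-Morse letter: parity of the number of 1 bits of n."""
    return bin(n).count("1") % 2

# For each offset b < 2, check that T[2k+b] is not eventually constant
# on the sampled range ("sufficiently large" k, within our sample).
for b in range(2):
    values = {t(2 * k + b) for k in range(100, 1000)}
    assert values == {0, 1}, b
print("no eventually constant progression with difference 2 found")
```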


If we try to multiply two variables we get the following error:

[Walnut]$ eval test "En Ei Aj (j>=i) => T[k*j+n]=T[k*(j+1)+n]";
java.lang.Exception: the operator * cannot be applied to two variables
	at Automata.NumberSystem.arithmetic(
	at Token.ArithmeticOperator.act(
	at Main.Computer.compute(
	at Main.Computer.(
	at Main.Prover.eval_def_commands(
	at Main.Prover.dispatch(
	at Main.Prover.readBuffer(
	at Main.Prover.main(
the operator * cannot be applied to two variables
	: char at 22
	: eval test "En Ei Aj (j>=i) => T[k*j+n]=T[k*(j+1)+n]";


But we can check arithmetic progressions with larger and larger common differences, one at a time:

[Walnut]$ eval test "En Ei Aj (j>=i) => T[3*j+n]=T[3*(j+1)+n]";

[Walnut]$ eval test "En Ei Aj (j>=i) => T[4*j+n]=T[4*(j+1)+n]";
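Outside Walnut, several small differences can be brute-force checked at once (again only a finite sanity check on sampled ranges, not a proof):

```python
def t(n):
    """n-th Thue-Morse letter: parity of the number of 1 bits of n."""
    return bin(n).count("1") % 2

# For each common difference d and offset b, T[d*k + b] takes both
# values on the sampled range, so no eventually constant progression
# with these parameters shows up.
for d in range(1, 10):
    for b in range(d):
        values = {t(d * k + b) for k in range(100, 1000)}
        assert values == {0, 1}, (d, b)
print("no eventually constant progression found for differences 1..9")
```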


Let us now check if the infinite Fibonacci word is overlap-free.

[Walnut]$ eval test "Ei,n (n >= 1) & Aj (j<=n) => F[i+j]=F[i+j+n]";    


This seems to say there are no overlaps, which is false, since the Fibonacci word starts
$0, 1, 0, 0, 1, 0, 1, 0,\dots$ and therefore contains the overlap $01010$ (two overlapping occurrences of $010$).
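Indeed, generating a prefix of the Fibonacci word in Python and searching it confirms the overlap (a quick illustration; the iteration count is an arbitrary choice):

```python
# Build a prefix of the infinite Fibonacci word via the recursion
# w_n = w_{n-1} w_{n-2}, starting from "0" and "01".
a, b = "0", "01"
for _ in range(10):
    a, b = b, b + a

fib = b
print(fib[:8])            # 01001010
assert "01010" in fib     # the overlap axaxa with a = 0, x = 1
```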

However, on page 136 it is suggested to add ?msd_fib so let us try that:

[Walnut]$ eval test "?msd_fib Ei,n (n >= 1) & Aj (j<=n) => F[i+j]=F[i+j+n]";


Indeed, it is explained on page 108 that “you may need to add a ?msd specification if your sequence is automatic for other than base 2”.

Now we can check whether the infinite Tribonacci word contains any overlaps.
The first attempt takes a while and, surprisingly, gives an error. However, making a simple change in the formula, as explained in the book, fixes the problem:

[Walnut]$ eval test "?msd_trib Ei,n (n >= 1) & Aj (j<=n) => TR[i+j]=TR[i+j+n]";
computed ~:1 states - 2ms
computed ~:2 states - 1ms
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.base/java.util.ArrayList.grow(
	at java.base/java.util.ArrayList.grow(
	at java.base/java.util.ArrayList.add(
	at java.base/java.util.ArrayList.add(
	at Automata.Automaton.subsetConstruction(
	at Automata.Automaton.minimize_valmari(
	at Automata.Automaton.minimize(
	at Automata.Automaton.quantifyHelper(
	at Automata.Automaton.quantify(
	at Token.LogicalOperator.actQuantifier(
	at Token.LogicalOperator.act(
	at Main.Computer.compute(
	at Main.Computer.(
	at Main.Prover.eval_def_commands(
	at Main.Prover.dispatch(
	at Main.Prover.readBuffer(
	at Main.Prover.main(
(base) Björns-MacBook-Pro:bin bjoernkjos-hanssen$ java Main/Prover

[Walnut]$ eval test "?msd_trib Ei,n (n >= 1) & Aj (i <= j & j <= i+n) => TR[j]=TR[j+n]";
computed ~:1 states - 9ms
computed ~:2 states - 1ms


Actually, the Tribonacci word is
$0, 1, 0, 2, 0, 1, 0, 0, 1, 0, 2, 0, 1, 0, 1, 0\dots$
so finding an overlap should be trivial. The situation is explained on page 111.
When we consider TR[i+j+n], the extra variable compared to TR[j+n] leads to an exponentially larger automaton being generated. Thus the crash is not really the fault of Walnut’s implementation but a feature of the underlying algorithm.
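Indeed, generating the Tribonacci word in Python, as the fixed point of the morphism $0\mapsto 01$, $1\mapsto 02$, $2\mapsto 0$, and searching for an overlap finds one immediately (the helper `find_overlap` is ours, for illustration):

```python
def step(w):
    """Apply the Tribonacci morphism 0->01, 1->02, 2->0."""
    return "".join({"0": "01", "1": "02", "2": "0"}[c] for c in w)

w = "0"
for _ in range(12):
    w = step(w)

def find_overlap(w, max_period=10):
    """Return (i, n) with w[i..i+2n] an overlap axaxa of period n, or None."""
    for n in range(1, max_period):
        for i in range(len(w) - 2 * n):
            if w[i:i + n + 1] == w[i + n:i + 2 * n + 1]:
                return i, n
    return None

print(w[:16])             # 0102010010201010
print(find_overlap(w))    # (11, 2): the overlap 01010 at position 11
```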

We can ponder which prefixes of the infinite Fibonacci word are of the form $xyxyx z uvuvu$.
Arithmetically this says that we have a number $n$ (the length) and numbers $l_x, l_y>0, l_z>0, l_u, l_v>0$ such that they add up to $n$ in the right way, and all the appropriate agreements occur. This is related to automatic complexity of Fibonacci words. A simpler question is which prefixes are squares.

[Walnut]$ eval test "?msd_fib Elx (lx+lx=n & At (t<lx) => F[t]=F[t+lx])";


This produces an automaton which, in Fibonacci representation, gives the lengths of the prefixes of the infinite Fibonacci word that are squares. They are 6, 10, and in general any number of the form $F_{k+3}+F_k$.
We see here the usefulness of the Fibonacci representation, since the word is not 2-automatic. If we run the same command without ?msd_fib the result is a 1-state automaton with a self-loop labeled 0.
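We can corroborate the list of square prefix lengths by brute force in Python over a long prefix (a finite check on lengths up to 200, not a proof):

```python
# Build a long prefix of the Fibonacci word.
a, b = "0", "01"
for _ in range(15):
    a, b = b, b + a
fib = b

# Even lengths n such that the length-n prefix is a square xx.
square_prefix_lengths = [n for n in range(2, 200, 2)
                         if fib[:n // 2] == fib[n // 2:n]]
print(square_prefix_lengths)   # starts 6, 10, 16, 26, ...
```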

A VS Code extension for Walnut would be great, as editing formulas is cumbersome.

In the Thue sequence $t_0t_1t_2\dots=011010011001\dots$, each $t_n$ is the parity of the number of 1s in the binary representation $(n)_2$ of $n$. This leads to a 2-state DFAO (deterministic finite automaton with output) that reads $(n)_2$ and keeps track of the parity so far. This is a prototypical example of an automatic sequence.
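This DFAO can be sketched in a few lines of Python (a straightforward transcription; the function name is ours):

```python
def thue_morse(n):
    """Run the 2-state DFAO on the binary representation of n, msd first.

    The state is the parity of the 1 bits read so far; the output is the
    final state."""
    state = 0
    for bit in bin(n)[2:]:
        if bit == "1":
            state ^= 1   # flip parity on reading a 1
    return state

print("".join(str(thue_morse(n)) for n in range(12)))  # 011010011001
```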

The Fibonacci word $01001010\dots$, the limit of the finite words defined by $f_n=f_{n-1}f_{n-2}$, can be handled similarly using the Fibonacci representation of numbers. In it, we write $n$ as $a_1\cdot 1 + a_2\cdot 2 + a_3\cdot 3+a_4\cdot 5+a_5\cdot 8+\dots$ where $1,2,3,5,8,\dots$ are the Fibonacci numbers and each $a_i\in\{0,1\}$. The representation is unique provided that no two consecutive $a_i$ are allowed to equal 1.
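The representation can be computed greedily: repeatedly subtract the largest Fibonacci number that fits. Here is a sketch in Python (the function name is ours; bits are returned most significant first):

```python
def fib_rep(n):
    """Greedy Zeckendorf/Fibonacci representation of n, msd first,
    over the Fibonacci numbers 1, 2, 3, 5, 8, ..."""
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    bits = []
    for f in reversed(fibs[:-1]):
        if f <= n:
            bits.append(1)
            n -= f
        else:
            bits.append(0)
    while len(bits) > 1 and bits[0] == 0:  # strip leading zeros
        bits.pop(0)
    return bits

print(fib_rep(11))  # [1, 0, 1, 0, 0], i.e. 11 = 8 + 3
```

Greediness guarantees that no two consecutive bits are 1: after taking a Fibonacci number $F$, the remainder is less than the previous Fibonacci number.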

These two examples already hint at a rich theory of automatic sequences with respect to various representation systems. Such a theory has been established over the years in the traditional manner of mathematics. Now, thanks to the decidability of Presburger arithmetic and some of its extensions, and to Walnut, a computer implementation thereof due to Hamoon Mousavi, researchers in the area are able to programmatically answer many research questions and verify or correct results from the literature.

As a theoretical backbone for the approach, it is shown that a sequence is $k$-automatic if and only if it is first order definable in the structure $(\mathbb N, +, V_k)$, where $V_k(n)$ is the greatest power of $k$ that divides $n$.
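For concreteness, $V_k$ is easy to compute; here is a quick illustration in Python (the function name is ours):

```python
def V(k, n):
    """Greatest power of k dividing n (defined for n >= 1)."""
    p = 1
    while n % (p * k) == 0:
        p *= k
    return p

print(V(2, 12), V(2, 7), V(3, 54))  # 4 1 27
```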

The author emphasizes that Walnut computations are not formally guaranteed to be correct; however, experience has led to large and growing confidence in its results. As a piece of software, Walnut is fairly easy to install and use.

The underlying algorithm has bad worst-case complexity, and this affects the use of Walnut in interesting ways. When we consider the Tribonacci word at position $i+j+n$, \texttt{TR[i+j+n]}, the three variables lead to an exponentially larger automaton being generated than if we had stuck with two, and the program crashes. Rewriting the query to use \texttt{TR[j+n]} fixes the problem, as explained in the book. Thus the crash is not really the fault of Walnut’s implementation but an aspect of the underlying algorithm.

The topics covered include squares, palindromes, and a wide variety of notions from combinatorics on words. The book is very well written, with few errors and many easily stated open problems. The many entertaining examples of English words exemplifying the concepts introduced can even be appreciated by non-mathematicians: for example, the word “distinguishing” is a concatenation of two equal-length words that differ in only two positions.


Math 100 student’s protein folding research

Madison Koskey had the highest score in the class on this project task:

Fold an idealized protein known as S64 to obtain the highest possible score in the hydrophobic-polar protein folding model in three dimensions.

This problem was studied by Lesh, Mitzenmacher, and Whitesides (2003).

Above we see Madison’s intricate folding of S64, which earned a score of 27.

To try your hand at rotating it in 3D, go directly to this link.

Slides for October 30


Recent PhDs

Three new papers

with my recent PhDs Birns (2023, left) and Webb (2022, right). Webb is now an Assistant Professor at Chaminade University, and Birns is a data scientist at Pacific Life in California.
  1. KL-randomness and effective dimension under strong reducibility (with David J. Webb). Computability in Europe, Lecture Notes in Computer Science, 2021.
  2. On the degrees of constructively immune sets (with Samuel D. Birns). Computability in Europe, Lecture Notes in Computer Science, 2021.
  3. Strong Medvedev reducibilities and the KL-randomness problem (with David J. Webb). Computability in Europe, Lecture Notes in Computer Science, 2022.

The number of languages with maximum state complexity

Lei Liu completed her Master’s degree with the project title Complexity of Options in 2017.
An extension to monotone options (pictured) was presented at
ALH-2018. The new paper is called The number of languages with maximum state complexity and has been accepted for TAMC 2019.

As of 2022, the paper has been through 7 revisions and has been accepted for publication in the journal Algebra Universalis.

Some of the 168 monotone Boolean functions of 4 variables


Principles-based classification

I was the discussant for the following paper at Hawaii Accounting Research Conference 2020 on the UH Hilo campus:

The Contract Disclosure Mandate and Earnings Management under External Scrutiny

by Carlos Corona and Tae-Wook Ryan Kim

Discussant slides

I learned that research in the Theory Track of the accounting discipline is primarily about mathematical modeling of the effects of government policies and business decisions. It borrows methods from economics for such modeling; in the case of the Corona–Kim paper, quadratic programming without constraints and exponential utility functions. Usually these are not empirical papers, i.e., they don’t test the model explicitly against data. Indeed, this would be hard to do with notions like “intensity of scrutiny”.

I am a discussant for “A theory of principles-based classification” by Konvalinka, Penno, and Stecher, at HARC 2021.