### Regular expression complexity of homogeneous languages

We could measure complexity by the length of the polish notation for the regex.
However, this will always be an odd number $2m+1$. Therefore we prefer to use $m$,
the number of operators, as the complexity.
We have an upper bound of $c(n-1)+c-1 = cn-1$ for the complexity for a family of cardinality $c$ consisting of words of length $n$. This is because we can do: $++..x_1x_2x_3..y_1y_2y_3$ (in the case $n=3,c=2$).
This is not sharp: for instance, when $c=2^n$ we also have the upper bound $2n-1$ due to the pattern
$..+01+01+01$ (in the case $n=3$).
Of interest: which value of $c$ maximizes the complexity given $n$?

Data below from February 20-23, 2021, pertain to a search through the space of words over {0,1,+,.}. A better, more semantic approach is to use that the complexity $R$ of a language $L=L_1\cup L_2$ or $L=L_1L_2$ (a concatenation) is at most $R(L_1)+R(L_2)+1$, and in fact $R(L)$ is realized this way for some $L_1,L_2$. With the new code it takes only 5 minutes to find all complexity for $n\ge 3$.

## Length $n=0$ (max 0)

Set | Complexity | Regex |

$\emptyset$ | 0 | $\emptyset$ |

$\{\varepsilon\}$ | 0 | $\varepsilon$ |

## Length $n=1$ (max 1)

Set | Complexity | Regex |

$\emptyset$ | 0 | $\emptyset$ |

$\{0\}$ | 0 | $0$ |

$\{1\}$ | 0 | $1$ |

$\{0,1\}$ | 1 | $+01$ |

Uniqueness is of interest. For $n=1$, +10 = +01 but we may always rule out the phrase +1 so it is essentially unique.
More genuine non-uniqueness occurs for $\{00,01,11\}$:

+.00.+011, which is by commutativity of + equal to +.+011.00

+.0+01.11, which is by commutativity of + equal to +.11.0+01

We associate to the left so that instead of .0.0 we use ..00, to eliminate duplicates.
For $\{000, 001, 011, 111\}$ we have:

+.+01.11..00+01

+..00+01.+01.11

+..00+01..+0111

+..+0111..00+01

A more complex example is $\{000,001,011,110\}$ which has:

+.0+.00.+011..110

+.0+.0+01.11..110

+.0+.11.0+01..110

+.0+.+011.00..110

+.+.00.110..0+011

+.+.11.000..0+011

+..0+011.+.00.110

+..0+011.+.11.000

+..110.0+.00.+011

+..110.0+.0+01.11

+..110.0+.11.0+01

+..110.0+.+011.00

## Length $n=2$ (max 4)

Set | Complexity | Regex |

$\emptyset$ | 0 | $\emptyset$ |

$\{00\}$ | 1 | $.00$ |

$\{01\}$ | 1 | $.01$ |

$\{10\}$ | 1 | $.10$ |

$\{11\}$ | 1 | $.11$ |

$\{00,01\}$ | 2 | .0+01 |

$\{00,10\}$ | 2 | .+010 |

$\{00,11\}$ | 3 | +.00.11 and +.11.00 |

$\{01,10\}$ | 3 | +.01.10 and +.10.01 |

$\{01,11\}$ | 2 | .+011 |

$\{10,11\}$ | 2 | .1+01 |

$\{00,01,10\}$ | 4 | +.01.+010 |

$\{00,01,11\}$ | 4 | +.00.+011 |

$\{00,10,11\}$ | 4 | +.00.1+01 |

$\{01,10,11\}$ | 4 | +.01.1+01 |

$\{00,01,10,11\}$ | 3 | .+01+01 |

## Length $n=3$, cardinality $\le 2$ (max 5)

Set | Complexity | Regex |

$\emptyset$ | 0 | $\emptyset$ |

$\{000\}$ | 2 | $..000$ |

$\{001\}$ | 2 | $..001$ |

$\{010\}$ | 2 | $..010$ |

$\{011\}$ | 2 | $..011$ |

$\{100\}$ | 2 | $..100$ |

$\{101\}$ | 2 | $..101$ |

$\{110\}$ | 2 | $..110$ |

$\{111\}$ | 2 | $..111$ |

$\{000,001\}$ | 3 | .0.0+01 |

$\{000,010\}$ | 3 | .0.+010 |

$\{000,011\}$ | 4 | .0+.00.11 |

$\{000,100\}$ | 3 | .+01.00 |

$\{000,101\}$ | 5 | +.0.00.1.01 |

$\{000,110\}$ | 4 | .+.00.110 |

$\{000,111\}$ | 5 | +.0.00.1.11 |

$\{001,010\}$ | 4 | .0+.01.10 |

$\{001,011\}$ | 3 | .0.+011 |

$\{001,100\}$ | 5 | +.0.01.1.00 |

$\{001,101\}$ | 3 | .+01.01 |

$\{001,110\}$ | 5 | +.0.01.1.10 |

$\{001,111\}$ | 4 | .+.00.111 |

$\{010,011\}$ | 3 | .0.1+01 |

$\{010,100\}$ | 4 | .+.01.100 |

$\{010,101\}$ | 5 | +.0.10.1.01 |

$\{010,110\}$ | 3 | .+01.10 |

$\{010,111\}$ | 5 | +.0.10.1.11 |

$\{011,100\}$ | 5 | +.0.11.1.00 |

$\{011,101\}$ | 4 | .+.01.101 |

$\{011,110\}$ | 5 | +.0.11.1.10 |

$\{011,111\}$ | 3 | .+01.11 |

$\{100,101\}$ | 3 | .1.0+01 |

$\{100,110\}$ | 3 | .1.+010 |

$\{100,111\}$ | 4 | .1+.00.11 |

$\{101,110\}$ | 4 | .1+.01.10 |

$\{101,111\}$ | 3 | .1.+011 |

$\{110,111\}$ | 3 | .1.1+01 |

## Length 3, cardinality 3 (max=7)

Set | Complexity | Regex |

$\{000,001,010\}$ | 5 | .0+.01.+010 |

$\{000,001,011\}$ | 5 | .0+.00.+011 |

$\{000,001,100\}$ | 6 | +.0.01.+01.00 |

$\{000,001,101\}$ | 6 | +.0.00.+01.01 |

$\{000,001,110\}$ | 6 | +.0.0+01.1.10 |

$\{000,010,011\}$ | 5 | .0+.00.1+01 |

$\{000,010,100\}$ | 5 | .+.01.+0100 |

$\{000,010,101\}$ | 6 | +.0.+010.1.01 |

$\{000,010,110\}$ | 5 | .+.00.+0110 |

$\{000,010,111\}$ | 6 | +.0.+010.1.11 |

$\{000,011,100\}$ | 6 | +.0.11.+01.00 |

$\{000,011,101\}$ | 7 | +.0+.00.11.1.01 |

The example $\{000,011,101\}$ illustrates that 7 is the maximum complexity for a family of 3 binary words of length 3: there will always be two words that start with the same letter.
## Length 3, cardinality 4

Set | Complexity | Regex |

$\{000,001,010,011\}$ | 4 | .0.+01+01 |

$\{000,001,010,100\}$ | 8 | +. +01 .00 .0+ .01 .10 |

$\{000,001,010,101\}$ | 7 | +. +01 .01 .0. +01 0 |

$\{000,001,010,110\}$ | 7 | +. .00 +01 . +01 .10 |

$\{000,001,010,111\}$ | 8 | + .+ .00 .11 1 .0. +01 0 |

$\{000,001,011,100\}$ | 7 | +.0.+011.+01.00 |

$\{000,001,011,101\}$ | 8 | + ..101 .0 + .11 .0 +01 |

$\{000,001,011,110\}$ | 8 | + ..110 .0 + .11 .0 +01 |

$\{000,001,011,111\}$ | 7 | +.0.0+01.+01.11 |

etc.
With the current code it takes quite long to show that a complexity is $\ge 9$,
and already 10 minutes or so to show it is $\ge 8$.

## Phrases that are needed, and the minimum length at which they occur

Phrase | $n$ | $c$ |

0 | 1 | 1 |

1 | 1 | 1 |

+ | 1 | 2 |

. | 2 | 1 |

00 | 2 | 1 |

01 | 1 | 2 |

0+ | 2 | 2 |

0. | 2 | 2 |

10 | 2 | 1 |

11 | 2 | 1 |

1+ | 2 | 2 |

1. | 2 | 2 |

+0 | 1 | 2 |

+1 | not needed | |

++ | not found yet | |

+. | 2 | 2 |

.0 | 2 | 1 |

.1 | 2 | 1 |

.+ | 2 | 2 |

.. | 3 | 1 |

000 | yes $\{000\}$ | |

001 | yes $\{001\}$ | |

00+ | | |

00. | | |

010 | yes $\{010\}$ | |

011 | yes $\{011\}$ | |

01+ | | |

01. | | |

0+0 | | |

0+1 | no need +1 | |

0++ | | |

0+. | | |

0.0 | | |

0.1 | | |

0.+ | | |

0.. | | |

100 | yes $\{100\}$ | |

101 | yes $\{101\}$ | |

10+ | | |

10. | | |

110 | yes $\{110\}$ | |

111 | yes $\{111\}$ | |

11+ | | |

11. | | |

1+0 | | |

1+1 | no need +1 | |

1++ | | |

1+. | | |

1.0 | | |

1.1 | | |

1.+ | | |

1.. | | |

+00 | | |

+01 | | |

+0+ | | |

+0. | | |

+10 | no need +1 | |

+11 | no need +1 | |

+1+ | no need +1 | |

+1. | no need +1 | |

++0 | | |

++1 | no need +1 | |

+++ | | |

++. | | |

+.0 | | |

+.1 | | |

+.+ | | |

+.. | | |

.00 | | |

.01 | | |

.0+ | | |

.0. | | |

.10 | | |

.11 | | |

.1+ | | |

.1. | | |

.+0 | | |

.+1 | no need +1 | |

.++ | | |

.+. | | |

..0 | | |

..1 | | |

..+ | | |

… | | |