
Research statement

The primary focus of my research is algorithmic randomness, especially its relation to randomness extraction and to probability and statistics. More generally I have worked in various areas of computability theory (recursion theory).

The basic idea of algorithmic randomness is to declare an object algorithmically random if it belongs to no “computable” set of measure zero. Since only computable sets can truly be understood by humans, an algorithmically random object is as random as one could desire. This lets one formulate questions such as: Can one compute a random object from an object that is almost random? Can one compute anything useful from a random object? So algorithmic randomness is a natural setting for randomness extraction (although not the only natural setting).

Martin-Löf (ML-) randomness is a notion of algorithmic randomness, arguably the most natural one. It basically corresponds to the class of all Turing machines, whereas other randomness notions correspond to those Turing machines that always halt, say. It turns out that the collection of all Martin-Löf random sets is simpler in terms of definability than rival notions. It is a \(\Sigma^0_2\) class, whereas other notions tend to give \(\Pi^0_3\) classes or worse. It is an \(F_\sigma\) set rather than \(F_{\sigma\delta}\). Thus each ML-random set belongs to a closed set containing nothing but ML-random sets, and this is a basic tool in the area.
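In symbols (a standard fact, stated here as a quick sketch): if \(\{U_n\}_{n\in\mathbb N}\) is a universal Martin-Löf test, then
\[
\text{MLR}=\bigcup_{n\in\mathbb N}\left(2^\omega\setminus U_n\right),
\]
a countable union of \(\Pi^0_1\) (effectively closed) classes; any given ML-random set lies in one of these closed classes.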

Open problem: Let \(\text{MLR}\) denote the collection of all Martin-Löf random subsets of \(\mathbb N\) and let \(\le_T\) denote Turing reducibility.

Suppose \(A\subseteq B\subseteq C\), \(A\in\text{MLR}\), and \(C\in\text{MLR}\). Then is it true that \((\exists X\in\text{MLR})(X\le_T B)\)?

Note: Hertling (2005) showed that it is not necessarily true that \(B\in\text{MLR}\).

This is one of many questions of the form: if a set is close to being random, then can we extract randomness from it? I have worked primarily on two notions of being close to random: being close in asymptotic Hamming distance, and being an infinite subset.

The current state of knowledge about infinite subsets is that each Schnorr 2-random set \(X\) has an infinite subset \(Y\) such that no Martin-Löf random set is Turing reducible to \(Y\). This is the main result of the paper A strong law of computationally weak subsets. The main idea of the proof is to combine the Extinction Criterion for Galton-Watson processes with a construction from the paper Comparing DNR and WWKL.

In the case of asymptotic Hamming distance, the paper Recovering randomness from an asymptotic Hamming distance has some results. One can define asymptotic Hamming distance in two ways: either require that the Hamming distance of the prefixes of \(X\) and \(Y\) of length \(n\) is small for all \(n\), or merely for infinitely many \(n\). It turned out to be most natural, in a way, to require small distance on an infinite computable set of values of \(n\). This is related to Michel Weber’s strengthening of the Law of the Iterated Logarithm. This law has a somewhat strange form, involving the logarithm of the logarithm function. However, when one looks at what happens at an infinite sparse set of times, this strangeness disappears. In terms of this notion of asymptotic Hamming distance I obtained the following type of result:

For each Turing reduction \(\Phi\) and each sufficiently random set \(X\), there is a set \(Y\) which is close to \(X\), such that \(\Phi^Y\) is not random.

This has the corollary that from a set \(Y\) that is complex in the sense of Kjos-Hanssen, Merkle, and Stephan (TAMS, 2011) and simultaneously has positive effective packing dimension, one cannot uniformly compute any set \(Z\) that is even Mises-Wald-Church stochastic. A goal is to strengthen this so that \(Y\) has positive effective Hausdorff dimension. Another possible strengthening would handle all Turing reductions at once. Conceivably there is a route here to proving that for each sufficiently random set \(X\) there is a set \(Y\), so close to \(X\) as to be forced to have effective Hausdorff dimension 1, such that no 1-random set is Turing reducible to \(Y\). This would strengthen a result of Greenberg and Miller, which asserts that such sets \(Y\) exist if we drop the requirement of being close to a random set.
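For orientation, here is the classical Law of the Iterated Logarithm mentioned above, in its standard form for a fair coin (notation mine): if \(S_n\) denotes the number of 1s among the first \(n\) bits, then almost surely
\[
\limsup_{n\to\infty}\frac{S_n-n/2}{\sqrt{(n/2)\ln\ln n}}=1.
\]
The \(\ln\ln\) factor is the “strange” feature that disappears when one samples along a sufficiently sparse infinite set of times.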

Excerpts from a talk at Luminy, 2009


Randomness with respect to a measure

Martin-Löf randomness for arbitrary measures on $2^\omega$:

Definition.
A test for $\mu$-randomness is a uniformly $\Sigma^0_1(\mu)$ sequence $\{U_n\}_{n\in\omega}$ with $\mu(U_n)\le 2^{-n}$.

The test $\{U_n\}_{n\in\mathbb{N}}$ defines a null set $\cap_n U_n$. $X$ passes the test for randomness $\{U_n\}_{n\in\mathbb{N}}$ if $X\not\in \cap_n U_n$. If $X$ passes all $\mu$-randomness tests then $X$ is $\mu$-random.

Bernoulli measures

For each $n\in\omega$,
$$\mu_p(\{X: X(n)=1\})=p$$
$$\mu_p(\{X:X(n)=0\})=1-p$$
and $X(0),X(1),X(2),\ldots$ are mutually independent random variables.

Example.
The Strong Law of Large Numbers for $\mu_p$ states that for almost all $X$ according to the measure $\mu_p$, we have
$$\forall\varepsilon>0\,\exists N\,\forall n>N\, \left| \frac{\text{the $\#$ of 1s up to $n$ in $X$}}{n}-p\right| <\varepsilon.$$
Suppose $X$ does not satisfy the SLLN, as witnessed by a number $\varepsilon_0$. Let
$$U_N=\{Z: \exists n>N \, \left| \frac{\text{the $\#$ of 1s up to $n$ in $Z$}}{n}-p\right| \ge \varepsilon_0\}.$$
Then $U_N$ is open and in fact $\Sigma^0_1(\mu_p)$; and $\mu_p(U_N)$ goes effectively to $0$ as $N\to\infty$. Thus $X$, which belongs to every $U_N$, is not Martin-Löf random with respect to $\mu_p$.
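A minimal simulation sketch (illustrative only; the sampling function and parameter values are mine, not part of the talk): drawing a long prefix from $\mu_p$ and watching the empirical frequency approach $p$, as the SLLN predicts.

```python
import random

def sample_mu_p(p, n, seed=1):
    """Sample the length-n prefix of a mu_p-distributed sequence:
    independent bits, each equal to 1 with probability p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

p = 0.3
X = sample_mu_p(p, 100_000)
for n in (100, 1_000, 10_000, 100_000):
    freq = sum(X[:n]) / n
    print(n, round(abs(freq - p), 4))  # the deviation |freq - p| shrinks as n grows
```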

Definition.
A set $Y\subseteq \mathbb{N}$ is feeble if it is infinite, and no Martin-Löf random set is computable from $Y$.

Theorem[Law of Feeble Subsets]
Almost every $X\subseteq\mathbb{N}$, according to $\mu:=\mu_{1/2}$, has a feeble subset $Y\subseteq X$.

(Passing from $X$ to $Y$ we suffer a “loss of randomness beyond algorithmic repair.”)

Example.
Let $X$ be Martin-Löf random and let $Y$ be a computably chosen subset of $X$. Say,
$$Y=\langle X(0),0,X(2),0,X(4),0,\ldots\rangle.$$
Then $Y$ is an infinite subset of $X$, but $Y$ does compute a Martin-Löf random set, namely
$$Z=\langle X(2),X(4),X(6),X(8),\ldots\rangle$$
so $Y$ is not feeble.
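In code (a small illustrative sketch; the function names are mine): the passage from $X$ to this computably chosen subset $Y$, and the recovery of $Z$ from $Y$.

```python
def thin_out(X):
    """Y(2n) = X(2n), Y(2n+1) = 0: a computably chosen subset of X."""
    return [b if i % 2 == 0 else 0 for i, b in enumerate(X)]

def recover(Y):
    """Z(n) = X(2n+2), read off from the even positions of Y."""
    return [Y[2 * n] for n in range(1, len(Y) // 2)]
```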

Generally, if $X$ is Martin-Löf random and $R$ is computable ($R\le_T\emptyset$) then $X\cap R$ is not feeble.

Question.
Is there a Martin-Löf random set $X$ and a set $R\le_T X$ such that $R\cap X$ is feeble?

Partial negative result: such an $X$ cannot be hyperimmune-free.

von Neumann’s randomness extractor

Let $X$ be Martin-Löf random and let $Y$ be a “randomly chosen” subset of $X$. That is, each 1 in $X$ is independently kept or converted to a 0, with probability $\frac{1}{2}$ each. Then $Y$ does compute a Martin-Löf random set. Namely, let $Z$ be obtained from $Y$ by replacing each successive pair $\langle Y(2n),Y(2n+1)\rangle$ as follows:
$$\langle 0,0\rangle \mapsto \langle \rangle$$
$$\langle 1,1\rangle \mapsto \langle \rangle$$
$$\langle 0,1\rangle \mapsto \langle 0\rangle$$
$$\langle 1,0\rangle \mapsto \langle 1\rangle$$
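A runnable sketch of the extractor (my code, implementing the standard von Neumann trick): the pairs $01$ and $10$ are equally likely under any Bernoulli measure, so their first bits form an unbiased stream. The simulated source below is simply Bernoulli with parameter $1/4$, matching the density of $Y$ above.

```python
import random

def von_neumann(bits):
    """Map each pair: 01 -> 0, 10 -> 1; discard 00 and 11."""
    return [a for a, b in zip(bits[0::2], bits[1::2]) if a != b]

rng = random.Random(2)
q = 0.25
Y = [1 if rng.random() < q else 0 for _ in range(200_000)]
Z = von_neumann(Y)
# About 2*q*(1-q) of the pairs survive; among the survivors, 1s have frequency ~ 1/2.
print(len(Z), sum(Z) / len(Z))
```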

How dense can a feeble subset $Y\subseteq X$ be?

The principal function of a set $Y=\{y_0<y_1<y_2<\cdots\}$ is given by $p(n)=y_n$; equivalently, $p(0)$ is the least element of $Y$, and $p(n)$ is the least element of $Y$ that is $>p(n-1)$.

  • K., 2007: principal function bounded by $4^n$.
  • Miller, 2008: principal function bounded by $n^{2+\varepsilon}$.

Question.
Can a Martin-Löf random set $X$ have a feeble subset $Y$ of linear density, i.e. principal function bounded by $c\cdot n$?

One way to approach this question is to start with stronger reducibilities.

Definition.
Let $r$ be a reducibility. $Y$ is $r$-feeble if $Y$ is infinite and $Y$ does not $r$-compute any ML-random set.

$c$, $d$, and $btt$-feeble subsets of linear density

Take $Y$ to be a $\mu_p$-random subset of $X$. Then $Y$ has principal function bounded by $({p}^{-1}+\varepsilon)n$ ($Y$ has linear density) and $Y$ is $c$-feeble, $d$-feeble, and $btt$-feeble.

However, using von Neumann’s extractor one can show that $Y$ is not $tt$-feeble. By using different algorithms one can show that $Y$ is not $\ell$-feeble or $p$-feeble either.

Question.
Is there an $r$-feeble subset of $X$ of linear density, for $r\in\{p,\ell\}$? Or even for $r\in\{tt,T\}$?

The 7 clones in Post’s lattice that contain the constant functions correspond to the reducibilities $m$, $btt(1)$, $c$, $d$, $p$, $\ell$, and $tt$; out of these 7, a $\mu_p$-random set is feeble for exactly 4.

Reducibilities for which a $\mu_p$-random set is feeble:

  • $m$: $\{0,1\}$
  • $btt(1)$: $\{0,\neg\}$
  • $c$: $\{0,1,\wedge\}$ (conjunctive)
  • $d$: $\{0,1,\vee\}$ (disjunctive)

and those for which it is not:

  • $p$: $\{0,1,\wedge,\vee\}$ (positive)
  • $\ell$: $\{0,\iff\}$
  • $tt$: $\{\neg,\vee\}$

When is a measure “close enough” to uniform?

Theorem[Kakutani 1948]
Let $\mu$ and $\nu$ be strongly positive (i.e. the $p_i$, $q_i$ are bounded away from $0$ and $1$) generalized Bernoulli measures of parameters $\{p_i\}_{i\in\mathbb{N}}$ and $\{q_i\}_{i\in\mathbb{N}}$, respectively. If $\sum_i (p_i-q_i)^2<\infty$ then $\mu$ and $\nu$ are equivalent (i.e. have the same null sets).

Theorem[Vovk 1987]
Let $\mu$ and $\nu$ be computable strongly positive generalized Bernoulli measures of parameters $\{p_i\}_{i\in\mathbb{N}}$ and $\{q_i\}_{i\in\mathbb{N}}$, respectively. If $\sum_i (p_i-q_i)^2<\infty$ then $\mu$MLR=$\nu$MLR.
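For instance (my example, just checking the hypothesis): the parameters $q_i=\frac{1}{2}$ and $p_i=\frac{1}{2}+\frac{1}{i+3}$ are computable and bounded away from $0$ and $1$, and $\sum_i (p_i-q_i)^2=\sum_i \frac{1}{(i+3)^2}<\infty$; so by Vovk's theorem the corresponding generalized Bernoulli measure has exactly the same Martin-Löf random sets as the uniform measure, even though $p_i\ne\frac{1}{2}$ for every $i$.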

Linear reducibility

$Z$ is linearly reducible to $Y$ ($Z\le_\ell Y$) if the truth of the equation $Z(n)=1$ can effectively be reduced to the truth of a linear equation in the variables $Y(0), Y(1), \ldots$ over the finite field $\mathbb Z/2\mathbb Z = (\{0,1\},+)$:

$$Z(43)=1\Leftrightarrow Y(3)+Y(5)=0$$
$$Z(44)=1\Leftrightarrow Y(1)+Y(6)+Y(8)=1$$
$$\ldots$$
One way to state this:

Definition.
$Z\le_{\ell}Y$ if there is a recursive function $f$ such that for all $n$,
$$n\in Z\iff |D_{f(n)}\cap Y|\equiv 1(\text{mod }2).$$
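As a small runnable sketch (my code; here `f(n)` plays the role of the canonical finite set $D_{f(n)}$, and `Y` is a set of natural numbers):

```python
def ell_reduce(f, Y, n):
    """n is in Z iff |D_f(n) ∩ Y| is odd."""
    return sum(1 for i in f(n) if i in Y) % 2 == 1

# Toy usage: D_f(n) = {n, n+1} and Y = the even numbers below 100.
Y = set(range(0, 100, 2))
f = lambda n: {n, n + 1}
print([ell_reduce(f, Y, n) for n in range(5)])  # all True: exactly one of n, n+1 is even
```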

An improved von Neumann extractor (in a sense)

Theorem
A $\mu_p$-random set is not $\ell$-feeble.

Proof.
Suppose $Y$ is $\mu_p$-random. The $\mu_p$-probability that $|D_{f(n)}\cap Y|\equiv 1\ (\text{mod }2)$ is computable from $p$ and converges effectively to $1/2$ as $|D_{f(n)}|\to\infty$. (The reason is that if $D_{f(n)}$ is very large, then it is likely to contain many elements of $Y$, and that number of elements is about equally likely to be odd or even.)

Let the $D_{f(n)}$ be pairwise disjoint finite sets so large that, setting $p_i:=\mu_p(\{Y:|D_{f(i)}\cap Y|\equiv 1\ (\text{mod }2)\})$, we have $\sum_i (p_i-\frac{1}{2})^2<\infty$.

If $Z$ is reduced to $Y$ via the sets $D_{f(n)}$, then $Z$ is random for the generalized Bernoulli measure with parameters $\{p_i\}_{i\in\mathbb N}$; by Vovk's theorem this measure has the same Martin-Löf random sets as the uniform measure, so $Z$ is ML-random.
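A runnable sketch of this block-parity idea (my code; the block sizes grow linearly, which makes the biases $p_i-\frac{1}{2}$ decay geometrically and hence be square-summable, as the proof requires):

```python
import random

def block_parity_extractor(bits, num_blocks):
    """Output bit i = parity of the i-th consecutive block, where block i has size i+1.
    For a Bernoulli(p) source, block i has bias (1-2p)^(i+1) / 2 toward even parity,
    which is square-summable in i."""
    out, pos = [], 0
    for i in range(num_blocks):
        size = i + 1
        block = bits[pos:pos + size]
        if len(block) < size:
            break
        out.append(sum(block) % 2)
        pos += size
    return out

rng = random.Random(3)
p = 0.3
Y = [1 if rng.random() < p else 0 for _ in range(500_000)]
Z = block_parity_extractor(Y, 900)  # 900 blocks consume 900*901/2 = 405,450 bits
print(sum(Z) / len(Z))  # close to 1/2
```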

Positive reducibility

$Z\le_p Y$ iff there are recursive functions $f$ and $n\mapsto k_n$ such that for all $n$: $n\in Z$ iff $\exists i\le k_n\ D_{f(n,i)}\subseteq Y$.

We can rewrite this as:
$$\neg\,\forall i\le k_n\,[D_{f(n,i)}\not\subseteq Y].\tag{*}$$

If the $D_{f(n,i)}$ are disjoint then the events $[D_{f(n,i)}\not\subseteq Y]$ are independent and so we can calculate the probability of (*).
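Explicitly (my computation, for the measure $\mu_q$ of the theorem below, under which each number belongs to $Y$ independently with probability $q$):
$$\mu_q(\{Y: n\in Z\})=1-\prod_{i\le k_n}\left(1-q^{|D_{f(n,i)}|}\right).$$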

By taking $|D_{f(n,i)}|$ large enough and taking $k_n$ large enough we can get this probability as close to $1/2$ as desired. Thus,

Theorem[A second improved von Neumann extractor]
Let $0<q<1$. A $\mu_q$-random set is not $p$-feeble.

Cannot reduce meet to join

Here is an argument from Culver’s MA thesis from 2010.

Suppose the sequence $(A_0\wedge A_1,A_2\wedge A_3,\ldots)$ is $\vee$-reducible to $A$.
Let’s look at the use of the $\vee$-truth-table reduction for computing $A_8\wedge A_9$.
We claim that after seeing only $A_0,\ldots,A_7$ we can rule out one possibility for $(A_8,A_9)$, thereby making money on $A$ and thus showing $A$ is not random.

If the use is only $A_0,\ldots,A_7$ then let’s say the reduction predicts $A_8\wedge A_9=1$. Then we can rule out the pair $(0,1)$. If the reduction predicts $0$ we can rule out $(1,1)$.

If the use is only $A_0,\ldots,A_8$ then based on us looking at $A_0,\ldots,A_7$ we can determine that the reduction predicts $1\wedge A_9=1$ or we can determine that the reduction predicts $1\wedge A_9=0$. Then we can rule out the pair $(1,0)$, or $(1,1)$, respectively.

Finally, if the reduction uses all of $A_0,\ldots,A_9$, then after seeing $A_0,\ldots,A_7$ its output is a fixed $\vee$-combination of $A_8$ and $A_9$. This combination cannot agree with $A_8\wedge A_9$ on all four pairs, or else $\wedge$ would be an element of the clone generated by $\{\vee\}$ and the constants; so for some pair $(a,b)$ the prediction is wrong, and we rule that pair out.
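A quick brute-force check of the clone fact used above (my code): closing $\{0,1,x,y\}$ under $\vee$ on two variables never produces $\wedge$.

```python
from itertools import product

# Represent a binary Boolean function by its truth table over (x, y) in {0,1}^2.
inputs = list(product([0, 1], repeat=2))
x = tuple(a for a, b in inputs)      # projection onto x
y = tuple(b for a, b in inputs)      # projection onto y
const0 = (0, 0, 0, 0)
const1 = (1, 1, 1, 1)

# Close {0, 1, x, y} under OR.
clone = {const0, const1, x, y}
changed = True
while changed:
    changed = False
    for f in list(clone):
        for g in list(clone):
            h = tuple(a | b for a, b in zip(f, g))
            if h not in clone:
                clone.add(h)
                changed = True

AND = tuple(a & b for a, b in inputs)
print(AND in clone)  # False: conjunction is not a disjunctive combination
```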

Transition to Advanced Math: Take-home midterm

Do the following problems from Humphreys and Prest:
1.4#2
2.2#5
2.4#6

For the remaining problems, refer to the page Automaton complexity.

Prove that
4. $A(x_1,\ldots,x_k)\le k+1$ for all $k$.
5. $A(x_1,\ldots,x_k)\le k$ for $k\ge 3$.

6. Extra credit (up to 4% of the course): improve some of the $\le$ estimates on the table of complexities.

(I proved in class that $A(x_1,\ldots,x_k)\le k+2$ for all $k$.)

It is essential to write legibly and show your work. If your work is absent or illegible, and
at the same time your answer is not perfectly correct, then no partial credit can be awarded.
Completely correct answers which are given without justification may receive little or no credit.

During this exam, you are permitted to use calculators, notes, or books.

You are not allowed to collaborate with others, except on the extra credit problem.

Special Session on Computability and Complexity

Cameron Freer and I organized a special session at the AMS meeting at UH on March 3-4, 2012.

Program

Speakers

Logan Axon (Gonzaga University, Washington). Axon’s slides
Jennifer Chubb Reimann (University of San Francisco)
Denis Hirschfeldt (University of Chicago). Hirschfeldt’s slides
John Hitchcock (University of Wyoming)
Greg Igusa (UC Berkeley). Igusa’s slides
Joseph S. Miller (University of Wisconsin, Madison). Miller’s slides
Mia Minnes (UCSD). Minnes’ slides
Pavel Semukhin (Pacific Institute for Mathematical Sciences (PIMS), University of Regina, Saskatchewan, Canada). Semukhin’s slides



Abstracts

1078-03-400 (13-dec-2011). Algorithmic Randomness and Computational Learning.
John M Hitchcock*, jhitchco@cs.uwyo.edu
http://www.ams.org/amsmtgs/2190_abstracts/1078-03-400.pdf

1078-03-399 (13-dec-2011). Cupping with random sets.
Joseph S. Miller*, jmiller@math.wisc.edu
http://www.ams.org/amsmtgs/2190_abstracts/1078-03-399.pdf

1078-03-377 (13-dec-2011). Algorithmic randomness and complexity via automata.
Mia Minnes*, minnes@math.ucsd.edu
http://www.ams.org/amsmtgs/2190_abstracts/1078-03-377.pdf

1078-03-363 (13-dec-2011). The reverse mathematics of homogeneous models.
Denis R. Hirschfeldt*, drh@math.uchicago.edu
http://www.ams.org/amsmtgs/2190_abstracts/1078-03-363.pdf

1078-03-354 (13-dec-2011). Effective properties of approximable functions.
Jennifer Chubb Reimann*, jcc31@psu.edu
http://www.ams.org/amsmtgs/2190_abstracts/1078-03-354.pdf

1078-03-313 (12-dec-2011). Martin-Löf random closed sets.
Logan Axon*, axon@gonzaga.edu
http://www.ams.org/amsmtgs/2190_abstracts/1078-03-313.pdf

1078-03-237 (10-dec-2011). Nonexistence of Minimal Pairs for Generic Computation.
Gregory Igusa*, igusa@math.berkeley.edu
http://www.ams.org/amsmtgs/2190_abstracts/1078-03-237.pdf

1078-03-127 (01-dec-2011). Automatic models of first-order theories.
Pavel Semukhin*, pavel@semukhin.name; Frank Stephan
http://www.ams.org/amsmtgs/2190_abstracts/1078-03-127.pdf

Semantic distance using search engines

Google Distance Between Words (with Alberto J. Evangelista).
Frontiers in Undergraduate Research, University of Connecticut, Spring 2006.
arXiv: 0901.4180 [cs.CL]
The normalized Google distance of Cilibrasi and Vitanyi was studied. An explicit example of the failure of the triangle inequality was found:
\[d(\text{Rolling Stones}, \text{salmonflies})>d(\text{Rolling Stones}, \text{Beatles})+d(\text{Beatles}, \text{salmonflies}).\]
(This is probably due to people misspelling “beetles”.)
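For reference, the normalized Google distance in code (the formula is Cilibrasi and Vitanyi's; the hit counts below are made-up illustrative numbers, not real search data, chosen to exhibit a triangle-inequality failure of the kind found in the paper):

```python
import math

def ngd(fx, fy, fxy, N):
    """Normalized Google distance from hit counts f(x), f(y), f(x,y)
    and the total number N of indexed pages."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(N) - min(lx, ly))

N = 10**10
d_xz = ngd(10**6, 10**4, 1, N)          # "Rolling Stones", "salmonflies": co-occur on one page
d_xy = ngd(10**6, 10**6, 5 * 10**5, N)  # "Rolling Stones", "Beatles": co-occur often
d_yz = ngd(10**6, 10**4, 5 * 10**3, N)  # "Beatles", "salmonflies"
print(d_xz > d_xy + d_yz)               # True: d(x,z) > d(x,y) + d(y,z)
```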

Surprisingly (?) this paper was cited by:
  • Wu, Lin, and Liu: An exploratory study of navigating Wikipedia semantically: model and application (published in LNCS, Online Communities and Social Computing: 4th International Conference).
  • Worawitphinyo, Gao, and Jabeen: Improving Suffix Tree Clustering with New Ranking and Similarity Measures (published in LNCS, Advanced Data Mining and Applications: 7th International Conference, ADMA 2011, Beijing, China, December 17-19, 2011, Proceedings, Part II).
  • Lee Jun Choi and Nur’Aini Abdul Rashid: Adapting normalized Google similarity in protein sequence comparison (published in Information Technology, 2008, ITSim 2008, International Symposium on).
  • Khushboo Thakkar et al.: Test Model for Text Categorization and Text Summarization (published in International Journal on Computer Science and Engineering (IJCSE)).
  • http://efreedom.com/Question/9-19264/Determining-Similarity-Words
  • Marie Ferbus-Zanda: Kolmogorov Complexity in perspective, Part II: Classification, Information Processing and Duality (published in Synthese).
  • Volker Schatz: The language specialisation of the Google search engine.

Computability menagerie

The Computability Menagerie houses animals of varying degrees of non-computability. The project started during my Marie Curie fellowship in Heidelberg, where I was working on a huge diagram of downward closed classes of Turing degrees, using JFig.

Mushfeq Khan and Joe Miller have written a web app version of the menagerie.

Related research on André Nies’ home page:
Logic Blog

Tamer (computable) creatures are on display at the Complexity Zoo.