Consider independent normal random variables $Y_i$ with means $\mu_i$ and common variance $\sigma^2$, $1\le i\le k$.
The Tukey test often seems overly complicated, but it becomes clearer if we first assume that $\sigma^2$ is known.
Let’s draw $r$ independent samples from each, denoted $Y_{i,j}$, $1\le j\le r$.
Denote the sample averages by $\overline Y_{i+}$.
Let $W_i = \overline Y_{i+}-\mu_i$, so that each $W_i$ is $N(0,\sigma^2/r)$.
Let $Q=R$, the range of the $W_i$, defined by
$$
R = \max_i W_i - \min_i W_i = \max_{i,j} |W_i-W_j|.
$$
Suppose we understand the distribution of $R$ well enough to find a number $Q_\alpha$ such that
$$
\Pr(R\le Q_\alpha) = 1-\alpha.
$$
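If no table of this range distribution is handy, $Q_\alpha$ is easy to approximate by simulation. Here is a minimal Monte Carlo sketch in Python; the particular values of $k$, $r$, $\sigma$, $\alpha$ and the simulation size are illustrative choices of mine, not anything prescribed above.

```python
import numpy as np

# Monte Carlo approximation of Q_alpha, the 1 - alpha quantile of the range
# R = max_i W_i - min_i W_i, where the W_i are i.i.d. N(0, sigma^2 / r).
# The values of k, r, sigma, alpha and n_sim are illustrative only.
rng = np.random.default_rng(0)
k, r, sigma, alpha = 5, 10, 2.0, 0.05
n_sim = 200_000

W = rng.normal(0.0, sigma / np.sqrt(r), size=(n_sim, k))  # each row is (W_1, ..., W_k)
R = W.max(axis=1) - W.min(axis=1)                         # the range of each row
Q_alpha = np.quantile(R, 1 - alpha)
print(f"estimated Q_alpha: {Q_alpha:.3f}")
```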
Define the intervals
$$
I_{i,j} = (\overline Y_{i+}-\overline Y_{j+} - Q_\alpha, \overline Y_{i+}-\overline Y_{j+} + Q_\alpha).
$$
Then, since $R\le Q_\alpha$ happens exactly when $|W_i-W_j|\le Q_\alpha$ for all $i$, $j$, and since $W_i-W_j = (\overline Y_{i+}-\overline Y_{j+})-(\mu_i-\mu_j)$, it is easily seen that
$$
\Pr(\text{for all $i$, $j$, }\mu_i-\mu_j \in I_{i,j}) = 1-\alpha,
$$
and hence for all $i$, $j$,
$$
\Pr(\mu_i-\mu_j\in I_{i,j}) \ge 1-\alpha.
$$
(Larsen and Marx make a mistake here, mixing up the last two equations.)
Thus, the hypothesis that $\mu_i=\mu_j$ can be rejected if we observe $0\not\in I_{i,j}$.
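To make the known-variance procedure concrete, here is a small self-contained Python sketch that forms the intervals $I_{i,j}$ from simulated data and reports the pairs for which $0\not\in I_{i,j}$. The group means, $\sigma$, $k$, $r$ and the numerical value of $Q_\alpha$ are all placeholders of mine (the $Q_\alpha$ value is roughly what the Monte Carlo sketch above gives for these settings).

```python
import numpy as np

# Known-variance comparisons: build the intervals I_{i,j} and report the pairs
# whose interval excludes 0, i.e. the pairs for which mu_i = mu_j is rejected.
# All numbers here (k, r, sigma, the true means, Q_alpha) are illustrative.
rng = np.random.default_rng(1)
k, r, sigma = 5, 10, 2.0
Q_alpha = 2.44                            # placeholder 1 - alpha range quantile for these settings
true_means = [0.0, 0.0, 1.0, 1.0, 3.0]    # arbitrary means used to generate toy data

Y = rng.normal(loc=true_means, scale=sigma, size=(r, k))  # Y[j, i] plays the role of Y_{i,j}
Ybar = Y.mean(axis=0)                                     # the sample averages Ybar_{i+}

for i in range(k):
    for j in range(i + 1, k):
        lo = Ybar[i] - Ybar[j] - Q_alpha
        hi = Ybar[i] - Ybar[j] + Q_alpha
        if not (lo < 0.0 < hi):
            print(f"reject mu_{i+1} = mu_{j+1}: interval ({lo:.2f}, {hi:.2f})")
```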
Now, if the variance is unknown, we instead define $Q=R/S$, where $S^2$ is a suitable estimator of the variance of the $W_i$.
An important point is that to work out the distribution of $Q$ we need the joint distribution of $R$ and $S$, which is easiest if they are independent.
We do have a statistic that’s independent of the $W_i$ and, after rescaling, estimates $\sigma^2$, namely the residual sum of squares (a.k.a. sum of squares for error),
$$
\mathrm{SSE} = \sum_i \sum_j (Y_{i,j}-\overline Y_{i+})^2.
$$
So we take $S^2$ to be a suitable constant times $\mathrm{SSE}$.
Namely, we want $S^2$ to be an unbiased estimator of $\sigma^2/r$, the variance of each $W_i$. And we know that $\mathrm{SSE}/\sigma^2$ is $\chi^2(rk-k)$ distributed, so that $E(\mathrm{SSE}) = (rk-k)\,\sigma^2$.
This leads us to define
$$
S^2 = \mathrm{MSE}/r
$$
where $\mathrm{MSE} = \mathrm{SSE}/(rk-k)$.
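Putting the pieces together: with $S^2 = \mathrm{MSE}/r$, the event $R/S\le Q_\alpha$ yields intervals of half-width $Q_\alpha S$, and the distribution of $R/S$ is the studentized range with $k$ groups and $rk-k$ degrees of freedom. Below is a hedged Python sketch of the resulting test on made-up data; it assumes a SciPy recent enough (1.7+) to provide `scipy.stats.studentized_range`, and the data layout and parameter values are my own illustrative choices.

```python
import numpy as np
from scipy.stats import studentized_range  # needs SciPy >= 1.7

# Unknown-variance case: estimate the variance of W_i by S^2 = MSE / r and
# reject mu_i = mu_j whenever |Ybar_i - Ybar_j| > Q_alpha * S, where Q_alpha is
# the 1 - alpha studentized-range quantile with k groups and rk - k df.
# k, r, alpha and the toy data below are illustrative choices only.
rng = np.random.default_rng(2)
k, r, alpha = 5, 10, 0.05
Y = rng.normal(loc=[0.0, 0.0, 1.0, 1.0, 3.0], scale=2.0, size=(r, k))  # Y[j, i] ~ Y_{i,j}

Ybar = Y.mean(axis=0)                  # the sample averages Ybar_{i+}
SSE = ((Y - Ybar) ** 2).sum()          # residual sum of squares
MSE = SSE / (r * k - k)
S = np.sqrt(MSE / r)                   # estimates the standard deviation of W_i

Q_alpha = studentized_range.ppf(1 - alpha, k, r * k - k)
for i in range(k):
    for j in range(i + 1, k):
        diff = Ybar[i] - Ybar[j]
        if abs(diff) > Q_alpha * S:
            print(f"reject mu_{i+1} = mu_{j+1}: "
                  f"interval ({diff - Q_alpha * S:.2f}, {diff + Q_alpha * S:.2f})")
```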
A point here is that $\mathrm{SSE}/\sigma^2$ is a sum of $k$ independent $\chi^2(r-1)$ random variables (one for each $i$), hence is itself $\chi^2(rk-k)$.
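The $\chi^2(rk-k)$ claim is also easy to check numerically. Here is a tiny Python simulation (parameters again arbitrary) that draws many data sets, computes $\mathrm{SSE}/\sigma^2$ for each, and runs a Kolmogorov-Smirnov test against the $\chi^2(rk-k)$ distribution.

```python
import numpy as np
from scipy import stats

# Simulation check that SSE / sigma^2 has a chi^2(rk - k) distribution.
# k, r, sigma and n_rep are illustrative choices only.
rng = np.random.default_rng(3)
k, r, sigma, n_rep = 5, 10, 2.0, 20_000

Y = rng.normal(0.0, sigma, size=(n_rep, r, k))       # n_rep independent data sets
group_means = Y.mean(axis=1, keepdims=True)          # Ybar_{i+} within each data set
SSE = ((Y - group_means) ** 2).sum(axis=(1, 2))      # one SSE per data set

# A large KS statistic (tiny p-value) would cast doubt on the chi^2(rk - k) claim.
print(stats.kstest(SSE / sigma**2, "chi2", args=(r * k - k,)))
```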