Computing the Integer Square Root

Fred Akalin

1. The algorithm

Today I’m going to talk about a fast algorithm to compute the integer square root of a non-negative integer $n$, $\mathrm{isqrt}(n) = \lfloor \sqrt{n} \rfloor$, or in words, the greatest integer whose square is less than or equal to $n$.[1] Most sources that describe the algorithm take it for granted that it is correct and fast. This is far from obvious! So I will prove both correctness and speed below.

One simple fact is that $\mathrm{isqrt}(n) \le n/2$ for $n \ge 2$, so a straightforward algorithm is just to test every non-negative integer up to $n/2$. This takes $O(n)$ arithmetic operations, which is bad since it’s exponential in the size of the input. That is, letting $\mathrm{Bits}(n)$ be the number of bits required to store $n$ and letting $\lg n$ be the base-$2$ logarithm of $n$, $\mathrm{Bits}(n) = O(\lg n)$, and thus this algorithm takes $O(2^{\mathrm{Bits}(n)})$ arithmetic operations.
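As a baseline, the straightforward algorithm can be sketched as follows. This uses native BigInt rather than the jsbn BigInteger type used later in this article, and it stops as soon as the next square exceeds $n$, so it never tests past $n/2$:

```javascript
// isqrtLinear returns floor(sqrt(n)) by testing each candidate in turn.
// It takes O(n) arithmetic operations, so it is only usable for tiny n.
function isqrtLinear(n) {
  if (n < 0n) {
    throw new Error('negative radicand');
  }
  let x = 0n;
  while ((x + 1n) * (x + 1n) <= n) {
    x += 1n;
  }
  return x;
}
```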

We can do better by doing binary search: start with the range $[0, n/2]$ and adjust it based on comparing the square of an integer in the middle of the range to $n$. This takes $O(\lg n) = O(\mathrm{Bits}(n))$ arithmetic operations.
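A binary-search version might look like this (again a sketch with native BigInt; it uses $n$ itself as the initial upper bound so that $n = 1$ needs no special case):

```javascript
// isqrtBinary returns floor(sqrt(n)) via binary search, using O(lg n)
// arithmetic operations. The loop invariant is lo <= isqrt(n) <= hi.
function isqrtBinary(n) {
  if (n < 0n) {
    throw new Error('negative radicand');
  }
  let lo = 0n;
  let hi = n;
  while (lo < hi) {
    // Round the midpoint up so that the range always shrinks.
    const mid = (lo + hi + 1n) >> 1n;
    if (mid * mid <= n) {
      lo = mid;
    } else {
      hi = mid - 1n;
    }
  }
  return lo;
}
```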

However, the algorithm below is even faster:[2]
  1. If $n = 0$, return $0$.
  2. Otherwise, set $i$ to $0$ and set $x_0$ to $2^{\lceil \mathrm{Bits}(n)/2 \rceil}$.
  3. Repeat:
    1. Set $x_{i+1}$ to $\lfloor (x_i + \lfloor n/x_i \rfloor)/2 \rfloor$.
    2. If $x_{i+1} \ge x_i$, return $x_i$. Otherwise, increment $i$.
Call this algorithm NEWTON-ISQRT, since it’s based on Newton’s method. It’s not obvious, but this algorithm returns $\mathrm{isqrt}(n)$ using only $O(\lg \lg n) = O(\lg(\mathrm{Bits}(n)))$ arithmetic operations, as we will prove below. But first, here’s an implementation of the algorithm in JavaScript:[3]
// isqrt returns the greatest number x such that x^2 <= n. The type of
// n must behave like BigInteger (e.g.,
// https://github.com/akalin/jsbn ), and n must be non-negative.
//
// Example (open up the JS console on this page and type):
//
//   isqrt(new BigInteger("64")).toString()
function isqrt(n) {
  var s = n.signum();
  if (s < 0) {
    throw new Error('negative radicand');
  }
  if (s == 0) {
    return n;
  }

  // x = 2^ceil(Bits(n)/2)
  var x = n.constructor.ONE.shiftLeft(Math.ceil(n.bitLength()/2));
  while (true) {
    // y = floor((x + floor(n/x))/2)
    var y = x.add(n.divide(x)).shiftRight(1);
    if (y.compareTo(x) >= 0) {
      return x;
    }
    x = y;
  }
}

2. Correctness

The core of the algorithm is the iteration rule: $$x_{i+1} = \left\lfloor \frac{x_i + \lfloor n/x_i \rfloor}{2} \right\rfloor$$ where the floor functions are there only because we’re using integer division. Define an integer-valued function $f(x)$ for the right side. Using basic properties of the floor function, you can show that you can remove the inner floor: $$f(x) = \left\lfloor \frac{1}{2}\left(x + n/x\right) \right\rfloor$$ which makes it a bit easier to analyze. Also, the properties of $f(x)$ are closely related to those of its equivalent real-valued function: $$g(x) = \frac{1}{2}\left(x + n/x\right)\text{.}$$

For starters, again using basic properties of the floor function, you can show that $f(x) \le g(x)$, and that for any integer $m$, $m \le f(x)$ if and only if $m \le g(x)$.

Finally, let’s give a name to our desired output: let $s = \mathrm{isqrt}(n) = \lfloor \sqrt{n} \rfloor$.[4]

Intuitively, $f(x)$ and $g(x)$ “average out” however far away their input $x$ is from $\sqrt{n}$. Conveniently, this “average” is never an underestimate:
(Lemma 1.) For $x \gt 0$, $f(x) \ge s$.

Proof. By the basic properties of $f(x)$ and $g(x)$ above, it suffices to show that $g(x) \ge s$. $g'(x) = (1 - n/x^2)/2$ and $g''(x) = n/x^3$. Therefore, $g(x)$ is concave-up for $x \gt 0$; in particular, its single positive extremum at $x = \sqrt{n}$ is a minimum. But $g(\sqrt{n}) = \sqrt{n} \ge s$. ∎

(You can also prove Lemma 1 without calculus: show that $g(x) \ge s$ if and only if $x^2 - 2sx + n \ge 0$, which is true when $s^2 \le n$, which is true by definition.)
Furthermore, our initial estimate is always an overestimate:
(Lemma 2.) $x_0 \gt s$.

Proof. $\mathrm{Bits}(n) = \lfloor \lg n \rfloor + 1 \gt \lg n$. Therefore, $$\begin{aligned} x_0 &= 2^{\lceil \mathrm{Bits}(n)/2 \rceil} \\ &\ge 2^{\mathrm{Bits}(n)/2} \\ &\gt 2^{\lg n/2} \\ &= \sqrt{n} \\ &\ge s\text{.} \; \blacksquare \end{aligned}$$

(Note that any number greater than $s$, say $n$ or $\lceil n/2 \rceil$, can be chosen for our initial guess without affecting correctness. However, the expression above is necessary to guarantee performance. Another possibility is $2^{\lceil \lceil \lg n \rceil / 2 \rceil}$, which has the advantage that if $n$ is an even power of $2$, then $x_0$ is immediately set to $\sqrt{n}$. However, this is usually not worth the cost of checking whether $n$ is a power of $2$, as is required to compute $\lceil \lg n \rceil$.)

An easy consequence of Lemmas 1 and 2 is that the invariant $x_i \ge s$ holds. That lets us prove partial correctness of NEWTON-ISQRT:
(Theorem 1.) If NEWTON-ISQRT terminates, it returns the value $s$.

Proof. Assume it terminates. If it terminates in step $1$, then we are done. Otherwise, it can only terminate in step $3.2$, where it returns $x_i$ such that $x_{i+1} = f(x_i) \ge x_i$. This implies that $g(x_i) = (x_i + n/x_i)/2 \ge x_i$. Rearranging yields $n \ge x_i^2$, and combining with our invariant we get $\sqrt{n} \ge x_i \ge s$. But $s + 1 \gt \sqrt{n}$, so that forces $x_i$ to be $s$, and thus NEWTON-ISQRT returns $s$ if it terminates. ∎

For total correctness, we also need to show that NEWTON-ISQRT terminates. But this is easy:
(Theorem 2.) NEWTON-ISQRT terminates.

Proof. Assume it doesn’t terminate. Then we have a strictly decreasing infinite sequence of integers $\{x_0, x_1, \dotsc\}$. But this sequence is bounded below by $s$, so it cannot decrease indefinitely. This is a contradiction, so NEWTON-ISQRT must terminate. ∎

We are done proving correctness, but you might wonder whether the check $x_{i+1} \ge x_i$ in step $3.2$ is necessary. That is, can it be weakened to the check $x_{i+1} = x_i$? The answer is “no”; to see that, let $k = n - s^2$. Since $n \lt (s+1)^2$, $k \lt 2s + 1$. On the other hand, consider the inequality $f(x_i) \gt x_i$. Since that would cause the algorithm to terminate and return $x_i$, that implies that $x_i = s$. Therefore, that inequality is equivalent to $f(s) \gt s$, which is equivalent to $f(s) \ge s + 1$, which is equivalent to $g(s) = (s + n/s)/2 \ge s + 1$. Rearranging yields $n \ge s^2 + 2s$. Substituting in $n = s^2 + k$, we get $s^2 + k \ge s^2 + 2s$, which is equivalent to $k \ge 2s$. But since $k \lt 2s + 1$, that forces $k$ to equal $2s$. That is the maximum value $k$ can be, so therefore $n$ must be one less than a perfect square. Indeed, for such numbers, weakening the check would cause the algorithm to oscillate between $s$ and $s + 1$. For example, $n = 99$ would yield the sequence $\{16, 11, 10, 9, 10, 9, \dotsc\}$.
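You can watch this oscillation happen by running the iteration rule with no termination check at all for a fixed number of steps. This is a sketch using native BigInt, and `iterate` is a hypothetical helper, not part of the article’s implementation:

```javascript
// iterate returns the first `count` iterates x_0, x_1, ... of the rule
// x_{i+1} = floor((x_i + floor(n/x_i))/2), starting from
// x_0 = 2^ceil(Bits(n)/2), with no termination check.
function iterate(n, count) {
  // n.toString(2).length is Bits(n) for positive n.
  let x = 1n << BigInt(Math.ceil(n.toString(2).length / 2));
  const xs = [x];
  while (xs.length < count) {
    x = (x + n / x) >> 1n; // BigInt division and shift both floor here.
    xs.push(x);
  }
  return xs;
}
```

For $n = 99$ (one less than the perfect square $100$), `iterate(99n, 6)` yields `[16n, 11n, 10n, 9n, 10n, 9n]`: the iterates bounce between $9$ and $10$ forever, so the weakened check $x_{i+1} = x_i$ would never fire.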

3. Run-time

We will show that NEWTON-ISQRT takes $O(\lg \lg n)$ arithmetic operations. Since each loop iteration does only a fixed number of arithmetic operations (with the division of $n$ by $x$ being the most expensive), it suffices to show that our algorithm performs $O(\lg \lg n)$ loop iterations.

It is well known that Newton’s method converges quadratically when started sufficiently close to a simple root. We can’t use this result directly, since it’s not clear that the convergence properties of Newton’s method are preserved when using integer operations, but we can do something similar.

Define $\mathrm{Err}(x) = x^2/n - 1$ and let $\epsilon_i = \mathrm{Err}(x_i)$. Intuitively, $\mathrm{Err}(x)$ is a conveniently-scaled measure of the error of $x$: it is less than $1$ for most of the values we care about, and it is bounded below for integers greater than our target $s$. Also, we will show that the $\epsilon_i$ shrink quadratically. These facts will then let us show our bound for the iteration count.

First, let’s prove our lower bound for $\epsilon_i$:
(Lemma 3.) $x_i \ge s + 1$ if and only if $\epsilon_i \ge 1/n$.

Proof. $n \lt (s+1)^2$, so $n + 1 \le (s+1)^2$, and therefore $(s+1)^2/n - 1 \ge 1/n$. But the expression on the left side is just $\mathrm{Err}(s+1)$. $x_i \ge s + 1$ if and only if $\epsilon_i \ge \mathrm{Err}(s+1)$, so the result immediately follows. ∎

Then we can use that to show that the $\epsilon_i$ shrink quadratically:
(Lemma 4.) If $x_i \ge s + 1$, then $\epsilon_{i+1} \lt (\epsilon_i/2)^2$.

Proof. $\epsilon_{i+1}$ is just $\mathrm{Err}(f(x_i)) \le \mathrm{Err}(g(x_i))$, so it suffices to show that $\mathrm{Err}(g(x_i)) \lt (\epsilon_i/2)^2$. Inverting $\mathrm{Err}(x)$, we get $x_i = \sqrt{(\epsilon_i + 1) \cdot n}$. Expressing $g(x_i)$ in terms of $\epsilon_i$, we get $$g(x_i) = \frac{\sqrt{n}}{2}\left(\frac{\epsilon_i + 2}{\sqrt{\epsilon_i + 1}}\right)$$ and $$\mathrm{Err}(g(x_i)) = \frac{(\epsilon_i/2)^2}{\epsilon_i + 1}\text{.}$$ Therefore, it suffices to show that the denominator is greater than $1$. But $x_i \ge s + 1$ implies $\epsilon_i \gt 0$ by Lemma 3, so that follows immediately and the result is proved. ∎
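Lemma 4 can also be spot-checked numerically. Multiplying both sides of $\epsilon_{i+1} \lt (\epsilon_i/2)^2$ by $4n^2$ turns it into $4n(x_{i+1}^2 - n) \lt (x_i^2 - n)^2$, which can be verified exactly with integer arithmetic. Here is a sketch using native BigInt (`checkQuadraticShrink` is a hypothetical helper, not part of the article’s implementation):

```javascript
// checkQuadraticShrink runs the NEWTON-ISQRT iteration on n and verifies
// 4n(x_{i+1}^2 - n) < (x_i^2 - n)^2 (i.e., eps_{i+1} < (eps_i/2)^2) at
// every non-terminating step, where Lemma 4's hypothesis x_i >= s + 1 holds.
function checkQuadraticShrink(n) {
  let x = 1n << BigInt(Math.ceil(n.toString(2).length / 2)); // x_0
  while (true) {
    const y = (x + n / x) >> 1n; // x_{i+1}
    if (y >= x) {
      return true; // the loop would terminate here; all checks passed
    }
    // y < x together with Lemma 1 (y >= s) forces x >= s + 1.
    if (4n * n * (y * y - n) >= (x * x - n) * (x * x - n)) {
      return false;
    }
    x = y;
  }
}
```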

Then let’s bound our initial values:
(Lemma 5.) $x_0 \le 2s$, $\epsilon_0 \le 3$, and $\epsilon_1 \le 1$.

Proof. Let’s start with $x_0$: $$\begin{aligned} x_0 &= 2^{\lceil \mathrm{Bits}(n)/2 \rceil} \\ &= 2^{\lfloor (\lfloor \lg n \rfloor + 1 + 1)/2 \rfloor} \\ &= 2^{\lfloor \lg n/2 \rfloor + 1} \\ &= 2 \cdot 2^{\lfloor \lg n/2 \rfloor}\text{.} \end{aligned}$$ Then $x_0/2 = 2^{\lfloor \lg n/2 \rfloor} \le 2^{\lg n/2} = \sqrt{n}$. Since $x_0/2$ is an integer, $x_0/2 \le \sqrt{n}$ if and only if $x_0/2 \le \lfloor \sqrt{n} \rfloor = s$. Therefore, $x_0 \le 2s$.

As for $\epsilon_0$: $$\begin{aligned} \epsilon_0 &= \mathrm{Err}(x_0) \\ &\le \mathrm{Err}(2s) \\ &= (2s)^2/n - 1 \\ &= 4s^2/n - 1\text{.} \end{aligned}$$ Since $s^2 \le n$, $4s^2/n \le 4$, and thus $\epsilon_0 \le 3$.

Finally, $\epsilon_1$ is just $\mathrm{Err}(f(x_0))$. Using calculations from Lemma 4, $$\begin{aligned} \epsilon_1 &\le \mathrm{Err}(g(x_0)) \\ &= (\epsilon_0/2)^2/(\epsilon_0 + 1) \\ &\le (3/2)^2/(3 + 1) \\ &= 9/16\text{.} \end{aligned}$$ Therefore, $\epsilon_1 \le 1$. ∎

Finally, we can show our main result:
(Theorem 3.) NEWTON-ISQRT performs $O(\lg \lg n)$ loop iterations.

Proof. Let $k$ be the number of loop iterations performed when running the algorithm on $n$ (i.e., $x_k \ge x_{k-1}$), and assume $k \ge 4$. Then $x_i \ge s + 1$ for $i \lt k - 1$. Since $\epsilon_1 \le 1$ by Lemma 5, $\epsilon_2 \le 1/2$ and $\epsilon_i \le (\epsilon_2)^{2^{i-2}}$ for $2 \le i \lt k - 1$ by Lemma 4, so $\epsilon_{k-2} \le 2^{-2^{k-4}}$. But $1/n \le \epsilon_{k-2}$ by Lemma 3, so $1/n \le 2^{-2^{k-4}}$. Taking logarithms twice to bring down the $k$ yields $k - 4 \le \lg \lg n$. Then $k \le \lg \lg n + 4$, and thus $k = O(\lg \lg n)$. ∎

Note that, in general, an arithmetic operation is not constant-time, and in fact has run-time $\Omega(\lg n)$. Since the most expensive arithmetic operation we do is division, we can say that NEWTON-ISQRT has run-time that is both $\Omega(\lg n)$ and $O(D(n) \cdot \lg \lg n)$, where $D(n)$ is the run-time of dividing $n$ by some number $\le n$.[5]

4. The Initial Guess

It’s also useful to show that if the initial guess $x_0$ is bad, then the run-time degrades to $\Theta(\lg n)$. To see this, consider a variant of NEWTON-ISQRT that takes a function INITIAL-GUESS, which is called with $n$ to produce $x_0$ in step 2. We can then treat $\epsilon_0$ as a function of $n$ and analyze how many iterations it takes for $\epsilon_i$ to drop below $1$. NEWTON-ISQRT itself uses an initial guess such that $\epsilon_0(n) = \Theta(1)$, so this analysis reduces to Theorem 3 in that case. However, if $x_0$ is chosen to be $\Theta(n)$, e.g. the initial guess is just $n$ or $n/k$ for some constant $k$, then $\epsilon_0(n)$ will also be $\Theta(n)$; while $x_i \gg \sqrt{n}$, each iteration roughly halves $x_i$, so it takes $\Theta(\lg n)$ iterations for $\epsilon_i$ to drop below $1$, and the run-time degrades to $\Theta(\lg n)$. So having a good initial guess is important for the performance of NEWTON-ISQRT!
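To see the degradation concretely, here is a sketch that counts loop iterations as a function of the initial guess. It uses native BigInt, and `countIterations` is a hypothetical helper, not part of the article’s implementation:

```javascript
// countIterations runs the NEWTON-ISQRT loop on n starting from the given
// initial guess x0 (which must be greater than isqrt(n)) and returns the
// number of loop iterations until termination.
function countIterations(n, x0) {
  let x = x0;
  let count = 0;
  while (true) {
    const y = (x + n / x) >> 1n;
    count++;
    if (y >= x) {
      return count;
    }
    x = y;
  }
}

const n = 10n ** 40n;
// The article's initial guess: x_0 = 2^ceil(Bits(n)/2), so eps_0 = Theta(1).
const good = countIterations(n, 1n << BigInt(Math.ceil(n.toString(2).length / 2)));
// A bad initial guess: x_0 = n, so eps_0 = Theta(n).
const bad = countIterations(n, n);
```

For $n = 10^{40}$, the good guess terminates within $\lg \lg n + 4 \approx 11$ iterations per Theorem 3, while starting from $x_0 = n$ takes more than sixty, roughly one halving per excess bit of the guess.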



Footnotes

[1] Aside from the Wikipedia article, the algorithm is described as Algorithm 9.2.11 in Prime Numbers: A Computational Perspective.

[2] Note that only integer operations are used, which makes this algorithm suitable for arbitrary-precision integers.

[3] Go and JS implementations are available on my GitHub.

[4] Here, and in most of the article, we’ll implicitly assume that $n \gt 0$.

[5] $D(n)$ is $\Theta(\lg^2 n)$ using long division, but fancier division algorithms have better run-times.