More On Squaring and Multiplying Large Integers
Dan Zuras
Hewlett-Packard Co.
ABSTRACT
Methods of squaring and multiplying large integers are discussed. The
obvious O(n2) method turns out to be
best for small numbers. Existing O(nlog(3) /
log(2)) = O(n1.585) methods become better as
the numbers get bigger. New methods that are
O(nlog(5) / log(3)) = O(n1.465), O(nlog(7) /
log(4)) = O(n1.404), and
O(nlog(9) / log(5)) = O(n1.365) are
presented. In actual experiments, all of these methods turn out to be
faster than FFT multiplies for numbers that can be quite large
(> 37,000,000 bits). Squaring seems to be fundamentally faster than
multiplying but it is shown that Tmultiply
< 2Tsquare + O(n).
Introduction
This is an expansion of the paper On Squaring
and Multiplying Large Integers presented at the IEEE Symposium on
Computer Arithmetic (ARITH-11) in July 1993. This paper presents more
of the ideas behind these methods; some new results; and some
speculations for areas of future work.
Much of the work presented here is based on methods that appear in
Don Knuth's classic work The Art of Computer
Programming as well as the work of
A. L. Toom.
In actual experiments, some of the simpler methods of squaring and
multiplying are shown to be best in a surprisingly large number of
cases. Some new methods are also discussed which are useful out to
quite large numbers. In light of these results it seems that methods
such as the Schönhage and Strassen FFT
multiply, while of theoretical interest, may never be best for any
reasonably sized numbers.
The Problem
The problem to be discussed here is how to find the best (fastest)
ways to square and multiply large numbers in software. The methods
presented here were implemented in C and assembly language on an
HP-9000/720 but the general observations and conclusions should be true
in wider areas of application.
The approach used was to write a collection of routines for squaring
and multiplying w-word numbers producing 2w-word results for various specific values of w. These were general purpose routines in the sense
that w was a parameter.
In the course of writing these routines, it became clear that some
common operations on large integers stored as arrays of unsigned 32-bit
words would be needed. These operations were written in assembly
language and performed the functions detailed in Table
1.
Table 1: These arrays were oriented "big-endian".
That is, the most significant word of a[0,w-1] was
stored in a[0] and the least significant word was
stored in a[w-1]. While the author preferred
"little-endian", both C and the HP-9000 PA-RISC architecture
disagreed. They won, in the end. I will not,
however, inflict the big-endian notation on the reader. In this
paper, I will use the somewhat more conventional notation of
mathematics. That is, little-endian. So there.
Given operands a[0,w-1], b[0,w-1], producing result r[0,w-1]
|
vzconst(r,w,c) |
r[0,w-1] = ccc...c (w c's) |
vzcopy(r,w,a) |
r[0,w-1] = a[0,w-1] |
vzneg(r,w,a) |
r[0,w-1] = -a[0,w-1] |
vzshlM(r,w,a,c) |
r[0,w-1] = a[0,w-1]|c << M for M in [1,2,4,8,16,31]
|
vzshr(r,w,a,M) |
r[0,w-1] = a[0,w-1] >> M for M in [0:31] |
vzadd(r,w,a,b) |
r[0,w-1] = a[0,w-1] + b[0,w-1] |
vzsub(r,w,a,b) |
r[0,w-1] = a[0,w-1] - b[0,w-1] |
Cout = vzaddwco(r,w,a,b) |
r[0,w-1] + Cout = a[0,w-1] + b[0,w-1] |
vzshMadd(r,w,a,b,M) |
r[0,w-1] = a[0,w-1] << M + b[0,w-1] |
vzshMsub(r,w,a,b,M) |
r[0,w-1] = a[0,w-1] << M - b[0,w-1] |
t = vzge(b,w,a) |
t = (b[0,w-1] >= a[0,w-1]) |
A collection of routines for squaring and multiplying "small"
integers were also written in assembly. These provided a basis upon
which the larger routines could be built.
All other routines were written in C and compiler optimized.
Running time was measured for w in the
range 1 to 1,175,553 words (37,617,696 bits) by counting instruction
cycles on a 50MHz HP-9000/720.
Finding the Best
We will define TSi(w) as the
time required for method i to square a number
of length w and
TMi(w) as the time required for method i to multiply a number of length
w. (We will use the expression Ti(w)
when discussing either method or method i
in general.)
Method i is considered best for
some length w if
Ti(w) <= Tj(w')
for all j and for all w'
>= w.
That is, a method is best for a given length if there is no faster
way of squaring or multiplying numbers at least as large as this
one.
Method 1: The w2
Basis
To start the ball rolling, basis routines for operating on small
integers were written using the obvious method.
Roughly speaking, if the algorithm for a multiply is
FOR i = w-1 to
0 :
FOR j = w-1 to
0 :
(r[i+j,i+j+1] and
CarryOut) += a[i] × b[i],
then squaring can be made strictly faster than multiply by
accumulating the upper half off-diagonal elements separately and
combining them with the diagonals according to the formula
Result = 2 ×
Offdiagonals + Diagonals.
Obviously, the running time,
T1(w), increases quadratically.
T1(w), in cycles, is well approximated on the
HP-9000/720 by the least squares formulas
TS1(w) = 4.53w2 + 10w + 30.71
and
TM1(w) =
9.82w2 - 9.06w + 90.26.
These routines turned out to be best for all w
up to 33 for square (23 for multiply).
The 2-Way Method
It seems that the first discussion of a method of multiplication
(squaring, actually) that was better than w2
in the width of the operand was a short paper written by Karatsuba and Ofman that appeared in the Soviet
journal Doklady in 1963. The method involves splitting the operand into
2 parts: the high order bits and the low order bits, based on the
polynomial
(A1x + A0)2 =
A12(x2 - x) + (A1 +
A0)2x + A02(1 - x)
where x = 2n.
Knuth presented a minor improvement on this based on the slightly
different polynomial
(A1x + A0)2 =
A12(x2 + x) - (A1 -
A0)2x + A02(x +
1).
Both of these may be regarded as special cases of solving for the
Ci in the polynomial
(A1x + A0)2 =
C2x2 + C1x + C0.
Karatsuba and Ofman solve this equation by letting
x take on the values {Infinity, 1, 0}
where x = Infinity corresponds to
|
|
(A1x +
A0)2/x2 =
|
|
(C2x2 + C1x +
C0)/x2
|
in this context.
Rather than express this system in the usual polynomial form, I will
express it as a linear transformation of the operand in the following
way
(Please forgive the fairly primitive look of these matrix
expressions. It is the best I have been able to do under the
limitations of html.)
After squaring each element of the vector, the solution to the
resulting system
may be expressed as
This method has a small problem in that the term
A1 + A0 requires one more bit than the
other two. Karatsuba and Ofman address this problem by showing how you
can square an n+1-bit number using an n-bit algorithm together with a couple of shifts and
adds.
Knuth solves the equation by letting x take
on the values {Infinity, -1, 0}. A similar
transformation
results, after squaring, in a similar system
which has the solution
Knuth presented this method as Karatsuba and Ofman's but Knuth's
variation leads to a real improvement. By rearranging the algebra a
bit, he was able to replace the A1 +
A0 term with A1 -
A0. Since the only use of the term is to square it,
its sign is unimportant. Therefore, one can always subtract the lesser
from the greater and save that extra bit.
The test and branch involved results in much less overhead than
Karatsuba and Ofman's method.
This method can be turned into a multiply method by replacing the
Si with the
Pi in the formula
|
|
|
= |
|
= |
|
|
|
|
|
|
|
A1B1
|
|
(A1 - A0)(B1 -
B0)
|
|
A0B0
|
|
|
|
|
|
|
and doing the implied extra work.
These methods are diagrammed in Figure 1.
Figure 1: The 2-way Method, O(w1.585)
| |
Square |
|
Multiply |
| Initial Transform: |
V1 = |A1 - A0|
|
|
V1 = |A1 - A0|
W1 = |B1 - B0|
|
| Squaring/Multiplication: |
C2 = A12
S1 = V12
C0 = A02
|
|
C2 = A1B1
P1 = V1W1
C0 = A0B0
|
| Final Transform: |
t1 = C2 + C0
C1 = t1 - S1
|
|
t1 = C2 + C0
C1 = t1 +/- P1
|
|
Assuming that the time for an add (or subtract) of length w is O(w) , the time to
square a number (or multiply two numbers) of length 2w
would be
T2(2w) = 3Tbest(w) + O(w).
The overhead involved, one add before the squaring step (two adds
before the multiplying step) and two after, is strictly larger than the
overhead for a w2 method. Thus,
although a w2 squaring (or multiply)
of length 2w would take
T1(2w) = 4Tbest(w) + O(w),
there is a definite crossover point as w
increases where T1(w) is roughly the
same as T2(w). This point occurs
where the time involved in the extra overhead of Knuth's method (which I
will call the 2-way method from this point on)
is roughly equivalent to the time for the 4th multiply in the
w2 method.
Below this point the w2 method
wins because of its simplicity and above this point the
2-way method wins because of its better asymptotic behavior.
Figure 2 illustrates this by plotting the
ratio T2(w) / T1(w) for
some 2-way routines. (T1(w) is actually the least squares
approximation.)
Ratio of T2(w), the running
time of Knuth's 2-way multiply to T1(w), the running time of the
w2 basis.
|
In the limit, the running time of the 2-way
method triples each time the size of the problem doubles. Therefore,
its time increases as wlog(3) / log(2) =
w1.585.
The running time of the 2-way routines (in
cycles) is approximated by the least squares formulas
TS2(w) = 26.64wlog(3) / log(2)
- 54.63w + 227 and
TM2(w) =
46.2wlog(3) / log(2) - 84.11w + 979.
Building on a basis set of w2
routines up to 33 words for square (23 words for multiply), the 2-way method takes over at 34 words (24 words) and
generally dominates in these experiments up to 768 words (384
words).
Mark Shand has pointed out that this
crossover point is a strong function of the ratio of the multiply time
to add time on a given machine. The HP-9000/720 performs an unsigned
32 × 32 --> 64 multiply in 2 cycles (5 to
7 cycles if you include load/store overhead). On a machine with a
relatively slower multiply the crossover point can be much lower.
The General Idea
We now have enough background to show the general idea behind these
squaring (and multiplying) methods.
In each squaring case, the operand is split into k
parts Aj. These parts are
transformed via a Qij (derived from
the original polynomial) into m parts Vi = QijAj. Then,
each of the Vi is squared with the
best method available, producing Si =
Vi2.
In each multiply case, the operands are split into
k parts Aj and Bj. These parts are transformed via Qij into m parts
Vi = QijAj and
Wi = QijBj.
Then, each of the Vi is multiplied
by Wi with the best method
available, producing Pi =
ViWi.
Now, the solution to the system
MijCj = Si (
MijCj = Pi) (derived from the
square of the original polynomial) can be expressed as
Ci = Dik-1RkjSj
( Ci =
Dik-1RkjPj) with all
the coefficients in the integers. (Up to now,
Dik-1, which expresses the integer divides,
has not been needed as it was the identity matrix.) Finally, the Ci are shifted and added up to produce the
result.
The 3-Way Method
Toom showed that circuits could be
constructed for squaring integers where the size of the circuit was
bounded by O(ncSqrt(log(n))) and the
delay was bounded by
O(cSqrt(log(n))).
Cook showed that an algorithm for squaring
integers could be realized which had a running time of
O(n25Sqrt(log(n))).
This algorithm, which is actually a collection of algorithms, has
come to be known as the Toom-Cook method.
The first method in this collection is equivalent to Karatsuba's
method.
The second method divides a number into three parts and involves
solving for the Ci in the
polynomial
(A2x2 + A1x +
A0)2 = C4x4 +
C3x3 + C2x2 +
C1x + C0.
Cook solves this equation by letting x take
on the values {4, 3, 2, 1, 0} in the
polynomial, leading to the transformation
|
|
= |
|
|
|
|
|
|
| 16 |
4 |
1 |
| 9 |
3 |
1 |
| 4 |
2 |
1 |
| 1 |
1 |
1 |
| 0 |
0 |
1 |
|
|
|
|
|
|
|
which, after squaring, yields the system
|
|
|
|
|
|
| 256 |
64 |
16 |
4 |
1 |
| 81 |
27 |
9 |
3 |
1 |
| 16 |
8 |
4 |
2 |
1 |
| 1 |
1 |
1 |
1 |
1 |
| 0 |
0 |
0 |
0 |
1 |
|
|
|
|
|
|
|
= |
|
= |
|
|
|
|
|
|
|
(16A2 + 4A1 +
A0)2
|
|
(9A2 + 3A1 +
A0)2
|
|
(4A2 + 2A1 +
A0)2
|
|
(A2 + A1 +
A0)2
|
|
A02
|
|
|
|
|
|
|
the solution of which I will express as
|
|
= |
|
|
|
|
|
-1 |
|
| 24 |
0 |
0 |
0 |
0 |
| 0 |
12 |
0 |
0 |
0 |
| 0 |
0 |
24 |
0 |
0 |
| 0 |
0 |
0 |
12 |
0 |
| 0 |
0 |
0 |
0 |
1 |
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
-4 |
6 |
-4 |
1 |
| -3 |
14 |
-24 |
18 |
-5 |
| 11 |
-56 |
114 |
-104 |
35 |
| -3 |
16 |
-36 |
48 |
-25 |
| 0 |
0 |
0 |
0 |
1 |
|
|
|
|
|
|
|
The diagonal terms in the first matrix are the denominators in
divides of intermediate results. In practice, there turn out to be four
divides by 3 since the rest amounts to shifts of 2 or 3 bits.
Knuth solves this with x taken from {2, 1, 0, -1, -2} leading to the transformation
|
|
= |
|
|
|
|
|
|
| 4 |
2 |
1 |
| 1 |
1 |
1 |
| 0 |
0 |
1 |
| 1 |
-1 |
1 |
| 4 |
-2 |
1 |
|
|
|
|
|
|
|
and the system
|
|
|
|
|
|
| 16 |
8 |
4 |
2 |
1 |
| 1 |
1 |
1 |
1 |
1 |
| 0 |
0 |
0 |
0 |
1 |
| 1 |
-1 |
1 |
-1 |
1 |
| 16 |
-8 |
4 |
-2 |
1 |
|
|
|
|
|
|
|
= |
|
= |
|
|
|
|
|
|
|
(4A2 + 2A1 +
A0)2
|
|
(A2 + A1 +
A0)2
|
|
A02
|
|
(A2 - A1 +
A0)2
|
|
(4A2 - 2A1 +
A0)2
|
|
|
|
|
|
|
which has the somewhat more symmetrical solution
|
|
= |
|
|
|
|
|
-1 |
|
| 24 |
0 |
0 |
0 |
0 |
| 0 |
12 |
0 |
0 |
0 |
| 0 |
0 |
24 |
0 |
0 |
| 0 |
0 |
0 |
12 |
0 |
| 0 |
0 |
0 |
0 |
1 |
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
-4 |
6 |
-4 |
1 |
| 1 |
-2 |
0 |
2 |
-1 |
| -1 |
16 |
-30 |
16 |
-1 |
| -1 |
8 |
0 |
-8 |
1 |
| 0 |
0 |
1 |
0 |
0 |
|
|
|
|
|
|
|
Although both of these solutions look similar, Knuth's is somewhat
better than the previous one. The sign symmetry simplifies the system
and greatly reduces the overhead (in the form of add, subtract, shift,
and the like).
If, however, one uses reciprocal symmetry as well as sign symmetry
in the choices of x, as in the set {Infinity, 2, 1, 1/2, 0}, then the greater symmetry in
the system results in even greater simplification and greater reduction
in overhead. (In general, x = p/q corresponds
to q2n(An(p/q)n + ...
+ A0)2 =
q2n(C2n(p/q)2n + ... +
C0).)
This system
|
|
= |
|
|
|
|
|
|
| 1 |
0 |
0 |
| 4 |
2 |
1 |
| 1 |
1 |
1 |
| 1 |
2 |
4 |
| 0 |
0 |
1 |
|
|
|
|
|
|
|
and
|
|
|
|
|
|
| 1 |
0 |
0 |
0 |
0 |
| 16 |
8 |
4 |
2 |
1 |
| 1 |
1 |
1 |
1 |
1 |
| 1 |
2 |
4 |
8 |
16 |
| 0 |
0 |
0 |
0 |
1 |
|
|
|
|
|
|
|
= |
|
= |
|
|
|
|
|
|
|
A22
|
|
(4A2 + 2A1 +
A0)2
|
|
(A2 + A1 +
A0)2
|
|
(A2 + 2A1 +
4A0)2
|
|
A02
|
|
|
|
|
|
|
has the solution
|
|
= |
|
|
|
|
|
-1 |
|
| 1 |
0 |
0 |
0 |
0 |
| 0 |
6 |
0 |
0 |
0 |
| 0 |
0 |
2 |
0 |
0 |
| 0 |
0 |
0 |
6 |
0 |
| 0 |
0 |
0 |
0 |
1 |
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
0 |
0 |
0 |
0 |
| -21 |
2 |
-12 |
1 |
-6 |
| 7 |
-1 |
10 |
-1 |
7 |
| -6 |
1 |
-12 |
2 |
-21 |
| 0 |
0 |
0 |
0 |
1 |
|
|
|
|
|
|
|
(Oddly enough, it appears that reciprocal symmetry is more important
than sign symmetry in the sense that, if one must break one or the
other, the system seems simpler if sign symmetry is broken rather than
reciprocal symmetry.)
This solution has only 2 divides by 3 and 3 shifts as well as much
less overhead in the computation of the intermediate results.
The computation of S2 requires 2
extra bits and S1 and S3 each require 3.
The squaring method implied by this solution is shown in Figure 3.
Figure 3: The 3-way Method, O(w1.465)
| | | | |
| | | |
| |
| Initial Transform: |
|
|
t02 = 2A2 + A1
V3 = 2t02 + A0
|
|
t01 = A2 + A0
V2 = t01 + A1
|
|
t00 = A1 + 2A0
V1 = A2 + 2t00
|
|
|
| Squaring Stage: |
C4 = A22
|
|
S3 = V32
|
|
S2 = V22
|
|
S1 = V12
|
|
C0 = A02
|
| Final Transform: |
t34 = 4C4 - C0
|
|
t33 = 4S2 - S3
t42 = t33 + t34
t52 = 2t34 + t42
-6C3 = 2t52 + t50
|
|
t32 = C4 + C0
t41 = S2 - t32
t60 = t52 + t50
2C2 = 2t41 + t60
|
|
t31 = 4S2 - S1
t40 = t31 + t30
t50 = 2t30 + t40
-6C1 = 2t50 + t52
|
|
t30 = 4C0 - C4
|
|
Notice that it comes out looking like a complicated version of the
2-way method in that there is an initial
transformation stage, a squaring stage, and a final transformation
stage.
All of the operations shown can be made
O(w). Shifts can be done separately or as part of the combined
adds and subtracts.
The time to square (multiply) a number of length 3w
becomes
T3(3w) = 5Tbest(w) + O(w).
This suggests that the asymptotic running time of the 3-way method increases as
wlog(5) / log(3) = w1.465.
The improvement is not nearly so dramatic as the improvement in the
2-way method over the
w2 method. The advantages of more complex routines
will be even less dramatic for much more work.
The running time of the 3-way method is
approximated by the formulas
TS3(w) = 64.59wlog(5) / log(3)
- 243.51w + 33913 and
TM3(w) = 90.15wlog(5) / log(3) + 75.91w -
44016.
As before, to turn this method into a multiply, one duplicates those
operations before the squaring stage and multiplies the corresponding
terms instead of squaring them.
Running Time of Toom-Cook Style Methods
The running time of a Toom-Cook, Knuth, and
k-way methods will all have three components:
m = 2k - 1 squarings of elements of size
w/k; some O(w) overhead; and some fixed
overhead.
Assuming that one starts with a basis routine of length w0 that takes
t0 time, we have
Tk(w0) = t0
Tk(kr+1w0) =
mTk(krw0) +
c1krw0 + c0
Tk(kw0) = mt0 +
c1w0 + c0
Tk(k2w0) =
m2t0 + c1w0(m + k) +
c0(m + 1)
Tk(k3w0)
= m3t0 +
c1w0(m2 + mk + k2) +
c0(m2 + m + 1)
...
Tk(krw0) =
mrt0 +
c1w0(mr - kr) / (m - k) +
c0(mr - 1) / (m - 1).
Rearranging terms in this last equation gives
Tk(krw0) = (t0 +
c1w0 / (m - k) +
c0 / (m - 1))mr -
c1w0kr / (m - k) -
c0 / (m - 1)
or, in terms of w = krw0
becomes
Tk(w) = (t0 +
c1w0 / (m - k) + c0 / (m -
1))(w/w0)log(m) / log(k) - c1w / (m -
k) - c0 / (m - 1).
This formula has the expected form: a dominant
(w/w0)log(m) / log(k) term, an O(w) term, and a constant term, both of which may be
neglected for large w.
But, and this is important, the dominant term contains terms
proportional to the overhead. Thus, while the overhead terms may
be neglected for large w, the overhead
itself may not as it directly contributes to the
magnitude of the dominant term.
The 4-Way Method and
Beyond
Extension to a k-way method involves solving
for the Ci in the polynomial
(Akxk + ... +
A0)2 = C2kx2k + ... +
C0.
Cook would solve this polynomial by letting x
take on the values {0, ..., 2k}.
Knuth would solve this polynomial by letting x
take on the values {-k, ..., k}.
But if one chooses x from the reciprocally
symmetric set {1, Infinity, 0, 2, 1/2, -2, -1/2, 3,
1/3, -3, -1/3, 3/2, 2/3, ...}, that is, the set of small rational
numbers, +/- p/q, such that the
GCD(p,q) = 1, the resulting system of equations is very
symmetrical and the algorithm is simpler.
I will illustrate the various methods for k =
4.
Cook:
|
|
= |
|
|
|
|
|
|
| 216 |
36 |
6 |
1 |
| 125 |
25 |
5 |
1 |
| 64 |
16 |
4 |
1 |
| 27 |
9 |
3 |
1 |
| 8 |
4 |
2 |
1 |
| 1 |
1 |
1 |
1 |
| 0 |
0 |
0 |
1 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 46656 |
7776 |
1296 |
216 |
36 |
6 |
1 |
| 15625 |
3125 |
625 |
125 |
25 |
5 |
1 |
| 4096 |
1024 |
256 |
64 |
16 |
4 |
1 |
| 729 |
243 |
81 |
27 |
9 |
3 |
1 |
| 64 |
32 |
16 |
8 |
4 |
2 |
1 |
| 1 |
1 |
1 |
1 |
1 |
1 |
1 |
| 0 |
0 |
0 |
0 |
0 |
0 |
1 |
|
|
|
|
|
|
|
= |
|
= |
|
|
|
|
|
|
|
(256A3 + 36A2 + 6A1 +
A0)2
|
|
(125A3 + 25A2 + 5A1 +
A0)2
|
|
(64A3 + 16A2 + 4A1 +
A0)2
|
|
(27A3 + 9A2 + 3A1 +
A0)2
|
|
(8A3 + 4A2 + 2A1 +
A0)2
|
|
(A3 + A2 + A1 +
A0)2
|
|
A02
|
|
|
|
|
|
|
|
|
= |
|
|
|
|
|
-1 |
|
| 720 |
0 |
0 |
0 |
0 |
0 |
0 |
| 0 |
240 |
0 |
0 |
0 |
0 |
0 |
| 0 |
0 |
144 |
0 |
0 |
0 |
0 |
| 0 |
0 |
0 |
48 |
0 |
0 |
0 |
| 0 |
0 |
0 |
0 |
360 |
0 |
0 |
| 0 |
0 |
0 |
0 |
0 |
60 |
0 |
| 0 |
0 |
0 |
0 |
0 |
0 |
1 |
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
-6 |
15 |
-20 |
15 |
-6 |
1 |
| -5 |
32 |
-85 |
120 |
-95 |
40 |
-7 |
| 17 |
-114 |
321 |
-484 |
411 |
-186 |
35 |
| -15 |
104 |
-307 |
496 |
-461 |
232 |
-49 |
| 137 |
-972 |
2970 |
-5080 |
5265 |
-3132 |
812 |
| -10 |
72 |
-225 |
400 |
-450 |
360 |
-147 |
| 0 |
0 |
0 |
0 |
0 |
0 |
1 |
|
|
|
|
|
|
|
Knuth:
|
|
= |
|
|
|
|
|
|
| 27 |
9 |
3 |
1 |
| 8 |
4 |
2 |
1 |
| 1 |
1 |
1 |
1 |
| 0 |
0 |
0 |
1 |
| -1 |
1 |
-1 |
1 |
| -8 |
4 |
-2 |
1 |
| -27 |
9 |
-3 |
1 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 729 |
243 |
81 |
27 |
9 |
3 |
1 |
| 64 |
32 |
16 |
8 |
4 |
2 |
1 |
| 1 |
1 |
1 |
1 |
1 |
1 |
1 |
| 0 |
0 |
0 |
0 |
0 |
0 |
1 |
| 1 |
-1 |
1 |
-1 |
1 |
-1 |
1 |
| 64 |
-32 |
16 |
-8 |
4 |
-2 |
1 |
| 729 |
-243 |
81 |
-27 |
9 |
-3 |
1 |
|
|
|
|
|
|
|
= |
|
= |
|
|
|
|
|
|
|
(27A3 + 9A2 + 3A1 +
A0)2
|
|
(8A3 + 4A2 + 2A1 +
A0)2
|
|
(A3 + A2 + A1 +
A0)2
|
|
A02
|
|
(-A3 + A2 - A1 +
A0)2
|
|
(-8A3 + 4A2 - 2A1 +
A0)2
|
|
(-27A3 + 9A2 - 3A1 +
A0)2
|
|
|
|
|
|
|
|
|
= |
|
|
|
|
|
-1 |
|
| 720 |
0 |
0 |
0 |
0 |
0 |
0 |
| 0 |
240 |
0 |
0 |
0 |
0 |
0 |
| | |