
Numerical Analysis

by
Dr. Anita Pal
Assistant Professor
Department of Mathematics
National Institute of Technology Durgapur
Durgapur-713209
email: [email protected]

Chapter 1

Numerical Errors

Module No. 1

Errors in Numerical Computations



Two major techniques are used to solve any mathematical problem: analytical and numerical. An analytical solution is obtained in a compact form and is generally free from error. A numerical method, on the other hand, is a technique used to solve a problem with the help of a computer or calculator; in general, the solution obtained by this method contains some error. For some classes of problems it is very difficult to obtain an analytical solution, and for these problems we generally use numerical methods. For example, the solutions of complex non-linear differential equations cannot be determined by analytical methods, but such problems can easily be solved by numerical methods. In a numerical method there is always scope for errors to occur, and hence it is important to understand the source, propagation, magnitude, and rate of growth of these errors.
To solve a problem with the help of a computer, a special kind of method is required, known as a numerical method. Analytical methods are not suitable for solving a problem by computer. Thus, numerical methods are highly appreciated and extensively used by scientists and engineers.
Let us discuss sources of error.

1.1 Sources of error

It is well known that the solution of a problem obtained by a numerical method contains some errors. But our intention is to minimize the error, and to minimize it the most essential thing is to identify the causes or sources of the error. Three sources of errors, viz. inherent errors, round-off errors and truncation errors, occur when a solution of a problem is found using a numerical method. They are discussed below.

(i) Inherent errors: These types of errors occur due to the simplified assumptions made during mathematical modelling of the problem. These errors also occur when the data is obtained from physical measurements of the parameters of the proposed problem.

(ii) Round-off errors: Generally, numerical methods are performed using a computer. In numerical computation, all the numbers are represented as decimal fractions. But a computer can store only a finite number of digits for a number, and some numbers, viz. 1/3, 1/6, 1/7 etc., cannot be represented by a decimal fraction in a finite number of digits.

Thus, to represent these numbers some digits must be discarded, and hence the numbers have to be rounded off to some finite number of digits. So in arithmetic computation, some errors occur due to the finite representation of the numbers; these errors are called round-off errors. These errors depend on the word length of the computer used.

(iii) Truncation errors: These errors occur due to the finite representation of an
inherently infinite process. These types of errors are explained by an example. Let us
consider the cosine series.
The Taylor's series expansion of cos x is

cos x = 1 − x^2/2! + x^4/4! − x^6/6! + ··· .

It is well known that this series is infinite. If we consider only the first five terms to calculate the value of cos x for a given x, then we obtain an approximate value. The error occurs due to the truncation of the remaining terms of the series, and it is called the truncation error.
Note that the truncation error is independent of the computational machine.

1.2 Exact and approximate numbers

In numerical computation, a number is considered as either the exact or an approximate value of a solution of a problem. An exact number represents the true value of a result, while an approximate number represents a value which is close to the true value.
For example, in the statements ‘a book has 134 pages’, ’the population of a locality
is 15000’ the numbers 134, 15000 are exact numbers. But, in the assertions ‘the time
taken to fly from Kolkata to New Delhi is 2 hrs’, ‘the number of leaves of a mango tree
is 150000’, the numbers 2 and 150000 are approximate numbers, as time to fly from
Kolkata to New Delhi is approximately 2 hrs and similarly, the number of leaves of the
tree is approximately 150000, because it is not possible to count exact number of leaves
of a big tree.
These approximations come either from the imperfection of measuring instruments or from the dependence of the measurement on other parameters. There are no absolutely exact measuring instruments; each has its own accuracy.

It may be noted that the same number may be exact as well as approximate. For example, the number 3 is exact when it represents the number of rooms of a house, and approximate when it represents the number π.
The accuracy of a solution is defined in terms of the number of digits used in the computation. The significant digits or significant figures of a number are all its digits, except for zeros which appear to the left of the first non-zero digit. The zeros at the end of a number are always significant digits. The numbers 0.000342 and 8921.2300 have 3 and 8 significant digits respectively.
Sometimes we need to cut off usable digits; the number of digits to be cut off depends on the problem. This process of cutting off digits from a number is called rounding-off of numbers. That is, in the rounding process the number is approximated by a very close number consisting of a smaller number of digits. In that case, one or more digits are kept with the number, taken from left to right, and all other digits are discarded.

Rules of rounding-off
(i) If the discarded digits constitute a number which is larger than half the unit in the
last decimal place that remains, then the last digit that is left is increased by one.
If the discarded digits constitute a number which is smaller than half the unit in
the last decimal place that remains, then the digits that remain do not change.

(ii) If the discarded digits constitute a number which is equal to half the unit in the
last decimal place that remains, then the last digit that is left is increased by one
if it is odd, and is unchanged if it is even.
This rule is often called the rule of the even digit.

In Table 1.1, we consider different cases to illustrate the round-off process. In this table the numbers are rounded off to six significant figures. A computer, however, keeps more digits during round-off; how many depends on the computer and on the type of the number declared in a programming language.
Note that rounded-off numbers contain errors, and these errors are called round-off errors.


Exact number      Round-off number to six significant figures

26.0123728        26.0124 (added 1 in the last digit)
23.12432615       23.1243 (last digit remains unchanged)
30.455354         30.4554 (added 1 in the last digit)
19.652456         19.6525 (added 1 in the last digit)
126.3545          126.354 (half-way case; last digit 4 is even, so unchanged)
34.4275           34.4280 (added 1 in the last digit to make an even digit)
8.999996          9.00000 (added 1 in the last digit)
9.999997          10.0000 (added 1 in the last digit)
0.0023456573      0.00234566 (added 1 in the last digit)
6.237             6.23700 (added two 0's to make six figures)
67542159          675422 × 10^2 (integer is rounded to six digits)

Table 1.1: Different cases of round-off numbers
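The even-digit rule in case (ii) above is what programming languages call "round half to even"; Python's decimal module implements it as ROUND_HALF_EVEN. Below is a minimal sketch (the helper name round_significant is ours) reproducing some rows of Table 1.1:

    from decimal import Context, ROUND_HALF_EVEN

    def round_significant(s, n=6):
        """Round the number given as a string to n significant figures,
        using the even-digit rule for the half-way case."""
        return Context(prec=n, rounding=ROUND_HALF_EVEN).create_decimal(s)

    print(round_significant('26.0123728'))   # 26.0124  (added 1 in the last digit)
    print(round_significant('23.12432615'))  # 23.1243  (last digit unchanged)
    print(round_significant('126.3545'))     # 126.354  (half-way case, 4 is even)
    print(round_significant('8.999996'))     # 9.00000  (added 1 in the last digit)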

1.3 Absolute, relative and percentage errors

Let xA be the approximate value of an exact number xT.
The difference between the exact value xT and its approximate value xA is the error. But in principle it is not possible to determine the value of the error xT − xA, or even its sign, when the exact number xT is unknown.
The errors are designated as absolute error, relative error and percentage error.

Absolute error:
Let xA be the approximate value of the exact number xT. Then the absolute error is denoted by ∆x and satisfies the relation

∆x ≥ |xT − xA|.

Note that the absolute error is the upper bound of the difference between xT and
xA . This definition is applicable when there are many approximate values of the exact
number xT . Otherwise, ∆x = |xT − xA |.

Also, the exact value xT lies between xA − ∆x and xA + ∆x. It can be written as

xT = xA ± ∆x. (1.1)

The upper bound of the absolute error is

absolute error ≤ (1/2) × 10^(−m),    (1.2)

when the number is rounded to m decimal places.

Note that the absolute error measures the total error, and hence it measures only the quantitative side of the error. It does not measure the qualitative side, i.e. how accurate the measurement is. For example, suppose the length and the width of a pond are determined by a tape measure in metres, with width w = 50 ± 2 m and length l = 250 ± 2 m. In both measurements the absolute error is 2 m, but it is obvious that the second measurement is more accurate.
To determine the quality of measurements, we introduce a new concept called relative error.

Relative error:
The relative error is denoted by δx and is defined by

δx = ∆x/|xA| or ∆x/|xT|,  |xT| ≠ 0 and |xA| ≠ 0.

This expression can also be written as

xT = xA(1 ± δx) or xA = xT(1 ± δx).

Note that the absolute error is the total error when whole thing is measured, while
relative error is the error when we measure 1 unit. That is, the relative error is the error
per unit measurement.
In the case of the above example, the relative errors are δw = 2/50 = 0.04 and δl = 2/250 = 0.008. Thus, the second measurement is more accurate.
In general, the relative error measures the quantity of error and quality of the mea-
surement. Thus, the relative error is a better measurement of error than absolute error.

Percentage error:
The relative error is measured on a 1-unit scale while the percentage error is measured on a 100-unit scale: the percentage error is δx × 100%. This error is sometimes called the relative percentage error. The percentage error measures both quantity and quality. Generally, the percentage error is used when the relative error is very small.
Note that the relative and percentage errors are free from the unit of measurement,
while absolute error depends on the measuring unit.
Example 1.1 Find the absolute, relative and percentage errors in xA when xT = 1/7 and xA = 0.1429.
Solution. The absolute error is

∆x = |xT − xA| = |1/7 − 0.1429| = |1 − 1.0003|/7 = 0.0003/7 = 0.000043,

rounded up to two significant figures.
The relative error is

δx = ∆x/xT = 0.000043/(1/7) = 0.000301 ≈ 0.0003.

The percentage error is δx × 100% = 0.0003 × 100% ≈ 0.03%.
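Such computations are easy to script; a small sketch (the helper name errors is ours) reproducing Example 1.1, assuming the true value is known:

    def errors(x_true, x_approx):
        """Return the absolute, relative and percentage errors of x_approx."""
        abs_err = abs(x_true - x_approx)
        rel_err = abs_err / abs(x_true)
        return abs_err, rel_err, 100.0 * rel_err

    # Example 1.1: x_T = 1/7, x_A = 0.1429
    abs_e, rel_e, pct = errors(1.0 / 7.0, 0.1429)
    print(abs_e, rel_e, pct)   # ~4.3e-05, ~3.0e-04, ~0.03%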

Example 1.2 Find the absolute error and the exact number corresponding to the ap-
proximate number xA = 7.543. Assume that the percentage error is 0.1%.

Solution. The relative error is δx = 0.1% = 0.001.


Therefore, the absolute error is ∆x = |xA × δx| = 7.543 × 0.001 = 0.007543 ' 0.0075.
Thus, the exact value is 7.543 ± 0.0075.

Example 1.3 Suppose two exact numbers and their approximate values are given by xT = 17/19 ≈ 0.8947 and yT = √71 ≈ 8.4261. Find out which approximation is better.

Solution. To find the absolute errors, we take the numbers xA and yA with a larger number of decimal digits: xA ≈ 0.894736···, yA = √71 ≈ 8.426149···.
Therefore, the absolute error in xT is ∆x = |0.894736··· − 0.8947| ≈ 0.000036, and ∆y = |8.426149··· − 8.4261| ≈ 0.000049.

Thus, δx = 0.000036/0.8947 ≈ 0.000040 = 0.0040% and δy = 0.000049/8.4261 ≈ 0.0000058 = 0.00058%.
The percentage error in the second case is 0.00058% while in the first case it is 0.0040%. Thus, the second measurement is better than the first one.

1.4 Valid significant digits

A decimal integer can be represented in many ways. For example, the number 7600000 can be written as 760 × 10^4 or 76.0 × 10^5 or 0.7600000 × 10^7. Note that each number has two parts; the first part is called the mantissa and the second part the exponent. In the last form, the mantissa is a proper fraction and the first digit after the decimal point is non-zero. This form is known as the normalized form, and it is commonly used in computers.
Every positive decimal number a can be expressed as

a = d1 × 10^m + d2 × 10^(m−1) + ··· + dn × 10^(m−n+1) + ···,

where the di are the digits constituting the number (i = 1, 2, ...), d1 ≠ 0, and 10^(m−i+1) is the value of the ith decimal place starting from the left.
Let dn be the nth digit of the approximate number x. This digit is called a valid significant digit (or simply a valid digit) if it satisfies the condition

∆x ≤ 0.5 × 10^(m−n+1).    (1.3)

If inequality (1.3) is not satisfied, the digit dn is said to be doubtful. If dn is a valid digit then all the digits preceding dn are also valid.
Theorem 1.1 If a number is correct up to n significant figures and the first significant digit is k, then the relative error is less than

1 / (k × 10^(n−1)).
Proof. Let xA and xT be the approximate and exact values. Also, assume that xA is correct up to n significant figures and m decimal places. Three cases arise:
(i) m < n,
(ii) m = n, and
(iii) m > n.

From (1.2) it is known that the absolute error ∆x ≤ 0.5 × 10^(−m).

(i) When m < n.
In this case, the total number of digits in the integral part is n − m. Let k be the first significant digit in xT. Therefore,

∆x ≤ 0.5 × 10^(−m) and |xT| ≥ k × 10^(n−m−1) − 0.5 × 10^(−m).

Thus, the relative error is

δx = ∆x/|xT| ≤ (0.5 × 10^(−m)) / (k × 10^(n−m−1) − 0.5 × 10^(−m)) = 1 / (2k × 10^(n−1) − 1).

Since n is a positive integer and k is an integer lying between 1 and 9,

2k × 10^(n−1) − 1 > k × 10^(n−1)

for all k and n except k = n = 1. Hence,

δx < 1 / (k × 10^(n−1)).
(ii) When m = n.
In this case, the first significant digit is the first digit after the decimal point, i.e. the number is a proper fraction. As in the previous case,

δx = (0.5 × 10^(−m)) / (k × 10^(n−m−1) − 0.5 × 10^(−m)) = 1 / (2k × 10^(n−1) − 1) < 1 / (k × 10^(n−1)).
(iii) When m > n.
In this case, the first significant digit k is at the (n − m + 1) = −(m − n − 1)th position and the integer part is zero. Then ∆x ≤ 0.5 × 10^(−m) and |xT| ≥ k × 10^(−(m−n+1)) − 0.5 × 10^(−m). Thus,

δx = (0.5 × 10^(−m)) / (k × 10^(−(m−n+1)) − 0.5 × 10^(−m)) = 1 / (2k × 10^(n−1) − 1) < 1 / (k × 10^(n−1)).
Hence the theorem.
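A quick numerical spot-check of the theorem (our own illustration), using xT = 17/19 from Example 1.3, which rounded to n = 4 significant figures gives xA = 0.8947 with first significant digit k = 8:

    x_T = 17 / 19          # = 0.894736...
    x_A = 0.8947           # correct to n = 4 significant figures
    k, n = 8, 4            # first significant digit and number of figures

    rel_err = abs(x_T - x_A) / abs(x_T)
    bound = 1 / (k * 10 ** (n - 1))
    print(rel_err < bound)  # True: ~4.1e-05 < 1.25e-04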

Chapter 1

Numerical Errors

Module No. 2

Propagation of Errors and Computer Arithmetic



This is the continuation of Module 1. In this module, the propagation of error during arithmetic operations is discussed in detail. The representation of numbers in a computer and their arithmetic calculations are also explained.

2.1 Propagation of errors in arithmetic operations

In numerical computation, it is always assumed that there is an error in every number; it may be very small or very large. The errors present in the numbers are propagated during arithmetic operations, but the rate of propagation depends on the type of arithmetic operation. These cases are discussed in the subsequent sections.

2.1.1 Errors in sum and difference

Let us consider the exact numbers X1, X2, ..., Xn and let their corresponding approximate values be x1, x2, ..., xn respectively. Assume that ∆x1, ∆x2, ..., ∆xn are the absolute errors in x1, x2, ..., xn. Therefore, Xi = xi ± ∆xi, i = 1, 2, ..., n.
Let X = X1 + X2 + · · · + Xn and x = x1 + x2 + · · · + xn .
The total absolute error is

|X − x| = |(X1 − x1 ) + (X2 − x2 ) + · · · + (Xn − xn )|


≤ |X1 − x1 | + |X2 − x2 | + · · · + |Xn − xn |.

This shows that the total absolute error in the sum is

∆x = ∆x1 + ∆x2 + · · · + ∆xn . (2.1)

Thus the absolute error in sum of approximate numbers is equal to the sum of the
absolute errors of all the numbers.
The following points should be kept in mind during addition of numbers:
(i) identify a number (or numbers) of the least accuracy,

(ii) round-off the numbers to the nearest exact numbers and retain one digit more
than in the identified number,

(iii) perform addition for all retained digits,

(iv) round-off the result by discarding last digit.



Subtraction
The case for subtraction is similar to addition. Let x1 and x2 be two approximate values
of the exact numbers X1 and X2 respectively and X = X1 − X2 , x = x1 − x2 .
Therefore, one can write X1 = x1 ± ∆x1 and X2 = x2 ± ∆x2 .
Now, |X − x| = |(X1 − x1 ) − (X2 − x2 )| ≤ |X1 − x1 | + |X2 − x2 |. Hence,

∆x = ∆x1 + ∆x2 . (2.2)

It may be noted that the absolute error in difference of two numbers is equal to the
sum of individual absolute errors.

2.1.2 The error in product

Let us consider two exact numbers X1 and X2 with their approximate values x1 and
x2 . Let, X1 = x1 ± ∆x1 and X2 = x2 ± ∆x2 , where ∆x1 and ∆x2 are the absolute
errors in x1 and x2 .
Now, X1X2 = x1x2 ± x1∆x2 ± x2∆x1 ± ∆x1·∆x2.
Therefore, |X1X2 − x1x2| ≤ |x1∆x2| + |x2∆x1| + |∆x1·∆x2|. The terms |∆x1| and |∆x2| represent errors and are small, so their product is smaller still; we therefore discard it and divide both sides by |x| = |x1x2| to get the relative error. Hence, the relative error is

|(X1X2 − x1x2)/(x1x2)| = |∆x1/x1| + |∆x2/x2|.    (2.3)

From this expression we conclude that the relative error in product of two numbers
is equal to the sum of the individual relative errors.
This result can be extended for n numbers as follows: Let X = X1 X2 · · · Xn and
x = x1 x2 · · · xn . Then

|(X − x)/x| = |∆x1/x1| + |∆x2/x2| + ··· + |∆xn/xn|.    (2.4)

That is, the total relative error in product of n numbers is equal to the sum of
individual relative errors.
In particular, let all approximate values x1 , x2 , . . . , xn be positive and x = x1 x2 · · · xn .
Then log x = log x1 + log x2 + · · · + log xn .

In this case,

∆x/x = ∆x1/x1 + ∆x2/x2 + ··· + ∆xn/xn.

Hence, |∆x/x| = |∆x1/x1| + |∆x2/x2| + ··· + |∆xn/xn|.
Let us consider another particular case. Suppose x = kx1, where k is a non-zero real number. Now,

δx = |∆x/x| = |k∆x1/(kx1)| = |∆x1/x1| = δx1.

Also, |∆x| = |k∆x1| = |k||∆x1|.
Observe that the relative errors in x and x1 are the same, while the absolute error in x is |k| times the absolute error in x1.
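A short numerical check of relation (2.3) (the helper name rel is ours): perturb each factor by its absolute error and compare the actual relative error of the product with the sum of the individual relative errors.

    def rel(dx, x):
        return abs(dx / x)

    x1, x2 = 12.4, 45.4          # approximate values
    dx1, dx2 = 0.05, 0.05        # absolute errors
    exact = (x1 + dx1) * (x2 + dx2)
    approx = x1 * x2
    actual = rel(exact - approx, approx)
    predicted = rel(dx1, x1) + rel(dx2, x2)
    print(actual, predicted)
    # ~0.005138 vs ~0.005133: the tiny difference is the
    # neglected product term |dx1*dx2|/|x1*x2|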

2.1.3 The error in quotient

Let X1 and X2 be two exact numbers and let their approximate values be x1 and x2. Again, let X = X1/X2 and x = x1/x2. If ∆x1 and ∆x2 are the absolute errors, then X1 = x1 + ∆x1, X2 = x2 + ∆x2. Suppose both x1 and x2 are non-zero.
Now,

X − x = (x1 + ∆x1)/(x2 + ∆x2) − x1/x2 = (x2∆x1 − x1∆x2) / (x2(x2 + ∆x2)).
Dividing both sides by x and taking absolute values,

|(X − x)/x| = |(x2∆x1 − x1∆x2) / (x1(x2 + ∆x2))| = |x2/(x2 + ∆x2)| · |∆x1/x1 − ∆x2/x2|.

Since the error ∆x2 is small compared to x2,

x2/(x2 + ∆x2) ≈ 1.

Thus,

δx = |∆x/x| = |(X − x)/x| = |∆x1/x1 − ∆x2/x2| ≤ |∆x1/x1| + |∆x2/x2|,    (2.5)

i.e., δx = δx1 + δx2.


This expression shows that the total relative error in quotient is equal to the sum of
their individual relative errors.

The relative error δx of (2.5) can also be bounded from below as

|∆x/x| = |∆x1/x1 − ∆x2/x2| ≥ | |∆x1/x1| − |∆x2/x2| |.    (2.6)

It may be observed that the relative error in quotient is greater than or equal to the
difference of their individual relative errors.
In case of positive numbers one can determine the error of logarithm function. Let
x1 and x2 be the approximate numbers and x = x1 /x2 .
Now, log x = log x1 − log x2. Thus,

∆x/x = ∆x1/x1 − ∆x2/x2, i.e. |∆x/x| ≤ |∆x1/x1| + |∆x2/x2|.
Example 2.1 Find the sum of the approximate numbers 120.237, 0.8761, 78.23, 0.001234, 234.3, 128.34, 35.4, 0.0672, 0.723, 0.08734, in each of which all the written digits are valid. Also find the absolute error in the sum.
Solution. The least exact numbers are 234.3 and 35.4. The maximum error of each of them is 0.05. Now we round off all the numbers to two decimal places (one digit more than in the least exact numbers).
Their sum is 120.24 + 0.88 + 78.23 +0.00 + 234.3 + 128.34 +35.4 + 0.07 + 0.72 + 0.09 =
598.27.
Now, rounding-off the sum to one decimal place and it becomes 598.3.
There are two types of errors in the sum. The first one is the initial error. This is
the sum of the errors of the least exact numbers and the rounding errors of the other
numbers, which is equal to 0.05 × 2 + 0.0005 × 8 = 0.104 ' 0.10.
The second one is the error in rounding-off the sum which is 598.3 − 598.27 = 0.03.
Thus, the total absolute error in the sum is 0.10 + 0.03 = 0.13.
Finally, the sum can be expressed as 598.3 ± 0.13.
Example 2.2 Let x1 = 43.5 and x2 = 76.9 be two approximate numbers and 0.02
and 0.008 be the corresponding absolute errors respectively. Find the difference between
these numbers and evaluate absolute and relative errors.
Solution. Here, x = x1 − x2 = −33.4 and the total absolute error is ∆x = 0.02 + 0.008 = 0.028.
Hence, the difference is −33.4 and the absolute error is 0.028.
The relative error is 0.028/|−33.4| ≈ 0.00084 = 0.084%.

Example 2.3 Let x1 = 12.4 and x2 = 45.356 be two approximate numbers and all
digits of both the numbers are valid. Find the product and the relative and absolute
errors.
Solution. The numbers of valid decimal places in the first and second approximate numbers are one and three respectively, so we round off the second number to one decimal place. After rounding, the numbers become x1 = 12.4 and x2 = 45.4.
Now, the product is x = x1x2 = 12.4 × 45.4 = 562.96 ≈ 56.3 × 10.
The result is rounded to three significant figures, because the least number of valid significant digits among the given numbers is 3.
The relative error in the product is

δx = ∆x/x = ∆x1/x1 + ∆x2/x2 = 0.05/12.4 + 0.0005/45.356 = 0.004043 ≈ 0.40%.

The absolute error is (56.3 × 10) × 0.004043 = 2.276 ≈ 2.3.

Example 2.4 Let x1 = 7.235 and x2 = 8.72 be two approximate numbers, where all
the digits of the numbers are valid. Find the quotient and also relative and the absolute
errors.
Solution. Here, x1 = 7.235 and x2 = 8.72 have four and three valid significant digits respectively. Now,

x1/x2 = 7.235/8.72 = 0.830.

We keep three significant digits, since the least exact number contains three valid significant digits.
The absolute errors in x1 and x2 are ∆x1 = 0.0005 and ∆x2 = 0.005 respectively.
The relative error in the quotient is

∆x1/x1 + ∆x2/x2 = 0.0005/7.235 + 0.005/8.72 = 0.000069 + 0.000573 ≈ 0.001 = 0.1%.

The absolute error is

(x1/x2) × 0.001 = 0.830 × 0.001 = 0.00083 ≈ 0.001.


2.1.4 The errors in power and in root

Let x1 be an approximate value of an exact number X1 and let its relative error be δx1. We now determine the relative error of x = x1^k, where k is a positive integer. Then

x = x1^k = x1 · x1 ··· x1 (k times).

According to formula (2.4), the relative error δx is given by

δx = δx1 + δx1 + ··· + δx1 (k times) = k δx1.    (2.7)

Thus, the relative error of the approximate number x is k times the relative error of x1.
Let us now consider the kth root of a positive approximate value x1, i.e. the number x = x1^(1/k). Since x1 > 0,

log x = (1/k) log x1.

Therefore,

∆x/x = (1/k)(∆x1/x1), i.e. |∆x/x| = (1/k)|∆x1/x1|.

Thus, the relative error in x1^(1/k) is

δx = (1/k) δx1.

Example 2.5 Let a = 5.27, b = 28.61, c = 15.8 be the approximate values of some numbers, and let the absolute errors in a, b, c be 0.01, 0.04 and 0.02 respectively. Calculate the value of E = a^2 b^(1/3) / c^3 and the error in the result.
Solution. It is given that the absolute errors are ∆a = 0.01, ∆b = 0.04 and ∆c = 0.02. One more significant figure is retained in the intermediate calculations. Now, the approximate values of the terms a^2, b^(1/3), c^3 are 27.77, 3.0585 and 3944.0 respectively.
The approximate value of the expression is

E = (27.77 × 3.0585) / 3944.0 = 0.0215.

Three significant digits are taken in the result, since the least number of significant digits in the given numbers is three.
The relative error is given by

δE = 2δa + (1/3)δb + 3δc = 2 × (0.01/5.27) + (1/3) × (0.04/28.61) + 3 × (0.02/15.8)
   ≈ 0.0038 + 0.00047 + 0.0038 ≈ 0.008 = 0.8%.

The absolute error ∆E in E is 0.0215 × 0.008 ≈ 0.0002.
Hence, E = 0.0215 ± 0.0002, with a relative error of 0.8%.
In the above example, E is an expression in three variables a, b, c, and the error present in E was estimated. The general rule for calculating the error in a function of several variables is derived below.

Error in function of several variables

Let y = f (x1 , x2 , . . . , xn ) be a differentiable function containing n variables x1 , x2 , . . . , xn .


Also, let ∆xi be the error in xi , for i = 1, 2, . . . , n.
Now, the absolute error ∆y in y is given by

y + ∆y = f(x1 + ∆x1, x2 + ∆x2, ..., xn + ∆xn)
       = f(x1, x2, ..., xn) + Σ_{i=1}^n (∂f/∂xi)∆xi + ···   (by Taylor's series expansion)
       = y + Σ_{i=1}^n (∂f/∂xi)∆xi   (neglecting second and higher power terms of ∆xi),

i.e.,

∆y = Σ_{i=1}^n (∂f/∂xi)∆xi.

This is the formula to calculate the total absolute error in computing a function of several variables.
The relative error can be calculated as

∆y/y = Σ_{i=1}^n (∂f/∂xi)(∆xi/y).
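This formula can be checked on Example 2.5; the sketch below (variable names are ours) forms the relative error of E = a^2 b^(1/3) / c^3 from the analytically differentiated terms:

    a, b, c = 5.27, 28.61, 15.8
    da, db, dc = 0.01, 0.04, 0.02

    E = a**2 * b**(1/3) / c**3

    # For E = a^2 b^(1/3) / c^3 the formula gives the relative error
    # dE/E = 2(da/a) + (1/3)(db/b) + 3(dc/c)
    rel_E = 2 * da / a + (1 / 3) * db / b + 3 * dc / c
    abs_E = E * rel_E
    print(E, rel_E, abs_E)   # ~0.0215, ~0.008, ~0.0002 (as in Example 2.5)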

2.2 Significant error

It should be remembered that some significant digits may be lost during arithmetic calculation, due to the finite representation of numbers in computing instruments. This error is called significant error.
In the following two cases there is a high chance of losing significant digits, and care should be taken in these situations:
(i) When two nearly equal numbers are subtracted, and

(ii) When division is made by a very small divisor compared to the dividend.

It should be remembered that the significant error is more serious than round-off
error. These are illustrated in the following examples:
Example 2.6 Find the difference √10.23 − √10.21 and calculate the relative error in the result.
Solution. Let X1 = √10.23 and X2 = √10.21, and let their approximate values be x1 = 3.198 and x2 = 3.195. Let X = X1 − X2.
Then the absolute errors are ∆x1 = 0.0005 and ∆x2 = 0.0005 and the approximate
difference is x = 3.198 − 3.195 = 0.003.
Thus, the total absolute error in the subtraction is ∆x = 0.0005 + 0.0005 = 0.001, and the relative error is δx = 0.001/0.003 = 0.3333.
But, by changing the calculation scheme one can obtain a more accurate result. For example,

X = √10.23 − √10.21 = (10.23 − 10.21) / (√10.23 + √10.21)
  = 0.02 / (3.198 + 3.195) ≈ 0.003128 = x (say).
The relative error is

δx = (∆x1 + ∆x2) / (x1 + x2) = 0.001 / (3.198 + 3.195) ≈ 0.0002 = 0.02%.

Observe that the relative error is much less than in the previous case.

Example 2.7 Find the roots of the equation x^2 − 1500x + 0.5 = 0.

Solution. To illustrate the difficulties of the problem, let us assume that the computing machine uses four significant digits for all arithmetic calculations. The roots of this equation are

(1500 ± √(1500^2 − 2)) / 2.

Now, 1500^2 − 2 = 0.2250 × 10^7 − 0.0000 × 10^7 = 0.2250 × 10^7.
Thus √(1500^2 − 2) = 0.1500 × 10^4.
Hence, the roots are

(0.1500 × 10^4 ± 0.1500 × 10^4) / 2 = 0.1500 × 10^4, 0.0000 × 10^4.

That is, the smaller root comes out as zero (to four significant digits); this occurs due to the finite representation of the numbers. But note that 0 is not a root of the given equation.
To get a more accurate result, we rearrange the arithmetic calculation. The smaller root of the equation is now calculated as follows:

(1500 − √(1500^2 − 2)) / 2 = [(1500 − √(1500^2 − 2))(1500 + √(1500^2 − 2))] / [2(1500 + √(1500^2 − 2))]
                           = 2 / [2(1500 + √(1500^2 − 2))] = 0.0003333.

Hence, the smaller root of the equation is 0.0003333, which is much closer to the exact root. The other root is 0.1500 × 10^4.
This situation may arise whenever |4ac| ≪ b^2.

So it is suggested that care be taken when two nearly equal numbers are subtracted; this is done by carrying a sufficient number of reserve valid digits.
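The rearrangement used above is the standard remedy: compute the larger-magnitude root first, then obtain the smaller one from the product of the roots, c/a. A sketch (our own) for x^2 − 1500x + 0.5 = 0:

    import math

    def stable_roots(a, b, c):
        """Roots of ax^2 + bx + c = 0, avoiding the subtraction of nearly
        equal numbers when b^2 >> |4ac|."""
        d = math.sqrt(b * b - 4 * a * c)
        # q has the same sign as -b, so b and copysign(d, b) never cancel
        q = -0.5 * (b + math.copysign(d, b))
        return q / a, c / q          # larger root, smaller root

    print(stable_roots(1, -1500, 0.5))   # (~1499.99967, ~0.000333333)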

2.3 Representation of numbers in computer

As mentioned earlier, numerical methods are used to solve problems with the help of a computer. But a computer has limitations in storing numbers, whether integers or real (floating point) numbers. Generally, two bytes of memory are used to

store an integer and four bytes space is used to store a floating point number. Due to
the limitation of space, the rules for arithmetic operations used in mathematics do not
always hold in computer arithmetic.
The representation of a floating point number in a computer is different from our conventional technique. The technique used preserves the maximum number of significant digits and increases the range of values of the real numbers; this representation is known as the normalized floating point mode. In this representation, the number is converted to a proper fraction in such a way that the first digit after the decimal point is non-zero, and it is adjusted by multiplying by a suitable power of 10. For example, the number 3876.23 is represented in normalized form as .387623 × 10^4, and in computer representation it is written as .387623E4 (E4 is used to denote 10^4). It is observed that in normalized floating point representation a number has two parts: the mantissa and the exponent. In this example, .387623 is the mantissa and 4 is the exponent. In this representation the magnitude of the mantissa is always greater than or equal to .1 and the exponent is an integer.
To explain computer arithmetic, in this section it is assumed that the computer uses only four digits to store the mantissa and two digits for the exponent. The mantissa and the exponent have their own signs. Under this assumption, the range of floating point numbers (in magnitude) is .1000 × 10^(−99) to .9999 × 10^99.

2.4 Arithmetic of normalized floating point numbers

In this section, the four basic arithmetic operations on normalized floating point
numbers are discussed.

2.4.1 Addition

The addition of two normalized floating point numbers is done by using the following
rules:
(i) If the two numbers have the same exponent, then the mantissas are added directly, and the exponent of the sum is the common exponent.

(ii) If the exponents are different, then the number with the lower exponent is shifted to the higher exponent by adjusting its mantissa, and then rule (i) is used to add them.

All the possible cases are discussed in the following examples.

Example 2.8 Add the following normalized floating point numbers.


(i) .2678E15 and .4876E15 (same exponent)
(ii) .7487E10 and .6712E10 (same exponent)
(iii) .3451E3 and .3218E8 (different exponents)
(iv) .3876E25 and .8541E27 (different exponents)
(v) .8231E99 and .6541E99 (overflow condition)

Solution. (i) Here the exponents are same. So using first rule one can add the numbers
by adding mantissa. Therefore, the sum is .7554E15.

(ii) In this case also the exponents are equal, and adding the mantissas as before gives 1.4199E10. Notice that the sum contains five significant figures, but it is assumed that the computer can store only four. So the number is shifted right one place before storing it in memory: the exponent is increased by 1 and the last digit is truncated. Hence, the sum is .1419E11.

(iii) For this problem, the exponents are different and the difference is 8 − 3 = 5. The
mantissa of smaller number (low exponent) is shifted 5 places and the number becomes
.0000E8. Now, the numbers have same exponent and hence the final result is .0000E8
+ .3218E8 = .3218E8.

(iv) In this case, the exponents are also different and the difference is 27 − 25 = 2.
So the mantissa of the smaller number (here first number) is shifted right by 2 places
and it becomes .0038E27. Now the sum is .0038E27 + .8541E27 = .8579E27.

(v) This case is different. The exponents are same and the sum is 1.4772E99. Here,
the mantissa has five significant digits, so it is shifted right and the exponent is increased
by 1. Then the exponent becomes 100. Since as per our assumption, the maximum value
of the exponent is 99, so the number is larger than the capacity of the floating number of
the assumed computer. This number cannot store in the computer and this situation is
called an overflow condition. In this case, the computer will generate an error message.
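The hypothetical four-digit machine itself is easy to simulate. Below is a minimal sketch (representation and function names are ours): a number .dddd × 10^e is stored as a pair (four-digit integer mantissa, exponent), and the rules above are applied with truncation. It reproduces cases (i)-(iv) of Example 2.8; positive numbers only, for simplicity.

    def normalize(m, e):
        """Keep the integer mantissa m to exactly 4 digits: .dddd x 10^e."""
        while abs(m) >= 10**4:
            m, e = m // 10, e + 1        # drop the last digit (truncate)
        while m != 0 and abs(m) < 10**3:
            m, e = m * 10, e - 1
        if not -99 <= e <= 99:
            raise OverflowError("overflow/underflow condition")
        return m, e

    def fadd(a, b):
        (m1, e1), (m2, e2) = a, b
        if e1 < e2:                      # work with the larger exponent first
            (m1, e1), (m2, e2) = (m2, e2), (m1, e1)
        m2 //= 10 ** (e1 - e2)           # align the smaller number, truncating
        return normalize(m1 + m2, e1)

    # (mantissa digits, exponent): .2678E15 is stored as (2678, 15)
    print(fadd((2678, 15), (4876, 15)))  # (7554, 15)  -> .7554E15
    print(fadd((7487, 10), (6712, 10)))  # (1419, 11)  -> .1419E11
    print(fadd((3451, 3),  (3218, 8)))   # (3218, 8)   -> .3218E8
    print(fadd((3876, 25), (8541, 27)))  # (8579, 27)  -> .8579E27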


2.4.2 Subtraction

The subtraction is a special type of addition. In subtraction one positive number is


added with a negative number. The different cases of subtraction are illustrated in the
following examples.
Example 2.9 Subtract the normalized floating point numbers indicated below:
(i) .2832E10 from .8432E10
(ii) .2693E15 from .2697E15
(iii) .2786E–17 from .2134E–16
(iv) .7224E–99 from .7273E–99.

Solution. (i) Here the exponents are equal, and hence the mantissas are directly subtracted. Thus, the result is
.8432E10 – .2832E10 = .5600E10.

(ii) Here also the exponents are equal. So the result is .2697E15 – .2693E15 = .0004E15.
The mantissa is not in normalised form. Since the computer always store normalised
numbers, we have to convert it to the normalised number. The normalised number
corresponding to .0004E15 is .4000E12. This is the final answer.

(iii) In these numbers the exponents are different. The number with smaller exponent is
shifted right and the exponent increased by 1 for every right shift. The second number
becomes .0278E–16. Thus the result is .2134E–16 – .0278E–16 = .1856E–16.

(iv) The result is .7273E–99 – .7224E–99=.0049E–99=.4900E–101 (In normalised form).


Note that the number of digits in exponent is 3, but our hypothetical computer can
store only two digits.
In this case, the result is smaller than the smallest number which could be stored in
our computer. This situation is called the underflow condition and the computer will
give an error message.

2.4.3 Multiplication

The multiplication of normalized floating point numbers is done in much the same way as the multiplication of ordinary numbers.

Two normalized floating point numbers are multiplied by multiplying the mantissas
and adding the exponents. After multiplication, the mantissa is converted into nor-
malized floating point form and the exponent is adjusted accordingly. Multiplication is
illustrated in the following examples.
Example 2.10 Multiply the following floating point numbers:
(i) .2198E6 by .5671E12
(ii) .2318E17 by .8672E–17
(iii) .2341E52 by .9231E51
(iv) .2341E–53 by .7652E-51.
Solution. (i) In this case, .2198E6 × .5671E12 = .12464858E18.
Note that the mantissa has 8 significant figures, but as per our computer the result
will be .1246E18 (last four significant figures are truncated).
(ii) Here, .2318E17 × .8672E–17 = .20101696E0 = .2010E0.
(iii) .2341E52 × .9231E51 = .21609771E103.
In this case, the exponent has three digits and it is not allowed in our assumed
computer. The overflow condition occurs, so an error message will generate.
(iv) .2341E–53 × .7652E–51 = .17913332E–104 = .1791E–104; here the underflow condition occurs and an error message will be generated.

2.4.4 Division

The division of normalized floating point numbers is likewise similar to the division of ordinary numbers. The only difference is that the mantissa retains only four significant digits (as per our assumed computer) instead of all digits. The quotient mantissa must be written in normalized form and the exponent adjusted accordingly.
Example 2.11 Perform the following divisions
(i) .8765E43 ÷ .3131E21
(ii) .9999E5 ÷ .1452E–99
(iii) .3781E–18 ÷ .2871E94.
Solution. (i) .8765E43 ÷ .3131E21 = 2.7994251038E22 = .2799E23.
(ii) In this case, the number is divided by a small number.
.9999E5 ÷ .1452E–99 = 6.8863636364E104 =.6886E105.

The overflow situation occurs.

(iii) In this case, the number is divided by a large number.


.3781E–18 ÷ .2871E94 = 1.3169627307E–112 = .1316E–111.
As per our computer, underflow condition occurs.

2.5 Effect of normalized floating point arithmetics

Sometimes floating point arithmetic gives unpredictable results, due to the truncation of the mantissa. To illustrate this situation, let us consider the following example. It is well known that (1/6) × 12 = 2. In four-digit floating point arithmetic 1/6 = .1667, and hence (1/6) × 12 = .1667 × 12 = .2000E1. One can also determine the value of (1/6) × 12 by repeated addition. Note that .1667 + .1667 + .1667 + .1667 + .1667 + .1667 = 1.0002, stored as .1000E1, but adding .1667 twelve times gives .1996E1.
Thus, in floating point arithmetic multiplication is not always the same as repeated addition, i.e. 12x = x + x + ··· + x (12 times) is not always true.
Also, in floating point arithmetic the associative and distributive laws do not always hold, due to the truncation of the mantissa. That is,
(i) (a + b) + c ≠ a + (b + c)
(ii) (a + b) − c ≠ (a − c) + b
(iii) a(b − c) ≠ ab − ac.
These results are illustrated in the following examples:
These results are illustrated in the following examples:

(i) Suppose, a =.6889E2, b =.7799E2 and c =.1008E2. Now, a + b =.1468E3


(a + b) + c = .1468E3 + .1008E2 = .1468E3 + .0100E3 = .1568E3.
Again, b + c =.8807E2.
a + (b + c)=.6889E2+.8807E2=.1569E3.
Hence, for this example, (a + b) + c ≠ a + (b + c).

(ii) Let a =.7433E1, b =.6327E–1, c =.6672E1.


Then a + b =.7496E1 and (a + b) − c =.7496E1 – .6672E1 = .8240E0.
Again, a − c =.7610E0 and (a − c) + b =.7610E0 + .0632E0 = .8242E0.
Thus, (a + b) − c ≠ (a − c) + b.

(iii) Let a = .6683E1, b = .4684E1, c = .4672E1.
Then b − c = .1200E–1.
a(b − c) = .6683E1 × .1200E–1 = .0801E0 = .8010E–1.
ab = .3130E2, ac = .3122E2, so ab − ac = .8000E–1.
Thus, a(b − c) ≠ ab − ac.
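These failures can be reproduced with Python's decimal module configured as a four-digit chopped arithmetic (a sketch; this mimics, rather than exactly replicates, the hypothetical machine):

    from decimal import Context, ROUND_DOWN

    ctx = Context(prec=4, rounding=ROUND_DOWN)   # 4-digit chopped arithmetic
    D = lambda s: ctx.create_decimal(s)

    a, b, c = D('68.89'), D('77.99'), D('10.08')  # .6889E2, .7799E2, .1008E2
    print(ctx.add(ctx.add(a, b), c))   # 156.8 -> (a+b)+c = .1568E3
    print(ctx.add(a, ctx.add(b, c)))   # 156.9 -> a+(b+c) = .1569E3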

From these examples one might think that numerical computation is very dangerous. But it is not so dangerous, since an actual computer generally stores seven digits of mantissa (in single precision); the larger length of the mantissa gives more accurate results.

2.5.1 Zeros in floating point numbers

Zero has a definite meaning in mathematics, but in computer arithmetic exact equality of a number to zero can never be guaranteed, because most numbers in floating point representation are approximate. The behaviour of zero is illustrated in the following example.

The exact roots of the equation x^2 + 2x − 5 = 0 are x = −1 ± √6.
In floating point representation (4-digit mantissa) these are .1449E1 and –.3449E1.
When x =.1449E1, then the left hand side of the equation is
.1449E1 × .1449E1 + .2000E1 × .1449E1 – .5000E1
= .0209E2 + .2898E1 – .5000E1 = .0209E2 + .0289E2 – .0500E2 = –.0002E2.
When x =–.3449E1, then left hand side of the equation is
(–.3449E1) × (–.3449E1) + .2000E1 × (–.3449E1) – .5000E1
= .1189E2 – .6898E1 – .5000E1 = .1189E2 – .0689E2 – .0500E2 = .0000E2, which
is equal to 0.
It is interesting to see that one root satisfies the equation perfectly while the other does not, though both are roots of the equation. Since .1449E1 is a root, the residual –0.02 must be accepted as a "zero". Thus, we can conclude the following:
Note 2.1 There is no fixed value of zero in computer arithmetic as there is in mathematical calculation. Thus, it is not advisable to base any instruction on testing whether a floating point number is exactly zero. Rather, it is suggested to treat a number as zero if its magnitude is less than some given (very) small number.
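In code this advice amounts to comparing against a tolerance rather than testing equality; a small sketch (the tolerance value is our choice, to be set per problem):

    EPS = 1e-9   # tolerance chosen for the problem at hand (our choice)

    def is_zero(x, eps=EPS):
        """Treat x as zero when it is smaller in magnitude than eps."""
        return abs(x) < eps

    residual = 0.1449e1**2 + 2 * 0.1449e1 - 5   # ~ -0.0024, not exactly 0
    print(residual == 0, is_zero(residual, 1e-2))  # False True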

Chapter 1

Numerical Errors

Module No. 3

Operators in Numerical Analysis



Many operators are used in numerical analysis and computation. Some of the frequently used ones, viz. the forward difference (∆), backward difference (∇), central difference (δ), shift (E) and mean (µ) operators, are discussed in this module.
Let the function y = f(x) be defined on the closed interval [a, b] and let x0, x1, ..., xn be (n + 1) chosen values of x. Assume that these values are equispaced, i.e. xi = x0 + ih, i = 0, 1, 2, ..., n, where h is a suitable real number called the difference of the interval or spacing. When x = xi, the value of y is denoted by yi and is defined by yi = f(xi). The values of x and y are called arguments and entries respectively.

3.1 Finite difference operators

Several types of finite difference operators are defined; among them, the forward difference, backward difference and central difference operators are widely used. In this section, these operators are discussed.

3.1.1 Forward difference operator

The forward difference is denoted by ∆ and is defined by

∆f (x) = f (x + h) − f (x). (3.1)

When x = xi then from above equation

∆f (xi ) = f (xi + h) − f (xi ), i.e. ∆yi = yi+1 − yi , i = 0, 1, 2, . . . , n − 1. (3.2)

In particular, ∆y0 = y1 − y0 , ∆y1 = y2 − y1 , . . . , ∆yn−1 = yn − yn−1 . These are called


first order differences.
The differences of the first order differences are called second order differences. The
second order differences are denoted by ∆2 y0 , ∆2 y1 , . . ..
Two second order differences are

∆2 y0 = ∆y1 − ∆y0 = (y2 − y1 ) − (y1 − y0 ) = y2 − 2y1 + y0


∆2 y1 = ∆y2 − ∆y1 = (y3 − y2 ) − (y2 − y1 ) = y3 − 2y2 + y1 .

The third order differences are also defined in similar manner, i.e.

∆3 y0 = ∆2 y1 − ∆2 y0 = (y3 − 2y2 + y1 ) − (y2 − 2y1 + y0 ) = y3 − 3y2 + 3y1 − y0


∆3 y1 = y4 − 3y3 + 3y2 − y1 .

Similarly, higher order differences can be defined.


In general,

∆^(n+1) f(x) = ∆[∆^n f(x)], i.e. ∆^(n+1) yi = ∆[∆^n yi], n = 0, 1, 2, ....    (3.3)

Again, ∆^(n+1) f(x) = ∆^n[f(x + h) − f(x)] = ∆^n f(x + h) − ∆^n f(x), and

∆^(n+1) yi = ∆^n yi+1 − ∆^n yi, n = 0, 1, 2, ....    (3.4)

It must be remembered that ∆^0 ≡ identity operator, i.e. ∆^0 f(x) = f(x), and ∆^1 ≡ ∆.


All the forward differences can be represented in a tabular form, called the forward
difference or diagonal difference table.
Let x0, x1, ..., x4 be five arguments. All the forward differences of these arguments are shown in Table 3.1.

x y ∆ ∆2 ∆3 ∆4
x0 y0
∆y0
x1 y1 ∆ 2 y0
∆y1 ∆ 3 y0
x2 y2 ∆ 2 y1 ∆4 y0
∆y2 ∆ 3 y1
x3 y3 ∆ 2 y2
∆y3
x4 y4

Table 3.1: Forward difference table.
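Such a table is straightforward to build by repeatedly differencing the previous column; a short sketch (function name ours):

    def forward_difference_table(y):
        """Return the columns y, Δy, Δ²y, ... of the diagonal difference table."""
        table = [list(y)]
        while len(table[-1]) > 1:
            prev = table[-1]
            table.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
        return table

    # y = x^2 at x = 0, 1, 2, 3, 4: second differences are constant
    for col in forward_difference_table([0, 1, 4, 9, 16]):
        print(col)
    # [0, 1, 4, 9, 16], [1, 3, 5, 7], [2, 2, 2], [0, 0], [0]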

3.1.2 Error propagation in a difference table

If any entry of the difference table is erroneous, then this error spreads over the table in a fan-shaped (convex) manner.
The propagation of error in a difference table is illustrated in Table 3.2. Let us assume that y3 is erroneous and that the amount of the error is ε.
The following observations are noted from Table 3.2.

x y ∆y ∆2 y ∆3 y ∆4 y ∆5 y
x0 y0
∆y0
x1 y1 ∆2 y0
∆y1 ∆3 y0 + ε
x2 y2 ∆2 y1 + ε ∆4 y0 − 4ε
∆y2 + ε ∆3 y1 − 3ε ∆5 y0 + 10ε
x3 y3 + ε ∆2 y2 − 2ε ∆4 y1 + 6ε
∆y3 − ε ∆3 y2 + 3ε ∆5 y1 − 10ε
x4 y4 ∆2 y3 + ε ∆4 y2 − 4ε
∆y4 ∆3 y3 − ε
x5 y5 ∆2 y4
∆y5
x6 y6

Table 3.2: Error propagation in a finite difference table.

(i) The error increases with the order of the differences.

(ii) The error is maximum (in magnitude) along the horizontal line through the erro-
neous tabulated value.

(iii) In the kth difference column, the coefficients of the errors are the binomial coefficients in the expansion of (1 − x)^k. In particular, the errors in the second difference column are ε, −2ε, ε; in the third difference column they are ε, −3ε, 3ε, −ε; and so on.

(iv) The algebraic sum of errors in any complete column is zero.

If there is any error in a single entry of the table, then we can detect and correct
it from the difference table. The position of the error in an entry can be identified by
performing the following steps.

(i) If at any stage, the differences do not follow a smooth pattern, then there is an
error.

(ii) If the differences of some order (this generally happens at higher orders) alternate in sign, then the middle entry contains an error.
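Both facts are easy to see experimentally; the sketch below (ours) perturbs one entry of a table of y = x^2, whose exact third and fourth differences vanish, and recovers the binomial error pattern:

    eps = 1.0
    y = [x**2 for x in range(9)]   # exact y has zero 3rd and 4th differences
    y[4] += eps                    # a single erroneous entry

    col = y
    for order in range(4):         # form the 4th differences
        col = [col[i + 1] - col[i] for i in range(len(col) - 1)]
    print(col)   # [1.0, -4.0, 6.0, -4.0, 1.0]: coefficients of (1 - x)^4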

Properties

Some common properties of forward difference operator are presented below:

(i) ∆c = 0, where c is a constant.

(ii) ∆[f1 (x) + f2 (x) + · · · + fn (x)]


= ∆f1 (x) + ∆f2 (x) + · · · + ∆fn (x).

(iii) ∆[cf (x)] = c∆f (x).


Combining properties (ii) and (iii), one can generalise the property (ii) as

(iv) ∆[c1 f1 (x) + c2 f2 (x) + · · · + cn fn (x)]


= c1 ∆f1 (x) + c2 ∆f2 (x) + · · · + cn ∆fn (x).

(v) ∆m ∆n f (x) = ∆m+n f (x) = ∆n ∆m f (x) = ∆k ∆m+n−k f (x),


k = 0, 1, 2, . . . , m or n.

(vi) ∆[c^x] = c^(x+h) − c^x = c^x(c^h − 1), for some constant c.

(vii) ∆[xCr] = xCr−1, where r is fixed and h = 1, since
∆[xCr] = x+1Cr − xCr = xCr−1.

Example 3.1

∆[f (x)g(x)] = f (x + h)g(x + h) − f (x)g(x)


= f (x + h)g(x + h) − f (x + h)g(x) + f (x + h)g(x) − f (x)g(x)
= f (x + h)[g(x + h) − g(x)] + g(x)[f (x + h) − f (x)]
= f (x + h)∆g(x) + g(x)∆f (x).
Also, it can be shown that

∆[f (x)g(x)] = f (x)∆g(x) + g(x + h)∆f (x)


= f (x)∆g(x) + g(x)∆f (x) + ∆f (x)∆g(x).

Example 3.2 Show that

∆[f(x)/g(x)] = [g(x)∆f(x) − f(x)∆g(x)] / [g(x + h)g(x)], g(x) ≠ 0.

Solution.

∆[f(x)/g(x)] = f(x + h)/g(x + h) − f(x)/g(x)
             = [f(x + h)g(x) − g(x + h)f(x)] / [g(x + h)g(x)]
             = {g(x)[f(x + h) − f(x)] − f(x)[g(x + h) − g(x)]} / [g(x + h)g(x)]
             = [g(x)∆f(x) − f(x)∆g(x)] / [g(x + h)g(x)].

In particular, when the numerator is 1,

∆[1/f(x)] = −∆f(x) / [f(x + h)f(x)].

3.1.3 Backward difference operator

The symbol ∇ is used to represent backward difference operator. The backward


difference operator is defined as

∇f (x) = f (x) − f (x − h). (3.5)

When x = xi , the above relation reduces to

∇yi = yi − yi−1 , i = n, n − 1, . . . , 1. (3.6)

In particular,

∇y1 = y1 − y0 , ∇y2 = y2 − y1 , . . . , ∇yn = yn − yn−1 . (3.7)

These are called the first order backward differences. The second order differences
are denoted by ∇2 y2 , ∇2 y3 , . . . , ∇2 yn . First two second order backward differences are
∇2 y2 = ∇(∇y2 ) = ∇(y2 − y1 ) = ∇y2 − ∇y1 = (y2 − y1 ) − (y1 − y0 ) = y2 − 2y1 + y0 , and
∇2 y3 = y3 − 2y2 + y1 , ∇2 y4 = y4 − 2y3 + y2 .
The other second order differences can be obtained in similar manner.

In general,

∇k yi = ∇k−1 yi − ∇k−1 yi−1 , i = n, n − 1, . . . , k, (3.8)

where ∇0 yi = yi , ∇1 yi = ∇yi .
Like forward differences, these backward differences can be written in a tabular form,
called backward difference or horizontal difference table.
The backward difference table for the arguments x0, x1, ..., x4 is shown in Table 3.3.

x y ∇ ∇2 ∇3 ∇4
x0 y0
x1 y1 ∇y1
x2 y2 ∇y2 ∇ 2 y2
x3 y3 ∇y3 ∇ 2 y3 ∇ 3 y3
x4 y4 ∇y4 ∇ 2 y4 ∇ 3 y4 ∇4 y4

Table 3.3: Backward difference table.

It is observed that, for a given table of values, the forward and backward difference tables contain the same numbers. Practically there is no difference among the values in the two tables, but theoretically they have separate significance.

3.1.4 Central difference operator

There is another kind of finite difference operator known as central difference operator.
This operator is denoted by δ and is defined by

δf (x) = f (x + h/2) − f (x − h/2). (3.9)

When x = xi , then the first order central difference, in terms of ordinates is

δyi = yi+1/2 − yi−1/2 (3.10)

where yi+1/2 = f (xi + h/2) and yi−1/2 = f (xi − h/2).


In particular, δy1/2 = y1 − y0 , δy3/2 = y2 − y1 , . . . , δyn−1/2 = yn − yn−1 .
The second order central differences are

δ^2 yi = δyi+1/2 − δyi−1/2 = (yi+1 − yi) − (yi − yi−1) = yi+1 − 2yi + yi−1.

In general,

δ n yi = δ n−1 yi+1/2 − δ n−1 yi−1/2 . (3.11)

All central differences for the five arguments x0, x1, ..., x4 are shown in Table 3.4.

x y δ δ2 δ3 δ4
x0 y0
δy1/2
x1 y1 δ 2 y1
δy3/2 δ 3 y3/2
x2 y2 δ 2 y2 δ 4 y2
δy5/2 δ 3 y5/2
x3 y3 δ 2 y3
δy7/2
x4 y4

Table 3.4: Central difference table.

It may be observed that all odd (even) order differences have fractional (integral) suffixes.

3.1.5 Shift, average and differential operators

Shift operator, E:

The shift operator is denoted by E and is defined by

Ef (x) = f (x + h). (3.12)

In terms of y, the above formula becomes

Eyi = yi+1 . (3.13)

Note that the shift operator increases the subscript of y by one. When the shift operator is applied twice to the function f(x), the subscript of y is increased by 2.

That is,

E^2 f(x) = E[Ef(x)] = E[f(x + h)] = f(x + 2h).    (3.14)

In general,

E^n f(x) = f(x + nh), i.e. E^n yi = yi+n.    (3.15)

The inverse shift operator can also be defined in a similar manner. It is denoted by E^(−1) and is defined by

E^(−1) f(x) = f(x − h).    (3.16)

Similarly, second and higher order inverse operators are defined as follows:

E^(−2) f(x) = f(x − 2h) and E^(−n) f(x) = f(x − nh).    (3.17)

The general definition of the shift operator is

E^r f(x) = f(x + rh),    (3.18)

where r may be any positive or negative rational number.

Properties

A few common properties of the E operator are given below:

(i) Ec = c, where c is a constant.

(ii) E{cf (x)} = cEf (x).

(iii) E{c1 f1(x) + c2 f2(x) + ··· + cn fn(x)}
   = c1 Ef1(x) + c2 Ef2(x) + ··· + cn Efn(x).

(iv) E^m E^n f(x) = E^n E^m f(x) = E^(m+n) f(x).

(v) E^n E^(−n) f(x) = f(x).
   In particular, EE^(−1) ≡ I, where I is the identity operator; it is sometimes denoted by 1.

(vi) (E^n)^m f(x) = E^(mn) f(x).

(vii) E[f(x)/g(x)] = Ef(x)/Eg(x).

(viii) E{f(x) g(x)} = Ef(x) Eg(x).

(ix) E∆f(x) = ∆Ef(x).

(x) ∆^m f(x) = ∇^m E^m f(x) = E^m ∇^m f(x) and ∇^m f(x) = ∆^m E^(−m) f(x) = E^(−m) ∆^m f(x).

Average operator, µ:

The average operator is denoted by µ and is defined by


1 
µf (x) = f (x + h/2) + f (x − h/2)
2
In terms of y, the above definition becomes
1 
µyi = yi+1/2 + yi−1/2 .
2
Here the average of the values of f (x) at two points (x + h/2) and f (x − h/2) is taken
as the value of µf (x).

Differential operator, D:

The differential operator is well known from differential calculus; it is denoted by D and gives the derivative. That is,

D f(x) = (d/dx) f(x) = f′(x),    (3.19)
D^2 f(x) = (d^2/dx^2) f(x) = f″(x),    (3.20)
.....................
D^n f(x) = (d^n/dx^n) f(x) = f^(n)(x).    (3.21)


3.1.6 Factorial notation

The factorial notation is a very useful notation in the calculus of finite differences. Using this notation one can find differences of all orders by the rules used in differential calculus. It is also a very useful and simple notation for finding anti-differences. The nth factorial of x is denoted by x^(n) and is defined by

x^(n) = x(x − h)(x − 2h) ··· (x − (n−1)h),    (3.22)

where each factor is decreased from the previous one by h, and x^(0) = 1.
Similarly, the nth negative factorial of x is defined by

x^(−n) = 1 / [x(x + h)(x + 2h) ··· (x + (n−1)h)].    (3.23)

A very interesting and obvious relation is x^(n) · x^(−n) ≠ 1.


The following results show the similarity between the factorial notation and the differential operator.
Property 3.1 ∆x^(n) = nh x^(n−1).
Proof.

∆x^(n) = (x + h)x(x − h) ··· (x + h − (n−1)h) − x(x − h)(x − 2h) ··· (x − (n−1)h)
       = x(x − h)(x − 2h) ··· (x − (n−2)h)[(x + h) − (x − (n−1)h)]
       = nh x^(n−1).

Note that this property is analogous to the differential formula D(x^n) = nx^(n−1) when h = 1.
The above formula can also be used to find anti-differences (like integration in integral calculus), as

∆^(−1) x^(n−1) = x^(n) / (nh).    (3.24)
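Property 3.1 can be verified numerically; a small sketch (function name ours) for n = 3 and h = 0.5:

    def falling(x, n, h=1.0):
        """The nth factorial of x: x(x-h)(x-2h)...(x-(n-1)h)."""
        p = 1.0
        for i in range(n):
            p *= x - i * h
        return p

    x, n, h = 2.5, 3, 0.5
    lhs = falling(x + h, n, h) - falling(x, n, h)     # Δ x^(n)
    rhs = n * h * falling(x, n - 1, h)                # n h x^(n-1)
    print(lhs, rhs)   # both 7.5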


3.2 Relations among operators

Many useful and interesting relations can be derived among the operators discussed above. First of all, we determine the relation between the forward and backward difference operators.

∆yi = yi+1 − yi = ∇yi+1 = δyi+1/2,
∆^2 yi = yi+2 − 2yi+1 + yi = ∇^2 yi+2 = δ^2 yi+1,

etc. In general,

∆^n yi = ∇^n yi+n, i = 0, 1, 2, ....    (3.25)

There is a good relation between E and ∆ operators.


∆f (x) = f (x + h) − f (x) = Ef (x) − f (x) = (E − 1)f (x).

From this relation one can conclude that the operators ∆ and E − 1 are equivalent.
That is,

∆ ≡ E − 1 or E ≡ ∆ + 1.    (3.26)

The relation between ∇ and E operators is derived below:


∇f(x) = f(x) − f(x − h) = f(x) − E^(−1) f(x) = (1 − E^(−1)) f(x).

That is,
∇ ≡ 1 − E^(−1).    (3.27)

The expression for higher order forward differences in terms of function values can be derived in the following way:

∆^3 y0 = (E − 1)^3 y0 = (E^3 − 3E^2 + 3E − 1)y0 = y3 − 3y2 + 3y1 − y0.

The relation between the operators δ and E is given below:

δf(x) = f(x + h/2) − f(x − h/2) = E^(1/2) f(x) − E^(−1/2) f(x) = (E^(1/2) − E^(−1/2)) f(x).

That is,

δ ≡ E^(1/2) − E^(−1/2).    (3.28)

The average operator µ is expressed in terms of E and δ as follows:

µf(x) = (1/2)[f(x + h/2) + f(x − h/2)]
      = (1/2)[E^(1/2) f(x) + E^(−1/2) f(x)] = (1/2)(E^(1/2) + E^(−1/2)) f(x).

Thus,

µ ≡ (1/2)(E^(1/2) + E^(−1/2)).    (3.29)

Again,

µ^2 f(x) = (1/4)(E^(1/2) + E^(−1/2))^2 f(x)
         = (1/4)[(E^(1/2) − E^(−1/2))^2 + 4] f(x) = (1/4)[δ^2 + 4] f(x).

Hence,

µ ≡ √(1 + δ^2/4).    (3.30)
Every operator defined earlier can be expressed in terms of other operator(s). Few
more relations among the operators ∆, ∇, E and δ are deduced in the following.

∇E f(x) = ∇f(x + h) = f(x + h) − f(x) = ∆f(x).

Also,

δE^(1/2) f(x) = δf(x + h/2) = f(x + h) − f(x) = ∆f(x).

Thus,

∆ ≡ ∇E ≡ δE^(1/2).    (3.31)

There is a very nice relation between the operators E and D, deduced below.

E f(x) = f(x + h) = f(x) + h f′(x) + (h^2/2!) f″(x) + (h^3/3!) f‴(x) + ···   [by Taylor's series]
       = f(x) + hD f(x) + (h^2/2!) D^2 f(x) + (h^3/3!) D^3 f(x) + ···
       = [1 + hD + (hD)^2/2! + (hD)^3/3! + ···] f(x)
       = e^(hD) f(x).
12
......................................................................................

Hence,
E ≡ ehD . (3.32)

This result can also be written as


hD ≡ log E. (3.33)
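The identity E ≡ e^(hD) can be checked numerically. The sketch below (an illustration of mine, assuming f = sin so that all derivatives are known in closed form) sums a truncated exponential series of hD and compares it with the true shift f(x + h).

import math

def shift_by_exponential_series(x, h, terms=12):
    # Derivatives of sin cycle with period 4: sin, cos, -sin, -cos.
    cycle = [math.sin, math.cos,
             lambda t: -math.sin(t), lambda t: -math.cos(t)]
    return sum((h ** k) / math.factorial(k) * cycle[k % 4](x)
               for k in range(terms))

x, h = 0.7, 0.3
print(math.sin(x + h))                     # E f(x) = f(x + h)
print(shift_by_exponential_series(x, h))   # e^(hD) f(x), same value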

The relation between the operators D and δ is deduced below:

δf(x) = [E^(1/2) − E^(−1/2)]f(x) = [e^(hD/2) − e^(−hD/2)]f(x)
      = 2 sinh(hD/2) f(x).

Thus,

δ ≡ 2 sinh(hD/2).   Similarly, µ ≡ cosh(hD/2).   (3.34)

Again,

µδ ≡ 2 cosh(hD/2) sinh(hD/2) ≡ sinh(hD).   (3.35)

This relation gives the inverse result,

hD ≡ sinh⁻¹(µδ).   (3.36)

From the relation (3.33), and using the relations E ≡ 1 + ∆ and E⁻¹ ≡ 1 − ∇, we obtain

hD ≡ log E ≡ log(1 + ∆) ≡ − log(1 − ∇) ≡ sinh⁻¹(µδ).   (3.37)

Some operators are commutative with other operators. For example, µ and E are commutative, as

µEf(x) = µf(x + h) = (1/2)[f(x + 3h/2) + f(x + h/2)],

and

Eµf(x) = E·(1/2)[f(x + h/2) + f(x − h/2)] = (1/2)[f(x + 3h/2) + f(x + h/2)].

Hence,

µE ≡ Eµ.   (3.38)

Example 3.3 Prove the following relations.

(i) (1 + ∆)(1 − ∇) ≡ 1
(ii) µ ≡ cosh(hD/2)
(iii) µδ ≡ (∆ + ∇)/2
(iv) ∆∇ ≡ ∇∆ ≡ δ²
(v) µδ ≡ ∆E⁻¹/2 + ∆/2
(vi) E^(1/2) ≡ µ + δ/2
(vii) 1 + δ²µ² ≡ (1 + δ²/2)²
(viii) ∆ ≡ δ²/2 + δ√(1 + δ²/4).

Solution. (i) (1 + ∆)(1 − ∇)f(x) = (1 + ∆)[f(x) − {f(x) − f(x − h)}]
= (1 + ∆)f(x − h) = f(x − h) + f(x) − f(x − h)
= f(x).
Therefore,

(1 + ∆)(1 − ∇) ≡ 1.   (3.39)

(ii)
µf(x) = (1/2)[E^(1/2) + E^(−1/2)]f(x) = (1/2)[e^(hD/2) + e^(−hD/2)]f(x)
      = cosh(hD/2) f(x).
2
(iii)
[(∆ + ∇)/2] f(x) = (1/2)[∆f(x) + ∇f(x)]
= (1/2)[f(x + h) − f(x) + f(x) − f(x − h)]
= (1/2)[f(x + h) − f(x − h)] = (1/2)[E − E⁻¹]f(x)
= µδf(x)   (as in the previous case).

Thus,

µδ ≡ (∆ + ∇)/2.   (3.40)

(iv) ∆∇f(x) = ∆[f(x) − f(x − h)] = f(x + h) − 2f(x) + f(x − h).

Again,

∇∆f(x) = f(x + h) − 2f(x) + f(x − h) = (E − 2 + E⁻¹)f(x)
       = (E^(1/2) − E^(−1/2))² f(x) = δ²f(x).

Hence,

∆∇ ≡ ∇∆ ≡ (E^(1/2) − E^(−1/2))² ≡ δ².   (3.41)

(v)
[∆E⁻¹/2 + ∆/2] f(x) = (1/2)[∆f(x − h) + ∆f(x)]
= (1/2)[f(x) − f(x − h) + f(x + h) − f(x)]
= (1/2)[f(x + h) − f(x − h)] = (1/2)[E − E⁻¹]f(x)
= (1/2)(E^(1/2) + E^(−1/2))(E^(1/2) − E^(−1/2))f(x)
= µδf(x).

Hence,

∆E⁻¹/2 + ∆/2 ≡ µδ.   (3.42)
   
(vi) [µ + δ/2] f(x) = {(1/2)[E^(1/2) + E^(−1/2)] + (1/2)[E^(1/2) − E^(−1/2)]} f(x) = E^(1/2) f(x).
Thus,

E^(1/2) ≡ µ + δ/2.   (3.43)
(vii) δµf(x) = (1/2)(E^(1/2) + E^(−1/2))(E^(1/2) − E^(−1/2))f(x) = (1/2)[E − E⁻¹]f(x).
Therefore,

(1 + δ²µ²)f(x) = [1 + (1/4)(E − E⁻¹)²] f(x)
= [1 + (1/4)(E² − 2 + E⁻²)] f(x) = (1/4)(E + E⁻¹)² f(x)
= [1 + (1/2)(E^(1/2) − E^(−1/2))²]² f(x) = (1 + δ²/2)² f(x).

Hence,

1 + δ²µ² ≡ (1 + δ²/2)².   (3.44)

(viii)
[δ²/2 + δ√(1 + δ²/4)] f(x)
= (1/2)(E^(1/2) − E^(−1/2))² f(x) + (E^(1/2) − E^(−1/2)) √(1 + (1/4)(E^(1/2) − E^(−1/2))²) f(x)
= (1/2)[E + E⁻¹ − 2] f(x) + (1/2)(E^(1/2) − E^(−1/2))(E^(1/2) + E^(−1/2)) f(x)
= (1/2)[E + E⁻¹ − 2] f(x) + (1/2)(E − E⁻¹) f(x)
= (E − 1) f(x).

Hence,

δ²/2 + δ√(1 + δ²/4) ≡ E − 1 ≡ ∆.   (3.45)

In Table 3.5, it is shown that any operator can be expressed with the help of another operator.

      in terms of E          | in terms of ∆             | in terms of ∇             | in terms of δ               | in terms of hD
E     E                      | 1 + ∆                     | (1 − ∇)⁻¹                 | 1 + δ²/2 + δ√(1 + δ²/4)     | e^(hD)
∆     E − 1                  | ∆                         | (1 − ∇)⁻¹ − 1             | δ²/2 + δ√(1 + δ²/4)         | e^(hD) − 1
∇     1 − E⁻¹                | 1 − (1 + ∆)⁻¹             | ∇                         | −δ²/2 + δ√(1 + δ²/4)        | 1 − e^(−hD)
δ     E^(1/2) − E^(−1/2)     | ∆(1 + ∆)^(−1/2)           | ∇(1 − ∇)^(−1/2)           | δ                           | 2 sinh(hD/2)
µ     (E^(1/2) + E^(−1/2))/2 | (1 + ∆/2)(1 + ∆)^(−1/2)   | (1 − ∇/2)(1 − ∇)^(−1/2)   | √(1 + δ²/4)                 | cosh(hD/2)
hD    log E                  | log(1 + ∆)                | − log(1 − ∇)              | 2 sinh⁻¹(δ/2)               | hD

Table 3.5: Relationship between the operators.
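As a sanity check on Table 3.5, the short sketch below (my own; the helper names mu and delta are assumptions) verifies the entry µ ≡ √(1 + δ²/4), in the squared form µ² ≡ 1 + δ²/4, on an arbitrary smooth function.

import math

h = 0.2
f = math.exp                      # any smooth test function works

def mu(g, x):                     # averaging operator
    return 0.5 * (g(x + h / 2) + g(x - h / 2))

def delta(g, x):                  # central difference operator
    return g(x + h / 2) - g(x - h / 2)

x = 1.3
lhs = mu(lambda t: mu(f, t), x)                       # mu^2 f(x)
rhs = f(x) + 0.25 * delta(lambda t: delta(f, t), x)   # (1 + delta^2/4) f(x)
print(lhs, rhs)   # identical up to rounding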

From the earlier discussion we noticed that there is an approximate equality between the ∆ operator and the derivative. These relations are presented below.
By the definition of the derivative,

f′(x) = lim_{h→0} [f(x + h) − f(x)]/h = lim_{h→0} ∆f(x)/h.

Thus, ∆f(x) ≃ hf′(x) = hDf(x).
Again,

f″(x) = lim_{h→0} [f′(x + h) − f′(x)]/h
      ≃ lim_{h→0} [∆f(x + h)/h − ∆f(x)/h]/h
      = lim_{h→0} [∆f(x + h) − ∆f(x)]/h² = lim_{h→0} ∆²f(x)/h².

Hence, ∆²f(x) ≃ h²f″(x) = h²D²f(x).
In general, ∆ⁿf(x) ≃ hⁿf⁽ⁿ⁾(x) = hⁿDⁿf(x). That is, for small values of h, the operators ∆ⁿ and hⁿDⁿ are almost equal.

3.3 Polynomial using factorial notation

According to the definition of factorial notation, one can write

x^(0) = 1
x^(1) = x
x^(2) = x(x − h)   (3.46)
x^(3) = x(x − h)(x − 2h)
x^(4) = x(x − h)(x − 2h)(x − 3h)

and so on.
From these equations it is obvious that the base terms (x, x², x³, . . .) of a polynomial can be expressed in terms of the factorial notations x^(1), x^(2), x^(3), . . ., as shown below.

1 = x^(0)
x = x^(1)
x² = x^(2) + hx^(1)   (3.47)
x³ = x^(3) + 3hx^(2) + h²x^(1)
x⁴ = x^(4) + 6hx^(3) + 7h²x^(2) + h³x^(1)

and so on.
Note that the degree of x^k (for any k = 1, 2, 3, . . .) remains unchanged when it is expressed in factorial notation. This observation leads to the following lemma.
Lemma 3.1 Any polynomial f(x) in x can be expressed in factorial notation with the same degree.
Since all the base terms of a polynomial are expressed in terms of factorial notation,
every polynomial can be written with the help of factorial notation. Once a polynomial
is expressed in a factorial notation, then its differences can be determined by using the
formula like differential calculus.

Example 3.4 Express f(x) = 10x⁴ − 41x³ + 4x² + 3x + 7 in factorial notation and find its first and second differences.
Solution. For simplicity, we assume that h = 1.
Now by (3.47), x = x^(1), x² = x^(2) + x^(1), x³ = x^(3) + 3x^(2) + x^(1),
x⁴ = x^(4) + 6x^(3) + 7x^(2) + x^(1).
Substituting these values into the function f(x), we obtain

f(x) = 10[x^(4) + 6x^(3) + 7x^(2) + x^(1)] − 41[x^(3) + 3x^(2) + x^(1)] + 4[x^(2) + x^(1)] + 3x^(1) + 7
     = 10x^(4) + 19x^(3) − 49x^(2) − 24x^(1) + 7.

Now, the relation ∆x^(n) = nx^(n−1) (Property 3.1 with h = 1) is used to find the first and second order differences. Therefore,
∆f(x) = 10·4x^(3) + 19·3x^(2) − 49·2x^(1) − 24·1x^(0) = 40x^(3) + 57x^(2) − 98x^(1) − 24
      = 40x(x − 1)(x − 2) + 57x(x − 1) − 98x − 24 = 40x³ − 63x² − 75x − 24,
and ∆²f(x) = 120x^(2) + 114x^(1) − 98 = 120x(x − 1) + 114x − 98 = 120x² − 6x − 98.
The above process of converting a polynomial into factorial notation is very laborious when the degree of the polynomial is large. There is a systematic method, similar to Maclaurin's formula in differential calculus, for converting a polynomial into factorial notation. This technique is also useful for a function which satisfies Maclaurin's theorem for infinite series.
Let f(x) be a polynomial in x of degree n. We assume that in factorial notation f(x) has the following form

f (x) = a0 + a1 x(1) + a2 x(2) + · · · + an x(n) , (3.48)



where the ai's are unknown constants to be determined and an ≠ 0.

Now, we determine the successive differences of (3.48) as follows.

∆f(x) = a1 + 2a2x^(1) + 3a3x^(2) + · · · + n an x^(n−1)
∆²f(x) = 2·1 a2 + 3·2 a3x^(1) + · · · + n(n − 1) an x^(n−2)
∆³f(x) = 3·2·1 a3 + 4·3·2 a4x^(1) + · · · + n(n − 1)(n − 2) an x^(n−3)
· · · · · · · · · · · · · · · · · · · · · · · ·
∆ⁿf(x) = n(n − 1)(n − 2) · · · 3·2·1 an = n! an.

Substituting x = 0 into the above relations, we obtain

a0 = f(0),   ∆f(0) = a1,
∆²f(0) = 2·1 a2,  or  a2 = ∆²f(0)/2!,
∆³f(0) = 3·2·1 a3,  or  a3 = ∆³f(0)/3!,
· · · · · · · · · · · · · · · · · ·
∆ⁿf(0) = n! an,  or  an = ∆ⁿf(0)/n!.

Using these results, equation (3.48) becomes

f(x) = f(0) + ∆f(0) x^(1) + [∆²f(0)/2!] x^(2) + [∆³f(0)/3!] x^(3) + · · · + [∆ⁿf(0)/n!] x^(n).   (3.49)

Observe that this formula is similar to Maclaurin's formula of differential calculus. It can also be used to expand a function in terms of factorial notation. To expand a function in factorial notation, the forward differences at x = 0 are needed; these can be determined using the forward difference table. The entire method is explained with the help of the following example.

Example 3.5 Express f(x) = 15x⁴ − 3x³ − 6x² + 11 in factorial notation.

Solution. Let h = 1. For the given function, f (0) = 11, f (1) = 17, f (2) = 203, f (3) =
1091, f (4) = 3563.

x      f(x)    ∆f(x)    ∆²f(x)    ∆³f(x)    ∆⁴f(x)
0        11
                  6
1        17               180
                186                  522
2       203               702                  360
                888                  882
3      1091              1584
               2472
4      3563

Thus, by formula (3.49),
f(x) = f(0) + ∆f(0) x^(1) + [∆²f(0)/2!] x^(2) + [∆³f(0)/3!] x^(3) + [∆⁴f(0)/4!] x^(4)
     = 15x^(4) + 87x^(3) + 90x^(2) + 6x^(1) + 11.
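The difference-table construction lends itself to a few lines of code. The sketch below (mine, with h = 1 as in the example; factorial_coefficients is a hypothetical name) builds the forward differences at x = 0 and reads off the coefficients ∆^k f(0)/k! of formula (3.49).

from math import factorial

def factorial_coefficients(values):
    """values = [f(0), f(1), ..., f(n)]; returns [a0, a1, ..., an]."""
    coeffs, row = [], list(values)
    for k in range(len(values)):
        coeffs.append(row[0] / factorial(k))     # Delta^k f(0) / k!
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
    return coeffs

f = lambda x: 15 * x**4 - 3 * x**3 - 6 * x**2 + 11
print(factorial_coefficients([f(x) for x in range(5)]))
# -> [11.0, 6.0, 90.0, 87.0, 15.0], i.e. f(x) = 15x^(4) + 87x^(3) + 90x^(2) + 6x^(1) + 11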

There is another method to find the coefficients of a polynomial in factorial notation,


presented below.

Example 3.6 Find f(x), if ∆f(x) = x⁴ − 10x³ + 11x² + 5x + 3.

Solution. Synthetic division is used to express ∆f(x) in factorial notation (with h = 1): the constant term 3 is the first remainder, and the successive quotients are divided by (x − 1), (x − 2), (x − 3):

1 |  1  −10   11    5    3
  |       1   −9    2
2 |  1   −9    2    7
  |       2  −14
3 |  1   −7  −12
  |       3
4 |  1   −4

     1

The remainders 3, 7, −12, −4 and the final quotient 1 are the coefficients of x^(0), x^(1), x^(2), x^(3), x^(4) respectively. Therefore,
∆f(x) = x^(4) − 4x^(3) − 12x^(2) + 7x^(1) + 3.
Hence, using (3.24),

f(x) = (1/5)x^(5) − (4/4)x^(4) − (12/3)x^(3) + (7/2)x^(2) + 3x^(1) + c
     = (1/5)x(x − 1)(x − 2)(x − 3)(x − 4) − x(x − 1)(x − 2)(x − 3)
       − 4x(x − 1)(x − 2) + (7/2)x(x − 1) + 3x + c,  where c is an arbitrary constant.

3.4 Difference of a polynomial

Let f(x) = a0xⁿ + a1xⁿ⁻¹ + · · · + a_{n−1}x + an be a polynomial in x of degree n, where the ai's are the given coefficients.
Suppose f(x) = b0x^(n) + b1x^(n−1) + b2x^(n−2) + · · · + b_{n−1}x^(1) + bn is the same polynomial in terms of factorial notation. The coefficients bi can be determined by any of the methods discussed earlier.
Now,
∆f(x) = b0nh x^(n−1) + b1h(n − 1)x^(n−2) + b2h(n − 2)x^(n−3) + · · · + b_{n−1}h.
Clearly this is a polynomial of degree n − 1.
Similarly,

∆²f(x) = b0n(n − 1)h²x^(n−2) + b1(n − 1)(n − 2)h²x^(n−3) + · · · + b_{n−2}h²,

∆³f(x) = b0n(n − 1)(n − 2)h³x^(n−3) + b1(n − 1)(n − 2)(n − 3)h³x^(n−4)
         + · · · + b_{n−3}h³.

In this way, ∆^k f(x) = b0n(n − 1)(n − 2) · · · (n − k + 1)h^k x^(n−k) + · · ·.

Thus, finally:
∆^k f(x), k < n, is a polynomial of degree n − k;
∆ⁿf(x) = b0 n! hⁿ = n! hⁿ a0 is a constant; and
∆^k f(x) = 0 if k > n.

In particular, ∆n+1 f (x) = 0.
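A quick numerical illustration (mine) of these facts: for a degree-3 polynomial the third difference is the constant 3! h³ a0, and the fourth difference vanishes.

from math import factorial

h = 0.5
poly = lambda x: 3 * x**3 - 2 * x + 1        # n = 3, a0 = 3

def nth_difference(f, x0, order, h):
    vals = [f(x0 + i * h) for i in range(order + 1)]
    for _ in range(order):
        vals = [vals[i + 1] - vals[i] for i in range(len(vals) - 1)]
    return vals[0]

print(nth_difference(poly, 1.0, 3, h), factorial(3) * h**3 * 3)  # 2.25, 2.25
print(nth_difference(poly, 1.0, 4, h))                            # 0.0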

Example 3.7 Let ui (x) = (x − x0 )(x − x1 ) · · · (x − xi ), where xi = x0 + ih, i =


0, 1, 2, . . . , n; h > 0. Prove that
∆k ui (x) = (i + 1)i(i − 1) · · · (i − k + 2)hk (x − x0 )(x − x1 ) · · · (x − xi−k ).

Solution. Let ui (x) = (x − x0 )(x − x1 ) · · · (x − xi ) be denoted by (x − x0 )(i+1) .



Therefore,

∆ui (x) = (x + h − x0 )(x + h − x1 ) · · · (x + h − xi ) − (x − x0 ) · · · (x − xi )


= (x + h − x0 )(x − x0 )(x − x1 ) · · · (x − xi−1 )
−(x − x0 )(x − x1 ) · · · (x − xi )
= (x − x0 )(x − x1 ) · · · (x − xi−1 )[(x + h − x0 ) − (x − xi )]
= (x − x0 )(x − x1 ) · · · (x − xi−1 )(h + xi − x0 )
= (x − x0 )(x − x1 ) · · · (x − xi−1 )(i + 1)h [since xi = x0 + ih]
= (i + 1)h(x − x0 )(i) .

By similar way,

∆2 ui (x) = (i + 1)h[(x + h − x0 )(x + h − x1 ) · · · (x + h − xi−1 )


−(x − x0 )(x − x1 ) · · · (x − xi−1 )]
= (i + 1)h(x − x0 )(x − x1 ) · · · (x − xi−2 )[(x + h − x0 ) − (x − xi−1 )]
= (i + 1)h(x − x0 )(i−1) ih
= (i + 1)ih2 (x − x0 )(i−1) .

Also, ∆3 ui (x) = (i + 1)i(i − 1)h3 (x − x0 )(i−2) .


Hence, continuing in this way,

∆^k ui(x) = (i + 1)i(i − 1) · · · (i − k + 2) h^k (x − x0)^(i−k+1)
          = (i + 1)i(i − 1) · · · (i − k + 2) h^k (x − x0)(x − x1) · · · (x − xi−k).

NON LINEAR EQUATIONS
A problem that most students should be familiar
with from ordinary algebra is that of finding the
root of an equation f(x) = 0, i.e., the value of the
argument that makes f zero.
More precisely, if the function is defined as y = f(x), we seek the value α such that f(α) = 0.
The precise terminology is that α is a zero of the function f, or a root of the equation f(x) = 0.
Note that we have not yet specified what kind of function f is. The obvious case is when f is an ordinary real-valued function of a single real variable x, but we can also consider the problem when f is a vector-valued function of a vector-valued variable, in which case the expression above is a system of equations.
Broadly, the methods can be classified as:
• Bracketing methods
• Open end methods
Bracketing methods comprise:
• Bisection method
• Regula falsi or False position method
Open end methods
• Newton Raphson method
• Secant method
• Muller’s method
• Fixed point method
• Bairstow’s method
• Ramanujan’s Method
• Graeffe’s Root-squaring Method
• Quotient–difference Method
BISECTION METHOD
It is based on the theorem that if a function f(x) is continuous between a and b, and f(a) and f(b) are of opposite signs, then there must be at least one root in between.
Algorithm
Choose two point a and b such that f(a)f(b) < 0.
This means that f is negative at one point and
positive at the other.
Let c be the midpoint of the interval [a, b], i.e., c = (1/2)(a + b), and consider the product f(a)f(c). There are three possibilities:
1. f(a)f(c) < 0; this means that a root (there might be more than one) is between a and c, i.e., α ∈ [a, c].
2. f(a)f(c) = 0; since we already know f(a) ≠ 0, this means that f(c) = 0, thus α = c and we have found a root.
3. f(a)f(c) > 0; this means that a root must lie in the other half of the interval, i.e., α ∈ [c, b].
At first glance, this is helpful only if we get
the second case and land right on top of a
root, and this does not seem very likely.
However, a second look reveals that if (1) or
(3) hold, we now have a root localized to an
interval ([a, c] or [c, b]) that is half the
length of the original interval [a, b]. If we
now repeat the process, the interval of
uncertainty is again decreased in half, and so
on, until we have the root localized to within
any tolerance we desire.
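A minimal bisection sketch in Python (my own illustration; it assumes f is continuous on [a, b] with f(a)f(b) < 0):

def bisect(f, a, b, tol=1e-10, max_iter=100):
    fa = f(a)
    for _ in range(max_iter):
        c = 0.5 * (a + b)            # midpoint of the current bracket
        fc = f(c)
        if fc == 0.0 or 0.5 * (b - a) < tol:
            return c
        if fa * fc < 0:              # root lies in [a, c]
            b = c
        else:                        # root lies in [c, b]
            a, fa = c, fc
    return 0.5 * (a + b)

import math
print(bisect(lambda x: 2 - math.exp(x), 0.0, 1.0))   # ln 2 = 0.693147...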
EXAMPLE 3.1
If f(x) = 2 − e^x, and we take the original interval to be [a, b] = [0, 1], then the first several steps of the computation are as follows:
f(a) = 1, f(b) = −0.7183 ⇒ c = (0 + 1)/2 = 1/2; f(c) = 0.3513 > 0
⇒ [a, b] ← [1/2, 1];
f(a) = 0.3513, f(b) = −0.7183 ⇒ c = (1/2 + 1)/2 = 3/4; f(c) = −0.1170 < 0
⇒ [a, b] ← [1/2, 3/4];
f(a) = 0.3513, f(b) = −0.1170 ⇒ c = (1/2 + 3/4)/2 = 5/8; f(c) = 0.1318 > 0
⇒ [a, b] ← [5/8, 3/4].
Thus, we have reduced the "interval of uncertainty" from [0, 1], which has length 1, to [5/8, 3/4], which has length 1/8 = 0.125. If we were to continue the process, we would eventually have the root localized to within an interval of length as small as we want, since each step cuts the interval of uncertainty in half.
Bisection Convergence and Error:
Let [a0, b0] = [a, b] be the initial interval, with f(a)f(b) < 0. Define the approximate root as
xn = cn = (a_{n−1} + b_{n−1})/2.
Then there exists a root α ∈ [a, b] such that
|α − xn| ≤ (1/2)ⁿ (b − a).
Moreover, to achieve an accuracy of |α − xn| ≤ ε it suffices that
n ≥ [log(b − a) − log ε] / log 2.
This follows from
(bn − an) = (1/2)(b_{n−1} − a_{n−1})
(bn − an) = (1/2)ⁿ (b0 − a0)
|α − xn| ≤ (1/2)(b_{n−1} − a_{n−1}) = (1/2)(1/2)ⁿ⁻¹ (b0 − a0) = (1/2)ⁿ (b0 − a0).
False Position / Regula Falsi Method
• It is also termed as linear interpolation
method.
• Bracketing method.
A straight line is drawn from A(a1, f(a1)) to B(b1, f(b1)); the point where it intersects the abscissa, c1, is an improved estimate of the root.
• If f(a1)f(c1) < 0, then [a1, c1] brackets the
root. Otherwise, the root is in [c1, b1]. In
Figure it just so happens that [a1, c1]
brackets the root. This means the left end
is unchanged, while the right end is
adjusted to c1. Therefore, the interval that
is used in the next iteration is [a2, b2]
where a2 = a1 and b2 = c1.Continuing this
process generates a sequence c2, c3, …
The equation of the line connecting points A and B is
y − f(b1) = [ (f(b1) − f(a1)) / (b1 − a1) ] (x − b1).
To find the x-intercept, set y = 0 and solve for x = c1:
c1 = b1 − f(b1)(b1 − a1)/[f(b1) − f(a1)],
which simplifies to
c1 = [a1 f(b1) − b1 f(a1)] / [f(b1) − f(a1)].

Generalizing this result, the sequence of points that converges to the root is generated via
cn = [an f(bn) − bn f(an)] / [f(bn) − f(an)],   n = 1, 2, 3, . . . .
The terminating condition is |c_{n+1} − cn| < ε, where ε is the imposed tolerance.
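A short regula falsi sketch (mine; same bracketing assumption as bisection, with the stopping test |c_{n+1} − cn| < ε described above):

def regula_falsi(f, a, b, eps=1e-8, max_iter=100):
    c_old = a
    for _ in range(max_iter):
        c = (a * f(b) - b * f(a)) / (f(b) - f(a))   # x-intercept of the chord
        if abs(c - c_old) < eps:
            return c
        if f(a) * f(c) < 0:     # root bracketed in [a, c]
            b = c
        else:                   # root bracketed in [c, b]
            a = c
        c_old = c
    return c

import math
print(regula_falsi(lambda x: x * math.cos(x) + 1, -2.0, 4.0))  # about 2.0739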
Example:
Reconsider the equation x cos x + 1 = 0 and the interval [−2, 4] that contains its root. Letting f(x) = x cos x + 1, we have f(−2) > 0 and f(4) < 0. We will perform two steps of the regula falsi method. First, set [a1, b1] = [−2, 4]. Then,
c1 = [a1 f(b1) − b1 f(a1)] / [f(b1) − f(a1)] = 1.189493.
Since f(c1) > 0, the root must lie in [c1, b1], so the left endpoint is adjusted to a2 = c1 and the right end remains unchanged, b2 = b1. Therefore, the updated interval is [a2, b2] = [1.189493, 4]. Next,
c2 = [a2 f(b2) − b2 f(a2)] / [f(b2) − f(a2)] = 2.515720.
Newton's Method
• An open method that requires only one current guess.
• The root does not need to be bracketed.
• Consider some point x0.
• If we approximate f(x) as a line about x0, then we can again solve for the root of the line:
f(x) ≈ f′(x0)(x − x0) + f(x0)
Newton's Method
• Setting f(x) = 0 and solving leads to the following iteration:
x1 = x0 − f(x0)/f′(x0)
x_{i+1} = xi − f(xi)/f′(xi)
Newton's Method
• This can also be seen from Taylor's series.
• Assume we have a guess, xi, close to the actual root. Expand f(x) about this point, with x = xi + ∆x:
f(xi + ∆x) = f(xi) + ∆x f′(xi) + (∆x²/2!) f″(xi) + · · · = 0
• If ∆x is small, then the higher powers ∆xⁿ quickly go to zero, so truncating after the linear term gives
∆x = x_{i+1} − xi = −f(xi)/f′(xi)
Newton's Method
• Graphically, follow the tangent line down to its intersection with the x-axis.
Newton's Method — Problems
• Newton's method can fail: it needs the initial guess to be close, or the function to behave nearly linearly within the range.
Finding a square root
• Ever wonder why they call this a square root?
• Consider the roots of the equation f(x) = x² − a.
• This of course works for any power: x = a^(1/p) is the root of x^p − a = 0, p ∈ R.
Finding a square root
• Example: √2 = 1.4142135623730950488016887242097
• Let x0 be one and apply Newton's method. With f(x) = x² − 2, f′(x) = 2x:
x_{i+1} = xi − (xi² − 2)/(2xi) = (1/2)(xi + 2/xi)
x0 = 1
x1 = (1/2)(1 + 2/1) = 3/2 = 1.5000000000
x2 = (1/2)(3/2 + 4/3) = 17/12 = 1.4166666667
Finding a square root
• Example: √2 = 1.4142135623730950488016887242097
• Note the rapid convergence:
x3 = (1/2)(17/12 + 24/17) = 577/408 = 1.414215686
x4 = 1.4142135623746
x5 = 1.4142135623730950488016896
x6 = 1.4142135623730950488016887242097
• Note, this was done with the standard Microsoft calculator at maximum precision.
Finding a square-root
• Can we come up with a better initial
guess?
• Sure, just divide the exponent by 2.
• Remember the bias offset
• Use bit-masks to extract the exponent to
an integer, modify and set the initial
guess.
• For 2, this will lead to x0 = 1 (round
down).
Convergence Rate of Newton's Method
Let en = α − xn, so that α = xn + en. Then
0 = f(α) = f(xn + en)
f(xn + en) = f(xn) + en f′(xn) + (1/2) en² f″(ξn),  for some ξn ∈ (α, xn)
⇒ f(xn) + en f′(xn) = −(1/2) en² f″(ξn)
• Now,
e_{n+1} = α − x_{n+1} = α − xn + f(xn)/f′(xn) = en + f(xn)/f′(xn)
        = [en f′(xn) + f(xn)] / f′(xn)
⇒ e_{n+1} = −(1/2) [f″(ξn)/f′(xn)] en²
Convergence Rate of Newton's Method
Newton's method converges quadratically:
if en ≈ 10⁻ᵏ, then e_{n+1} ≈ c · 10⁻²ᵏ.
Newton’s Algorithm
• Requires the derivative function to be
evaluated, hence more function
evaluations per iteration.
• A robust solution would check to see if
the iteration is stepping too far and limit
the step.
• Most uses of Newton’s method assume
the approximation is pretty close and
apply one to three iterations blindly.
Division by Multiplication
• Newton's method has many uses in computing basic numbers.
• For example, consider the equation:
f(x) = 1/x − a = 0
• Newton's method gives the iteration:
x_{k+1} = xk − (1/xk − a)/(−1/xk²) = xk + xk − a xk² = xk(2 − a xk)
Reciprocal Square Root
• Another useful operator is the reciprocal square root.
• Needed to normalize vectors.
• Can be used to calculate the square root: a · (1/√a) = √a.

Reciprocal Square Root
Let f(x) = 1/x² − a = 0, so that
f′(x) = −2/x³.
Newton's iteration yields:
x_{k+1} = xk + xk/2 − (a xk³)/2 = (1/2) xk (3 − a xk²)
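A sketch of the reciprocal-square-root iteration (my own; it assumes a > 0 and a seed within the basin of convergence):

def rsqrt(a, x0, iters=5):
    x = x0
    for _ in range(iters):
        x = 0.5 * x * (3.0 - a * x * x)   # no division or square root needed
    return x

print(rsqrt(2.0, 0.7))          # -> 0.7071067811865475 = 1/sqrt(2)
print(2.0 * rsqrt(2.0, 0.7))    # -> sqrt(2), using a * (1/sqrt(a)) = sqrt(a)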
1/Sqrt(2)
• Let's look at the convergence for the reciprocal square root of 2.
1/Sqrt(x)
• What is a good choice for the initial seed point?
– Optimal: the root itself, but it is unknown.
– Consider the normalized format of the number:
(−1)^s · 2^(e−127) · (1.m)₂
– What is the reciprocal?
– What is the square root?
1/Sqrt(x)
• Current GPUs provide this operation in as little as 2 clock cycles!!! How?
• How many significant bits does this estimate have?
1/Sqrt(x)
GPUs such as nVidia's FX cards provide a 23-bit accurate reciprocal square root in two clock cycles, by doing only 2 iterations of Newton's method:
Need 24 bits of precision ⇒
the previous iteration had 12 bits of precision, and
the starting estimate had 6 bits of precision
(quadratic convergence doubles the number of correct bits each iteration).
1/Sqrt(x)

Examine the mantissa term again (1.m).


Possible patterns are:
1.000…, 1.100…, 1.010…, 1.110…, …
Pre-compute these and store the results in a
table. Fast and easy table look-up.
A 6-bit table look-up is only 64 words of on
chip cache.
Note, we only need to look-up on m, not
1.m.
This yields a reciprocal square-root for the
first seven bits, giving us about 6-bits of
precision.
1/Sqrt(x)
Slight problem:
The 1.m produces a result between 1 and 2. Hence, it remains normalized, 1.m′.
For 1/√x, we get a number between 1/2 and 1.
Need to shift the exponent.
Secant Method
What if we do not know the derivative of
f(x)?
Secant Method

As we converge on the root, the secant line


approaches the tangent.
Hence, we can use the secant line as an
estimate and look at where it intersects the
x-axis (its root).
Secant Method
This also works by looking at the definition of the derivative:
f′(x) = lim_{h→0} [f(x + h) − f(x)] / h
Approximating with the two most recent iterates,
f′(xk) ≈ [f(xk) − f(x_{k−1})] / (xk − x_{k−1})
Therefore, Newton's method gives:
x_{k+1} = xk − [ (xk − x_{k−1}) / (f(xk) − f(x_{k−1})) ] f(xk)
which is the Secant Method.
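A compact secant sketch (mine; it assumes two starting guesses and no bracketing):

def secant(f, x0, x1, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)   # secant step
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

import math
print(secant(lambda x: x * math.cos(x) + 1, 1.0, 1.5))  # root near 2.0739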
Convergence Rate of the Secant Method
Using Taylor's series, it can be shown (proof is in the book) that:
|e_{k+1}| = |α − x_{k+1}| = (1/2) |f″(ξk)/f′(ξk)| |ek| |e_{k−1}| ≈ c |ek| |e_{k−1}|
This is a recursive definition of the error term. Expressed out, we can say that:
|e_{k+1}| ≈ C |ek|^φ,  where φ = (1 + √5)/2 ≈ 1.62.
We call this super-linear convergence.
Convergence Rate of the Secant Method
Consider x cos x + 1 = 0. Using the secant method with x1 = 1 and x2 = 1.5, calculate x3 and x4 of the sequence that eventually converges to the root.

x3 = x2 − [ (x2 − x1) / (f(x2) − f(x1)) ] f(x2)
   = 1.5 − [ (1.5 − 1) / (1.106106 − 1.540302) ] · 1.106106 = 2.7737

x4 = x3 − [ (x3 − x2) / (f(x3) − f(x2)) ] f(x3) = 2.0229
Fixed-Point Iteration / Iteration Method / Method of Successive Substitution
The equation f(x) = 0 may be written as f(x) = x − φ(x) = 0.
Hence x = φ(x), and the iteration is
x_{k+1} = φ(xk).
Many problems also take on the specialized form g(x) = x, where we seek the x that satisfies this equation.
Fixed-Point Iteration
• Newton's iteration and the Secant method are of course in this form.
• In the limit, f(xk) = 0, hence x_{k+1} = xk.
• There are many ways of transforming f(x) = 0 into the form x = φ(x);
e.g. x³ + x² − 2 = 0 can be written in any of the following ways:
x = √(2/(1 + x)),   x = √(2 − x³),   x = (2 − x²)^(1/3)
• x1 = φ(x0), . . . , x_{k+1} = φ(xk).
• The resulting sequence may not converge to a definite number.
• If the sequence converges to a definite number ξ, then ξ will be a root of the equation x = φ(x).
• Let x_{n+1} = φ(xn).
• As n increases, x_{n+1} → ξ
• and φ(xn) → φ(ξ).
• Hence, if the sequence converges to ξ, then ξ = φ(ξ), i.e. ξ is a root.
• Since x1 = φ(x0),
• ξ − x1 = φ(ξ) − φ(x0) = (ξ − x0) φ′(ξ0),   x0 < ξ0 < ξ
• ξ − x2 = (ξ − x1) φ′(ξ1),   x1 < ξ1 < ξ
• ξ − x3 = (ξ − x2) φ′(ξ2),   x2 < ξ2 < ξ
• · · ·
• ξ − x_{n+1} = (ξ − xn) φ′(ξn),   xn < ξn < ξ
If we assume |φ′(ξi)| ≤ k for all i, then the above equations yield
|ξ − x1| ≤ k |ξ − x0|
|ξ − x2| ≤ k |ξ − x1|
|ξ − x3| ≤ k |ξ − x2|
· · ·   |ξ − x_{n+1}| ≤ k |ξ − xn|
From these we get
|ξ − x_{n+1}| ≤ kⁿ⁺¹ |ξ − x0|.
For k < 1, i.e. |φ′(ξi)| < 1, the RHS tends to zero and the sequence of approximations x0, x1, x2, . . . converges to the root ξ.
If we express the equation f(x) = 0 in the form x = φ(x), then φ(x) must be such that |φ′(x)| < 1 in the immediate neighbourhood of the root. If the initial approximation x0 is chosen in an interval containing the root ξ where this holds, the sequence of approximations converges to the root ξ.
Let us prove that the root is unique.
Let ξ1 and ξ2 be two roots of the equation x = φ(x).
Then ξ1 = φ(ξ1) and ξ2 = φ(ξ2).
Hence |ξ1 − ξ2| = |φ(ξ1) − φ(ξ2)| = |ξ1 − ξ2| |φ′(η)|,   η ∈ (ξ1, ξ2),
so that |ξ1 − ξ2| · |1 − φ′(η)| = 0.
Since |φ′(η)| < 1, this means ξ1 = ξ2, i.e. the root is unique.
Errors in the roots obtained:
|ξ − xn| ≤ k |ξ − x_{n−1}|
         = k |ξ − xn + xn − x_{n−1}|
         ≤ k [ |ξ − xn| + |xn − x_{n−1}| ]
⇒ |ξ − xn| ≤ [k/(1 − k)] |xn − x_{n−1}| = [k/(1 − k)] kⁿ⁻¹ |x1 − x0|
           = [kⁿ/(1 − k)] |x1 − x0|
It shows that convergence is faster for smaller k.
Let ε be the specified accuracy: |ξ − xn| ≤ ε.
The above bound shows this is achieved when |xn − x_{n−1}| ≤ [(1 − k)/k] ε.
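A fixed-point sketch (my own illustration) using one of the rearrangements of x³ + x² − 2 = 0 given above; near the root x = 1 this φ has |φ′| < 1, so the iteration converges:

def fixed_point(phi, x0, tol=1e-10, max_iter=200):
    x = x0
    for _ in range(max_iter):
        x_new = phi(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

import math
print(fixed_point(lambda x: math.sqrt(2.0 / (1.0 + x)), 0.5))  # -> 1.0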
Acceleration of Convergence: Aitken's ∆² process
From the relation
|ξ − x_{n+1}| = |φ(ξ) − φ(xn)| ≤ k |ξ − xn|,   k < 1,
the iteration is linearly convergent. This slow rate of convergence can be accelerated using Aitken's method.
Let x_{i−1}, xi and x_{i+1} be three successive approximations to the root of the equation x = φ(x). Then
ξ − xi = k(ξ − x_{i−1}),   ξ − x_{i+1} = k(ξ − xi).
Dividing the two relations and simplifying yields
ξ = x_{i+1} − (x_{i+1} − xi)² / (x_{i+1} − 2xi + x_{i−1}).
The denominator is ∆²x_{i−1} and the numerator is (∆xi)², hence the above equation is written as
ξ ≈ x_{i+1} − (∆xi)² / ∆²x_{i−1}.
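A sketch of Aitken's ∆² acceleration (mine), applying the formula above to three successive fixed-point iterates:

def aitken(phi, x0, sweeps=3):
    x = x0
    for _ in range(sweeps):
        x1, x2 = phi(x), phi(phi(x))
        denom = x2 - 2.0 * x1 + x              # Delta^2 x
        if denom == 0.0:
            return x2
        x = x2 - (x2 - x1) ** 2 / denom        # accelerated estimate
    return x

import math
print(aitken(lambda x: math.sqrt(2.0 / (1.0 + x)), 0.5))  # -> 1.0 quickly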
MULLER'S METHOD
• An extension of the Secant Method.
• A quadratic curve is passed through three points (x1, f(x1)), (x2, f(x2)) and (x3, f(x3)).
• The point x4, one of the roots of the quadratic, is taken as the next approximation.
• P(x) = a0 + a1(x − c) + a2(x − c)²
• For c = x3 and x = x4, assuming x4 to be the root:
• a2(x4 − x3)² + a1(x4 − x3) + a0 = 0
• (x4 − x3) = −2a0 / [a1 ± (a1² − 4a2a0)^(1/2)]
• This form of the quadratic formula is chosen to avoid subtractive cancellation.
• At x = x1, x2 and x3:
• a2(x1 − x3)² + a1(x1 − x3) + a0 = p(x1) = f(x1)
• a2(x2 − x3)² + a1(x2 − x3) + a0 = p(x2) = f(x2)
• a2(x3 − x3)² + a1(x3 − x3) + a0 = p(x3) = f(x3)
• Let h1 = x1 − x3, h2 = x2 − x3 and fi = f(xi).
• The above equations, with these modifications, become
• a2h1² + a1h1 + a0 = f1
• a2h2² + a1h2 + a0 = f2
• 0 + 0 + a0 = f3
• Since a0 = f3, we can obtain a1 and a2 from
• a2h1² + a1h1 = f1 − f3 = d1
• a2h2² + a1h2 = f2 − f3 = d2
• This results in
a1 = (d2h1² − d1h2²) / [h1h2(h1 − h2)]
a2 = (d1h2 − d2h1) / [h1h2(h1 − h2)]
h4 = −2a0 / [a1 ± (a1² − 4a2a0)^(1/2)]
• x4 = x3 + h4
• The sign is chosen in such a way that the magnitude of the denominator is large (again to avoid cancellation).
• Now x2, x3 and x4 are taken as the initial guesses to calculate h5 and hence x5.
• Example:
• Solve Leonardo's equation f(x) = x³ + 2x² + 10x − 20 = 0 by Muller's method.
• Solution:
• Three starting guesses: x1 = 0, x2 = 1, x3 = 2
• f1 = −20
• f2 = −7
• f3 = 16
• h1 = x1 − x3 = −2
• h2 = x2 − x3 = −1
• d1 = f1 − f3 = −36
• d2 = f2 − f3 = −23
• D = h1h2(h1 − h2) = 2(−2 + 1) = −2
• a1 = [(−23)(−2)² − (−36)(−1)²]/(−2) = 28
• a2 = [(−36)(−1) − (−23)(−2)]/(−2) = 5
• h = −2·16/[28 ± (28² − 4·5·16)^(1/2)] = −32/49.54
• Taking the positive sign (the larger denominator), x4 = 1.354
• Iteration 2:
• x1 = 1
• x2 = 2
• x3 = 1.354
• h1 = x1- x3 = - 0.354
• h2 = x2 – x3 = 0.646
• f1 = -7
• f2 = 16
• f3 = -0.3096797
• d1 = f1 – f3 = -6.6903202
• d2 = f2 – f3 = 16.3096797
• D = h1h2(h1 − h2) = 0.2287031
• a1 = [d2h1² − d1h2²]/D = 21.1454
• a2 = [d1h2 − d2h1]/D = 6.354
• a0 = f3 = −0.30967
• h = −2a0/[a1 + (a1² − 4a2a0)^(1/2)] = 0.6193594/42.4762 = 0.01458
• x4 = x3 + h = 1.3686472
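A sketch of the complete Muller step (my own; muller_step is a hypothetical name, and cmath is used so a possibly negative discriminant is handled gracefully):

import cmath

def muller_step(f, x1, x2, x3):
    h1, h2 = x1 - x3, x2 - x3
    d1, d2 = f(x1) - f(x3), f(x2) - f(x3)
    D = h1 * h2 * (h1 - h2)
    a1 = (d2 * h1**2 - d1 * h2**2) / D
    a2 = (d1 * h2 - d2 * h1) / D
    a0 = f(x3)
    root = cmath.sqrt(a1 * a1 - 4.0 * a2 * a0)
    # sign chosen to maximize the magnitude of the denominator
    denom = a1 + root if abs(a1 + root) > abs(a1 - root) else a1 - root
    return x3 - 2.0 * a0 / denom            # x4 = x3 + h4

f = lambda x: x**3 + 2 * x**2 + 10 * x - 20  # Leonardo's equation
x1, x2, x3 = 0.0, 1.0, 2.0
for _ in range(5):
    x1, x2, x3 = x2, x3, muller_step(f, x1, x2, x3)
print(x3.real)   # about 1.36880811, the real root; imaginary part ~ 0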
Bairstow’s Method
• Here quadratic factor of higher order
polynomial obtained
• Consider cubic equation
3 2
• f(x) = a3x +a2x +a1x +a0
• Let x2 +Rx +S be the exact factor
• Let x² + rx + s be the approximate factor.
• Then the quotient is linear and the remainder term is linear:
• f(x) = (x² + rx + s)(b3x + b2) + b1x + b0
• If r = R and s= S then b1 and b0 shall be
zero
• From the comparison we have
• b3 = a3
• b2 = a2 – rb3
• b1 = a1 –rb2 - sb3
• b0 = a0 – sb2
• The value of bi are given by the
following recurrence formula:
• bn = an
• bn-1 = an-1 – rbn
• bi = ai –r bi+1 – s bi+2 for i = n-2
to 1
• b0 = a0 – sb2
• For x² + rx + s to be an exact factor:
• b0(r, s) = 0 = b1(r, s)
• Corrections ∆r0 and ∆s0 are computed from these two conditions and added to the initial guesses r0, s0 to determine r1 and s1; the process is repeated until b0 and b1 vanish.
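A rough Bairstow sketch (my own illustration): the b-recurrence above yields the remainder pair (b1, b0), and the corrections (∆r, ∆s) are obtained here by a Newton step with a numerically estimated Jacobian. The test polynomial and seeds are assumptions for the demo.

def remainder(a, r, s):
    """a = [a_n, ..., a_1, a_0]; divide by x^2 + r x + s, return (b1, b0)."""
    b = [a[0], a[1] - r * a[0]]                # b_n and b_{n-1}
    for c in a[2:-1]:
        b.append(c - r * b[-1] - s * b[-2])    # b_i = a_i - r b_{i+1} - s b_{i+2}
    return b[-1], a[-1] - s * b[-2]            # b_1 and b_0 = a_0 - s b_2

def bairstow(a, r, s, tol=1e-12, max_iter=100, eps=1e-7):
    for _ in range(max_iter):
        b1, b0 = remainder(a, r, s)
        if abs(b1) + abs(b0) < tol:
            break
        # forward-difference estimates of the partial derivatives
        b1r, b0r = remainder(a, r + eps, s)
        b1s, b0s = remainder(a, r, s + eps)
        j11, j12 = (b1r - b1) / eps, (b1s - b1) / eps
        j21, j22 = (b0r - b0) / eps, (b0s - b0) / eps
        det = j11 * j22 - j12 * j21
        r -= (b1 * j22 - b0 * j12) / det       # Newton step on (b1, b0)
        s -= (b0 * j11 - b1 * j21) / det
    return r, s

# (x^2 + 2x + 3)(x^2 - x + 2) = x^4 + x^3 + 3x^2 + x + 6
print(bairstow([1, 1, 3, 1, 6], 1.5, 2.5))     # with this seed, about (2.0, 3.0)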
SOLUTION OF LINEAR EQUATIONS:
The equation
a1x1 + a2x2 + a3x3 + · · · + anxn = b
may be written in the concise form
Σ_{i=1}^{n} ai xi = b,
which by itself has an infinite number of solutions for the xi.
To have a unique solution, the number of equations and the number of variables must be the same.
A set of n such independent equations is termed a system of equations or simultaneous equations.
a11x1+a12x2+…+a1nxn=b1
a21x1+a22x2+…+a2nxn=b2
a31x1+a32x2+…+a3nxn=b3
.
an1x1+an2x2+…+annxn=bn
In matrix notation it is written as A x = b
This can be solved by:
• Elimination approach
• Iterative approach
Elimination methods are
1. Basic Gauss elimination method
2. Gauss elimination with pivoting
3. Gauss – Jordan method
4. LU decomposition method
5. Matrix inverse method
Given an arbitrary system of equations, four possibilities may arise:
1. The system has a unique solution
2. The system has no solution
3. The system has infinitely many solutions
4. The system is ill conditioned
For systems of equations the following observations are made:
1. For a unique solution the number of equations and variables must be the same.
2. If the number of equations is less than the number of variables, the system is said to be under-determined and a unique solution may not be possible.
3. If the number of equations is larger than the number of variables, the system is said to be over-determined and a unique solution may or may not be possible.
4. The system is said to be homogeneous when all constants bi are zero.
Direct solutions to linear systems of
algebraic equations
• Solve the system of equations
AX = B

The solution is formally expressed as:


X = A–1B
Typically it is more efficient to solve for
X directly without solving for A-1 since
finding the inverse is an expensive (and
less accurate) procedure
Types of solution procedures
• Direct Procedures
• Exact procedures which have
infinite precision (excluding
roundoff error)
• Suitable when A is relatively fully populated/dense or well banded
• A predictable number of
operations is required
• Indirect Procedures
• Iterative procedures
• Are appropriate when A is
• Large and sparse but not
tightly banded
• Very large (since roundoff
accumulates more slowly)
• Accuracy of the solution
improves as the number of
iterations increases.
Cramer's Rule - A Direct Procedure
The components of the solution X are computed as:
xk = |Ak| / |A|
where
Ak is the matrix A with its kth column replaced by vector B
|A| is the determinant of matrix A
• For each B vector, we must evaluate N + 1 determinants of size N, where N defines the size of the matrix A.
• Evaluate a determinant as follows using the method of expansion by cofactors:

|A| = Σ_{j=1}^{N} a_{I,j} cof(a_{I,j}) = Σ_{i=1}^{N} a_{i,J} cof(a_{i,J})

Here I, J may have any value 1 to N, where
I = specified value of i
J = specified value of j
cof(a_{i,j}) = (−1)^(i+j) minor(a_{i,j})
minor(a_{i,j}) = determinant of the sub-matrix obtained by deleting the ith row and the jth column
• The procedure is repeated until 2x2 matrices are established (which have a determinant by definition):
2x2 system → O(2⁴) = O(16)
4x4 system → O(4⁴) = O(256)
8x8 system → O(8⁴) = O(4096)
• Cramer's rule is not a good method for very large systems!
• If |A| = 0 and |Ak| ≠ 0 → no solution! The matrix A is singular.
• If |A| = 0 and |Ak| = 0 → infinite number of solutions.
Gauss Elimination - A Direct
Procedure
Basic concept is to produce an upper or
lower triangular matrix and to then use
backward or forward substitution to solve
for the unknowns.
Example application
• Solve the system of equations
[ a11 a12 a13 ] [x1]   [b1]
[ a21 a22 a23 ] [x2] = [b2]
[ a31 a32 a33 ] [x3]   [b3]
• Divide the first row of A and B by a11 (the pivot element) to get
[ 1   a'12 a'13 ] [x1]   [b'1]
[ a21 a22  a23  ] [x2] = [b2 ]
[ a31 a32  a33  ] [x3]   [b3 ]
• Now multiply row 1 by a21 and subtract from row 2,
• and then multiply row 1 by a31 and subtract from row 3:
[ 1 a'12 a'13 ] [x1]   [b'1]
[ 0 a'22 a'23 ] [x2] = [b'2]
[ 0 a'32 a'33 ] [x3]   [b'3]
• Now divide row 2 by a'22 (the pivot element):
[ 1 a'12 a'13 ] [x1]   [b'1]
[ 0 1    a"23 ] [x2] = [b"2]
[ 0 a'32 a'33 ] [x3]   [b'3]
• Now multiply row 2 by a'32 and subtract from row 3 to get
[ 1 a'12 a'13 ] [x1]   [b'1]
[ 0 1    a"23 ] [x2] = [b"2]
[ 0 0    a"33 ] [x3]   [b"3]
• We apply a backward substitution procedure to solve for the components of X:
• x3 = b"3 / a"33
• x2 + a"23 x3 = b"2 → x2 = b"2 − a"23 x3
• x1 + a'12 x2 + a'13 x3 = b'1 → x1 = b'1 − a'12 x2 − a'13 x3
• We can also produce a lower triangular matrix and use a forward substitution procedure.
• Number of operations required for Gauss elimination:
• Triangularization: ≈ N³/3
• Backward substitution: ≈ N²/2
• Total number of operations for Gauss elimination is O(N³) versus O(N⁴) for Cramer's rule.
• Therefore we save O(N) times the operations as compared to Cramer's rule.
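A compact Gauss elimination sketch (mine; it assumes no zero pivot is encountered — pivoting is discussed below):

def gauss_solve(A, b):
    n = len(A)
    A = [row[:] for row in A]          # work on copies
    b = b[:]
    for k in range(n - 1):             # triangularization
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n                      # backward substitution
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

print(gauss_solve([[2, 1, 1], [3, 2, 3], [1, 4, 9]], [10, 18, 16]))
# -> [7.0, -9.0, 5.0], matching the worked example later in these notes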
Gauss-Jordan Elimination - A Direct Procedure
Gauss-Jordan elimination is an adaptation of Gauss elimination in which elements both above and below the pivot element are cleared to zero → the entire column except the pivot element becomes zeroes:
[ 1 0 0 0 ] [x1]   [b''''1]
[ 0 1 0 0 ] [x2] = [b''''2]
[ 0 0 1 0 ] [x3]   [b''''3]
[ 0 0 0 1 ] [x4]   [b''''4]
No backward/forward substitution is necessary.
Matrix Inversion by Gauss-Jordan Elimination
• Given A, find A⁻¹ such that
A A⁻¹ = I
where I is the identity matrix (ones on the diagonal, zeroes elsewhere).
• The procedure is similar to finding the solution of AX = B, except that the matrix A⁻¹ assumes the role of vector X and the matrix I serves as vector B.
• Therefore we perform the same operations on A and I.
• Convert A → I through Gauss-Jordan elimination:
A A⁻¹ = I
A′ A⁻¹ = I′
• However, through the manipulations A → A′ = I, and therefore
I A⁻¹ = I′
A⁻¹ = I′
• The right-hand-side matrix, I′, has been transformed into the inverted matrix.
• Notes:
• Inverting a diagonal matrix simply involves computing reciprocals:
A = diag(a11, a22, a33)  →  A⁻¹ = diag(1/a11, 1/a22, 1/a33),  so that AA⁻¹ = I.
• Inverse of a product: [A1A2A3]⁻¹ = A3⁻¹ A2⁻¹ A1⁻¹.
• Gauss Elimination Type Solutions to Banded Matrices
• Banded matrices have non-zero entries contained within a defined number of positions to the left and right of the diagonal (the bandwidth).
• [Illustration: an N×N banded matrix with half bandwidth (M + 1)/2 = 4 and full bandwidth M = 7, stored compactly as an N×M "compact diagonal" array.]
• Storage required: N² in full storage mode versus NM in banded storage mode.
Notes on banded matrices
• The advantage of banded storage mode
is that we avoid storing and
manipulating zero entries outside of
the defined bandwidth
• Banded matrices typically result from
finite difference and finite element
methods (conversion from p.d.e. & ode
→ algebraic equations)
• Compact banded storage mode can still
be sparse (this is particularly true for
large finite difference and finite
element problems)
• Savings on storage for banded matrices
• N2 for full storage versus NM for
banded storage
• where N = the size of the matrix and M
= the bandwidth
Savings on computations for banded matrices
• Assuming a Gauss elimination procedure:
• O(N³) for full storage versus O(NM²) for banded storage.
• Therefore we save O(N²/M²) times the operations, since we are not manipulating all the zeros outside of the bands!
Problems with Gauss Elimination
Procedures
Inaccuracies originating from the pivot
elements
The pivot element is the diagonal
element which divides the associated row
• As more pivot rows are processed, the
number of times a pivot element has been
modified increases.
• Sometimes a pivot element can become
very small compared to the rest of the
elements in the pivot row
• Pivot element will be inaccurate due to
roundoff
• When the pivot element divides the rest
of the pivot row, large inaccurate
numbers result across the pivot row
• Pivot row now subtracts (after being
multiplied) from all rows below the pivot
row, resulting in propagation of large
errors throughout the matrix!

Partial Pivoting
• Always look below the pivot element
and pick the row with the largest value
and switch rows
Complete pivoting
• Look at all columns and all rows to the
right/below the pivot element and switch
so that the largest element possible is in
the pivot position.
• For complete pivoting, you must
change the order of the variable array
• Pivoting procedures give large diagonal
elements
• minimize roundoff error
• increase accuracy
• Pivoting is not required when the
matrix is diagonally dominant
A matrix is diagonally dominant when, for each row, the absolute value of the diagonal term is greater than the sum of the absolute values of the off-diagonal terms.
Example:
Find the inverse of the matrix
A = [ 1 2 3; 0 1 2; 0 0 1 ]
Let
A⁻¹ = [ a11 a12 a13; 0 a22 a23; 0 0 a33 ]
(the inverse of an upper triangular matrix is again upper triangular), so that
[ 1 2 3; 0 1 2; 0 0 1 ] [ a11 a12 a13; 0 a22 a23; 0 0 a33 ] = [ 1 0 0; 0 1 0; 0 0 1 ]
Since AA⁻¹ = I, we write
a11 = 1,  a22 = 1,  a33 = 1
2a11 + a12 = 0 ⇒ a12 = −2;   2a22 + a23 = 0 ⇒ a23 = −2
3a11 + 2a12 + a13 = 0 ⇒ a13 = 1
Hence
A⁻¹ = [ 1 −2 1; 0 1 −2; 0 0 1 ]
Example:
Use gauss elimination to solve following
system of equation.
2x + y + z =10
3x + 2y +3z =18
x + 4y +9z = 16
Eliminate x from the second and third equations:
• Multiply the first eq. by −3/2 and add it to the second equation, giving y + 3z = 6.
• Multiply the first eq. by −1/2 and add it to the third equation, giving 7y + 17z = 22.
• From the last two equations eliminate y: multiply the first of them by −7 and add it to the second, giving −4z = −20, z = 5.
The upper triangular form is given as:
2x + y + z = 10
y + 3z = 6
z = 5
resulting in x = 7, y = −9 and z = 5.
Example:
Solve the equation
0.0003120 x1 + 0.006032 x2 = 0.003328
0.50000 x1 + 0.8942 x2 = 0.9471
The exact solution is x1= 1 and x2 = 0.5
First solve system with pivoting

0.50000 x1 + 0.8942 x2 = 0.9471


0.0003120 x1 + 0.006032 x2 = 0.003328
Using Gauss elimination above equation
reduces to
0.50000 x1 + 0.8942 x2 = 0.9471
+ 0.005747 x2= 0.002737
Back substitution gives x2 = 0.5 and x1
= 1.0
Without pivoting, Gaussian elimination (carried out in limited-precision arithmetic) results in:
0.0003120 x1 + 0.006032 x2 = 0.003328
−8.7725 x2 = −5.3300
Back substitution gives x2 = 0.6076 and x1 = −1.0803, far from the exact solution.
Example:
Using Gauss Jordan method solve
following system of equation.
2x + y + z =10
3x + 2y +3z =18
x + 4y +9z = 16
Eliminate x from the 2nd and 3rd equations:
2x + y + z = 10
(1/2)y + (3/2)z = 3
(7/2)y + (17/2)z = 11
Next, the unknown y is eliminated from both the first and third equations, giving
x − z = 2
y + 3z = 6
z = 5
so that z = 5, y = 6 − 15 = −9, and x = 2 + 5 = 7, as before.
Matrix Inversion by the Gauss Method
We know that X will be the inverse of A if
AX = I
For a third-order matrix:
[ a11 a12 a13 ] [ x11 x12 x13 ]   [ 1 0 0 ]
[ a21 a22 a23 ] [ x21 x22 x23 ] = [ 0 1 0 ]
[ a31 a32 a33 ] [ x31 x32 x33 ]   [ 0 0 1 ]
This equation is equivalent to the three systems
A [x11, x21, x31]ᵀ = [1, 0, 0]ᵀ,   A [x12, x22, x32]ᵀ = [0, 1, 0]ᵀ,   A [x13, x23, x33]ᵀ = [0, 0, 1]ᵀ.
We can apply Gaussian elimination to each of these systems, and the result in each case will be the corresponding column of A⁻¹. We can solve all three systems simultaneously using the augmented matrix
[ a11 a12 a13 | 1 0 0 ]
[ a21 a22 a23 | 0 1 0 ]
[ a31 a32 a33 | 0 0 1 ]
After the first and second stages of elimination we obtain
[ a11 a12  a13  | 1         0 0 ]
[ 0   a'22 a'23 | −a21/a11  1 0 ]
[ 0   a'32 a'33 | −a31/a11  0 1 ]
and
[ a11 a12  a13  | 1   0   0 ]
[ 0   a'22 a'23 | α21 1   0 ]
[ 0   0    a"33 | α31 α32 1 ]
The inverse can now be obtained easily with the help of the transformed identity matrix
I′ = [ 1 0 0; α21 1 0; α31 α32 1 ],
where
α21 = −a21/a11,   α31 = −a31/a11 + (a21 a'32)/(a11 a'22),   α32 = −a'32/a'22.
Example:
The matrix A is given as:
A = [ 2 1 1; 3 2 3; 1 4 9 ]
The augmented system is
[ 2 1 1 | 1 0 0 ]
[ 3 2 3 | 0 1 0 ]
[ 1 4 9 | 0 0 1 ]
After the first stage:
[ 2 1   1    | 1    0 0 ]
[ 0 1/2 3/2  | −3/2 1 0 ]
[ 0 7/2 17/2 | −1/2 0 1 ]
Finally, at the end of the second stage:
[ 2 1   1   | 1    0  0 ]
[ 0 1/2 3/2 | −3/2 1  0 ]
[ 0 0   −2  | 10   −7 1 ]
This is equivalent to the three systems
[ 2 1 1; 0 1/2 3/2; 0 0 −2 ] X = [ 1; −3/2; 10 ],   [ 0; 1; −7 ],   [ 0; 0; 1 ],
whose solution by back substitution yields the three columns of the matrix
A⁻¹ = [ −3 5/2 −1/2; 12 −17/2 3/2; −5 7/2 −1/2 ]
which is the required inverse A⁻¹.
We can also find |A| = 2 · (1/2) · (−2) = −2 (the product of the pivots).
DIRECT SOLUTIONS TO
LINEAR ALGEBRAIC
SYSTEMS - CONTINUED
Ill-conditioning of Matrices
• There is no clear cut or precise
definition of an ill-conditioned matrix.
Effects of ill-conditioning
• Roundoff error accrues in the
calculations
• Can potentially result in very inaccurate
solutions
• Small variation in matrix coefficients
causes large variations in the solution
Detection of ill-conditioning in a matrix
• An inaccurate solution for X can satisfy
an ill-conditioned matrix quite well!
• Apply back substitution to check for ill-
conditioning
• Solve AX=B through Gauss or other
direct method → Xpoor
• Back substitute AXpoor →Bpoor
• Comparing we find that Bpoor ≈ B
• Back substitution is not a good
detection technique.
• The effects of ill-conditioning are very
subtle!
• Examine the inverse of matrix A
• If there are elements of A-1 which are
many orders of magnitude larger than the
original matrix, A, then is probably ill-
conditioned
• It is always best to normalize the rows
of the original matrix such that the
maximum magnitude is of order 1
• Evaluate A⁻¹ using the same method with which you are solving the system of equations. Now compute A⁻¹A and compare the result to I. If there is a significant deviation, then serious roundoff is present!
• Re-invert A⁻¹ using the same method with which you are solving the system of equations and compare with A. This is a more severe test of roundoff, since roundoff is accumulated both in the original inversion and in the re-inversion.
We can also evaluate ill-conditioning by examining the normalized determinant. The matrix may be ill-conditioned when
|det A| / ( Σ_{i=1}^{N} Σ_{j=1}^{N} aij² )^(1/2)  << 1,
where the Euclidean norm of A is ( Σ_{i=1}^{N} Σ_{j=1}^{N} aij² )^(1/2).
If the matrix A is diagonally dominant, i.e. the absolute values of the diagonal terms ≥ the sum of the absolute values of the off-diagonal terms for each row, then the matrix is not ill-conditioned:
|aii| ≥ Σ_{j=1, j≠i}^{N} |aij|,   i = 1, 2, 3, . . . , N
Effects of ill-conditioning are most
serious in large dense matrices (e.g.
especially those obtained in such
problems as curve fitting by least
squares)
• Sparse banded matrices which result
from Finite Difference and Finite
Element methods are typically much
better conditioned (i.e. can solve fairly
large sets of equations without excessive
roundoff error problems)
Ways to overcome ill-conditioning
• Make sure you pivot!
• Use large word size (use double
precision)
• Can use error correction schemes to
improve the accuracy of the answers
• Use iterative methods
Factor Method (Cholesky Method)
• Problem with Gauss elimination
• Right hand side “load” vector, B ,
must be available at the time of matrix
triangulation
• If B is not available during the
triangulation process, the entire
triangulation process must be
repeated!
• Procedure is not well suited for
solving problems in which B changes
AX = B1 → O(N³) + O(N²) steps
AX = B2 → O(N³) + O(N²) steps
:
AX = BR → O(N³) + O(N²) steps
• Using Gauss elimination, O(N3R)
operations, where N = size of the
system of equation and R = the
number of different load vectors
which must be solved for
Concept of the factor method is to
facilitate the solution of multiple right
hand sides without having to go through
a re-triangulation process for each Br
Factorization step
• Given A, find L and U such that
A = LU
where A, L and U are N×N matrices.
• We note that |A| ≠ 0 → |L||U| ≠ 0, and therefore neither L nor U can be singular.
• Matching the N² entries of A, we can determine only N² unknowns!
• L is defined as a lower triangular matrix.
• U is defined as an upper triangular matrix.
• Together, L and U have N² + N unknowns.
• Reduce the number of unknowns by selecting either
• lii = 1, i = 1, 2, . . . , N → Doolittle Method, or
• uii = 1, i = 1, 2, . . . , N → Crout Method.
• Now we only have N² unknowns! We can solve for all unknown elements of L and U by proceeding from left to right and top to bottom.
• Factorization proceeds from left to right and then top to bottom as:
[ a11 a12 a13 ]   [ u11     u12              u13                   ]
[ a21 a22 a23 ] = [ l21u11  l21u12 + u22     l21u13 + u23          ]
[ a31 a32 a33 ]   [ l31u11  l31u12 + l32u22  l31u13 + l32u23 + u33 ]
• Red → current unknown being solved
• Blue → unknown value already solved
Note: in each summed entry of the RHS matrix, the number of terms equals the smaller of the two subscripts of the corresponding LHS entry.
a11 = u11
a12 = u12
a13 = u13
a21 = l21u11
a22 = l21u12 + u22
a23 = l21u13 + u23
a31 = l31u11
a32 = l31u12 + l32u22
a33 = l31u13 + l32u23 + u33
This factorization follows the Doolittle method (lii = 1).
Note: besides these two methods there are infinitely many ways to factorize A, depending on how the N extra unknowns of the L and U matrices are fixed.
Now considering the equation to be
solved
AX = B
• However A= LU where L and U are
known
LUX = B
Forward/backward substitution
procedures to obtain a solution
• Changing the order in which the
product is formed
L(UX) = B
• Now let Y = U X
• Hence we have two systems of
simultaneous equations
LY = B
UX=Y
• Apply a forward substitution sweep to
solve for Y for the system of
equations L Y = B
• Apply a backward substitution sweep
to solve for X for the system of
equations U X = Y
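A Doolittle LU sketch (my own; lii = 1, no pivoting, nonzero pivots assumed), followed by the forward/backward substitution pair LY = B, UX = Y:

def lu_doolittle(A):
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):        # row i of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for j in range(i + 1, n):    # column i of L
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

def lu_solve(L, U, b):
    n = len(b)
    y = [0.0] * n
    for i in range(n):               # forward substitution: L y = b
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):   # backward substitution: U x = y
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

L, U = lu_doolittle([[2, 1, 1], [3, 2, 3], [1, 4, 9]])
print(lu_solve(L, U, [10, 18, 16]))   # -> [7.0, -9.0, 5.0]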
Notes on Factorization Methods
• Procedure
• Perform the factorization by solving
for L and U
• Perform the sequential forward and
backward substitution procedures to
solve for Y and X.
• The factor method is very similar to
Gauss elimination although the order in
which the operations are carried out is
somewhat different.
• Number of operations
• O(N3) for LU decomposition (same as
triangulation for Gauss)
• O(N2) forward/backward substitution
(same as backward sweep for Gauss)
Advantages of LU factorization over
Gauss Elimination
• Can solve for any load vector B at any
time with O(N2) operations (other than
triangulation which is done only once
with O(N3) operations)
• Generally has somewhat smaller
roundoff error
Example comparing costs
If we are solving R systems of NxN
equations in which the matrix A stays the
same and only the vector B changes,
compare the overall costs for Gauss
elimination and LU factorization
• Gauss Elimination costs:
Triangulation cost = R[O(N³)]
Back substitution cost = R[O(N²)]
Total cost = R[O(N³) + O(N²)]
Total cost for large N ≈ R[O(N³)]
• LU factorization costs:
• LU factorization cost = O(N³) (done only once)
• Back/forward substitution cost = R[O(N²)]
• Total cost = O(N³) + R[O(N²)]
• Total cost for R >> N ≈ R[O(N²)]
• Considering some typical values for R
and N
• We can also implement LU
factorization (decomposition) in
banded mode and the savings
compared to banded Gauss
elimination would be O(M) (where
M = bandwidth)
• Substituting the factored form of the
matrix and changing the order in which
products are taken
L(LTX) = B
• Let LTX = Y
• Now sequentially solve
LY = B by forward substitution
LTX = Y by backward substitution
LDLT Method:
Decompose A = LDLT
• Where L is a lower triangular matrix
• Where D = diagonal matrix
• Set the diagonal terms of L to unit
CHOLESKY'S FACTORIZATION (method of the square root)
A = LLᵀ, written here as A = UᵀU:
[ a11 a12 a13 ]   [ u11         ] [ u11 u12 u13 ]
[ a21 a22 a23 ] = [ u12 u22     ] [     u22 u23 ]
[ a31 a32 a33 ]   [ u13 u23 u33 ] [         u33 ]

a11 = u11²                →  u11 = √a11
a12 = u11u12              →  u12 = a12/u11
a13 = u11u13              →  u13 = a13/u11
a22 = u12² + u22²         →  u22 = √(a22 − u12²)
a23 = u12u13 + u22u23     →  u23 = (a23 − u12u13)/u22
a33 = u13² + u23² + u33²  →  u33 = √(a33 − u13² − u23²)

(The symmetry a21 = a12, etc., reproduces the same entries.) In general, for example,
u45 = (1/u44)(a45 − u14u15 − u24u25 − u34u35),
and for j > i,
u_ij = (1/u_ii) ( a_ij − Σ_{k=1}^{i−1} u_ki u_kj ),
u_ii = √( a_ii − Σ_{k=1}^{i−1} u_ki² ).

Factorize the matrix
[ 1 2  3  ]
[ 2 8  22 ]
[ 3 22 82 ]

u11 = √1 = 1
u12 = a12/u11 = 2/1 = 2
u13 = a13/u11 = 3/1 = 3
u22 = √(a22 − u12²) = √(8 − 4) = 2
u23 = (a23 − u12u13)/u22 = (22 − 2·3)/2 = 16/2 = 8
u33 = √(a33 − u13² − u23²) = √(82 − 9 − 64) = √9 = 3

Thus we have
U = [ 1 2 3; 0 2 8; 0 0 3 ]
ITERATIVE SOLUTIONS TO
LINEAR ALGEBRAIC EQUAT -
IONS
• As finer discretizations are being
applied with Finite Difference and
Finite Element codes:
• Matrices are becoming increasingly
larger
• Density of matrices is becoming
increasingly smaller
• Banded storage direct solution
algorithms no longer remain attractive
as solvers for very large systems of
simultaneous equations.
Example
• For a typical Finite Difference or
Finite Element code, the resulting
algebraic equations have between 5 and
10 nonzero entries per matrix row (i.e.
per algebraic equation associated with
each node)
  0 0 0  0 0 0 0 0 0 0 0 
   0 0 0  0 0 0 0 0 0 0 
 
    0 0 0  0 0 0 0 0 0 
 
 0    0 0 0  0 0 0 0 0 
 0 0    0 0 0  0 0 0 0
 0 0 0    0 0 0  0 0 0
 
0  0 0 0    0 0 0  0 0
A
0 0  0 0 0    0 0 0  0
 
0 0 0  0 0 0    0 0 0 
0 0 0 0  0 0 0    0 0 
 
0 0 0 0 0  0 0 0    0 
0 0 0 0 0 0  0 0 0    
0 0 0 0 0 0 0  0 0 0    

 0 0 0 0 0 0 0 0  0 0 0   
Banded compact matrix density
• Storage required for banded compact storage mode equals NM, where N = size of the matrix and M = full bandwidth.
• Total nonzero entries in the matrix, assuming (a typical estimate of) 5 nonzero entries per matrix row, = 5N.
• Banded compact matrix density = the ratio of actual nonzero entries to entries stored in banded compact mode:
density = (actual nonzero entries)/(banded storage) = 5N/(NM) = 5/M
Thus with the increasing size of
problems/applications and the
decreasing matrix densities, iterative
methods are becoming increasingly
popular/better alternatives!
Jacobi Method - An Iterative
Method
• Let’s consider the following set of
algebraic equations
a11x1 + a12 x2 + a13x3 = b1
a21x1 + a22 x2 + a23x3 = b2
a31x1 + a32 x2 + a33x3 = b3
• Guess a set of values for X → X[0]
• Now solve each equation for
unknowns which correspond to the
diagonal terms in A, using guessed
values for all other unknowns:
b1  a12 x20  a13 x30
x 
1
1
a11

b  a x 0
 a x 0

x12  2 21 1 23 3

a22
b  a x 0
 a x 0

x31  3 31 1 32 2

a33

• Arrive at a second estimate → X[1]


• Continue procedure until you reach
convergence (by comparing results of 2
consecutive iterations)
• This method is referred to as the
(Point) Jacobi Method.
• The (Point) Jacobi Method is
formally described in vector notation as
follows:
Define A as
A=D-C
• Such that all diagonal elements of A
are put into D
• Such that all off-diagonal elements
of A are put into -C
• The scheme is now defined as:
D X[k+1] = C X[k] + B,  k ≥ 0
⇒ X[k+1] = D⁻¹C X[k] + D⁻¹B,  k ≥ 0
•Recall that inversion of a diagonal
matrix (to find D-1) is obtained simply
by taking the reciprocal of each
diagonal term
The (Point) Jacobi Method can be described in index notation as:

xi[k+1] = − Σ_{j=1, j≠i}^{N} (aij/aii) xj[k] + bi/aii,   1 ≤ i ≤ N, k ≥ 0

• Advantage of iterative methods:


• Each cycle O(N2 ) operations for full
storage mode
• Therefore roundoff error only
accrues during operations! This is
much better than direct methods in
which O(N3 ) operations accrue much
more error!
• Since each cycle only produces an
approximation for the next cycle, any
error in a guess will be handled by the
next cycle
• We can consider roundoff error to
accrue only during the last iteration
• The algorithm can be readily implemented to operate only on non-zero entries in the matrix, reducing both storage and computations dramatically when matrix density is low.
Total number of operations for full
storage mode
O (N2K) where K= number of cycles
required for convergence
• Note that you don’t a priori know the
number of cycles, K, required to
achieve a certain degree of
convergence and therefore accuracy
• Total number of operations for sparse (non-zero entry only) storage mode:
O(NαK), where α = number of non-zero entries per equation and K = number of cycles required for convergence.
• The operation count dramatically
reduces for sparse storage modes and
is only a function of the number of
non-zero entries and the number of
cycles. Note that α is not related to the
size of the problem, N, but to the local
grid structure and algorithm
• Iterative methods are ideally suited for
• Very large matrices since they
reduce the round off problem
• Sparse but not banded matrices since
they can reduce computational effort
by not operating on zeroes
• Very large sparse banded matrices
due to efficiency
Example
• Solve by the point Jacobi method:
5x + y = 10
2x + 3y = 4
• The iteration is
5x[k+1] = 10 − y[k]   →  x[k+1] = 2 − (1/5) y[k]
3y[k+1] = 4 − 2x[k]   →  y[k+1] = 4/3 − (2/3) x[k]
Start with the solution guess x[0] = −1, y[0] = −1 and start iterating on the solution.
This is a converging process → keep on going until the desired level of accuracy is achieved.
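A point Jacobi sketch (my own) for the 2×2 example above; the system is diagonally dominant, so the iteration converges:

def jacobi(A, b, x0, iters=30):
    n = len(b)
    x = x0[:]
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

print(jacobi([[5, 1], [2, 3]], [10, 4], [-1.0, -1.0]))  # -> [2.0, 0.0]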
Iterative Convergence
• Is the (k+1)th solution better than the (k)th solution?
• An iterative process can be convergent or divergent.
• A necessary condition for convergence is that the set be diagonal.
• This requires that one of the coefficients in each of the equations be greater than all others and that this "strong coefficient" be contained in a different position in each equation.
• We can re-arrange all strong elements onto diagonal positions by switching columns → this now makes the matrix diagonal.
• A sufficient condition to ensure convergence is that the matrix is diagonally dominant:

|aii| > Σ_{j=1, j≠i}^{N} |aij|,   i = 1, . . . , N

• There are less stringent conditions for


convergence
• A poor first guess will prolong the
iterative process but will not make it
diverge if the matrix is such that
convergence is assured.
• Therefore better guesses will speed up
the iterative process
Criteria for ascertaining convergence
• Absolute convergence criterion:
|xi[k+1] − xi[k]| ≤ ε for i = 1, . . . , N
where ε is a user-specified tolerance or accuracy.
• The absolute convergence criterion is best used if you have a good idea of the magnitude of the xi's.
• Relative convergence criterion:
|(xi[k+1] − xi[k]) / xi[k]| ≤ ε
• This criterion is best used if the magnitudes of the xi's are not known.
• There are also problems with this criterion if xi → 0.
(Point) Gauss Seidel Method
• This method is very similar to the
Jacobi method except that Gauss Seidel
uses the most recently computed values
for x in its computations.
• Using all updated values of x increases
the convergence rate (twice as fast as
Jacobi)
• Consider the system of equation
a11x1 + a12 x2 + a13x3 = b1
a21x1 + a22 x2 + a23x3 = b2
a31x1 + a32 x2 + a33x3 = b3
• Solve for the unknowns associated with the diagonal terms as follows:

x1[k+1] = (b1 − a12 x2[k] − a13 x3[k]) / a11

x2[k+1] = (b2 − a21 x1[k+1] − a23 x3[k]) / a22

x3[k+1] = (b3 − a31 x1[k+1] − a32 x2[k+1]) / a33

• The Gauss-Seidel method is formally described in vector form as follows
• Define A as
      A = D − L − U
• Put the diagonal elements of A into D
• Put the negatives of the elements of A below the diagonal into L
• Put the negatives of the elements of A above the diagonal into U
• The scheme is then defined as:
      D X^[k+1] = L X^[k+1] + U X^[k] + B,  k ≥ 0
   ⇒  X^[k+1] = D⁻¹L X^[k+1] + D⁻¹U X^[k] + D⁻¹B,  k ≥ 0
• The Gauss-Seidel method is formally described using index notation as
      x_i^[k+1] = − Σ_{j=1}^{i−1} (a_ij/a_ii) x_j^[k+1]
                  − Σ_{j=i+1}^{N} (a_ij/a_ii) x_j^[k] + b_i/a_ii ,
      1 ≤ i ≤ N,  k ≥ 0
Point Relaxation Methods (Successive /
Systematic (Over) Relaxation - SOR)
• The SOR approach improves the values calculated at the (k+1)th
  Gauss-Seidel iteration by forming a weighted average of the kth and
  (k+1)th iterates and using this for the next iteration
      x_i^[k+1] = λ x_i^[k+1]* + (1 − λ) x_i^[k]
• where x_i^[k+1]* is the value obtained from the current Gauss-Seidel
  iteration
• λ is the relaxation factor, which must be specified
• Ranges of λ values
  • λ ranges between 0 < λ < 2
  • λ = 1 → Gauss-Seidel
  • 0 < λ < 1 → Under-relaxation
  • 1 < λ < 2 → Over-relaxation
• Under-relaxation → 0 < λ < 1
• The current value is a weighted
average of current Gauss-Seidel
value and the value from the
previous iteration.
• Typically used to make a non-
convergent process converge.
• Can also be useful in speeding up
convergence when the solutions
oscillate about the converged
solution.
• Over-relaxation → 1 < λ < 2
• The current value is extrapolated
beyond the Gauss-Seidel value
• Typically used to accelerate an
already convergent process
• For λ > 2, the process diverges
• For a diagonally dominant matrix, SOR
  will always converge for
  0 < λ < 2
• Selection of an optimal λ value is
quite complex
• Depends on the characteristics of the
matrix
• Certain “classes” of problems will
have optimal ranges
• Trial and error is very useful
• We can apply different values of λ
for different blocks within a matrix
which exhibit significantly different
characteristics (different blocks in
matrix may be associated with different
p.d.e.’s in a coupled system)
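The relaxation step is a one-line change to the Gauss-Seidel sketch above; lam below stands for the relaxation factor λ, and its default value here is only an illustrative over-relaxation choice.

# Hedged sketch: point SOR; lam = 1.0 reduces to Gauss-Seidel.
def sor(A, b, x0, lam=1.2, eps=1e-8, max_cycles=500):
    n = len(b)
    x = list(x0)
    for k in range(max_cycles):
        max_change = 0.0
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gs = (b[i] - s) / A[i][i]               # Gauss-Seidel value x_i*
            x_new = lam * gs + (1.0 - lam) * x[i]   # weighted average
            max_change = max(max_change, abs(x_new - x[i]))
            x[i] = x_new
        if max_change < eps:
            break
    return x, k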
Application of Gauss-Seidel to Non-
Linear Equations
• Gauss-Seidel (with relaxation) is a
very popular method to solve for systems
of nonlinear equations.
• Notes:
• Multiple solutions exist for nonlinear
equations.
• There must be linear components
included in the equations such that a
diagonal is formed.
• No general theory on iterative
convergence is available for nonlinear
equations
Block Iterative Methods
• Instead of operating on a point by
point basis, we solve simultaneously
for entire groups of unknowns using
direct methods.
• Partition the coefficient matrix into
blocks. All elements in the block are
then solved in one step using a direct
method
Direct/Iterative Methods
• Can correct errors due to roundoff in
direct solutions by applying an
iterative solution after the direct
solution has been implemented.
Chapter 4

Solution of Non-linear Equation

Module No. 1

Newton’s Method to Solve Transcendental Equation


......................................................................................

Finding roots of algebraic and transcendental equations is a very important task.


These equations occur in many applications of science and engineering.
A function f (x) is called algebraic if each term of f (x) contains only the arithmetic
operations between real numbers and x with rational power. On the other hand, a tran-
scendental function includes at least one non-algebraic function, i.e. an exponential
function, a logarithmic function, trigonometric functions, etc.
An equation f (x) = 0 is called algebraic or transcendental according as f (x) is
algebraic or transcendental.
The equations x^9 − 120x^2 + 12 = 0 and x^15 + 23x^10 − 9x^8 + 30x = 0 are examples
of algebraic equations, and the equations log x + xe^x = 0, 3 sin x − 9x^2 + 2x = 0 are
examples of transcendental equations.
Many numerical methods are available to solve the equation f (x) = 0. But, each
method has some advantages and disadvantages over the others. Mainly, the
following points are considered to compare the methods:
rate of convergence, ___domain of applicability, number of function evaluations, pre-
computation steps, etc.
The commonly used methods to solve algebraic and transcendental equations
are bisection, regula-falsi, secant, fixed point iteration, Newton-Raphson, etc. In this
module, only the Newton-Raphson method is discussed. It is a very interesting method and
the rate of convergence of this method is high compared to other methods.

1.1 Newton-Raphson method

This method is also known as method of tangent. The Newton-Raphson method is


an iteration method and so it needs an initial or starting value. Let f (x) = 0 be the
given equation and x0 be the initial guess, i.e. the initial approximate root.
Let x1 = x0 + h be an exact root of the equation f (x) = 0, where h is a correction of
the root, i.e. the amount of error. Generally, it is assumed that h is small. Therefore,
f (x1 ) = 0.
Now, by Taylor’s series, the equation f (x1 ) = f (x0 + h) = 0 is expanded as

f (x0) + h f ′ (x0) + (h^2/2!) f ′′ (x0) + · · · = 0.

Since h is small, so the second and higher power terms of h are neglected and then
the above equation reduces to

f (x0) + h f ′ (x0) = 0,  or  h = −f (x0)/f ′ (x0).

Note that this is an approximate value of h. Using this h, the value of x1 is

x1 = x0 + h = x0 − f (x0)/f ′ (x0).     (1.1)

It is obvious that x1 is a better approximation of x than x0 . Since x1 is not an exact


root of the equation f (x) = 0, therefore another iteration is to be performed to find the
next better root. For this purpose, the value of x0 is replaced by x1 in equation (1.1)
to get second approximate root x2 . That is,

x2 = x1 − f (x1)/f ′ (x1).     (1.2)

In this way, the (n + 1)th iterated value is given by

x_{n+1} = x_n − f (x_n)/f ′ (x_n).     (1.3)

The above formula generates a sequence of numbers x1 , x2 , . . . , xn , . . .. The terms of


this sequence go to the exact root ξ. The method will terminate when |xn+1 − xn | ≤ ε,
where ε is a pre-assigned very small positive number called the error tolerance.
Note 1.1 This method is also used to find a complex root of the equation f (x) = 0.
But, for this case, the initial root is taken as a complex number.
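The iteration (1.3) translates directly into code. Below is a minimal Python sketch; the derivative is supplied by the caller, and the tolerance and iteration cap are illustrative safeguards rather than part of the formula.

# Hedged sketch: Newton-Raphson iteration x_{n+1} = x_n - f(x_n)/f'(x_n).
import math

def newton_raphson(f, fprime, x0, eps=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        dfx = fprime(x)
        if dfx == 0.0:                 # tangent is horizontal: method fails
            raise ZeroDivisionError("f'(x) vanished at x = %g" % x)
        x_next = x - f(x) / dfx
        if abs(x_next - x) <= eps:     # terminate when |x_{n+1} - x_n| <= eps
            return x_next
        x = x_next
    return x

# Example 1.1 below: f(x) = x^3 - 2 sin x - 2 with x0 = 1 gives ~1.587365.
root = newton_raphson(lambda x: x**3 - 2*math.sin(x) - 2,
                      lambda x: 3*x**2 - 2*math.cos(x), 1.0)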

Geometrical interpretation

The geometrical interpretation of Newton-Raphson method is shown in Figure 1.1.


Here, a tangent is drawn at the point (x0 , f (x0 )) to the curve y = f (x). Let the
tangent cuts the x-axis at the point (x1 , 0). Again, a tangent is drawn at the point
(x1 , f (x1 )). Suppose this tangent cuts the x-axis at the point (x2 , 0). This process is
repeated until the nth iterated root xn coincides with the exact root ξ, for large n. For
this reason this method is known as method of tangents.

[Figure 1.1 shows the curve y = f (x) with tangents drawn at (x0, f (x0)), (x1, f (x1)), . . .;
the tangents cut the x-axis at x1, x2, . . ., which approach the root ξ.]

Figure 1.1: Geometrical interpretation of Newton-Raphson method.

The choice of the initial guess x0 is a very serious task. If the initial guess is close to
the root then the method converges rapidly. But, if the initial guess is not close to
the root, or if it is wrong, then the method may generate an endless cycle. Also, if
the initial guess is not close to the exact root, the method may generate a divergent
sequence of approximate roots.
Thus, to choose the initial guess the following rule is suggested.
Let a root of the equation f (x) = 0 lie in the interval [a, b]. If f (a) · f ′′ (a) > 0 then
x0 = a is taken as the initial guess of the equation f (x) = 0, and if f (b) · f ′′ (b) > 0,
then x0 = b is taken as the initial guess.

1.1.1 Convergence of Newton-Raphson method

Suppose a root of the equation f (x) = 0 lies in the interval [a, b].
The Newton-Raphson iteration formula (1.3) is

x_{i+1} = x_i − f (x_i)/f ′ (x_i) = φ(x_i) (say).     (1.4)

Since ξ is a root of the equation f (x) = 0, we have

ξ = φ(ξ). (1.5)

Subtracting (1.4) from (1.5),

ξ − x_{i+1} = φ(ξ) − φ(x_i) = (ξ − x_i) φ′ (ξ_i)   (by the Mean Value Theorem)
(where ξ_i lies between ξ and x_i).

Now, substituting i = 0, 1, 2, . . . , n in the above equation and multiplying the resulting
relations, we get

(ξ − xn+1 ) = (ξ − x0 )φ′ (ξ0 )φ′ (ξ1 ) · · · φ′ (ξn )


or |ξ − xn+1 | = |ξ − x0 ||φ′ (ξ0 )||φ′ (ξ1 )| · · · |φ′ (ξn )| (1.6)

Let |φ′ (x)| ≤ l for all x ∈ [a, b]. Then from the equation (1.6)

|ξ − x_{n+1}| ≤ l^{n+1} |ξ − x0|.

Now, if l < 1 then l^{n+1} → 0 as n → ∞, and hence |ξ − x_{n+1}| → 0.
Therefore, lim_{n→∞} x_{n+1} = ξ.
Hence, the sequence {x_n} converges to ξ for all x ∈ [a, b], if |φ′ (x)| < 1, i.e. if

|d/dx {x − f (x)/f ′ (x)}| < 1   or   |f (x) · f ′′ (x)| < |f ′ (x)|^2     (1.7)

within the interval [a, b].
Thus, the Newton-Raphson method converges if the initial guess x0 is chosen suf-
ficiently close to the root and the functions f (x), f ′ (x) and f ′′ (x) are continuous and
bounded within [a, b]. This is the sufficient condition for the convergence of the Newton-
Raphson method.
The rate of convergence of the Newton-Raphson method is calculated in the following
theorem.
Theorem 1.1 The rate of convergence of Newton-Raphson method is quadratic.
Proof. The Newton-Raphson iteration formula is
x_{n+1} = x_n − f (x_n)/f ′ (x_n).     (1.8)

Let ξ and x_n be an exact root and the nth approximate root of the equation f (x) = 0,
and let ε_n be the error at the nth iteration. Then x_n = ε_n + ξ and f (ξ) = 0.

Therefore, from the equation (1.8)

ε_{n+1} + ξ = ε_n + ξ − f (ε_n + ξ)/f ′ (ε_n + ξ).

That is,

ε_{n+1} = ε_n − [f (ξ) + ε_n f ′ (ξ) + (ε_n^2/2) f ′′ (ξ) + · · ·] / [f ′ (ξ) + ε_n f ′′ (ξ) + · · ·]
                                              [by Taylor’s series expansion]
        = ε_n − [ε_n + (ε_n^2/2) f ′′ (ξ)/f ′ (ξ) + · · ·][1 + ε_n f ′′ (ξ)/f ′ (ξ) + · · ·]^{−1}
                                              [since f (ξ) = 0]
        = ε_n − [ε_n + (ε_n^2/2) f ′′ (ξ)/f ′ (ξ) − ε_n^2 f ′′ (ξ)/f ′ (ξ) + · · ·]
        = (1/2) ε_n^2 f ′′ (ξ)/f ′ (ξ) + O(ε_n^3).

Neglecting the third and higher powers of ε_n, the above expression reduces to

ε_{n+1} = C ε_n^2,  where C = f ′′ (ξ)/(2f ′ (ξ)) is a constant.     (1.9)

Since the power of εn is 2, therefore the rate of convergence of Newton-Raphson


method is quadratic.

Example 1.1 Using the Newton-Raphson method find a root of the equation x^3 − 2 sin x −
2 = 0 correct up to five decimal places.

Solution. Let f (x) = x^3 − 2 sin x − 2. One root lies between 1 and 2. Let x0 = 1 be
the initial guess.
The iteration scheme is

x_{n+1} = x_n − f (x_n)/f ′ (x_n)
        = x_n − (x_n^3 − 2 sin x_n − 2)/(3x_n^2 − 2 cos x_n).

The sequence {xn } for different values of n is shown below.



n xn xn+1
0 1.000000 2.397806
1 2.397806 1.840550
2 1.840550 1.624820
3 1.624820 1.588385
4 1.588385 1.587366
5 1.587366 1.587365

Therefore, one root of the given equation is 1.58736 correct up to five decimal places.

Example 1.2 Find an iteration scheme to find the kth root of a number a and hence
find the cube root of 2.

Solution. Let x be the kth root of a. Therefore, x = a^{1/k}, i.e. x^k − a = 0.
Let f (x) = x^k − a. Then the Newton-Raphson iteration scheme is

x_{n+1} = x_n − f (x_n)/f ′ (x_n)
        = x_n − (x_n^k − a)/(k x_n^{k−1}) = (k x_n^k − x_n^k + a)/(k x_n^{k−1})
        = (1/k)[(k − 1)x_n + a/x_n^{k−1}].

Second part: Here, a = 2 and k = 3.
Then the above iteration scheme reduces to

x_{n+1} = (1/3)[2x_n + 2/x_n^2] = (2/3) (x_n^3 + 1)/x_n^2.

All calculations are shown in the following table.

n xn xn+1
0 1.00000 1.33333
1 1.33333 1.26389
2 1.26389 1.25993
3 1.25993 1.25992

Thus, the value of ∛2 is 1.2599, correct up to four decimal places.
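The scheme of Example 1.2 is easy to code; the sketch below is a direct transcription of x_{n+1} = [(k − 1)x_n + a/x_n^{k−1}]/k, with an illustrative tolerance and starting guess.

# Hedged sketch: kth root of a via the Newton scheme of Example 1.2.
def kth_root(a, k, x0=1.0, eps=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        x_next = ((k - 1) * x + a / x**(k - 1)) / k
        if abs(x_next - x) <= eps:
            return x_next
        x = x_next
    return x

print(kth_root(2, 3))   # ~1.259921, the cube root of 2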

Example 1.3 Suppose 2x_{n+1} = (5x_n^3 + 1)/9 is an iteration scheme to find a root of the
equation f (x) = 0. Find the function f (x).

Solution. Let l be a root obtained from the given iteration scheme

2x_{n+1} = (5x_n^3 + 1)/9.

Then, lim_{n→∞} x_n = l.
Now, lim_{n→∞} 18x_{n+1} = 5 lim_{n→∞} x_n^3 + 1.
That is, 18l = 5l^3 + 1, or 5l^3 − 18l + 1 = 0.
Therefore, the required equation is 5x^3 − 18x + 1 = 0, and hence f (x) = 5x^3 − 18x + 1.

Example 1.4 Discuss the Newton-Raphson method to find a root of the equation x^15 −
1 = 0 starting with x0 = 0.5.

Solution. It is obvious that the real roots of the given equation are ±1.
Here f (x) = x^15 − 1.
Therefore,

x_{n+1} = x_n − (x_n^15 − 1)/(15x_n^14) = (14x_n^15 + 1)/(15x_n^14).

Let the initial guess be x0 = 0.5. Then

x1 = [14 × (0.5)^15 + 1]/[15 × (0.5)^14] = 1092.7333.

This is far away from the root 1. This is because 0.5 is not close enough to the exact root
x = 1.
But, the initial guess x0 = 0.9 gives the first approximate root as x1 = 1.131416 and
it is close to the root 1.
This example shows the importance of initial guess in Newton-Raphson method.
The Newton-Raphson method may also be used to find the complex root. This is
illustrated in the following example.


Example 1.5 Find a complex root of the equation z^3 + 3z^2 + 3z + 2 = 0. An initial
guess may be taken as 0.5 + 0.5i.

Solution. Let z0 = 0.5 + 0.5i = (0.5, 0.5) be the initial guess and f (z) = z^3 + 3z^2 + 3z + 2.
Then f ′ (z) = 3z^2 + 6z + 3. The Newton-Raphson iteration scheme is

z_{n+1} = z_n − f (z_n)/f ′ (z_n).
The values of zn and zn+1 at each iteration are tabulated below:

n zn zn+1
0 ( 0.50000000, 0.50000000) (–0.10666668, 0.41333333)
1 (–0.10666668, 0.41333333) (–0.62715298, 0.53778100)
2 (–0.62715298, 0.53778100) (–0.47841841, 1.0874815)
3 (–0.47841841, 1.0874815) (–0.50884020, 0.90368903)
4 (–0.50884020, 0.90368903) (–0.50117314, 0.86686337)
5 (–0.50117314, 0.86686337) (–0.50000149, 0.86602378)
6 (–0.50000149, 0.86602378) (–0.49999994, 0.86602539)
7 (–0.49999994, 0.86602539) (–0.49999994, 0.86602539)

Thus one complex root is (−0.49999994, 0.86602539), i.e. −0.49999994+ 0.86602539 i


correct up to eight decimal places.

1.2 Newton-Raphson method for multiple root

Using the Newton-Raphson method, one can determine a multiple root of the equation
f (x) = 0. But, the following modified formula

x_{n+1} = x_n − k f (x_n)/f ′ (x_n)     (1.10)

gives a faster convergent scheme, where k is the multiplicity of the root. The term
f ′ (x_n)/k in the formula is the slope of the straight line passing through the point (x_n, f (x_n))
and intersecting the x-axis at the point (x_{n+1}, 0).
Let ξ be a root of the equation f (x) = 0 with multiplicity k. Then ξ is also a root
of the equation f ′ (x) = 0 with multiplicity (k − 1). In general, ξ is a root of the
equation f^{(p)} (x) = 0 with multiplicity (k − p), p < k. If the equation f (x) = 0 has a

root with multiplicity k and if the initial guess is very close to the exact root ξ, then
the expressions

x0 − k f (x0)/f ′ (x0),  x0 − (k − 1) f ′ (x0)/f ′′ (x0),  x0 − (k − 2) f ′′ (x0)/f ′′′ (x0),
. . . ,  x0 − f^{(k−1)} (x0)/f^{(k)} (x0)

must have the same value.
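A sketch of the modified scheme (1.10) follows; k is the known multiplicity, and the tolerance and iteration cap are illustrative choices. With k = 3 it reproduces the behaviour of Example 1.6 below.

# Hedged sketch: Newton-Raphson for a root of known multiplicity k,
# x_{n+1} = x_n - k f(x_n)/f'(x_n).
def newton_multiple(f, fprime, x0, k, eps=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        x_next = x - k * f(x) / fprime(x)
        if abs(x_next - x) <= eps:
            return x_next
        x = x_next
    return x

f = lambda x: x**4 - x**3 - 3*x**2 + 5*x - 2      # Example 1.6 polynomial
fp = lambda x: 4*x**3 - 3*x**2 - 6*x + 5
print(newton_multiple(f, fp, 0.5, 3))             # -> 1.0 (triple root)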


Theorem 1.2 The rate of convergence of the formula (1.10) is quadratic.

Proof. Let ξ be a multiple root of the equation f (x) = 0 of multiplicity k. Therefore,
f (ξ) = f ′ (ξ) = f ′′ (ξ) = · · · = f^{(k−1)} (ξ) = 0 and f^{(k)} (ξ) ≠ 0. Let x_n and ε_n be the nth
approximate root and the error at this step. Then ε_n = x_n − ξ. Now, from the iteration
scheme (1.10), we have

ε_{n+1} = ε_n − k f (ε_n + ξ)/f ′ (ε_n + ξ)
        = ε_n − k [(ε_n^k/k!) f^{(k)} (ξ) + (ε_n^{k+1}/(k+1)!) f^{(k+1)} (ξ) + · · ·]
               / [(ε_n^{k−1}/(k−1)!) f^{(k)} (ξ) + (ε_n^k/k!) f^{(k+1)} (ξ) + · · ·]
        = ε_n − [ε_n + (ε_n^2/(k+1)) f^{(k+1)} (ξ)/f^{(k)} (ξ) + · · ·]
                × [1 + (ε_n/k) f^{(k+1)} (ξ)/f^{(k)} (ξ) + · · ·]^{−1}
        = ε_n − [ε_n + (ε_n^2/(k+1)) f^{(k+1)} (ξ)/f^{(k)} (ξ) + · · ·]
                × [1 − (ε_n/k) f^{(k+1)} (ξ)/f^{(k)} (ξ) + · · ·]
        = ε_n − [ε_n + ε_n^2 {1/(k+1) − 1/k} f^{(k+1)} (ξ)/f^{(k)} (ξ) + · · ·]
        = ε_n^2 [1/(k(k+1))] f^{(k+1)} (ξ)/f^{(k)} (ξ) + O(ε_n^3).

Let C = [1/(k(k+1))] f^{(k+1)} (ξ)/f^{(k)} (ξ). Neglecting cube and higher order terms of
ε_n, the above equation becomes ε_{n+1} = C ε_n^2.
Thus, the rate of convergence of the scheme (1.10) is quadratic.

Example 1.6 Find the multiple root with multiplicity 3 of the equation x^4 − x^3 − 3x^2 +
5x − 2 = 0.

Solution. Let the initial guess be x0 = 0.5. Also, let f (x) = x^4 − x^3 − 3x^2 + 5x − 2.
f ′ (x) = 4x^3 − 3x^2 − 6x + 5, f ′′ (x) = 12x^2 − 6x − 6, f ′′′ (x) = 24x − 6.
The first iterated values are
x1 = x0 − 3 f (x0)/f ′ (x0) = 0.5 − 3 f (0.5)/f ′ (0.5) = 1.035714,
x1 = x0 − 2 f ′ (x0)/f ′′ (x0) = 0.5 − 2 f ′ (0.5)/f ′′ (0.5) = 1.083333, and
x1 = x0 − f ′′ (x0)/f ′′′ (x0) = 0.5 − f ′′ (0.5)/f ′′′ (0.5) = 1.5.
The first two values of x1 are close to 1. It indicates that the equation may have a
double root near 1.
Let x1 = 1.035714.
Then x2 = x1 − 3 f (x1)/f ′ (x1) = 1.035714 − 3 f (1.035714)/f ′ (1.035714) = 1.000139,
x2 = x1 − 2 f ′ (x1)/f ′′ (x1) = 1.035714 − 2 f ′ (1.035714)/f ′′ (1.035714) = 1.000277, and
x2 = x1 − f ′′ (x1)/f ′′′ (x1) = 1.035714 − f ′′ (1.035714)/f ′′′ (1.035714) = 1.000812.
Here it is seen that the three values of x2 are very close to 1. So the equation has a
multiple root near 1 of multiplicity 3.
Let x2 = 1.000139.
The third iterated values are
x3 = x2 − 3 f (x2)/f ′ (x2) = 1.000000, x3 = x2 − 2 f ′ (x2)/f ′′ (x2) = 1.000000, and
x3 = x2 − f ′′ (x2)/f ′′′ (x2) = 1.000000.
All the values of x3 are the same and hence one root of the equation is 1.000000, correct
up to six decimal places, with multiplicity 3.

1.3 Modification on Newton-Raphson method

After the development of the Newton-Raphson method, some modifications have been made
to this method. One of them is discussed below.

Note that in the Newton-Raphson method the derivative of the function f (x) is
evaluated at each iteration. That is, to find x_{n+1}, the value of f ′ (x_n) is required for
n = 0, 1, 2, . . .. Therefore, at each iteration two functions are evaluated at the point
x_n, n = 0, 1, 2, . . ., and a separate method may be required to find the derivative. Thus, in
each iteration of this method more calculations are needed. But, the following proposed
method can reduce the computational effort:

x_{n+1} = x_n − f (x_n)/f ′ (x0).     (1.11)

In this method, the derivative of f (x) is calculated only at the initial guess x0, and
obviously this reduces the computation time at each iteration. But, the rate of convergence
of this method is reduced to 1. This is proved in the following theorem.
Theorem 1.3 The rate of convergence of the modified Newton-Raphson method (1.11)
is linear.
Proof. Let ξ be an exact root of the equation f (x) = 0 and x_n be the approximate
root at the nth iteration. Then f (ξ) = 0. Let ε_n be the error at the nth iteration.
Then ε_n = x_n − ξ.
Now, from the formula (1.11), we have

ε_{n+1} = ε_n − f (ε_n + ξ)/f ′ (x0) = ε_n − [f (ξ) + ε_n f ′ (ξ) + · · ·]/f ′ (x0)
        = ε_n [1 − f ′ (ξ)/f ′ (x0)] + O(ε_n^2).

Neglecting square and higher power terms of ε_n, the above equation reduces to

ε_{n+1} = ε_n [1 − f ′ (ξ)/f ′ (x0)].

Let C = 1 − f ′ (ξ)/f ′ (x0), which is free from ε_n. Using this notation the above error
equation becomes

ε_{n+1} = C ε_n.     (1.12)

This shows that the rate of convergence of the formula (1.11) is linear.


Example 1.7 Find a root of the equation x^3 − 3x^2 + 1 = 0 using the modified Newton-
Raphson formula (1.11) and the Newton-Raphson method, correct up to four decimal places.

Solution. Let f (x) = x^3 − 3x^2 + 1. One root of this equation lies between 0 and 1.
Let the initial guess be x0 = 0.5. Now, f ′ (x) = 3x^2 − 6x and hence f ′ (x0) = −2.25.
The iteration scheme for the formula (1.11) is

x_{n+1} = x_n − f (x_n)/f ′ (x0)
        = x_n − (x_n^3 − 3x_n^2 + 1)/(−2.25) = (x_n^3 − 3x_n^2 + 2.25x_n + 1)/2.25.
All the approximate roots are calculated in the following table.

n xn xn+1
0 0.50000 0.66667
1 0.66667 0.65021
2 0.65021 0.65313
3 0.65313 0.65263
4 0.65263 0.65272
5 0.65272 0.65270
Therefore, 0.6527 is a root of the given equation correct up to four decimal places.

By Newton-Raphson method
The iteration scheme for the Newton-Raphson method is

x_{n+1} = x_n − f (x_n)/f ′ (x_n)
        = x_n − (x_n^3 − 3x_n^2 + 1)/(3x_n^2 − 6x_n) = (2x_n^3 − 3x_n^2 − 1)/(3x_n^2 − 6x_n).
Let x0 = 0.5. The successive iterations are shown below.

n xn xn+1
0 0.50000 0.66667
1 0.66667 0.65278
2 0.65278 0.65270
3 0.65270 0.65270

Therefore, 0.6527 is a root correct up to four decimal places.

In this example, the Newton-Raphson method takes fewer iterations whereas, as
expected, the modified formula (1.11) needs more iterations.
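A sketch of the fixed-slope variant (1.11) is given below; it differs from the earlier Newton-Raphson sketch only in that the derivative is frozen at x0. Running it on Example 1.7 shows the linearly convergent scheme taking more iterations, as noted above.

# Hedged sketch: modified Newton-Raphson with the derivative frozen at x0,
# x_{n+1} = x_n - f(x_n)/f'(x0)  (linear convergence).
def newton_fixed_slope(f, fprime, x0, eps=1e-6, max_iter=200):
    slope = fprime(x0)               # evaluated once, reused every step
    x = x0
    for n in range(max_iter):
        x_next = x - f(x) / slope
        if abs(x_next - x) <= eps:
            return x_next, n + 1
        x = x_next
    return x, max_iter

f = lambda x: x**3 - 3*x**2 + 1
fp = lambda x: 3*x**2 - 6*x
print(newton_fixed_slope(f, fp, 0.5))   # ~0.6527, in a few more steps than NR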


Chapter 4

Solution of Non-linear Equation

Module No. 2

Roots of a Polynomial Equation


......................................................................................

Finding all roots of a polynomial equation is a very important task. In many
applications of science and engineering all roots of a polynomial equation are needed
to solve a particular problem. For example, to find the poles, singularities, etc. of a
function, the zeros of the denominator (a polynomial) are needed. The available analytical
methods are useful only when the degree of the polynomial is at most four. So, numerical
methods are required to find the roots of higher degree polynomial equations.
Fortunately, many direct and iterative numerical methods have been developed to find all
the roots of a polynomial equation. In this module, two iterative methods, viz. the
Birge-Vieta and Bairstow methods, are discussed.

2.1 Roots of polynomial equations

Let Pn (x) be a polynomial in x of degree n. If a0 , a1 , . . . , an are coefficients of Pn (x),


then equation Pn (x) = 0 can be written in explicit form as

P_n (x) ≡ a0 x^n + a1 x^{n−1} + · · · + a_{n−1} x + a_n = 0.     (2.1)

Here, we assumed that the coefficients a0 , a1 , . . . , an are real numbers.


A number ξ (may be real or complex) is a root of the polynomial equation Pn (x) = 0
if and only if Pn (ξ) = 0. That is, Pn (x) is exactly divisible by x − ξ. If Pn (x) is exactly
divisible by (x − ξ)k (k ≥ 1), but it is not divisible by (x − ξ)k+1 , then ξ is called a root
of multiplicity k. The roots of multiplicity k = 1 are called simple roots or single
roots.
From fundamental theorem of algebra, we know that every polynomial equation has a
root. More precisely, every polynomial equation Pn (x) = 0, (n ≥ 1) with any numerical
coefficients has exactly n, real or complex roots.
The roots of any polynomial equation are either real or complex. If the coefficients
of the equation are real and it has a complex root α + iβ of multiplicity k, then α − iβ
must be another complex root of the equation with multiplicity k.
Let

a0 x^n + a1 x^{n−1} + · · · + a_{n−1} x + a_n = 0,     (2.2)

be a polynomial equation, where a0 , a1 , . . . , an are real coefficients. Also, let A =



max{|a1|, |a2|, . . . , |an|} and B = max{|a0|, |a1|, . . . , |a_{n−1}|}. Then the magnitude of a
root of the equation (2.2) lies between 1/(1 + B/|a_n|) and 1 + A/|a0|.

The other methods are also available to find the upper bound of the positive roots of
the polynomial equation. Two such results are stated below:
Theorem 2.1 (Lagrange’s). If the coefficients of the polynomial

a0 x^n + a1 x^{n−1} + · · · + a_{n−1} x + a_n = 0

satisfy the conditions a0 > 0, a1, a2, . . . , a_{m−1} ≥ 0, a_m < 0, for some m ≤ n, then the
upper bound of the positive roots of the equation is 1 + (B/a0)^{1/m}, where B is the greatest
of the absolute values of the negative coefficients of the polynomial.

Theorem 2.2 (Newton’s). If for x = c the polynomial

f (x) = a0 x^n + a1 x^{n−1} + · · · + a_{n−1} x + a_n

and its derivatives f ′ (x), f ′′ (x), . . . assume positive values, then c is an upper bound of
the positive roots of the equation f (x) = 0.

In the following sections, two iterative methods, viz. the Birge-Vieta and Bairstow meth-
ods, are discussed to find all the roots of a polynomial equation of degree n.

2.2 Birge-Vieta method

In Module 1 of this chapter, the Newton-Raphson method is described to find a root of
algebraic and transcendental equations. The rate of convergence of this method is
quadratic. So, the Newton-Raphson method can be used to find a root of a polynomial
equation, as a polynomial equation is an algebraic equation. The Birge-Vieta method is
based on the Newton-Raphson method.
Let ξ be a root of the polynomial equation P_n (x) = 0 (P_n (x) is a polynomial in x of
degree n). Then (x − ξ) is a factor of the polynomial P_n (x). Thus, the problem is to
find the root ξ. This root can be determined by the Newton-Raphson method. Then P_n (x)
is divided by the factor (x − ξ) to obtain the quotient Q_{n−1} (x) of degree n − 1.

Let the polynomial equation be

P_n (x) = x^n + a1 x^{n−1} + a2 x^{n−2} + · · · + a_{n−1} x + a_n = 0.     (2.3)

Assume that Q_{n−1} (x) and R are the quotient and remainder when P_n (x) is divided
by the factor (x − ξ). Here, Q_{n−1} (x) is a polynomial of degree (n − 1), so it can be
written as

Q_{n−1} (x) = x^{n−1} + b1 x^{n−2} + b2 x^{n−3} + · · · + b_{n−2} x + b_{n−1}.     (2.4)

Thus,
P_n (x) = (x − ξ)Q_{n−1} (x) + R.     (2.5)

If ξ is an exact root of the equation P_n (x) = 0, then R must be zero. Thus, the value
of R depends on the accuracy of ξ. The Newton-Raphson method or any other method may
be used to find the value of ξ starting from an initial guess x0 such that

R(ξ) = P_n (ξ) = 0.     (2.6)

The Newton-Raphson iteration scheme for the equation P_n (x) = 0 is

x_{k+1} = x_k − P_n (x_k)/P_n′ (x_k),  k = 0, 1, 2, . . . .     (2.7)
This method determines an approximate value of ξ, so for this ξ, R is not exactly 0,
but a small number.
Since P_n (x) is a polynomial, it is differentiable everywhere. Also, the values of
P_n (x_k) and P_n′ (x_k) can be determined by synthetic division or any other method. To
find the polynomial Q_{n−1} (x) and R, compare the coefficients of like powers of x on
both sides of the equation (2.5). Thus we get the following equations.

a1 = b1 − ξ b 1 = a1 + ξ
a2 = b2 − ξb1 b2 = a2 + ξb1
.. ..
. .
ak = bk − ξbk−1 bk = ak + ξbk−1
.. ..
. .
an = R − ξbn−1 R = an + ξbn−1

From equation (2.5),

Pn (ξ) = R = bn (say). (2.8)

Thus,

bk = ak + ξbk−1 , k = 1, 2, . . . , n, with b0 = 1. (2.9)

Therefore, b_n is the value of P_n at ξ.
To determine the value of P_n′, the equation (2.5) is differentiated with respect to x,
i.e.

P_n′ (x) = (x − ξ)Q_{n−1}′ (x) + Q_{n−1} (x).

That is,

P_n′ (ξ) = Q_{n−1} (ξ) = ξ^{n−1} + b1 ξ^{n−2} + · · · + b_{n−2} ξ + b_{n−1}.     (2.10)

Again,

P_n′ (x_i) = x_i^{n−1} + b1 x_i^{n−2} + · · · + b_{n−2} x_i + b_{n−1}.     (2.11)

Thus, the evaluation of P_n′ (x) is the same as that of P_n (x). Differentiating (2.9) with
respect to ξ, we get

db_k/dξ = b_{k−1} + ξ db_{k−1}/dξ.     (2.12)

We denote

db_k/dξ = c_{k−1}.     (2.13)

Then the equation (2.12) reduces to

c_{k−1} = b_{k−1} + ξ c_{k−2}.

Therefore, the recurrence relation for c_k is

c_k = b_k + ξ c_{k−1},  k = 1, 2, . . . , n − 1.     (2.14)

Now, from equation (2.8), we have

P_n′ (ξ) = dR/dξ = db_n/dξ = c_{n−1}   [using (2.13)].

Hence, the iteration scheme (2.7) becomes

x_{k+1} = x_k − b_n/c_{n−1},  k = 0, 1, 2, . . . .     (2.15)

This method is known as the Birge-Vieta method.
The values of the b_k and c_k are generally written in a tabular form, shown in Table 2.1.

x0 | 1    a1     a2      · · ·   a_{n−2}      a_{n−1}      a_n
   |      x0     x0 b1   · · ·   x0 b_{n−3}   x0 b_{n−2}   x0 b_{n−1}
   | 1    b1     b2      · · ·   b_{n−2}      b_{n−1}      b_n = R
   |      x0     x0 c1   · · ·   x0 c_{n−3}   x0 c_{n−2}
   | 1    c1     c2      · · ·   c_{n−2}      c_{n−1} = P_n′ (x0)

Table 2.1: Tabular form of b’s and c’s.
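The recurrences (2.9) and (2.14) amount to two synthetic divisions per iteration, which the sketch below carries out. The leading coefficient is assumed to be 1 as in (2.3), and the stopping tolerance is illustrative; the b’s from the final pass give (approximately) the coefficients of the deflated polynomial Q_{n−1}.

# Hedged sketch: one Birge-Vieta root of x^n + a1 x^(n-1) + ... + an = 0.
# coeffs = [1, a1, ..., an]; returns the root and the deflated coefficients.
def birge_vieta(coeffs, x0, eps=1e-8, max_iter=100):
    x = x0
    for _ in range(max_iter):
        b = [coeffs[0]]                      # b_0 = 1
        for a in coeffs[1:]:
            b.append(a + x * b[-1])          # b_k = a_k + x b_{k-1}   (2.9)
        c = [b[0]]                           # c_0 = 1
        for bk in b[1:-1]:
            c.append(bk + x * c[-1])         # c_k = b_k + x c_{k-1}  (2.14)
        dx = b[-1] / c[-1]                   # correction b_n / c_{n-1}
        x -= dx                              # scheme (2.15)
        if abs(dx) <= eps:
            break
    return x, b[:-1]

root, deflated = birge_vieta([1, 1, -8, -11, -3], -0.5)
print(root)        # ~ -0.381966, as in Example 2.1 below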

Example 2.1 Find all the roots of the polynomial equation x^4 + x^3 − 8x^2 − 11x − 3 = 0.

Solution. Let P4 (x) = x^4 + x^3 − 8x^2 − 11x − 3 be the given polynomial. Also, let the
initial guess be x0 = −0.5.
First iteration for first root

–0.5 1 1 –8 –11 –3
–0.500000 –0.250000 4.125000 3.437500
–0.5 1 0.500000 –8.250000 –6.875000 0.437500 =b4 = P4 (x0 )
–0.500000 –0.000000 4.125000
1 0.000000 –8.250000 –2.750000=c3 = P40 (x0 )

Therefore,

x1 = x0 − b4/c3 = −0.500000 − 0.437500/(−2.750000) = −0.340909.
This is the first iterated value.


Second iteration for first root

–0.340909 1 1 –8 –11 –3
–0.340909 –0.224690 2.803872 2.794135
–0.340909 1 0.659091 –8.224690 –8.196128 –0.205865=b4
–0.340909 –0.108471 2.840850
1 0.318182 –8.333161 –5.355278=c3

Then the second iterated root is

x2 = x1 − b4/c3 = −0.340909 − (−0.205865)/(−5.355278) = −0.379351.
Third iteration for first root

–0.379351 1 1 –8 –11 –3
–0.379351 –0.235444 3.124121 2.987720
–0.379351 1 0.620649 –8.235444 –7.875879 –0.012280=b4
–0.379351 –0.091537 3.158846
1 0.241299 –8.326981 –4.717033=c3

Therefore, x3 = x2 − b4/c3 = −0.379351 − (−0.012280)/(−4.717033) = −0.381954.

Fourth iteration for first root

–0.381954 1 1 –8 –11 –3
–0.381954 –0.236065 3.145798 2.999944
–0.381954 1 0.618046 –8.236065 –7.854202 –0.000056=b4
–0.381954 –0.090176 3.180241
1 0.236092 –8.326241 –4.673960=c3

Then x4 = x3 − b4/c3 = −0.381954 − (−0.000056)/(−4.673960) = −0.381966.

Therefore, one root of the equation is −0.38197.

The reduced polynomial is x^3 + 0.618034x^2 − 8.236068x − 7.854102 = 0 (obtained from
the third row of the above table).
First iteration for second root
Let x0 = 1.0.

1 1 0.618034 –8.236068 –7.854102


1.000000 1.618030 –6.618040
1 1 1.618030 –6.618040 –14.472139=b3
1.000000 2.618030
1 2.618030 –4.000010= c2

Therefore,

x1 = x0 − b3/c2 = 1.000000 − (−14.472139)/(−4.000010) = −2.618026.
Second iteration for second root

–2.618026 1 0.618034 –8.236068 –7.854102


–2.618026 5.236043 7.854150
–2.618026 1 –1.999996 –3.000027 0.000050=b3
–2.618026 12.090104
1 –4.618022 9.090076= c2

The second iterated value is

x2 = x1 − b3/c2 = −2.618026 − 0.000050/9.090076 = −2.618032.
It is seen that x = −2.61803 is another root. The next reduced equation is

x^2 − 2.00000x − 3.00003 = 0.

The roots of this equation are x = 3.00001, −1.00000.
Hence, the roots of the given equation are −0.38197, −2.61803, 3.00001, −1.00000.

Note 2.1 The Birge-Vieta method is used to find all real roots of a polynomial equation.
But, the current form of this method is not applicable to find the complex roots. After
modification, this method may be used to find all roots (real or complex) of a polynomial
equation. Since the method is based on the Newton-Raphson method, the rate of convergence
of this method is quadratic, as the rate of convergence of the Newton-Raphson method is
quadratic.


2.3 Bairstow method

This method is also an iterative method. In this method, a quadratic factor is ex-
tracted from the polynomial Pn (x) by iteration. As a by product the deflated polynomial
(the polynomial obtained by dividing Pn (x) by the quadratic factor) is also obtained. It
is well known that the determination of roots (real or complex) of a quadratic equation
is easy. Therefore, by extracting all quadratic factors one can determine all the roots of
a polynomial equation. This is the basic principle of Bairstow method.
Let the polynomial Pn (x) of degree n be

x^n + a1 x^{n−1} + a2 x^{n−2} + · · · + a_{n−1} x + a_n.     (2.16)

Let x^2 + px + q be a factor of the polynomial P_n (x), n > 2. When this polynomial is
divided by the factor x^2 + px + q, the quotient is a polynomial of degree (n − 2)
and the remainder is a linear polynomial. Let the quotient and the remainder be denoted
by Q_{n−2} (x) and M x + N, where M and N are two constants.
Using this notation, P_n (x) can be written as

P_n (x) = (x^2 + px + q)Q_{n−2} (x) + M x + N.     (2.17)

The polynomial Q_{n−2} (x) is called the deflated polynomial; let it be

Q_{n−2} (x) = x^{n−2} + b1 x^{n−3} + · · · + b_{n−3} x + b_{n−2}.     (2.18)

It is obvious that the values of M and N depend on p and q. If x^2 + px + q is an
exact factor of P_n (x), then the remainder M x + N, i.e. M and N, must be zero. Thus
the main aim of the Bairstow method is to find the values of p and q such that

M (p, q) = 0 and N (p, q) = 0.     (2.19)

These are two non-linear equations in p and q and these equations can be solved by
Newton-Raphson method for two variables (discussed in Module 3 of this chapter).
Let (pT , qT ) be the exact values of p and q and ∆p, ∆q be the (errors) corrections to
p and q. Therefore,

pT = p + ∆p and qT = q + ∆q.

Hence,

M (pT , qT ) = M (p + ∆p, q + ∆q) = 0 and N (pT , qT ) = N (p + ∆p, q + ∆q) = 0.

By Taylor’s series expansion, we get

M (p + ∆p, q + ∆q) = M (p, q) + ∆p ∂M/∂p + ∆q ∂M/∂q + · · · = 0
and N (p + ∆p, q + ∆q) = N (p, q) + ∆p ∂N/∂p + ∆q ∂N/∂q + · · · = 0.
All the derivatives are evaluated at the approximate value (p, q) of (pT , qT ). Neglect-
ing square and higher powers of ∆p and ∆q, as they are small, the above equations
become

∆pMp + ∆qMq = −M (2.20)


∆pNp + ∆qNq = −N. (2.21)

Therefore, the values of ∆p and ∆q are obtained from the formulae

∆p = −(M N_q − N M_q)/(M_p N_q − M_q N_p),
∆q = −(N M_p − M N_p)/(M_p N_q − M_q N_p).     (2.22)
It is expected that in this stage the values of ∆p and ∆q are either 0 or very small.
Now, the coefficients of the deflated polynomial Qn−2 (x) and the expressions for M
and N in terms of p and q are computed below.
From equation (2.17)

x^n + a1 x^{n−1} + a2 x^{n−2} + · · · + a_{n−1} x + a_n
  = (x^2 + px + q)(x^{n−2} + b1 x^{n−3} + · · · + b_{n−3} x + b_{n−2}) + M x + N.     (2.23)

Comparing both sides, we get

a1 = b1 + p b1 = a1 − p
a2 = b2 + pb1 + q b2 = a2 − pb1 − q
.. ..
. .
ak = bk + pbk−1 + qbk−2 bk = ak − pbk−1 − qbk−2 (2.24)
.. ..
. .
an−1 = M + pbn−2 + qbn−3 M = an−1 − pbn−2 − qbn−3
an = N + qbn−2 N = an − qbn−2 .

In general,

bk = ak − pbk−1 − qbk−2 , k = 1, 2, . . . , n. (2.25)

The values of b0 and b−1 are taken as 1 and 0 respectively.


With this notation, the expressions for M and N are
M = bn−1 , N = bn + pbn−1 . (2.26)

Note that M and N depend on the b’s. Differentiating the equation (2.25) with respect
to p and q gives the partial derivatives of M and N:

∂b_k/∂p = −b_{k−1} − p ∂b_{k−1}/∂p − q ∂b_{k−2}/∂p,   ∂b_0/∂p = ∂b_{−1}/∂p = 0     (2.27)
∂b_k/∂q = −b_{k−2} − p ∂b_{k−1}/∂q − q ∂b_{k−2}/∂q,   ∂b_0/∂q = ∂b_{−1}/∂q = 0     (2.28)

For simplification, we denote

∂b_k/∂p = −c_{k−1},  k = 1, 2, . . . , n     (2.29)
and ∂b_k/∂q = −c_{k−2}.     (2.30)
With this notation, the equation (2.27) simplifies as
ck−1 = bk−1 − pck−2 − qck−3 . (2.31)

Also, the equation (2.28) becomes


ck−2 = bk−2 − pck−3 − qck−4 . (2.32)

Hence, the recurrence relation for ck is

ck = bk − pck−1 − qck−2 , k = 1, 2, . . . , n − 1 and c0 = 1, c−1 = 0. (2.33)

Therefore,

M_p = ∂b_{n−1}/∂p = −c_{n−2},
N_p = ∂b_n/∂p + p ∂b_{n−1}/∂p + b_{n−1} = b_{n−1} − c_{n−1} − p c_{n−2},
M_q = ∂b_{n−1}/∂q = −c_{n−3},
N_q = ∂b_n/∂q + p ∂b_{n−1}/∂q = −(c_{n−2} + p c_{n−3}).

From the equation (2.22), the explicit expressions for ∆p and ∆q are obtained as
follows:

∆p = −[b_n c_{n−3} − b_{n−1} c_{n−2}] / [c_{n−2}^2 − c_{n−3} (c_{n−1} − b_{n−1})],
∆q = −[b_{n−1} (c_{n−1} − b_{n−1}) − b_n c_{n−2}] / [c_{n−2}^2 − c_{n−3} (c_{n−1} − b_{n−1})].     (2.34)
Therefore, the improved values of p and q are p + ∆p and q + ∆q. Thus if p0 , q0 be
the initial guesses of p and q, then the first approximate values of p and q are

p1 = p0 + ∆p and q1 = q0 + ∆q. (2.35)

Table 2.2 is helpful for calculating the values of the b_k’s and c_k’s, where p0 and q0 are
taken as initial values of p and q.

     | 1    a1     a2       · · ·   a_k            · · ·   a_{n−1}        a_n
−p0  |      −p0    −p0 b1   · · ·   −p0 b_{k−1}    · · ·   −p0 b_{n−2}    −p0 b_{n−1}
−q0  |             −q0      · · ·   −q0 b_{k−2}    · · ·   −q0 b_{n−3}    −q0 b_{n−2}
     | 1    b1     b2       · · ·   b_k            · · ·   b_{n−1}        b_n
−p0  |      −p0    −p0 c1   · · ·   −p0 c_{k−1}    · · ·   −p0 c_{n−2}
−q0  |             −q0      · · ·   −q0 c_{k−2}    · · ·   −q0 c_{n−3}
     | 1    c1     c2       · · ·   c_k            · · ·   c_{n−1}

Table 2.2: Tabular form of b’s and c’s.

The second approximate values p2, q2 of p and q are determined from the equations
p2 = p1 + ∆p, q2 = q1 + ∆q.
In general,

p_{k+1} = p_k + ∆p,  q_{k+1} = q_k + ∆q,     (2.36)

where the values of ∆p and ∆q are calculated at p = p_k and q = q_k.
The iteration process to find the values of p and q is terminated when both |∆p|
and |∆q| are very small.
The next quadratic factor can be obtained by a similar process from the deflated poly-
nomial Q_{n−2} (x).
The values of ∆p and ∆q are obtained by applying the Newton-Raphson method for the
two-variable case. Also, the rate of convergence of the Newton-Raphson method is quadratic.
Hence, the rate of convergence of this method is quadratic.
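The recurrences (2.25), (2.33) and the corrections (2.34) give a complete algorithm, sketched below for extracting one quadratic factor x^2 + px + q. The leading coefficient is assumed to be 1 as in (2.16), and the tolerance is an illustrative choice.

# Hedged sketch: Bairstow extraction of one quadratic factor x^2 + p x + q
# from x^n + a1 x^(n-1) + ... + an; coeffs = [1, a1, ..., an], n > 2.
def bairstow(coeffs, p0, q0, eps=1e-8, max_iter=100):
    p, q = p0, q0
    n = len(coeffs) - 1
    for _ in range(max_iter):
        b = [0.0] * (n + 1)
        c = [0.0] * (n + 1)
        b[0] = c[0] = 1.0
        for k in range(1, n + 1):        # b_k = a_k - p b_{k-1} - q b_{k-2}
            b[k] = coeffs[k] - p * b[k-1] - q * (b[k-2] if k >= 2 else 0.0)
        for k in range(1, n):            # c_k = b_k - p c_{k-1} - q c_{k-2}
            c[k] = b[k] - p * c[k-1] - q * (c[k-2] if k >= 2 else 0.0)
        denom = c[n-2]**2 - c[n-3] * (c[n-1] - b[n-1])
        dp = -(b[n] * c[n-3] - b[n-1] * c[n-2]) / denom          # (2.34)
        dq = -(b[n-1] * (c[n-1] - b[n-1]) - b[n] * c[n-2]) / denom
        p, q = p + dp, q + dq
        if abs(dp) <= eps and abs(dq) <= eps:
            break
    return p, q, b[1:n-1]    # factor x^2 + p x + q and deflated coefficients

print(bairstow([1, 2, 3, 4, 1], 0.5, 0.5))  # p ~ 1.7965, q ~ 0.4599 (Example 2.2)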


Example 2.2 Extract all the quadratic factors from the equation x^4 + 2x^3 + 3x^2 + 4x +
1 = 0 by using the Bairstow method and hence solve this equation.

Solution. Let the initial guess of p and q be p0 = 0.5 and q0 = 0.5.

First iteration

1 2.000000 3.000000 4.000000 1.000000


−0.500000 −0.500000 −0.750000 −0.875000 −1.187500
−0.500000 −0.500000 −0.750000 −0.875000
1 1.500000 1.750000 2.375000 −1.062500
−0.500000 −0.500000 −0.500000 −0.375000
−0.500000 −0.500000 −0.500000
1 1.000000 0.750000 1.500000
= c1 = c2 = c3

∆p = −[b4 c1 − b3 c2]/[c2^2 − c1 (c3 − b3)] = 1.978261,
∆q = −[b3 (c3 − b3) − b4 c2]/[c2^2 − c1 (c3 − b3)] = 0.891304

Therefore, p1 = p0 + ∆p = 2.478261, q1 = q0 + ∆q = 1.391304.

Second iteration

1 2.000000 3.000000 4.000000 1.000000


−2.478261 −2.478261 1.185256 −6.924140 5.597732
−1.391304 −1.391304 0.665407 −3.887237
1 −0.478261 2.793951 −2.258734 2.710495
−2.478261 −2.478261 7.327033 −21.634426
−1.391304 −1.391304 4.113422
1 −2.956522 8.729680 −19.779737

∆p = −0.479568, ∆q = −0.652031.
p2 = p1 + ∆p = 1.998693, q2 = q1 + ∆q = 0.739273.

Third iteration

1 2.000000 3.000000 4.000000 1.000000


−1.998693 −1.998693 −0.002613 −4.513276 1.027812
−0.739273 −0.739273 −0.000967 −1.669363
1 0.001307 2.258114 −0.514242 0.358449
−1.998693 −1.998693 3.992159 −11.014794
−0.739273 −0.739273 1.476613
1 −1.997385 5.511000 −10.052423

∆p = −0.187110, ∆q = −0.258799.
p3 = p2 + ∆p = 1.811583, q3 = q2 + ∆q = 0.480474.

Fourth iteration

1 2.000000 3.000000 4.000000 1.000000


−1.811583 −1.811583 −0.341334 −3.945975 0.066131
−0.480474 −0.480474 −0.090530 −1.046566
1 0.188417 2.178192 −0.036504 0.019565
−1.811583 −1.811583 2.940498 −8.402511
−0.480474 −0.480474 0.779889
1 −1.623165 4.638216 −7.659126

∆p = −0.015050, ∆q = −0.020515.
p4 = p3 + ∆p = 1.796533, q4 = q3 + ∆q = 0.459960.

Fifth iteration

1 2.000000 3.000000 4.000000 1.000000


−1.796533 −1.796533 −0.365535 −3.906570 0.000282
−0.459960 −0.459960 −0.093587 −1.000184
1 0.203467 2.174505 −0.000157 0.000098
−1.796533 −1.796533 2.861996 −8.221908
−0.459960 −0.459960 0.732746
1 −1.593066 4.576541 −7.489319

∆p = −0.000062, ∆q = −0.000081.
p5 = p4 + ∆p = 1.796471, q5 = q4 + ∆q = 0.459879.
Note that ∆p and ∆q are now zero up to four decimal places. Thus p = 1.7965, q =
0.4599, correct up to four decimal places.
Therefore, a quadratic factor is x^2 + 1.7965x + 0.4599 and the deflated polynomial is
Q2 (x) = P4 (x)/(x^2 + 1.7965x + 0.4599) = x^2 + 0.2035x + 2.1745.
Thus, P4 (x) = (x^2 + 1.7965x + 0.4599)(x^2 + 0.2035x + 2.1745).
Hence, the roots of the given equation are
−0.309212, −1.487258, −0.1018 + 1.4711i, −0.1018 − 1.4711i.


Chapter 5

Solution of System of Linear Equations

Module No. 1

Matrix Inverse Method


......................................................................................

Systems of linear and non-linear equations occur in many applications. To solve
a system of linear equations many direct and iterative methods have been developed. The
oldest and simplest methods are Cramer’s rule and the matrix inverse method. But, these
methods depend on the evaluation of determinants and the computation of the inverse of
the coefficient matrix. A few methods are available to evaluate a determinant; among them
the pivoting method is most efficient and applicable for all types of determinants. In this
module, the pivoting method is discussed to evaluate a determinant and the inverse of the
coefficient matrix. Then, the matrix inverse method is described to solve a system of linear
equations. Other direct and iterative methods are discussed in the next modules.
A system of m linear equations with n variables is given by

a11 x1 + a12 x2 + · · · + a1n xn = b1


··························· ···
ai1 x1 + ai2 x2 + · · · + ain xn = bi (1.1)
··························· ···
am1 x1 + am2 x2 + · · · + amn xn = bm .

The quantities x1 , x2 , . . ., xn are the unknowns (variables) of the system and a11 ,
a12 , . . ., amn are called the coefficients and generally they are known. The numbers
b1 , b2 , . . . , bm are constant or free terms of the system.
The above system of equations (1.1) can be written as a single equation:
n
X
aij xj = bi , i = 1, 2, . . . , m. (1.2)
j=1

Also, the entire system of equations (1.1) can be written with the help of matrices as

AX = b,     (1.3)

where

      | a11 a12 · · · a1n |        | b1 |        | x1 |
      | a21 a22 · · · a2n |        | b2 |        | x2 |
  A = | · · · · · · · · · | ,  b = | ·· | ,  X = | ·· | .     (1.4)
      | ai1 ai2 · · · ain |        | bi |        | xi |
      | · · · · · · · · · |        | ·· |        | ·· |
      | am1 am2 · · · amn |        | bm |        | xn |

A system of linear equations may or may not have a solution. If the system of
linear equations (1.1) has a solution then the system is called consistent; otherwise it is
called inconsistent or incompatible. Again, a consistent system of linear equations
may have a unique solution or multiple solutions. Finding a unique solution is easy, but
the determination of multiple solutions, if they exist, is a complicated problem.
To solve a system of linear equations usually three types of elementary transfor-
mations are applied. These are discussed below.
Interchange: The order of two equations can be changed.
Scaling: Multiplication of both sides of an equation by any non-zero number.
Replacement: Addition to (subtraction from) both sides of one equation of the cor-
responding sides of another equation multiplied by any number.
If all the constant terms b1, b2, . . . , bm of a system are zero, then the system is called a
homogeneous system; otherwise it is called a non-homogeneous system.
Two types of methods are available to solve a system of linear equations, viz. direct
methods and iterative methods.
Again, many direct methods are used to solve a system of equations; among them
Cramer’s rule, matrix inversion, Gauss elimination, matrix factorization, etc. are well
known.
Also, the most used iterative methods are Jacobi’s iteration, Gauss-Seidel’s itera-
tion, etc.
In many applications, we have to determine the value of a determinant. So an efficient
method is required for this purpose. One efficient method based on pivoting is discussed
in the following section.

1.1 Evaluation of determinant

One of the best methods to evaluate a determinant is known as triangularization; it
is also known as the Gauss reduction method. The main idea of this method is to convert
the given determinant (D) into a lower or upper triangular form by using only elementary
row operations. Once the determinant is reduced to a triangular form (say D′), the
value of D is obtained by multiplying the diagonal elements of D′.

Let D be a determinant of order n given by

      | a11 a12 · · · a1n |
  D = | a21 a22 · · · a2n |
      | · · · · · · · · · |
      | an1 an2 · · · ann |

Using the elementary row operations, D can be reduced to the following upper tri-
angular form:

       | a11   a12      a13      · · ·  a1n       |
       | 0     a22^(1)  a23^(1)  · · ·  a2n^(1)   |
  D′ = | 0     0        a33^(2)  · · ·  a3n^(2)   |
       | · · · · · ·    · · ·    · · ·  · · ·     |
       | 0     0        0        · · ·  ann^(n−1) |
To convert to this form a lot of elementary operations are required. To convert all
the elements of the first column, except the first element, to 0 the following elementary
operations are used:

a_ij^(1) = a_ij − (a_i1/a_11) a_1j,  for i, j = 2, 3, . . . , n.

Similarly, to convert all the elements of the second column below the second element
to 0, the following operations are used:

a_ij^(2) = a_ij^(1) − (a_i2^(1)/a_22^(1)) a_2j^(1),  for i, j = 3, 4, . . . , n.

All these elementary operations can be written as

a_ij^(k) = a_ij^(k−1) − (a_ik^(k−1)/a_kk^(k−1)) a_kj^(k−1);     (1.5)

i, j = k + 1, . . . , n;  k = 1, 2, . . . , n − 1 and a_ij^(0) = a_ij, i, j = 1, 2, · · · , n.
Once D′ is available, the value of D is given by

a_11 a_22^(1) a_33^(2) · · · a_nn^(n−1).

It is observed that the formula for the elementary operations is simple and easy to
program. The time taken by this method is O(n^3). But, there is a serious drawback
of this formula, which is discussed below.

To compute the value of a_ij^(k) one division is required. If a_kk^(k−1) is zero or very
small, then the method fails. If a_kk^(k−1) is very small, then there is a chance of losing
significant digits or data overflow. To avoid this situation the pivoting techniques are used.
A pivot is the largest magnitude element in a row or in a column or in the principal
diagonal or in the leading or trailing sub-matrix of order i (2 ≤ i ≤ n).
Let us consider the following matrix to illustrate these terms:

      | 0   1    0    −5 |
  A = | 1   −8   3    10 |
      | 9   3    −33  18 |
      | 4   −40  9    11 |

For this matrix, 9 is the pivot for the first column, −33 is the pivot for the principal
diagonal, −40 is the pivot for the entire matrix, and −8 is the pivot for the trailing
sub-matrix

      | 0  1  |
      | 1  −8 | .
If any one of the column pivot elements (during elementary operations) is zero or very
small relative to the other elements in that row, then we rearrange the remaining rows in
such a way that the pivot becomes non-zero or not a very small number. This method
is called pivoting. The pivoting methods are of two types, viz. partial pivoting and
complete pivoting; these are discussed below.

1.1.1 Partial pivoting

In the partial pivoting method, the pivot is the largest magnitude element in a column.
In the first stage, find the first pivot, which is the largest element in magnitude among
the elements of the first column. If it is a11, then there is nothing to do. If it is ai1, then
interchange rows i and 1. Then apply the elementary row operations to make all the
elements of the first column, except the first element, 0. In the next stage, the second pivot
is determined by finding the largest element in magnitude among the elements of the second
column leaving the first element; let it be aj2. In this case, interchange the second and jth
rows and then apply elementary row operations. This process continues for (n − 1)
stages. In general, at the kth stage, the smallest index j is chosen for which

|a_jk^(k)| = max{|a_kk^(k)|, |a_{k+1,k}^(k)|, . . . , |a_nk^(k)|} = max{|a_ik^(k)|, i = k, k + 1, . . . , n}

and the rows k and j are interchanged.
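A sketch of the triangularization with partial pivoting follows; it tracks the sign change from each row interchange, as required below, and multiplies the diagonal elements at the end.

# Hedged sketch: determinant by triangularization with partial pivoting.
def determinant(A):
    A = [row[:] for row in A]        # work on a copy
    n = len(A)
    sign = 1.0
    for k in range(n - 1):
        # pivot: largest |a_ik| for i = k..n-1 (smallest such index j)
        j = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[j][k] == 0.0:
            return 0.0               # singular matrix
        if j != k:
            A[k], A[j] = A[j], A[k]  # row interchange flips the sign
            sign = -sign
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for col in range(k, n):
                A[i][col] -= m * A[k][col]
    d = sign
    for k in range(n):
        d *= A[k][k]
    return d

print(determinant([[1, 0, 3], [-2, 7, 1], [5, -1, 6]]))   # -56, as in Example 1.1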


Complete pivoting or full pivoting

In partial pivoting, the pivot is chosen from a column. But, in complete pivoting the
pivot element is the largest element (in magnitude) among all the elements of the determinant.
Let it be at the (l, m)th position for the first time.
Thus, a_lm is the first pivot. Then interchange the first row and the lth row, and the
first column and the mth column. In the second stage, the largest element (in magnitude)
is determined among all elements leaving the first row and first column. This element is
the second pivot.
In this manner, at the kth stage, we choose l and m such that

|a_lm^(k)| = max{|a_ij^(k)|, i, j = k, k + 1, . . . , n}.

Then interchange the rows k, l and the columns k, m. In this case, a_kk is the kth pivot
element.
It is obvious that complete pivoting is more complicated than partial pivoting.
Partial pivoting is easy to program. Generally, partial pivoting is used for hand
calculation.
We have mentioned earlier that pivoting is used to find the value of all kinds of
determinants. To determine the pivot and to interchange the rows and/or columns some
additional time is required. But, for some types of determinants one can determine the
value without pivoting. Such types of determinants are stated below.

Note 1.1 If the coefficient matrix A is diagonally dominant, i.e.

Σ_{j=1, j≠i}^{n} |a_ij| < |a_ii|   or   Σ_{j=1, j≠i}^{n} |a_ji| < |a_ii|,  for i = 1, 2, . . . , n,     (1.6)

or real symmetric and positive definite, then no pivoting is necessary.

Note 1.2 Every diagonally dominant matrix is non-singular.


Example 1.1 Convert the determinant

      | 1   0   3 |
  A = | −2  7   1 |
      | 5   −1  6 |

into upper triangular form using (i) partial pivoting, and (ii) complete pivoting, and
hence determine the value of A.
Solution. (i) (Partial pivoting) The largest element in the first column is 5, present
in the third row, and it is the first pivot of A. Therefore, the first and third rows are
interchanged and the reduced determinant is

| 5   −1  6 |
| −2  7   1 |
| 1   0   3 |

Since two rows are interchanged, the value of the determinant is to be multiplied
by −1. To keep track of this a variable sign is used, and at this point its value is
sign = −1.
Now, we apply the elementary row operations to convert all elements of the first
column, except the first, to 0.
Adding 2/5 times the first row to the second row and −1/5 times the first row to the
third row, i.e. R2′ = R2 + (2/5)R1 and R3′ = R3 − (1/5)R1. (R2 and R2′ represent the
original second row and the modified second row respectively.)
The reduced determinant is

| 5  −1    6    |
| 0  33/5  17/5 |
| 0  1/5   9/5  |

Now, we determine the second pivot element. In this case, the pivot element is at the
(2, 2)th position, therefore no interchange is required.
Adding (−1/5)/(33/5) = −1/33 times the second row to the third row, i.e.
R3′ = R3 − (1/33)R2, the reduced determinant is

| 5  −1    6     |
| 0  33/5  17/5  |
| 0  0     56/33 |


Note that this is an upper triangular determinant and hence its value is sign ×
(5)(33/5)(56/33) = −56.
(ii) (Complete pivoting) The largest element in A is 7, at position (2,2). Interchange
the first and second columns and assign sign = −1; then interchange the first and second
rows and set sign = −sign = 1. The updated determinant is

| 7   −2  1 |
| 0   1   3 |
| −1  5   6 |

Adding 1/7 times the first row to the third row, i.e. using the formula
R3′ = R3 + (1/7)R1, the reduced determinant is

| 7  −2    1    |
| 0  1     3    |
| 0  33/7  43/7 |

Now, we determine the second pivot element from the trailing sub-matrix obtained
by deleting the first row and column, that is, from

| 1     3    |
| 33/7  43/7 |

The second pivot is 43/7, at the (3,3) position. Interchange the second and third
columns, setting sign = −sign = −1, and then interchange the second and third rows,
so that sign = 1. The modified determinant is

| 7  1     −2   |
| 0  43/7  33/7 |
| 0  3     1    |

Now, we apply the row operation R3′ = R3 − (21/43)R2 and obtain the required upper
triangular determinant

| 7  1     −2     |
| 0  43/7  33/7   |
| 0  0     −56/43 |

Hence, the value of the determinant is sign × (7)(43/7)(−56/43) = −56.


Observe that the values obtained by both methods are the same, as expected.

Advantages and disadvantages of partial and complete pivoting

In the pivoting method, the symmetry or regularity of the original matrix may be lost. It
is easily observed that partial pivoting requires less time, as it needs a smaller number
of interchanges than complete pivoting. Again, partial pivoting needs a smaller number
of comparisons to get the pivot element. A combination of partial and complete pivoting is
expected to be very effective not only for computing a determinant but also for solving
systems of linear equations. Pivoting prevents the loss of significant digits.

1.2 Inverse of a matrix

Let A be a non-singular square matrix and suppose there exists a matrix B such that
AB = I. Then B is called the inverse of A and vice-versa. The inverse of a matrix A is
denoted by A⁻¹. Now, using some theory of matrices it can be shown that the inverse of
a matrix A is given by

A⁻¹ = adj A / |A|.     (1.7)

The matrix adj A is called the adjoint of A and is defined as

          | A11  A21  · · ·  An1 |
  adj A = | A12  A22  · · ·  An2 |
          | · · ·· · ·· · ·  · · ·|
          | A1n  A2n  · · ·  Ann |

where Aij is the cofactor of aij in |A|.


This is the basic definition for finding the inverse of a matrix.
But, this definition is not suitable for large matrices as it needs a huge amount of arith-
metic calculation. In this method, we have to calculate n^2 cofactors and each cofactor
is a determinant of order (n − 1) × (n − 1). It was mentioned in the previous section that
to evaluate a determinant of order n, O(n^3) arithmetic calculations are required. Thus, to
compute all cofactors, a total of (n^3 × n^2) = O(n^5) arithmetic calculations are needed.
This is a huge amount of time for large matrices.
Fortunately, many efficient methods are available to find the inverse of a matrix;
among them Gauss-Jordan is the most popular. In the following, the Gauss-Jordan
method is discussed to find the inverse of a square non-singular matrix.


1.2.1 Gauss-Jordan method

In this method, the matrix A is augmented with a unit matrix of the same size, and only
elementary row operations are applied to get the inverse of the matrix. Let the order
of the matrix A be n × n and let it be augmented with the unit matrix I. This augmented
matrix is denoted by [A|I]. The order of the augmented matrix [A|I] is n × 2n.
The augmented matrix is of the following form:

          | a11  a12  · · ·  a1n | 1  0  · · ·  0 |
  [A|I] = | a21  a22  · · ·  a2n | 0  1  · · ·  0 |     (1.8)
          | · · ·· · ·· · ·  · · ·| · ·· · · · · · |
          | an1  an2  · · ·  ann | 0  0  · · ·  1 |

Now, the inverse of A is calculated in two phases. In the first phase, the left half of the
augmented matrix is converted into an upper triangular matrix by using only elementary
row operations. In the second phase, this upper triangular matrix is converted to an
identity matrix, again by using only row operations. All these operations are applied on
the augmented matrix [A|I].
After the second phase, the augmented matrix [A|I] is transformed to [I|A⁻¹]. Thus,
the right half becomes the inverse of A. Symbolically, we write

  [A|I]  −− Gauss-Jordan −→  [I|A⁻¹].

In explicit form, the transformation is

  | a11 · · · a1n | 1 · · · 0 |                       | 1 · · · 0 | a′11 · · · a′1n |
  | · · · · · · · | · · · · · |  −− Gauss-Jordan −→   | · · · · · | · · ·  · · ·    |
  | an1 · · · ann | 0 · · · 1 |                       | 0 · · · 1 | a′n1 · · · a′nn |

where the a′ij are the entries of A⁻¹.


Example 1.2 Use the partial pivoting method to find the inverse of the following matrix:

    A = [  2   0   1 ]
        [ −1   3   4 ]
        [  4  −2   0 ].

Solution. The augmented matrix [A|I] is

    [A|I] = [  2   0   1 | 1  0  0 ]
            [ −1   3   4 | 0  1  0 ]
            [  4  −2   0 | 0  0  1 ].
Phase 1. (Reduction to upper triangular form):
In the first column, 4 is the largest element (in magnitude), so it is the first pivot. We interchange the first and third rows to place the pivot element 4 at the (1,1) position. Then, the above matrix becomes

    [  4  −2   0 | 0  0  1 ]
    [ −1   3   4 | 0  1  0 ]
    [  2   0   1 | 1  0  0 ]

    ∼ [  1  −1/2  0 | 0  0  1/4 ]     R1′ = (1/4)R1
      [ −1    3   4 | 0  1   0  ]
      [  2    0   1 | 1  0   0  ]

    ∼ [ 1  −1/2  0 | 0  0   1/4 ]     R2′ = R2 + R1; R3′ = R3 − 2R1
      [ 0   5/2  4 | 0  1   1/4 ]
      [ 0    1   1 | 1  0  −1/2 ]

All the elements of the first column, except the first, become 0. Now, we convert the element at position (3,2) to 0. For this purpose, we find the largest element (in magnitude) in the second column, leaving the first element, and it is 5/2. Fortunately, it is at position (2,2) and so there is no need to interchange any rows.

    ∼ [ 1  −1/2   0  | 0   0    1/4  ]   R2′ = (2/5)R2
      [ 0    1   8/5 | 0  2/5  1/10 ]
      [ 0    1    1  | 1   0   −1/2 ]

    ∼ [ 1  −1/2   0   | 0    0    1/4  ]   R3′ = R3 − R2
      [ 0    1   8/5  | 0   2/5  1/10 ]
      [ 0    0  −3/5  | 1  −2/5  −3/5 ]

    ∼ [ 1  −1/2   0  |   0    0    1/4  ]   R3′ = −(5/3)R3
      [ 0    1   8/5 |   0   2/5  1/10 ]
      [ 0    0    1  | −5/3  2/3    1  ]

Phase 2. (Make the left half a unit matrix):

    ∼ [ 1  0  4/5 |   0   1/5  3/10 ]   R1′ = R1 + (1/2)R2
      [ 0  1  8/5 |   0   2/5  1/10 ]
      [ 0  0   1  | −5/3  2/3    1  ]

    ∼ [ 1  0  0 |  4/3  −1/3  −1/2 ]    R1′ = R1 − (4/5)R3; R2′ = R2 − (8/5)R3
      [ 0  1  0 |  8/3  −2/3  −3/2 ]
      [ 0  0  1 | −5/3   2/3    1  ]

Now, the left half becomes a unit matrix; thus the right half is the inverse of the given matrix, and it is

    A^{-1} = [  4/3  −1/3  −1/2 ]
             [  8/3  −2/3  −3/2 ]
             [ −5/3   2/3    1  ].

Complexity of the algorithm

By analyzing each step of the method, it can be shown that the time complexity to compute the inverse of a non-singular matrix of order n × n is O(n^3).
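To make the two-phase procedure concrete, the following is a minimal Python sketch of Gauss-Jordan inversion with partial pivoting. It is our own illustration, not code from the text; the function name gauss_jordan_inverse and the singularity tolerance 1e-12 are assumptions.

    def gauss_jordan_inverse(A):
        """Invert a non-singular n x n matrix by Gauss-Jordan elimination
        with partial pivoting, following the two phases described above."""
        n = len(A)
        # Build the augmented matrix [A | I].
        M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
             for i, row in enumerate(A)]
        for k in range(n):
            # Partial pivoting: bring the largest |entry| of column k to row k.
            p = max(range(k, n), key=lambda i: abs(M[i][k]))
            if abs(M[p][k]) < 1e-12:
                raise ValueError("matrix is singular (or nearly so)")
            M[k], M[p] = M[p], M[k]
            piv = M[k][k]
            M[k] = [v / piv for v in M[k]]          # normalize the pivot row
            for i in range(n):                      # eliminate column k in all other rows
                if i != k:
                    f = M[i][k]
                    M[i] = [vi - f * vk for vi, vk in zip(M[i], M[k])]
        return [row[n:] for row in M]               # right half is A^{-1}

    # Example 1.2 above: rows [4/3, -1/3, -1/2], [8/3, -2/3, -3/2], [-5/3, 2/3, 1].
    print(gauss_jordan_inverse([[2, 0, 1], [-1, 3, 4], [4, -2, 0]]))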

1.3 Matrix inverse method

A system of equations (1.1) can be written in the matrix form (1.3) as

Ax = b

where A, b and x are defined in (1.4).


The solution of Ax = b is obtained from the equation

    x = A^{-1} b,                                              (1.9)

where A^{-1} is the inverse of the matrix A.
Thus, the vector x can be obtained by finding the inverse of A and then multiplying it by b.
Example 1.3 Solve the following system of equations by the matrix inverse method.
x1 + 12x2 + 3x3 − 4x4 + 6x5 = 2,
13x1 + 4x2 + 5x3 + 4x5 = 4,
5x1 + 4x2 + 3x3 + 2x4 − 2x5 = 6,
5x1 + 14x2 + 3x4 − 2x5 = 10,
−5x1 + 4x2 + 3x3 + 4x4 + 5x5 = 13.
Solution. The given equations can be written as Ax = b, where

    A = [  1  12  3  −4   6 ]
        [ 13   4  5   0   4 ]
        [  5   4  3   2  −2 ],    x = (x1, x2, x3, x4, x5)^t,    b = (2, 4, 6, 10, 13)^t.
        [  5  14  0   3  −2 ]
        [ −5   4  3   4   5 ]
Using the partial pivoting method, the inverse of A is obtained as

    A^{-1} = [ −0.0362   0.0788  −0.0641   0.0357  −0.0309 ]
             [  0.0358  −0.0241   0.0068   0.0464  −0.0024 ]
             [  0.0798  −0.0646   0.3333  −0.1531   0.0280 ]
             [ −0.1186   0.0473  −0.0682   0.0768   0.1079 ]
             [ −0.0178   0.0990  −0.2150   0.0291   0.0679 ].

Thus, the solution vector is

    x = A^{-1} b = (−0.1872, 0.4486, 0.7333, 1.7136, 0.2430)^t.

Hence, x1 = −0.1872, x2 = 0.4486, x3 = 0.7333, x4 = 1.7136, x5 = 0.2430, correct up to four decimal places.

Note 1.3 It is mentioned earlier that the time to compute the inverse of an n × n matrix is O(n^3), and the same amount of time is required to multiply two matrices of the same order. Hence, the time complexity of solving a system of n linear equations by this method is O(n^3).
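The numbers in Example 1.3 are easy to check numerically. Below is a small verification using NumPy (our choice of tool; the text does not prescribe it, and np.linalg.inv uses an LU-based routine rather than the pivoting scheme described above, but the results agree to the digits shown).

    import numpy as np

    A = np.array([[ 1, 12, 3, -4,  6],
                  [13,  4, 5,  0,  4],
                  [ 5,  4, 3,  2, -2],
                  [ 5, 14, 0,  3, -2],
                  [-5,  4, 3,  4,  5]], dtype=float)
    b = np.array([2, 4, 6, 10, 13], dtype=float)

    x = np.linalg.inv(A) @ b   # x = A^{-1} b, as in equation (1.9)
    print(np.round(x, 4))      # [-0.1872  0.4486  0.7333  1.7136  0.243 ]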


Chapter 5

Solution of System of Linear Equations

Module No. 3

Methods of Matrix Factorization



Let the system of linear equations be

Ax = b (3.1)

where

    A = [ a11  a12  ···  a1n ]
        [ a21  a22  ···  a2n ]
        [ ···  ···  ···  ··· ]
        [ ai1  ai2  ···  ain ]
        [ ···  ···  ···  ··· ]
        [ an1  an2  ···  ann ],    b = (b1, b2, . . . , bn)^t    and    x = (x1, x2, . . . , xn)^t.    (3.2)

In the matrix factorization method, the coefficient matrix A is expressed as a product


of two or more other matrices. By finding the factors of the coefficient matrix, some
methods are adapted to solve a system of linear equations with less computational
time. In this module, LU decomposition method is discussed to solve a system of linear
equations. In this method, the coefficient matrix A is written as a product of two
matrices L and U, where the first matrix is a lower triangular matrix and second one
is an upper triangular matrix.

3.1 LU decomposition method

LU decomposition method is also known as matrix factorization or Crout’s reduction


method.
Let the coefficient matrix A be written as A = LU, where L and U are the lower
and upper triangular matrices respectively.
Unfortunately, this factorization is not possible for all matrices. The factorization is possible, and it is unique, if all the leading principal minors of A are non-zero, i.e.

    a11 ≠ 0,   | a11  a12 | ≠ 0,   | a11  a12  a13 | ≠ 0,   · · · ,   |A| ≠ 0.       (3.3)
               | a21  a22 |        | a21  a22  a23 |
                                   | a31  a32  a33 |

Since the matrices L and U are lower and upper triangular, they can be written in the following form:


   
    L = [ l11   0    0   ···   0  ]          U = [ u11  u12  u13  ···  u1n ]
        [ l21  l22   0   ···   0  ]              [  0   u22  u23  ···  u2n ]
        [ l31  l32  l33  ···   0  ]    and       [  0    0   u33  ···  u3n ].      (3.4)
        [ ···  ···  ···  ···  ··· ]              [ ···  ···  ···  ···  ··· ]
        [ ln1  ln2  ln3  ···  lnn ]              [  0    0    0   ···  unn ]
If the factorization is possible, then the equation Ax = b can be expressed as

LUx = b. (3.5)

Let Ux = z; then equation (3.5) reduces to Lz = b, where z = (z1, z2, . . . , zn)^t is an unknown vector. Thus, equation (3.1) is decomposed into two systems of linear equations. Note that these triangular systems are easy to solve.
The equation Lz = b in explicit form is

l11 z1 = b1
l21 z1 + l22 z2 = b2
l31 z1 + l32 z2 + l33 z3 = b3 (3.6)
···································· ··· ···
ln1 z1 + ln2 z2 + ln3 z3 + · · · + lnn zn = bn .
This system of equations can be solved by forward substitution, i.e. the value of z1
is obtained from first equation and using this value, z2 can be determined from second
equation and so on. From last equation we can determine the value of zn , as in this
stage the values of the variables z1 , z2 , . . . , zn−1 are available.
By finding the values of z, one can solve the equation Ux = z. In explicit form, this
system is

    u11 x1 + u12 x2 + u13 x3 + · · · + u1n xn = z1
             u22 x2 + u23 x3 + · · · + u2n xn = z2
                      u33 x3 + · · · + u3n xn = z3                                 (3.7)
                           · · · · · · · · ·
             un−1,n−1 xn−1 + un−1,n xn = zn−1
                              unn xn = zn .

Observe that the value of the last variable xn can be determined from the last equation. Using this value, one can compute the value of xn−1 from the last but one equation, and so on. Lastly, from the first equation we can find the value of the variable x1, as at this stage all other variables are already known. This process is called backward substitution.
Thus, the outline to solve the system of equations Ax = b is given. But the complicated step is to determine the matrices L and U. The matrices L and U are obtained from the relation A = LU. Note that this matrix equation gives n^2 equations containing the lij and uij for i, j = 1, 2, . . . , n. But the total number of unknown elements of the matrices L and U is n(n + 1)/2 + n(n + 1)/2 = n^2 + n. So, n additional equations/conditions are required to find L and U completely. Such conditions are discussed below.
When uii = 1 for i = 1, 2, . . . , n, the method is known as Crout's decomposition method. When lii = 1 for i = 1, 2, . . . , n, the method is known as Doolittle's method of decomposition. In particular, when lii = uii for i = 1, 2, . . . , n, the corresponding method is called Cholesky's decomposition method.

3.1.1 Computation of L and U

In this section, it is assumed that uii = 1 for i = 1, 2, . . . , n. Now, the equation LU = A becomes

    [ l11      l11 u12         l11 u13                ···   l11 u1n                     ]
    [ l21   l21 u12 + l22   l21 u13 + l22 u23         ···   l21 u1n + l22 u2n           ]
    [ l31   l31 u12 + l32   l31 u13 + l32 u23 + l33   ···   l31 u1n + l32 u2n + l33 u3n ]
    [ ···      ···             ···                    ···        ···                    ]
    [ ln1   ln1 u12 + ln2   ln1 u13 + ln2 u23 + ln3   ···   ln1 u1n + ln2 u2n + · · · + lnn ]

        = [ a11  a12  a13  ···  a1n ]
          [ a21  a22  a23  ···  a2n ]
          [ ···  ···  ···  ···  ··· ]
          [ an1  an2  an3  ···  ann ].

From the first row and first column, we have

    li1 = ai1 ,  i = 1, 2, . . . , n   and   u1j = a1j / l11 ,  j = 2, 3, . . . , n.

Similarly, from the second column and second row we get the following equations:

    li2 = ai2 − li1 u12 ,  for i = 2, 3, . . . , n,
    u2j = (a2j − l21 u1j) / l22 ,  for j = 3, 4, . . . , n.

Solving these equations, we obtain the second column of L and the second row of U.
In general, the elements of the matrix L, i.e. lij and the elements of U, i.e. uij are
determined from the following equations.
    lij = aij − Σ_{k=1}^{j−1} lik ukj ,   i ≥ j                                    (3.8)

    uij = ( aij − Σ_{k=1}^{i−1} lik ukj ) / lii ,   i < j                           (3.9)

    uii = 1;   lij = 0 for j > i;   uij = 0 for i > j.
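The recurrences (3.8)-(3.9), combined with the forward and back substitutions described above, translate directly into code. The following Python sketch is our own illustration (the name crout_solve is not from the text) and assumes the leading principal minors of A are non-zero, as required by (3.3).

    def crout_solve(A, b):
        """Solve Ax = b by Crout's LU decomposition (u_ii = 1),
        using equations (3.8)-(3.9), then forward and back substitution."""
        n = len(A)
        L = [[0.0] * n for _ in range(n)]
        U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        for j in range(n):
            # Column j of L, equation (3.8): entries l_ij with i >= j.
            for i in range(j, n):
                L[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(j))
            # Row j of U, equation (3.9): entries u_jk with k > j.
            for k in range(j + 1, n):
                U[j][k] = (A[j][k] - sum(L[j][m] * U[m][k] for m in range(j))) / L[j][j]
        # Forward substitution: L z = b.
        z = [0.0] * n
        for i in range(n):
            z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
        # Back substitution: U x = z (the diagonal of U is 1).
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            x[i] = z[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))
        return x

    # Example 3.1 below: expected solution x = [3, 2, 1].
    print(crout_solve([[2, -3, 1], [1, 2, -3], [4, -1, -2]], [1, 4, 8]))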

The matrix equations Lz = b and Ux = z can also be solved by finding the inverses of L and U, as

    z = L^{-1} b                                               (3.10)
    and x = U^{-1} z.                                          (3.11)

But this process is time consuming, because computing an inverse takes much time. It may be noted that the time to find the inverse of a triangular matrix is less than that for an arbitrary matrix.
The inverse of A can also be determined from the relation

    A^{-1} = U^{-1} L^{-1}.                                    (3.12)

Few properties of triangular matrices

Let L = [lij ] and U = [uij ] be the lower and upper triangular matrices.

• The determinant of a triangular matrix is the product of the diagonal elements.

• Product of two lower (upper) triangular matrices is a lower (upper) triangular


matrix.

• Square of a lower (upper) triangular matrix is a lower (upper) triangular matrix.

• The inverse of lower (upper) triangular matrix is also a lower (upper) triangular
matrix.

• Since A = LU, |A| = |L||U|.

Let us illustrate the LU decomposition method.

Example 3.1 Let

    A = [ 2  −3   1 ]
        [ 1   2  −3 ]
        [ 4  −1  −2 ].

Express A as A = LU, where L and U are lower and upper triangular matrices and
hence solve the system of equations 2x1 − 3x2 + x3 = 1, x1 + 2x2 − 3x3 = 4, 4x1 −
x2 − 2x3 = 8.
Also, determine L−1 , U−1 , A−1 and |A|.

Solution. Let

    [ 2  −3   1 ]   [ l11   0    0  ]   [ 1  u12  u13 ]
    [ 1   2  −3 ] = [ l21  l22   0  ] × [ 0   1   u23 ]
    [ 4  −1  −2 ]   [ l31  l32  l33 ]   [ 0   0    1  ]

                    [ l11   l11 u12         l11 u13                 ]
                  = [ l21   l21 u12 + l22   l21 u13 + l22 u23       ].
                    [ l31   l31 u12 + l32   l31 u13 + l32 u23 + l33 ]

To find the values of lij and uij, we compare both sides and obtain
l11 = 2, l21 = 1, l31 = 4
l11 u12 = −3 or, u12 = −3/2
l11 u13 = 1 or, u13 = 1/2
l21 u12 + l22 = 2 or, l22 = 7/2
l31 u12 + l32 = −1 or, l32 = 5
l21 u13 + l22 u23 = −3 or, u23 = −1
l31 u13 + l32 u23 + l33 = −2 or, l33 = 1.

Hence L and U are given by

    L = [ 2   0   0 ]          U = [ 1  −3/2   1/2 ]
        [ 1  7/2  0 ]    and       [ 0    1    −1  ].
        [ 4   5   1 ]              [ 0    0     1  ]

The given equations can be written as Ax = b, where

    A = [ 2  −3   1 ]
        [ 1   2  −3 ],    x = (x1, x2, x3)^t,    b = (1, 4, 8)^t.
        [ 4  −1  −2 ]

Let A = LU. Then LUx = b. Let Ux = z. Then the given equation reduces to Lz = b.
First we consider the equation Lz = b, i.e.

    [ 2   0   0 ] [ z1 ]   [ 1 ]
    [ 1  7/2  0 ] [ z2 ] = [ 4 ].
    [ 4   5   1 ] [ z3 ]   [ 8 ]

In explicit form these equations are

2z1 = 1,
z1 + (7/2)z2 = 4,
4z1 + 5z2 + z3 = 8.

The solution of the above equations is z1 = 1/2, z2 = 1, z3 = 1.


Therefore, z = (1/2, 1, 1)t .
Now, we solve the equation Ux = z, i.e.

    [ 1  −3/2  1/2 ] [ x1 ]   [ 1/2 ]
    [ 0    1   −1  ] [ x2 ] = [  1  ].
    [ 0    0    1  ] [ x3 ]   [  1  ]

In explicit form, the equations are

x1 − (3/2)x2 + (1/2)x3 = 1/2


x2 − x3 = 1
x3 = 1.

The solution is x3 = 1, x2 = 1 + 1 = 2, x1 = 1/2 + (3/2)x2 − (1/2)x3 = 3, i.e.


x1 = 3, x2 = 2, x3 = 1.
Third Part. The Gauss-Jordan method is used to find L^{-1}. The augmented matrix is

    [L|I] = [ 2   0   0 | 1  0  0 ]
            [ 1  7/2  0 | 0  1  0 ]
            [ 4   5   1 | 0  0  1 ]

    ∼ [ 1   0   0 | 1/2  0  0 ]     R1′ = (1/2)R1
      [ 1  7/2  0 |  0   1  0 ]
      [ 4   5   1 |  0   0  1 ]

    ∼ [ 1   0   0 |  1/2  0  0 ]    R2′ = R2 − R1, R3′ = R3 − 4R1
      [ 0  7/2  0 | −1/2  1  0 ]
      [ 0   5   1 |  −2   0  1 ]

    ∼ [ 1  0  0 |  1/2   0   0 ]    R2′ = (2/7)R2
      [ 0  1  0 | −1/7  2/7  0 ]
      [ 0  5  1 |  −2    0   1 ]

    ∼ [ 1  0  0 |  1/2    0    0 ]  R3′ = R3 − 5R2.
      [ 0  1  0 | −1/7   2/7   0 ]
      [ 0  0  1 | −9/7  −10/7  1 ]

Thus,

    L^{-1} = [  1/2    0    0 ]
             [ −1/7   2/7   0 ].
             [ −9/7  −10/7  1 ]
Using the same process, one can determine U^{-1}. But here another method is used to determine U^{-1}. We know that the inverse of an upper triangular matrix is upper triangular. Therefore, let

    U^{-1} = [ 1  b12  b13 ]
             [ 0   1   b23 ].
             [ 0   0    1  ]

From the identity U^{-1} U = I, we have

    [ 1  b12  b13 ] [ 1  −3/2  1/2 ]   [ 1  0  0 ]
    [ 0   1   b23 ] [ 0    1   −1  ] = [ 0  1  0 ].
    [ 0   0    1  ] [ 0    0    1  ]   [ 0  0  1 ]

This gives

    [ 1  −3/2 + b12  1/2 − b12 + b13 ]   [ 1  0  0 ]
    [ 0      1           −1 + b23    ] = [ 0  1  0 ].
    [ 0      0               1       ]   [ 0  0  1 ]

Comparing both sides,
−3/2 + b12 = 0 or, b12 = 3/2;   1/2 − b12 + b13 = 0 or, b13 = 1;
−1 + b23 = 0 or, b23 = 1.
Thus,

    U^{-1} = [ 1  3/2  1 ]
             [ 0   1   1 ].
             [ 0   0   1 ]
Now,

    A^{-1} = U^{-1} L^{-1} = [ 1  3/2  1 ] [  1/2    0    0 ]
                             [ 0   1   1 ] [ −1/7   2/7   0 ]
                             [ 0   0   1 ] [ −9/7  −10/7  1 ]

           = [  −1     −1    1 ]
             [ −10/7  −8/7   1 ].
             [  −9/7  −10/7  1 ]

Last Part. |A| = |L||U| = 2 × (7/2) × 1 × 1 = 7.

3.2 Cholesky method

The Cholesky method is used to solve a system of linear equations Ax = b if the


coefficient matrix A is symmetric and positive definite. This method is also known as
square-root method.
Let A be a symmetric positive definite matrix; then A can be written as the product of a lower triangular matrix and its transpose. That is,

    A = L L^t,                                                 (3.13)

where L = [lij] with lij = 0 for i < j is a lower triangular matrix and L^t is the transpose of the matrix L.
Again, the matrix A can be written as

A = UUt , (3.14)

where U is an upper triangular matrix.


Using (3.13), the equation Ax = b becomes

LLt x = b. (3.15)

Let
Lt x = z, (3.16)

then
Lz = b. (3.17)

Using forward substitution one can easily solve equation (3.17) to obtain the vector z. Then, solving the equation L^t x = z by back substitution, we obtain the vector x.
Alternately, the values of z and then x can be determined from the following equa-
tions.

z = L−1 b and x = (Lt )−1 z = (L−1 )t z. (3.18)

As an intermediate result the inverse of A can be determined from the following


equation.
A−1 = (L−1 )t L−1 .

From this discussion, it is clear that the solution of the system of equations depends completely on the matrix L. The procedure to compute the matrix L is discussed below.


3.2.1 Procedure to determine L

Since A = L L^t, then

    A = [ l11   0    0   ···   0  ]   [ l11  l21  ···  lj1  ···  ln1 ]
        [ l21  l22   0   ···   0  ]   [  0   l22  ···  lj2  ···  ln2 ]
        [ ···  ···  ···  ···  ··· ] × [  0    0   ···  lj3  ···  ln3 ]
        [ li1  li2  li3  ···   0  ]   [ ···  ···  ···  ···  ···  ··· ]
        [ ···  ···  ···  ···  ··· ]   [  0    0   ···   0   ···  lnn ]
        [ ln1  ln2  ln3  ···  lnn ]

      = [ l11^2     l21 l11            ···  lj1 l11                    ···  ln1 l11                      ]
        [ l21 l11   l21^2 + l22^2      ···  lj1 l21 + lj2 l22          ···  ln1 l21 + ln2 l22            ]
        [ ···       ···                ···  ···                        ···  ···                          ]
        [ li1 l11   l21 li1 + l22 li2  ···  lj1 li1 + · · · + ljj lij  ···  ln1 li1 + · · · + lni lii    ]
        [ ···       ···                ···  ···                        ···  ···                          ]
        [ ln1 l11   l21 ln1 + l22 ln2  ···  lj1 ln1 + · · · + ljj lnj  ···  ln1^2 + ln2^2 + · · · + lnn^2 ].

By comparing both sides, we get the following system of equations:

    l11 = (a11)^{1/2},
    lii = ( aii − Σ_{j=1}^{i−1} lij^2 )^{1/2},   i = 2, 3, . . . , n,
    li1 = ai1 / l11 ,   i = 2, 3, . . . , n,                                        (3.19)
    li1 lj1 + li2 lj2 + · · · + lij ljj = aij ,
        i.e.  lij = (1/ljj) ( aij − Σ_{k=1}^{j−1} ljk lik ),   for i = j + 1, j + 2, . . . , n,
    lij = 0,   i < j.

Note that this system of equations gives the values of lij .


Similarly, the elements of the matrix U for the factorization (3.14) are given by

    unn = (ann)^{1/2},
    uin = ain / unn ,   i = 1, 2, . . . , n − 1,
    uij = (1/ujj) ( aij − Σ_{k=j+1}^{n} uik ujk ),
          for i = n − 2, n − 3, . . . , 1;  j = i + 1, i + 2, . . . , n − 1,         (3.20)
    uii = ( aii − Σ_{k=i+1}^{n} uik^2 )^{1/2},   i = n − 1, n − 2, . . . , 1,
    uij = 0,   i > j.

This method is illustrated by considering the following example.


Example 3.2 Solve the following system of equations by Cholesky method.

4x1 + 2x2 + 6x3 = 16


2x1 + 82x2 + 39x3 = 206
6x1 + 39x2 + 26x3 = 113.

Solution. The given system of equations can be written as Ax = b, where x = (x1, x2, x3)^t, b = (16, 206, 113)^t, and

    A = [ 4   2   6 ]
        [ 2  82  39 ].
        [ 6  39  26 ]
Note that the coefficient matrix A is symmetric and positive definite, and hence it can be written as L L^t = A. Let

    L = [ l11   0    0  ]
        [ l21  l22   0  ].
        [ l31  l32  l33 ]

Therefore,

    L L^t = [ l11^2     l11 l21             l11 l31               ]   [ 4   2   6 ]
            [ l21 l11   l21^2 + l22^2       l21 l31 + l22 l32     ] = [ 2  82  39 ].
            [ l31 l11   l31 l21 + l32 l22   l31^2 + l32^2 + l33^2 ]   [ 6  39  26 ]
Comparing both sides, we get the following system of equations:
l11^2 = 4 or l11 = 2
l11 l21 = 2 or l21 = 1
l11 l31 = 6 or l31 = 3
l21^2 + l22^2 = 82 or l22 = (82 − 1)^{1/2} = 9
l31 l21 + l32 l22 = 39 or l32 = (1/l22)(39 − l31 l21) = 4
l31^2 + l32^2 + l33^2 = 26 or l33 = (26 − l31^2 − l32^2)^{1/2} = 1.

Therefore,

    L = [ 2  0  0 ]
        [ 1  9  0 ].
        [ 3  4  1 ]
Now, the system of equations Lz = b becomes

2z1 = 16
z1 + 9z2 = 206
3z1 + 4z2 + z3 = 113.

The solution of these equations is


z1 = 8.0, z2 = 22.0, z3 = 1.0.
Now, from the equation L^t x = z,

    [ 2  1  3 ] [ x1 ]   [  8 ]
    [ 0  9  4 ] [ x2 ] = [ 22 ].
    [ 0  0  1 ] [ x3 ]   [  1 ]

In explicit form, the equations are

    2x1 + x2 + 3x3 = 8
    9x2 + 4x3 = 22
    x3 = 1.

Solution of these equations is x3 = 1.0, x2 = 2.0, x1 = 1.5.


Hence, the solution is x1 = 1.5, x2 = 2.0, x3 = 1.0. This is the exact solution of
the given system of equations.
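A direct implementation of the recurrences (3.19), followed by the two triangular solves, might look as follows in Python. This is a sketch under the stated assumptions (A symmetric and positive definite); the function name cholesky_solve is ours.

    import math

    def cholesky_solve(A, b):
        """Solve Ax = b for symmetric positive definite A via A = L L^t,
        using the column-by-column recurrences of equation (3.19)."""
        n = len(A)
        L = [[0.0] * n for _ in range(n)]
        for j in range(n):
            # Diagonal entry: l_jj = (a_jj - sum_k l_jk^2)^(1/2).
            L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
            # Below-diagonal entries of column j.
            for i in range(j + 1, n):
                L[i][j] = (A[i][j] - sum(L[j][k] * L[i][k] for k in range(j))) / L[j][j]
        # Forward substitution: L z = b.
        z = [0.0] * n
        for i in range(n):
            z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
        # Back substitution: L^t x = z (row i of L^t is column i of L).
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
        return x

    # Example 3.2 above: expected solution [1.5, 2.0, 1.0].
    print(cholesky_solve([[4, 2, 6], [2, 82, 39], [6, 39, 26]], [16, 206, 113]))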

3.3 Gauss elimination method to find inverse of a matrix


The Gauss elimination method is applied to the augmented matrix [A|b] to solve a system of linear equations Ax = b. Using this method one can also obtain the inverse of a matrix. To find the inverse of A, the method is applied to the augmented matrix [A|I].
This method converts the matrix A(= LU) to an upper triangular matrix U and the

unit matrix I to the lower triangular matrix. This lower triangular matrix is the inverse
of L. Now, AA^{-1} = I reduces to LU A^{-1} = I, i.e.

    U A^{-1} = L^{-1}.                                         (3.21)

The right hand side of equation (3.21) is a lower triangular matrix, and the matrices U and L^{-1} are known. Therefore, by solving the system of equations (3.21) column by column using back substitution, one can easily determine the matrix A^{-1}.
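Concretely, once U and L^{-1} are known, each column of A^{-1} comes from one back substitution. Below is a hedged NumPy/SciPy sketch of this step, using the data of Example 3.3 that follows (scipy.linalg.solve_triangular is our choice of tool, not something the text prescribes).

    import numpy as np
    from scipy.linalg import solve_triangular

    # U and L^{-1} from Example 3.3 below; equation (3.21): U A^{-1} = L^{-1}.
    U = np.array([[1, 2, 4], [0, -4, 2], [0, 0, -21/2]], dtype=float)
    Linv = np.array([[1, 0, 0], [-1, 1, 0], [-3/4, -5/4, 1]], dtype=float)

    # One back substitution per column of the right-hand side.
    Ainv = solve_triangular(U, Linv, lower=False)
    print(np.round(Ainv, 4))   # should reproduce A^{-1} found in Example 3.3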
The following problem is considered to illustrate the method.
 
Example 3.3 Find the inverse of the matrix

    A = [ 1   2  4 ]
        [ 1  −2  6 ]
        [ 2  −1  0 ]

by using the Gauss elimination method.

Solution. The augmented matrix [A|I] is

    [A|I] = [ 1   2  4 | 1  0  0 ]
            [ 1  −2  6 | 0  1  0 ]
            [ 2  −1  0 | 0  0  1 ]
    −→ [ 1   2   4 |  1  0  0 ]     R2′ ← R2 − R1,
       [ 0  −4   2 | −1  1  0 ]     R3′ ← R3 − 2R1
       [ 0  −5  −8 | −2  0  1 ]

    −→ [ 1   2    4    |   1    0    0 ]    R3′ ← R3 − (5/4)R2
       [ 0  −4    2    |  −1    1    0 ]
       [ 0   0  −21/2  | −3/4  −5/4  1 ]

Thus, we get

    U = [ 1   2    4   ]          L^{-1} = [  1     0    0 ]
        [ 0  −4    2   ]    and            [ −1     1    0 ].
        [ 0   0  −21/2 ]                   [ −3/4  −5/4  1 ]

Let

    A^{-1} = [ x11  x12  x13 ]
             [ x21  x22  x23 ].
             [ x31  x32  x33 ]

Since U A^{-1} = L^{-1},

    [ 1   2    4   ] [ x11  x12  x13 ]   [  1     0    0 ]
    [ 0  −4    2   ] [ x21  x22  x23 ] = [ −1     1    0 ].
    [ 0   0  −21/2 ] [ x31  x32  x33 ]   [ −3/4  −5/4  1 ]

This equation generates the following systems of linear equations.
Comparing the first columns, we get
    x11 + 2x21 + 4x31 = 1
    −4x21 + 2x31 = −1
    −(21/2) x31 = −3/4.
The second columns give
    x12 + 2x22 + 4x32 = 0
    −4x22 + 2x32 = 1
    −(21/2) x32 = −5/4.
From the third columns we obtain
    x13 + 2x23 + 4x33 = 0
    −4x23 + 2x33 = 0
    −(21/2) x33 = 1.
The solution of these equations is

    x11 = 1/7,    x21 = 2/7,    x31 = 1/14,
    x12 = −2/21,  x22 = −4/21,  x32 = 5/42,
    x13 = 10/21,  x23 = −1/21,  x33 = −2/21.

Therefore,

    A^{-1} = [ 1/7   −2/21  10/21 ]
             [ 2/7   −4/21  −1/21 ].
             [ 1/14   5/42  −2/21 ]

3.4 Matrix partition method

We generally presume that the size of the coefficient matrix of a system is not very large, that is, that the entire matrix can be stored in the primary memory of a computer. But in many applications the size of the matrix is very large and it cannot be stored in the primary memory of a computer. In such cases the entire matrix is divided into some matrices of lower size. With the help of these lower order matrices, one can find the inverse of the given matrix. This process of division is known as the matrix partitioning method. This method is also useful when a few more variables, and consequently a few more equations, are added to the original system.
Suppose the coefficient matrix A is partitioned as

    A = [ B  C ]
        [ D  E ],                                              (3.22)

where B is an l × l matrix, C is an l × m matrix, D is an m × l matrix and E is an m × m matrix; l, m are positive integers with l + m = n.
Let A^{-1} be partitioned as

    A^{-1} = [ P  Q ]
             [ R  S ],                                         (3.23)

where the matrices P, Q, R and S are of the same orders as those of the matrices B, C, D and E respectively. Then

 ..  ..   .. 
B . C P . Q I1 . 0
AA−1 = 
    
 ··· ··· ···  ···  =  ···
··· ···  ,
··· ···  (3.24)
 
. .. ..
D .. E R . S 0 . I2

where I1 and I2 are identity matrices of order l and m respectively.


From (3.24), we have

BP + CR = I1
BQ + CS = 0
DP + ER = 0
DQ + ES = I2 .

From BQ + CS = 0 we have Q = −B^{-1}CS, i.e. DQ = −DB^{-1}CS.
Again, from DQ + ES = I2, we have (E − DB^{-1}C)S = I2.
Thus, S = (E − DB^{-1}C)^{-1}.
Similarly, the other matrices are given by

    Q = −B^{-1}CS
    R = −(E − DB^{-1}C)^{-1} DB^{-1} = −SDB^{-1}
    P = B^{-1}(I1 − CR) = B^{-1} − B^{-1}CR.

Note that we have to determine the inverses of two square matrices, B and (E − DB^{-1}C), of order l × l and m × m respectively. That is, the inverse of the matrix A of order n × n depends on the inverses of two lower order (roughly half-size) matrices. If the matrices B, C, D, E are still too large to fit in the computer memory, then they may be partitioned further.
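The four block formulas translate into a short sketch. The following Python/NumPy illustration is our own (the text gives no code; np.linalg.inv here stands in for a further level of partitioning or any other inversion routine).

    import numpy as np

    def partition_inverse(A, l):
        """Invert A by 2x2 block partitioning: A = [[B, C], [D, E]],
        with B of order l x l, via S = (E - D B^{-1} C)^{-1} and companions."""
        B, C = A[:l, :l], A[:l, l:]
        D, E = A[l:, :l], A[l:, l:]
        Binv = np.linalg.inv(B)                  # could itself be computed by partitioning
        S = np.linalg.inv(E - D @ Binv @ C)      # S = (E - D B^{-1} C)^{-1}
        Q = -Binv @ C @ S                        # Q = -B^{-1} C S
        R = -S @ D @ Binv                        # R = -S D B^{-1}
        P = Binv - Binv @ C @ R                  # P = B^{-1} - B^{-1} C R
        return np.block([[P, Q], [R, S]])

    # Example 3.4 below, with l = 2:
    A = np.array([[1, 2, 3], [2, -1, 0], [0, 2, 4]], dtype=float)
    print(partition_inverse(A, 2))   # [[0.5, 0.25, -0.375], [1, -0.5, -0.75], [-0.5, 0.25, 0.625]]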
Example 3.4 Using the matrix partition method, find the inverse of the matrix

    A = [ 1   2  3 ]
        [ 2  −1  0 ].
        [ 0   2  4 ]
Hence, find the solution of the system of equations

x1 + 2x2 + 3x3 = 1
2x1 − x2 = 0
2x2 + 4x3 = −1.

Solution. Suppose the matrix A is partitioned as

    A = [ 1   2 | 3 ]
        [ 2  −1 | 0 ]   =   [ B  C ]
        [————————+——]       [ D  E ].
        [ 0   2 | 4 ]

The matrices B, C, D and E are given by

    B = [ 1   2 ]        C = [ 3 ]        D = [ 0  2 ],      E = [ 4 ].
        [ 2  −1 ],           [ 0 ],

 .. 
P . Q
Then the inverse of A is given by A−1 = 
 
 · · · · · · · · ·  , the matrices P, Q, R and

.
R .. S
S are obtain from the following formulae.

S = (E − DB−1 C)−1 , R = −SDB−1 , P = B−1 − B−1 CR, Q = −B−1 CS.

Now,

    B^{-1} = −(1/5) [ −1  −2 ]  =  (1/5) [ 1   2 ],
                    [ −2   1 ]           [ 2  −1 ]

    E − DB^{-1}C = [ 4 ] − [ 0  2 ] · (1/5) [ 1   2 ] [ 3 ]  =  [ 4 − 12/5 ]  =  [ 8/5 ],
                                            [ 2  −1 ] [ 0 ]

    S = [ 5/8 ],

    R = −(5/8) [ 0  2 ] · (1/5) [ 1   2 ]  =  −(1/8) [ 4  −2 ]  =  [ −1/2  1/4 ],
                                [ 2  −1 ]

    Q = −(1/5) [ 1   2 ] [ 3 ] · (5/8)  =  −(1/8) [ 3 ]  =  [ −3/8 ],
               [ 2  −1 ] [ 0 ]                    [ 6 ]     [ −3/4 ]

    P = B^{-1} − B^{-1}CR = (1/5) [ 1   2 ] − (1/5) [ 3 ] [ −1/2  1/4 ]
                                  [ 2  −1 ]         [ 6 ]

      = (1/5) [ 5/2   5/4 ]  =  [ 1/2   1/4 ].
              [  5   −5/2 ]     [  1   −1/2 ]

Hence,

    A^{-1} = [  1/2   1/4  −3/8 ]
             [   1   −1/2  −3/4 ].
             [ −1/2   1/4   5/8 ]
Now, the solution of the given system of equations is

    x = A^{-1} b = [  1/2   1/4  −3/8 ] [  1 ]     [  7/8 ]
                   [   1   −1/2  −3/4 ] [  0 ]  =  [  7/4 ].
                   [ −1/2   1/4   5/8 ] [ −1 ]     [ −9/8 ]

Hence, the required solution is x1 = 7/8, x2 = 7/4, x3 = −9/8.

Chapter 5

Solution of System of Linear Equations

Module No. 4

Gauss Elimination Method and Tri-diagonal Equations



In this module, the Gauss elimination method is discussed for solving a system of linear equations. Another special type of system, called a tri-diagonal system of equations, is also introduced here. Tri-diagonal systems occur in many applications. A special case of the LU-decomposition method is used to solve a tri-diagonal system of equations.

4.1 Gauss elimination method


Let
a11 x1 + a12 x2 + · · · + a1n xn = b1
··························· ···
ai1 x1 + ai2 x2 + · · · + ain xn = bi (4.1)
··························· ···
an1 x1 + an2 x2 + · · · + ann xn = bn .
be a system of linear equations containing n variables and n equations.
In Gauss elimination method, the variables are eliminated from the system of equa-
tions one by one. The variable x1 is eliminated from second equation to nth equation, x2
is eliminated from third equation to nth equation, x3 is eliminated from fourth equation
to nth equation, and so on and finally, xn−1 is eliminated from nth equation. Thus, the
reduced system of linear equations becomes an upper triangular system which can be
solved by back–substitution.
Assume that a11 ≠ 0. To eliminate x1 from the second, third, . . ., and nth equations, the first equation is multiplied by −a21/a11, −a31/a11, . . ., −an1/a11 respectively and successively added to the second, third, . . ., nth equations. After this step the system reduces to

    a11 x1 + a12 x2 + a13 x3 + · · · + a1n xn = b1
         a22^(1) x2 + a23^(1) x3 + · · · + a2n^(1) xn = b2^(1)
         a32^(1) x2 + a33^(1) x3 + · · · + a3n^(1) xn = b3^(1)                      (4.2)
         · · · · · · · · · · · · · · · · · ·
         an2^(1) x2 + an3^(1) x3 + · · · + ann^(1) xn = bn^(1),

where

    aij^(1) = aij − (ai1/a11) a1j ;   i, j = 2, 3, . . . , n,

and similarly bi^(1) = bi − (ai1/a11) b1.

Now, to eliminate x2 (here it is also assumed that a22^(1) ≠ 0) from the third, fourth, . . ., and nth equations, the second equation is multiplied by −a32^(1)/a22^(1), −a42^(1)/a22^(1), . . ., −an2^(1)/a22^(1) respectively and successively added to the third, fourth, . . ., and nth equations. The reduced system of equations becomes

    a11 x1 + a12 x2 + a13 x3 + · · · + a1n xn = b1
         a22^(1) x2 + a23^(1) x3 + · · · + a2n^(1) xn = b2^(1)
              a33^(2) x3 + · · · + a3n^(2) xn = b3^(2)                              (4.3)
              · · · · · · · · · · · ·
              an3^(2) x3 + · · · + ann^(2) xn = bn^(2),

where

    aij^(2) = aij^(1) − (ai2^(1)/a22^(1)) a2j^(1) ;   i, j = 3, 4, . . . , n.

Finally, after eliminating xn−1, the above system of equations is converted to

    a11 x1 + a12 x2 + a13 x3 + · · · + a1n xn = b1
         a22^(1) x2 + a23^(1) x3 + · · · + a2n^(1) xn = b2^(1)
              a33^(2) x3 + · · · + a3n^(2) xn = b3^(2)                              (4.4)
                   · · · · · ·
                        ann^(n−1) xn = bn^(n−1),

where

    aij^(k) = aij^(k−1) − (aik^(k−1)/akk^(k−1)) akj^(k−1) ;
    i, j = k + 1, . . . , n;  k = 1, 2, . . . , n − 1,  and  apq^(0) = apq ;  p, q = 1, 2, . . . , n.
Note that, from last equation one can determine the value of xn easily. From last
but one equation we can determine the value of xn−1 using the value of xn obtained
from last equation. In this process, we can determine the values of all variables and this
process is known as back substitution.

From the last equation we have xn = bn^(n−1) / ann^(n−1). Using this value, we can determine the value of xn−1 from the last but one equation, and so on. Finally, the first equation gives the value of x1.
The process of determining the values of the variables xi is a back substitution, because we first determine the value of the last variable xn; the evaluation of the elements aij^(k), on the other hand, proceeds in the forward direction.

Note 4.1 In the Gauss elimination method, it is assumed that the diagonal (pivot) elements are non-zero. If one of these elements is zero or close to zero, then the method is not applicable to solve the system of equations even though the system may have a solution. In this case, the partial or complete pivoting method must be used to find a solution or a better solution.

It is mentioned in previous module that if the system is diagonally dominant or real


symmetric and positive definite then no pivoting is necessary.

Example 4.1 Solve the equations by Gauss elimination method.


2x1 − x2 + x3 = 5, x1 + 2x2 + 3x3 = 10, x1 + 3x2 − 2x3 = 7.

Solution. Multiplying the second and third equations by 2 and subtracting them from
first equation we obtained

2x1 − x2 + x3 = 5,
−5x2 − 5x3 = −15,
−7x2 + 5x3 = −9.

Multiplying the third equation by 5/7 and subtracting it from the second equation, we get

    2x1 − x2 + x3 = 5,
    −5x2 − 5x3 = −15,
    −(60/7) x3 = −60/7.
Observe that the value of x3 can easily be determined from the third equation and
it is x3 = 1. Using this value, from second equation we have x2 = 2. Finally, from first
equation 2x1 = 5 + 2 − 1 = 6, i.e. x1 = 3.
Hence, the solution is x1 = 3, x2 = 2, x3 = 1.
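For completeness, here is a small Python sketch of the elimination-plus-back-substitution procedure described above (our own illustration; no pivoting is performed, so all pivots are assumed non-zero, as in Note 4.1).

    def gauss_eliminate(A, b):
        """Solve Ax = b by Gauss elimination followed by back substitution.
        Assumes all pivots a_kk^(k-1) are non-zero (no pivoting)."""
        n = len(A)
        A = [row[:] for row in A]   # work on copies
        b = b[:]
        # Forward elimination: zero out column k below the pivot.
        for k in range(n - 1):
            for i in range(k + 1, n):
                m = A[i][k] / A[k][k]
                for j in range(k, n):
                    A[i][j] -= m * A[k][j]
                b[i] -= m * b[k]
        # Back substitution.
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
        return x

    # Example 4.1 above: expected solution [3, 2, 1].
    print(gauss_eliminate([[2, -1, 1], [1, 2, 3], [1, 3, -2]], [5, 10, 7]))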

4.2 Gauss-Jordan elimination method


In the Gauss elimination method, the coefficient matrix is transformed into an upper triangular form and then the solution is obtained by back substitution. But in the Gauss-Jordan method the coefficient matrix is transformed into a diagonal (in fact, unit) matrix by row operations, and hence the values of the variables are obtained directly, one from each row.
Using Gauss-Jordan elimination method the system of equations (4.1) reduces to the
following form:

   
    [ 1  0  ···  0 ] [ x1 ]   [ b1′ ]
    [ 0  1  ···  0 ] [ x2 ]   [ b2′ ]
    [ ··· ··· ··· ·· ] [ ·· ] = [  ··  ]                          (4.5)
    [ 0  0  ···  1 ] [ xn ]   [ bn′ ]

It is obvious that the solution of the given system of equations is

x1 = b1′, x2 = b2′, . . . , xn = bn′.

Symbolically, the Gauss-Jordan method can be written as

    [A|b]   −→ (Gauss-Jordan)   [I|b′].                        (4.6)

Normally, the Gauss-Jordan method is not used to solve a system of equations, as


it needs more arithmetic computations than the Gauss elimination method. But, this
method is widely used to find the inverse of a matrix.
Example 4.2 Use Gauss-Jordan elimination method to solve the following equations.

x1 + x2 + x3 = 4, 2x1 − x2 + 3x3 = 1, 3x1 + 2x2 − x3 = 1.


Solution. For this problem, the associated matrices are

    A = [ 1   1   1 ]
        [ 2  −1   3 ],    x = (x1, x2, x3)^t    and    b = (4, 1, 1)^t.
        [ 3   2  −1 ]

The augmented matrix [A|b] is

    [A|b] = [ 1   1   1 | 4 ]
            [ 2  −1   3 | 1 ]
            [ 3   2  −1 | 1 ]

    ∼ [ 1   1   1 |   4 ]     R2′ = R2 − 2R1,
      [ 0  −3   1 |  −7 ]     R3′ = R3 − 3R1
      [ 0  −1  −4 | −11 ]

    ∼ [ 1   1     1   |   4   ]   R3′ = R3 − (1/3)R2
      [ 0  −3     1   |  −7   ]
      [ 0   0  −13/3  | −26/3 ]

    ∼ [ 1  1    1   |  4  ]       R2′ = −(1/3)R2,
      [ 0  1  −1/3  | 7/3 ]       R3′ = −(3/13)R3
      [ 0  0    1   |  2  ]

    ∼ [ 1  0   4/3  | 5/3 ]       R1′ = R1 − R2
      [ 0  1  −1/3  | 7/3 ]
      [ 0  0    1   |  2  ]

    ∼ [ 1  0  0 | −1 ]            R1′ = R1 − (4/3)R3,
      [ 0  1  0 |  3 ]            R2′ = R2 + (1/3)R3
      [ 0  0  1 |  2 ]

Thus, the given system of equations reduces to

    x1 = −1,   x2 = 3,   x3 = 2.

Hence, the required solution is x1 = −1, x2 = 3, x3 = 2.


4.3 Solution of tri-diagonal systems

The tri-diagonal system of equations is a particular case of a system of linear equations. This type of system occurs in many applications, viz. cubic spline interpolation, the solution of boundary value problems, etc. The tri-diagonal system of equations is of the following form:

b1 x1 + c1 x2 = d1
a2 x1 + b2 x2 + c2 x3 = d2 (4.7)
a3 x2 + b3 x3 + c3 x4 = d3
·················· ···
an xn−1 + bn xn = dn .

The coefficient matrix for this system is

    A = [ b1  c1   0   0  ···   0     0     0     0   ]
        [ a2  b2  c2   0  ···   0     0     0     0   ]
        [  0  a3  b3  c3  ···   0     0     0     0   ]     and   d = (d1, d2, . . . , dn)^t.    (4.8)
        [ ··· ··· ··· ··· ···  ···   ···   ···   ···  ]
        [  0   0   0   0  ···   0   an−1  bn−1  cn−1  ]
        [  0   0   0   0  ···   0     0    an    bn   ]

This matrix has many interesting properties. Note that the main diagonal and its
two adjacent (below and upper) diagonals are non-zero and all other elements are zero.
This special matrix is called tri-diagonal matrix and the system of equations is called a
tri-diagonal system of equations. This matrix is also known as band matrix.
A tri-diagonal system of equations can be solved by the methods discussed earlier. But this system has some special properties, and by exploiting them the system can be solved in a simple way, starting from the LU decomposition method.
Let A = LU, where

    L = [ γ1   0   0  ···   0     0    0  ]
        [ β2  γ2   0  ···   0     0    0  ]
        [ ··· ··· ···  ···  ···   ···  ··· ]
        [  0   0   0  ···  βn−1  γn−1  0  ]
        [  0   0   0  ···   0    βn    γn ]

and

    U = [ 1  α1   0  ···  0  0    0   ]
        [ 0   1  α2  ···  0  0    0   ]
        [ ··· ··· ··· ··· ··· ··· ··· ]
        [ 0   0   0  ···  0  1  αn−1  ]
        [ 0   0   0  ···  0  0    1   ].

Then

    LU = [ γ1      γ1 α1         0         ···   0    0        0        ]
         [ β2   β2 α1 + γ2     γ2 α2       ···   0    0        0        ]
         [  0      β3       β3 α2 + γ3     ···   0    0        0        ].
         [ ···     ···         ···         ···  ···  ···      ···       ]
         [  0       0           0          ···   0   βn   βn αn−1 + γn  ]
Now, comparing both sides of the matrix equation LU = A, we obtain the following system of equations:

    γ1 = b1 ;   γi αi = ci , or, αi = ci/γi ,  i = 1, 2, . . . , n − 1;
    βi = ai ,  i = 2, . . . , n;
    γi = bi − αi−1 βi = bi − ai ci−1/γi−1 ,  i = 2, 3, . . . , n.

Hence, the elements of the matrices L and U are given by the following equations:

    γ1 = b1 ,
    γi = bi − ai ci−1/γi−1 ,   i = 2, 3, . . . , n                                  (4.9)
    βi = ai ,   i = 2, 3, . . . , n                                                (4.10)
    αi = ci/γi ,   i = 1, 2, . . . , n − 1.                                         (4.11)

Note that, this is a very simple system of equations.


Now, the solution of the equation Ax = d where d = (d1 , d2 , . . . , dn )t can be obtained
by solving the equation Lz = d by forward substitution and then by solving the equation
Ux = z by back substitution.
7
. . . . . . . . . . . . . . . . . . . . . Gauss Elimination Method and Tri-diagonal Equations

The solution of the equation Lz = d is

    z1 = d1/b1 ,   zi = (di − ai zi−1)/γi ,   i = 2, 3, . . . , n.                   (4.12)

And the solution of the equation Ux = z is

    xn = zn ,   xi = zi − αi xi+1 = zi − (ci/γi) xi+1 ,   i = n − 1, n − 2, . . . , 1.   (4.13)
Observe that the number of computations is linear, i.e. O(n) for n equations. Thus, this special method needs significantly less time compared to the other methods for solving tri-diagonal equations.
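The recurrences (4.9)-(4.13) constitute what is often called the Thomas algorithm. A minimal Python sketch follows (our own illustration; it assumes all γi ≠ 0, the restriction discussed after Example 4.3).

    def solve_tridiagonal(a, b, c, d):
        """Solve a tri-diagonal system using equations (4.9)-(4.13).
        a[i]: sub-diagonal (a[0] unused), b[i]: diagonal,
        c[i]: super-diagonal (c[n-1] unused), d[i]: right-hand side."""
        n = len(b)
        gamma = [0.0] * n
        z = [0.0] * n
        gamma[0] = b[0]
        z[0] = d[0] / b[0]
        for i in range(1, n):                       # forward sweep, (4.9) and (4.12)
            gamma[i] = b[i] - a[i] * c[i - 1] / gamma[i - 1]
            z[i] = (d[i] - a[i] * z[i - 1]) / gamma[i]
        x = [0.0] * n
        x[n - 1] = z[n - 1]
        for i in range(n - 2, -1, -1):              # back substitution, (4.13)
            x[i] = z[i] - c[i] * x[i + 1] / gamma[i]
        return x

    # Example 4.3 below: expected solution [-8/5, 14/5, -2/5].
    print(solve_tridiagonal([0, -1, 3], [1, 2, 1], [2, 3, 0], [4, 6, 8]))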
Example 4.3 Solve the following tri-diagonal system of equations:
x1 + 2x2 = 4, −x1 + 2x2 + 3x3 = 6, 3x2 + x3 = 8.

Solution. For this problem, b1 = 1, c1 = 2, a2 = −1, b2 = 2, c2 = 3, a3 = 3, b3 = 1,


d1 = 4, d2 = 6, d3 = 8.
Thus,

    γ1 = b1 = 1,
    γ2 = b2 − a2 c1/γ1 = 2 − (−1) · 2 = 4,
    γ3 = b3 − a3 c2/γ2 = 1 − 3 · (3/4) = −5/4,

    z1 = d1/b1 = 4,   z2 = (d2 − a2 z1)/γ2 = 5/2,   z3 = (d3 − a3 z2)/γ3 = −2/5,

    x3 = z3 = −2/5,   x2 = z2 − (c2/γ2) x3 = 14/5,   x1 = z1 − (c1/γ1) x2 = −8/5.

Therefore, the required solution is x1 = −8/5, x2 = 14/5, x3 = −2/5.
5 5 5
The above method is not applicable for all kinds of tri-diagonal equations. The equations (4.12) and (4.13) are valid only if γi ≠ 0 for all i = 1, 2, . . . , n. If any one of the γi is zero at any stage, then the method fails. Remember that this method is based on the LU decomposition method, and LU decomposition is applicable and gives a unique solution only if all the leading principal minors of the coefficient matrix are non-zero. Fortunately, a modified method is available if one or more γi are zero. The modified method is described below.

Without loss of generality, let us assume that γk = 0 and γi ≠ 0 for i = 1, 2, . . . , k − 1. Let γk = x, where x is a symbolic value of γk. The values of the other γi, i = k + 1, . . . , n, are calculated by using equation (4.9). Using these γ's, the values of zi and xi are determined by the formulae (4.12) and (4.13). Note that, in general, the values of the xi depend on x. Finally, since the actual value of γk is 0, the solution is obtained by substituting x = 0.

4.4 Evaluation of tri-diagonal determinant

For n ≥ 3, the general form of a tri-diagonal matrix T = [tij]n×n is

    T = [ b1  c1   0  ···  ···   ···    0   ]
        [ a2  b2  c2  ···  ···   ···    0   ]
        [  0  a3  b3  ···  ···   ···    0   ]
        [ ··· ··· ···  ···  ···   ···   ···  ]
        [  0   0   0  ···  an−1  bn−1  cn−1 ]
        [  0   0   0  ···   0    an    bn   ]

with tij = 0 for |i − j| ≥ 2.


Note that the first and last rows contain two non-zero elements each, and every other row contains three non-zero elements (some of these may, of course, be zero in a particular case).
In general, a matrix has n^2 elements, but a tri-diagonal matrix has only 3(n − 2) + 4 = 3n − 2 non-zero elements. So, this matrix can be stored using only three vectors c = (c1, c2, . . . , cn−1), a = (a2, a3, . . . , an), and b = (b1, b2, . . . , bn).
Let us define a vector d = (d1, d2, . . . , dn) as

    d1 = b1   and   di = bi − (ai/di−1) ci−1 ,   i = 2, 3, . . . , n.               (4.14)

The value of the determinant is the product P = d1 d2 · · · dn.
If di = 0 for any particular i, i = 1, 2, . . . , n, then set di = x (x is just a symbolic name). Using this value, calculate the other d's, i.e. di+1, di+2, . . . , dn. In this case, these d's contain x, and so the product P = d1 d2 · · · dn depends on x. Lastly, the value of the determinant is obtained by substituting x = 0 in P.
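A sketch of this determinant recurrence in Python follows (our own illustration). The symbolic value x of the text is handled with sympy, which is our choice of tool, so that a zero di does not stop the computation.

    import sympy as sp

    def tridiagonal_det(a, b, c):
        """Determinant of a tri-diagonal matrix via the recurrence (4.14).
        a[i]: sub-diagonal (a[0] unused), b[i]: diagonal, c[i]: super-diagonal."""
        x = sp.Symbol('x')
        n = len(b)
        d = [sp.Integer(b[0]) if b[0] != 0 else x]   # replace a zero d_i by symbolic x
        for i in range(1, n):
            di = sp.simplify(sp.Integer(b[i]) - sp.Integer(a[i] * c[i - 1]) / d[i - 1])
            d.append(di if di != 0 else x)
        P = sp.Integer(1)
        for di in d:                                  # P = d_1 d_2 ... d_n
            P = sp.expand(P * di)
        return P.subs(x, 0)                           # finally put x = 0

    # Example 4.4 below: A = -3 and B = 6.
    print(tridiagonal_det([0, 1, -1], [1, 1, 3], [1, -3, 0]))    # -3
    print(tridiagonal_det([0, -1, -1], [1, 2, 2], [2, -2, 0]))   # 6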
Example 4.4 Find the values of the following tri-diagonal determinants:

    A = | 1   1   0 |          B = |  1   2   0 |
        | 1   1  −3 | ,            | −1   2  −2 | .
        | 0  −1   3 |              |  0  −1   2 |

Solution. For the determinant A:
d1 = 1,  d2 = b2 − (a2/d1) c1 = 0. Here d2 = 0, so let d2 = x. Then d3 = b3 − (a3/d2) c2 = 3 − 3/x = (3x − 3)/x.
Thus, P = d1 d2 d3 = 1 · x · (3x − 3)/x = 3x − 3.
Now, we put x = 0. Therefore, A = −3.

For the determinant B:
d1 = 1,  d2 = b2 − (a2/d1) c1 = 4,  d3 = b3 − (a3/d2) c2 = 3/2.
Therefore, P = d1 d2 d3 = 1 · 4 · (3/2) = 6, that is, B = 6.
