Some Mathematical Concepts Used in Visual Statistics

 

Mathematics pertains to numbers, space, time, and logic. The number sense probably developed from the ordinal concepts of greater than, equal to, and less than. Combined with the sense of time, in which some events precede and others follow, numbers came to be arranged along a scale on which some numbers precede and others follow.

Pythagoras

During the time of Pythagoras, about 500 B.C., it was observed that certain sets of whole numbers, called Pythagorean triples, describe right triangles. For example, the classic Pythagorean triple 3, 4, and 5, taken as the height, the width, and the hypotenuse of a right triangle, illustrates the Pythagorean theorem: the square of the hypotenuse of a right triangle is equal to the sum of the squares of the other two sides (9 + 16 = 25). This was one of the first connections made between numbers and the geometrical properties of objects, and thus the foundation of visual statistics was laid.

The Pythagoreans also noticed that vibrating strings produce harmonious tones when the lengths of the strings stand in ratios of small whole numbers. They attempted to build a geometric model of these ratios in the sky, where the motions of the planets within the celestial spheres were thought to produce a harmony called the music of the spheres.

Cartography and Astronomy

One of the precursors of visual statistics is cartography, intimately linked to astronomy, with both celestial and terrestrial maps being prototypes of models of spatial relationships. The prince of cartography is Eratosthenes (c. 276-195 B.C.), who worked as chief librarian at the museum at Alexandria, Egypt. More than 2,000 years ago, he took a trip up the Nile to Aswan, about 500 miles south of Alexandria. There he noticed the sun's reflection at the bottom of a deep well. He recorded the date and time of this unusual event in his diary. The next year, after his return from Aswan to Alexandria, on the same day and at the same time, he measured the length of the shadow of a plumb line. Then he fastened one end of the plumb line to a peg in the ground and with the other end drew a circle. This was his model of the Earth. Next, he repeatedly marked off the length of the shadow along the circle. The shadow fitted into the circle about fifty times. If the shadow of my plumb line is proportional to the distance between Alexandria and Aswan, reasoned Eratosthenes, then the circumference of the Earth must be 50 times the 500 miles separating these two cities, i.e., about 25,000 miles. The actual equatorial circumference of the Earth is 24,902 miles.

A generation later, the Greek astronomer Hipparchus of Rhodes (c. 190-125 B.C.) compiled the first star catalog, listing about 850 stars. His follower, Ptolemy of Alexandria (c. A.D. 100-170), prepared a catalog of about 1,000 stars, listing their brightness and positions. In his Geographia, which included accurate maps of the countries around the Mediterranean Sea, Ptolemy laid the foundations for scientific cartography by introducing the concepts of latitude and longitude.

During the Middle Ages, cartographers relied heavily on the Bible for information and showed the Earth as a flat disk. These mappae mundi, depicting a flat Earth with Jerusalem at the center and an ocean surrounding the landmass, are close to pure fiction, with only loose ties to reality. Following the voyages of Bartolomeu Dias, Vasco da Gama, Christopher Columbus, and Ferdinand Magellan, maps again began to reflect the Earth's surface accurately.

Trigonometry

Trigonometry is a part of geometry that involves the measurement of the sides and angles of triangles. By using trigonometric functions, given limited information, you can determine the length of all sides and the size of all angles in a triangle.         

Trigonometry starts with the description of right triangles. One of the angles in a right triangle is always 90 degrees, and the sum of all three angles is 180 degrees. If you know the lengths of two of the sides, or the length of one side and the size of one angle other than the right angle, you can fully describe the triangle using trigonometric functions.

The sides of a right triangle are called the width, the altitude, and the hypotenuse. The altitude and the width are adjacent to the right angle of the triangle. The width is analogous to the abscissa, and the altitude to the ordinate, of the Cartesian coordinates. The hypotenuse is the side opposite the right angle.

For the angle between the width and the hypotenuse, the sine is equal to the altitude divided by the hypotenuse, the cosine is equal to the width divided by the hypotenuse, and the tangent is equal to the altitude divided by the width. Each trigonometric function has an inverse, which returns the angle associated with a given value of the function. The natural trigonometric functions are tabulated, as shown below.

 

Angle      Sin      Cos       Tan

    0    0.000    1.000     0.000
    5    0.087    0.996     0.087
   10    0.174    0.985     0.176
   15    0.259    0.966     0.268
   20    0.342    0.940     0.364
   25    0.423    0.906     0.466
   30    0.500    0.866     0.577
   35    0.574    0.819     0.700
   40    0.643    0.766     0.839
   45    0.707    0.707     1.000
   50    0.766    0.643     1.192
   55    0.819    0.574     1.428
   60    0.866    0.500     1.732
   65    0.906    0.423     2.145
   70    0.940    0.342     2.747
   75    0.966    0.259     3.732
   80    0.985    0.174     5.671
   85    0.996    0.087    11.430
   90    1.000    0.000         ∞
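The tabulated values can be reproduced with a few lines of Python. This is only a minimal sketch using the standard math module; the helper name trig_row is an illustrative choice, not something from the text.

```python
import math

def trig_row(angle_deg):
    """Return (sin, cos, tan) for an angle given in degrees."""
    theta = math.radians(angle_deg)  # math functions expect radians
    return math.sin(theta), math.cos(theta), math.tan(theta)

# Print the rows for 0 to 85 degrees in steps of 5 degrees.
for angle in range(0, 90, 5):
    s, c, t = trig_row(angle)
    print(f"{angle:5d} {s:8.3f} {c:8.3f} {t:9.3f}")

# The tangent grows without bound as the angle approaches 90 degrees,
# so that row is written by hand instead of being computed.
print(f"{90:5d} {1.0:8.3f} {0.0:8.3f} {'inf':>9}")

# An inverse function returns the angle for a given function value.
angle_from_sine = math.degrees(math.asin(0.5))
print(round(angle_from_sine, 3))
```

The last two lines illustrate the inverse functions mentioned above: the arcsine of 0.5 is the angle whose sine is 0.5.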

 

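In visual statistics, matrices of sines and cosines, such as

$$R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

are used to rotate objects in the virtual space. For example, to rotate a point with Cartesian coordinates (1, 1) by 45 degrees counterclockwise, find the cosine and sine of 45 degrees in the above table and pre-multiply the coordinates of the point by the rotation matrix, as

$$\begin{bmatrix} 0.707 & -0.707 \\ 0.707 & 0.707 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1.414 \end{bmatrix}$$

The rotated coordinates of the point are (0, 1.4). The Pythagorean theorem can easily verify the new coordinates, because a rotation does not change the distance of a point from the origin. The hypotenuse of the triangle connecting the original point with the origin of the Cartesian coordinates equals $\sqrt{1^2 + 1^2}$; i.e., $\sqrt{2}$; which is 1.4.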
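The same rotation can be sketched in Python. The function name rotate is an illustrative assumption; the arithmetic is the pre-multiplication of the point by a counterclockwise rotation matrix, written out term by term.

```python
import math

def rotate(point, angle_deg):
    """Rotate a 2-D point counterclockwise about the origin."""
    x, y = point
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    # Rows of the rotation matrix [[c, -s], [s, c]] applied to (x, y).
    return (c * x - s * y, s * x + c * y)

new_x, new_y = rotate((1.0, 1.0), 45)
print(round(new_y, 3))

# A rotation keeps the distance from the origin unchanged.
print(round(math.hypot(1.0, 1.0), 3), round(math.hypot(new_x, new_y), 3))
```

Working with the unrounded sine and cosine rather than the three-decimal table values gives a vertical coordinate of about 1.414, which rounds to the 1.4 quoted above.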

The Numbers

There are three main categories of numbers: natural, integer, and real. The natural numbers [  1    2    3   …  ] are also called the counting numbers. The integers [  …  -2   -1   0   1  2  …  ] comprise the natural numbers, their negatives, and zero. The real numbers are all the numbers on the number line, including fractions and irrational numbers such as the square root of 2; they are usually written with a decimal point. Consider the following table of powers of real numbers a, shown below.

 

 

 

To express the square roots of negative numbers, we need to introduce an additional category of numbers. Consider that

$$i^2 = -1$$

and thus

$$i = \sqrt{-1}$$

The location of this number within the axis of real numbers (abscissa) and the imaginary numerical axis (ordinate) is

 

 

 

Numbers defined in this fashion are called the complex numbers.

Arithmetic Operations and their Inverse

The basic operations on numbers are addition, subtraction, multiplication, division, powers, square roots, and logarithms. The plus sign, +, indicating addition, appeared shortly after the introduction of the printing press in 1455 as a contraction of the Latin word et, meaning 'and.' When et is written hastily, it looks like +. The symbol for the square root was introduced around 1525; it resembles a small r, the initial letter of the Latin word radix, a root. The symbol =, indicating equality, was introduced in 1557 by Robert Recorde, the author of a popular book on mathematics, who insisted that no two things can be more equal than two parallel lines. Second powers were initially written as Aq, an abbreviation of the Latin word quadrum, a square. The $a^2$ notation was introduced by Descartes around 1637.

There is a way to reverse most mathematical operations. You can reverse addition by subtraction and multiplication by division. Around 1614, mathematicians learned how to reverse exponentiation. Consider the equation

$$y = 10^z$$

If z equals 2, then y equals 100. To reverse this operation, find the logarithm of 100 to the base 10. It equals 2. A logarithm is the exponent to which a base number must be raised to match the target number. Because logarithms had to be tabulated before the age of computers, the agreed-upon bases were 10 and e, Euler's number, approximately equal to 2.718. Logarithms with base 10 are called common logarithms and are written log. Logarithms with base e are called natural logarithms and are written ln. To obtain the logarithm of a number to any base, divide the natural logarithm of the number by the natural logarithm of the desired base.
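The change-of-base rule can be checked with Python's standard math module. This is a small sketch; the helper name log_base is an illustrative assumption.

```python
import math

def log_base(x, base):
    """Logarithm of x to an arbitrary base, via natural logarithms."""
    return math.log(x) / math.log(base)

# The common logarithm reverses raising 10 to a power.
print(math.log10(100))  # 2.0

# The natural logarithm of e is 1.
print(math.log(math.e))

# Base 2: 2 raised to what power gives 8?
print(round(log_base(8, 2), 6))
```

Note that math.log is the natural logarithm (ln) and math.log10 is the common logarithm.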

The Line

One of the simplest equations of analytic geometry is the equation of a line

$$Y = BX + A$$

Any two points in a plane define a line. Let us call these points O and P, and their respective coordinates (X1, Y1) and (X2, Y2).

To derive a general equation for a line we must define a third point Q, with coordinates (X3, Y3), located on the OP line. The coordinates of points O, P, and Q define two similar triangles.

Since both triangles are similar, their sides must be proportional, as

$$\frac{Y_3 - Y_2}{X_3 - X_2} = \frac{Y_2 - Y_1}{X_2 - X_1}$$

Solving for Y3 results in a linear equation in a general extended form

$$Y_3 = \frac{X_3 - X_2}{X_2 - X_1}\,(Y_2 - Y_1) + Y_2$$

To compute the necessary information for a graphical plot of a line described by an equation, let us assign the numbers 1, 2, 3, and 5 to the coordinates X1, X2, Y1, and Y2, respectively. Substituting these values gives Y3 = [(X3 - 2) / (2 - 1)] (5 - 3) + 5, so the equation for this particular line can be written as Y3 = 2X3 + 1. Next, let us assign the values 0.0, 0.5, 1.0, 1.5, and 2.0 to the variable X and compute the corresponding Y values, as summarized below.

    X      Y

  0.0    1.0
  0.5    2.0
  1.0    3.0
  1.5    4.0
  2.0    5.0

This table contains selected values of a particular rendering of the linear equation Y = BX + A, where B is a slope equal to 2 and A is an intercept equal to 1.

As an example, let us develop a conversion equation for a linear transformation from degrees Celsius (X) to degrees Fahrenheit (Y). For this we need to know the corresponding temperatures in both measurement systems for at least two points on the temperature continuum. Well-known values are the freezing and boiling temperatures of water, indexed by 0 and 100 degrees Celsius and by 32 and 212 degrees Fahrenheit. Substituting the appropriate values into the equation for the analytical solution

$$Y_3 = \frac{X_3 - X_2}{X_2 - X_1}\,(Y_2 - Y_1) + Y_2$$

as Y = [(X - 100) / (100 - 0)] (212 - 32) + 212 results in the conversion equation Y = 1.8X + 32. Thus a body temperature of 37 degrees Celsius corresponds to 98.6 degrees Fahrenheit.
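The two-point construction of a line lends itself to a short Python sketch. The names line_through and celsius_to_fahrenheit are illustrative assumptions; the function returns the slope B and the intercept A of the line Y = BX + A through two given points.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)   # vertical leg over horizontal leg
    intercept = y2 - slope * x2     # value of Y where X is zero
    return slope, intercept

# The line through (1, 3) and (2, 5) discussed earlier.
print(line_through((1, 3), (2, 5)))  # (2.0, 1.0)

# Freezing and boiling points of water in both temperature scales.
slope, intercept = line_through((0, 32), (100, 212))

def celsius_to_fahrenheit(celsius):
    """Apply the fitted line Y = BX + A to a Celsius temperature."""
    return slope * celsius + intercept

print(round(celsius_to_fahrenheit(37), 1))
```

The function fails with a division by zero for a vertical line, where X1 equals X2; a fuller version would need to handle that case.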

In passing, we may mention that normal oral temperatures range from 98 degrees Fahrenheit (36.7 degrees Celsius) in resting persons to 99 degrees Fahrenheit (37.2 degrees Celsius) in active persons. Temperatures taken rectally usually register slightly higher. The familiar figure of 98.6 was not chosen by Fahrenheit (1686-1736) himself; it is simply the Fahrenheit equivalent of 37 degrees Celsius, a rounded value for normal body temperature. The Swedish astronomer Celsius (1701-1744) originally assigned the value 0 to the boiling point of water and 100 to the freezing point. These values were later reversed, a change commonly credited to the botanist Carolus Linnaeus.

The Open Sentences

Algebra is concerned with statements that are either true or false. A large part of algebraic analysis pertains to sentences that cannot be classified as true or false until some additional information is provided. Such sentences are called open sentences. For example the algebraic sentence x + 1 = 2 is an open sentence that is true only if x equals 1. The algebraic sentence

 

 

is true for all points located on the line it defines. The simultaneous equations

 

 

 

are true for the intersection of the two lines they define, the point (4, -3), i.e., when x equals 4 and y equals -3.

Analytic Geometry

Analytic geometry relates algebra to geometry. Algebraic equations such as

$$y = bx + a, \qquad y = x^2, \qquad x^2 + y^2 = r^2$$

can paint pictures of lines, parabolas, and circles.

The Problem of Areas

Areas under or within curves described by analytic geometry can be computed by integral calculus. Thus, e.g., the area A under the 0 - 1 segment of a parabola

$$y = x^2$$

shown in the following diagram

 

can be computed as

$$A = \int_0^1 x^2\,dx$$

The integral sign stands for the sum of an unspecified number of rectangles under the 0 - 1 segment of the curve and the dx indicates that the base of each of these rectangles is infinitesimally small.

The area A under the parabola equals 1/3 of the unit area of the square circumscribing this 0 - 1 segment of the curve. This problem was solved by Archimedes, who used small rectangles to add up this area. A general solution of this problem is

$$\int_0^x t^n\,dt = \frac{x^{n+1}}{n+1}$$

For the example, the exponent n = 2 and x = 1; one to the third power is one, and n + 1 = 3; thus A = 1/3, a solution identical to that of Archimedes.
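The idea of adding up thin rectangles can be tried numerically. The sketch below is only an illustration: the function names are assumptions, and the rectangles take their heights at the midpoint of each base, which is one of several possible choices.

```python
def riemann_sum(f, a, b, n):
    """Sum the areas of n equal-width rectangles under f between a and b."""
    width = (b - a) / n
    # Height of each rectangle is f at the midpoint of its base.
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

def power_rule_area(n, x):
    """Area under t**n from 0 to x by the general solution x**(n+1)/(n+1)."""
    return x ** (n + 1) / (n + 1)

def parabola(x):
    return x ** 2

for rectangles in (10, 100, 1000):
    print(rectangles, riemann_sum(parabola, 0.0, 1.0, rectangles))

print(power_rule_area(2, 1.0))
```

As the number of rectangles grows, the printed sums approach the value 1/3 given by the general solution.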

The Tangent Problem

This problem pertains to finding the tangent line to a given curve at a given point. Finding a tangent to a circle is easy, as the tangent is perpendicular to the radius, r, at the point of tangency. However, finding a tangent to more complex curves is difficult. A general solution to the tangent problem starts from the slope of a line, measured as the vertical leg of a right triangle divided by its horizontal leg.

We discussed the slope of a line in detail in one of the preceding sections. There the vertical leg was defined as Y2 - Y1, the horizontal leg as X2 - X1, and the slope was defined as

$$B = \frac{Y_2 - Y_1}{X_2 - X_1}$$

The above expression can also be written in an abbreviated form as

$$B = \frac{\Delta Y}{\Delta X}$$

Let us find a tangent to a parabola

$$y = x^2$$

at the point (3, 9) on the parabola. A general point on the parabola can be written as $(x, x^2)$, as the parabola's y-coordinates are the squares of its x-coordinates.

The slope of a line connecting the point (3, 9) to any other point $(x, x^2)$ on the parabola is

$$\frac{x^2 - 9}{x - 3}$$

The above fraction can be simplified as

$$\frac{(x - 3)(x + 3)}{x - 3} = x + 3$$

Thus the slope of the line through the point (3, 9) and the point $(x, x^2)$ is x + 3. This is commonly written as

$$\frac{\Delta y}{\Delta x} = x + 3$$

Remember that the other point $(x, x^2)$, defining the slope, is any other point on the parabola. Thus we can move this point as close as we wish to the (3, 9) point. When these two points are infinitesimally close to each other, the connecting line becomes the tangent. Let us say that the $(x, x^2)$ point is actually the point (2, 4), which gives a slope of 2 + 3 = 5. By moving the point (2, 4) toward the point (3, 9)

$$\lim_{x \to 3}\,(x + 3)$$

we find that the slope of the tangent line at the point (3, 9) is

$$3 + 3 = 6$$

Now we can construct a tangent at the (3, 9) point by moving to the right 1 unit and then upward 6 units.

This limiting process, described above, is called differentiation. The value of the limit is called the derivative. The theory behind this limit operation is called differential calculus.
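The limiting process can be watched numerically by moving the second point toward (3, 9) and recomputing the slope each time. This is a minimal sketch; the function names are illustrative assumptions.

```python
def square(x):
    return x * x

def secant_slope(f, x0, x):
    """Slope of the line through (x0, f(x0)) and (x, f(x))."""
    return (f(x) - f(x0)) / (x - x0)

# Start at the point (2, 4) and move toward (3, 9).
for x in (2.0, 2.5, 2.9, 2.99, 2.999):
    print(x, round(secant_slope(square, 3.0, x), 6))
```

Each slope equals x + 3 up to rounding, so the printed values climb from 5 toward the tangent slope of 6 without the second point ever coinciding with (3, 9).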

Differential Calculus

The axiomatic build-up of the general linear model necessitates the demonstration that the coefficient of correlation is equivalent to the slope of a regression line. The proof of the equivalence of the beta and r coefficients for the two variable case cannot be presented without explicating the Laplace/Gauss' criterion of least squares which, in turn, cannot be presented formally without recourse to differential calculus. However, the essentials of the differential calculus, necessary for the proof, are quite modest.

The differential quotient of y with respect to x is also symbolized as dy/dx, thus

$$\frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}$$

In the above equation the symbols dx and dy are called differentials and the letter d means a small amount of. Thus, dx means a small amount of x, and dy means a small amount of y. When x increases by a small amount dx, y also changes by a small amount dy. If dx is small, then $(dx)^2$ will be negligible. To differentiate a function, knowledge of at least the following computational techniques is necessary.

Exponents

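To differentiate $x^n$, multiply x by the exponent and reduce the exponent by one. Assume that

$$y = x^2$$

If y grows to y + dy, while x grows to x + dx, then

$$y + dy = (x + dx)^2$$

and

$$y + dy = x^2 + 2x\,dx + (dx)^2$$

Substituting $x^2$ for y,

$$x^2 + dy = x^2 + 2x\,dx + (dx)^2$$

and

$$dy = 2x\,dx + (dx)^2$$

As dx is small, $(dx)^2$ is smaller still and negligible. Thus

$$dy = 2x\,dx$$

and

$$\frac{dy}{dx} = 2x$$

In general, if

$$y = x^n$$

then

$$\frac{dy}{dx} = n x^{n-1}$$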

Added Constants

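To differentiate $x^n + a$, neglect the added constant. Consider a function

$$y = x^2 + 3$$

Suppose y grows to y + dy, while x grows to x + dx, then

$$y + dy = (x + dx)^2 + 3$$

Expanding the binomial

$$y + dy = x^2 + 2x\,dx + (dx)^2 + 3$$

Neglecting the $(dx)^2$, and substituting $x^2 + 3$ for y,

$$x^2 + 3 + dy = x^2 + 2x\,dx + 3$$

then

$$dy = 2x\,dx$$

and

$$\frac{dy}{dx} = 2x$$

The constant 3 has disappeared. As a constant adds nothing to the growth of y, it does not enter into the differential coefficient. Using formal notation, if

$$y = x^n + a$$

then

$$\frac{dy}{dx} = n x^{n-1}$$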

Multiplied Constants

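To differentiate $bx^n + a$, keep the multiplied constant. In the function

$$y = 2x^2 + 3$$

3 is an added constant and 2 is a multiplied constant. Suppose x grows to x + dx and y grows to y + dy, then

$$y + dy = 2(x + dx)^2 + 3$$

Expanding the binomial

$$y + dy = 2\left(x^2 + 2x\,dx + (dx)^2\right) + 3$$

simplifying the expression

$$y + dy = 2x^2 + 4x\,dx + 2(dx)^2 + 3$$

neglecting the $2(dx)^2$

$$y + dy = 2x^2 + 4x\,dx + 3$$

and substituting $2x^2 + 3$ for y

$$2x^2 + 3 + dy = 2x^2 + 4x\,dx + 3$$

the expression

$$dy = 4x\,dx$$

remains and

$$\frac{dy}{dx} = 4x$$

That is, when

$$y = bx^n + a$$

then

$$\frac{dy}{dx} = b\,n\,x^{n-1}$$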

Minimum of a Function

For the particular value of x that makes y a minimum, dy/dx = 0. The characteristic of a minimum is that y must increase on either side of it. As x moves away from the minimum by a small amount dx, in either direction, y increases by a small amount dy. However, close to the minimum, changes in x are accompanied by nearly zero changes in y. As the curve flattens, the magnitude of the change in y for a small change in x is almost zero. Since dy = 0 at the minimum,

$$\frac{dy}{dx} = 0$$

At this point return to the chapter on the foundations of the general linear model and review the section on the criterion of least squares. If you understand the main computational points described here, you should be able to follow the proof that the Pearson's product-moment coefficient of correlation equals the slope of the line of the best fit.
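The computational points of this section can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the function y = 2x^2 + 3 as its example, estimates dy/dx from a small step h instead of a true limit, and locates the minimum by bisecting on the sign of the slope. All function names are illustrative.

```python
def y(x):
    """Example function with a multiplied constant 2 and an added constant 3."""
    return 2 * x ** 2 + 3

def numeric_slope(f, x, h=1e-6):
    """Estimate dy/dx from a small step on either side of x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def rule_slope(b, n, x):
    """Slope of b*x**n + a by the rule dy/dx = b*n*x**(n-1)."""
    return b * n * x ** (n - 1)

# The numeric estimate agrees with the rule; the constant 3 plays no part.
for x in (-2.0, 0.5, 3.0):
    print(x, round(numeric_slope(y, x), 4), rule_slope(2, 2, x))

def find_minimum(f, lo, hi, tol=1e-9):
    """Bisect until the slope changes sign: negative left of a minimum, positive right."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if numeric_slope(f, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x_min = find_minimum(y, -5.0, 4.0)
print(round(x_min, 6), round(y(x_min), 6))
```

The search assumes a single minimum inside the starting interval, which holds for this parabola; at the value found, the slope is zero to within rounding.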

Fundamental Theorem of Calculus

Integration and differentiation are inverses of each other. Just as the inverse of addition is subtraction and the inverse of multiplication is division, the inverse of integration is differentiation. Remember that to find the area underneath the parabola

$$y = x^2$$

we raised the exponent by one and divided by the resulting number, as

$$\frac{x^3}{3}$$

If we differentiate the above expression

$$\frac{d}{dx}\left(\frac{x^3}{3}\right) = \frac{3x^2}{3}$$

we obtain the expression

$$x^2$$

we started with. The tangent problem and the area problem are the inverse of each other. This statement that integration and differentiation are inverse operations is called the fundamental theorem of calculus.
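The inverse relationship can be illustrated numerically: build the area under the parabola as a function of its right-hand limit, then take the slope of that area function. This is only a sketch with illustrative names; both the area and the slope are approximations based on small steps.

```python
def parabola(x):
    return x * x

def area_under(f, x, n=10_000):
    """Approximate the area under f from 0 to x with n thin rectangles."""
    width = x / n
    return sum(f((i + 0.5) * width) for i in range(n)) * width

def slope_of(g, x, h=1e-3):
    """Estimate the slope of g at x from a small step on either side."""
    return (g(x + h) - g(x - h)) / (2 * h)

def area_function(x):
    return area_under(parabola, x)

x = 1.5
recovered = slope_of(area_function, x)
print(round(recovered, 4), parabola(x))
```

The slope of the area function at x = 1.5 comes out very close to 2.25, the height of the parabola at that point, which is what the theorem asserts.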

Numbers, Points, Lines, and Beyond

Phenomena are counted by numbers and visualized by points. Points create lines and lines create shapes. This process can be generalized from a two-dimensional plane to a three-dimensional space, and from a three-dimensional space to a subspace within a hyperspace.

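In one of the previous sections we have outlined integration of

$$y = x^2$$

within the 0-1 segment

$$\int_0^1 x^2\,dx$$

as

$$\left[\frac{x^3}{3}\right]_0^1 = \frac{1}{3}$$

To integrate the function

$$y = x^3$$

within the 2-3 interval

$$\int_2^3 x^3\,dx$$

increment the exponent and place it in both the numerator and denominator as

$$\frac{x^4}{4}$$

and replace the x first with the upper limit, then with the lower limit, as

$$\frac{3^4}{4} - \frac{2^4}{4} = \frac{81}{4} - \frac{16}{4}$$

giving the area as equal to 65 / 4.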
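The recipe of raising the exponent, dividing by it, and subtracting the value at the lower limit from the value at the upper limit can be sketched with exact fractions. The function names are illustrative assumptions, and the sketch covers only single powers of x with whole-number exponents.

```python
from fractions import Fraction

def antiderivative_power(n, x):
    """Value of x**(n+1) / (n+1), kept as an exact fraction."""
    return Fraction(x) ** (n + 1) / (n + 1)

def integrate_power(n, lower, upper):
    """Area under x**n between the lower and upper limits."""
    return antiderivative_power(n, upper) - antiderivative_power(n, lower)

print(integrate_power(3, 2, 3))  # 65/4
print(integrate_power(2, 0, 1))  # 1/3
```

Using Fraction rather than floating-point numbers keeps the results in the same form as the hand calculation.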

 

          To integrate

 

 

within the 2-4 interval

 

increment the exponents

 

and compute the area as

 

 

 

which equals 6.33. To check the integration, differentiate the expression

 

 

 

which equals the function

 

 

that we started with.