Then it happened! I received my April 1989 issue of Amazing Computing. It contained an article by John Baez entitled "Creating a Shared Library" (Ref 2). The article was wonderful; I highly recommend it. I perused the article and program listings several times and began to realize that my suspicions were right. Creating a shared library CAN BE EASY! As time permitted, I experimented with the code in John's article. Once I had it working, I was ready to begin creating my own library.
The first library I decided to create was Matrix.library which would contain routines for manipulating matrices. There were multiple reasons for this choice. My regular job requires me to solve simultaneous equations and calculate least-squares multiple regression coefficients. Although I typically use commercial software for these tasks, I knew the calculations were based on matrices and I wanted to learn more about the math behind the calculations. Also, I was personally working on a three-dimensional function plotting program. Two- and three-dimensional vector translation, rotation, and scaling are also based on matrix manipulation. A shared library should only be created if a variety of programs can use it. Matrix.library meets this criterion.
As I began creating Matrix.library, I realized that some improvements could be made to John Baez's code because I was using Lattice version 5.04 which is greatly improved over the version 4 that John had at the time of his article. Therefore, this article has a three-fold purpose: 1) present the source code for Matrix.library as well as an exhaustive test program, 2) point out some improvements over John Baez's code due to improvements in the Lattice development system, and 3) present several programs that illustrate how the library can be put to practical use.
Listing 5 is Test.h, a header file that #includes several other header files and contains vector and matrix display functions. It is used by several of the programs presented in this article. Listing 6 shows a makefile that can be used with LMK (Lattice Make Utility) to create Matrix.library and other programs presented later.
Lines 27-53 are the user function XREF statements that allow the C functions in Matrix.c to be visible during linking. There are 25 user functions in Matrix.library. They will be discussed in more detail later. As more functions are added to the library, more XREF statements need to be added to this list. Also, new jmp statements will need to be added to the jump vector list in lines 94-120. It is important to remember that the jump vectors MUST be in reverse order of the function list in Matrix.fd. The jump table in MatrixLib.a uses the jmp instruction whereas John Baez used a hard coded $4EF9 (long jump) due to a bug in the assembler he was using.
In lines 151 and 157 I have described the size of the jump vector table and data section in terms of the difference of two labels rather than hard coding the sizes. This allows additional functions (and thus more jmp instructions) and new data to be added without having to worry about changing these numbers.
Lines 237-408 contain the interface stub routines for all of the user functions in Matrix.library. These routines simply push the appropriate registers onto the stack and call the C functions in Matrix.c. One thing to keep in mind is that the addq.l instruction can only be used for a value of eight or less. Since each parameter requires the stack to be adjusted by four, addq.l can only be used when two or fewer parameters are passed. For more parameters, add.l must be used. It is very important that the registers placed on the stack match the registers declared in Matrix.fd, because the compiler places the parameters in the registers according to the .fd file. Lattice v4 only supported four parameters in the #pragma statements (produced from the .fd file using fd2pragma). However, Lattice v5 supports six parameters. This extra support came in the nick of time, as Matrix.library has two functions that pass this many parameters.
John Baez's comments near the end of slibh.a state that Lattice is fuzzy (could he mean fussy?) about not having its own startup code during linking, and he gives a piece of code starting with SECTION __MERGED that contains some XREFs and data labels. None of this is needed with Lattice v5.04; blink now behaves just fine.
The user functions in Matrix.library are divided into two categories, long integer functions and double precision floating point functions. The former handle the manipulation of matrices containing integers and the latter handle the manipulation of matrices containing floating point numbers. There are a total of 25 user functions in Matrix.library; 12 for integer matrices and 13 for floating point matrices. A matrix with only one row or column is also known as a vector. Some functions are written specifically with vectors in mind while other functions are written for matrices in general. Table I shows a list of the user functions and describes the tasks performed by each. The letter 'L' in a function name indicates that it is a long integer function; the letter 'D' indicates a double precision floating point function.
The code for the memory allocation and deallocation functions and the matrix inversion function was based on similar functions in the book Numerical Recipes in C (Ref 3). This book contains a world of algorithms for mathematical problems. I highly recommend it to anyone writing mathematical software.
Most of the code in Matrix.c is straightforward and needs no explanation. However, there are a few items that need to be mentioned. When a vector is allocated, a pointer to the allocated memory space is returned to the calling program. This pointer can be used just like a standard C array pointer. However, when a matrix is allocated, the pointer returned to the calling program points to an array of pointers, each of which points to an array of numbers representing a row of the matrix. Thus, to refer to an individual element of the matrix, the standard C notation for a two-dimensional array, m[row][column], cannot be used. Rather, the element must be referred to as (m[row])[column]. Press et al. state in Numerical Recipes in C that the standard notation can be used; however, it does not work with the Lattice compiler.
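To make the row-pointer arrangement concrete, the fragment below sketches the idea using plain malloc() calls. It is not the code from Matrix.c; the function name alloc_dmatrix() is made up for illustration, and a real program would use the library's own allocation routines and error handling.

#include <stdio.h>
#include <stdlib.h>

/* Allocate a rows-by-cols matrix as an array of row pointers. */
/* Illustrative only; Matrix.c does the equivalent job with    */
/* its own allocation and error-handling conventions.          */
double **alloc_dmatrix(int rows, int cols)
{
    int i;
    double **m = (double **) malloc(rows * sizeof(double *));
    if (m == NULL)
        return NULL;
    for (i = 0; i < rows; i++) {
        m[i] = (double *) malloc(cols * sizeof(double));
        if (m[i] == NULL)
            return NULL;   /* a real program would free what was already allocated */
    }
    return m;
}

int main(void)
{
    double **m = alloc_dmatrix(2, 2);
    if (m == NULL)
        return 1;

    (m[0])[0] = 1.0;  (m[0])[1] = 2.0;   /* elements are reached through the row pointers */
    (m[1])[0] = 3.0;  (m[1])[1] = 4.0;

    printf("%4.1f %4.1f\n%4.1f %4.1f\n",
           (m[0])[0], (m[0])[1], (m[1])[0], (m[1])[1]);
    return 0;
}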
The method used for inverting a matrix in InvertDMatrix() is called LU decomposition. The explanation for how it works is rather lengthy, so I refer you to Numerical Recipes in C if you are interested in this method.
As an example of putting the library to practical use, consider the following set of three equations in the three unknowns x1, x2, and x3:

1.3x1 + 342.7x2 - 17.7x3 = 1003.88        (1)
-222.17x1 + 3.0x2 + 23.6x3 = -667.05      (2)
1.0x1 - 17.4x2 - 15.6x3 = 3.0             (3)
We need to know what values of the xi's satisfy all three equations at once. In other words, we need to solve the equations simultaneously. The above equations can be written in matrix notation as AX=B, where A is a 3 by 3 matrix containing the three coefficients of each equation in its rows, X is a column vector containing the xi's, and B is a column vector containing the right-hand-side (RHS) constants. This matrix equation can be written as follows:
|    1.3    342.7   -17.7 |   | x1 |   | 1003.88 |
| -222.17     3.0    23.6 |   | x2 | = | -667.05 |
|    1.0    -17.4   -15.6 |   | x3 |   |     3.0 |
When each side of the equation is multiplied by the inverse of matrix A, A^-1, the left-hand-side (LHS) of the equation will reduce to the X vector while the RHS will be the solution vector. In other words, A^-1 A X = A^-1 B reduces to X = A^-1 B. This is due to the fact that A^-1 A is equal to the unity matrix (ones in the diagonal from upper-left to lower-right and zeros elsewhere). Thus the RHS vector after the multiplication will contain the values of the xi's. Listing 9 is a program for accomplishing the above operation. Listing 10 shows the output of the program. The solution for the X vector is x1=2.71, x2=2.76, and x3=-3.10. Notice that the program multiplies A and X to verify that the result does indeed equal B (the RHS vector).
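As a similar quick check without the library, the short program below multiplies A by the solution vector using ordinary C arrays (Listing 9 does this with the Matrix.library routines). Because the x values used here are the rounded figures quoted above, the products only approximate the B vector.

#include <stdio.h>

int main(void)
{
    static double a[3][3] = {
        {    1.30,  342.7, -17.7 },
        { -222.17,    3.0,  23.6 },
        {    1.00,  -17.4, -15.6 }
    };
    static double x[3] = { 2.71, 2.76, -3.10 };    /* rounded solution from Listing 10 */
    static double b[3] = { 1003.88, -667.05, 3.0 };
    int i, j;

    for (i = 0; i < 3; i++) {
        double sum = 0.0;
        for (j = 0; j < 3; j++)
            sum += a[i][j] * x[j];                 /* i-th element of A*x */
        printf("row %d:  A*x = %9.2f   B = %9.2f\n", i + 1, sum, b[i]);
    }
    return 0;
}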
The next example involves least-squares regression. Suppose we want to model yield as a function of temperature and pH with an equation of the form

yield = b0 + b1(temp) + b2(pH)
The data in Table II can be combined with the equation above and put in matrix notation.
| 1   50   5 |   | b0 |   | 40.5 |
| 1   70   5 |   | b1 | = | 49.9 |
| 1   50   9 |   | b2 |   | 61.8 |
| 1   70   9 |            | 78.2 |
or DB=Y. Since there are four equations and only three unknowns, the model is over-specified (there are more equations than unknowns) and there is no exact solution for the bi's. However, it is possible to calculate what values of bi will result in PREDICTED yield values that are closest to the ACTUAL yield values. This is accomplished by choosing the bi's such that the sum of the squares of the differences between the four actual and predicted yields is minimized. This is known as least-squares regression analysis. It turns out that the appropriate bi's can be calculated rather simply using matrices by solving the set of simultaneous equations known as the normal equations. In matrix notation these equations are D^T D B = D^T Y, where D^T is the transpose of matrix D. D^T D will be a square matrix with three rows and columns and D^T Y will be a column vector with three elements. Thus, the normal equations consist of three equations with three unknowns. They can be solved in the same way the simultaneous equations were solved in the previous section, as follows:
(D^T D)B = D^T Y
(D^T D)^-1 (D^T D)B = (D^T D)^-1 D^T Y
B = (D^T D)^-1 D^T Y
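As an illustration of how the normal equations are assembled, the fragment below forms D^T D and D^T Y from the D matrix and Y vector shown above using plain C arrays. Treat it as a sketch only; the library routines handle this step (and the inversion that follows) in Listing 11.

#include <stdio.h>

int main(void)
{
    static double d[4][3] = {            /* the design matrix D from the equation above */
        { 1.0, 50.0, 5.0 },
        { 1.0, 70.0, 5.0 },
        { 1.0, 50.0, 9.0 },
        { 1.0, 70.0, 9.0 }
    };
    static double y[4] = { 40.5, 49.9, 61.8, 78.2 };
    double dtd[3][3], dty[3];
    int i, j, k;

    for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++) {        /* D^T D is 3 by 3 */
            dtd[i][j] = 0.0;
            for (k = 0; k < 4; k++)
                dtd[i][j] += d[k][i] * d[k][j];
        }
        dty[i] = 0.0;                    /* D^T Y is a 3-element column vector */
        for (k = 0; k < 4; k++)
            dty[i] += d[k][i] * y[k];
    }

    for (i = 0; i < 3; i++)
        printf("%10.1f %10.1f %10.1f   |  %10.1f\n",
               dtd[i][0], dtd[i][1], dtd[i][2], dty[i]);
    return 0;
}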
The program in Listing 11 performs these calculations. The output of the program is shown in Listing 12. The least-squares regression coefficients were b0=-24.5, b1=0.645, and b2=6.20, resulting in a regression equation of
yield = -24.5 + 0.645(temp) + 6.20(pH)
Note the PREDYIELD vector at the end of Listing 12. These are the four predicted yield values at the test conditions of the experimental design. They are different from the actual yield values; however, due to the nature of least-squares regression analysis, there are no other bi values that would result in a lower sum of squares of the differences between the actual and predicted yields.
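If you want to reproduce the predicted yields by hand, the short program below simply evaluates the fitted equation at the four design points. Since the coefficients are the rounded values quoted above, the printed numbers should agree with the PREDYIELD vector of Listing 12 to within rounding.

#include <stdio.h>

int main(void)
{
    static double temp[4] = { 50.0, 70.0, 50.0, 70.0 };   /* the four design points */
    static double ph[4]   = {  5.0,  5.0,  9.0,  9.0 };
    int i;

    for (i = 0; i < 4; i++) {
        double pred = -24.5 + 0.645 * temp[i] + 6.20 * ph[i];
        printf("temp = %4.1f  pH = %3.1f  predicted yield = %6.2f\n",
               temp[i], ph[i], pred);
    }
    return 0;
}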
A rectangle with a width of four and a height of three resides on the X-Y plane such that the coordinates of its four corners are (5,6), (5,9), (9,9), and (9,6). We would like to scale this rectangle by a factor of 1.5 along the X dimension and 2.0 along the Y dimension and then rotate it 45 degrees about its center point (7.0,7.5). The scaling factors and the rotation cannot be applied directly to the coordinates because this would result in the rectangle being scaled and rotated with respect to the origin rather than the center of the rectangle. Therefore, it is first necessary to translate the rectangle such that the center coincides with the origin, scale it, rotate it, and then translate again so the center of the rectangle is back in its original position. This process is shown graphically in Figure 1.
It turns out that Cartesian coordinates are not convenient for the transformation, but homogeneous coordinates are. Homogeneous coordinates add an extra factor w such that (w*x, w*y, w) is equivalent to (x,y,1), which represents the Cartesian coordinate (x,y). For example, (10,12,2) and (5,6,1) both represent the Cartesian point (5,6). For more detail, see Reference 5.
The Cartesian coordinates of the four corners of the rectangle can be represented in homogeneous coordinates as (5,6,1), (5,9,1), (9,9,1), and (9,6,1), which are row vectors. To translate, scale, or rotate these corners, the vectors are multiplied by the appropriate transformation matrix. Thus, to carry out the entire transformation illustrated in Figure 1, the following calculations are needed:
Ni = (Ci)(T-c)SR(Tc) = (Ci)F

where

   Ci  = a corner vector, i = 1-4
   T-c = the translation matrix that moves the center of the rectangle to the origin
   S   = the scaling matrix
   R   = the rotation matrix
   Tc  = the translation matrix that moves the center of the rectangle back to its original location
   F   = the complete transformation matrix = (T-c)SR(Tc)
   Ni  = a transformed corner vector, i = 1-4
Table III shows what two-dimensional translation, scaling, and rotation matrices look like. Once all the individual transformation matrices are multiplied together to obtain a complete transformation matrix, the homogeneous corner vectors can each be multiplied by this matrix to obtain the four transformed corner coordinates. Listing 13 shows a program that performs all of these calculations. The program's output is in Listing 14. The transformed corner Cartesian coordinates are (7.00,3.26), (2.76,7.50), (7.00,11.74), and (11.24,7.50). As you can see, these coordinates match those in Figure 1e.
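For readers who want to experiment without the library, the sketch below builds the same transformation with plain 3 by 3 arrays. It is not Listing 13: the matrix layouts assume the row-vector convention used above (corner vectors multiply the matrices on the left) and a counterclockwise rotation, which together reproduce the corner coordinates just quoted.

#include <stdio.h>
#include <math.h>

#define PI 3.14159265358979

typedef double MAT[3][3];

static void mat_mult(MAT a, MAT b, MAT out)        /* out = a * b (out must be distinct) */
{
    int i, j, k;
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++) {
            out[i][j] = 0.0;
            for (k = 0; k < 3; k++)
                out[i][j] += a[i][k] * b[k][j];
        }
}

int main(void)
{
    double rad = 45.0 * PI / 180.0;
    double c = cos(rad), s = sin(rad);

    MAT tneg = { {1,0,0}, {0,1,0}, {-7.0,-7.5,1} };   /* move center (7.0,7.5) to origin */
    MAT scal = { {1.5,0,0}, {0,2.0,0}, {0,0,1} };     /* scale 1.5 in X, 2.0 in Y        */
    MAT rot  = { {0,0,0}, {0,0,0}, {0,0,1} };         /* rotation filled in below        */
    MAT tpos = { {1,0,0}, {0,1,0}, {7.0,7.5,1} };     /* move center back                */
    MAT tmp1, tmp2, f;
    static double corner[4][3] = {                    /* homogeneous corner vectors Ci   */
        {5,6,1}, {5,9,1}, {9,9,1}, {9,6,1}
    };
    int i, j, k;

    rot[0][0] =  c;  rot[0][1] = s;                   /* rotate 45 degrees counterclockwise */
    rot[1][0] = -s;  rot[1][1] = c;

    mat_mult(tneg, scal, tmp1);                       /* F = (T-c)SR(Tc) */
    mat_mult(tmp1, rot,  tmp2);
    mat_mult(tmp2, tpos, f);

    for (i = 0; i < 4; i++) {
        double n[3] = { 0.0, 0.0, 0.0 };
        for (j = 0; j < 3; j++)                       /* Ni = (Ci)F */
            for (k = 0; k < 3; k++)
                n[j] += corner[i][k] * f[k][j];
        printf("corner %d -> (%.2f, %.2f)\n", i + 1, n[0], n[1]);
    }
    return 0;
}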