A Shared Library for Matrix Manipulations: Part 1 and Part 2 - Practicalities

by Randy Finch

For some time I had wanted to learn how to create a shared library for my Amiga computer. I had looked over the ROM Kernel manual (Ref 1) and studied public domain libraries, but found them too dense to understand. The problem appeared to be two-fold: not enough information, and my lack of knowledge of the Amiga's assembly language. In the past I had programmed my Commodore-64 quite a bit in assembly language. However, 68000 assembly language appeared to be much more complex than 6502 assembly language. I was still learning C and was not inclined to learn 68000 assembly. Deep inside, however, I kept thinking that there had to be a simple way to create a shared library; a way even I could understand.

Then it happened! I received my April 1989 issue of Amazing Computing. It contained an article by John Baez entitled "Creating a Shared Library" (Ref 2). The article was wonderful; I highly recommend it. I perused the article and program listings several times and began to realize that my suspicions were right. Creating a shared library CAN BE EASY! As time permitted, I experimented with the code in John's article. Once I had it working, I was ready to begin creating my own library.

The first library I decided to create was Matrix.library which would contain routines for manipulating matrices. There were multiple reasons for this choice. My regular job requires me to solve simultaneous equations and calculate least-squares multiple regression coefficients. Although I typically use commercial software for these tasks, I knew the calculations were based on matrices and I wanted to learn more about the math behind the calculations. Also, I was personally working on a three-dimensional function plotting program. Two- and three-dimensional vector translation, rotation, and scaling are also based on matrix manipulation. A shared library should only be created if a variety of programs can use it. Matrix.library meets this criterion.

As I began creating Matrix.library, I realized that some improvements could be made to John Baez's code because I was using Lattice version 5.04 which is greatly improved over the version 4 that John had at the time of his article. Therefore, this article has a three-fold purpose: 1) present the source code for Matrix.library as well as an exhaustive test program, 2) point out some improvements over John Baez's code due to improvements in the Lattice development system, and 3) present several programs that illustrate how the library can be put to practical use.

Revealing My Source

Unlike most newspaper reporters, many software developers like to reveal their source. That is exactly what I am prepared to do. Matrix.library consists of four files. They are shown in Listings 1-4. Listing 1 is MatrixLib.a, the only assembly language file. It consists of the standard shared library header data and the interface routines for the standard and user library functions that are written in C. Listing 2 is Matrix.c. It contains all the code for both the standard and the user functions. Listing 3 is Matrix.fd. This is the standard file description text that provides information about which microprocessor registers are to be used for passing parameters to the library user functions. Listing 4 is MatrixProto.h. This is a C header file containing the user function prototypes.

Listing 5 is Test.h, a header file that #includes several other header files and contains vector and matrix display functions. It is used by several of the programs presented in this article. Listing 6 shows a makefile that can be used with LMK (Lattice Make Utility) to create Matrix.library and other programs presented later.

A Comparison

I do not intend to detail the inner workings of MatrixLib.a. Other than the XREF statements and jump vectors near the beginning of the file and the interface stub routines at the end of the file, it is essentially the same as slibh.a in John Baez's article. I do wish to point out some changes I made to John's code. So dig out your April 1989 issue of Amazing Computing, lay it next to this article, and follow through the upcoming discussion. I have numbered the lines in MatrixLib.a to make the discussion easier to follow.

Lines 27-53 are the user function XREF statements that allow the C functions in Matrix.c to be visible during linking. There are 25 user functions in Matrix.library. They will be discussed in more detail later. As more functions are added to the library, more XREF statements need to be added to this list. Also, new jmp statements will need to be added to the jump vector list in lines 94-120. It is important to remember that the jump vectors MUST be in reverse order of the function list in Matrix.fd. The jump table in MatrixLib.a uses the jmp instruction whereas John Baez used a hard-coded $4EF9 (long jump) due to a bug in the assembler he was using.

In lines 151 and 157 I have described the size of the jump vector table and data section in terms of the difference of two labels rather than hard coding the sizes. This allows additional functions (and thus more jmp instructions) and new data to be added without having to worry about changing these numbers.

Lines 237-408 contain the interface stub routines for all of the user functions in Matrix.library. These routines simply move the appropriate registers onto the stack and call the C functions in Matrix.c. One thing to keep in mind is that the addq.l instruction can only be used with an immediate value of eight or less. Since each parameter requires the stack to be adjusted by four, addq.l can only be used when two or fewer parameters are passed. For more parameters, add.l must be used. It is very important that the registers that are placed on the stack match the registers that are declared in Matrix.fd because the compiler will be placing the parameters in the registers according to the .fd file. Lattice v4 only supported four parameters in the #pragma statements (produced from the .fd file using fd2pragma). However, Lattice v5 supports six parameters. This extra support came in the nick of time as Matrix.library has two functions that pass this many parameters.

John Baez's comments near the end of slibh.a state that Lattice is fuzzy (could he mean fussy?) about not having its own startup code during linking and give a piece of code starting with SECTION __MERGED that contains some XREFs and data labels. None of this is needed with Lattice v5.04; blink now behaves just fine.

Looking Through the Library

Let's take a look at the functions that are a part of Matrix.c. The standard library functions CInit(), LibOpen(), LibClose(), and LibExpunge() are the same as John Baez's, except for CInit(). Some of the user functions in Matrix.c use floating point calculations, which are performed by mathieeedoubbas.library. Therefore, this library must be opened in CInit(). If it fails to open, a NULL is returned to inform Exec that Matrix.library failed to initialize.

The user functions in Matrix.library are divided into two categories: long integer functions and double precision floating point functions. The former handle the manipulation of matrices containing integers and the latter handle the manipulation of matrices containing floating point numbers. There are a total of 25 user functions in Matrix.library; 12 for integer matrices and 13 for floating point matrices. A matrix with only one row or column is also known as a vector. Some functions are written specifically with vectors in mind while other functions are written for matrices in general. Table I shows a list of the user functions and describes the tasks performed by each. The letter 'L' in a function name indicates that it is a long integer function; the letter 'D' indicates a double precision floating point function.

The code for the memory allocation and deallocation functions and the matrix inversion function was based on similar functions in the book Numerical Recipes in C (Ref 3). This book contains a wealth of algorithms for mathematical problems. I highly recommend it to anyone writing mathematical software.

Most of the code in Matrix.c is straightforward and needs no explanation. However, there are a few items that need to be mentioned. When a vector is allocated, a pointer to the allocated memory space is returned to the calling program. This pointer can be used just like a standard C array pointer. However, when a matrix is allocated, the pointer that is returned to the calling program points to an array of pointers, each of which points to an array of numbers that represents a row of the matrix. Thus, to refer to an individual element of the matrix, the standard C notation for a two-dimensional array, m[row][column], cannot be used. Rather, the element must be referred to as (m[row])[column]. Press, et al. state in Numerical Recipes in C that the standard notation can be used; however, it does not work with the Lattice compiler.

The method used for inverting a matrix in InvertDMatrix() is called LU decomposition. The explanation for how it works is rather lengthy, so I refer you to Numerical Recipes in C if you are interested in this method.

Testing the Library

Listing 7 shows the C source of a program entitled TestMatLib.c. This program opens Matrix.library, allocates several vectors and matrices, and assigns random numbers to the elements. It then calls every function in Matrix.library. The input vectors and matrices as well as the result vectors and matrices are printed to the standard output each step of the way. When the test is complete, all the vectors and matrices are deallocated and Matrix.library is closed. A sample output of the program is shown in Listing 8.

REALLY Using Matrix.library

The test program is nice for testing Matrix.library for bugs but the REAL purpose of the shared library is to put it to REAL use. I mentioned at the beginning of this article that there were several reasons for writing Matrix.library. They included solving simultaneous equations, determining least-squares regression coefficients, and manipulating two- and three-dimensional vectors. The remainder of this article will concentrate on examples of these problems. Please be aware that, for the sake of brevity, the programs that will be presented do not perform much error checking.

Simultaneous Equations

Suppose we have the following set of equations:

	1.3x1 + 342.7x2 - 17.7x3 = 1003.88     (1)

	-222.17x1 + 3.0x2 + 23.6x3 = -667.05   (2)

	1.0x1 - 17.4x2 - 15.6x3 = 3.0          (3)

We need to know what values of the xi's satisfy all three equations. In other words, we need to solve the equations simultaneously. The above equations can be written in matrix notation as AX=B where A is a 3 by 3 matrix containing the three coefficients of each equation in its rows, X is a column vector containing the xi's, and B is a column vector containing the right-hand-side (RHS) constants. This matrix equation can be written as follows:

      ___                  ___   _  _     __     __
     |                        | |    |   |         |
     |    1.3   342.7   -17.7 | | x1 |   | 1003.88 |
     |                        | |    |   |         |
     | -222.17    3.0    23.6 | | x2 | = | -667.05 |
     |                        | |    |   |         |
     |    1.0   -17.4   -15.6 | | x3 |   |    3.0  |
     |___                  ___| |_  _|   |__     __|

When each side of the equation is multiplied by the inverse of matrix A, A^-1, the left-hand-side (LHS) of the equation will reduce to the X vector while the RHS will be the solution vector. In other words, A^-1AX = A^-1B reduces to X = A^-1B. This is due to the fact that A^-1A is equal to the identity matrix (ones in the diagonal from upper-left to lower-right and zeros elsewhere). Thus the RHS vector after the multiplication will contain the values of the xi's. Listing 9 is a program for accomplishing the above operation. Listing 10 shows the output of the program. The solution for the X vector is x1=2.71, x2=2.76, and x3=-3.10. Notice that the program multiplies A and X to verify that the result does indeed equal B (the RHS vector).

Least-Squares Regression

Suppose we set up a two-level full factorial experimental design (Ref 4) for a chemical reaction that we believe is dependent on temperature and pH. The dependent variable of interest is the percent yield of some compound. Table II shows the experimental design, which consists of four tests, and the yield results. We want to model the results with the following linear equation:

yield = b0 + b1(temp) + b2(pH)

The data in Table II can be combined with the equation above and put in matrix notation.

      ___        ___   _  _     __  __
     |              | |    |   |      |
     |  1   50   5  | | b0 |   | 40.5 |
     |              | |    |   |      |
     |  1   70   5  | | b1 | = | 49.9 |
     |              | |    |   |      |
     |  1   50   9  | | b2 |   | 61.8 |
     |              | |_  _|   |      |
     |  1   70   9  |          | 78.2 |
     |___        ___|          |__  __|

or DB=Y. Since there are four equations and only three unknowns, the model is overdetermined and there is no exact solution for the bi's. However, it is possible to calculate what values of bi will result in PREDICTED yield values that are closest to the ACTUAL yield values. This is accomplished by choosing the bi's such that the sum of the squares of the differences between the four actual and predicted yields is minimized. This is known as least-squares regression analysis. It turns out that the appropriate bi's can be calculated rather simply using matrices by solving the set of simultaneous equations known as the normal equations. These equations in matrix notation are D^TDB = D^TY, where D^T is the transpose of matrix D. D^TD will be a square matrix with three rows and columns and D^TY will be a column vector with three elements. Thus, the normal equations consist of three equations with three unknowns. They can be solved in the same way the simultaneous equations were solved in the previous section as follows:

         (D^TD)B = D^TY
(D^TD)^-1(D^TD)B = (D^TD)^-1D^TY
               B = (D^TD)^-1D^TY

The program in Listing 11 performs these calculations. The output of the program is shown in Listing 12. The least-squares regression coefficients were b0=-24.5, b1=0.645, and b2=6.20 resulting in a regression equation of

yield = -24.5 + 0.645(temp) + 6.20(pH)

Note the PREDYIELD vector at the end of Listing 12. These are the four predicted yield values at the test conditions of the experimental design. They are different from the actual yield values; however, due to the nature of least-squares regression analysis, there are no other bi values that will result in a lower sum of squares of the differences between the actual and predicted yields.

Vector Transformation

Matrices can be used for transforming N-dimensional objects. For the sake of brevity, the following example is a two-dimensional transformation but the same principle can be applied for higher dimensions (Ref 5).

A rectangle with a width of four and a height of three resides on the X-Y plane such that the coordinates of its four corners are (5,6), (5,9), (9,9), and (9,6). We would like to scale this rectangle by a factor of 1.5 along the X dimension and 2.0 along the Y dimension and then rotate it 45 degrees about its center point (7.0,7.5). The scaling factors and the rotation cannot be applied directly to the coordinates because this would result in the rectangle being scaled and rotated with respect to the origin rather than the center of the rectangle. Therefore, it is first necessary to translate the rectangle such that the center coincides with the origin, scale it, rotate it, and then translate again so the center of the rectangle is back in its original position. This process is shown graphically in Figure 1.

It turns out that Cartesian coordinates are not convenient for the transformation, but homogeneous coordinates are. Homogeneous coordinates add an extra factor w such that (w*x,w*y,w) is equivalent to (x,y,1) which represents the Cartesian coordinate (x,y). For more detail, see Reference 5.

The Cartesian coordinates of the four corners of the rectangle can be represented in homogeneous coordinates as (5,6,1), (5,9,1), (9,9,1), and (9,6,1), which are row vectors. To translate, scale, or rotate these corners, the vectors are multiplied by the appropriate transformation matrix. Thus, to carry out the entire transformation illustrated in Figure 1, the following calculations are needed:

N_i = (C_i)(T_-c)(S)(R)(T_c) = (C_i)F

where C_i  = a corner vector, i=1-4
      T_-c = the translation matrix that moves the center
               of the rectangle to the origin
      S    = the scaling matrix
      R    = the rotation matrix
      T_c  = the translation matrix that moves the center
               of the rectangle back to its original location
      F    = the complete transformation matrix = (T_-c)(S)(R)(T_c)
      N_i  = a transformed corner vector, i=1-4

Table III shows what two-dimensional translation, scaling, and rotation matrices look like. Once all the individual transformation matrices are multiplied together to obtain a complete transformation matrix, the homogeneous corner vectors can each be multiplied by this matrix to obtain the four transformed corner coordinates. Listing 13 shows a program that performs all of these calculations. The program's output is in Listing 14. The transformed corner Cartesian coordinates are (7.00,3.26), (2.76,7.50), (7.00,11.74), and (11.24,7.50). As you can see, these coordinates match those in Figure 1e.

Conclusion

There you have it: a shared library for performing matrix calculations and the beginnings of how to REALLY use the library. Of course, there are many other tasks that can make good use of Matrix.library. Also, there are many other functions that can be added to the library itself. Feel free to add whatever you wish. If you make any substantial changes that are useful, please let me know about them. You can contact me in care of Amazing Computing.

References

  1. Commodore-Amiga, Inc.; Amiga ROM Kernel Reference Manual: Includes and Autodocs; Addison-Wesley Publishing 1989.
  2. Baez, J.; "Creating Shared Libraries"; Amazing Computing for the Commodore Amiga; PiM Publications, April 1989 (v4.4).
  3. Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T.; Numerical Recipes in C: The Art of Scientific Computing; Cambridge University Press 1988.
  4. Box, G.E.P., Hunter, W.G., and Hunter, J.S.; Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building; John Wiley & Sons 1978.
  5. Pokorny, C. K. and Gerald, C. F.; Computer Graphics: The Principles Behind the Art and Science; Franklin, Beedle, and Associates 1989.


Last modified on April 1, 1996
Randy Finch at webmaster@rcfinch.com