# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of archive 4 (of 4)."
# Contents:  ./Doc/Data_redn.doc ./Doc/Simplex.doc
# Wrapped by local@rupley on Wed Nov 21 00:31:22 1990
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f './Doc/Data_redn.doc' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'./Doc/Data_redn.doc'\"
else
echo shar: Extracting \"'./Doc/Data_redn.doc'\" \(11843 characters\)
sed "s/^X//" >'./Doc/Data_redn.doc' <<'END_OF_FILE'
X NOTES ON DATA REDUCTION
X
X
X J. A. Rupley, Tucson, Arizona
X
X
X
INTRODUCTION:
X
X In order to derive conclusions from quantitative measurements, there
must be some form of data reduction.  This can be as simple as a
comparison by eye of two curves drawn through the data.  If, however,
the data set is large and complex, for example with more than one
independent variable, and if the questions posed are detailed or involve
a complicated nonlinear model, then visual or graphical methods are less
satisfactory than a computer-based analysis.  The latter is now widely used.
X
X The intent of this short introduction is to show that a sophisticated
computer program can be handled easily, that its use saves time and
effort, that it can treat a more complicated model than can be handled
graphically, and that it provides information such as estimates of
uncertainties in the parameters that is difficult or impossible to
obtain from graphical methods.  
X
X These comments, although general, are addressed to the solution of a
particular problem, an analysis of enzyme kinetic data.  These are
initial rate measurements made on the lactate-dehydrogenase-catalyzed
reaction of pyruvate with NADH, in the presence and absence of lactate
as inhibitor.  The results of the computer fit are the following:  (1)
values of the kinetic constants V, KmA, KmB, KmAB, KmQ/KmPQ, and
KBInhib.  The defining equation is given below.  The first five
constants are those that can be evaluated by the standard graphical
methods of primary and secondary reciprocal plots, as described in
introductory textbooks.  The constant KBInhib is the dissociation
constant for the dead-end complex LDH-NADH-lactate, which is included in
the mechanism fit by the computer program but cannot be included in a
mechanism on which the graphical methods are based.  (2) Estimates of
the standard deviations of the kinetic constants.  These are needed for
an understanding of the reliability and significance of the values
calculated for the kinetic constants.  (3) A list of the coordinates of
points suitable for construction of the lines of the reciprocal plots of
the standard graphical methods.
X
X
X
REMARKS ON FITTING OF A MODEL TO DATA
X
X In a typical data reduction, a particular model to be tested is fit to
a set of data points under some criterion for best-fit.  The ith data
point of a set of N data points consists of a single value for the
dependent variable, Yobserved(i), measured for corresponding single
values for the one or more independent variables, Xobserved(i).  The
commonly-used least squares criterion for quality of fit is the minimum
value of the weighted sum of the squares of the deviations between the
observed values of Y and the values of Y calculated according to the
model.
X
Working from the model to be fit to the data, one develops an equation
relating the dependent variable Y to the independent variables X and to
a set of M variable parameters p, for each of the N data points.  For the
ith data point:
X
X     Ymodel(i) = F(Xobserved(i); p(j), j=1,M)        eq. 1
X
If the model predicts a linear relationship between Y and a single
independent variable X :
X
X     Ymodel(i) = p(1) + p(2) * Xobserved(i)          eq. 2
X
The constants p(1) and p(2) of equation (2) are the Y-axis intercept and
the slope, respectively, and of course are the same for all data points
X(for all pairs of values Y(i) and X(i)).
X
X Fitting of a model to data consists of finding the values of the M
variable parameters p that give best agreement between the N pairs
of values of Ymodel(i) and Yobserved(i).  Best agreement can be defined
as the minimum value of the least squares function y:
X
X          N
X     y = SUM (Yobserved(i) - Ymodel(i))^2 * W(i)     eq. 3
X         i=1
X
X
The factor W(i) of equation (3) is the normalized reciprocal variance
X(the statistical weighting) of the ith data point, and it can be set at
unity if the data points are all of equal estimated uncertainty.
X
X Combining equations (1) and (3), one sees that the least squares
function y of equation (3) is a function of the full set of N data
points and a set of M variable parameters:
X
X  y = f(Yobserved(i), Xobserved(i), i=1,N; p(j), j=1,M)   eq. 4
X
X The fitting problem therefore consists of finding the minimum value of
the least squares function y.  For a given set of data, y depends only
on the M variable parameters p (the data points Yobserved(i) --
XXobserved(i) in equation (4) are constant in the fitting).  There are
several methods commonly used to find the minimum of y and thus evaluate
the best-fit values of the parameters p. The more useful of these can
handle nonlinear model functions F (equation (1)) of essentially arbitrary
mathematical form.  The rate law for lactate dehydrogenase, given below,
is an example of a nonlinear model function.
X
X In the simplex method used for solving the fitting problem, one
constructs an M dimensional polyhedron with M + 1 vertices (the
simplex).  Each dimension of the simplex corresponds to a variable
parameter of equation (4).  Each vertex of the simplex is a point in the
M dimensional space, which is called "parameter space" or "factor
space." The M coordinates of each vertex are values of the M parameters.
Thus each vertex of the simplex has an associated value of the least
squares function y. The starting simplex is constructed to be so large
as to include within it the point corresponding to the minimum value of
y. This minimum point has as its coordinates the best-fit values of
the parameters.
X
X The minimization process shrinks the simplex about the minimum point,
even though the coordinates of the minimum are not known beforehand,
until the vertices of the simplex are so close together and so nearly
equal that an exit test is satisfied.  The exit test is set so that a
desired level of accuracy is obtained.  The values of the M parameters
averaged over all the vertices, i.e., the parameter values for the centroid
of the simplex, serve as reliable estimates of the best-fit parameter
values (those for the least squares function minimum), because the
minimum point is known to be inside the shrunken simplex and thus near
the centroid.
X
X We generally want to estimate the uncertainties in the parameter values
obtained for a model fit to a particular set of data points.  To this end,
one calculates standard deviations of the parameters.
X
X There are likely to be large uncertainties in the parameters if there
are few data points or if there are large deviations between Ymodel and
Yobserved.  As a rule, one should have 5 to 10 times as many data points
as parameters.
X
X The first try at estimating uncertainties of the parameters can fail.
The calculation involves matrix inversion, the use of differences
between nearly equal large numbers, and the approximation of a complex
surface by a simple quadratic function.  It may be necessary to change
certain test values and then to repeat the calculation of the standard
deviations.  In particular, if a parameter is close to a bound, so that
expansion of the simplex in that dimension is not possible, then that
parameter should be fixed in the quadratic fit.
X
X All fitting methods can fail.  We will not discuss problems with
bounds, local minima, ill-behaved functions, poor quality data,
physically unreasonable best fits, etc.  References given below should be
read for more complete discussions of the fitting problem.
X
X For additional discussion of fitting by use of the simplex
method, see "Notes on the fitting program", and the article by
Nelder and Mead (1965).
X
X
X
THE FUNCTION FIT TO THE DATA
X
X The function to be fit to data obtained in steady-state kinetic
experiments with lactate dehydrogenase is for an ordered ternary-complex
pathway with dead-end complexes (EAP and EQB):  
X
X                    EAP
X                     |   KPInhib = ea * p/eap
X                     |
X          ----------EA-----------
X   k1 * a | k-1             k-2 | k2 * b
X          |                     |
X          |                     |
X          E               EAB <---> EPQ                 eq. 5
X          |                     |
X          |                     |
X       k4 | k-4 * q     k-3 * p | k3
X          ----------EQ-----------
X                     |
X                     |   KBInhib = eq * b/eqb
X                    EQB
X
where E is lactate dehydrogenase, A is NADH, B is pyruvate, P is
lactate, and Q is NAD.  Lower case letters denote reactant or product
concentrations.
X
X This pathway is more complicated than the one without dead-end
complexes, which is the basis of the standard graphical methods of
analysis of two-substrate--two-product reactions:
X
X          -----------EA----------
X          |                     |
X          |                     |
X          E               EAB <---> EPQ                 eq. 6
X          |                     |
X          |                     |
X          -----------EQ----------
X
X For the direction of reaction pyruvate reduction by NADH and the
product inhibitor lactate, the rate law for the pathway of equation
X(5) is:
X
X     vo  =  V / [ 1  +  KmQ/KmPQ * p  +  KmA/a       eq. 7
X
X               +  KmB/b * (1 + KmQ/KmPQ * p) * (1 + 1/KPInhib * p)
X
X               +  KmAB/(a * b) * (1 + KmQ/KmPQ * p)
X
X               +  k3/(k3 + k4) * 1/KBInhib * b ]
X
X The presence in the pathway of equation (5) of the dead end complexes
XEAP and EQB leads to a significantly more complicated rate law than is
found for the simpler pathway of equation (6); compare equation (7) with
the following equation, for the "bare" compulsory order pathway without
dead-end complexes (the pathway of equation (6)):
X
X     vo  =  V / [ 1  +  KmQ/KmPQ * p  +  KmA/a       eq. 8
X
X               +  KmB/b * (1 + KmQ/KmPQ * p)
X
X               +  KmAB/(a * b) * (1 + KmQ/KmPQ * p) ]
X
X Measurements of the initial rate, vo, made as a function of the
concentrations of NADH, pyruvate, and lactate are fit with equation (7),
by use of the program "ldhfit".  The parameters evaluated in the fitting
are those of equation (7) and are listed in Table I.
X
X Several points should be noted regarding equations (7) and (8):  (1) The
last term of equation (7), containing the equilibrium constant KBInhib,
probably can be neglected under initial rate conditions with q=0; in the
fitting this is recognized by setting the last term at a small value,
X1E-10, which eliminates it.  (2) With this change, equations (7) and (8)
are identical except for the appearance in one term of equation (7) of a
factor containing the equilibrium constant KPInhib, which is for
dissociation of the dead-end complex EAP defined in the pathway of
equation (5).
X
X
X
REFERENCES:
X
A simplex method for function minimization.
J.A. Nelder and R. Mead (1965. Computer J. 7, 308.
X
Digital computer user's handbook.
M. Klerer and G.A. Korn (1967). Mcgraw-Hill, New York.
X
Data analysis in biochemistry and biophysics.
M.E. Magar (1972). Academic, New York.
X
The solution of the general least squares problem with special
reference to high-speed computers.
R.H. Moore and R.K. Ziegler (1960). Los Alamos Scientific
Laboratory Report LA-2367.
X
X
X
TABLE I:
X
X                         EQUATION (7)
XFITTING                  RATE LAW
PARAMETERS               PARAMETERS
X
X1                        V
X
X2                        KmA  =  KmNADH
X
X3                        KmB  =  KmPyruvate
X
X4                        KmAB =  KmNADH-Pyruvate
X
X5                        KmQ/KmPQ = KmNAD/KmLactate-NAD
X
X6                        1/KPInhib = 1/KInhibLactate
X
X7                        k3/(k3 + k4) * 1/KInhibPyruvate
X                              parm(7) approx. equal to 0 at t=0
X
X
INDEPENDENT VARIABLES:
X
a = [NADH]     b = [Pyruvate]    p = [Lactate]    q = [NAD]
X                                                  q = 0 at t=0
X
X
DEPENDENT VARIABLE:
X
vo = initial rate of conversion of pyruvate to lactate
X
END_OF_FILE
if test 11843 -ne `wc -c <'./Doc/Data_redn.doc'`; then
    echo shar: \"'./Doc/Data_redn.doc'\" unpacked with wrong size!
fi
# end of './Doc/Data_redn.doc'
fi
if test -f './Doc/Simplex.doc' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'./Doc/Simplex.doc'\"
else
echo shar: Extracting \"'./Doc/Simplex.doc'\" \(7765 characters\)
sed "s/^X//" >'./Doc/Simplex.doc' <<'END_OF_FILE'
X NOTES ON THE SIMPLEX ALGORITHM
X
X
J. A. Rupley, Tucson, Arizona
X
X
X
SIMPLEX METHOD OF FUNCTION MINIMIZATION
X
X     The fitting of a model with variable parameters to a set of data
is handled by minimizing an error function, such as the weighted
sum of squares of the differences between observed values and values
calculated according to the model.
X
X     We describe here a simplex approximation procedure for estimating
the parameter values that give a function minimum.  The strategy is that
of Nelder and Mead, and their article (Computer Journal, vol 7, p 308,
X1965) should be consulted for more information than given below.
X
X     A simplex is constructed in parameter space, with vertices that are
arbitrary except they describe a volume that includes the minimum point.
X
X     The position of each vertex in parameter space is associated with a
set of parameter values, for which one calculates the function value for
that vertex.  The simplex has (M + 1) vertices, where M is the number of
free parameters.
X
X     In successive iterations the vertex with the highest function value
is moved, to obtain a new vertex position of smaller function value.
The movement is directed with respect to the center of the reduced
simplex, which is the simplex less the highest vertex.  The allowed
movements are the following:
X
X     reflection;
X     expansion beyond the reflected position;
X     reflection after failed expansion;
X     contraction toward the center of the reduced simplex from
X          the original position;
X     contraction from the reflected position;
X     if none of these operations gives a lower function value,
X          then the entire simplex is shrunk about the lowest
X          vertex.
X
X     Exit from the iterative minimization follows if the fractional RMS
deviation of the function values at the vertices is less than a test
value and if the centroid of the simplex is within two RMS deviations of
the mean of the vertices.  The default test value is 1E-8 times the mean
function value.
X
X     Thus at exit from the minimization, the centroid of the simplex
gives an arbitrarily close approximation of the parameter values at the
function minimum.
X
X     The routine that calculates the function values must be written for
the model to be optimized.
X
X     A text file is read to enter data and starting parameter and
control values.
X
X     Results can be written selectively to a file, in addition to their
display on the terminal.
X
X
X
QUADRATIC FIT TO THE LEAST SQUARES FUNCTION SURFACE
X
X     Standard deviations of the parameters are calculated by fitting a
quadratic function to the surface about the minimum and then using the
properties of this function to calculate the variance-covariance matrix.
The strategy is a modification of that of Nelder and Mead (1965).
X
The contracted simplex obtained through the minimization process is used
to define a new coordinate system of M oblique axes in parameter space.
The origin is at the centroid of the simplex.  Each axis corresponds to
a free parameter.  The unit value (the scale) for each axis is set in the
new system at the average of the absolute values of the deviations of
the vertices of the simplex from the centroid value.
X
In effect, one constructs in the new coordinate system a new simplex,
based on the one obtained by the minimization process.  The (M + 1)
vertices of the new construct are at the centroid (the zero of the new
system) and at the unit points on each free parameter axis.  The values
of the least squares function at these vertices describe a surface.  It
is this least squares surface to which a quadratic function is to be
fit.
X
X     If appropriate, the scales of the axes (the unit values) in the new
coordinate system are adjusted to give better behavior of the surface in
the vicinity of the minimum.
X
X     The diagonal matrix Q transforms the new coordinate system back to
the original system.  The elements of Q are for each parameter the
X(adjusted) average differences between the vertices and the centroid of
the original simplex.
X
X     The function value, y, in the region near the function minimum can
be approximated in the new coordinate system by the quadratic vector
function:
X
X      y  =  a(o)   +   2 a'.x   +   x'.B.x                   (1)
X
X     The elements of the vector x are the values of the M free
parameters at a point in parameter space.  The scalar a(o), the vector a
and the matrix B are determined by the shape of the surface y near the
minimum and can be calculated with the use of simple numerical
approximations, evaluated at x = 0 (the position of the centroid of the
original simplex):
X
X        a(0)   =   y(0)
X
X        ai     =   0.5 * (dy/dxi)
X               =   0.25 * (yi - y-i)
X
X        bij    =   0.5 * (d2y/dxi.dxj)
X               =   0.25 * (yij + y-i-j + 2 * y(0)
X                                   - y-i - y-j - yi - yj)
X
X        bii    =   0.5 * (d2y/dxi2)
X               =   0.5 * (yi + y-i - 2 * y(0))
X
X     The point x = 0 (the centroid position) may not be the true
position of the function minimum, although it is assumed to be close to
it.  A refined estimate of the minimum position, based on the quadratic
approximation of equation (1) above, is given by x(min):
X
X     xmin = - B-1.a
X
X
X     The function minimum at x(min) is given by y(min) :
X
X     ymin = a(0) - a'.B-1.a
X
X
X     The position of the minimum in the original coordinate system is
obtained by back transformation of x(min) with the matrix Q, giving the
point p(min):
X
X     pmin = p(0) - Q'.B-1.a
X
X     As a test of the quality of the quadratic approximation (1), there
should be close agreement between the function values y(min), y(p(min)),
and y(centroid), and between the sets of parameter values defining the
points p(min) and the centroid.
X
X     The variance-covariance matrix C is given by
X
X     C = Q'.B-1.Q
X
X     A diagonal element of C, multiplied by the mean square error (MSE),
gives the variance of the corresponding parameter:
X
X     var(i) = cii * MSE
X
X     MSE = ymin/(n - m)
X
where the divisor (n - m) = # degrees freedom:  n = # of data points
used in calcn of ymin, and m = # of free parameters; and ymin = the weighted
sum of residuals squared.
X
X     The standard deviation is the square root of the variance:
X
X     std-dev(i) = sqrt(var(i))
X
X     The quadratic fit can fail, giving negative values of the diagonal
elements of the variance-covariance matrix or a value of y(p(min)) >
y(centroid).  Failure can come from too tight a simplex, resulting in
too small values of the B matrix elements, or from too large a simplex,
resulting in non-quadratic behavior of the least squares function within
the region defined by the simplex.  The problem of too tight a simplex
does not arise if the precision of the floating point routines of the
program is sufficiently high (e.g., 14 digits).  The problem of
non-quadratic behavior commonly arises in the early stages of the
fitting, when the minimum has not yet been approached closely.  This can
be taken into account by modifying the fitting algorithm, or one can be
patient.  If a parameter moves so close to a bound that expansion of the
simplex toward the bound is not possible, then that parameter should be
fixed, i.e., held invariant.
X
It may be useful to cycle between short sessions of simplex fitting and
quadratic approximation, to more rapidly approach the least squares
function minimum.  Toward this end, the more accurate approximation of
the function minimum given by p(min) is used to replace the previously
obtained centroid value of the simplex.  That is, the simplex returned
to the simplex fitting routine would have M vertices at the "unit"
positions on the parameter axes and the (M+1)th vertex at p(min), not at
the "origin", i.e., the old centroid value.
X
END_OF_FILE
if test 7765 -ne `wc -c <'./Doc/Simplex.doc'`; then
    echo shar: \"'./Doc/Simplex.doc'\" unpacked with wrong size!
fi
# end of './Doc/Simplex.doc'
fi
echo shar: End of archive 4 \(of 4\).
cp /dev/null ark4isdone
MISSING=""
for I in 1 2 3 4 ; do
    if test ! -f ark${I}isdone ; then
	MISSING="${MISSING} ${I}"
    fi
done
if test "${MISSING}" = "" ; then
    echo You have unpacked all 4 archives.
    rm -f ark[1-9]isdone
else
    echo You still need to unpack the following archives:
    echo "        " ${MISSING}
fi
##  End of shell archive.
exit 0