Wednesday, August 17, 2011

Completed Program in MATLAB for error surface minimization

To fit maximum productivity to data... that may not have reached its maximum productivity yet.
I wrote this code. Of course, in the current state it works for MY DATA. But you can make it work for yours, too!


% site_index.m


% this program will calculate maximum productivity for ws1 plots


% we base this on the idea that biomass follows a logistic function. the
% logistic function is defined as B(t) = K/(1+Ce^(-r*t)). We would like to
% fit a logistic function to our biomass curves and then solve for where
% the derivative of biomass (productivity) is maximized (or where the
% second derivative of biomass is 0). To do this we must fit a logistic
% function to the biomass data.


% Import biomass data. this data should be in a file with no headers and no
% plot ids, just literally columns of biomass in your interval years like
% so
% 0.007 | 0.01 | 0.100 |  etc.


data = importdata('biomat.csv',',');


% Create a vector T representing the number of
% years after the first measurement


T =[0;4;8;12;16;22;28];


%dataT is the transposed data. You should now have an array with one column
%per plot and one row per remeasurement.


dataT = data';
nplots = size(dataT,2);   % number of plots (safer than length(dataT), which returns the longer dimension)


% now we will determine an appropriate range for "r", the growth rate that
% goes into the logistic equation. To do this, following the methods I have
% outlined, we need to know the productivity rate at the point of inflection,
% or at the end of the exponential growth period, whichever is applicable


%prod_rate is a matrix that will hold the ANPP values for each interval (rows) and each plot (columns)


prod_rate = zeros(6, nplots);


%years is a vector that holds the interval length between periods.
%This can be calculated or just done in your brain, which is easier for
%this example.


years = [4 ; 4 ; 4 ; 4 ; 6 ; 6];


%calculate the difference in biomass for each interval, divide by the
%duration of the interval, and store the results in the matrix
%prod_rate


% sample below shows how to calculate ANPP for just one plot. recall that
% because we are fitting to the logistic we assume mortality is inclusive.


prod_rate(1,1) = (dataT(2,1)- dataT(1,1))/years(1);
prod_rate(2,1) = (dataT(3,1)- dataT(2,1))/years(2);
prod_rate(3,1) = (dataT(4,1)- dataT(3,1))/years(3);
prod_rate(4,1) = (dataT(5,1)- dataT(4,1))/years(4);
prod_rate(5,1) = (dataT(6,1)- dataT(5,1))/years(5);
prod_rate(6,1) = (dataT(7,1)- dataT(6,1))/years(6);


% loop through the remaining plots (the number of plots is nplots)
% and each interval (i), and store the results in prod_rate
for j = 2:nplots
    for i=1:6
        prod_rate(i,j) = (dataT(i+1,j)-dataT(i,j))/years(i);
    end
end
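% (equivalent one-liner, for reference:
%  prod_rate = diff(dataT) ./ repmat(years, 1, nplots); )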


%create a vector telling us what the maximum productivity is for each plot:
inflection_value = zeros(nplots,1);


inflection_value(1,1) = max(prod_rate(:,1));
for j = 2:nplots
    inflection_value(j,1) = max(prod_rate(:,j));
end


%indexOfMaxValue is an index that tells us the interval time when the
%maximum productivity occurs.


% we take the index of the maximum rate straight from max's second output
% (matching rate == max(rate) with find would break if the max appeared twice)


indexOfMaxValue = zeros(nplots,1);


[~, indexOfMaxValue(1,1)] = max(prod_rate(:,1));
for j = 2:nplots
    [~, indexOfMaxValue(j,1)] = max(prod_rate(:,j));
end


% slopeAtInflection is a vector that we will use to hold point-slope values
% for "r" at the inflection point found in indexOfMaxValue


slopeAtInflection = zeros(nplots,1);


% oneAhead holds the biomass value one position ahead of the max.
% If the max occurs in the last interval, we keep the value at the max
% itself and compute only a left-sided slope


oneAhead = zeros(nplots,1);


for j = 1:nplots
    if indexOfMaxValue(j)~=6
        oneAhead(j) = dataT(indexOfMaxValue(j)+1,j);
    else
        oneAhead(j) = dataT(indexOfMaxValue(j),j);
    end
end


% oneBehind holds the biomass value one position behind the max (note this
% assumes the max never falls in the very first interval).


oneBehind = zeros(nplots,1);
for j = 1:nplots
    oneBehind(j) = dataT(indexOfMaxValue(j)-1,j);
end


% find the slope at the inflection point. This will help us define an
% appropriate range for "r".


% Initially I attempted to do this using the indexes, which was not correct
% ...do not use! Keeping it in here as a lesson.
%slopeAtInflection(1,1) = (oneAhead(1)-oneBehind(1))/((indexOfMaxValue(1) + 1) -(indexOfMaxValue(1) -1));


% slopeAtInflection is the point slope calculated by the change in Y (biomass)
% over the change in X (years). When the inflection is at the end of the
% record, its indexOfMaxValue is 6. In these cases, the point slope is
% calculated between position 6 and position 5, rather than across two
% intervals (5 to 3, for example).


for j = 1:nplots
    if indexOfMaxValue(j) ~= 6
        slopeAtInflection(j) = (oneAhead(j)-oneBehind(j))/(T(indexOfMaxValue(j)+1)-T(indexOfMaxValue(j)-1));
    else
        slopeAtInflection(j) = (oneAhead(j)-oneBehind(j))/(T(indexOfMaxValue(j))-T(indexOfMaxValue(j)-1));
    end
end


% K is the absolute largest biomass on our watershed;
% it serves as the asymptote for the initial guess.
% because of the method of calculation we will be using, the purpose of K
% is only to determine a range for "r" -- it does not affect the fitting


K = max(max(dataT));


% alternate values of K can be used to check the effects of K on the output
%K = 200;
%K = 400;
%K = 600;


% using the power of differential equations we solved the ANPP function at
% the inflection point to find that we can approximate r = 4*slopeAtInflection/K.
% this is only to determine a safe range for r.
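% (the derivation in one line: the logistic satisfies B'(t) = r*B*(1 - B/K),
% and at the inflection point B = K/2, so B' = r*K/4, i.e. r = 4*B'/K.)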


r = (4.*slopeAtInflection)/K;


% now we will conduct a safe fitting procedure for r. It is generally
% advised that we look at the "r" values generated by the previous code,
% find their maxes and mins, and select a range encompassing these. In this
% case we will say that "r" is not less than 0.01 nor greater than 1, and
% we will grid it in 100 equal increments. Adding more increments slows the
% processing.


% we will say that the inflection point must be between 0 and 75. We could
% narrow or widen this if we wished to change the output (see the second,
% commented-out t0 below for a 0-20 range), which is worth testing.
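% (to eyeball the range before choosing: fprintf('r runs from %g to %g\n', min(r), max(r));)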


% linspace is a command that defines linearly spaced vectors; meshgrid then
% turns them into an algebraic grid that can be bent. If you want to
% see the grid, uncomment figure 1 below.


% I have added extra subroutines for calculating special linspaces for
% larger and smaller parameterizations: i is for 10 as a maximum
% inflection occurrence, nothing is for 50, ii is for 75 as a maximum
% inflection occurrence, and iii is for 100 as a maximum.


linR=linspace(0.01,1,100);
t0=linspace(0,75,50);
%t0=linspace(0,20,50);
[linR,t0]=meshgrid(linR,t0);


%-------------------------------------------------------------
% the fitting part. I still haven't figured out how to loop through the
% plots automatically (there is a commented-out sketch of that loop further
% down).
% For the first plot, this is how you will fit the parameters


% if I ever get the loops right these matrices will hold all the outputs.
%rD1mat=zeros(length(dataT),1);
%rD2mat=zeros(length(dataT),1);
%Krepmat=zeros(length(dataT),1);


%for k=1:length(dataT)


% [m,n] are the dimensions of the errorspace, which is the same size as
% the linear space (linR) we just defined. e is what we will call the error
% space


% MYERROR NOTES:
% you will need to define a function "myerror" (or whatever you want to call
% it) in its own file, outside of this program. The following code should be
% used (to uncomment the block, highlight it and use the editor's
% comment/uncomment shortcut):
%-----------------------MY ERROR-------------------------------------
% function [e] = myerror(x, t, m)
% % error for the logistic fit: x(1) is r, x(2) is t0, t is the time
% % vector, and m is the biomass column for one plot
% r = x(1);
% t0 = x(2);
% h = 1./(1+exp(-r*(t-t0)));
% e = m'*m-(h'*m)^2/(h'*h);
% end
%---------------------------------------------------------------------
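% (why that error works: for fixed r and t0, the best K is (h'*m)/(h'*h);
% substituting it back into the sum of squared errors ||m - K*h||^2 leaves
% exactly m'*m - (h'*m)^2/(h'*h), so K never has to be searched over.)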


[m,n] = size(linR);
e = zeros(size(linR));
for i=1:m
    for j=1:n
        e(i,j) = myerror([linR(i,j); t0(i,j)], T, dataT(:,3));
    end
end


% images of the first plot -- this one shows you the 3d errorspace. it
% eats memory. be careful!


% figure(1)
% mesh(linR, t0, e);
% xlabel('r')
% ylabel('inflection point')
% zlabel('e')
% title('plotting error versus parameters for the first plot');


% this one shows a contour plot where the error is minimized. it is less
% hungry. it is good to check these to be sure you don't have multiple
% minima.


figure(2)
contour(linR, t0, e,50)
xlabel('r')
ylabel('inflection');
title('contour map of error surface for plot 103');


% minimize the error over that surface with the matlab function fminsearch.
% the first output is the value of "r" and the second output is the inflection
% point (t0) where it occurs. (calling the result "min" would shadow the
% built-in min function, so give it another name)
best = fminsearch(@(x) myerror(x, T, dataT(:,3)), [0.15; 50]);


% pull out the values for the "r" and "inflection point" from your
% minimized error function
rD1 = best(1);
%rD1mat(k) = rD1;
rD2 = best(2);
%rD2mat(k) = rD2;


% H is from the linear proof that makes K a function of r and t0.
% Now solve for maximum biomass, which is Krep.


H = 1./(1+exp(-rD1*(T-rD2)));
Krep=(H'*dataT(:,3))/(H'*H);
%Krepmat(k) = Krep;


% figure 3 is your best fit plot. This is the one you want to use for
% output


% figure(3)
% t = linspace(0,100);
% y = Krep./(1+exp(-rD1*(t-rD2)));
% plot(T, dataT(:,3), 'o', t, y);
% xlabel('years after 1980')
% ylabel('biomass in Mg/Ha')
% title ('MLE of logistic fit based on minimized error surface')
% end
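% --- a sketch of the loop I haven't figured out above: wrap the single-plot
% fit in a for loop over every column. it reuses nplots, T, dataT, and myerror
% from earlier; I haven't run this on my data yet, so it stays commented out.
% rD1mat = zeros(nplots,1);
% rD2mat = zeros(nplots,1);
% Krepmat = zeros(nplots,1);
% for k = 1:nplots
%     bestk = fminsearch(@(x) myerror(x, T, dataT(:,k)), [0.15; 50]);
%     rD1mat(k) = bestk(1);                      % fitted r for plot k
%     rD2mat(k) = bestk(2);                      % fitted inflection point t0
%     Hk = 1./(1+exp(-bestk(1)*(T-bestk(2))));
%     Krepmat(k) = (Hk'*dataT(:,k))/(Hk'*Hk);    % implied asymptote K
% end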




% % for the fourth plot this is what you do
% [m4,n4] = size(linR);
% e = zeros(size(linR));
% for i=1:m4
%     for j=1:n4
%         e(i,j) = myerror([linR(i,j); t0(i,j)], T, dataT(:,4));
%     end
% end
%
% % find the minimum error of that plot, the first output is the value of "r"
% % and the second output is the inflection point when it occurs.
% min4 = fminsearch(@(x) myerror(x, T, dataT(:,4)), [0.1; 4]);
%
% % pull out the values for the "r" and "inflection point" from your
% % minimized error function
% rD1 = min4(1);
% rD2 = min4(2);
%
% % now solve for maximum biomass,which is Krep.
%
% H = 1./(1+exp(-rD1*(T-rD2)));
% Krep4=(H'*dataT(:,4))/(H'*H);

Friday, August 12, 2011

not finished code


% site_index.m

% this program will calculate maximum productivity for ws1 plots

% we base this on the idea that biomass follows a logistic function. the
% logistic function is defined as B(t) = K/(1+Ce^(-r*t)). We would like to
% fit a logistic function to our biomass curves and then solve for where
% the derivative of biomass (productivity) is maximized (or where the
% second derivative of biomass is 0). To do this we must fit a logistic
% function to the biomass data.

% import biomass data

file = importdata('biomat.csv',',');
data = file;

% years after the first measurement
T =[0;4;8;12;16;22;28];

%transpose data

dataT = data';
prod_rate = zeros(6, length(dataT));

%create a vector "years" that holds the interval length between periods
years = [4 ; 4 ; 4 ; 4 ; 6 ; 6];

%calculate the difference in biomass-- later just import the ANPP matrix
%but we're doing this for now... no MORTALITY included at the moment!

% sample below shows how to do it for just one plot
prod_rate(1,1) = (dataT(2,1)- dataT(1,1))/years(1);
prod_rate(2,1) = (dataT(3,1)- dataT(2,1))/years(2);
prod_rate(3,1) = (dataT(4,1)- dataT(3,1))/years(3);
prod_rate(4,1) = (dataT(5,1)- dataT(4,1))/years(4);
prod_rate(5,1) = (dataT(6,1)- dataT(5,1))/years(5);
prod_rate(6,1) = (dataT(7,1)- dataT(6,1))/years(6);

% loop through all plots (length(dataT)) and each interval (i), and store
% the results in the array "prod_rate"
for j = 2:length(dataT)
    for i=1:6
        prod_rate(i,j) = (dataT(i+1,j)-dataT(i,j))/years(i);
    end
end

%create a vector telling us what the maximum productivity is:
inflection_value = zeros(length(prod_rate),1);

inflection_value(1,1) = max(prod_rate(:,1));
for j = 2:length(dataT)
    inflection_value(j,1) = max(prod_rate(:,j));
end

%find that maximum productivity and WHEN IT OCCURS by interval time
% these are our initial guesses for the inflection point
indexOfMaxValue = zeros(length(prod_rate),1);

indexOfMaxValue(1,1) = find(prod_rate(:,1) == max(prod_rate(:,1)));
for j = 2:length(prod_rate)
    indexOfMaxValue(j,1) = find(prod_rate(:,j)== max(prod_rate(:,j)));
end

% to find the slope at the inflection point, which will generate the best
% guess value for "r", we need the biomass values on either side of the max

slopeAtInflection = zeros(length(prod_rate),1);

% find the value one position ahead of the max. If the max occurs at the
% end of the interval, then save that value as the "one ahead" value
oneAhead = zeros(length(dataT),1);

oneAhead(1) = dataT((find(prod_rate(:,1) == max(prod_rate(:,1)))+1),1);
for j = 2:length(dataT)
    if indexOfMaxValue(j)~=6
        oneAhead(j) = dataT((find(prod_rate(:,j) == max(prod_rate(:,j)))+1),j);
    else
        oneAhead(j) = dataT((find(prod_rate(:,j) == max(prod_rate(:,j)))),j);
    end
end

% find the value one position behind the max. If the max occurs at the end
% of the interval it's cool.
oneBehind = zeros(length(dataT),1);
oneBehind(1) = dataT((find(prod_rate(:,1) == max(prod_rate(:,1)))-1),1);
for j = 2:length(dataT)
    %if indexOfMaxValue(j)~=6
    oneBehind(j) = dataT((find(prod_rate(:,j) == max(prod_rate(:,j)))-1),j);
    %else
    %    oneBehind(j) = prod_rate((find(prod_rate(:,j) == max(prod_rate(:,j)))),j);
    %end
end

% find the slope at the inflection point. This will help us find R.
slopeAtInflection(1,1) = (oneAhead(1)-oneBehind(1))/((indexOfMaxValue(1) + 1) -(indexOfMaxValue(1) -1));

for j = 2:length(dataT)
    if indexOfMaxValue(j) ~=6
        slopeAtInflection(j) = (oneAhead(j)-oneBehind(j))/ ((indexOfMaxValue(j)+1)-(indexOfMaxValue(j)-1));
    else
        slopeAtInflection(j) =(oneAhead(j)-oneBehind(j))/ ((indexOfMaxValue(j))-(indexOfMaxValue(j)-1));
    end
end

% absolute largest biomass on our watershed serves as the
% asymptote for the initial guess

K = max(max(dataT));

% using the power of differential equations we solve to find that r =
% 4*slopeAtInflectionPoint/K

r = (4.*slopeAtInflection)/K;

% we will say that "r" is not less than 0.01 nor greater than 0.90, and we
% will grid it in 50 equal increments.
% we will say that the inflection point must fall between 0 and 28 years (the
% span of our record)

linR=linspace(0.01,0.9,50);
t0=linspace(0,28,50);
[linR,t0]=meshgrid(linR,t0);

% for the first plot this is what you do
[m,n] = size(linR);
e = zeros(size(linR));
for i=1:m
    for j=1:n
        e(i,j) = myerror([linR(i,j); t0(i,j)], T, dataT(:,1));
    end
end

% % images of the first plot
% figure(1)
% mesh(linR, t0, e);
% xlabel('r')
% ylabel('inflection point')
% zlabel('e')
% title('plotting error versus parameters for the first plot');
%
% figure(2)
% contour(linR, t0, e,50)
% xlabel('r')
% ylabel('inflection');
% title('contour map');

% find the minimum error of that plot; the first output is the value of "r"
% and the second output is the inflection point when it occurs. (don't call
% the result "min" -- that shadows the built-in)
best = fminsearch(@(x) myerror(x, T, dataT(:,1)), [0.1; 4]);

% pull out the values for the "r" and "inflection point" from your
% minimized error function
rD1 = best(1);
rD2 = best(2);

% now solve for maximum biomass,which is Krep.

H = 1./(1+exp(-rD1*(T-rD2)));
Krep=(H'*dataT(:,1))/(H'*H);

% for the fourth plot this is what you do
[m4,n4] = size(linR);
e = zeros(size(linR));
for i=1:m4
    for j=1:n4
        e(i,j) = myerror([linR(i,j); t0(i,j)], T, dataT(:,4));
    end
end

% find the minimum error of that plot; the first output is the value of "r"
% and the second output is the inflection point when it occurs.
min4 = fminsearch(@(x) myerror(x, T, dataT(:,4)), [0.1; 4]);

% pull out the values for the "r" and "inflection point" from your
% minimized error function
rD1 = min4(1);
rD2 = min4(2);

% now solve for maximum biomass,which is Krep.

H = 1./(1+exp(-rD1*(T-rD2)));
Krep4=(H'*dataT(:,4))/(H'*H);


Wednesday, July 6, 2011

sparklines in a table in LaTeX


\begin{table} [h]
\begin{center}
\begin{tabular*}{\hsize}{@{\extracolsep{\fill}}| l | l l l|}
\hline
PlotID & Biomass 1980 & BA/Ha 1980 & $\%$ HW \\
\hline
Upper Quartile & & &\\
\textbf{110} & 17.5 Mg/Ha & 6.05 $\text{m}^2/\text{Ha}$ & \includegraphics[scale=0.3]{110sparkhw.png}\\
\textbf{316} & 10.86 Mg/Ha & 3.92 $\text{m}^2/\text{Ha}$ & \includegraphics[scale=0.2]{316sparkhw.png} \\
\textbf{212} & 7.31 Mg/Ha & 3.19 $\text{m}^2/\text{Ha}$ & \includegraphics[scale=0.3]{212sparkhw.png} \\
\textbf{213} & 7.23 Mg/Ha & 1.14 $\text{m}^2/\text{Ha}$ & \includegraphics[scale=0.3]{213sparkhw.png} \\
\textbf{109} & 6.31 Mg/Ha & 2.39 $\text{m}^2/\text{Ha}$ & \includegraphics[scale=0.3]{109sparkhw.png}\\
\textbf{108} & 4.85 Mg/Ha & 1.63 $\text{m}^2/\text{Ha}$ & \includegraphics[scale=0.3]{108sparkhw.png}\\
\textbf{214} & 4.10 Mg/Ha & 1.77 $\text{m}^2/\text{Ha}$ & \includegraphics[scale=0.3]{214sparkhw.png} \\
\hline
\end{tabular*}
\end{center}
\caption{Upper Quartile plots 1980 measurements and HW trajectory}
\label{tab:bioassay1}
\end{table}

In short, you can use \includegraphics to insert a graphic in the table.
Putting in the "Upper Quartile" line keeps your graphics from eating your header. Which they like to do.

Friday, July 1, 2011

neat matlab program I wrote

%pcabio.m

% to compute and graph PCA for stand biometrics, as well as measure distances between points!
% this file has the stand metrics currently measured. The first column is
% plot names; the remaining columns are numeric.

%------------------------------------------------------------------------
% names of files to load-- comment in what you want
%load PCAEX80.dat;
load PCABIO1984.dat
%load PCABIO2007.dat;

format short g

% creation of the file headers and column numbers. Different columns in
% early file due to no ANPP

%For 2007
% fid = fopen('PCABIO2007.dat');
% C = textscan(fid,'%s %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32');
% fclose(fid);

%for 1984
fid = fopen('PCABIO1984.dat');
C = textscan(fid,'%s %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32');
fclose(fid);

%For files that are 80
%  fid = fopen('PCAEX80.dat');
%  C = textscan(fid,'%s %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32 %f32');
%  fclose(fid);

% these are the headings that correspond to the stand metrics, in order,

%2007
%headers = {'HARDWOOD' ,'BIO' ,'PERHARD','ANPP', 'BASAL', 'NUMT', 'STEMD', 'SHANNON', 'ASSAYNO','ASSAYBIO','NUMHW','PERNUMHW','STEMDHW','PERSTEMDHW'};

%1984
headers = {'HARDWOOD' ,'BIO' ,'PERHARD','ANPP', 'BASAL', 'NUMT', 'STEMD', 'ASSAYNO','ASSAYBIO','NUMHW','PERNUMHW','STEMDHW','PERSTEMDHW','SHANNON'};

% 1980
%headers = {'HARDWOOD' ,'BIO' ,'PERHARD','BASAL', 'NUMT', 'STEMD', 'ASSAYNO','ASSAYBIO','SHANNON','NUMHW','PERHWNUM','STEMDHW','PERSTEMDHW'};

% create a matrix that does not include the first column because it
% is text for the plot numbers (see the %s in the textscan format above)

%X = PCAEX80(:,2:14);
X = PCABIO1984(:,2:15);
%X = PCABIO2007(:,2:15);

%------------------------------------------CONDUCTING PCA!!!------------

% if there are no NaN's!

[PC, scores, latent] = princomp(zscore(X));

% if NaN's are included, you must center and standardize each row
% individually, so comment in this code!

% % subtract out the mean and divide by the standard deviation, row by row,
% % skipping NaN's
% stdX = zeros(size(X,1),1);
% meanX = zeros(size(X,1),1);
% for i = 1:size(X,1)
% thisRow = X(i,:);
% thisRow(isnan(thisRow)) = [];
% stdX(i) = std(thisRow);
% meanX(i) = mean(thisRow);
% end
% centerX = X - repmat(meanX, 1, size(X,2));
% centerStandardX = centerX./repmat(stdX, 1, size(X,2));


% % conduct principal components analysis
% [PC, scores, latent] = princomp(centerStandardX);

%----------------DISPLAYING THE CUMULATIVE OUTPUT OF THE VARIANCE!---

account = cumsum(latent)./sum(latent);
disp(account);

% pull out the score vectors (principal component axes) that you care about.
vect1 = scores(:,1);
vect2 = scores(:,2);

% plot the principal components (FACTOR ANALYSIS PLOT) with the plot
% numbers-- must make sure these are STRINGS and not FLOATS/INTS
figure(1)
plot(vect1,vect2,'b.')
xlabel('1st Principal Component')
ylabel('2nd Principal Component')
title('Principal Components of Stand Biometrics')
text(vect1+0.1, vect2, C{1}, 'FontSize',8);

% plot a scree plot of cumulative variance explained
figure(2)
pareto(latent)
xlabel('Principal Component')
ylabel('Variance Explained (%)')
title('Scree plot of variance explained and CDF')

% plot a biplot for the first two axes... can be done for 3 as well, if
% needed, but is hard to interpret in my opinion. if disp(account)'s second
% entry is bigger than about 0.55, I don't do it.

figure(3)
biplot(PC(:,1:2), 'scores',scores(:,1:2),'varlabels',headers')


% for finding the distance between the sets of points and printing them out
% by labels! Will be used later in geostats

b = zeros(length(vect1),2);
for i = 1: length(vect1)
    b(i,1) = vect1(i);
    b(i,2) = vect2(i);
end

k = 0;
index = zeros(length(b)*(length(b)-1), 2);   % preallocate the pair list
for i = 1:length(b)
    for j = 1:length(b)
        if j~=i
            k=k+1;
            index(k,:) = [i j];
        end
    end
end

dist = ((b(index(:,2),1)-b(index(:,1),1)).^2 + (b(index(:,2),2)-b(index(:,1),2)).^2).^0.5;

% if you want to print it to the screen-- WARNING: IT WILL OVERLOAD YOUR
% MEMORY IF YOU ARE RUNNING MUCH ELSE!

% fprintf(' #1 #2 distance\n');
% fprintf('%3d %3d %10.6f\n', ...
%    [index(:,1),index(:,2),dist].');
% fprintf('Total %11.6f\n',sum(dist));

m = [index(:,1),index(:,2),dist];

% output the distance matrix information to a file that saves in the Matlab
% directory. I think this format will adapt well to SPSS, but for Matlab it
% would be better to have it in matrix form!

%dlmwrite('biometric_distances_80.csv',m,',');
dlmwrite('biometric_distances_804.csv',m,',');
%dlmwrite('biometric_distances_07.csv',m,',');

Sunday, June 26, 2011

This is kind of an elegant little solution

This code can be used to find the distance between x locations (stored in vect1) and y locations (stored in vect2). I modified this from something I found online. It works pretty well and I like the output; it's also quick.
Using this to construct similarity ranks! win!

% for finding the distance between the sets of points and printing them out
% by labels!

b = zeros(length(vect1),2);
for i = 1: length(vect1)
    b(i,1) = vect1(i);
    b(i,2) = vect2(i);
end

k = 0;
index = zeros(length(b)*(length(b)-1), 2);   % preallocate the pair list
for i = 1:length(b)
    for j = 1:length(b)
        if j~=i
            k=k+1;
            index(k,:) = [i j];
        end
    end
end

dist = ((b(index(:,2),1)-b(index(:,1),1)).^2 + (b(index(:,2),2)-b(index(:,1),2)).^2).^0.5;
m = [index(:,1),index(:,2),dist];
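
If you have the Statistics Toolbox, I believe pdist and squareform get you the same pairwise distances in two lines (a sketch I haven't run against this data):

d = pdist(b);        % pairwise Euclidean distances, one entry per (i,j) pair
D = squareform(d);   % full symmetric matrix, D(i,j) = distance between i and j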

Saturday, June 25, 2011

Soil depth

I expected this but it still seems funny that soils are deepest at low... and high... elevations. 

It's because on this landscape the ridges are relatively flat, and are located between precipices out of the extent of the study area that are eroding soils onto them. Still. Funny. 

Friday, June 24, 2011

best help manual I've ever seen

http://www.box.net/shared/mugpnnjy21

Ikea couldn't have done it better.

How to read text for matlab

fid = fopen('biomatlabels.dat');
C = textscan(fid,'%s %f32 %f32 %f32 %f32 %f32 %f32 %f32');
fclose(fid);

The first line opens the file. You must save it as a .dat file!

The second line scans the file; fid is the file identifier that fopen returns.
%s means "string" (like a word)
%f32 means "32-bit floating-point number" (there are options for other numeric types, too)

The third line closes the file. When you want to call up one of your labels, call it as
C{1} or C{2}-- textscan returns a cell array, so curly braces are key!
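
For example, here is a quick sketch of pulling things back out, assuming the format string above:

labels = C{1};           % cell array of strings, one label per row
numbers = [C{2:end}];    % concatenate the seven numeric columns into a matrix
disp(labels{1})          % the first label
disp(numbers(1,:))       % the first row of numbers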

Wednesday, June 8, 2011

a code for stepwise regression with twelve possible fitting parameters

% litter.m


file = importdata('biomass_workbook2.csv',',');


% names of information | bio in 1980 | percenthardwood in 1980 |
% hardwood bio 1980|...etc.
bio80 = file(:,1);
perhard80 = file(:,2);
hardbio07 = file(:,3);
perhard07 = file (:,4);
litterdat = file(:,5);
anpp1 = file(:,7);
basal07 = file(:,8);
anpp6 = file(:,9);
bio07 = file(:,11);
num_tree = file(:, 12);
stemden = file(:,13);
logherb = file(:,14);
hwbio80 = file (:,15);


% polyfit finds the parameters (pi) and the norms of the residuals (si) for
% linear and polynomial combinations of variables


[p1,s1] = polyfit(bio80, litterdat, 1);
[p2,s2] = polyfit(perhard80, litterdat, 1);
[p3,s3] = polyfit(hardbio07, litterdat, 1);
[p4,s4] = polyfit(perhard07, litterdat, 1);
[p5,s5] = polyfit(anpp1, litterdat, 1);
[p6,s6] = polyfit(basal07, litterdat, 1);
[p7,s7] = polyfit(anpp6, litterdat, 1);
[p8,s8] = polyfit(bio07, litterdat, 1);
[p9,s9] = polyfit(num_tree, litterdat, 1);
[p10,s10] = polyfit(stemden, litterdat, 1);
[p11,s11] = polyfit(logherb, litterdat, 1);
[p12,s12] = polyfit(hwbio80, litterdat, 1);


% polyval evaluates each fit; corrcoef then gives the correlation matrix
output1 = polyval(p1, bio80);
corr1 = corrcoef(litterdat,output1);


output2 = polyval(p2, perhard80);
corr2 = corrcoef(litterdat,output2);


output3 = polyval(p3, hardbio07);
corr3 = corrcoef(litterdat,output3);


output4 = polyval(p4, perhard07);
corr4 = corrcoef(litterdat,output4);


output5 = polyval(p5, anpp1);
corr5 = corrcoef(litterdat,output5);


output6 = polyval(p6, basal07);
corr6 = corrcoef(litterdat,output6);


output7 = polyval(p7, anpp6);
corr7 = corrcoef(litterdat,output7);


output8 = polyval(p8, bio07);
corr8 = corrcoef(litterdat,output8);


output9 = polyval(p9, num_tree);
corr9 = corrcoef(litterdat,output9);


output10 = polyval(p10, stemden);
corr10 = corrcoef(litterdat,output10);


output11 = polyval(p11, logherb);
corr11 = corrcoef(litterdat,output11);


output12 = polyval(p12, hwbio80);
corr12 = corrcoef(litterdat,output12);


% m is a vector to hold the Pearson's correlation coefficients
m = zeros(12,1);
m(1) = corr1(1,2);
m(2) = corr2 (1,2);
m(3) = corr3 (1,2);
m(4) = corr4 (1,2);
m(5) = corr5 (1,2);
m(6) = corr6 (1,2);
m(7) = corr7 (1,2);
m(8) = corr8 (1,2);
m(9) = corr9 (1,2);
m(10) = corr10 (1,2);
m(11) = corr11 (1,2);
m(12) = corr12 (1,2);


% yh1 through yh12 hold the predicted values from each fit


yh1 = polyval(p1,bio80);
yh2 = polyval(p2, perhard80);
yh3 = polyval(p3, hardbio07);
yh4 = polyval(p4, perhard07);
yh5 = polyval(p5, anpp1);
yh6 = polyval(p6, basal07);
yh7 = polyval(p7, anpp6);
yh8 = polyval(p8, bio07);
yh9 = polyval(p9, num_tree);
yh10 = polyval(p10, stemden);
yh11 = polyval(p11, logherb);
yh12 = polyval(p12, hwbio80);


% resid is the part of the data that the fit does not predict


resid1 = litterdat - yh1;
resid2 = litterdat - yh2;
resid3 = litterdat - yh3;
resid4 = litterdat - yh4;
resid5 = litterdat - yh5;
resid6 = litterdat - yh6;
resid7 = litterdat -yh7;
resid8 = litterdat - yh8;
resid9 = litterdat - yh9;
resid10 = litterdat - yh10;
resid11 = litterdat - yh11;
resid12 = litterdat - yh12;


rss1 = sum((yh1- mean(litterdat)).^2);
tss1 = sum((litterdat - mean(litterdat)).^2);


% Rsq asks how much better the prediction is than the mean
Rsq1 = 1-sum(resid1.^2)/sum((litterdat-mean(litterdat)).^2);
Rsq2 = 1-sum(resid2.^2)/sum((litterdat-mean(litterdat)).^2);
Rsq3 = 1-sum(resid3.^2)/sum((litterdat-mean(litterdat)).^2);
Rsq4 = 1-sum(resid4.^2)/sum((litterdat-mean(litterdat)).^2);
Rsq5 = 1-sum(resid5.^2)/sum((litterdat-mean(litterdat)).^2);
Rsq6 = 1-sum(resid6.^2)/sum((litterdat-mean(litterdat)).^2);
Rsq7 = 1-sum(resid7.^2)/sum((litterdat-mean(litterdat)).^2);
Rsq8 = 1-sum(resid8.^2)/sum((litterdat-mean(litterdat)).^2);
Rsq9 = 1-sum(resid9.^2)/sum((litterdat-mean(litterdat)).^2);
Rsq10 = 1-sum(resid10.^2)/sum((litterdat-mean(litterdat)).^2);
Rsq11 = 1-sum(resid11.^2)/sum((litterdat-mean(litterdat)).^2);
Rsq12 = 1-sum(resid12.^2)/sum((litterdat-mean(litterdat)).^2);


% n is a vector of R-squared values
n= zeros(12,1);
n(1) = Rsq1;
n(2) = Rsq2;
n(3) = Rsq3;
n(4) = Rsq4;
n(5) = Rsq5;
n(6) = Rsq6;
n(7) = Rsq7;
n(8) = Rsq8;
n(9) = Rsq9;
n(10) = Rsq10;
n(11) = Rsq11;
n(12) = Rsq12;


% mat is a matrix of predictors
mat = zeros(length(litterdat),12);
mat(:,1:4) = file(:,1:4);
mat(:,5:7) = file(:,7:9);
mat(:,8:12) = file(:,11:15);


% this will do a stepwise fit. b is the coefficients, se their standard
% errors, pval the p-values, inmodel flags which predictors made it into the
% model, stats holds summary statistics, and nextstep and history give more
% detail on the process of getting there
[b, se, pval, inmodel, stats, nextstep, history] = stepwisefit(mat, litterdat);


% model_fit keeps the coefficients of the predictors that made it into the model and zeroes out the rest
model_fit = inmodel.*b';
model_fit = model_fit';


% model2 is the predicted litterfall. 4.4443 is the fitted intercept for my
% data (stepwisefit reports it in the stats output)
model2 = 4.4443 + model_fit(4).*mat(:,4) + model_fit(7)*mat(:,7) + model_fit(8).*mat(:,8);


% calculate residuals
resid_step = litterdat-model2;


% calculate r2
Rsq_step = 1-sum(resid_step.^2)/sum((litterdat-mean(litterdat)).^2);


% calculate a fitted "line" to plot (renaming the outputs so we don't
% clobber the stepwisefit coefficient vector b)
[p_step, s_step] = polyfit(model2, litterdat,1);
slope = p_step(1);
intercept = p_step(2);
Vcalc = slope.*model2 + intercept;


% plot
figure(1)
plot(model2,litterdat,'k.');
hold on
plot(model2, Vcalc,'r-');
xlabel('Stepwise litterfall model')
ylabel('Litterfall data')
title ('Fitness testing for litterfall model');
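
% --- side note: the twelve copy-pasted single-predictor blocks above could
% collapse into one loop over the columns of mat (a sketch; same predictors,
% same order as mat is built above):
% for k = 1:12
%     [pk, sk] = polyfit(mat(:,k), litterdat, 1);
%     yhk = polyval(pk, mat(:,k));
%     ck = corrcoef(litterdat, yhk);
%     m(k) = ck(1,2);
%     n(k) = 1 - sum((litterdat - yhk).^2)/sum((litterdat - mean(litterdat)).^2);
% end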

Thursday, June 2, 2011

Processing Tutorials

 I'm working on my data visualization skills here a bit, and I thought I'd start trying to figure out how to work with Processing. Although ultimately I think VELMA will switch languages, Processing does have the advantage of NOT being in C and looking nice.
So to the right is the start of an example from
Data Visualization. Since I'm a total n00b I thought, I wonder what happens when we change the size of the image.

As you can see here, changing the first number cuts it off from the right to left, and the second number from the bottom to the top.

If we change the image numbers, we see that going from 0 to 100 shifts the image within the background.
This, I suppose, is like being in "data mode" in Arc.

Sunday, May 29, 2011

so much more to know

to be a great modeler, I must know more math.
to be a decent ecologist, I must know more ecology.
to be a great spatial analyst, I must become better at programming.
to understand forests better, I must work outside.
to be smart like everyone else, I must know hydrology, fire ecology, landscape ecology, statistics, bioenergetics, physics, chemistry, ecophysiology...

in short, I am behind. it's daunting. I know I need a year to just keep my nose to the grindstone and learn. I don't know if I can afford it. But I think I need it. I think I could come out with some mediocre stuff in the meanwhile to keep me afloat-- you know, just helping out with spatial stuff here and there, valuation, etc.


One of many things I need to learn is Bayes theorem. I don't get it at all. I've never learned it in a class or talked to anyone about it. I've tried to read about it but it's light years ahead of me. I have it memorized, but it doesn't "click" without practice. This one passage about it, though, I found pretty helpful, so I thought I'd share.

From an article in Ecology by Subhash R. Lele (2010, vol. 91 (12)) - Big Fancy Models:


On the other hand, if the posterior distribution converges to a nondegenerate distribution, it implies non-estimability of the parameters. This nondegenerate distribution can be, and usually is, different than the prior distribution; there can be ‘‘Bayesian learning’’ without identifiability.
Consider a simple example. Let Y_i | mu_i ~ N(mu_i, sigma^2) and let mu_i ~ N(lambda, tau^2). Then it is obvious that Y_i ~ N(lambda, sigma^2 + tau^2). The parameters sigma^2 and tau^2 are individually nonidentifiable. Suppose we put priors sigma^2 ~ Unif(0, 100) and tau^2 ~ Unif(0, 100). Suppose the truth is such that sigma^2 + tau^2 = 10. Then the marginal posterior distributions for sigma^2 and tau^2 necessarily get concentrated on the interval (0, 10) as the sample size increases. Their joint distribution will be concentrated along the diagonal of the square defined by the coordinates (0, 0), (0, 10), (10, 10), and (10, 0). This distribution is different than the prior distribution. Thus, there is ‘‘Bayesian learning’’ but clearly existence of Bayesian learning does not imply that the parameters are identifiable or even that legitimate inferences can be drawn about the parameters for which Bayesian learning happens. If a part of the model is non-identifiable, it can make estimators of other parameters inconsistent. They converge to a single, but wrong point...



Ecologists know a great deal about the processes. While constructing mathematical models, they have a strong and admirable desire to include all the nuances. Unfortunately the data are not always informative enough to conduct inferences on all the complexities of the model. As a consequence, either the model parameters become non-identifiable or non-estimable. If estimation is possible, estimates tend to be extremely uncertain with large standard errors, thus precluding their use in effective decision making. I would urge ecologists to establish identifiability of the parameters in their models before conducting any scientific inferences...

Sunday, May 15, 2011

A program for calculating BAHA, Number of Trees, Biomass

This program can be used to calculate basal area per hectare, number of trees, and biomass/ha from an input file. It is modified from a program I wrote (With help!) this summer

% live_doug.m


%---THE REAL PROGRAM BEGINS HERE!!!----------------------------------------%


% Import the data from live_doug_2007.csv into the variable "bio." This is the biomass
% data calculated directly with the equations from the forest inventories
% in the field.
% | YEAR | TREE NO | DBH | BIO | TRANS | PLOT | PLOTID
% the domains for these inputs are years, index, index, index,
% concat(indexindex), concat (yearsindex),Mg, Mg, Mg, Mg, and Mg,
% respectively. Each row represents an individual tree.


FILE = 'live_doug_2007.csv';


% Import the LIDAR data, which is courtesy of Keith. This data is in the
% following format:
%   | TRANSECT  | PLOT | PLOTID | AREA OF ELLIPSE
% the domains for these inputs are index, index, concat(indexindex), and Ha,
% respectively. The slope, aspect, and Ha of each ellipse were calculated
% using ArcGIS-- that is not part of this program.


FILE_2 = 'ws01_attrib_lidar.csv';




%giving the files some easier names to use in the program
bio = importdata(FILE,',');
hectares = importdata(FILE_2,',');


% Find the unique values in the concatenated years & plot number
% column, and store their positions in upos (note: the sums below assume the
% rows are sorted by plot ID and that unique() returns the LAST row of each
% group, which was the default behavior at the time)
[uval,upos] = unique(bio(:,7));


% Define an Nx6 matrix to hold the sums down columns
% for every unique position
sum_bio = zeros(length(upos),6);


% Copy the year into column 1 of the sum matrix and the plot ID into
% column 2
sum_bio(:,1) = bio(upos,1);
sum_bio(:,2) = bio(upos,7);


% Sum down the columns of bio according to the elements of upos
% then divide by the appropriate element of the hectares matrix,
% and store the results in the sum_bio matrix


% sum_bio ends up looking like (sums divided by hectares):
% year | plot ID | sum(DBH)/Ha | biomass (Mg/Ha) | number of trees | BA/Ha
sum_bio(1,3:4) = sum(bio(1:upos(1),3:4))./hectares(find(hectares(:,3) == sum_bio(1,2)),4);
for j = 1:(length(upos)-1);
    sum_bio(j+1,3:4) = sum(bio((upos(j)+1):upos(j+1),3:4))./hectares(find(hectares(:,3) == sum_bio(j+1,2)),4);
end
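% (an alternative sketch: the third output of unique gives a group index for
% every row, so the same per-plot sums fall out of accumarray, e.g.
%   [~, ~, grp] = unique(bio(:,7));
%   dbh_sums = accumarray(grp, bio(:,3));
% and each sum then gets divided by its plot's hectares.)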


% number of trees
sum_bio(1,5) = upos(1);


for i = 2:length(upos)
sum_bio(i,5) = upos(i)-upos(i-1);
end


%BAHA -- the constant 0.00007854 is pi/40000, which turns DBH^2 in cm^2 into
%basal area in m^2 (so, for this to be proper basal area, column 3 would need
%to hold the squared diameters)


sum_bio(1,6) = sum_bio(1,3)*0.00007854;


for i = 2:length(upos)
sum_bio(i,6) = sum_bio(i,3).*0.00007854;
end
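% (side note: both little loops above can be vectorized --
% sum_bio(:,5) = diff([0; upos]); counts the trees per plot in one line, and
% sum_bio(:,6) = sum_bio(:,3).*0.00007854; fills the whole BA/Ha column at once.)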


% export all information to a separate CSV file in MatLab directory
%dlmwrite('bioassay2007.csv', sum_bio, ',');

Friday, May 13, 2011

when math doesn't speak english, part 2

I found this little math gem in a paper I'm reading about Canonical Correspondence Analysis, a technique which has the potential to be a nice "balance" between going completely geostatistical and sticking to the ecologist-friendly realm of P-Value-Ville.

So this little joy here simply means "Mean"-- it's the definition of the mean. Does that look like a "mu" to you? I have to pretend sometimes. 

The paper this is from is
Canonical Correlation Analysis: An overview with application to learning methods. David R Hardoon, S. Szedmak, and John Shawe-Taylor. Technical Report: CSD-TR-03-02. 2003. 

Saturday, May 7, 2011

LaTeX for the NSF proposal-- bibtex do not apply!

Clearly, NSF must be trying to weed out those of us who are too faint of heart to figure out their obtuse LaTeX template.
In particular, there is no \usepackage{hyperref}, which gives the nice, kind bibliography handling I am used to from amstat.

Instead, we are faced with using the old fashioned package.
So without further ado, here's how to replicate the NSF bibliography style in LaTeX.

You would, of course, have more sources than this. But this is just to give you an idea of where you need to go.

\begin{thebibliography}{99}

\bibitem{paper01} Cathcart, J.F., J.D. Kline, M. Delauney, and M. Tilton {\em Carbon Storage and Oregon's Land Use Planning Program}, Journal of Forestry, 2007.\\
\bibitem{paper02} Smith, J., L.S. Heath, K.E. Skog, and R. Birdsey {\em Methods for calculating forest ecosystem and harvested carbon with standard estimates for forest types of the United States}, Gen. Tech. Rep. NE-343. USDA Forest Service, Northeastern Research Station, Newton Square, PA, 2006.
\end{thebibliography}
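
To point at these from the body text, \cite{paper01} and \cite{paper02} match the \bibitem keys above.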

Friday, April 29, 2011

The importance of typography

In early forestry publications in some journals, prior to today's typographical fireworks, articles could be confusing. This is because we use the term "log" to refer to... well... logs.

What does this mean to you?

"A relationship between the above-ground biomass and the diameter of trees has been demonstrated by Shinozaki et al. (1964),  Kira and Shidei (1967), Newbould (1967) and, in English Oaks, by Bunce (1968). A regression of the form log mass = 2.13 log diameter- 0.23 (r =0.96, p <0.01) was calculated...."


Hilton, G.M., J.R. Packham, and J. Willis. 1987. Effects of Experimental Defoliation on a Population of Pedunculate Oak (Quercus robur L.). New Phytologist 107, 603-612.

If you didn't know that the pipe model implies a power-law scaling form (say, if you were a forester who isn't into ecological models), you might just think that mass was linearly related to diameter. The fit actually means mass = 10^(-0.23) * diameter^2.13. Logically, the linear reading isn't correct, and you would soon figure out that your log did not have a mass of about 2 x its diameter, but it certainly could be confusing.

Sorry, that's not a tip or trick.

Later I'll have one for you. Probably related to .e00 files and geodatabases. I do think about those a lot.
For those who "social network" I have started Twitter for GeoTweeting. I want to catch my location as much as possible and see how I move. 




inspired by GIS

I really think the idea of VGI (volunteered geographic information) is just plain cool. I mean, as a bit of a data hound, there's really some fun in that-- and also in extracting the bias from it (no one is going to adamantly volunteer tweets such as 'just buying a new toilet @ Lowes' or 'at the proctologist, @ hospital, what a pain in the butt!')

I made, for the sake of the geo-tweet, a Twitter for Lumber-geek. However, that name was already in use, so Lumber-geek's twitter is
http://twitter.com/#!/aSmallOne

The only posts I'll be making are geo-tweets and tech references. Don't expect updates on what I am doing outside of that. I'm too "emo" to be trusted with a non-directional twitter!

Thursday, April 28, 2011

GIS in the cloud

http://www.weather.com/outlook/weather-news/news/articles/tornado-outbreak-april-27_2011-04-27  

GIS in the cloud tells me of the damage to GA before I can find it on the news!

My family is OK, but some big torn-torns (recall the silly nickname I've had for these storms since I was a kid) hit right around Dekalb where my uncle Tom lives. Still, I am lucky they are safe.

The real question, though, is "did the torn-torn that hit mountain city damage Goats on the Roof?"

I hope not.

Excel tip

I think I've spent most of my life doing things on Excel. In a way, if Matlab is an exciting night out on the town, Excel is an evening at home in bed with a blanket and a kids' movie.

I do love Excel 2010. SO much more than Excel 2007, which I did not love. And don't even get me going on Excel 2008. Mac users with no statistical analysis package? :(
I was sadly a Mac user at the time. Shhh... don't tell. I don't own any converse and I don't listen to much eclectic music.


Anyway. In Excel 2010 there is a header tab, the "data" tab. This tab is my best buddy in Excel because it gives access to the "filter" feature (it looks like a little conical funnel). If you've not used filter before, it can be confusing at first because you need to know where to click.

So to use filter, you want to highlight your first row. Which means you hold your cursor over the left-side row number (1) and right-click on it-- in your spreadsheet, of course.


Now just press the funnel button to filter!


What you'll see if you press one of those down arrows is a list of all the values that are in that column. You can check queries on and off to look at just what you want. For example, in the above, I might want to look at all events that happened in 1988. So I could filter my first column to 1988.

I find this to be a really useful tool for getting what I want from raw data. I know maybe to some people it seems obvious, but I thought I'd share, because you can't use tools if you don't know about them.

On a side note, I've been feeling a calling recently. I've noticed that for environmental science (forestry, geology, geography, hydro, etc.) research professionals (and just normal people who like science), there's really a dearth of good "intermediate level help." So what you end up with is a bunch of people with great questions and no tools to answer them. It's easy to google some question like "how can I systematically delete every nth row in Matlab," and what you'll find is that there are easy help pages (it's Matlab! This is a program for computer processing. Let's write a script! "fprintf 'hello world'" > hello world!) and then there's the really complicated ones (message board troll: "just recompile your kernel using a batch processing script! write it in C#!" Come on!).

What there is a need for is a good source of intermediate help. I mean, hand-holding-style for the harder applications. It won't tell you "press 'new!' to open the file," but it will say "you can actually define an interval in Matlab by putting the step between your coordinates. For example, 0:0.005:1 takes you from 0 to 1 with an interval of 0.005. Now in a for loop, that will look like this (screenshot)."
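
A minimal sketch of that for loop, for instance:

for t = 0:0.005:1
    fprintf('%.3f\n', t)   % do something with each value of t
end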

I would love to partner with an Extension and gather up all the useful stuff I put on here, as well as other needs and tips I learn, and publish an official help column that would have a targeted audience....