Sunday, May 29, 2011

so much more to know

to be a great modeler, I must know more math.
to be a decent ecologist, I must know more ecology.
to be a great spatial analyst, I must become better at programming.
to understand forests better, I must work outside.
to be smart like everyone else, I must know hydrology, fire ecology, landscape ecology, statistics, bioenergetics, physics, chemistry, ecophysiology...

in short, I am behind. it's daunting. I know I need a year to just put my nose to the grindstone and learn. I don't know if I can afford it. But I think I need it. I think I could come out with some mediocre stuff in the meanwhile to keep me afloat-- you know, just helping out with spatial stuff here and there, valuation, etc.


One of many things I need to learn is Bayes' theorem. I don't get it at all. I've never learned it in a class or talked to anyone about it. I've tried to read about it, but it's light years ahead of me. I have it memorized, but it doesn't "click" without practice. One passage about it, though, I found pretty helpful, so I thought I'd share.
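For the record (mostly for my future self), the theorem itself is just one line; it's the intuition that needs practice:

```latex
% Bayes' theorem: posterior = likelihood x prior / evidence
P(\theta \mid y) \;=\; \frac{P(y \mid \theta)\,P(\theta)}{P(y)},
\qquad
P(y) \;=\; \int P(y \mid \theta)\,P(\theta)\,d\theta .
```

In words: what I believe about theta after seeing the data y is my prior belief about theta, reweighted by how well each value of theta explains the data.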

From an article in Ecology by Subhash R. Lele (2010, vol. 91 (12)) - Big Fancy Models:


On the other hand, if the posterior distribution converges to a nondegenerate distribution, it implies non-estimability of the parameters. This nondegenerate distribution can be, and usually is, different than the prior distribution; there can be ‘‘Bayesian learning’’ without identifiability.
Consider a simple example. Let Y_i | mu_i ~ N(mu_i, sigma^2) and let mu_i ~ N(lambda, tau^2). Then it is obvious that Y_i ~ N(lambda, sigma^2 + tau^2). The parameters sigma^2 and tau^2 are individually nonidentifiable. Suppose we put priors sigma^2 ~ Unif(0, 100) and tau^2 ~ Unif(0, 100). Suppose the truth is such that sigma^2 + tau^2 = 10. Then the marginal posterior distributions for sigma^2 and tau^2 necessarily get concentrated on the interval (0, 10) as the sample size increases. Their joint distribution will be concentrated along the diagonal of the square defined by the coordinates (0, 0), (0, 10), (10, 10), and (10, 0). This distribution is different than the prior distribution. Thus, there is "Bayesian learning" but clearly existence of Bayesian learning does not imply that the parameters are identifiable or even that legitimate inferences can be drawn about the parameters for which Bayesian learning happens. If a part of the model is non-identifiable, it can make estimators of other parameters inconsistent. They converge to a single, but wrong point...



Ecologists know a great deal about the processes. While constructing mathematical models, they have a strong and admirable desire to include all the nuances. Unfortunately the data are not always informative enough to conduct inferences on all the complexities of the model. As a consequence, either the model parameters become non-identifiable or non-estimable. If estimation is possible, estimates tend to be extremely uncertain with large standard errors, thus precluding their use in effective decision making. I would urge ecologists to establish identifiability of the parameters in their models before conducting any scientific inferences...
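Lele's toy example is easy to poke at numerically. Here's a quick sketch (in Python rather than MATLAB, just so it's self-contained; the particular values sigma^2 = 4 and tau^2 = 6 are my own choice) showing that the data only ever pin down the sum sigma^2 + tau^2 = 10, no matter how the 10 is split:

```python
import random
import statistics

random.seed(42)

# True values: individually non-identifiable; only the sum matters.
sigma2 = 4.0   # within-individual variance
tau2 = 6.0     # variance of the random means (sigma2 + tau2 = 10)
lam = 0.0      # common mean (lambda)

n = 200_000
# One observation per random effect: Y_i = mu_i + eps_i, with
# mu_i ~ N(lam, tau2) and eps_i ~ N(0, sigma2).
y = [random.gauss(lam, tau2 ** 0.5) + random.gauss(0.0, sigma2 ** 0.5)
     for _ in range(n)]

# The marginal variance of Y is all the data can tell us:
print(statistics.variance(y))  # close to 10 = sigma2 + tau2
```

Any split with the same sum (e.g. sigma2 = 1, tau2 = 9) generates statistically identical data, which is exactly why the posterior piles up along the diagonal instead of at a point.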

Sunday, May 15, 2011

A program for calculating BAHA, Number of Trees, Biomass

This program can be used to calculate basal area per hectare, number of trees, and biomass/ha from an input file. It is modified from a program I wrote (with help!) this summer.

% live_doug.csv


%---THE REAL PROGRAM BEGINS HERE!!!----------------------------------------%


% Import the data from live_doug_2007.csv into the variable "bio." This is
% the biomass data calculated directly with the equations from the forest
% inventories in the field. The columns are:
% | YEAR | TREE NO | DBH | BIO | TRANS | PLOT | PLOTID |
% YEAR is the inventory year; TREE NO, TRANS, and PLOT are indices;
% BIO is in Mg; PLOTID concatenates the transect and plot indices.
% Each row represents an individual tree.


FILE = 'live_doug_2007.csv';


% Import the LIDAR data, courtesy of Keith. These data are in the
% following format:
%   | TRANSECT | PLOT | PLOTID | AREA OF ELLIPSE |
% The domains for these inputs are index, index, concat(indexindex), and ha,
% respectively. The slope, aspect, and area (ha) of each ellipse were
% calculated using ArcGIS-- that is not part of this program.


FILE_2 = 'ws01_attrib_lidar.csv';




%giving the files some easier names to use in the program
bio = importdata(FILE,',');
hectares = importdata(FILE_2,',');


% Find the unique values in the PLOTID column (column 7) and store the
% row index of the last occurrence of each in upos. (bio is assumed to
% be sorted by PLOTID, so upos marks where each plot's rows end.)
[uval,upos] = unique(bio(:,7));


% Define an Nx6 matrix to hold the per-plot sums,
% one row for every unique position
sum_bio = zeros(length(upos),6);


% Copy the year into column 1 of the sum matrix
% and the PLOTID into column 2
sum_bio(:,1) = bio(upos,1);
sum_bio(:,2) = bio(upos,7);


% Sum down the columns of bio according to the elements of upos
% then divide by the appropriate element of the hectares matrix,
% and store the results in the sum_bio matrix


% sum_bio looks like (sums divided by plot area in hectares):
% year | PLOTID | DBH sum / ha | biomass sum / ha | number of trees | BAHA
sum_bio(1,3:4) = sum(bio(1:upos(1),3:4))./hectares(hectares(:,3) == sum_bio(1,2),4);
for j = 1:(length(upos)-1)
    sum_bio(j+1,3:4) = sum(bio((upos(j)+1):upos(j+1),3:4))./hectares(hectares(:,3) == sum_bio(j+1,2),4);
end


% number of trees per plot: the count of rows between consecutive
% entries of upos
sum_bio(:,5) = diff([0; upos]);


%BAHA
% (0.00007854 = pi/40000, the factor that converts cm^2 to m^2 of basal
% area; applied here to the per-hectare DBH sums in column 3)
sum_bio(:,6) = sum_bio(:,3) .* 0.00007854;


% export all information to a separate CSV file in MatLab directory
%dlmwrite('bioassay2007.csv', sum_bio, ',');
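The core of the program is just a group-by-and-sum. For anyone not in MATLAB, here's the same per-plot aggregation sketched in Python, with made-up sample rows standing in for live_doug_2007.csv and for the LIDAR ellipse areas (all the specific numbers below are invented for illustration):

```python
from collections import defaultdict

# (year, tree_no, dbh_cm, biomass_Mg, transect, plot, plotid)
bio = [
    (2007, 1, 50.0, 1.2, 1, 1, 101),
    (2007, 2, 60.0, 1.8, 1, 1, 101),
    (2007, 3, 40.0, 0.9, 1, 2, 102),
]
# plotid -> ellipse area in hectares (from ArcGIS in the real data)
area_ha = {101: 0.25, 102: 0.20}

# Sum DBH and biomass per plot and count trees.
sums = defaultdict(lambda: [0.0, 0.0, 0])
for year, tree, dbh, bm, trans, plot, pid in bio:
    sums[pid][0] += dbh
    sums[pid][1] += bm
    sums[pid][2] += 1

# Divide the sums by each plot's area to get per-hectare values.
per_ha = {pid: (d / area_ha[pid], b / area_ha[pid], n)
          for pid, (d, b, n) in sums.items()}
print(per_ha[101])  # (440.0, 12.0, 2): DBH/ha, biomass/ha, tree count
```

Unlike the MATLAB version, this doesn't require the rows to be pre-sorted by plot ID, since the dictionary accumulates each plot wherever its rows appear.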

Friday, May 13, 2011

when math doesn't speak english, part 2

I found this little math gem in a paper I'm reading about Canonical Correspondence Analysis, a technique which has the potential to be a nice "balance" between going completely geostatistical and sticking to the ecologist-friendly realm of P-Value-Ville.

So this little joy here simply means "Mean"-- it's the definition of the mean. Does that look like a "mu" to you? I have to pretend sometimes. 
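(The image of the formula hasn't survived here. The definition in question is the empirical mean written with an expectation-style symbol instead of a mu -- something along these lines, with m the number of samples:)

```latex
% Empirical mean of x over m samples (the "definition of the mean")
\hat{E}[x] \;=\; \frac{1}{m} \sum_{i=1}^{m} x_i
```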

The paper this is from is:
Canonical Correlation Analysis: An Overview with Application to Learning Methods. David R. Hardoon, S. Szedmak, and J. Shawe-Taylor. Technical Report CSD-TR-03-02, 2003.

Saturday, May 7, 2011

LaTeX for the NSF proposal-- BibTeX need not apply!

Clearly, NSF must be trying to weed out those of us who are too faint of heart to figure out their obtuse LaTeX template.
In particular, there is no \usepackage{hyperref} and no BibTeX support-- none of the nice, kind bibliography machinery I am used to from amstat.

Instead, we are faced with the old-fashioned thebibliography environment.
So without further ado, here's how to replicate the NSF bibliography style in LaTeX.

You would, of course, have more sources than this. But this is just to give you an idea of where you need to go.

\begin{thebibliography}{99}

\bibitem{paper01} Cathcart, J.F., J.D. Kline, M. Delauney, and M. Tilton, {\em Carbon Storage and Oregon's Land Use Planning Program}, Journal of Forestry, 2007.
\bibitem{paper02} Smith, J., L.S. Heath, K.E. Skog, and R. Birdsey, {\em Methods for calculating forest ecosystem and harvested carbon with standard estimates for forest types of the United States}, Gen. Tech. Rep. NE-343. USDA Forest Service, Northeastern Research Station, Newtown Square, PA, 2006.
\end{thebibliography}
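For what it's worth, the \bibitem keys above work with plain \cite in the proposal body-- no BibTeX run required:

```latex
% In the body of the proposal:
Oregon's land use planning program stores carbon~\cite{paper01},
and standard estimates exist for U.S. forest types~\cite{paper02}.
```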