Sunday, May 29, 2011

so much more to know

to be a great modeler, I must know more math.
to be a decent ecologist, I must know more ecology.
to be a great spatial analyst, I must become better at programming.
to understand forests better, I must work outside.
to be smart like everyone else, I must know hydrology, fire ecology, landscape ecology, statistics, bioenergetics, physics, chemistry, ecophysiology...

in short, I am behind. it's daunting. I know I need a year to just put my nose to the grindstone and learn. I don't know if I can afford it. But I think I need it. I think I could come out with some mediocre stuff in the meanwhile to keep me afloat: you know, just helping out with spatial stuff here and there, valuation, etc.


One of the many things I need to learn is Bayes' theorem. I don't get it at all. I've never learned it in a class or talked to anyone about it. I've tried to read about it, but it's light years ahead of me. I have it memorized, but it doesn't "click" without practice. I found this one passage about it pretty helpful, though, so I thought I'd share.

From an article in Ecology by Subhash R. Lele (2010, vol. 91 (12)) - Big Fancy Models:


On the other hand, if the posterior distribution converges to a nondegenerate distribution, it implies non-estimability of the parameters. This nondegenerate distribution can be, and usually is, different than the prior distribution; there can be "Bayesian learning" without identifiability.
Consider a simple example. Let Y_i | mu_i ~ N(mu_i, sigma^2) and let mu_i ~ N(mu, tau^2).
Then it is obvious that Y_i ~ N(mu, sigma^2 + tau^2). The parameters sigma^2 and tau^2 are individually nonidentifiable. Suppose we put priors sigma^2 ~ Unif(0, 100) and tau^2 ~ Unif(0, 100). Suppose the truth is such that sigma^2 + tau^2 = 10. Then the marginal posterior distributions for sigma^2 and tau^2 necessarily get concentrated on the interval (0, 10) as the sample size increases. Their joint distribution will be concentrated along the diagonal of the square defined by the coordinates (0, 0), (0, 10), (10, 10), and (10, 0). This distribution is different than the prior distribution. Thus, there is "Bayesian learning" but clearly existence of Bayesian learning does not imply that the parameters are identifiable or even that legitimate inferences can be drawn about the parameters for which Bayesian learning happens. If a part of the model is non-identifiable, it can make estimators of other parameters inconsistent. They converge to a single, but wrong point...



Ecologists know a great deal about the processes. While constructing mathematical models, they have a strong and admirable desire to include all the nuances. Unfortunately the data are not always informative enough to conduct inferences on all the complexities of the model. As a consequence, either the model parameters become non-identifiable or non-estimable. If estimation is possible, estimates tend to be extremely uncertain with large standard errors, thus precluding their use in effective decision making. I would urge ecologists to establish identifiability of the parameters in their models before conducting any scientific inferences...
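
The sigma^2 / tau^2 example in the passage can be tried out numerically. The sketch below is not from Lele's article; it is a minimal illustration under a few assumptions of my own: mu is treated as known and equal to 0, the total variance is 10, the sample size is 5000, and the posterior is approximated on a 400 x 400 grid rather than sampled with MCMC. The function names (log_lik, grid_posterior) are made up for this post.

```python
import numpy as np


def log_lik(y, mu, sigma2, tau2):
    """Log-likelihood of y when Y_i ~ N(mu, sigma2 + tau2), i.i.d."""
    v = sigma2 + tau2  # only the sum of the two variances enters the likelihood
    ss = np.sum((y - mu) ** 2)
    return -0.5 * y.size * np.log(2 * np.pi * v) - 0.5 * ss / v


def grid_posterior(y, mu, upper=100.0, steps=400):
    """Grid approximation of the joint posterior of (sigma2, tau2).

    Flat Unif(0, upper) priors on both variances, so the posterior is
    proportional to the likelihood inside the square.
    """
    grid = (np.arange(steps) + 0.5) * upper / steps  # cell midpoints
    s2, t2 = np.meshgrid(grid, grid, indexing="ij")
    ll = log_lik(y, mu, s2, t2)
    post = np.exp(ll - ll.max())  # subtract the max for numerical stability
    return grid, post / post.sum()


# Simulated data: assumed truth is mu = 0 and sigma2 + tau2 = 10.
rng = np.random.default_rng(0)
y = rng.normal(0.0, np.sqrt(10.0), size=5000)

grid, post = grid_posterior(y, mu=0.0)

# Marginal posterior of sigma2 (sum over the tau2 axis).
marg_sigma2 = post.sum(axis=1)
mass_below_12 = marg_sigma2[grid < 12].sum()
post_mean = (grid * marg_sigma2).sum()
post_sd = np.sqrt(((grid - post_mean) ** 2 * marg_sigma2).sum())

print("P(sigma2 < 12 | y):", mass_below_12)
print("posterior mean and sd of sigma2:", post_mean, post_sd)
print("same likelihood for (3, 7) and (7, 3):",
      log_lik(y, 0.0, 3.0, 7.0) == log_lik(y, 0.0, 7.0, 3.0))
```

What to look for: the marginal posterior of sigma^2 should pile up on roughly (0, 10), which is very different from the Unif(0, 100) prior, so there is "learning". But its spread should stay wide however much data is simulated, because any pair of variances with the same sum gives exactly the same likelihood. Changing `size` and rerunning is an easy way to check that.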
