I should like to make a contrarian comment about this, because it is at the top of the front page and seems to be getting a positive reception. This book is probably not a good way to learn about statistical inference. Its explanations of both the Bayesian and frequentist approaches are quite confused. The preface seems to imply that programmers, by virtue of being able to use computers, don't need to take a rigorous mathematical course in Bayesian methods. However, the text actually uses mathematical notation throughout, and as far as I could tell, that notation is often not explained. I noticed at least one case where a probability distribution (gamma) was described only through plots, i.e. without specifying its pdf or how you could derive the pdf. I think the kind of discourse this book exemplifies is halfway to cargo-cult 'statistics'.
I've got exactly the same feeling. Could you suggest a good introductory textbook on MCMC? The book left it as a mysterious black box, and I'm not very comfortable using mathematical black boxes I don't understand.
MCMC methods are covered near the end of the book. That should be enough to familiarize yourself with and understand the basic concepts of MCMC; anything more in-depth will require a strong mathematical background.
BTW: there are probably a ton of books out there that cover MCMC; that's just one I liked and which is freely downloadable.
I have some background (grad student in CFD, thinking about switching to some sort of data analysis later on), but my measure theory and probability skills are rusty (on the other hand, my numerical linear algebra, functional calculus and complex analysis are superb). What would be a good book for my level?
http://www.amazon.com/Data-Analysis-Bayesian-Tutorial-Public... is short and well regarded on Bayesian statistics. On MCMC specifically I can't say, but MCMC is really a kind of algorithm that lets you find the answer to a mathematical question (so I think understanding the math is the right place to start).
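To make "a kind of algorithm" concrete, here is a minimal random-walk Metropolis sketch, about the simplest MCMC algorithm there is. The target (a standard normal), the step size, the burn-in length and the function names are all assumptions chosen for illustration; the key point is that the sampler only needs the target density up to a constant, because it only ever uses ratios.

```python
import math
import random


def log_target(x):
    # Hypothetical target: an unnormalized standard normal log-density.
    # Metropolis only uses density ratios, so the normalizing constant is not needed.
    return -0.5 * x * x


def metropolis(log_p, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log-density."""
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal centred on the current state
        proposal = x + rng.gauss(0.0, step)
        lp_new = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x))
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            x, lp = proposal, lp_new
        # On rejection the current state is recorded again
        samples.append(x)
    return samples


samples = metropolis(log_target, x0=0.0, n_samples=50_000)
kept = samples[5_000:]  # discard an initial burn-in stretch
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")  # should land near 0 and 1
```

The samples are correlated, so they are not a substitute for independent draws, but long-run averages over them converge to expectations under the target.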
PS. There is a second edition of that book, but I've heard the first edition is better, because the second edition added a co-author and expanded the book.
Well, no, because I don't know of any good reasons for using Bayesian methods (except when prior probability distributions can be found objectively, e.g. through some previous experiment).
How do you reconcile "I don't know any good reasons for using Bayesian methods" with the fact that Bayesian methods revolutionized spam filtering? (Or maybe you disagree that they did?)
Naive Bayes revolutionized spam filtering because it is incredibly easy to implement and understand, and was reasonably effective against early spam, not because it was the best model for detecting spam. There's a reason we started seeing ads for "v1agra" and snippets of prose: it's pretty easy to game Naive Bayes.
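As a rough illustration of both points, here is a toy multinomial Naive Bayes classifier with add-one smoothing. The training messages, labels and function names are made up for this sketch, and real filters used a variety of other variants; it just shows how easy the model is to write down, and how padding a spammy message with innocuous words can flip its decision.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (words, label) pairs with label "spam" or "ham"."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for words, label in docs:
        word_counts[label].update(words)
        doc_counts[label] += 1
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def log_score(words, label, model):
    """Log prior plus summed log word likelihoods (the 'naive' independence assumption)."""
    word_counts, doc_counts, vocab = model
    score = math.log(doc_counts[label] / sum(doc_counts.values()))
    total = sum(word_counts[label].values())
    for w in words:
        if w not in vocab:
            continue  # words never seen in training carry no evidence here
        # Add-one (Laplace) smoothing keeps unseen (word, class) pairs from zeroing the product
        score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return score


def classify(words, model):
    return max(("spam", "ham"), key=lambda label: log_score(words, label, model))


# Tiny made-up training set, purely for illustration
training = [
    ("buy viagra now cheap".split(), "spam"),
    ("cheap pills buy now".split(), "spam"),
    ("meeting notes for tomorrow".split(), "ham"),
    ("lunch tomorrow with the team".split(), "ham"),
]
model = train(training)

print(classify("cheap viagra".split(), model))           # spam
print(classify("team meeting tomorrow".split(), model))  # ham
# Padding the spammy words with innocuous ones outweighs them in the summed log-likelihoods
gamed = "cheap viagra meeting notes for tomorrow lunch with the team".split()
print(classify(gamed, model))                            # ham
```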
On the other hand, the GP's assertion that there is seldom a need for using Bayesian methods is also unwarranted; they are the basis for so many machine learning algorithms in common use -- particle filters, for example.
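To show where Bayes' theorem sits inside a particle filter, here is a minimal bootstrap particle filter sketch for a 1-D random walk observed with Gaussian noise. The model, noise levels, prior, particle count and names are assumptions for illustration. Each step predicts with the dynamics, reweights particles by the observation likelihood (the Bayesian update), and resamples.

```python
import math
import random


def particle_filter(observations, n_particles=2000, proc_std=1.0, obs_std=1.0, seed=0):
    """Bootstrap particle filter for a 1-D Gaussian random walk observed with Gaussian noise."""
    rng = random.Random(seed)
    # Initial particles from a broad (assumed) prior on the starting state
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Predict: move every particle through the random-walk dynamics
        particles = [p + rng.gauss(0.0, proc_std) for p in particles]
        # 2. Weight: Gaussian observation likelihood, computed in log space for stability
        log_w = [-0.5 * ((y - p) / obs_std) ** 2 for p in particles]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Posterior-mean estimate of the current state
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # 3. Resample in proportion to weight (multinomial resampling)
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Simulated data from the same model the filter assumes
sim = random.Random(42)
true_states, observations = [], []
x = 0.0
for _ in range(50):
    x += sim.gauss(0.0, 1.0)
    true_states.append(x)
    observations.append(x + sim.gauss(0.0, 1.0))

estimates = particle_filter(observations)
mae_filter = sum(abs(e - t) for e, t in zip(estimates, true_states)) / len(true_states)
mae_raw = sum(abs(o - t) for o, t in zip(observations, true_states)) / len(true_states)
print(f"filter MAE {mae_filter:.3f} vs raw observation MAE {mae_raw:.3f}")
```

For this linear-Gaussian toy a Kalman filter would be exact; particle filters earn their keep when the dynamics or noise are non-Gaussian.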
That's a good question, and I was asking myself the same thing after I wrote that comment. I think my objection is more to the 'Bayesian' and less to the 'methods', if that makes sense. That is, I think constructing and updating models using Bayes' theorem can be a good way of making predictions (as people doing spam filtering have shown), but it is the frequentist properties of the models that actually matter (cf. the ubiquity of cross-validation: 'the proof of the pudding is in the eating'), not the fact that they let you maintain a probability distribution over parameters.
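For reference, k-fold cross-validation judges a model purely by its error on held-out data, whatever produced the model. The sketch below is a generic version. The mean-only placeholder model, the toy data and the function names are assumptions for illustration; any fit/predict pair, Bayesian or not, could be swapped in.

```python
import random


def k_fold_indices(n, k, shuffle=False, seed=0):
    """Split range(n) into k disjoint folds whose sizes differ by at most one."""
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def cross_val_mse(xs, ys, fit, predict, k=5, shuffle=False, seed=0):
    """Average squared error on held-out points; each point is held out exactly once."""
    sq_err = 0.0
    for test in k_fold_indices(len(ys), k, shuffle, seed):
        held_out = set(test)
        train = [i for i in range(len(ys)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((predict(model, xs[i]) - ys[i]) ** 2 for i in test)
    return sq_err / len(ys)


# Placeholder model: always predict the training-set mean of y
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(cross_val_mse(xs, ys, fit_mean, predict_mean, k=3))  # 6.25
```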
>without specifying its pdf or how you could derive the pdf
Would you or anyone happen to know of a good book that discusses the derivation of various advanced probability distributions? It is quite frustrating that every ML or stats book I come across runs through various distributions without giving the reader any motivation or intuition behind them. Without that intuition, how am I supposed to know when to apply one versus another?
I honestly can't recommend a book for this. The best resource I've found is MathWorld. I've picked up a bunch of very helpful intuitions from it, including:
- Cauchy: the horizontal (signed) distance from the origin at which an arrow, shot at a uniformly random angle from a point below the origin, hits the x-axis
- Gamma: how long you have to wait for the nth event in a Poisson process
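Both intuitions are easy to check by simulation. In this sketch the seed, sample size, and the choices n = 3, rate 2 are arbitrary assumptions, and only integer n (the Erlang case) is covered. The Gamma CDF follows straight from the Poisson-process picture: the nth event has happened by time t exactly when at least n events have occurred by then. Differentiating that CDF gives the pdf, which the code checks numerically.

```python
import math
import random

rng = random.Random(1)
N = 100_000

# Cauchy: an arrow leaves (0, -1) at an angle theta from the vertical, with theta
# uniform on (-pi/2, pi/2); it crosses the x-axis at x = tan(theta).
cauchy = [math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(N)]
# |x| < 1 exactly when |theta| < pi/4, so this fraction should be close to 1/2
frac_within_1 = sum(abs(x) < 1 for x in cauchy) / N


def gamma_cdf(t, n, lam):
    """P(T_n <= t) = P(at least n events by time t), with N(t) ~ Poisson(lam * t)."""
    return 1.0 - sum(math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)
                     for k in range(n))


def gamma_pdf(t, n, lam):
    """Derivative of gamma_cdf for integer n: lam^n t^(n-1) e^(-lam t) / (n-1)!."""
    return lam ** n * t ** (n - 1) * math.exp(-lam * t) / math.factorial(n - 1)


n, lam, t = 3, 2.0, 1.5
# Waiting time to the n-th event = sum of n independent Exp(lam) gaps
waits = [sum(rng.expovariate(lam) for _ in range(n)) for _ in range(N)]
mean_wait = sum(waits) / N                # should be near n / lam
emp_cdf = sum(w <= t for w in waits) / N  # should be near gamma_cdf(t, n, lam)

# Midpoint-rule integral of the pdf from 0 to t should reproduce the CDF
m = 10_000
h = t / m
pdf_integral = sum(gamma_pdf((i + 0.5) * h, n, lam) for i in range(m)) * h

print(frac_within_1, mean_wait, n / lam)
print(emp_cdf, gamma_cdf(t, n, lam), pdf_integral)
```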
It's amazing how popular the term "Bayesian" is amongst people who don't really know what it means or quite where it fits in the context of other statistical paradigms.
That's an interesting point. In fact, the expectation of the random walk is always the same as its starting value, because, although most of the walks go to 0 like you observed, there are very occasionally walks that drift upward to astronomical values.
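The comment doesn't spell out the walk, so assume it is multiplicative: here each step multiplies the value by 1.5 or 0.5 with equal probability, so every factor has mean 1 (these factors and the names below are hypothetical). Summing over the number of up-steps gives exact answers: the mean stays at the starting value, while the typical path collapses toward 0 and most of the probability ends up below 0.01. A plain simulation will usually underestimate the mean, because the rare huge paths that carry it are seldom sampled.

```python
import math
import random

UP, DOWN = 1.5, 0.5  # hypothetical step factors, each with probability 1/2 (mean factor 1.0)


def exact_mean(n, x0=1.0):
    """E[X_n] via the Binomial(n, 1/2) distribution of the number of up-steps."""
    return sum(math.comb(n, k) * 0.5 ** n * x0 * UP ** k * DOWN ** (n - k)
               for k in range(n + 1))


def typical_value(n, x0=1.0):
    """exp(E[log X_n]): the geometric-mean path, which shrinks because UP * DOWN < 1."""
    return x0 * math.sqrt(UP * DOWN) ** n


def prob_below(n, threshold, x0=1.0):
    """Exact probability that X_n ends below threshold."""
    return sum(math.comb(n, k) * 0.5 ** n for k in range(n + 1)
               if x0 * UP ** k * DOWN ** (n - k) < threshold)


n = 100
print(exact_mean(n))        # equals the starting value, up to rounding
print(typical_value(n))     # tiny
print(prob_below(n, 0.01))  # most walks end near 0

# A Monte Carlo estimate of the mean depends on rare paths it will rarely sample
rng = random.Random(0)
finals = []
for _ in range(10_000):
    x = 1.0
    for _ in range(n):
        x *= UP if rng.random() < 0.5 else DOWN
    finals.append(x)
print(sum(finals) / len(finals))
```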
As for the usefulness of this kind of walk: the process we're modelling is an evolutionary one, where the rate of change is fixed (in this case within the species) and we'd like to detect 'random' (non-selected) evolutionary paths by comparing simulations to historical data.
Yeah, exactly. I've used the mobile apps of both, and Rdio's has 9 confusing icons and requires you to learn a complicated mental model of how your music library works, whereas Spotify just has 'Playlists', 'Search', and 'What's New'; it couldn't be simpler.
Hey. I voted 'use CoffeeScript a lot and JS a little'. I like to use the best tools, so as soon as I had heard about CoffeeScript, learning JS was just a step on the way to learning it. It always boggles my mind when people don't want to use better tools.
In terms of complaints, I'd say the main one is that a few things are missing from the big documentation page on coffeescript.org: # starts a comment, function calls with multiple arguments need the left paren immediately after the function name, and how to pass anonymous functions. Together, those cost me a few hours when I was starting out.
Children's names and dates of birth, previous addresses, employment status, National Insurance number. That info alone is enough for someone experienced to do damage.
You can obtain that sort of info by, say, dumpster diving, but not on anything like that scale.
Imagine that you can get this info, plus a pretty good idea of someone's salary and lifestyle, by running a db search in a few seconds. You can easily focus your attention on the most lucrative targets and get info even on those who are careful not to put such info out there. Census completion is a legal requirement, so everyone should be on one.