That is certainly a valid way of doing it, and it is the approach I have used in several search-engine projects in the past (like this: http://intrasitesearchsupport.com/).
However, in this case I am offloading the work of building the site-map to the WDG Validation service, as I didn't want to have to obtain new servers to provide a free service. This means that I don't get the site-map until the WDG results come back...
- Build a site-map, like parsers build a syntax tree.
- Follow that to validate one page at a time.
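
For anyone who wants to try the two-step approach locally, here is a minimal sketch in Python. It is not how the WDG service works internally. The names build_site_map, validate_site, get_links and validate_page are all hypothetical, and the link fetcher and the per-page validator are passed in as plain functions so the example runs without any network access.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def build_site_map(start_url: str,
                   get_links: Callable[[str], Iterable[str]]) -> Dict[str, List[str]]:
    """Breadth-first walk from start_url; returns {page: [links found on it]}.

    get_links is a caller-supplied (hypothetical) function that returns the
    same-site links found on a page.
    """
    site_map: Dict[str, List[str]] = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in site_map:
            continue  # already visited, avoids loops between pages
        links = list(get_links(url))
        site_map[url] = links
        for link in links:
            if link not in site_map:
                queue.append(link)
    return site_map


def validate_site(site_map: Dict[str, List[str]],
                  validate_page: Callable[[str], bool]) -> Dict[str, bool]:
    """Follow the finished site-map and validate one page at a time."""
    return {url: validate_page(url) for url in site_map}


# Toy in-memory "site" standing in for real HTTP fetches (an assumption).
FAKE_SITE = {"/": ["/a", "/b"], "/a": ["/", "/b"], "/b": []}

site_map = build_site_map("/", lambda url: FAKE_SITE.get(url, []))
results = validate_site(site_map, lambda url: url != "/b")
```

The point of splitting it this way is that the validation step never starts until the whole map exists, which is exactly the wait I run into when the map comes from a remote service.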