I figured it out mostly from first principles. It's a niche crawling method that happened to fit my particular use case, and there's a lot to say about it. But the main idea is that you can inject a crawling script into the HTML of the site via a proxy you control, e.g. proxy.yoursite.com?url=<SITE_YOU_WANT_TO_CRAWL>. Once the script has the data, it can call window.postMessage(data) to communicate with the main window.

It's somewhat similar to how browser proxies like https://proxyium.com/ and https://www.proxysite.com/ fetch the HTML on your behalf.
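
To make the injection step concrete, here's a rough TypeScript sketch of what the proxy side could look like. It's not my actual code: the names (PROXY_ORIGIN, buildProxyUrl, injectCrawler, CRAWLER_SCRIPT) are made up for illustration, and it assumes the proxied page is loaded in an iframe of the main window, so the injected script posts to window.parent.

```typescript
// All names here are hypothetical, for illustration only.
const PROXY_ORIGIN = "https://proxy.yoursite.com";

// Build the proxy URL that the main window would load (assumed: in an iframe).
function buildProxyUrl(target: string): string {
  return `${PROXY_ORIGIN}/?url=${encodeURIComponent(target)}`;
}

// Script that runs inside the proxied page once it has loaded.
// Assumption: the page sits in an iframe, so the main window is window.parent.
const CRAWLER_SCRIPT = `
window.addEventListener("load", () => {
  const data = {
    title: document.title,
    links: Array.from(document.querySelectorAll("a[href]"), (a) => a.href),
  };
  // "*" keeps the sketch short; pass the main window's origin in real use.
  window.parent.postMessage(data, "*");
});
`;

// Insert the script just before the first </body> tag, or append it to the
// end of the document if there is no closing body tag.
function injectCrawler(html: string, script: string = CRAWLER_SCRIPT): string {
  const tag = `<script>${script}</script>`;
  const match = /<\/body\s*>/i.exec(html);
  if (match === null) {
    return html + tag;
  }
  return html.slice(0, match.index) + tag + html.slice(match.index);
}
```

The proxy would fetch the target page, run its HTML through injectCrawler, and serve the result. The main window then picks up the data with window.addEventListener("message", ...) and should check event.origin before trusting it.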