Right! I used the official Reddit API. I created an app, got its API credentials, and then used the Python library PRAW to consume the API: https://praw.readthedocs.io/en/latest/
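In case it helps, here is a minimal sketch of that kind of setup. It is not my exact script: the function names `make_client` and `collect_posts`, the placeholder credentials, the choice of the `new` listing, and the fields kept per post are all assumptions for illustration.

```python
def make_client(client_id, client_secret, user_agent):
    """Build a read-only PRAW client from the credentials of a Reddit app."""
    import praw  # third-party package: pip install praw

    return praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )


def collect_posts(reddit, subreddit_name, limit=100):
    """Return a list of dicts, one per submission, from a subreddit's 'new' listing.

    `reddit` is expected to behave like a praw.Reddit instance.
    """
    rows = []
    for submission in reddit.subreddit(subreddit_name).new(limit=limit):
        rows.append(
            {
                "id": submission.id,
                "title": submission.title,
                "created_utc": submission.created_utc,
            }
        )
    return rows


# Example usage (placeholder values, replace with your own app's credentials):
# reddit = make_client("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "my-collector/0.1")
# posts = collect_posts(reddit, "python", limit=100)
```

PRAW handles the paging and the waiting between requests itself, so the loop only has to iterate over the listing.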
It took me 36 hours to collect the 4M posts. The Reddit API returns results in batches of 100, and there is a 2-second pause after each batch.
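As a rough sanity check on those numbers, the arithmetic can be written out like this. The function name `estimate_collection_time` is made up for the example, and attributing the leftover time to request and processing overhead is my assumption, not something I measured.

```python
def estimate_collection_time(total_posts, batch_size, pause_seconds, total_hours):
    """Split an observed collection time into pause time and everything else."""
    batches = total_posts // batch_size
    pause_hours = batches * pause_seconds / 3600
    # Whatever is left after the pauses, spread evenly over the batches.
    other_seconds_per_batch = (total_hours * 3600 - batches * pause_seconds) / batches
    return batches, pause_hours, other_seconds_per_batch


batches, pause_hours, other_seconds = estimate_collection_time(
    total_posts=4_000_000, batch_size=100, pause_seconds=2, total_hours=36
)
print(f"batches: {batches}")
print(f"hours spent pausing: {pause_hours:.1f}")
print(f"other seconds per batch: {other_seconds:.2f}")
```

So the pauses alone account for about 22 of the 36 hours, and the rest works out to a bit over a second per batch.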