I want an app that lets me design web-scraping workflows visually.
Then, once the workflow is complete, upload it to the cloud and run N instances of it.
There was software called SpiderClimber for this. Unfortunately, the author disappeared, and so did the software.
Edit: I want to extract ad banners from websites: think Google ads or ads from any other ad network, pop ads, etc., including the redirect sequence and the resulting page's screenshot and source.
This task is not as simple as extracting a table from a webpage as CSV.
I've struggled to find anything which is flexible enough to:
1. Visit porn/torrent and other shady websites
2. Save the banner image
3. Click the banner link and save the redirect sequence
4. Screenshot the page which appears after clicking the banner and save its HTML source.
Why do I want to do this?
To extract scam ads run by scammers. I want to create a directory of all ads that can be queried to see which ads are running. If anyone is running fake news ads or spreading hate speech, they'll be caught using this method.
Not for adfraud.
My profile contains my email; if you want to discuss this, just email me.
It sounds like Selenium may be in the general area of your wishes: afaik there are browser extensions that create a workflow for Selenium, which I think you then run with scripts in e.g. Python. If you're in data science, Python is probably already useful for data processing.
Alternatively, other Selenium-like 'headless browsers' may also have visual tools for creating workflows, e.g. PhantomJS and such. See https://alternativeto.net/software/selenium/
Maybe there's even something for Scrapy, which is made for scraping in the first place, as the name indicates.
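For what it's worth, the "save the redirect sequence" step from the question can be roughed out in plain Python even before picking a tool. The sketch below is only an illustration using the standard library: the names `redirect_chain` and `_NoRedirect` are made up for this example, and it only sees HTTP-level redirects (3xx responses with a `Location` header). Redirects done via JavaScript or meta refresh, which ad networks use a lot, would need a real browser such as one driven by Selenium.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical helper: stop urllib from following redirects itself."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response,
        # which lets the caller record each hop individually.
        return None


def redirect_chain(url, max_hops=10, timeout=10):
    """Return a list of (status_code, url) tuples, one per hop visited."""
    opener = urllib.request.build_opener(_NoRedirect)
    chain = []
    for _ in range(max_hops):
        try:
            with opener.open(url, timeout=timeout) as resp:
                # Non-redirect success: this is the final landing page.
                chain.append((resp.status, url))
                return chain
        except urllib.error.HTTPError as err:
            chain.append((err.code, url))
            location = err.headers.get("Location")
            err.close()
            if err.code not in REDIRECT_CODES or not location:
                # A genuine error (404, 500, ...) ends the chain.
                return chain
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
    return chain  # gave up after max_hops
```

Calling `redirect_chain("http://example.com/some-ad-link")` (a placeholder URL) would give one `(status, url)` pair per hop, which could then be stored next to the banner image and the landing-page screenshot.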
Data Toolbar is a modified version of Chrome. It includes a visual workflow selector, content grabbing, automated tasks, and exporting/importing workflows as XML documents.
I used it to auto-renew Craigslist ads at a company I used to work for.