I have been trying for hours to figure this out. I’ve tried everything from following a tutorial on building a scraper to looking for prebuilt ones, and I can’t seem to make it click.
For context, I am trying to scrape books that I can’t find elsewhere, so I can use them and post them for others.
The scraper tutorial
Hackernoon tutorial by Ethan Jarrell
I initially tried to follow this, but I kept getting a “couldn’t find module” error. Since I had never touched Python prior to this, I don’t know how to fix it, and the help links are not exactly helpful. If there’s someone who could guide me through this tutorial, that would be great.
Selenium
I don’t really get what this is, but I think it’s some sort of Python package. It tells me to download it using the pip command, but that doesn’t seem to work (syntax error). I don’t know how to add it manually because, again, I have little idea of what I’m doing.
Scrapy
This one seemed like it’d be an out-of-the-box deal, but not only does it need the pip command to download, it also has like 5 other dependencies it needs to function, which complicates it more for me.
I am not criticizing these wares; I am just asking for help. If someone could help simplify it all, or maybe even point me to an easier method, that would be amazing!
Updates
- Figured out that I am supposed to run the pip command in the command prompt thing on my computer, not in the Python runner: py -m followed by the pip request.
- Got the Ethan Jarrell tutorial to work and managed to add in Selenium, which made me realize that Selenium isn’t really helpful with the project. rip xP
- Spent a bunch of time trying to workshop the basic scraper to work with dynamic sites, unsuccessfully.
- Online self-help doesn’t go as in-depth as I would like, probably due to the legal grey area.
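For anyone else hitting the “couldn’t find module” error, here is a rough sketch of how to check which packages Python can actually see, and which command to type in the command prompt for the missing ones. The module and package names in `wanted` are only my guess at what a basic scraper needs (they are assumptions, not taken from the tutorial), and `missing_modules` and `pip_command` are helper names I made up for this example.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pip_command(package):
    """Build the line to type in the command prompt (NOT inside the Python runner)."""
    return f"py -m pip install {package}"


if __name__ == "__main__":
    # Assumed mapping of import name -> pip package name; adjust to your project.
    # The two can differ, e.g. you `import bs4` but install `beautifulsoup4`.
    wanted = {
        "requests": "requests",
        "bs4": "beautifulsoup4",
        "selenium": "selenium",
    }
    for module in missing_modules(wanted):
        print(pip_command(wanted[module]))
```

Running this in Python prints one `py -m pip install ...` line per missing package. Those printed lines are what go into the command prompt; typing them into the Python runner is what produces the syntax error.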
I have quite an extensive history of scraping websites for various data over the years, and I’d be happy to help you out, but I can’t really know how to help without knowing what website you’re trying to scrape. Different sites have their own challenges (maybe the content is behind a login, or the site uses JavaScript to load content, in which case an HTTP response won’t give you what you’re after, or any number of things really).
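To illustrate the JavaScript point, here is a minimal sketch using only Python’s standard library. The two HTML strings are made-up stand-ins for what a server might send back (including the hypothetical `loadBook` call and `/api/book/1` path), not taken from any real site, and `extract_text` is just a helper name for this example.

```python
from html.parser import HTMLParser


class TextInTag(HTMLParser):
    """Collect the text found inside every occurrence of one tag, e.g. <p>."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # how many matching tags we are currently inside
        self.chunks = []    # text pieces collected so far

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits inside the tag we care about.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, tag="p"):
    """Return the list of text chunks inside `tag` elements of `html`."""
    parser = TextInTag(tag)
    parser.feed(html)
    parser.close()
    return parser.chunks


# Static page: the book text is already in the HTML the server returns.
STATIC_PAGE = "<html><body><p>Chapter 1</p><p>It was a dark night.</p></body></html>"

# Dynamic page: the HTML is an empty shell; a script fetches the text later.
DYNAMIC_PAGE = (
    "<html><body><div id='reader'></div>"
    "<script>loadBook('/api/book/1');</script></body></html>"
)

print(extract_text(STATIC_PAGE))   # ['Chapter 1', 'It was a dark night.']
print(extract_text(DYNAMIC_PAGE))  # []
```

The second result is empty because the text never appears in the raw response. For a site like that, you either need something that runs the page’s JavaScript (a browser driven by Selenium, for instance) or you need to find the request the script makes and call it directly. Which one applies depends on the site.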
If you give me a link to a book you want to download as an example, I can take a look and help guide you through it.
100% this. Every website is different, though after doing this kind of thing for long enough, there are often common patterns and frameworks/libraries. Even general obfuscation can be reasonably reverse engineered with enough time and effort.