Got a Ton of Duplicates 👎🏻
The concept of the software really is super great... just a lot of inaccurate things!
It just really disappoints me that almost 98% of the results are duplicates, and on top of that a lot of the email fields are empty.
At first I was shocked: for restaurants alone there were over 24 cities still left (been scraping for 9 hours...). Out of the 36k US cities I had already scraped 47k records, so I was pretty hyped up... like damn, so I stopped the scraping there 🔥 I thought the wait was worth it? But if you filter by Phone & Email, half of it is duplicate (21k)... Then if you filter again by business name it removes about 6k more, but if you filter by website link it detects over 10k more as duplicates?? So I checked the website links, and they are inaccurate too: the same link is being attached to different businesses...
Another one....
One city alone (New Jersey) scraped about 70k records (it's a very small niche). I was so happy at the start, running it on a VPS and letting it scrape for almost a whole day... but after filtering by email and phone and removing duplicates, only 600+ were left 💀
I don't want to waste too much of my time talking, but here is what I'm looking forward to you improving ASAP. If you do, maybe I'll keep this tool, buy more licenses, and change my review... (in order):
1. Tons of duplicates - maybe it's a YellowPages problem, but fix it on your software's side with some filtering going on in the back end...
2. Some inaccurate data - as I said about the website links; that's what I've noticed so far, maybe there is more...
3. Slow scraping - I noticed the first 5k-10k records (or fewer) scrape fast, 10k-20k is kind of OK, but beyond that it starts getting freaking sloooow... I think it's more on a percentage basis, because when I test-scraped a lower volume it slowed down a lot at around the same percentage of the overall scrape... (Slow scraping would really be understandable if only accurate data were being scraped - which also means you really need to fix the duplicates issue, because then there would be no need to scrape huge contact lists anymore.) **Dynamic Crawling was disabled on those tests, by the way; imagine how much slower it would be if enabled...
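The back-end duplicate filtering asked for above is essentially a keep-first pass over the exported records. As a rough illustration (not the tool's actual code), here is a minimal sketch; the field names `phone`, `email`, and `name` are assumptions, since the real export columns may differ:

```python
# Sketch of a keep-first dedup pass over scraped records.
# Field names ("phone", "email", "name") are assumed examples,
# not the tool's actual export schema.

def dedup(records, keys=("phone", "email")):
    """Keep the first record seen for each combination of key fields."""
    seen = set()
    unique = []
    for rec in records:
        fingerprint = tuple((rec.get(k) or "").strip().lower() for k in keys)
        if not any(fingerprint):
            # All key fields blank: keep it rather than collapse
            # every blank record into one.
            unique.append(rec)
            continue
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        unique.append(rec)
    return unique

rows = [
    {"name": "Joe's Diner", "phone": "555-0101", "email": "joe@example.com"},
    {"name": "Joe's Diner #2", "phone": "555-0101", "email": "joe@example.com"},
    {"name": "Taco Spot", "phone": "555-0199", "email": ""},
]
print(len(dedup(rows)))  # 2: the second record shares phone+email with the first
```

Deduping on phone+email together (rather than each field alone) avoids merging two different businesses that merely share a blank email.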
Feature request:
1. There should be a negative keyword filter on scraping. Let's say I want to scrape restaurants in California; well, there are a crazy number of McDonald's, Burger King, etc. locations.
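The requested negative-keyword filter could work as a simple substring check on each business name before a result is saved. A minimal sketch follows; the keyword list is just the chains named in the request, not anything built into the tool:

```python
# Sketch of a negative-keyword filter for scraped business names.
# The keyword set is an example; users would supply their own.

NEGATIVE_KEYWORDS = {"mcdonald", "burger king", "subway"}

def passes_filter(business_name, negatives=NEGATIVE_KEYWORDS):
    """Return False if any negative keyword appears in the name."""
    name = business_name.lower()
    return not any(kw in name for kw in negatives)

names = ["McDonald's #4411", "Maria's Taqueria", "Burger King"]
print([n for n in names if passes_filter(n)])  # ["Maria's Taqueria"]
```

Matching on lowercase substrings catches numbered franchise locations ("McDonald's #4411") without needing an exact-name blocklist.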
ReoonSupport
May 9, 2024
Hi, thank you very much for sharing your experiences.
About the duplicates, there are a few things to keep in mind.
1. A lot of companies have branches in different cities and a lot of companies list themselves within the neighboring cities as well.
2. When you perform a search with multiple cities, the software passes them to the YellowPages website and extracts the data as-is.
3. In that case, instead of going city by city, going with the state name only can provide better results with fewer duplicates.
4. We do have a plan to implement an automatic duplicate removal feature in the future.
About the website link, we also collect that from the YellowPages directly. Some keywords can have less accurate website info as companies may not be updating the data.
About the scraping speed, we recommend you disable the website crawling when extracting the data from YellowPages. You can crawl the websites separately later. The software uses your computer's resources to extract the data. The scraping speed also depends on many other factors, including internet speed, YellowPages' server response rate, the speed of the proxy (if one is being used), and a lot of other things.
Another thing: we do not recommend scraping all 36k cities together. That list is there so that anyone can find the necessary information, but scraping all the cities at once can put pressure on the system, and performance may drop.
If you can share a screenshot of the scraping setup tab, we can check if the inputs are correct.
We hope we will be able to help you with the issues you are currently facing and then hopefully you will reconsider the tacos. Please contact us at support@reoon.com so that we can help you further. Have a wonderful day. Thanks.