Faster scraping (Fundus, CC_NEWS dataset)

Hey! I have been trying to scrape a lot of newspaper articles using the Fundus library and the CC_NEWS dataset. So far I have been able to scrape around 40k articles in around 10 hours, which is very slow for my goal.

1) Scraping is done on the CPU. Would there be any benefit to moving this to Google Colab and using an A100? (ChatGPT said it wouldn't help.)
2) The library documentation says the code automatically uses all available cores. How can I check whether that is true? Task Manager shows my CPU usage isn't that high.
3) Can I run multiple scripts at the same time? I assume that if the limitation is something other than CPU power, this could help.
4) If I walk to class with the laptop lid closed, would the script stop working? (I guess the computer would go to sleep and I would have no internet access.)

If you know anything that can make this process faster, pls lmk!
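For reference, here is a rough sketch of how the numbers above and question 2 could be checked with only the Python standard library. It does not use Fundus at all: `articles_per_second` and `cpu_share` are hypothetical helper names made up for this example, and `time.process_time()` only counts the current process, so it would miss work done in separate worker processes. A low CPU share would suggest the script is mostly waiting (for example on the network) rather than computing.

```python
import os
import time


def articles_per_second(n_articles, hours):
    """Average throughput for a run of n_articles that took the given hours."""
    return n_articles / (hours * 3600)


def cpu_share(work):
    """Fraction of wall-clock time this process spent on the CPU during work().

    Near 1.0 suggests CPU-bound work; near 0.0 suggests waiting (I/O, sleep).
    Only the current process is measured, not child processes.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall if wall > 0 else 0.0


if __name__ == "__main__":
    # Number of logical cores the operating system reports.
    print("logical cores:", os.cpu_count())

    # 40k articles in 10 hours, as described in the post.
    rate = articles_per_second(40_000, 10)
    print(f"throughput: {rate:.2f} articles/s")

    # Stand-in for a network wait: sleeping uses almost no CPU time.
    print(f"cpu share while waiting: {cpu_share(lambda: time.sleep(0.5)):.2f}")
```

To try it on real work, `cpu_share` could be wrapped around a small batch of the actual scraping loop instead of the `time.sleep` stand-in.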

Madison Howard


guywiththemonocle
