Hi everyone, maybe I'm a bit late to this, but I wanted to share my findings.
I parsed every page up to 40k in DS9 three times, and the distribution of results matched PeoplesElbow's findings (no content after page 14k and a lot of duplicates). BUT I got 4 times more unique URLs: 246_079 (still about 2x short of the official size).
A strange thing: on the second pass (one day after the first one) I started receiving new URLs on pages I had already parsed.
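A minimal sketch of how the unique-URL count and the "new URLs on old pages" check could be done. It assumes each pass is stored as a dict mapping page number to the URLs scraped from that page. The data layout and the function names `merge_passes` and `new_urls_on_old_pages` are hypothetical, not my actual scraper.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical layout: one dict per pass, mapping page number -> URLs found on that page.
Pass = Dict[int, Iterable[str]]


def merge_passes(passes: List[Pass]) -> Set[str]:
    """Union of every URL seen across all passes (duplicates collapse automatically)."""
    unique: Set[str] = set()
    for scrape in passes:
        for urls in scrape.values():
            unique.update(urls)
    return unique


def new_urls_on_old_pages(first: Pass, second: Pass) -> Dict[int, Set[str]]:
    """Pages scraped in both passes whose later pass returned URLs the earlier one did not."""
    added: Dict[int, Set[str]] = {}
    for page, urls in second.items():
        if page not in first:
            continue  # page only exists in the later pass, so it is not an "old" page
        fresh = set(urls) - set(first[page])
        if fresh:
            added[page] = fresh
    return added


if __name__ == "__main__":
    day1 = {1: ["a.pdf", "b.pdf"], 2: ["b.pdf"]}  # toy data, not real results
    day2 = {1: ["a.pdf", "c.pdf"], 2: ["b.pdf"]}
    print(len(merge_passes([day1, day2])))  # 3
    print(new_urls_on_old_pages(day1, day2))  # {1: {'c.pdf'}}
```

Taking the union across passes is what makes repeated runs worthwhile here: if the same page serves different URLs on different days, each pass can add URLs that no single pass would find.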
Here are the stats by file type:
| count | file type |
|------:|-----------|
| 1 | ts |
| 8 | mov |
| 236 | mp4 |
| 244326 | pdf |
| 73 | m4a |
| 1 | vob |
| 1 | docx |
| 1 | doc |
| 9 | m4v |
| 1422 | avi |
| 1 | wmv |
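For anyone who wants to reproduce the breakdown, here is a sketch that takes the file type from each URL's path extension and tallies unique URLs per type. Using the extension is an assumption, since the real content type could differ. It also checks that the counts posted above add up to the 246_079 unique-URL total. `file_type`, `stats_by_type` and `POSTED` are illustrative names.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def file_type(url: str) -> str:
    """Lower-cased extension of the URL path, ignoring query strings; '(none)' if there is no extension."""
    path = unquote(urlsplit(url).path)
    return PurePosixPath(path).suffix.lower().lstrip(".") or "(none)"


def stats_by_type(urls) -> Counter:
    """Count unique URLs per file type (duplicates are dropped first)."""
    return Counter(file_type(u) for u in set(urls))


# Counts as posted in the table; they should add up to the unique-URL total.
POSTED = {"ts": 1, "mov": 8, "mp4": 236, "pdf": 244326, "m4a": 73, "vob": 1,
          "docx": 1, "doc": 1, "m4v": 9, "avi": 1422, "wmv": 1}

if __name__ == "__main__":
    print(sum(POSTED.values()))  # 246079
    sample = ["https://example.com/files/A.PDF", "https://example.com/files/b.mp4?dl=1"]
    print(stats_by_type(sample))
```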