News Filtered by: index
URL Search Tool!
commoncrawl.org
on 03/05/2013
Excerpt: A couple of months ago we announced the creation of the Common Crawl URL Index and followed it up with a guest post by Jason Ronallo describing how he had used the URL Index. Today we are happy to announce a tool that makes it even easier for you to take advantage of the URL Index! URL Search is a web application that allows you to search for any URL, URL prefix, subdomain, or top-level domain. The results of your search show the number of files in the Common Crawl corpus that came from that URL and provide a downloadable JSON metadata file with the address and offset of the data for each URL. Once you download the JSON file, you can drop it into your code so that you only run your job against the subset of the corpus you specified... Read the full post.
Tags: API-Evangelist, API-Stack, Googlereader, Hacker, Ifttt, Index, News, Search, Tools
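The excerpt says the downloaded JSON file carries an address and an offset for each URL, so that a job can be limited to a subset of the corpus. The sketch below shows one way that step might look in Python. The excerpt does not specify the file's layout, so the field names `url`, `address`, and `offset`, the sample paths, and the helper `group_offsets_by_address` are all assumptions made for illustration, not the actual URL Search output format.

```python
import json
from collections import defaultdict

# Hypothetical metadata layout. The excerpt only says the file holds an
# "address and offset" for each URL; the field names and paths below are
# assumptions for illustration, not the real URL Search format.
SAMPLE_METADATA = """
[
  {"url": "http://example.com/", "address": "segment-a/file-1.arc.gz", "offset": 20480},
  {"url": "http://example.com/about", "address": "segment-a/file-1.arc.gz", "offset": 1024},
  {"url": "http://example.com/blog", "address": "segment-b/file-7.arc.gz", "offset": 512}
]
"""


def group_offsets_by_address(metadata_text):
    """Map each corpus file address to the sorted offsets of the records to read."""
    records = json.loads(metadata_text)
    offsets = defaultdict(list)
    for record in records:
        offsets[record["address"]].append(record["offset"])
    # Sorting lets a job read each file front to back in a single pass.
    return {address: sorted(values) for address, values in offsets.items()}


if __name__ == "__main__":
    for address, file_offsets in group_offsets_by_address(SAMPLE_METADATA).items():
        print(address, file_offsets)
```

Grouping by address first means a job opens each corpus file once and seeks only to the listed offsets, rather than scanning the whole corpus.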