I'm organizing my code libraries tonight. They include PHP, JavaScript, regular expressions, SQL, and other tools I use for various purposes.

One such purpose is harvesting and scraping. I have an extensive library of PHP code that I've used over the last five years to pull web pages, parse tables, submit forms, and whatnot.

As I'm organizing these snippets of code into Snippely, I'm thinking about all the effort I've put into getting content.

I've harvested government data, Craigslist postings, real estate listings, and a wide variety of news, products, and geo-data.

When I need data, I much prefer to use an API. But if I have a need and the data is available on a web page, I just harvest it.

If a content provider does not have an open API, I view them very differently than if they do. I see them simply as a source of content; there is no real relationship. When a content provider has an open API, I will use it. If the API offers enough value, I will pay to use it and establish a relationship with the content provider.

Content providers will have far more control over their content by providing an open API, even if the same content is also available on their site. That control allows owners to track usage and even monetize it.

I would much prefer to pull data and other content from an API rather than scrape or harvest it.
