Today I had a demo of a new data extraction service called Import.io. The service lets you harvest, or scrape, data from websites and then output it in machine-readable formats such as JSON. This is very similar to Needlebase, a popular scraping tool that was acquired and then shut down by Google early in 2012. However, I'd say Import.io represents a simpler, yet at the same time more sophisticated, approach to harvesting and publishing web data than Needlebase did.
Using Import.io, you target the web pages where the content you wish to harvest resides, define the rows of data, and label and associate them with columns in the table where the system will ultimately store your data. You then extract the data, complete with the querying, filtering, pagination, and other aspects of browsing the web you need to get at all the data you want.
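To make the rows-and-columns idea concrete, here is a minimal, generic sketch of the same workflow using only Python's standard library. It is not how Import.io works internally; it simply assumes the target data sits in HTML `<table>` rows, treats each page of results as a separate HTML string (standing in for pagination), labels each cell with a column name, and outputs JSON. The class `TableRowParser`, the function `extract`, and the sample pages are hypothetical illustrations.

```python
import json
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows that only contain <th> cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract(pages, columns):
    """Parse each page (one per result page) and label its cells with column names."""
    records = []
    for html in pages:
        parser = TableRowParser()
        parser.feed(html)
        parser.close()
        for row in parser.rows:
            records.append(dict(zip(columns, row)))
    return records


# Hypothetical sample: two "pages" of the same listing.
PAGES = [
    "<table><tr><th>Team</th><th>Wins</th></tr>"
    "<tr><td>Dallas</td><td>10</td></tr></table>",
    "<table><tr><td>Austin</td><td>7</td></tr></table>",
]

records = extract(PAGES, ["team", "wins"])
print(json.dumps(records))
```

A real page rarely keeps its data in a tidy table, which is exactly the point-and-click work a service like Import.io takes off your hands; the sketch only shows the shape of the result, labeled rows serialized as JSON.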
After defining the data that will be extracted and how it will be stored, you can stop and use the data as is, or you can set up a more ongoing, real-time connection with the data you are harvesting. With Import.io connectors, you can pull the data regularly, identify when it changes, merge it from multiple sources, and remix it as needed.
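Import.io's connector internals aren't described here, so the following is only a generic sketch of two of the ideas above: spotting changes between two harvests of the same source, and merging records from multiple sources. It assumes each harvest is a list of JSON-style dictionaries and that sources share a key field; the functions `fingerprint`, `diff`, and `merge` and the sample data are hypothetical.

```python
import hashlib
import json


def fingerprint(record):
    """Stable SHA-256 hash of a record, independent of key order."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def diff(previous, current):
    """Return (added, removed) records between two harvests of one source."""
    prev = {fingerprint(r): r for r in previous}
    curr = {fingerprint(r): r for r in current}
    added = [curr[h] for h in curr.keys() - prev.keys()]
    removed = [prev[h] for h in prev.keys() - curr.keys()]
    return added, removed


def merge(key, *sources):
    """Join records from several sources on a shared key; later sources win conflicts."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record[key], {}).update(record)
    return list(merged.values())


# Hypothetical data: the same listing harvested on two days, plus a second source.
yesterday = [{"team": "Dallas", "wins": "10"}]
today = [{"team": "Dallas", "wins": "11"}]
cities = [{"team": "Dallas", "city": "Dallas, TX"}]

added, removed = diff(yesterday, today)
print("added:", added)
print("removed:", removed)
print("merged:", merge("team", today, cities))
```

Because records are compared by content hash, an edited row shows up as one removal plus one addition; matching on a key field first would let you report it as a single update instead.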
Put The Data To Work
With Import.io, you can immediately extract the data you need and get to work, or establish an ongoing connection with your data sources. You can then use that data through the Import.io web app, or manage and access it through the Import.io API, giving you full control over your web harvesting connections and the resulting data.
When getting to work with Import.io, you can build your own connectors or explore a marketplace of existing data connectors tailored to pull from common sources like the Guardian or ESPN. The Import.io connector marketplace is a huge opportunity for data consumers, as well as for data scraping junkies (like me) who want to put their talents to use building unique and desirable data harvesting scripts.
I’ve written about database-to-API services like EmergentOne and SlashDB. I would put Import.io into the Harvest to API, or ScrAPI, category, which lets you deploy APIs and machine-readable datasets from any publicly available data, even if you aren’t a programmer.
I think ScrAPI services and tools will play an important role in the API economy. While data almost always originates in a database, oftentimes you can’t navigate existing IT bottlenecks to properly connect to that data source and deploy an API. Sometimes problem owners have to circumvent existing IT infrastructure and harvest the data where it is published on the open web, taking it upon themselves to generate the APIs or machine-readable formats needed for the last mile of the mobile and big data apps that will ultimately consume and depend on this data.