Something Vast This Way Comes
The Vast.com (developer) Preview is finally available! If you’ve been wondering what we’ve been up to, here it is in a nutshell: we are building a search service that extracts classified ads from across the web, structures them, and then makes them available via an open REST API for commercial and non-commercial uses.
A little more detail:
– We are crawling the web and large parts of the blogosphere with a general crawler, similar to the ones operated by Yahoo!, Google, Ask, MSN, and Gigablast.
– The crawler activates forms and digs deep to find even dynamic data (although it certainly doesn’t fill in any logins or passwords).
– We automatically recognize classifieds listings – currently cars for sale, job postings, and personals profiles, and extract and normalize the surrounding metadata (make, model, price, mileage, salary, location, title, age, gender, etc.).
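To give a feel for what "normalize" means in the last bullet, here is a minimal sketch in Python. It is not our production code; the function names and the raw field formats are assumptions made up for illustration. It shows how differently formatted prices and mileages from different sites could be reduced to comparable values.

```python
import re

def normalize_price(raw):
    """Turn strings like '$12,500' or '12.5k' into whole dollars, or None."""
    text = raw.strip().lower().replace("$", "").replace(",", "")
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(k?)", text)
    if match is None:
        return None  # e.g. "call for price": leave the field empty
    value = float(match.group(1))
    if match.group(2) == "k":
        value *= 1000
    return int(round(value))

def normalize_mileage(raw):
    """Turn strings like '85,000 miles' or '85k mi' into whole miles, or None."""
    text = raw.strip().lower().replace(",", "")
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(k?)\s*(?:mi|miles)?", text)
    if match is None:
        return None
    value = float(match.group(1))
    if match.group(2) == "k":
        value *= 1000
    return int(round(value))

def normalize_car_listing(raw):
    """Hypothetical record shape: a dict of raw strings scraped from a page."""
    return {
        # Lower-cased so that 'HONDA' and 'Honda' compare equal.
        "make": raw["make"].strip().lower(),
        "model": raw["model"].strip().lower(),
        "price": normalize_price(raw["price"]),
        "mileage": normalize_mileage(raw["mileage"]),
    }
```

Once every listing has the same fields in the same units, cross-site sorting and statistics become straightforward.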
Currently, we have some of the largest databases anywhere, with over 15 million classified listings across these three categories, automatically extracted and structured with no human oversight, from nearly 50,000 web sites and blogs. (We actually crawled many, many times that number, but these are just the sites that have produced results to date.)
If you are an end-user, you should be able to search for that hard-to-find listing without having to visit hundreds of sites, and compare cross-site results, with images, sorting, and statistics.
If you are a web-site owner or web developer, we’re offering a no-hassle API to show this data to your visitors, or to mash it up to your heart’s content. You can use it to build a huge destination site or an interesting application, or to supplement content and listings that you have today. You CAN use it for commercial purposes, and as long as it’s being shown to real end users, there’s NO LIMIT on the number of queries. Everything you see on the site is built on our API, so you should be able to replicate Vast.com on your own site or blog.
If you have a classifieds site or a blog and would like your ads to be included in our results, you shouldn’t have to do anything. Just post like you normally would, and we’ll find you. If we’re not getting your results or not getting them all, drop us a note at help – at – vast – dot – com and we’ll try and fix it.
We’re going to keep this site and the API as open as possible, and like a good net citizen, link directly back to the results. We don’t compete with the people that we crawl by taking direct listings. We don’t rely on explicit tagging. And we do an enormous amount of de-duplication and spam filtering to keep the results clean.
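For the curious, the simplest form of de-duplication looks something like the Python sketch below. This is an illustration only, not a description of our actual pipeline: the field names (`title`, `price`, `location`) and the exact-match fingerprint are assumptions, and a real system would need fuzzier matching.

```python
import hashlib
import re

def fingerprint(listing):
    """Build a stable key from fields that repeat when an ad is cross-posted."""
    # Collapse case and punctuation so cosmetic differences do not matter.
    title = re.sub(r"[^a-z0-9]+", " ", listing["title"].lower()).strip()
    location = listing["location"].strip().lower()
    key = f"{title}|{listing['price']}|{location}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()

def deduplicate(listings):
    """Keep the first listing seen for each fingerprint, preserving order."""
    seen = set()
    unique = []
    for listing in listings:
        fp = fingerprint(listing)
        if fp not in seen:
            seen.add(fp)
            unique.append(listing)
    return unique
```

With this approach, the same ad found on two sites with slightly different punctuation or capitalization in the title collapses into a single result, while ads that differ in price or location stay separate.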
Of course, this is a search service, not a listing service, so you can expect that some spam and mis-classified results will sneak through. Some links will break due to changes, expirations, and finicky databases that were not designed to be “deep crawled.” In those cases, the cache is your friend. There are also rivers of pornographic content that have to be filtered out, and occasionally we miss some. Please help out by reporting bad results using the links next to each result.
We will be adding more sources, better crawling, improved classification, and many more categories over time – this is just a start. We want to support the web community that wants to take highly-structured content and build applications on top of these massive data flows. When we start making revenue through syndicating this data, we will share it with the developers and sites distributing it via the API.
What more would people like to see? How can we help or improve?
Update: Some coverage of the launch and reviews from TechCrunch, Paul Kedrosky, Peter Rip, and CNet.