Something Vast This Way Comes

The (developer) Preview is finally available! If you’ve been wondering what we’ve been up to, here it is, in a nutshell – we are building a search service that extracts classified ads from across the web, structures them, and then makes them available via an open REST API for commercial and non-commercial uses.

A little more detail:

– We are crawling the web and large parts of the blogosphere with a general crawler, similar to the ones operated by Yahoo!, Google, Ask, MSN, and Gigablast.

– The crawler activates forms and digs deep to find even dynamic data (although it certainly doesn’t fill in any logins or passwords).

– We automatically recognize classifieds listings (currently cars for sale, job postings, and personals profiles), then extract and normalize the surrounding metadata (make, model, price, mileage, salary, location, title, age, gender, etc.).
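To give a feel for what "extract and normalize" can mean for a car listing, here is a minimal sketch. It is not our actual extraction code: the field names, the `parse_number` helper, and the handling of a "k" suffix are all assumptions made for illustration.

```python
import re
from typing import Optional


def parse_number(text: str) -> Optional[int]:
    """Pull the first number out of strings like '$12,500' or '85k miles'.

    Hypothetical helper: a trailing standalone 'k' is read as thousands.
    """
    match = re.search(r"(\d[\d,]*\.?\d*)\s*(k)?\b", text, flags=re.IGNORECASE)
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    if match.group(2):
        value *= 1000
    return int(value)


def normalize_listing(raw: dict) -> dict:
    """Map loosely formatted scraped fields onto a consistent record."""
    return {
        "make": raw.get("make", "").strip().title(),
        "model": raw.get("model", "").strip().title(),
        "price": parse_number(raw.get("price", "")),
        "mileage": parse_number(raw.get("mileage", "")),
    }


raw = {"make": " fiat ", "model": "pininfarina",
       "price": "$12,500", "mileage": "85k miles"}
print(normalize_listing(raw))
```

Once every site's listings share one shape like this, sorting, statistics, and cross-site comparison become straightforward.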

We currently have some of the largest databases anywhere, with over 15 million classified listings across these three categories, automatically extracted and structured with no human oversight, from nearly 50,000 websites and blogs. (We actually crawled many, many times that number of sites, but these are just the ones that have produced results to date.)

If you are an end-user, you should be able to search for that hard-to-find listing without having to visit hundreds of sites, and compare cross-site results, with images, sorting, and statistics.

If you are a website owner or web developer, we’re offering a no-hassle API to show this data to your visitors, or to mash it up to your heart’s content. You can use it to build a huge destination site or an interesting application, or to supplement the content and listings that you have today. You CAN use it for commercial purposes, and as long as it’s being shown to real end users, there’s NO LIMIT on the number of queries. Everything you see on the site is built on our API, so you should be able to replicate it on your own site or blog.
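As a rough idea of what calling a REST search API from your own site could look like, here is a small sketch that only builds a query URL. The host, path, and parameter names below are placeholders invented for this example, not the real endpoint or its documented parameters, so check the API documentation for the actual ones.

```python
from urllib.parse import urlencode

# Placeholder endpoint: NOT the real API address.
BASE_URL = "https://api.example.invalid/listings"


def build_search_url(category: str, **filters) -> str:
    """Build a GET URL for a listings search.

    `category` and the filter names are hypothetical parameters.
    Filters set to None are left out of the query string.
    """
    params = {"category": category}
    params.update({k: v for k, v in sorted(filters.items()) if v is not None})
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("cars", make="Fiat", max_price=15000))
```

Because the request is a plain GET with query parameters, the same call works from a server-side script, a browser, or a quick command-line test.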

If you have a classifieds site or a blog and would like your ads to be included in our results, you shouldn’t have to do anything. Just post like you normally would, and we’ll find you. If we’re not getting your results or not getting them all, drop us a note at help – at – vast – dot – com and we’ll try and fix it.

We’re going to keep this site and the API as open as possible, and like a good net citizen, link directly back to the results. We don’t compete with the people that we crawl by taking direct listings. We don’t rely on explicit tagging. And we do an enormous amount of de-duplication and spam filtering to keep the results clean.
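For the curious, one simple way to de-duplicate listings that show up on several sites is to fingerprint the fields that identify the item and keep the first copy of each fingerprint. The sketch below illustrates that general idea only; it is not a description of our pipeline, and the choice of fields and of SHA-1 is an assumption.

```python
import hashlib

# Hypothetical choice of fields that identify "the same" car listing.
KEY_FIELDS = ("make", "model", "price", "mileage", "location")


def fingerprint(listing: dict) -> str:
    """Hash the identifying fields, ignoring case and surrounding whitespace."""
    parts = [str(listing.get(field, "")).strip().lower() for field in KEY_FIELDS]
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()


def deduplicate(listings: list) -> list:
    """Keep the first listing seen for each fingerprint, preserving order."""
    seen = set()
    unique = []
    for listing in listings:
        key = fingerprint(listing)
        if key not in seen:
            seen.add(key)
            unique.append(listing)
    return unique


ads = [
    {"make": "Fiat", "model": "Pininfarina", "price": 12500,
     "mileage": 85000, "location": "Austin", "url": "https://a.example/1"},
    {"make": "FIAT", "model": "pininfarina ", "price": 12500,
     "mileage": 85000, "location": "austin", "url": "https://b.example/9"},
]
print(len(deduplicate(ads)))
```

Exact-match fingerprints like this only catch copies whose key fields agree after cleanup; near-duplicates with slightly different prices or wording need fuzzier matching.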

Of course, this is a search service, not a listing service, so you can expect that some spam and mis-classified results will sneak through. Some links will break due to changes, expirations, and finicky databases that were not designed to be “deep crawled.” In those cases, the cache is your friend. There are also rivers of pornographic content that have to be filtered out, and occasionally we miss some. Please help out by reporting bad results using the links next to each result.

We will be adding more sources, better crawling, improved classification, and many more categories over time – this is just a start. We want to support the web community that wants to take highly-structured content and build applications on top of these massive data flows. When we start making revenue through syndicating this data, we will share it with the developers and sites distributing it via the API.

What more would people like to see? How can we help or improve?

Update: Some coverage of the launch and reviews from TechCrunch, Paul Kedrosky, Peter Rip, and CNet.

11 thoughts on “Something Vast This Way Comes”

  1. is an interesting idea. I found you by way of the Riya blog. Glad to see you beat them, but I’d sure like to see Riya go live soon as well. I did a search for a Fiat Pininfarina on and found 9 results. When clicking through, the car was sold. I had hoped for a "report this item as sold" or something similar in the frameset. Upon receipt of that type of response, could respider the page and remove from the listings if the result is consistent with the user report. Thanks and Good Luck, Brandon

  2. Not exactly a common car, Brandon, but thanks for checking it out. Our refresh rate is low starting out since the company has just launched, and we didn’t want to frequently spider sites that didn’t know what we’re doing. We’re improving it now.

  3. Congrats on the launch. I’ve been talking up your tech ever since you showed me the demo, and it’s nice that people can finally try it out for themselves. 🙂

  4. So the way vast is set up, can you easily add a vertical where it will search for promotions/sales/deals across all e-commerce sites? That would really make it mad smart.

  5. Naval – Neat concept, was looking for something like this for a long time. I would like to discuss extending the scraping feature. Could you please e-mail me at my personal e-mail id? I could not locate yours. Thank you.

  6. Wow… it activates forms? That’s a bit dangerous… I hope it doesn’t crawl and hit the "nuke korea" form button :-/ Kevin

  7. Enoirme on March 30, 2011 @midsouthsatellite is there a way to permanently set the TV to one of the input channels (e.g. AV1 or AV2)?

  8. Posted on I just stumbled across your site and I love it. I am all about being healthy but I have no motivation. I admire you!
