Thursday, October 1, 2009

Amazon Public Data Sets

I follow the Amazon Web Services Blog and recently saw several announcements about new Public Data Sets becoming available.  What is a public data set, you ask?  From the website:

Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications.

What is cool about this is that it greatly simplifies access to large data sets that, while already public, used to be difficult to work with: you had to download, install, and load the data yourself, and then provide and manage the infrastructure to host it.

The one that caught my eye was the Daily Global Weather data set.  Now, I haven't used this yet, and there are other great resources for local weather station data like the Weather Underground (where I upload my station data, by the way), but this is a great way to gain access to a bunch of historical weather data.  Other data sets include census data, Wikipedia data, geographic data, and more.

One drawback for data sets that continue to be updated each day, like historical weather data, is that you aren't accessing a continually updated data store.  Instead, you create your own EBS volume from a snapshot.  This means (if I understand all this correctly) that if you need the most recent data, you have to wait for the snapshot itself to be updated and then manually create a fresh EBS volume from the new snapshot.  Or perhaps there are tools to assist with that.
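To make that manual refresh a little more concrete, here is a minimal sketch in Python.  I'm assuming a client object with a `create_volume` method that takes `SnapshotId` and `AvailabilityZone`, which is how the boto3 EC2 client works today.  The helper names, the snapshot IDs, and the idea of comparing a "current" snapshot ID against a "latest" one are all my own placeholders for illustration, not something Amazon provides.

```python
def create_volume_from_snapshot(ec2_client, snapshot_id, availability_zone):
    """Create a new EBS volume from a snapshot and return the new volume's ID.

    ec2_client is assumed to expose create_volume(SnapshotId=..., AvailabilityZone=...),
    as the boto3 EC2 client does.
    """
    response = ec2_client.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    return response["VolumeId"]


def refresh_volume(ec2_client, current_snapshot_id, latest_snapshot_id, availability_zone):
    """Create a fresh volume only if the data set's snapshot has changed.

    Returns the new volume ID, or None when the snapshot is unchanged.
    How you learn the latest snapshot ID is up to you (hypothetical input here).
    """
    if latest_snapshot_id == current_snapshot_id:
        return None
    return create_volume_from_snapshot(ec2_client, latest_snapshot_id, availability_zone)


# Example usage (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   new_volume_id = refresh_volume(ec2, "snap-OLD", "snap-NEW", "us-east-1a")
```

Note that the volume has to be created in the same availability zone as the instance you plan to attach it to, and you would still need to attach and mount the new volume and retire the old one yourself.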

All in all, a useful service, and a great move by Amazon to drive more users to EC2.
