[Insight-users] Public Data Sets on Amazon Web Services (AWS)

Luis Ibanez luis.ibanez at kitware.com
Sat Dec 6 20:23:45 EST 2008


http://aws.amazon.com/publicdatasets/


Public Data Sets on AWS provides a centralized repository of public data
sets that can be seamlessly integrated into AWS cloud-based
applications. AWS is hosting the public data sets at no charge for the
community, and like all AWS services, users pay only for the compute and
storage they use for their own applications. An initial list of data
sets is already available, and more will be added soon.

Previously, large data sets such as the mapping of the Human Genome and
the US Census data required hours or days to locate, download,
customize, and analyze. Now, anyone can access these data sets from
their Amazon Elastic Compute Cloud (Amazon EC2) instances and start
computing on the data within minutes. Users can also leverage the entire
AWS ecosystem and easily collaborate with other AWS users. For example,
users can produce or use prebuilt server images with tools and
applications to analyze the data sets. By hosting this important and
useful data with cost-efficient services such as Amazon EC2, AWS hopes
to provide researchers across a variety of disciplines and industries
with tools to enable more innovation, more quickly.


How It Works
============

Select public data sets are hosted on Amazon EC2 for free as Amazon
Elastic Block Store (Amazon EBS) snapshots. Amazon EC2 customers can
access this data by creating their own personal Amazon EBS volumes,
using the public data set snapshots as a starting point. They can then
access, modify and perform computation on these volumes directly using
their Amazon EC2 instances and just pay for the compute and storage
resources that they use. Where available, researchers can also use
pre-configured Amazon Machine Images (AMIs) that bundle tools such as
Inquiry by BioTeam to perform their analysis.

To get started using the Public Data Sets on AWS, simply perform these
three easy steps:

    1. Sign up for an Amazon EC2 account.

    2. Launch an Amazon EC2 instance.

    3. Create an Amazon EBS volume using the
       Snapshot ID listed for your chosen data
       set in the catalog on the AWS page
       linked above.
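
As a rough illustration of step 3, here is a minimal Python sketch that
creates a volume from a snapshot and attaches it to a running instance.
It assumes a present-day SDK client such as the one returned by
boto3.client("ec2"), which the announcement itself does not mention.
The helper name, the snapshot and instance IDs, and the device name are
all hypothetical placeholders; the real Snapshot ID comes from the
catalog.

```python
def volume_from_public_snapshot(ec2, snapshot_id, instance_id, device="/dev/sdf"):
    """Create an EBS volume from a snapshot and attach it to an instance.

    `ec2` is assumed to behave like a boto3 EC2 client. Returns the new
    volume's ID.
    """
    # A volume can only be attached to an instance in the same
    # Availability Zone, so look up where the instance is running.
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    zone = reservations[0]["Instances"][0]["Placement"]["AvailabilityZone"]

    # Create a personal volume, using the public snapshot as its starting point.
    volume = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=zone)
    volume_id = volume["VolumeId"]

    # Wait until the volume is ready before attaching it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id


# Hypothetical usage (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   volume_from_public_snapshot(ec2, "snap-xxxxxxxx", "i-xxxxxxxx")
```

Once attached, the volume still has to be mounted inside the instance
before the data can be read. From that point on, the volume is billed
as ordinary storage on the user's own account.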

The ElasticFox Getting Started Guide provides a simple walkthrough of
how to launch an instance and create an Amazon EBS volume using
ElasticFox, a convenient Firefox plug-in. Alternatively, see the Amazon
EC2 Getting Started Guide.


How to Share a Public Data Set on AWS
=====================================

If you have a public domain or non-proprietary data set that you think
is useful and interesting to the AWS community, please submit a request
through the form on the AWS page linked above; the AWS team will review
your submission and get back to you. Typically the data sets in the
repository are between 1 GB and 1 TB in size (based on the Amazon EBS
volume limit), but we can work with you to host larger data sets as
well. You must have the right to make the data freely available.

To get started, simply fill out the request form on the AWS page linked
above, and a member of our team will contact you regarding your public
data set. We will walk you through publishing your data set to the data
repository.


