This demo shows how to use ExaStat, together with R, for exploratory data analysis on huge data sets. The example data set is the 2000 U.S. Census 5% IPUMS sample.
ExaStat is used to process the data, to estimate models, and to put the results into a form suitable for graphing. R is used to display the results as Trellis plots.
The demo shows how to distribute the computations across several computers, but they can also easily be run on a single computer.
To get a better idea of what the demo involves before downloading it, you can read the Census Demo document on its own. This document is also included in the Census Demo setup, which you can download below.
Click here to view the Census Demo document.
Feedback of any kind about the Census Demo and ExaStat is greatly appreciated; send it to censusdemo@exametrix.com.
To use the demo, you must do the following four things:
1. Install the ExaStat Win32 Binaries, which you can do above. You do not need to have Visual Studio or any other compiler on your system to go through the demo.
2. Install the Census Demo code and documentation, which you can do with the link below:
Download size: 4.6 MB. Updated: February 1, 2007.
Click here to register to download the Census Demo code and document.
3. Install the latest version of R. You can do this at:
http://www.r-project.org/
4. Install one or more of three possible Census data files. These are provided here as ZIP files, which you will need to unzip into the directory in which you installed the Census Demo (the default is ExaStat\CensusDemo). When unzipped, the files are ExaStat DataFiles.
File 1: CensusDemoData.zip. This is the recommended file. It contains all of the rows of the entire IPUMS file, but only contains those variables used in the demo. The download size is 90 MB, and the unzipped size is 630 MB.
Click here to register to download CensusDemoData.zip.
File 2: CensusIp20001.zip. This contains the entire IPUMS file. The download size is 1 GB, and the unzipped size is 17 GB. You should download this if you are interested in exploring the Census beyond what is shown in the demo. Computation time for the demo is about the same whether you use the CensusDemoData file or this file, because ExaStat only reads the data it needs.
Click here to register to download CensusIp20001.zip.
File 3: CensusWageExtract.zip. This file is about 63 MB zipped and about 386 MB unzipped. Use it as your primary file only if you do not have enough room on your computer for the CensusDemoData file. It contains a subset of the rows but also adds some new variables. The equivalent of this file is created during the course of the demo. Much, but not all, of the demo can be done using just this file. If you download this file, you can skip the steps in the demo that create the file CensusWageExtract1.xdf.
Click here to register to download CensusWageExtract.zip.
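If you prefer to script step 4 rather than unzip by hand, the following is a minimal sketch in Python. It is not part of the Census Demo: the function name unzip_census_data is hypothetical, and the script assumes you have already downloaded one of the ZIP files above. It checks that the target drive has room for the unzipped data before extracting into the Census Demo directory.

```python
import shutil
import zipfile
from pathlib import Path


def unzip_census_data(zip_path, demo_dir):
    """Extract a downloaded Census ZIP file into the Census Demo directory.

    Hypothetical helper, not shipped with ExaStat or the Census Demo.
    Returns the paths of the archive members after extraction.
    """
    zip_path = Path(zip_path)
    demo_dir = Path(demo_dir)
    demo_dir.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        # Sum the uncompressed sizes recorded in the archive and compare
        # against the free space on the drive that holds demo_dir.
        needed = sum(info.file_size for info in archive.infolist())
        free = shutil.disk_usage(demo_dir).free
        if needed > free:
            raise OSError(
                f"need {needed} bytes but only {free} are free in {demo_dir}"
            )
        archive.extractall(demo_dir)
        return [demo_dir / name for name in archive.namelist()]
```

For example, unzip_census_data("CensusDemoData.zip", r"C:\ExaStat\CensusDemo") would extract the recommended file, where the C: drive and the location of the downloaded ZIP file are assumptions you should adjust to your own installation.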
View the license information for ExaStat.