
Aggregate and Merge Electoral Data and Census STF3a data.

This is the final step: we aggregate each dataset by the MCD-group codes assigned in the earlier stages, and then merge the precinct-level electoral data with the STF3a census dataset. Both sides are aggregated to the MCD-group level. For the precinct-level electoral data, this means aggregating precincts into MCD-groups; for STF3a, it means aggregating MCD units into MCD-groups. At the end of this step, the precinct-level electoral data and the STF3a data should have the same number of rows: one per MCD-group. The only tricky part is that not all variables aggregate the same way: most are simply summed, but percentage-based variables cannot be summed and must be handled as special cases.
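The difference between summed and percentage-based variables can be illustrated with a minimal sketch. It is written in Python rather than SAS, purely to show the logic, and it is not the project's actual program. The variable names (`mcdgroup`, `pop`, `pct_urban`) are hypothetical, and recomputing a percentage as a weighted mean over its base count is one common approach that we assume here; the source does not say which method was used.

```python
from collections import defaultdict

def aggregate_by_group(rows, group_key, sum_vars, pct_vars):
    """Collapse rows to one row per group.

    sum_vars: count variables that are simply summed.
    pct_vars: maps each percentage variable to the count variable used as
              its weight (assumed to be its denominator); that count
              variable must also appear in sum_vars.
    """
    sums = defaultdict(lambda: defaultdict(float))
    weighted = defaultdict(lambda: defaultdict(float))
    for row in rows:
        group_sums = sums[row[group_key]]
        for v in sum_vars:
            group_sums[v] += row[v]
        for pct, base in pct_vars.items():
            # Accumulate percentage * weight; divide by total weight later.
            weighted[row[group_key]][pct] += row[pct] * row[base]
    out = []
    for g in sorted(sums):
        rec = {group_key: g}
        rec.update(sums[g])
        for pct, base in pct_vars.items():
            total = sums[g][base]
            rec[pct] = weighted[g][pct] / total if total else None
        out.append(rec)
    return out

# Hypothetical MCD-level rows: two MCDs in group 1, one in group 2.
rows = [
    {"mcdgroup": 1, "pop": 100, "pct_urban": 50.0},
    {"mcdgroup": 1, "pop": 300, "pct_urban": 10.0},
    {"mcdgroup": 2, "pop": 200, "pct_urban": 25.0},
]
result = aggregate_by_group(rows, "mcdgroup", ["pop"], {"pct_urban": "pop"})
```

Summing `pct_urban` for group 1 would give a meaningless 60; weighting by `pop` gives 20, the share for the combined population.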

Although the precinct-level electoral data are split into four yearly datasets for each state, we decided to combine them all before matching with STF3a, so that only one MCD-group-level dataset results for each state. This was possible because the precinct-level electoral variable names contain a two-digit year code, so each variable is unique across a state's four yearly datasets. Numerous identifier variables, however, were not unique. These were unimportant or nonsensical once aggregated to the MCD-group level, so we simply removed them before aggregating and merging.
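As a rough illustration of why year-coded names make the combination safe, the following Python sketch (not the SAS program itself) joins per-year tables on the MCD-group code, drops a non-unique identifier, and stops if any remaining variable name appears in more than one year. The names `mcdgroup`, `precname`, `dem84`, and `dem86` are hypothetical. For brevity the sketch drops identifiers while combining, whereas the procedure described here removes them before aggregating.

```python
def combine_years(year_tables, group_key, drop_vars):
    """Join per-year tables (one row per MCD-group) into a single table.

    year_tables: list of tables, each a list of dicts for one election year.
    drop_vars:   identifier variables that are not year-specific.
    """
    combined = {}
    seen = set()
    for table in year_tables:
        # Variable names this year contributes, excluding key and identifiers.
        names = set()
        for row in table:
            names.update(k for k in row if k != group_key and k not in drop_vars)
        clash = names & seen
        if clash:
            raise ValueError(f"variables not unique across years: {sorted(clash)}")
        seen |= names
        for row in table:
            rec = combined.setdefault(row[group_key], {group_key: row[group_key]})
            for k in names:
                if k in row:
                    rec[k] = row[k]
    return [combined[g] for g in sorted(combined)]

# Hypothetical per-year tables, already one row per MCD-group.
y84 = [{"mcdgroup": 1, "precname": "A", "dem84": 120},
       {"mcdgroup": 2, "precname": "B", "dem84": 80}]
y86 = [{"mcdgroup": 1, "precname": "A", "dem86": 90},
       {"mcdgroup": 2, "precname": "B", "dem86": 70}]
merged_years = combine_years([y84, y86], "mcdgroup", {"precname"})
```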

  1. Aggregate the MCD-level STF3a into MCD-groups.

    This step involves dealing with aggregation issues for variables in the STF3a dataset that cannot simply be summed.

    1. Identify which variables must be aggregated by some method other than summation, and deal with these as special cases. It turns out that this is not a major problem in the STF3a dataset.
    2. Use SAS commands to collapse the dataset into an MCD-group-level dataset with one line per MCD-group. The program file should be named mgstfxx.sas. Save the new dataset as mgstfxx.sd2.

    What should be left over:

    mgstfok.sas.
    Program to aggregate the MCD-level STF3a dataset into an MCD-group-level dataset.
    mgstfok.sd2.
    MCD-group-level dataset of STF3a data.

  2. Aggregate the precinct-level electoral data into MCD-groups.

    1. Identify which variables must be aggregated by some method other than summation, and deal with these as special cases.
    2. Eliminate the non-year-specific and identifying variables, which are unnecessary for the aggregated dataset.
    3. Use SAS commands to collapse each dataset into an MCD-group-level dataset with one line per MCD-group. The program files should be named mgpvxxyy.sas. Save the new datasets as mgpvxxyy.sd2. Do this step for each year of the precinct-level electoral data files.

    What should be left over:

    mgpvok.sas.
    Program to aggregate all four precinct-level electoral data year datasets into MCD-group-level datasets.
    mgpvok.sd2.
    SAS dataset of the four years merged together, with irrelevant variables eliminated.
    mgpvok84.sd2.
    mgpvok86.sd2.
    mgpvok88.sd2.
    mgpvok90.sd2.
    Four MCD-group-level datasets of precinct-level electoral data, one per year.

  3. Merge the MCD-group level precinct-level electoral data and STF3a datasets.

    Use the program mg_xx.sas to produce the final dataset mg_xx.sd2.

    What should be left over:

    mg_ok.sas.
    SAS program to join the MCD-group-level datasets mgstfok.sd2 and mgpvok.sd2.
    mg_ok.sd2.
    The final, MCD-group-level dataset containing both voting and census data.
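
The final join can be sketched in the same spirit. This Python illustration (again, not the contents of mg_xx.sas) assumes a strict one-to-one match on a hypothetical `mcdgroup` key and refuses to merge unless both tables contain exactly the same MCD-groups, which is the "identical numbers of rows" condition stated at the top of this section. The variable names `pop` and `dem84` are also hypothetical.

```python
def merge_on_group(electoral, census, group_key):
    """One-to-one merge of two MCD-group-level tables on group_key."""
    e = {row[group_key]: row for row in electoral}
    c = {row[group_key]: row for row in census}
    # Each table must already be collapsed to one row per MCD-group.
    if len(e) != len(electoral) or len(c) != len(census):
        raise ValueError("a table has more than one row per MCD-group")
    # Both tables must cover exactly the same set of MCD-groups.
    if e.keys() != c.keys():
        only = sorted(e.keys() ^ c.keys())
        raise ValueError(f"MCD-groups present on only one side: {only}")
    return [{**c[g], **e[g]} for g in sorted(e)]

# Hypothetical MCD-group-level inputs.
census = [{"mcdgroup": 1, "pop": 400}, {"mcdgroup": 2, "pop": 200}]
electoral = [{"mcdgroup": 1, "dem84": 120}, {"mcdgroup": 2, "dem84": 80}]
final = merge_on_group(electoral, census, "mcdgroup")
```

Failing loudly on a mismatch is a design choice of the sketch: a group missing from either side usually points to a coding error in an earlier stage rather than to something that should be silently dropped.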


Copyright © 1997-2004