Friday, September 26, 2014

How to Merge Datasets in SPSS

This entry will illustrate how to combine databases in SPSS. The assumptions are that you have 2 or more databases in Microsoft Excel and or SPSS, the files have the same cases (like the same 15 survey respondents).

1. Duplicate your files just in case anything goes wrong with the dataset merger. If the cases are unnumbered, assign a unique ID number to each case. Replicate this in all other databases. Ensure the variable name and definitions (like variable type, length and so on) are the same in SPSS. For instance, 'ID' and not 'ID number' in one data file but 'ID' in another data file.

2. If the files are in Microsoft Excel, convert each to SPSS first. To do this, do the following. Within SPSS, select 'File', 'Open', "Format' drop down menu to select 'Excel', select the Excel file, 'Open', Double check that the automatically selected sheet number and cell ranges in the popup message are correct. That is the source that SPSS will use for extracting data.

3. Before beginning, open each file and sort all cases by the case ID. This is an essential step since SPSS will use that variable as its key base variable for merging together the databases. To do this, make the following selections. 'File', 'Open', 'Data ...', Select the file (aka database), 'Sort Ascending' within 'Data View' by right clicking on the variable name 'ID'.
How to Combine Datasets in SPSS

4. To combine the datasets, use the following steps. Open both datasets. Within the first file (say 'file1'), select 'Data' (from top menu bar), 'Merge Files', 'Add Variables' (since we already have the same cases), the following popup dialog box will appear. If you have the second file (say 'file2') open, that second file will appear within this box for selection. Once that second file is selected, click the 'Continue' button.
How to Combine Datasets in SPSS

5. The following dialog box will appear. My file1 has 2 variables: ID and Sunshine. My file2 has 2 variables: ID and Water. Notice how SPSS proposes a 'New Active Database' with the 3 unique variables. 


BTW, you will also notice the following markup (*) and (+). The (*) refers to variables in the active or open file from which you are combining datasets while the (+) refers to those in the second file from which you will add variables to the active (i.e. currently open) file. For instance, 'sunshine (*)' comes from the active file while 'water (+)' will be added to the 'new active dataset' from the second file.
How to Merge Datasets in SPSS
6. Select 'Match cases on key variables in sorted files' and 'both files provide cases'. Then select and move the common (ID) variable from the 'Excluded Variables' box to the 'Key Variables' box as pictured below. Then click 'OK'. You will notice that the 'ID' variables (*) and (+) from the 2 files no longer appear in the 'excluded variables' and 'new active dataset' boxes because they are now merged into one, i.e. the key variable for merging. BTW, if there was a 3rd variable named 'fertilizers' that you appeared in 'new active dataset' that you did not want, simply highlight it and move it over to the 'excluded variables' box.
How to Merge Datasets in SPSS
When done, your first (active) file will also contain the variables of the second file as shown below.
How to Merge Datasets in SPSS




No comments:

Post a Comment

Note: Only a member of this blog may post a comment.