SPSSX Discussion

matching values from two different datasets

Classic

List

Threaded

8 messages Options

E. Bernardo

matching values from two different datasets

Dear SPSS Experts,

The 50 item self-administered survey questionnaire(with items varied from demographic variables to likert type items) was successfully administered to n=2000 people. Then the filled-out questionnaires were scanned two times and the datafiles were save separately as data1.sav and data2.sav. The items in the datasets are labeled Q1, Q2, ...Q50. Then the values from the two datasets are matched. The goal is to create a third datafile (data3.sav) such that the values are 0 and 1, 1 if matched and 0 if unmatched.

For example, if the response of a person to Q1 in data1 and data2 are the same or matched, then the Q1 for data3 would be 1; otherwise 0.

How to do this in SPSS?

Thank you for your help.

Albert-Jan Roskam

Re: matching values from two different datasets

this works, although you could make it more generic:

file handle dataset_a /name = "C:\Program Files\IBM\SPSS\Statistics\20\Samples\English\employee data.sav".
file handle dataset_b /name = "C:\Program Files\IBM\SPSS\Statistics\20\Samples\English\employee data.sav".

begin program.
import spss, spssaux
spss.Submit("get file = dataset_a.")
vnames = [v for v in spssaux.GetVariableNamesList() if v != "id"]
spss.Submit("rename variables (%s=%s)." % (" ".join(vnames), " ".join([v + "_a" for v in vnames])))
spss.Submit(["sort cases by id.", "dataset name a."])
spss.Submit("get file = dataset_b.")
vnames = [v for v in spssaux.GetVariableNamesList() if v != "id"]
spss.Submit("rename variables (%s=%s)." % (" ".join(vnames), " ".join([v + "_b" for v in vnames])))
spss.Submit(["sort cases by id.", "dataset name b."])
spss.Submit("match files /file = a /file = b /by id.")
spss.Submit(["compute %s = %s eq %s." % (v, v + "_a", v + "_b") for v in vnames])
spss.Submit("add files /file = * /keep = id %s" % " ".join(vnames))
end program.
exe.

Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From: E. Bernardo <[hidden email]>
To: [hidden email]
Sent: Thursday, July 11, 2013 10:44 AM
Subject: [SPSSX-L] matching values from two different datasets

Dear SPSS Experts,

The 50 item self-administered survey questionnaire(with items varied from demographic variables to likert type items) was successfully administered to n=2000 people. Then the filled-out questionnaires were scanned two times and the datafiles were save separately as data1.sav and data2.sav. The items in the datasets are labeled Q1, Q2, ...Q50. Then the values from the two datasets are matched. The goal is to create a third datafile (data3.sav) such that the values are 0 and 1, 1 if matched and 0 if unmatched.

For example, if the response of a person to Q1 in data1 and data2 are the same or matched, then the Q1 for data3 would be 1; otherwise 0.

How to do this in SPSS?

Thank you for your help.

Andy W

Re: matching values from two different datasets

Very nice Albert! Here is what I was writing for a non-python solution (although not quite as general).

1) Rename the variables in data2 so they don't conflict with data1, 2) after you match the files a simple do repeat structure will work. Just replace my [Q2List = Q2_1 to Q2_50] with whatever the names of the variables are in the second file.

vector Q3_(50,F1.0).
do repeat Q1List = Q1 to Q50 /Q2List = Q2_1 to Q2_50 /Q3List = Q3_1 to Q3_50.
compute Q3List = (Q1List EQ Q2List).
end repeat.

This presumes the Q1 ... Q50 variables are contiguous in the datafile. The renaming of variables is annoying, but you could either write a simple macro to take care of that for you or just make a set set of variables via vector with non-conflicting names.

Example on smaller datasets below.

*******************************************************.
data list free /ID Q1 Q2 Q3.
begin data
1 4 5 4
2 2 1 2
3 3 3 3
end data.
dataset name data1.

data list free /ID Q1 Q2 Q3.
begin data
1 4 4 4
2 2 2 2
3 3 3 3
end data.
dataset name data2.
rename variables (Q1 = Q2_1)
(Q2 = Q2_2)
(Q3 = Q2_3).

match files file = 'data1'
/file = 'data2'
/by ID.
dataset name data3.

vector Q3_(3,F1.0).
do repeat Q1List = Q1 to Q3 /Q2List = Q2_1 to Q2_3 /Q3List = Q3_1 to Q3_3.
compute Q3List = (Q1List EQ Q2List).
end repeat.
exe.
list.
*******************************************************.

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Jon K Peck

Re: matching values from two different datasets

In reply to this post by E. Bernardo

Use Data > Compare Datasets (COMPARE DATASETS).

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: "E. Bernardo" <[hidden email]>
To: [hidden email],
Date: 07/11/2013 02:45 AM
Subject: [SPSSX-L] matching values from two different datasets
Sent by: "SPSSX(r) Discussion" <[hidden email]>

Dear SPSS Experts,

The 50 item self-administered survey questionnaire(with items varied from demographic variables to likert type items) was successfully administered to n=2000 people. Then the filled-out questionnaires were scanned two times and the datafiles were save separately as data1.sav and data2.sav. The items in the datasets are labeled Q1, Q2, ...Q50. Then the values from the two datasets are matched. The goal is to create a third datafile (data3.sav) such that the values are 0 and 1, 1 if matched and 0 if unmatched.

For example, if the response of a person to Q1 in data1 and data2 are the same or matched, then the Q1 for data3 would be 1; otherwise 0.

How to do this in SPSS?

Thank you for your help.

David Marso

Re: matching values from two different datasets

Administrator

In reply to this post by Andy W

ANPS (Another Non Python Solution).
--
data list free /ID Q1 Q2 Q3.
begin data
1 4 5 4
2 2 1 2
3 3 3 3
end data.
dataset name data1 .
SAVE OUTFILE 'G:\TEMP\d1'.

data list free /ID Q1 Q2 Q3.
begin data
1 4 4 4
2 2 2 2
3 3 3 3
end data.
dataset name data2 .
SAVE OUTFILE 'G:\TEMP\d2'.

dataset declare compare.
MATRIX .
GET D1 / FILE = 'G:\TEMP\d1' / VARIABLES= ALL/ NAMES=Vname.
GET D2 / FILE = 'G:\TEMP\d2' / VARIABLES =ALL/ NAMES=Vname.
SAVE (D1-D2) / OUTFILE compare / NAMES=VNAME.
END MATRIX.
DATASET ACTIVATE compare.
RECODE ALL (0=1)(ELSE=0).
LIST.

ID Q1 Q2 Q3

1.00 1.00 .00 1.00
1.00 1.00 .00 1.00
1.00 1.00 1.00 1.00

Number of cases read: 3 Number of cases listed: 3

Andy W wrote

Very nice Albert! Here is what I was writing for a non-python solution (although not quite as general).

1) Rename the variables in data2 so they don't conflict with data1, 2) after you match the files a simple do repeat structure will work. Just replace my [Q2List = Q2_1 to Q2_50] with whatever the names of the variables are in the second file.

vector Q3_(50,F1.0).
do repeat Q1List = Q1 to Q50 /Q2List = Q2_1 to Q2_50 /Q3List = Q3_1 to Q3_50.
compute Q3List = (Q1List EQ Q2List).
end repeat.

This presumes the Q1 ... Q50 variables are contiguous in the datafile. The renaming of variables is annoying, but you could either write a simple macro to take care of that for you or just make a set set of variables via vector with non-conflicting names.

Example on smaller datasets below.

*******************************************************.
data list free /ID Q1 Q2 Q3.
begin data
1 4 5 4
2 2 1 2
3 3 3 3
end data.
dataset name data1.

data list free /ID Q1 Q2 Q3.
begin data
1 4 4 4
2 2 2 2
3 3 3 3
end data.
dataset name data2.
rename variables (Q1 = Q2_1)
(Q2 = Q2_2)
(Q3 = Q2_3).

match files file = 'data1'
/file = 'data2'
/by ID.
dataset name data3.

vector Q3_(3,F1.0).
do repeat Q1List = Q1 to Q3 /Q2List = Q2_1 to Q2_3 /Q3List = Q3_1 to Q3_3.
compute Q3List = (Q1List EQ Q2List).
end repeat.
exe.
list.
*******************************************************.

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Bruce Weaver

Re: matching values from two different datasets

Administrator

Very nice, David! But I would add a FORMATS command to get rid of the decimals in your "compare" dataset.

RECODE ALL (0=1)(ELSE=0).
FORMATS ALL(f1).
LIST.

I find 1 1 0 1 etc to be much easier on the eyes than 1.00 1.00 .00 1.00.

David Marso wrote

ANPS (Another Non Python Solution).
--
data list free /ID Q1 Q2 Q3.
begin data
1 4 5 4
2 2 1 2
3 3 3 3
end data.
dataset name data1 .
SAVE OUTFILE 'G:\TEMP\d1'.

data list free /ID Q1 Q2 Q3.
begin data
1 4 4 4
2 2 2 2
3 3 3 3
end data.
dataset name data2 .
SAVE OUTFILE 'G:\TEMP\d2'.

dataset declare compare.
MATRIX .
GET D1 / FILE = 'G:\TEMP\d1' / VARIABLES= ALL/ NAMES=Vname.
GET D2 / FILE = 'G:\TEMP\d2' / VARIABLES =ALL/ NAMES=Vname.
SAVE (D1-D2) / OUTFILE compare / NAMES=VNAME.
END MATRIX.
DATASET ACTIVATE compare.
RECODE ALL (0=1)(ELSE=0).
LIST.

ID Q1 Q2 Q3

1.00 1.00 .00 1.00
1.00 1.00 .00 1.00
1.00 1.00 1.00 1.00

Number of cases read: 3 Number of cases listed: 3

Andy W wrote

Very nice Albert! Here is what I was writing for a non-python solution (although not quite as general).

1) Rename the variables in data2 so they don't conflict with data1, 2) after you match the files a simple do repeat structure will work. Just replace my [Q2List = Q2_1 to Q2_50] with whatever the names of the variables are in the second file.

vector Q3_(50,F1.0).
do repeat Q1List = Q1 to Q50 /Q2List = Q2_1 to Q2_50 /Q3List = Q3_1 to Q3_50.
compute Q3List = (Q1List EQ Q2List).
end repeat.

This presumes the Q1 ... Q50 variables are contiguous in the datafile. The renaming of variables is annoying, but you could either write a simple macro to take care of that for you or just make a set set of variables via vector with non-conflicting names.

Example on smaller datasets below.

*******************************************************.
data list free /ID Q1 Q2 Q3.
begin data
1 4 5 4
2 2 1 2
3 3 3 3
end data.
dataset name data1.

data list free /ID Q1 Q2 Q3.
begin data
1 4 4 4
2 2 2 2
3 3 3 3
end data.
dataset name data2.
rename variables (Q1 = Q2_1)
(Q2 = Q2_2)
(Q3 = Q2_3).

match files file = 'data1'
/file = 'data2'
/by ID.
dataset name data3.

vector Q3_(3,F1.0).
do repeat Q1List = Q1 to Q3 /Q2List = Q2_1 to Q2_3 /Q3List = Q3_1 to Q3_3.
compute Q3List = (Q1List EQ Q2List).
end repeat.
exe.
list.
*******************************************************.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

E. Bernardo

Re: matching values from two different datasets

Albert, Jon, Andy, David, Bruce and all:

Thank you for your help. Problem solved!

Eins

From: Bruce Weaver <[hidden email]>
To: [hidden email]
Sent: Thursday, July 11, 2013 11:45 PM
Subject: Re: matching values from two different datasets

Very nice, David! But I would add a FORMATS command to get rid of the
decimals in your "compare" dataset.

RECODE ALL (0=1)(ELSE=0).
*FORMATS ALL(f1).*
LIST.

I find 1 1 0 1 etc to be much easier on the eyes than 1.00 1.00 .00
1.00.

David Marso wrote

> ANPS (Another Non Python Solution).
> --
> data list free /ID Q1 Q2 Q3.
> begin data
> 1 4 5 4
> 2 2 1 2
> 3 3 3 3
> end data.
> dataset name data1 .
> SAVE OUTFILE 'G:\TEMP\d1'.
>
> data list free /ID Q1 Q2 Q3.
> begin data
> 1 4 4 4
> 2 2 2 2
> 3 3 3 3
> end data.
> dataset name data2 .
> SAVE OUTFILE 'G:\TEMP\d2'.
>
> dataset declare compare.
> MATRIX .
> GET D1 / FILE = 'G:\TEMP\d1' / VARIABLES= ALL/ NAMES=Vname.
> GET D2 / FILE = 'G:\TEMP\d2' / VARIABLES =ALL/ NAMES=Vname.
> SAVE (D1-D2) / OUTFILE compare / NAMES=VNAME.
> END MATRIX.
> DATASET ACTIVATE compare.
> RECODE ALL (0=1)(ELSE=0).
> LIST.
>
> ID Q1 Q2 Q3
>
> 1.00 1.00 .00 1.00
> 1.00 1.00 .00 1.00
> 1.00 1.00 1.00 1.00
>
>
> Number of cases read: 3 Number of cases listed: 3
> Andy W wrote
>> Very nice Albert! Here is what I was writing for a non-python solution
>> (although not quite as general).
>>
>> 1) Rename the variables in data2 so they don't conflict with data1, 2)
>> after you match the files a simple do repeat structure will work. Just
>> replace my [Q2List = Q2_1 to Q2_50] with whatever the names of the
>> variables are in the second file.
>>
>> vector Q3_(50,F1.0).
>> do repeat Q1List = Q1 to Q50 /Q2List = Q2_1 to Q2_50 /Q3List = Q3_1 to
>> Q3_50.
>> compute Q3List = (Q1List EQ Q2List).
>> end repeat.
>>
>> This presumes the Q1 ... Q50 variables are contiguous in the datafile.
>> The renaming of variables is annoying, but you could either write a
>> simple macro to take care of that for you or just make a set set of
>> variables via vector with non-conflicting names.
>>
>> Example on smaller datasets below.
>>
>> *******************************************************.
>> data list free /ID Q1 Q2 Q3.
>> begin data
>> 1 4 5 4
>> 2 2 1 2
>> 3 3 3 3
>> end data.
>> dataset name data1.
>>
>> data list free /ID Q1 Q2 Q3.
>> begin data
>> 1 4 4 4
>> 2 2 2 2
>> 3 3 3 3
>> end data.
>> dataset name data2.
>> rename variables (Q1 = Q2_1)
>> (Q2 = Q2_2)
>> (Q3 = Q2_3).
>>
>> match files file = 'data1'
>> /file = 'data2'
>> /by ID.
>> dataset name data3.
>>
>> vector Q3_(3,F1.0).
>> do repeat Q1List = Q1 to Q3 /Q2List = Q2_1 to Q2_3 /Q3List = Q3_1 to
>> Q3_3.
>> compute Q3List = (Q1List EQ Q2List).
>> end repeat.
>> exe.
>> list.
>> *******************************************************.

-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/matching-values-from-two-different-datasets-tp5721102p5721110.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Albert-Jan Roskam

Re: matching values from two different datasets

walang anuman, Eins ;-)

Regards,
Albert-Jan

From: E. Bernardo <[hidden email]>
To: [hidden email]
Sent: Friday, July 12, 2013 5:04 AM
Subject: Re: [SPSSX-L] matching values from two different datasets

Albert, Jon, Andy, David, Bruce and all:

Thank you for your help. Problem solved!

Eins

From: Bruce Weaver <[hidden email]>
To: [hidden email]
Sent: Thursday, July 11, 2013 11:45 PM
Subject: Re: matching values from two different datasets

Very nice, David! But I would add a FORMATS command to get rid of the
decimals in your "compare" dataset.

RECODE ALL (0=1)(ELSE=0).
*FORMATS ALL(f1).*
LIST.

I find 1 1 0 1 etc to be much easier on the eyes than 1.00 1.00 .00
1.00.

David Marso wrote

> ANPS (Another Non Python Solution).
> --
> data list free /ID Q1 Q2 Q3.
> begin data
> 1 4 5 4
> 2 2 1 2
> 3 3 3 3
> end data.
> dataset name data1 .
> SAVE OUTFILE 'G:\TEMP\d1'.
>
> data list free /ID Q1 Q2 Q3.
> begin data
> 1 4 4 4
> 2 2 2 2
> 3 3 3 3
> end data.
> dataset name data2 .
> SAVE OUTFILE 'G:\TEMP\d2'.
>
> dataset declare compare.
> MATRIX .
> GET D1 / FILE = 'G:\TEMP\d1' / VARIABLES= ALL/ NAMES=Vname.
> GET D2 / FILE = 'G:\TEMP\d2' / VARIABLES =ALL/ NAMES=Vname.
> SAVE (D1-D2) / OUTFILE compare / NAMES=VNAME.
> END MATRIX.
> DATASET ACTIVATE compare.
> RECODE ALL (0=1)(ELSE=0).
> LIST.
>
> ID Q1 Q2 Q3
>
> 1.00 1.00 .00 1.00
> 1.00 1.00 .00 1.00
> 1.00 1.00 1.00 1.00
>
>
> Number of cases read: 3 Number of cases listed: 3
> Andy W wrote
>> Very nice Albert! Here is what I was writing for a non-python solution
>> (although not quite as general).
>>
>> 1) Rename the variables in data2 so they don't conflict with data1, 2)
>> after you match the files a simple do repeat structure will work. Just
>> replace my [Q2List = Q2_1 to Q2_50] with whatever the names of the
>> variables are in the second file.
>>
>> vector Q3_(50,F1.0).
>> do repeat Q1List = Q1 to Q50 /Q2List = Q2_1 to Q2_50 /Q3List = Q3_1 to
>> Q3_50.
>> compute Q3List = (Q1List EQ Q2List).
>> end repeat.
>>
>> This presumes the Q1 ... Q50 variables are contiguous in the datafile.
>> The renaming of variables is annoying, but you could either write a
>> simple macro to take care of that for you or just make a set set of
>> variables via vector with non-conflicting names.
>>
>> Example on smaller datasets below.
>>
>> *******************************************************.
>> data list free /ID Q1 Q2 Q3.
>> begin data
>> 1 4 5 4
>> 2 2 1 2
>> 3 3 3 3
>> end data.
>> dataset name data1.
>>
>> data list free /ID Q1 Q2 Q3.
>> begin data
>> 1 4 4 4
>> 2 2 2 2
>> 3 3 3 3
>> end data.
>> dataset name data2.
>> rename variables (Q1 = Q2_1)
>> (Q2 = Q2_2)
>> (Q3 = Q2_3).
>>
>> match files file = 'data1'
>> /file = 'data2'
>> /by ID.
>> dataset name data3.
>>
>> vector Q3_(3,F1.0).
>> do repeat Q1List = Q1 to Q3 /Q2List = Q2_1 to Q2_3 /Q3List = Q3_1 to
>> Q3_3.
>> compute Q3List = (Q1List EQ Q2List).
>> end repeat.
>> exe.
>> list.
>> *******************************************************.

-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/matching-values-from-two-different-datasets-tp5721102p5721110.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD