SPSSX Discussion

Match Files

Classic

List

Threaded

16 messages Options

tlrodarte

Match Files

Hi,

I have two big database in two different years like this:

Year 1 Year 2

ID V1 V2 ID V1 V2
1 a c 1 b e
2 a j 1 a j
3 b k 2 a k
4 b l 2 c l
end 4 b l
7 c k
100 d e
.
.
.

I need to create a third column in Year one that returns the observed values of V1 and V2
in Year two. Is that possible?

Thank you in advance for advice.

Bruce Weaver

Re: Match Files

Administrator

Take a look at the /TABLE sub-command for MATCH FILES.

http://pic.dhe.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Fsyn_match_files_table.htm

You can see an example of it in the "one-to-many merge" example here:

http://www.ats.ucla.edu/stat/spss/modules/merge.htm

Note that the example given there is a bit dated in that it uses only files stored on the hard disk rather than two open datasets.

One more thing: Because your example has variables V1 and V2 in both files, you'll also need to pay attention to the /RENAME sub-command.

HTH.

tlrodarte wrote

Hi,

I have two big database in two different years like this:

Year 1 Year 2

ID V1 V2 ID V1 V2
1 a c 1 b e
2 a j 1 a j
3 b k 2 a k
4 b l 2 c l
end 4 b l
7 c k
100 d e
.
.
.

I need to create a third column in Year one that returns the observed values of V1 and V2
in Year two. Is that possible?

Thank you in advance for advice.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

David Marso

Re: Match Files

Administrator

In reply to this post by tlrodarte

Flatten the Year2 file using CASESTOVARS and then just do a normal 1:1 match into Year1 BY ID.
If this doesn't address the issue then you will have to take the time to explain your problem in greater detail (ie example of actual inputs and desired output).

tlrodarte wrote

Hi,

I have two big database in two different years like this:

Year 1 Year 2

ID V1 V2 ID V1 V2
1 a c 1 b e
2 a j 1 a j
3 b k 2 a k
4 b l 2 c l
end 4 b l
7 c k
100 d e
.
.
.

I need to create a third column in Year one that returns the observed values of V1 and V2
in Year two. Is that possible?

Thank you in advance for advice.

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

David Marso

Re: Match Files

Administrator

In reply to this post by Bruce Weaver

Bruce,
Are you sure that applies to the OP's request?

Bruce Weaver wrote

Take a look at the /TABLE sub-command for MATCH FILES.

http://pic.dhe.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Fsyn_match_files_table.htm

You can see an example of it in the "one-to-many merge" example here:

http://www.ats.ucla.edu/stat/spss/modules/merge.htm

Note that the example given there is a bit dated in that it uses only files stored on the hard disk rather than two open datasets.

One more thing: Because your example has variables V1 and V2 in both files, you'll also need to pay attention to the /RENAME sub-command.

HTH.

tlrodarte wrote

Hi,

I have two big database in two different years like this:

Year 1 Year 2

ID V1 V2 ID V1 V2
1 a c 1 b e
2 a j 1 a j
3 b k 2 a k
4 b l 2 c l
end 4 b l
7 c k
100 d e
.
.
.

I need to create a third column in Year one that returns the observed values of V1 and V2
in Year two. Is that possible?

Thank you in advance for advice.

Bruce Weaver

Re: Match Files

Administrator

No, I'm not sure. But then I'm not entirely sure what the OP wants the final file to look like, now that I reread it. ;-)

"I need to create a third column in Year one that returns the observed values of V1 and V2
in Year two."

It would seem that they need at least 2 new variables. In the example provided, IDs 1 and 2 each have two rows of data, so two V1 values and two V2 values. Does the OP want the values from the first row only, or all values?

Anyway, your approach of applying CASESTOVARS before 1:1 matching is probably the way to go.

David Marso wrote

Bruce,
Are you sure that applies to the OP's request?

Bruce Weaver wrote

Take a look at the /TABLE sub-command for MATCH FILES.

http://pic.dhe.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Fsyn_match_files_table.htm

You can see an example of it in the "one-to-many merge" example here:

http://www.ats.ucla.edu/stat/spss/modules/merge.htm

Note that the example given there is a bit dated in that it uses only files stored on the hard disk rather than two open datasets.

One more thing: Because your example has variables V1 and V2 in both files, you'll also need to pay attention to the /RENAME sub-command.

HTH.

tlrodarte wrote

Hi,

I have two big database in two different years like this:

Year 1 Year 2

ID V1 V2 ID V1 V2
1 a c 1 b e
2 a j 1 a j
3 b k 2 a k
4 b l 2 c l
end 4 b l
7 c k
100 d e
.
.
.

I need to create a third column in Year one that returns the observed values of V1 and V2
in Year two. Is that possible?

Thank you in advance for advice.

bdates

Re: Match Files

I think we need more information from the OP. I could be that the third column in Year One might be contain a transformation of V1 and V2 in Year Two, e.g., sum, product, etc.

B
________________________________________
From: SPSSX(r) Discussion [[hidden email]] on behalf of Bruce Weaver [[hidden email]]
Sent: Sunday, January 19, 2014 1:17 PM
To: [hidden email]
Subject: Re: Match Files

No, I'm not sure. But then I'm not entirely sure what the OP wants the final
file to look like, now that I reread it. ;-)

"I need to create a third column in Year one that returns the observed
values of V1 and V2
in Year two."

It would seem that they need at least 2 new variables. In the example
provided, IDs 1 and 2 each have two rows of data, so two V1 values and two
V2 values. Does the OP want the values from the first row only, or all
values?

Anyway, your approach of applying CASESTOVARS before 1:1 matching is
probably the way to go.

David Marso wrote

> Bruce,
> Are you sure that applies to the OP's request?
> Bruce Weaver wrote
>> Take a look at the /TABLE sub-command for MATCH FILES.
>>
>> http://pic.dhe.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Fsyn_match_files_table.htm
>>
>> You can see an example of it in the "one-to-many merge" example here:
>>
>> http://www.ats.ucla.edu/stat/spss/modules/merge.htm
>>
>> Note that the example given there is a bit dated in that it uses only
>> files stored on the hard disk rather than two open datasets.
>>
>> One more thing: Because your example has variables V1 and V2 in both
>> files, you'll also need to pay attention to the /RENAME sub-command.
>>
>> HTH.
>>
>> tlrodarte wrote
>>> Hi,
>>>
>>> I have two big database in two different years like this:
>>>
>>> Year 1 Year 2
>>>
>>> ID V1 V2 ID V1 V2
>>> 1 a c 1 b e
>>> 2 a j 1 a j
>>> 3 b k 2 a k
>>> 4 b l 2 c l
>>> end 4 b l
>>> 7 c k
>>> 100 d e
>>> .
>>> .
>>> .
>>>
>>> I need to create a third column in Year one that returns the observed
>>> values of V1 and V2
>>> in Year two. Is that possible?
>>>
>>> Thank you in advance for advice.

-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Match-Files-tp5723992p5723996.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

tlrodarte

Re: Match Files

Instead of third column i should have written third and fourth columns for that example; in fact there are 7 columns (V1...V7) in my database. Also, i only need the information in the first row. Before finding this discussion group i tried to do like the "one-to-many merge" above, but it failed in the sense that the information from the second year did not appear in the Data Editor: the variables (renamed) were there, but the values were missing.

I will try CASESTOVARS and return to you later. Thanks again!

tlrodarte

Re: Match Files

Correction:

... "the information from the FIRST year did not appear in the Data Editor"...

Bruce Weaver

Re: Match Files

Administrator

In reply to this post by tlrodarte

I suspect you just needed an EXECUTE after your MATCH FILES. When that is the case, you'll see "Transformations pending" in an information area at the bottom of the data editor.

tlrodarte wrote

Instead of third column i should have written third and fourth columns for that example; in fact there are 7 columns (V1...V7) in my database. Also, i only need the information in the first row. Before finding this discussion group i tried to do like the "one-to-many merge" above, but it failed in the sense that the information from the second year did not appear in the Data Editor: the variables (renamed) were there, but the values were missing.

I will try CASESTOVARS and return to you later. Thanks again!

tlrodarte

Re: Match Files

I see that message in the information area, but using EXECUTE does not change the result: the informations of the first year "disappear" from the database.

David Marso

Re: Match Files

Administrator

OK! At no point do you post your syntax, the specifics of what the outcome should look like etc!

If you expect people to assist you might want to be more informative!

I stand by my original suggestion to flatten the file with CASESTOVARS and then MATCH 1:1 by ID

On Sun, Jan 19, 2014 at 3:55 PM, tlrodarte [via SPSSX Discussion] <[hidden email]> wrote:

I see that message in the information area, but using EXECUTE does not change the result: the informations of the first year "disappear" from the database.

If you reply to this email, your message will be added to the discussion below:

http://spssx-discussion.1045642.n5.nabble.com/Match-Files-tp5723992p5724001.html

To unsubscribe from Match Files, click here.
NAML

Bruce Weaver

Re: Match Files

Administrator

Yes, providing more info at the outset would have been good. Here are the two data sets shown in the first post in the thread.

Year 1 Year 2

ID V1 V2 ID V1 V2
1 a c 1 b e
2 a j 1 a j
3 b k 2 a k
4 b l 2 c l
end 4 b l
7 c k
100 d e
.
.
.

1. What do you want the merged dataset to look like?

2. What syntax are you using to try to merge the datasets?

David Marso wrote

OK! At no point do you post your syntax, the specifics of what the outcome
should look like etc!
If you expect people to assist you might want to be more informative!
I stand by my original suggestion to flatten the file with CASESTOVARS and
then MATCH 1:1 by ID

On Sun, Jan 19, 2014 at 3:55 PM, tlrodarte [via SPSSX Discussion] <
[hidden email]> wrote:

> I see that message in the information area, but using EXECUTE does not
> change the result: the informations of the first year "disappear" from the
> database.
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
>
> http://spssx-discussion.1045642.n5.nabble.com/Match-Files-tp5723992p5724001.html
> To unsubscribe from Match Files, click here<http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=5723992&code=ZGF2aWQubWFyc29AZ21haWwuY29tfDU3MjM5OTJ8LTkzOTgzMDYxMw==>
> .
> NAML<http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>

David Marso

Re: Match Files

Administrator

It rarely gets easier than this ;-)
CASESTOVARS /ID=ID.
MATCH FILES
/ FILE year1 /IN=IN_Year1
/ FILE year2 /IN=IN_year2
/ BY ID .
LIST.

ID v1 v2 v1.1 v1.2 v2.1 v2.2 IN_Year1 IN_year2

1 a c b a e j 1 1
2 a j a c k l 1 1
3 b k 1 0
4 b l b c l k 1 1

Bruce Weaver wrote

Yes, providing more info at the outset would have been good. Here are the two data sets shown in the first post in the thread.

Year 1 Year 2

ID V1 V2 ID V1 V2
1 a c 1 b e
2 a j 1 a j
3 b k 2 a k
4 b l 2 c l
end 4 b l
7 c k
100 d e
.
.
.

1. What do you want the merged dataset to look like?

2. What syntax are you using to try to merge the datasets?

David Marso wrote

OK! At no point do you post your syntax, the specifics of what the outcome
should look like etc!
If you expect people to assist you might want to be more informative!
I stand by my original suggestion to flatten the file with CASESTOVARS and
then MATCH 1:1 by ID

On Sun, Jan 19, 2014 at 3:55 PM, tlrodarte [via SPSSX Discussion] <
[hidden email]> wrote:

> I see that message in the information area, but using EXECUTE does not
> change the result: the informations of the first year "disappear" from the
> database.
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
>
> http://spssx-discussion.1045642.n5.nabble.com/Match-Files-tp5723992p5724001.html
> To unsubscribe from Match Files, click here<http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=5723992&code=ZGF2aWQubWFyc29AZ21haWwuY29tfDU3MjM5OTJ8LTkzOTgzMDYxMw==>
> .
> NAML<http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>

tlrodarte

Re: Match Files

In reply to this post by Bruce Weaver

Ok, i will explain better:

Below is the syntax that i used and resulted into the error i mentioned before (data missing):

Syntax1.sps

My files refers to data related to under graduate students and those students are identified by the numeric variable 'CO ALUNOS'. I have some informations (for example access or not to public financing) about the students in the year of 2009 and i have these students in 2010 and many others. I need to "capture" the students of 2009 in the database of 2010 to get the same informations that i have of 2009 in the data base of 2010 in order to find out the evolution of the students situation. So the merged data should contain the students of 2009 and the informations related to them in 2009 and in 2010.

Rich Ulrich

Re: Match Files

Without fixing your eventual problem, here is what is wrong with
this .sps file, and how to improve it.

1) You match by CO_ALUNOS: Therefore, you do not want to RENAME it, and
you do not want to DROP it.

2) The file renames everything in the first file, and then DROPs them.
Why bother? Use KEEP= CO_ALUNOS and those others will never appear.
Or, if you want them, you can't DROP them.

David's solution still seems like it was the good one, but you seem to
have changed the description of the problem so that it does not apply.

If you can put all the data in one file, with ID and YEAR as identifier,
and using the one set of variable names... dropping the duplicated or
unneeded variables... then you can use VarsToCases to combine the several
years.

--
Rich Ulrich

----------------------------------------

> Date: Sun, 19 Jan 2014 14:43:12 -0800
> From: [hidden email]
> Subject: Re: Match Files
> To: [hidden email]
>
> Ok, i will explain better:
>
> Below is the syntax that i used and resulted into the error i mentioned
> before (data missing):
>
> Syntax1.sps
> <http://spssx-discussion.1045642.n5.nabble.com/file/n5724005/Syntax1.sps>
>
> My files refers to data related to under graduate students and those
> students are identified by the numeric variable 'CO ALUNOS'. I have some
> informations (for example access or not to public financing) about the
> students in the year of 2009 and i have these students in 2010 and many
> others. I need to "capture" the students of 2009 in the database of 2010 to
> get the same informations that i have of 2009 in the data base of 2010 in
> order to find out the evolution of the students situation. So the merged
> data should contain the students of 2009 and the informations related to
> them in 2009 and in 2010.
>
>

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Richard Ristow

Re: Match Files

In reply to this post by tlrodarte

At 05:43 PM 1/19/2014, tlrodarte wrote:

>My files refers to data related to under graduate students and those
>students are identified by the numeric variable 'CO_ALUNOS'. I have
>some informations (for example access or not to public financing)
>about the students in the year of 2009 and i have these students in
>2010 and many others. I need to "capture" the students of 2009 in
>the database of 2010 to get the same informations that i have of
>2009 in the data base of 2010 in order to find out the evolution of
>the students situation. So the merged data should contain the
>students of 2009 and the informations related to them in 2009 and in 2010.

You have two things that are making this difficult. One is that the
same variables occur in both files, so that you have to rename when
matching. The other is that you appear to have (according to your
example data) more than one record per student in one of the files.
You wrote, earlier, "I only need the information in the first row."
Does that mean, only the first record for each student, if there's
more than one?

See if this does what you want. You have to rename the variables in
Year 2 so they don't conflict with names in Year 1; CASESTOVARS is a
tricky way to do that.

See if this does what you want.

|-----------------------------|---------------------------|
|Output Created |20-JAN-2014 00:03:06 |
|-----------------------------|---------------------------|
[Year_1]
Student V1 V2
1 a c
2 a j
3 b k
4 b l
Number of cases read: 4 Number of cases listed: 4

DATASET ACTIVATE Year_2 WINDOW=FRONT.
LIST.
List
|-----------------------------|---------------------------|
|Output Created |20-JAN-2014 00:03:06 |
|-----------------------------|---------------------------|
[Year_2]
Student V1 V2
1 b e
1 a j
2 a k
2 c l
4 b l
7 c k
Number of cases read: 6 Number of cases listed: 6

* Get Year 2 data ready for merge: .

DATASET ACTIVATE Year_2 WINDOW=FRONT.
* Keep only the first record for each student .

ADD FILES
/FILE= *
/BY Student
/FIRST=Record1.

SELECT IF Record1.

* Rename the variables .

CASESTOVARS
/ID = Student
/INDEX = Record1
/GROUPBY = VARIABLE
/AUTOFIX = NO .

Cases to Variables
Notes [suppressed]

LIST.
List
|-----------------------------|---------------------------|
|Output Created |20-JAN-2014 00:03:06 |
|-----------------------------|---------------------------|
[Year_2]
Student V1.1 V2.1
1 b e
2 a k
4 b l
7 c k
Number of cases read: 4 Number of cases listed: 4

* Merge Year 1 and Year 2 data: .

MATCH FILES
/FILE=Year_1 /IN=In_Yr_1
/FILE=Year_2
/BY Student.

SELECT IF In_Yr_1.

LIST.
List
|-----------------------------|---------------------------|
|Output Created |20-JAN-2014 00:03:07 |
|-----------------------------|---------------------------|
Student V1 V2 V1.1 V2.1 In_Yr_1
1 a c b e 1
2 a j a k 1
3 b k 1
4 b l b l 1
Number of cases read: 4 Number of cases listed: 4
=============================
APPENDIX: Test data and code
=============================
* C:\Documents and Settings\Richard\My Documents .
* \Technical\spssx-l\Z-2014\ .
* 2014-01-19 tlrodarte-Match Files.SPS .

* In response to posting .
* Date: Sun, 19 Jan 2014 14:43:12 -0800 .
* From: tlrodarte <[hidden email]> .
* Subject: Re: Match Files .
* To: [hidden email] .
* and earlier in the same thread. .

* It appears to be a straightforward MATCH FILES, adding data from .
* one year to student data from another year, complicated by .
* having more than one record per student in File 2. .

DATA LIST LIST/
Student V1 V2
(F2, 2A1).
BEGIN DATA
1 a c
2 a j
3 b k
4 b l
END DATA.
DATASET NAME Year_1.

DATA LIST LIST/
Student V1 V2
(F2, 2A1).
BEGIN DATA
1 b e
1 a j
2 a k
2 c l
4 b l
7 c k
END DATA.
DATASET NAME Year_2.

DATASET ACTIVATE Year_1 WINDOW=FRONT.
LIST.

DATASET ACTIVATE Year_2 WINDOW=FRONT.
LIST.

* GET Year 2 data ready for merge: .

DATASET ACTIVATE Year_2 WINDOW=FRONT.

* Keep only the first record for each student .

ADD FILES
/FILE= *
/BY Student
/FIRST=Record1.

SELECT IF Record1.

* Rename the variables .

CASESTOVARS
/ID = Student
/INDEX = Record1
/GROUPBY = VARIABLE
/AUTOFIX = NO .

LIST.

* Merge Year 1 and Year 2 data: .

MATCH FILES
/FILE=Year_1 /IN=In_Yr_1
/FILE=Year_2
/BY Student.

SELECT IF In_Yr_1.

LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD