Adding new variables with file-specific names to monthly data via SPSS Macro?

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Adding new variables with file-specific names to monthly data via SPSS Macro?

Patrick Burns
I am trying to write a macro for the first time (!), and am not sure if I am
approaching it correctly.

Using the syntax below, I am trying to open a series of 10 files, adding
about 6 variables to each one, and using a macro and the DATA LIST at the
top to make the variable names specific to the file.

In a sense, I’m trying to use the DATA LIST at the top as ‘scratch’ data
that the macro uses, one row at a time.


*********************************************************

SET MPRINT OFF.

DATA LIST FREE(TAB) /file1(A50)   file2(A7)           file3(A8)
marker(A8)      a1(A8) a2(A8) a3(A8) a4(A8)            a5(A8).
BEGIN DATA.
2010\2010-01 Jan\Case Addresses 2010-01 2010-01 Jan 2010 CAy10m01
A1y10m01 A2y10m01 A3y10m01 A4y10m01 A5y10m01
2010\2010-02 Feb\Case Addresses 2010-02 2010-02 Feb 2010 CAy10m02
A1y10m02 A2y10m02 A3y10m02 A4y10m02 A5y10m02
2010\2010-03 Mar\Case Addresses 2010-03 2010-03 Mar 2010 CAy10m03
A1y10m03 A2y10m03 A3y10m03 A4y10m03 A5y10m03
2010\2010-04 Apr\Case Addresses 2010-04 2010-04 Apr 2010 CAy10m04
A1y10m04 A2y10m04 A3y10m04 A4y10m04 A5y10m04
2010\2010-05 May\Case Addresses 2010-05 2010-05 May 2010 CAy10m05
A1y10m05 A2y10m05 A3y10m05 A4y10m05 A5y10m05
2010\2010-06 Jun\Case Addresses 2010-06 2010-06 Jun 2010 CAy10m06
A1y10m06 A2y10m06 A3y10m06 A4y10m06 A5y10m06
2010\2010-07 Jul\Case Addresses 2010-07 2010-07 Jul 2010 CAy10m07
A1y10m07 A2y10m07 A3y10m07 A4y10m07 A5y10m07
2010\2010-08 Aug\Case Addresses 2010-08 2010-08 Aug 2010 CAy10m08
A1y10m08 A2y10m08 A3y10m08 A4y10m08 A5y10m08
2010\2010-09 Sep\Case Addresses 2010-09 2010-09 Sep 2010 CAy10m09
A1y10m09 A2y10m09 A3y10m09 A4y10m09 A5y10m09
2010\2010-10 Oct\Case Addresses 2010-10 2010-10 Oct 2010 CAy10m10
A1y10m10 A2y10m10 A3y10m10 A4y10m10 A5y10m10
2010\2010-11 Nov\Case Addresses 2010-11 2010-11 Nov 2010 CAy10m11
A1y10m11 A2y10m11 A3y10m11 A4y10m11 A5y10m11
2010\2010-12 Dec\Case Addresses 2010-12 2010-12 Dec 2010 CAy10m12
A1y10m12 A2y10m12 A3y10m12 A4y10m12 A5y10m12
END DATA.

*///////////////////////////////.
DEFINE !geographyupdate ( file1 = !TOKENS(1) /
                                         file2 = !TOKENS(1) /
                                         file3 = !TOKENS(1) /
                                         marker = !TOKENS(1) /
                                         a1 = !TOKENS(1) /
                                         a2 = !TOKENS(1) /
                                         a3 = !TOKENS(1) /
                                         a4 = !TOKENS(1) /
                                         a5 = !TOKENS(1) ).
!DO !var=1 !TO 12

GET FILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file1,".sav")).

COMPUTE !marker=1.
RECODE Adr_St_Type  ('AL'='ALY') ('AR'='AVE') ('AV'='AVE') ('BO'='BLVD')
('BY'='BYP') ('CA'='CALLE') ('CE'='CT') ('CI'='CIR') ('CO'='CT')
 ('DV'='DR') ('GV'='GRV') ('HW'='HWY') ('LA'='LN') ('LN'='LN') ('MS'='')
('PA'='PARK') ('PB'='PO B') ('PL'='PL') ('PW'='PKWY') ('PZ'='PLZ')
 ('RE'='RTE') ('RO'='RD') ('SE'='ST') ('SM'='ST') ('SN'='ST') ('TE'='TER')
('TL'='TRL') ('VA'='VLY') ('VI'='VIEW') ('VT'='VISTA') ('WA'='WAY')
 ('WY'='WAY')
/ Adr_St_Dir ('EA'='E') ('NO'='N') ('SO'='S') ('WE'='W').
EXECUTE.

STRING !a1 (A36) / !a2 (A4) / !a3 (A15) / !a4 (A2) / !a5 (A5).
COMPUTE !a1 = concat(rtrim(Adr_Num)," ",rtrim(Adr_Frac),"
",rtrim(Adr_St_Dir)," ",rtrim(Adr_Street)," ",rtrim(Adr_St_Type)).
COMPUTE !a2 = rtrim(Adr_Apt).
COMPUTE !a3 = rtrim(Adr_City).
COMPUTE !a4 = rtrim(Adr_State).
COMPUTE !a5 = rtrim(Adr_Zip5).
EXECUTE.

VARIABLE LABELS !marker !QUOTE(!CONCAT(!file3," Case Marker"))
/!a1 'Address, CASE Address file'
/!a2 'Unit, CASE Address file'
/!a3 'City, CASE Address file'
/!a4 'State, CASE Address file'
/!a5 'ZIP, CASE Address file'.
VARIABLE WIDTH !a1 (20) / !a2 (8) / !a3 (12) / !a4 (8) / !a5 (8).
VARIABLE LEVEL !marker (NOMINAL).
VALUE LABELS !marker 1 !QUOTE(!CONCAT(!file3," Case Record")).
EXECUTE.

SAVE OUTFILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file,".sav")).
!DOEND
!ENDDEFINE.

*///////////////////////////////.

*Next line starts the whole process.
SET MPRINT=yes.
!geographyupdate



Any advice would be helpful.

==============================
Patrick Burns
Economic Roundtable
Los Angeles, CA, USA

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Patrick Burns, Senior Researcher Economic Roundtable 315 W. 9th Street, Suite 502 Los Angeles, CA, 90015-4200 http://www.economicrt.org
Reply | Threaded
Open this post in threaded view
|

Automatic reply: Adding new variables with file-specific names to monthly data via SPSS Macro?

Kevan Edwards (MDH)

I am currently out of the office and will return on Tues July 10th.  I will have limited to no email access during that time.  I look forward to responding to your email when I return.

 

Thank You! 

 

Reply | Threaded
Open this post in threaded view
|

Automatic reply: Adding new variables with file-specific names to monthly data via SPSS Macro?

Mikki Haegle
In reply to this post by Patrick Burns

I am off enjoying some vacation time.

I will return on July 10th and will respond to messages once I am back in the Lab.

 

~M

 

 

 

 

Mikki Haegle

Psychology Lab Coordinator

700 E 7th Street, NM-L202

St Paul, MN 55106

651.793.1354

 

Reply | Threaded
Open this post in threaded view
|

Automatic reply: Adding new variables with file-specific names to monthly data via SPSS Macro?

Looman, Wendy
In reply to this post by Patrick Burns

I am out of the office until Monday, July 9, 2012. If you need assistance prior to July 9th, please contact: David M. Bass, Vice President for Research, Benjamin Rose Institute on Aging, Margaret Blenkner Research Institute, 11900 Fairhill Road, #300, Cleveland, OH 44120-1053. Phone: 216-373-1664. Email: [hidden email]

 

 

-----------------------------------------------------------------------------------------------------------------
CONFIDENTIALITY NOTICE: The documents accompanying this email may contain confidential information, which is legally privileged. This information is intended only for the use of the recipient named above. If you are not the addressee or the employee or agent of the intended recipient, you are hereby notified that you are strictly prohibited from printing, storing, disseminating, distributing, or copying this communication. If you have received this communication in error, please notify us immediately by replying to the message and deleting it from your computer.
-----------------------------------------------------------------------------------------------------------------
Reply | Threaded
Open this post in threaded view
|

Automatic reply: Adding new variables with file-specific names to monthly data via SPSS Macro?

Jenenne Geske
In reply to this post by Patrick Burns

I will be out of the office until Wednesday, July 11. I will respond to your e-mail as soon as possible on my return.

Reply | Threaded
Open this post in threaded view
|

Re: Adding new variables with file-specific names to monthly data via SPSS Macro?

Art Kendall
In reply to this post by Patrick Burns
top to make the variable names specific to the file.
This seems like an unusual thing to do.
Is it possible that you could achieve your goal by keeping the names the same and adding variables such as which file the cases came from,

If you are just starting out it might be more help in the long run if you describe what you are trying to do in more detail.
It might turn out that for this application macros are the way to go. This might be the case if your group already has a lot invested in macros.
It might turn out that for this application there is a syntax way to go.
It might turn out that for this application python would be the way to go.

I don't have a firm handle on what macros can do that python cannot.  I have the impression that for new applications python would be more advisable as it is said to do things that macros cannot easily do.

Please explain in more detail what you are trying to do.
Also are you working with a group that already has a lot of learning invested in macros?  Or might the newer approach - python- be advisable?

Art Kendall
Social Research Consultants
On 7/3/2012 8:34 PM, Patrick Burns wrote:
I am trying to write a macro for the first time (!), and am not sure if I am
approaching it correctly.

Using the syntax below, I am trying to open a series of 10 files, adding
about 6 variables to each one, and using a macro and the DATA LIST at the
top to make the variable names specific to the file.

In a sense, I’m trying to use the DATA LIST at the top as ‘scratch’ data
that the macro uses, one row at a time.


*********************************************************

SET MPRINT OFF.

DATA LIST FREE(TAB) /file1(A50)   file2(A7)           file3(A8)
marker(A8)      a1(A8) a2(A8) a3(A8) a4(A8)            a5(A8).
BEGIN DATA.
2010\2010-01 Jan\Case Addresses 2010-01 2010-01 Jan 2010 CAy10m01
A1y10m01 A2y10m01 A3y10m01 A4y10m01 A5y10m01
2010\2010-02 Feb\Case Addresses 2010-02 2010-02 Feb 2010 CAy10m02
A1y10m02 A2y10m02 A3y10m02 A4y10m02 A5y10m02
2010\2010-03 Mar\Case Addresses 2010-03 2010-03 Mar 2010 CAy10m03
A1y10m03 A2y10m03 A3y10m03 A4y10m03 A5y10m03
2010\2010-04 Apr\Case Addresses 2010-04 2010-04 Apr 2010 CAy10m04
A1y10m04 A2y10m04 A3y10m04 A4y10m04 A5y10m04
2010\2010-05 May\Case Addresses 2010-05 2010-05 May 2010 CAy10m05
A1y10m05 A2y10m05 A3y10m05 A4y10m05 A5y10m05
2010\2010-06 Jun\Case Addresses 2010-06 2010-06 Jun 2010 CAy10m06
A1y10m06 A2y10m06 A3y10m06 A4y10m06 A5y10m06
2010\2010-07 Jul\Case Addresses 2010-07 2010-07 Jul 2010 CAy10m07
A1y10m07 A2y10m07 A3y10m07 A4y10m07 A5y10m07
2010\2010-08 Aug\Case Addresses 2010-08 2010-08 Aug 2010 CAy10m08
A1y10m08 A2y10m08 A3y10m08 A4y10m08 A5y10m08
2010\2010-09 Sep\Case Addresses 2010-09 2010-09 Sep 2010 CAy10m09
A1y10m09 A2y10m09 A3y10m09 A4y10m09 A5y10m09
2010\2010-10 Oct\Case Addresses 2010-10 2010-10 Oct 2010 CAy10m10
A1y10m10 A2y10m10 A3y10m10 A4y10m10 A5y10m10
2010\2010-11 Nov\Case Addresses 2010-11 2010-11 Nov 2010 CAy10m11
A1y10m11 A2y10m11 A3y10m11 A4y10m11 A5y10m11
2010\2010-12 Dec\Case Addresses 2010-12 2010-12 Dec 2010 CAy10m12
A1y10m12 A2y10m12 A3y10m12 A4y10m12 A5y10m12
END DATA.

*///////////////////////////////.
DEFINE !geographyupdate ( file1 = !TOKENS(1) /
                                         file2 = !TOKENS(1) /
                                         file3 = !TOKENS(1) /
                                         marker = !TOKENS(1) /
                                         a1 = !TOKENS(1) /
                                         a2 = !TOKENS(1) /
                                         a3 = !TOKENS(1) /
                                         a4 = !TOKENS(1) /
                                         a5 = !TOKENS(1) ).
!DO !var=1 !TO 12

GET FILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file1,".sav")).

COMPUTE !marker=1.
RECODE Adr_St_Type  ('AL'='ALY') ('AR'='AVE') ('AV'='AVE') ('BO'='BLVD')
('BY'='BYP') ('CA'='CALLE') ('CE'='CT') ('CI'='CIR') ('CO'='CT')
 ('DV'='DR') ('GV'='GRV') ('HW'='HWY') ('LA'='LN') ('LN'='LN') ('MS'='')
('PA'='PARK') ('PB'='PO B') ('PL'='PL') ('PW'='PKWY') ('PZ'='PLZ')
 ('RE'='RTE') ('RO'='RD') ('SE'='ST') ('SM'='ST') ('SN'='ST') ('TE'='TER')
('TL'='TRL') ('VA'='VLY') ('VI'='VIEW') ('VT'='VISTA') ('WA'='WAY')
 ('WY'='WAY')
/ Adr_St_Dir ('EA'='E') ('NO'='N') ('SO'='S') ('WE'='W').
EXECUTE.

STRING !a1 (A36) / !a2 (A4) / !a3 (A15) / !a4 (A2) / !a5 (A5).
COMPUTE !a1 = concat(rtrim(Adr_Num)," ",rtrim(Adr_Frac),"
",rtrim(Adr_St_Dir)," ",rtrim(Adr_Street)," ",rtrim(Adr_St_Type)).
COMPUTE !a2 = rtrim(Adr_Apt).
COMPUTE !a3 = rtrim(Adr_City).
COMPUTE !a4 = rtrim(Adr_State).
COMPUTE !a5 = rtrim(Adr_Zip5).
EXECUTE.

VARIABLE LABELS !marker !QUOTE(!CONCAT(!file3," Case Marker"))
/!a1 'Address, CASE Address file'
/!a2 'Unit, CASE Address file'
/!a3 'City, CASE Address file'
/!a4 'State, CASE Address file'
/!a5 'ZIP, CASE Address file'.
VARIABLE WIDTH !a1 (20) / !a2 (8) / !a3 (12) / !a4 (8) / !a5 (8).
VARIABLE LEVEL !marker (NOMINAL).
VALUE LABELS !marker 1 !QUOTE(!CONCAT(!file3," Case Record")).
EXECUTE.

SAVE OUTFILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file,".sav")).
!DOEND
!ENDDEFINE.

*///////////////////////////////.

*Next line starts the whole process.
SET MPRINT=yes.
!geographyupdate



Any advice would be helpful.

==============================
Patrick Burns
Economic Roundtable
Los Angeles, CA, USA

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: Adding new variables with file-specific names to monthly data via SPSS Macro?

Andy W
In reply to this post by Patrick Burns

It appears you have some general confusion about how macros are implemented, but fortunately it doesn't appear that much updating to your code will be needed. When you write a macro you typically pass it arguments directly, the macro is then expanded to have those string arguments included in literal syntax (as Raynald says on his website, macros are just a string parser). You don't pass data from an SPSS matrix into the macro.

So for instance in your current macro, if you call

!geographyupdate file1 = file1

The GET FILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file1,".sav")). part of the macro is expanded to call the syntax command GET FILE = "d:\LMI\Data\file1.sav". Similarly if passed the argument file1 = whatever it would be expanded to open the file "d:\LMI\Data\whatever.sav". Hopefully this clears up what macros do (basically insert defined strings into specific places).

Also to note, it appears you perhaps thought the !DO !var=1 !TO 12 looped through the individual cases in the original data file you defined at the beginning of the code. It does not. It would expand to the exact same code 12 times, because you don't call the iterating !var variable anywhere in the subsequent code.

So to make the code do what I think you want it to do will only take a few amendments. 1) Take out the do loop within the macro 2) assign the arguments when you make the macro call explicitly. For example, amending the first case in the original data list command to call the macro would look like (or at least my best guess what it would look like, the tabbed data list was a bit garbled pasting into NABBLE). Also note I'm not sure how the file1 token with spaces will be passed, but conceptually this is how you would do it.

!geographyupdate file1 = 2010\2010-01 Jan\Case Addresses 2010-01 file 2 = 2010-01 file3 = Jan 2010 marker = CAy10m01 a1 = A1y10m01 a2 = A2y10m01 a3 = A3y10m01 a4 = A4y10m01 a5 = A5y10m01

It isn't real clear to me what your goals are and how your data are currently set up, so I can't really comment if there is an obviously better way to do things. Hopefully this clears up some confusion on how macros operate, and you can either adapt the code to be more useable for your circumstance or ask a more refined question.

I'd help to re-write what you currently have to what I think you want it to do, but I'm not quite sure what tokens are actually meant to be file1, file2 and file3. Also I'm highly suspect that making a bunch of different files with different variable names is a good workflow (so perhaps a more abstract explanation of what you are trying to accomplish would be helpful).

HTH, Andy

PS: The SAVE OUTFILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file,".sav")). part did you intend for !file to be !file2 or something else?

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/
Reply | Threaded
Open this post in threaded view
|

Re: Adding new variables with file-specific names to monthly data via SPSS Macro?

Art Kendall
In reply to this post by Art Kendall
I'll have to look at his again later.
I maybe wrong but it sounds like you are going for a "wide" layout rather than a "long" layout.�
I still don't see why you can't have a unique id for each household and have multiple records each having a field that says which file it came from, and a field for year and month..
That way variable names could be the same no matter what the source and you could do the transforms on one pass.

Depending how you data is not stored, IIRC there are already python procedures to take all files in a folder and concatenate them adding a variables that says which file a record came from/

To speed things up you could find the duplicate household records and only send unique address to geocoding.

Art Kendall
Social Research Consultants
On 7/4/2012 1:41 PM, Patrick Burns wrote:

Art,

Thanks very much for your reply and questions.� Another Listserv member (Ruben van den Berg) is also encouraging me to learn Python instead, and I’ve only been looking at SPSS Macros for the last week or so.� My group only uses SPSS Syntax, and has no experience using SPSS Macros before, so switching to Python is fine as long as I can find some good resources on-line (including code examples) to learn from.

My data situation:� I have 9 years of monthly files of addresses (9 x 12 = 108 files total), hence my interest in ‘automating’ my approach to modifying them via SPSS Macros or however.� (I just used one year of address files, 2010, in my syntax below for simplicity’s sake.)

For each of the 108 monthly files, I am seeking to:

  1. Open it
  2. � RECODE two of the variables, Adr_St_Type� and Adr_St_Dir, so that the improper naming conventions are corrected.� Adr_St_Type represents the type of address such as street, road, avenue, boulevard) while Adr_St_Dir is the address direction, such as North, South, etc.� The abbreviations used in the original data set are flawed; I’m correcting them so that I can geocode the addresses later using GIS software.
  3. Concatenate a set of string variables (Adr_Num, Adr_Frac, Adr_St_Dir, Adr_Street, Adr_St_Type) into a new address variable.� These need to be concatenated into a single variable for later use in other programs for geocoding (GIS) and probabilistic matching (seeking persons with the same address).� This variable will be given a name that is file-specific, containing a reference to the year and month, such as “a1y10m01,” which signifies address segment one in year 2010, month January.
  4. Computing new versions of existing variables for the Apartment/Unit #, City, State and ZIP code, with trailing white space trimmed off using RTRIM. These variables will also be given a name that is file-specific, containing a reference to the year and month.
  5. Add a new ‘marker’ variable with the value = 1.� This variable will be given a name that is file-specific, containing a reference to the year and month, such as “CAy10m01,” which signifies Case Address in year 2010, month January.
  6. Add formatting to the new variables, such as labels, level (nominal), display width, value labels, etc.
  7. Save it and close.

I am doing this clean-up and renaming of address variables so that I can merge files over time based upon a household ID (using UPDATE).� (Giving the address variables names that are unique and date specific will help me aggregate them into a single file using UPDATE.)� � I want to be able to see address changes over time, linked to changes in employment, incidents of homelessness (one goal is knowing the last address before becoming homeless), etc.� In the end, the addresses will be geocoded for mapping and also run through a probabalistics matching program to identify simialr addresses.� For geocodinga nd probabilistic mathcing, it makes sense to consolidate the address data into a single file, rather than geogcode 108 different files where most of the addresses won’t change over time.

I don’t know if the approach I started with below makes sense, but I am learning SPSS Macros for the first time.� If it makes better sense to do this with Python Scripts or Python Programs, then I should probably learn that more contemporary approach.� In addition to this work project of building up address information, I have other sets of 108 files that I want to merge/update, so that’s why learning SPSS Macros seemed to make sense.

Thanks,

PATRICK

==============================
Patrick Burns, Senior Researcher
Economic Roundtable
315 W. 9th Street, Suite 1209
Los Angeles, CA, 90015-4213
W: (213) 892-8104� x203� � � F: (213) 291-9245
http://www.economicrt.org


From: Art Kendall [[hidden email]]
Sent: Wednesday, July 04, 2012 6:55 AM
To: Patrick Burns
Cc: [hidden email]
Subject: Re: [SPSSX-L] Adding new variables with file-specific names to monthly data via SPSS Macro?

top to make the variable names specific to the file.
This seems like an unusual thing to do.

Is it possible that you could achieve your goal by keeping the names the same and adding variables such as which file the cases came from,

If you are just starting out it might be more help in the long run if you describe what you are trying to do in more detail.
It might turn out that for this application macros are the way to go. This might be the case if your group already has a lot invested in macros.
It might turn out that for this application there is a syntax way to go.
It might turn out that for this application python would be the way to go.

I don't have a firm handle on what macros can do that python cannot.� I have the impression that for new applications python would be more advisable as it is said to do things that macros cannot easily do.

Please explain in more detail what you are trying to do.
Also are you working with a group that already has a lot of learning invested in macros?� Or might the newer approach - python- be advisable?


Art Kendall
Social Research Consultants

On 7/3/2012 8:34 PM, Patrick Burns wrote:

I am trying to write a macro for the first time (!), and am not sure if I am
approaching it correctly.
Using the syntax below, I am trying to open a series of 10 files, adding
about 6 variables to each one, and using a macro and the DATA LIST at the
top to make the variable names specific to the file.
In a sense, I’m trying to use the DATA LIST at the top as ‘scratch’ data
that the macro uses, one row at a time.
*********************************************************
SET MPRINT OFF.
DATA LIST FREE(TAB) /file1(A50)� �  file2(A7)� � �  � � � � � � � file3(A8)
marker(A8)� � � � �  a1(A8) a2(A8) a3(A8) a4(A8)� � � � � � � � � � �  a5(A8).
BEGIN DATA.
2010\2010-01 Jan\Case Addresses 2010-01 2010-01 Jan 2010 CAy10m01
A1y10m01 A2y10m01 A3y10m01 A4y10m01 A5y10m01
2010\2010-02 Feb\Case Addresses 2010-02 2010-02 Feb 2010 CAy10m02
A1y10m02 A2y10m02 A3y10m02 A4y10m02 A5y10m02
2010\2010-03 Mar\Case Addresses 2010-03 2010-03 Mar 2010 CAy10m03
A1y10m03 A2y10m03 A3y10m03 A4y10m03 A5y10m03
2010\2010-04 Apr\Case Addresses 2010-04 2010-04 Apr 2010 CAy10m04
A1y10m04 A2y10m04 A3y10m04 A4y10m04 A5y10m04
2010\2010-05 May\Case Addresses 2010-05 2010-05 May 2010 CAy10m05
A1y10m05 A2y10m05 A3y10m05 A4y10m05 A5y10m05
2010\2010-06 Jun\Case Addresses 2010-06 2010-06 Jun 2010 CAy10m06
A1y10m06 A2y10m06 A3y10m06 A4y10m06 A5y10m06
2010\2010-07 Jul\Case Addresses 2010-07 2010-07 Jul 2010 CAy10m07
A1y10m07 A2y10m07 A3y10m07 A4y10m07 A5y10m07
2010\2010-08 Aug\Case Addresses 2010-08 2010-08 Aug 2010 CAy10m08
A1y10m08 A2y10m08 A3y10m08 A4y10m08 A5y10m08
2010\2010-09 Sep\Case Addresses 2010-09 2010-09 Sep 2010 CAy10m09
A1y10m09 A2y10m09 A3y10m09 A4y10m09 A5y10m09
2010\2010-10 Oct\Case Addresses 2010-10 2010-10 Oct 2010 CAy10m10
A1y10m10 A2y10m10 A3y10m10 A4y10m10 A5y10m10
2010\2010-11 Nov\Case Addresses 2010-11 2010-11 Nov 2010 CAy10m11
A1y10m11 A2y10m11 A3y10m11 A4y10m11 A5y10m11
2010\2010-12 Dec\Case Addresses 2010-12 2010-12 Dec 2010 CAy10m12
A1y10m12 A2y10m12 A3y10m12 A4y10m12 A5y10m12
END DATA.
*///////////////////////////////.
DEFINE !geographyupdate ( file1 = !TOKENS(1) /
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �  file2 = !TOKENS(1) /
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �  file3 = !TOKENS(1) /
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �  marker = !TOKENS(1) /
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �  a1 = !TOKENS(1) /
� � � � � � � �  � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � a2 = !TOKENS(1) /
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �  a3 = !TOKENS(1) /
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �  a4 = !TOKENS(1) /
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �  a5 = !TOKENS(1) ).
!DO !var=1 !TO 12
GET FILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file1,".sav")).
COMPUTE !marker=1.
RECODE Adr_St_Type�  ('AL'='ALY') ('AR'='AVE') ('AV'='AVE') ('BO'='BLVD')
('BY'='BYP') ('CA'='CALLE') ('CE'='CT') ('CI'='CIR') ('CO'='CT')
 ('DV'='DR') ('GV'='GRV') ('HW'='HWY') ('LA'='LN') ('LN'='LN') ('MS'='')
('PA'='PARK') ('PB'='PO B') ('PL'='PL') ('PW'='PKWY') ('PZ'='PLZ')
 ('RE'='RTE') ('RO'='RD') ('SE'='ST') ('SM'='ST') ('SN'='ST') ('TE'='TER')
('TL'='TRL') ('VA'='VLY') ('VI'='VIEW') ('VT'='VISTA') ('WA'='WAY')
 ('WY'='WAY')
/ Adr_St_Dir ('EA'='E') ('NO'='N') ('SO'='S') ('WE'='W').
EXECUTE.
STRING !a1 (A36) / !a2 (A4) / !a3 (A15) / !a4 (A2) / !a5 (A5).
COMPUTE !a1 = concat(rtrim(Adr_Num)," ",rtrim(Adr_Frac),"
",rtrim(Adr_St_Dir)," ",rtrim(Adr_Street)," ",rtrim(Adr_St_Type)).
COMPUTE !a2 = rtrim(Adr_Apt).
COMPUTE !a3 = rtrim(Adr_City).
COMPUTE !a4 = rtrim(Adr_State).
COMPUTE !a5 = rtrim(Adr_Zip5).
EXECUTE.
VARIABLE LABELS !marker !QUOTE(!CONCAT(!file3," Case Marker"))
/!a1 'Address, CASE Address file'
/!a2 'Unit, CASE Address file'
/!a3 'City, CASE Address file'
/!a4 'State, CASE Address file'
/!a5 'ZIP, CASE Address file'.
VARIABLE WIDTH !a1 (20) / !a2 (8) / !a3 (12) / !a4 (8) / !a5 (8).
VARIABLE LEVEL !marker (NOMINAL).
VALUE LABELS !marker 1 !QUOTE(!CONCAT(!file3," Case Record")).
EXECUTE.
SAVE OUTFILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file,".sav")).
!DOEND
!ENDDEFINE.
*///////////////////////////////.
*Next line starts the whole process.
SET MPRINT=yes.
!geographyupdate
Any advice would be helpful.
==============================
Patrick Burns
Economic Roundtable
Los Angeles, CA, USA
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: Adding new variables with file-specific names to monthly data via SPSS Macro?

Albert-Jan Roskam
Perhaps with SPSSINC PROCESS FILES, though I'm not sure it is intended to be used to append files.
 
Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 

From: Art Kendall <[hidden email]>
To: [hidden email]
Sent: Wednesday, July 4, 2012 10:04 PM
Subject: Re: [SPSSX-L] Adding new variables with file-specific names to monthly data via SPSS Macro?

I'll have to look at his again later.
I maybe wrong but it sounds like you are going for a "wide" layout rather than a "long" layout.Â
I still don't see why you can't have a unique id for each household and have multiple records each having a field that says which file it came from, and a field for year and month..
That way variable names could be the same no matter what the source and you could do the transforms on one pass.

Depending how you data is not stored, IIRC there are already python procedures to take all files in a folder and concatenate them adding a variables that says which file a record came from/

To speed things up you could find the duplicate household records and only send unique address to geocoding.

Art Kendall
Social Research Consultants
On 7/4/2012 1:41 PM, Patrick Burns wrote:
Art,
Â
Thanks very much for your reply and questions. Another Listserv member (Ruben van den Berg) is also encouraging me to learn Python instead, and I’ve only been looking at SPSS Macros for the last week or so. My group only uses SPSS Syntax, and has no experience using SPSS Macros before, so switching to Python is fine as long as I can find some good resources on-line (including code examples) to learn from.
Â
My data situation: I have 9 years of monthly files of addresses (9 x 12 = 108 files total), hence my interest in ‘automating’ my approach to modifying them via SPSS Macros or however. (I just used one year of address files, 2010, in my syntax below for simplicity’s sake.)
Â
For each of the 108 monthly files, I am seeking to:
  1. Open it
  2.  RECODE two of the variables, Adr_St_Type and Adr_St_Dir, so that the improper naming conventions are corrected. Adr_St_Type represents the type of address such as street, road, avenue, boulevard) while Adr_St_Dir is the address direction, such as North, South, etc. The abbreviations used in the original data set are flawed; I’m correcting them so that I can geocode the addresses later using GIS software.
  3. Concatenate a set of string variables (Adr_Num, Adr_Frac, Adr_St_Dir, Adr_Street, Adr_St_Type) into a new address variable. These need to be concatenated into a single variable for later use in other programs for geocoding (GIS) and probabilistic matching (seeking persons with the same address). This variable will be given a name that is file-specific, containing a reference to the year and month, such as “a1y10m01,� which signifies address segment one in year 2010, month January.
  4. Computing new versions of existing variables for the Apartment/Unit #, City, State and ZIP code, with trailing white space trimmed off using RTRIM. These variables will also be given a name that is file-specific, containing a reference to the year and month.
  5. Add a new ‘marker’ variable with the value = 1. This variable will be given a name that is file-specific, containing a reference to the year and month, such as “CAy10m01,� which signifies Case Address in year 2010, month January.
  6. Add formatting to the new variables, such as labels, level (nominal), display width, value labels, etc.
  7. Save it and close.
I am doing this clean-up and renaming of address variables so that I can merge files over time based upon a household ID (using UPDATE). (Giving the address variables names that are unique and date specific will help me aggregate them into a single file using UPDATE.)  I want to be able to see address changes over time, linked to changes in employment, incidents of homelessness (one goal is knowing the last address before becoming homeless), etc. In the end, the addresses will be geocoded for mapping and also run through a probabalistics matching program to identify simialr addresses. For geocodinga nd probabilistic mathcing, it makes sense to consolidate the address data into a single file, rather than geogcode 108 different files where most of the addresses won’t change over time.
I don’t know if the approach I started with below makes sense, but I am learning SPSS Macros for the first time. If it makes better sense to do this with Python Scripts or Python Programs, then I should probably learn that more contemporary approach. In addition to this work project of building up address information, I have other sets of 108 files that I want to merge/update, so that’s why learning SPSS Macros seemed to make sense.
Thanks,
PATRICK
==============================
Patrick Burns, Senior Researcher
Economic Roundtable
315 W. 9th Street, Suite 1209
Los Angeles, CA, 90015-4213
W: (213) 892-8104Â x203Â Â Â F: (213) 291-9245
http://www.economicrt.org

From: Art Kendall [[hidden email]]
Sent: Wednesday, July 04, 2012 6:55 AM
To: Patrick Burns
Cc: [hidden email]
Subject: Re: [SPSSX-L] Adding new variables with file-specific names to monthly data via SPSS Macro?
Â
top to make the variable names specific to the file.
 Â 
 Â 
This seems like an unusual thing to do.
Is it possible that you could achieve your goal by keeping the names the same and adding variables such as which file the cases came from,

If you are just starting out it might be more help in the long run if you describe what you are trying to do in more detail.
It might turn out that for this application macros are the way to go. This might be the case if your group already has a lot invested in macros.
It might turn out that for this application there is a syntax way to go.
It might turn out that for this application python would be the way to go.

I don't have a firm handle on what macros can do that python cannot. I have the impression that for new applications python would be more advisable as it is said to do things that macros cannot easily do.

Please explain in more detail what you are trying to do.
Also are you working with a group that already has a lot of learning invested in macros? Or might the newer approach - python- be advisable?


Art Kendall
Social Research Consultants
On 7/3/2012 8:34 PM, Patrick Burns wrote:
I am trying to write a macro for the first time (!), and am not sure if I am
approaching it correctly.
 Â 
Using the syntax below, I am trying to open a series of 10 files, adding
about 6 variables to each one, and using a macro and the DATA LIST at the
top to make the variable names specific to the file.
 Â 
In a sense, I’m trying to use the DATA LIST at the top as ‘scratch’ data
that the macro uses, one row at a time.
 Â 
 Â 
*********************************************************
 Â 
SET MPRINT OFF.
 Â 
DATA LIST FREE(TAB) /file1(A50)Â Â  file2(A7)Â Â Â  Â Â Â Â Â Â Â file3(A8)
marker(A8)Â Â Â Â Â  a1(A8) a2(A8) a3(A8) a4(A8)Â Â Â Â Â Â Â Â Â Â Â  a5(A8).
BEGIN DATA.
2010\2010-01 Jan\Case Addresses 2010-01 2010-01 Jan 2010 CAy10m01
A1y10m01 A2y10m01 A3y10m01 A4y10m01 A5y10m01
2010\2010-02 Feb\Case Addresses 2010-02 2010-02 Feb 2010 CAy10m02
A1y10m02 A2y10m02 A3y10m02 A4y10m02 A5y10m02
2010\2010-03 Mar\Case Addresses 2010-03 2010-03 Mar 2010 CAy10m03
A1y10m03 A2y10m03 A3y10m03 A4y10m03 A5y10m03
2010\2010-04 Apr\Case Addresses 2010-04 2010-04 Apr 2010 CAy10m04
A1y10m04 A2y10m04 A3y10m04 A4y10m04 A5y10m04
2010\2010-05 May\Case Addresses 2010-05 2010-05 May 2010 CAy10m05
A1y10m05 A2y10m05 A3y10m05 A4y10m05 A5y10m05
2010\2010-06 Jun\Case Addresses 2010-06 2010-06 Jun 2010 CAy10m06
A1y10m06 A2y10m06 A3y10m06 A4y10m06 A5y10m06
2010\2010-07 Jul\Case Addresses 2010-07 2010-07 Jul 2010 CAy10m07
A1y10m07 A2y10m07 A3y10m07 A4y10m07 A5y10m07
2010\2010-08 Aug\Case Addresses 2010-08 2010-08 Aug 2010 CAy10m08
A1y10m08 A2y10m08 A3y10m08 A4y10m08 A5y10m08
2010\2010-09 Sep\Case Addresses 2010-09 2010-09 Sep 2010 CAy10m09
A1y10m09 A2y10m09 A3y10m09 A4y10m09 A5y10m09
2010\2010-10 Oct\Case Addresses 2010-10 2010-10 Oct 2010 CAy10m10
A1y10m10 A2y10m10 A3y10m10 A4y10m10 A5y10m10
2010\2010-11 Nov\Case Addresses 2010-11 2010-11 Nov 2010 CAy10m11
A1y10m11 A2y10m11 A3y10m11 A4y10m11 A5y10m11
2010\2010-12 Dec\Case Addresses 2010-12 2010-12 Dec 2010 CAy10m12
A1y10m12 A2y10m12 A3y10m12 A4y10m12 A5y10m12
END DATA.
 Â 
*///////////////////////////////.
DEFINE !geographyupdate ( file1 = !TOKENS(1) /
                                         file2 = !TOKENS(1) /
                                         file3 = !TOKENS(1) /
                                         marker = !TOKENS(1) /
                                         a1 = !TOKENS(1) /
                                         a2 = !TOKENS(1) /
                                         a3 = !TOKENS(1) /
                                         a4 = !TOKENS(1) /
                                         a5 = !TOKENS(1) ).
!DO !var=1 !TO 12
 Â 
GET FILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file1,".sav")).
 Â 
COMPUTE !marker=1.
RECODE Adr_St_Type  ('AL'='ALY') ('AR'='AVE') ('AV'='AVE') ('BO'='BLVD')
('BY'='BYP') ('CA'='CALLE') ('CE'='CT') ('CI'='CIR') ('CO'='CT')
 ('DV'='DR') ('GV'='GRV') ('HW'='HWY') ('LA'='LN') ('LN'='LN') ('MS'='')
('PA'='PARK') ('PB'='PO B') ('PL'='PL') ('PW'='PKWY') ('PZ'='PLZ')
 ('RE'='RTE') ('RO'='RD') ('SE'='ST') ('SM'='ST') ('SN'='ST') ('TE'='TER')
('TL'='TRL') ('VA'='VLY') ('VI'='VIEW') ('VT'='VISTA') ('WA'='WAY')
 ('WY'='WAY')
/ Adr_St_Dir ('EA'='E') ('NO'='N') ('SO'='S') ('WE'='W').
EXECUTE.
 Â 
STRING !a1 (A36) / !a2 (A4) / !a3 (A15) / !a4 (A2) / !a5 (A5).
COMPUTE !a1 = concat(rtrim(Adr_Num)," ",rtrim(Adr_Frac),"
",rtrim(Adr_St_Dir)," ",rtrim(Adr_Street)," ",rtrim(Adr_St_Type)).
COMPUTE !a2 = rtrim(Adr_Apt).
COMPUTE !a3 = rtrim(Adr_City).
COMPUTE !a4 = rtrim(Adr_State).
COMPUTE !a5 = rtrim(Adr_Zip5).
EXECUTE.
 Â 
VARIABLE LABELS !marker !QUOTE(!CONCAT(!file3," Case Marker"))
/!a1 'Address, CASE Address file'
/!a2 'Unit, CASE Address file'
/!a3 'City, CASE Address file'
/!a4 'State, CASE Address file'
/!a5 'ZIP, CASE Address file'.
VARIABLE WIDTH !a1 (20) / !a2 (8) / !a3 (12) / !a4 (8) / !a5 (8).
VARIABLE LEVEL !marker (NOMINAL).
VALUE LABELS !marker 1 !QUOTE(!CONCAT(!file3," Case Record")).
EXECUTE.
 Â 
SAVE OUTFILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file,".sav")).
!DOEND
!ENDDEFINE.
 Â 
*///////////////////////////////.
 Â 
*Next line starts the whole process.
SET MPRINT=yes.
!geographyupdate
 Â 
 Â 
 Â 
Any advice would be helpful.
 Â 
==============================
Patrick Burns
Economic Roundtable
Los Angeles, CA, USA
 Â 
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
 Â 
Â


===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


Reply | Threaded
Open this post in threaded view
|

Re: Adding new variables with file-specific names to monthly data via SPSS Macro?

Jon K Peck
PROCESS FILES iterates over a batch of input files specified by a list of names or a wildcard expression.  For each iteration, it executes a specified syntax file, which can open and close files using file handles or macros that are predefined for each file by PROCESS FILES.  If you are creating an output file, you can leave the file open as you iterate, so can accumulate cases across the inputs, or you can close each file (or have PROCESS FILES do this for you).  The AFTERSYNTAX keyword allows you to run a cleanup at the end.

To use this, you need to have installed the Python Essentials from the SPSS Community site and then download and install the command itself from the Extension Commands section of the site.  You do not need to know anything about Python to use this.  However, if you want to learn Python, which is vastly superior to the macro facility (it can even be used to define macros for later use in your traditional SPSS code), a good place to start is the Programming and Data Management book also downloadable from the Community site in the Books and Articles section.  It focuses on doing typical tasks with SPSS taking advantage of Python.

HTH,

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Albert-Jan Roskam <[hidden email]>
To:        [hidden email]
Date:        07/04/2012 02:21 PM
Subject:        Re: [SPSSX-L] Adding new variables with file-specific names to              monthly data              via              SPSS Macro?
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Perhaps with SPSSINC PROCESS FILES, though I'm not sure it is intended to be used to append files.
 
Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


From: Art Kendall <[hidden email]>
To:
[hidden email]
Sent:
Wednesday, July 4, 2012 10:04 PM
Subject:
Re: [SPSSX-L] Adding new variables with file-specific names to monthly data via SPSS Macro?


I'll have to look at his again later.
I maybe wrong but it sounds like you are going for a "wide" layout rather than a "long" layout.Â
I still don't see why you can't have a unique id for each household and have multiple records each having a field that says which file it came from, and a field for year and month..
That way variable names could be the same no matter what the source and you could do the transforms on one pass.

Depending how you data is not stored, IIRC there are already python procedures to take all files in a folder and concatenate them adding a variables that says which file a record came from/

To speed things up you could find the duplicate household records and only send unique address to geocoding.

Art Kendall
Social Research Consultants

On 7/4/2012 1:41 PM, Patrick Burns wrote:
Art,
Â
Thanks very much for your reply and questions. Another Listserv member (Ruben van den Berg) is also encouraging me to learn Python instead, and I’ve only been looking at SPSS Macros for the last week or so. My group only uses SPSS Syntax, and has no experience using SPSS Macros before, so switching to Python is fine as long as I can find some good resources on-line (including code examples) to learn from.
Â
My data situation: I have 9 years of monthly files of addresses (9 x 12 = 108 files total), hence my interest in ‘automating’ my approach to modifying them via SPSS Macros or however. (I just used one year of address files, 2010, in my syntax below for simplicity’s sake.)
Â
For each of the 108 monthly files, I am seeking to:
1.        Open it
2.        Â RECODE two of the variables, Adr_St_Type and Adr_St_Dir, so that the improper naming conventions are corrected. Adr_St_Type represents the type of address such as street, road, avenue, boulevard) while Adr_St_Dir is the address direction, such as North, South, etc. The abbreviations used in the original data set are flawed; I’m correcting them so that I can geocode the addresses later using GIS software.
3.        Concatenate a set of string variables (Adr_Num, Adr_Frac, Adr_St_Dir, Adr_Street, Adr_St_Type) into a new address variable. These need to be concatenated into a single variable for later use in other programs for geocoding (GIS) and probabilistic matching (seeking persons with the same address). This variable will be given a name that is file-specific, containing a reference to the year and month, such as “a1y10m01,â€� which signifies address segment one in year 2010, month January.
4.        Computing new versions of existing variables for the Apartment/Unit #, City, State and ZIP code, with trailing white space trimmed off using RTRIM. These variables will also be given a name that is file-specific, containing a reference to the year and month.
5.        Add a new ‘marker’ variable with the value = 1. This variable will be given a name that is file-specific, containing a reference to the year and month, such as “CAy10m01,â€� which signifies Case Address in year 2010, month January.
6.        Add formatting to the new variables, such as labels, level (nominal), display width, value labels, etc.
7.        Save it and close.
I am doing this clean-up and renaming of address variables so that I can merge files over time based upon a household ID (using UPDATE). (Giving the address variables names that are unique and date specific will help me aggregate them into a single file using UPDATE.)  I want to be able to see address changes over time, linked to changes in employment, incidents of homelessness (one goal is knowing the last address before becoming homeless), etc. In the end, the addresses will be geocoded for mapping and also run through a probabalistics matching program to identify simialr addresses. For geocodinga nd probabilistic mathcing, it makes sense to consolidate the address data into a single file, rather than geogcode 108 different files where most of the addresses won’t change over time.
I don’t know if the approach I started with below makes sense, but I am learning SPSS Macros for the first time. If it makes better sense to do this with Python Scripts or Python Programs, then I should probably learn that more contemporary approach. In addition to this work project of building up address information, I have other sets of 108 files that I want to merge/update, so that’s why learning SPSS Macros seemed to make sense.
Thanks,
PATRICK
==============================
Patrick Burns, Senior Researcher
Economic Roundtable
315 W. 9th Street, Suite 1209
Los Angeles, CA, 90015-4213
W: (213) 892-8104Â x203Â Â Â F: (213) 291-9245

http://www.economicrt.org



From: Art Kendall [mailto:Art@...]
Sent:
Wednesday, July 04, 2012 6:55 AM
To:
Patrick Burns
Cc:
[hidden email]
Subject:
Re: [SPSSX-L] Adding new variables with file-specific names to monthly data via SPSS Macro?

Â
top to make the variable names specific to the file.
 Â
 Â
This seems like an unusual thing to do.
Is it possible that you could achieve your goal by keeping the names the same and adding variables such as which file the cases came from,

If you are just starting out it might be more help in the long run if you describe what you are trying to do in more detail.
It might turn out that for this application macros are the way to go. This might be the case if your group already has a lot invested in macros.
It might turn out that for this application there is a syntax way to go.
It might turn out that for this application python would be the way to go.

I don't have a firm handle on what macros can do that python cannot. I have the impression that for new applications python would be more advisable as it is said to do things that macros cannot easily do.

Please explain in more detail what you are trying to do.
Also are you working with a group that already has a lot of learning invested in macros? Or might the newer approach - python- be advisable?


Art Kendall
Social Research Consultants
On 7/3/2012 8:34 PM, Patrick Burns wrote:
I am trying to write a macro for the first time (!), and am not sure if I am
approaching it correctly.
 Â
Using the syntax below, I am trying to open a series of 10 files, adding
about 6 variables to each one, and using a macro and the DATA LIST at the
top to make the variable names specific to the file.
 Â
In a sense, I’m trying to use the DATA LIST at the top as ‘scratch’ data
that the macro uses, one row at a time.
 Â
 Â
*********************************************************
 Â
SET MPRINT OFF.
 Â
DATA LIST FREE(TAB) /file1(A50)Â Â  file2(A7)Â Â Â  Â Â Â Â Â Â Â file3(A8)
marker(A8)Â Â Â Â Â  a1(A8) a2(A8) a3(A8) a4(A8)Â Â Â Â Â Â Â Â Â Â Â  a5(A8).
BEGIN DATA.
2010\2010-01 Jan\Case Addresses 2010-01 2010-01 Jan 2010 CAy10m01
A1y10m01 A2y10m01 A3y10m01 A4y10m01 A5y10m01
2010\2010-02 Feb\Case Addresses 2010-02 2010-02 Feb 2010 CAy10m02
A1y10m02 A2y10m02 A3y10m02 A4y10m02 A5y10m02
2010\2010-03 Mar\Case Addresses 2010-03 2010-03 Mar 2010 CAy10m03
A1y10m03 A2y10m03 A3y10m03 A4y10m03 A5y10m03
2010\2010-04 Apr\Case Addresses 2010-04 2010-04 Apr 2010 CAy10m04
A1y10m04 A2y10m04 A3y10m04 A4y10m04 A5y10m04
2010\2010-05 May\Case Addresses 2010-05 2010-05 May 2010 CAy10m05
A1y10m05 A2y10m05 A3y10m05 A4y10m05 A5y10m05
2010\2010-06 Jun\Case Addresses 2010-06 2010-06 Jun 2010 CAy10m06
A1y10m06 A2y10m06 A3y10m06 A4y10m06 A5y10m06
2010\2010-07 Jul\Case Addresses 2010-07 2010-07 Jul 2010 CAy10m07
A1y10m07 A2y10m07 A3y10m07 A4y10m07 A5y10m07
2010\2010-08 Aug\Case Addresses 2010-08 2010-08 Aug 2010 CAy10m08
A1y10m08 A2y10m08 A3y10m08 A4y10m08 A5y10m08
2010\2010-09 Sep\Case Addresses 2010-09 2010-09 Sep 2010 CAy10m09
A1y10m09 A2y10m09 A3y10m09 A4y10m09 A5y10m09
2010\2010-10 Oct\Case Addresses 2010-10 2010-10 Oct 2010 CAy10m10
A1y10m10 A2y10m10 A3y10m10 A4y10m10 A5y10m10
2010\2010-11 Nov\Case Addresses 2010-11 2010-11 Nov 2010 CAy10m11
A1y10m11 A2y10m11 A3y10m11 A4y10m11 A5y10m11
2010\2010-12 Dec\Case Addresses 2010-12 2010-12 Dec 2010 CAy10m12
A1y10m12 A2y10m12 A3y10m12 A4y10m12 A5y10m12
END DATA.
 Â
*///////////////////////////////.
DEFINE !geographyupdate ( file1 = !TOKENS(1) /
                                         file2 = !TOKENS(1) /
                                         file3 = !TOKENS(1) /
                                         marker = !TOKENS(1) /
                                         a1 = !TOKENS(1) /
         Â                                a2 = !TOKENS(1) /
                                         a3 = !TOKENS(1) /
                                         a4 = !TOKENS(1) /
                                         a5 = !TOKENS(1) ).
!DO !var=1 !TO 12
 Â
GET FILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file1,".sav")).
 Â
COMPUTE !marker=1.
RECODE Adr_St_Type  ('AL'='ALY') ('AR'='AVE') ('AV'='AVE') ('BO'='BLVD')
('BY'='BYP') ('CA'='CALLE') ('CE'='CT') ('CI'='CIR') ('CO'='CT')
 ('DV'='DR') ('GV'='GRV') ('HW'='HWY') ('LA'='LN') ('LN'='LN') ('MS'='')
('PA'='PARK') ('PB'='PO B') ('PL'='PL') ('PW'='PKWY') ('PZ'='PLZ')
 ('RE'='RTE') ('RO'='RD') ('SE'='ST') ('SM'='ST') ('SN'='ST') ('TE'='TER')
('TL'='TRL') ('VA'='VLY') ('VI'='VIEW') ('VT'='VISTA') ('WA'='WAY')
 ('WY'='WAY')
/ Adr_St_Dir ('EA'='E') ('NO'='N') ('SO'='S') ('WE'='W').
EXECUTE.
 Â
STRING !a1 (A36) / !a2 (A4) / !a3 (A15) / !a4 (A2) / !a5 (A5).
COMPUTE !a1 = concat(rtrim(Adr_Num)," ",rtrim(Adr_Frac),"
",rtrim(Adr_St_Dir)," ",rtrim(Adr_Street)," ",rtrim(Adr_St_Type)).
COMPUTE !a2 = rtrim(Adr_Apt).
COMPUTE !a3 = rtrim(Adr_City).
COMPUTE !a4 = rtrim(Adr_State).
COMPUTE !a5 = rtrim(Adr_Zip5).
EXECUTE.
 Â
VARIABLE LABELS !marker !QUOTE(!CONCAT(!file3," Case Marker"))
/!a1 'Address, CASE Address file'
/!a2 'Unit, CASE Address file'
/!a3 'City, CASE Address file'
/!a4 'State, CASE Address file'
/!a5 'ZIP, CASE Address file'.
VARIABLE WIDTH !a1 (20) / !a2 (8) / !a3 (12) / !a4 (8) / !a5 (8).
VARIABLE LEVEL !marker (NOMINAL).
VALUE LABELS !marker 1 !QUOTE(!CONCAT(!file3," Case Record")).
EXECUTE.
 Â
SAVE OUTFILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file,".sav")).
!DOEND
!ENDDEFINE.
 Â
*///////////////////////////////.
 Â
*Next line starts the whole process.
SET MPRINT=yes.
!geographyupdate
 Â
 Â
 Â
Any advice would be helpful.
 Â
==============================
Patrick Burns
Economic Roundtable
Los Angeles, CA, USA
 Â
=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@... (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
 Â
Â


===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


Reply | Threaded
Open this post in threaded view
|

Re: Adding new variables with file-specific names to monthly data via SPSS Macro?

Art Kendall
In reply to this post by Art Kendall
IIRC means "If I recall correctly".� �
As Albert-Jan and Jon wrote there is PROCESS FILES.
However, I seem to recall that Jon wrote something in Python that just concatenated all of the files in a folder and added a variable to indicate which file a record came from.
You could then use a MATCH with a table to adjoin all of the variable contents that would be particular to that file.

In my experience, long layouts are faster to run than wide ones.
Also I think writing the syntax for prepping and cleaning long files is often much simpler than for wide files.

Of course there are CASESTOVARS and VARS TO CASES to go back and forth from wide to long.

Art Kendall
Social Research Consultants
On 7/4/2012 5:05 PM, Patrick Burns wrote:

Art,

Thanks; only what your time and patience might allow.

We traditionally make ‘wide’ files at my workplace, either 1-row-per-person or 1-row-per-hosuehold, which makes our data sets more straightforward.� (Many public data sets in the social sciences, such as the US Census Survey of Population and Housing and American Community Surveys are this way by default, so we often mold our other datasets to this layout.)� I am comfortable using other file structures, but I ultimately need to share 1-row-per-person files with my co-worker, since his SPSS data management skills are not that developed.

A typical monthly file of addresses has 1.2 million cases, so I guess I am also worried about making a file that is too long, although � � That said, I do see the advantages of a file layout that you describe: A unique ID for each household/family unit (which we already have) and multiple records per ID, with each record having a field that says which file it came from, and a field for year and month.

I will need to make my aggregated address information a ‘long’ file in order to geocode the addresses, you’re absolutely right.

(I don’t understand your reference to IIRC is relative to python and data storage.� Our SPSS analysis is all conducting on desktop PC workstations, so that puts a little bit of limitation on file size and numbers of cases.� I have a Dell Precision 690 with 16 MB or RAM and running a 64-bit, although my SPSS software is a bit dated: version 16, 32-bit.� I just downloaded ActivePython-2.7.2.5-win32-x86.msi, and am figuring out how to install it.)

PATRICK

==============================
Patrick Burns, Senior Researcher
Economic Roundtable
315 W. 9th Street, Suite 1209
Los Angeles, CA, 90015-4213
W: (213) 892-8104� x203� � � F: (213) 291-9245
http://www.economicrt.org


From: Art Kendall [[hidden email]]
Sent: Wednesday, July 04, 2012 1:05 PM
To: [hidden email]; SPSSX-L post
Subject: Re: [SPSSX-L] Adding new variables with file-specific names to monthly data via SPSS Macro?

I'll have to look at his again later.
I maybe wrong but it sounds like you are going for a "wide" layout rather than a "long" layout.�
I still don't see why you can't have a unique id for each household and have multiple records each having a field that says which file it came from, and a field for year and month..
That way variable names could be the same no matter what the source and you could do the transforms on one pass.

Depending how you data is not stored, IIRC there are already python procedures to take all files in a folder and concatenate them adding a variables that says which file a record came from/

To speed things up you could find the duplicate household records and only send unique address to geocoding.


Art Kendall
Social Research Consultants

On 7/4/2012 1:41 PM, Patrick Burns wrote:

Art,

Thanks very much for your reply and questions.� Another Listserv member (Ruben van den Berg) is also encouraging me to learn Python instead, and I’ve only been looking at SPSS Macros for the last week or so.� My group only uses SPSS Syntax, and has no experience using SPSS Macros before, so switching to Python is fine as long as I can find some good resources on-line (including code examples) to learn from.

My data situation:� I have 9 years of monthly files of addresses (9 x 12 = 108 files total), hence my interest in ‘automating’ my approach to modifying them via SPSS Macros or however.� (I just used one year of address files, 2010, in my syntax below for simplicity’s sake.)

For each of the 108 monthly files, I am seeking to:

  1. Open it
  2. � RECODE two of the variables, Adr_St_Type� and Adr_St_Dir, so that the improper naming conventions are corrected.� Adr_St_Type represents the type of address such as street, road, avenue, boulevard) while Adr_St_Dir is the address direction, such as North, South, etc.� The abbreviations used in the original data set are flawed; I’m correcting them so that I can geocode the addresses later using GIS software.
  3. Concatenate a set of string variables (Adr_Num, Adr_Frac, Adr_St_Dir, Adr_Street, Adr_St_Type) into a new address variable.� These need to be concatenated into a single variable for later use in other programs for geocoding (GIS) and probabilistic matching (seeking persons with the same address).� This variable will be given a name that is file-specific, containing a reference to the year and month, such as “a1y10m01,” which signifies address segment one in year 2010, month January.
  4. Computing new versions of existing variables for the Apartment/Unit #, City, State and ZIP code, with trailing white space trimmed off using RTRIM. These variables will also be given a name that is file-specific, containing a reference to the year and month.
  5. Add a new ‘marker’ variable with the value = 1.� This variable will be given a name that is file-specific, containing a reference to the year and month, such as “CAy10m01,” which signifies Case Address in year 2010, month January.
  6. Add formatting to the new variables, such as labels, level (nominal), display width, value labels, etc.
  7. Save it and close.

I am doing this clean-up and renaming of address variables so that I can merge files over time based upon a household ID (using UPDATE).� (Giving the address variables names that are unique and date specific will help me aggregate them into a single file using UPDATE.)� � I want to be able to see address changes over time, linked to changes in employment, incidents of homelessness (one goal is knowing the last address before becoming homeless), etc.� In the end, the addresses will be geocoded for mapping and also run through a probabalistics matching program to identify simialr addresses.� For geocodinga nd probabilistic mathcing, it makes sense to consolidate the address data into a single file, rather than geogcode 108 different files where most of the addresses won’t change over time.

I don’t know if the approach I started with below makes sense, but I am learning SPSS Macros for the first time.� If it makes better sense to do this with Python Scripts or Python Programs, then I should probably learn that more contemporary approach.� In addition to this work project of building up address information, I have other sets of 108 files that I want to merge/update, so that’s why learning SPSS Macros seemed to make sense.

Thanks,

PATRICK

==============================
Patrick Burns, Senior Researcher
Economic Roundtable
315 W. 9th Street, Suite 1209
Los Angeles, CA, 90015-4213
W: (213) 892-8104� x203� � � F: (213) 291-9245
http://www.economicrt.org


From: Art Kendall [[hidden email]]
Sent: Wednesday, July 04, 2012 6:55 AM
To: Patrick Burns
Cc: [hidden email]
Subject: Re: [SPSSX-L] Adding new variables with file-specific names to monthly data via SPSS Macro?

top to make the variable names specific to the file.
This seems like an unusual thing to do.

Is it possible that you could achieve your goal by keeping the names the same and adding variables such as which file the cases came from,

If you are just starting out it might be more help in the long run if you describe what you are trying to do in more detail.
It might turn out that for this application macros are the way to go. This might be the case if your group already has a lot invested in macros.
It might turn out that for this application there is a syntax way to go.
It might turn out that for this application python would be the way to go.

I don't have a firm handle on what macros can do that python cannot.� I have the impression that for new applications python would be more advisable as it is said to do things that macros cannot easily do.

Please explain in more detail what you are trying to do.
Also are you working with a group that already has a lot of learning invested in macros?� Or might the newer approach - python- be advisable?



Art Kendall
Social Research Consultants

On 7/3/2012 8:34 PM, Patrick Burns wrote:

I am trying to write a macro for the first time (!), and am not sure if I am
approaching it correctly.
Using the syntax below, I am trying to open a series of 10 files, adding
about 6 variables to each one, and using a macro and the DATA LIST at the
top to make the variable names specific to the file.
In a sense, I’m trying to use the DATA LIST at the top as ‘scratch’ data
that the macro uses, one row at a time.
*********************************************************
SET MPRINT OFF.
DATA LIST FREE(TAB) /file1(A50)� �  file2(A7)� � �  � � � � � � � file3(A8)
marker(A8)� � � � �  a1(A8) a2(A8) a3(A8) a4(A8)� � � � � � � � � � �  a5(A8).
BEGIN DATA.
2010\2010-01 Jan\Case Addresses 2010-01 2010-01 Jan 2010 CAy10m01
A1y10m01 A2y10m01 A3y10m01 A4y10m01 A5y10m01
2010\2010-02 Feb\Case Addresses 2010-02 2010-02 Feb 2010 CAy10m02
A1y10m02 A2y10m02 A3y10m02 A4y10m02 A5y10m02
2010\2010-03 Mar\Case Addresses 2010-03 2010-03 Mar 2010 CAy10m03
A1y10m03 A2y10m03 A3y10m03 A4y10m03 A5y10m03
2010\2010-04 Apr\Case Addresses 2010-04 2010-04 Apr 2010 CAy10m04
A1y10m04 A2y10m04 A3y10m04 A4y10m04 A5y10m04
2010\2010-05 May\Case Addresses 2010-05 2010-05 May 2010 CAy10m05
A1y10m05 A2y10m05 A3y10m05 A4y10m05 A5y10m05
2010\2010-06 Jun\Case Addresses 2010-06 2010-06 Jun 2010 CAy10m06
A1y10m06 A2y10m06 A3y10m06 A4y10m06 A5y10m06
2010\2010-07 Jul\Case Addresses 2010-07 2010-07 Jul 2010 CAy10m07
A1y10m07 A2y10m07 A3y10m07 A4y10m07 A5y10m07
2010\2010-08 Aug\Case Addresses 2010-08 2010-08 Aug 2010 CAy10m08
A1y10m08 A2y10m08 A3y10m08 A4y10m08 A5y10m08
2010\2010-09 Sep\Case Addresses 2010-09 2010-09 Sep 2010 CAy10m09
A1y10m09 A2y10m09 A3y10m09 A4y10m09 A5y10m09
2010\2010-10 Oct\Case Addresses 2010-10 2010-10 Oct 2010 CAy10m10
A1y10m10 A2y10m10 A3y10m10 A4y10m10 A5y10m10
2010\2010-11 Nov\Case Addresses 2010-11 2010-11 Nov 2010 CAy10m11
A1y10m11 A2y10m11 A3y10m11 A4y10m11 A5y10m11
2010\2010-12 Dec\Case Addresses 2010-12 2010-12 Dec 2010 CAy10m12
A1y10m12 A2y10m12 A3y10m12 A4y10m12 A5y10m12
END DATA.
*///////////////////////////////.
DEFINE !geographyupdate ( file1 = !TOKENS(1) /
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �  file2 = !TOKENS(1) /
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �  file3 = !TOKENS(1) /
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �  marker = !TOKENS(1) /
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �  a1 = !TOKENS(1) /
� � � � � � � �  � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � a2 = !TOKENS(1) /
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �  a3 = !TOKENS(1) /
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �  a4 = !TOKENS(1) /
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �  a5 = !TOKENS(1) ).
!DO !var=1 !TO 12
GET FILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file1,".sav")).
COMPUTE !marker=1.
RECODE Adr_St_Type�  ('AL'='ALY') ('AR'='AVE') ('AV'='AVE') ('BO'='BLVD')
('BY'='BYP') ('CA'='CALLE') ('CE'='CT') ('CI'='CIR') ('CO'='CT')
 ('DV'='DR') ('GV'='GRV') ('HW'='HWY') ('LA'='LN') ('LN'='LN') ('MS'='')
('PA'='PARK') ('PB'='PO B') ('PL'='PL') ('PW'='PKWY') ('PZ'='PLZ')
 ('RE'='RTE') ('RO'='RD') ('SE'='ST') ('SM'='ST') ('SN'='ST') ('TE'='TER')
('TL'='TRL') ('VA'='VLY') ('VI'='VIEW') ('VT'='VISTA') ('WA'='WAY')
 ('WY'='WAY')
/ Adr_St_Dir ('EA'='E') ('NO'='N') ('SO'='S') ('WE'='W').
EXECUTE.
STRING !a1 (A36) / !a2 (A4) / !a3 (A15) / !a4 (A2) / !a5 (A5).
COMPUTE !a1 = concat(rtrim(Adr_Num)," ",rtrim(Adr_Frac),"
",rtrim(Adr_St_Dir)," ",rtrim(Adr_Street)," ",rtrim(Adr_St_Type)).
COMPUTE !a2 = rtrim(Adr_Apt).
COMPUTE !a3 = rtrim(Adr_City).
COMPUTE !a4 = rtrim(Adr_State).
COMPUTE !a5 = rtrim(Adr_Zip5).
EXECUTE.
VARIABLE LABELS !marker !QUOTE(!CONCAT(!file3," Case Marker"))
/!a1 'Address, CASE Address file'
/!a2 'Unit, CASE Address file'
/!a3 'City, CASE Address file'
/!a4 'State, CASE Address file'
/!a5 'ZIP, CASE Address file'.
VARIABLE WIDTH !a1 (20) / !a2 (8) / !a3 (12) / !a4 (8) / !a5 (8).
VARIABLE LEVEL !marker (NOMINAL).
VALUE LABELS !marker 1 !QUOTE(!CONCAT(!file3," Case Record")).
EXECUTE.
SAVE OUTFILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file,".sav")).
!DOEND
!ENDDEFINE.
*///////////////////////////////.
*Next line starts the whole process.
SET MPRINT=yes.
!geographyupdate
Any advice would be helpful.
==============================
Patrick Burns
Economic Roundtable
Los Angeles, CA, USA
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: Adding new variables with file-specific names to monthly data via SPSS Macro?

Jon K Peck
If all that is needed is to merge all the files in a directory, it would be trivial to use Python to generate ADD FILES for these.  Something like this code.  If there are fewer than 50 files, this could be simplified, but ADD FILES has a limit of 50 at a time.  (That used to seem like plenty.)
The source subcommand in ADD FILES will not give the file name or distinguish all the pieces.  If there is another way to tell such as restarting a list of id values, that could be worked into this.  I'll await details on that.  It's a holiday, after all.

begin program.
import spss, glob
filespec = r"c:/temp/parts/*.sav"    # modify as needed
base = r"""ADD FILES FILE=*"""
files = glob.glob(filespec)
for block in range(0, len(files), 49):
  addlist = ["/FILE='" + item + "'\n" for item in files[block:block+49]]
  addcmd = base + " ".join(addlist)
  spss.Submit(addcmd)
end program.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Art Kendall <[hidden email]>
To:        [hidden email]
Date:        07/04/2012 03:52 PM
Subject:        Re: [SPSSX-L] Adding new variables with file-specific names to              monthly data              via SPSS Macro?
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




IIRC means "If I recall correctly".
As Albert-Jan and Jon wrote there is PROCESS FILES.
However, I seem to recall that Jon wrote something in Python that just concatenated all of the files in a folder and added a variable to indicate which file a record came from.
You could then use a MATCH with a table to adjoin all of the variable contents that would be particular to that file.

In my experience, long layouts are faster to run than wide ones.
Also I think writing the syntax for prepping and cleaning long files is often much simpler than for wide files.

Of course there are CASESTOVARS and VARS TO CASES to go back and forth from wide to long.

Art Kendall
Social Research Consultants

On 7/4/2012 5:05 PM, Patrick Burns wrote:
Art,
Thanks; only what your time and patience might allow.
We traditionally make ‘wide’ files at my workplace, either 1-row-per-person or 1-row-per-hosuehold, which makes our data sets more straightforward. (Many public data sets in the social sciences, such as the US Census Survey of Population and Housing and American Community Surveys are this way by default, so we often mold our other datasets to this layout.) I am comfortable using other file structures, but I ultimately need to share 1-row-per-person files with my co-worker, since his SPSS data management skills are not that developed.
A typical monthly file of addresses has 1.2 million cases, so I guess I am also worried about making a file that is too long, although That said, I do see the advantages of a file layout that you describe: A unique ID for each household/family unit (which we already have) and multiple records per ID, with each record having a field that says which file it came from, and a field for year and month.
I will need to make my aggregated address information a ‘long’ file in order to geocode the addresses, you’re absolutely right.
(I don’t understand your reference to IIRC is relative to python and data storage. Our SPSS analysis is all conducting on desktop PC workstations, so that puts a little bit of limitation on file size and numbers of cases. I have a Dell Precision 690 with 16 MB or RAM and running a 64-bit, although my SPSS software is a bit dated: version 16, 32-bit. I just downloaded ActivePython-2.7.2.5-win32-x86.msi, and am figuring out how to install it.)
PATRICK

==============================
Patrick Burns, Senior Researcher
Economic Roundtable
315 W. 9th Street, Suite 1209
Los Angeles, CA, 90015-4213
W: (213) 892-8104 x203 F: (213) 291-9245

http://www.economicrt.org




From: Art Kendall [mailto:Art@...]
Sent:
Wednesday, July 04, 2012 1:05 PM
To:
patrickburns@...; SPSSX-L post
Subject:
Re: [SPSSX-L] Adding new variables with file-specific names to monthly data via SPSS Macro?

I'll have to look at his again later.
I maybe wrong but it sounds like you are going for a "wide" layout rather than a "long" layout.
I still don't see why you can't have a unique id for each household and have multiple records each having a field that says which file it came from, and a field for year and month..
That way variable names could be the same no matter what the source and you could do the transforms on one pass.

Depending how you data is not stored, IIRC there are already python procedures to take all files in a folder and concatenate them adding a variables that says which file a record came from/

To speed things up you could find the duplicate household records and only send unique address to geocoding.


Art Kendall
Social Research Consultants
On 7/4/2012 1:41 PM, Patrick Burns wrote:
Art,
Thanks very much for your reply and questions. Another Listserv member (Ruben van den Berg) is also encouraging me to learn Python instead, and I’ve only been looking at SPSS Macros for the last week or so. My group only uses SPSS Syntax, and has no experience using SPSS Macros before, so switching to Python is fine as long as I can find some good resources on-line (including code examples) to learn from.
My data situation: I have 9 years of monthly files of addresses (9 x 12 = 108 files total), hence my interest in ‘automating’ my approach to modifying them via SPSS Macros or however. (I just used one year of address files, 2010, in my syntax below for simplicity’s sake.)
For each of the 108 monthly files, I am seeking to:
1.        Open it
2.        RECODE two of the variables, Adr_St_Type and Adr_St_Dir, so that the improper naming conventions are corrected. Adr_St_Type represents the type of address such as street, road, avenue, boulevard) while Adr_St_Dir is the address direction, such as North, South, etc. The abbreviations used in the original data set are flawed; I’m correcting them so that I can geocode the addresses later using GIS software.
3.        Concatenate a set of string variables (Adr_Num, Adr_Frac, Adr_St_Dir, Adr_Street, Adr_St_Type) into a new address variable. These need to be concatenated into a single variable for later use in other programs for geocoding (GIS) and probabilistic matching (seeking persons with the same address). This variable will be given a name that is file-specific, containing a reference to the year and month, such as “a1y10m01,” which signifies address segment one in year 2010, month January.
4.        Computing new versions of existing variables for the Apartment/Unit #, City, State and ZIP code, with trailing white space trimmed off using RTRIM. These variables will also be given a name that is file-specific, containing a reference to the year and month.
5.        Add a new ‘marker’ variable with the value = 1. This variable will be given a name that is file-specific, containing a reference to the year and month, such as “CAy10m01,” which signifies Case Address in year 2010, month January.
6.        Add formatting to the new variables, such as labels, level (nominal), display width, value labels, etc.
7.        Save it and close.

I am doing this clean-up and renaming of address variables so that I can merge files over time based upon a household ID (using UPDATE). (Giving the address variables names that are unique and date specific will help me aggregate them into a single file using UPDATE.) I want to be able to see address changes over time, linked to changes in employment, incidents of homelessness (one goal is knowing the last address before becoming homeless), etc. In the end, the addresses will be geocoded for mapping and also run through a probabalistics matching program to identify simialr addresses. For geocodinga nd probabilistic mathcing, it makes sense to consolidate the address data into a single file, rather than geogcode 108 different files where most of the addresses won’t change over time.

I don’t know if the approach I started with below makes sense, but I am learning SPSS Macros for the first time. If it makes better sense to do this with Python Scripts or Python Programs, then I should probably learn that more contemporary approach. In addition to this work project of building up address information, I have other sets of 108 files that I want to merge/update, so that’s why learning SPSS Macros seemed to make sense.

Thanks,

PATRICK

==============================
Patrick Burns, Senior Researcher
Economic Roundtable
315 W. 9th Street, Suite 1209
Los Angeles, CA, 90015-4213
W: (213) 892-8104 x203 F: (213) 291-9245

http://www.economicrt.org




From: Art Kendall [mailto:Art@...]
Sent:
Wednesday, July 04, 2012 6:55 AM
To:
Patrick Burns
Cc:
[hidden email]
Subject:
Re: [SPSSX-L] Adding new variables with file-specific names to monthly data via SPSS Macro?

top to make the variable names specific to the file.
 
 
This seems like an unusual thing to do.
Is it possible that you could achieve your goal by keeping the names the same and adding variables such as which file the cases came from,

If you are just starting out it might be more help in the long run if you describe what you are trying to do in more detail.
It might turn out that for this application macros are the way to go. This might be the case if your group already has a lot invested in macros.
It might turn out that for this application there is a syntax way to go.
It might turn out that for this application python would be the way to go.

I don't have a firm handle on what macros can do that python cannot. I have the impression that for new applications python would be more advisable as it is said to do things that macros cannot easily do.

Please explain in more detail what you are trying to do.
Also are you working with a group that already has a lot of learning invested in macros? Or might the newer approach - python- be advisable?



Art Kendall
Social Research Consultants
On 7/3/2012 8:34 PM, Patrick Burns wrote:
I am trying to write a macro for the first time (!), and am not sure if I am
approaching it correctly.
 
Using the syntax below, I am trying to open a series of 10 files, adding
about 6 variables to each one, and using a macro and the DATA LIST at the
top to make the variable names specific to the file.
 
In a sense, I’m trying to use the DATA LIST at the top as ‘scratch’ data
that the macro uses, one row at a time.
 
 
*********************************************************
 
SET MPRINT OFF.
 
DATA LIST FREE(TAB) /file1(A50)   file2(A7)           file3(A8)
marker(A8)      a1(A8) a2(A8) a3(A8) a4(A8)            a5(A8).
BEGIN DATA.
2010\2010-01 Jan\Case Addresses 2010-01 2010-01 Jan 2010 CAy10m01
A1y10m01 A2y10m01 A3y10m01 A4y10m01 A5y10m01
2010\2010-02 Feb\Case Addresses 2010-02 2010-02 Feb 2010 CAy10m02
A1y10m02 A2y10m02 A3y10m02 A4y10m02 A5y10m02
2010\2010-03 Mar\Case Addresses 2010-03 2010-03 Mar 2010 CAy10m03
A1y10m03 A2y10m03 A3y10m03 A4y10m03 A5y10m03
2010\2010-04 Apr\Case Addresses 2010-04 2010-04 Apr 2010 CAy10m04
A1y10m04 A2y10m04 A3y10m04 A4y10m04 A5y10m04
2010\2010-05 May\Case Addresses 2010-05 2010-05 May 2010 CAy10m05
A1y10m05 A2y10m05 A3y10m05 A4y10m05 A5y10m05
2010\2010-06 Jun\Case Addresses 2010-06 2010-06 Jun 2010 CAy10m06
A1y10m06 A2y10m06 A3y10m06 A4y10m06 A5y10m06
2010\2010-07 Jul\Case Addresses 2010-07 2010-07 Jul 2010 CAy10m07
A1y10m07 A2y10m07 A3y10m07 A4y10m07 A5y10m07
2010\2010-08 Aug\Case Addresses 2010-08 2010-08 Aug 2010 CAy10m08
A1y10m08 A2y10m08 A3y10m08 A4y10m08 A5y10m08
2010\2010-09 Sep\Case Addresses 2010-09 2010-09 Sep 2010 CAy10m09
A1y10m09 A2y10m09 A3y10m09 A4y10m09 A5y10m09
2010\2010-10 Oct\Case Addresses 2010-10 2010-10 Oct 2010 CAy10m10
A1y10m10 A2y10m10 A3y10m10 A4y10m10 A5y10m10
2010\2010-11 Nov\Case Addresses 2010-11 2010-11 Nov 2010 CAy10m11
A1y10m11 A2y10m11 A3y10m11 A4y10m11 A5y10m11
2010\2010-12 Dec\Case Addresses 2010-12 2010-12 Dec 2010 CAy10m12
A1y10m12 A2y10m12 A3y10m12 A4y10m12 A5y10m12
END DATA.
 
*///////////////////////////////.
DEFINE !geographyupdate ( file1 = !TOKENS(1) /
                                         file2 = !TOKENS(1) /
                                         file3 = !TOKENS(1) /
                                         marker = !TOKENS(1) /
                                         a1 = !TOKENS(1) /
                                         a2 = !TOKENS(1) /
                                         a3 = !TOKENS(1) /
                                         a4 = !TOKENS(1) /
                                         a5 = !TOKENS(1) ).
!DO !var=1 !TO 12
 
GET FILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file1,".sav")).
 
COMPUTE !marker=1.
RECODE Adr_St_Type  ('AL'='ALY') ('AR'='AVE') ('AV'='AVE') ('BO'='BLVD')
('BY'='BYP') ('CA'='CALLE') ('CE'='CT') ('CI'='CIR') ('CO'='CT')
 ('DV'='DR') ('GV'='GRV') ('HW'='HWY') ('LA'='LN') ('LN'='LN') ('MS'='')
('PA'='PARK') ('PB'='PO B') ('PL'='PL') ('PW'='PKWY') ('PZ'='PLZ')
 ('RE'='RTE') ('RO'='RD') ('SE'='ST') ('SM'='ST') ('SN'='ST') ('TE'='TER')
('TL'='TRL') ('VA'='VLY') ('VI'='VIEW') ('VT'='VISTA') ('WA'='WAY')
 ('WY'='WAY')
/ Adr_St_Dir ('EA'='E') ('NO'='N') ('SO'='S') ('WE'='W').
EXECUTE.
 
STRING !a1 (A36) / !a2 (A4) / !a3 (A15) / !a4 (A2) / !a5 (A5).
COMPUTE !a1 = concat(rtrim(Adr_Num)," ",rtrim(Adr_Frac),"
",rtrim(Adr_St_Dir)," ",rtrim(Adr_Street)," ",rtrim(Adr_St_Type)).
COMPUTE !a2 = rtrim(Adr_Apt).
COMPUTE !a3 = rtrim(Adr_City).
COMPUTE !a4 = rtrim(Adr_State).
COMPUTE !a5 = rtrim(Adr_Zip5).
EXECUTE.
 
VARIABLE LABELS !marker !QUOTE(!CONCAT(!file3," Case Marker"))
/!a1 'Address, CASE Address file'
/!a2 'Unit, CASE Address file'
/!a3 'City, CASE Address file'
/!a4 'State, CASE Address file'
/!a5 'ZIP, CASE Address file'.
VARIABLE WIDTH !a1 (20) / !a2 (8) / !a3 (12) / !a4 (8) / !a5 (8).
VARIABLE LEVEL !marker (NOMINAL).
VALUE LABELS !marker 1 !QUOTE(!CONCAT(!file3," Case Record")).
EXECUTE.
 
SAVE OUTFILE=!QUOTE(!CONCAT("d:\LMI\Data\",!file,".sav")).
!DOEND
!ENDDEFINE.
 
*///////////////////////////////.
 
*Next line starts the whole process.
SET MPRINT=yes.
!geographyupdate
 
 
 
Any advice would be helpful.
 
==============================
Patrick Burns
Economic Roundtable
Los Angeles, CA, USA
 
=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@... (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
 



===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD