Basic loop-vector command help?

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Basic loop-vector command help?

Brian-31
I'm new to SPSS, and would appreciate some help on a very basic question.

I would like to be able to read in a list of state FIPS codes and
abbreviations, then use that in a "Do if" statement to computer another
variable in a large SPSS file that I have. I know that this is simple, but
after many hours I'm just stuck. I can't seem to understand how the vector
command works, its not like any array variable I've ever used before. I'm
trying to create a two-dimensional array, then read from the array,
replace some values in the "Do if" loop, and then compute a new variable,
but my I'm missing something basic here. Here's the code. Any help would
be greatly appreciated.
thanks, Brian


Data List /State_FIPS2 1-2 (A) State_Abbr2 3-4 (A).
Begin Data
01AL
02AK
04AZ
05AR
06CA
08CO
End Data.

*Some kind of vector command?.
*Some kind of loop comand?.
*I would like to be able to use vectors and loops to insert the
appropriate state abbreviation and state fips code into the following if
loop, rather than type out the following replacement/compute command 51
times.
*Help!.


Do if (To_St = "00") & (To_State_Abbr = "AL").
 Compute To_St_2 = "01".
Else if (To_St = "00") & (To_State_Abbr = "AK").
 Compute To_St_2 = "02".
End if.

Execute.
Reply | Threaded
Open this post in threaded view
|

Re: Basic loop-vector command help?

Maguin, Eugene
Brian,

I may not understand you at all but it seems to me that you might be trying
to work with two files. The first contains the FIPS (whatever they are) and
state abbreviations. The second is the 'large spss file' you mention having.
If this is true and the second file has the FIPS codes as a variable but not
the abbreviations, then you should use the table keyword of the match files
command. Arrays and vectors aren't involved.

If you don't have two files then, perhaps you can better explain what you do
have because I can't imagine what you are describing.

Gene Maguin
Reply | Threaded
Open this post in threaded view
|

Re: Basic loop-vector command help?

Brian-31
In reply to this post by Brian-31
Hi Gene, thanks for the reply. I'll try to provide more information.

I am writing a syntax file that I will use the run command to perform
operations on my larger file (Master_SPSS). My intent with the syntax file
is to automate some data changes in some fields in the Master_SPSS file. I
can currently "run" the Syntax in the syntax file that I submitted and it
will make changes on the Master file, but I'd perfer not to have to write
two lines of code for every state. I was hoping that I could read into a
vector file (that I seen as an array, maybe I'm wrong) the state FIPS
codes and abbreviations, then use the data in the vector to fill in a
variable in the Do if loop that would test for two values in the larger
file (1.5million records) and then if correct, a value in a third field
would be changed. I'm keeping the names of the internal syntax fields
different than the names of the fields for the larger SPSS file. In other
languages, I'd write some type of array and then use a substitution
variable, and I could iterate the Do if loop through the different records
in the array. I think that I must have some fundamental misunderstanding
of how vectors and arrays interact. I hope that this helps?

best,

Brian
Reply | Threaded
Open this post in threaded view
|

Re: Basic loop-vector command help?

Brian-31
In reply to this post by Brian-31
HI Gene, this is more detail. yes, the code that I sent you has variables
in the Do if loop that belong to the larger Master SPSS file. What I want
to do is to substitute some of those values for values from the vector, so
I don't have to have a separate line for each state (AL FIPS 01, AK FIPS
02, etc.)

I tried various ways to assign the value of the vector to the do if loop,
as follows, but clearly it's not working. I don't understand how to assign
the first and second value of a vector/array to a value, and the SPSS
reference documents don't help.
Thanks for any help you can provide. Brian

==========

Data List /State_FIPS2 1-2 (A) State_Abbr2 3-4 (A).
Begin Data
01AL
02AK
04AZ
08CO
End Data.

Vector V=State_FIPSV to State_AbbrV.

Loop #Counter 1 to 4.

Do if (To_St = "00") & (To_State_Abbr = (V(#Counter)).
 Compute To_St_2 = ("State_FIPS2 ... this is one place of confusion.
End if.
End Loop.

Execute.
Reply | Threaded
Open this post in threaded view
|

Re: Basic loop-vector command help?

Maguin, Eugene
In reply to this post by Brian-31
Brian,

I still don't really get what you are doing. So now it sounds like you have
a single file, the 'master file', and that file contains a FIPS variable and
a state abbreviation variable and some other variables.

You say:
I was hoping that I could read into a
vector file (that I seen as an array, maybe I'm wrong) the state FIPS
codes and abbreviations, then use the data in the vector to fill in a
variable in the Do if loop that would test for two values in the larger
file (1.5million records) and then if correct, a value in a third field
would be changed.

In spss you could do this.
Numeric FIPS1 to FIPS50(f2.0).
String ABBR1 to ABBR50(A2).
Do repeat x=FIPS1 to FIPS50/
   y=1 2 3 4 5 ..... 50.
+  compute x=y.
End repeat.
Do repeat x=ABBR1 to ABBR50/
   y='AL' 'AR' ..... 'WY'.
+  compute x=y.
End repeat.

But, you have now added 100 variables to every record in your master file.

Now to use use those variables, I'll bet that you have to do search. Let
M_FIPS and M_ABBR be the variables in your master file. It could be that you
are wanting to check whether the M_FIPS and M_ABBR are both on the 'list'
represented by FIPS1-50 and ABBR1-50. If so, you could do

Vector FIPS=FIPS1 to FIPS50/ABBR=ABBR1 to ABBR50.
Do if (ABBR(M_FIPS) eq M_ABBR).
+   some computations follow.
Else.
+  some other computations follow.
End if.

Of couse, if M_FIPS does not go from 1 to 50, then you have to search FIPS
to find the correct value which then tells you the index to use to compare
abbreviations.


Having said all that, let me add that spss can handle matrix data. Look at
and read the section of the documentation on the Matrix-End matrix command
set. If this is really what you are after, then there are others on the list
that are especially good with that command set.

Gene Maguin
Reply | Threaded
Open this post in threaded view
|

Re: Basic loop-vector command help?

Art Kendall-2
In reply to this post by Brian-31
somebody asked what FIPS codes are. Federal Information Processing
Standards are used to assign standard values to variables.  In this
instance the Geography Division of Census has assigned codes to every
state, county, place, tract and block.    There are also internationally
agreed standard codes for countries, provinces, and cities.  The use of
these standard codes allows data from different sources to be matched.
Many mapping and Geographic Information Systems (GIS) use these to match
user data to geographic data.

If  I understand the OP's query.
there is a master file. say MASTER.SAV  Sometimes the FIPS state code is
'00' or missing.
There is also a file of FIPS and post office state codes. say
FIPS_Table.SAV already in post office code order.



Something like this untested syntax should work. The rename line and
save into a different name are there to avoid the "sin" of creating a
situation in which
one cannot go back and do it over correctly this time.

get file= 'd:\project\master.sav.
rename vars(state_fips = old_state_fips).
sort cases by State_Abbr2.
match files file=* /table = 'd:\project\FIPS_Table.SAV'
 by State_Abbr2.
crosstabs tables= old_state_fips by state_fips /missing = include.
save outfile= 'd:project\master2.sav'.

I have some boilerplate syntax files for states and roughly equivalent
entities that contains the FIPS codes, post office codes, and long
string names, to use for value labels. If anyone would like this file
send me an email, I can send them.
(After the holidays, I'll check whether this syntax should go on
Developer Central.)

Art Kendall
Social Research Consultants


Brian Kelley wrote:

>HI Gene, this is more detail. yes, the code that I sent you has variables
>in the Do if loop that belong to the larger Master SPSS file. What I want
>to do is to substitute some of those values for values from the vector, so
>I don't have to have a separate line for each state (AL FIPS 01, AK FIPS
>02, etc.)
>
>I tried various ways to assign the value of the vector to the do if loop,
>as follows, but clearly it's not working. I don't understand how to assign
>the first and second value of a vector/array to a value, and the SPSS
>reference documents don't help.
>Thanks for any help you can provide. Brian
>
>==========
>
>Data List /State_FIPS2 1-2 (A) State_Abbr2 3-4 (A).
>Begin Data
>01AL
>02AK
>04AZ
>08CO
>End Data.
>
>Vector V=State_FIPSV to State_AbbrV.
>
>Loop #Counter 1 to 4.
>
>Do if (To_St = "00") & (To_State_Abbr = (V(#Counter)).
> Compute To_St_2 = ("State_FIPS2 ... this is one place of confusion.
>End if.
>End Loop.
>
>Execute.
>
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Basic loop-vector command help?

Richard Ristow
In reply to this post by Brian-31
OK, this one seems to have got a little tangled.

At 02:56 PM 11/21/2006, Brian Kelley wrote:

>I would like to be able to read in a list of state FIPS codes
>and   abbreviations, then use that in a "Do if" statement to computer
>another variable in a large SPSS file that I have. [Something like]
>
>Do if (To_St = "00") & (To_State_Abbr = "AL").
>  Compute To_St_2 = "01".
>Else if (To_St = "00") & (To_State_Abbr = "AK").
>  Compute To_St_2 = "02".
>End if.

Forget "DO IF" and "VECTOR"; those are the means, not the end you're
looking for. It doesn't look like it's ever been clear what you're
trying to do, and without that, there'll never be a solution.

It seems clear you want to do a table look-up of some kind. You have a
table that matches FIPS codes and state abbreviations (in technical
use, "postal codes") for the states; it has 52 entries, or thereabouts.
In your very large file, you have a FIPS variable called "To_St" and a
postal-code variable called "To_State_Abbr". They *should* match, i.e.
be for the same state; goodness knows what you want to do, if they
don't. If they do match, you're mapping the state into another state
code, "To_St_2".

But I may not have this right. Your second clause has

>Else if (To_St = "00") & (To_State_Abbr = "AK").
>  Compute To_St_2 = "02".

If you really mean 'To_St = "00"' here, then To_St *isn't* the FIPS
code, and I've no idea what you want the FIPS code for.

(If you'll excuse me, why the project in the first place? You have two
perfectly good state codes already: FIPS and postal. And if you have
two different 2-digit state codes in your file - FIPS, in which Alabama
is "00", and yours in which Alabama is "01" - you've handed yourself
great opportunity for confusion.)

Anyway, to do a table lookup in SPSS, you can

1. SORT CASES and MATCH FILES, as Art Kendall suggested. That's the
most standard, SPSS-idiomatic technique; but if the file to be sorted,
and then probably re-sorted, has 1.5 million records, that may be
inefficient.

2. If you don't mind data in code, you can write your table as a RECODE
statement (preferred) or a DO IF chain (as you started to write). For
the trouble you're putting yourself through, you probably could have
written the DO IF chain by hand, by now.

3. Or, you can use AUTORECODE with a template file that you generate.
Efficient lookup, and the data isn't in the code.

4. Or, finally, have the table in a set of variables IN EVERY CASE.
They you can use VECTOR/LOOP logic, though this can be a roundabout way
of doing it. It seems to be what you're thinking about.

Code below is not tested.

1.   MATCH FILES: see Art's code, modify as necessary.

2.   Data in code, using RECODE:
*  The variable "#To_St_All" is useful in .
*  all of the approaches, except maybe    .
*  MATCH FILES.                           .
STRING  #To_St_All (A4).
COMPUTE #To_St_All = CONCAT(To_St,
                             To_State_Abbr).

RECODE  #To_St_All
   ("    " = "99")
   ("00AL" = "01")
   ("01AK" = "02") /* 01AK, not 00AK */
...
   (ELSE   = "98") INTO To_St_2.


I'll skip code for other methods now - my energy is not what I'd hoped
for. If you're determined, and really don't want data in code, look at
AUTORECODE with a template; that may be your best bet. If you like, AND
if I understand what you want, I'll try it later. Or the VECTOR/LOOP
logic, mainly to illustrate its complexity and general awkwardness. (It
takes a CASESTOVARS followed by a MATCH FILES just to make the lookup
table available for VECTOR/LOOP.)

-Good luck,
  Richard Ristow
Reply | Threaded
Open this post in threaded view
|

Re: Basic loop-vector command help?

Peck, Jon
The need for this sort of table lookup function comes up a lot, so we have added a lookup function to the extendedTransforms programmability module that will be on SPSS Developer Central at the end of November.  With that, you can read in a table - from text or an SPSS dataset or any other source, including just creating a little Python dictionary in the code, and then you can just do the lookup with the variable as a key.  Doesn't matter whether it is string or numeric, and you can supply a value to use if the key is not in the table.  This will require SPSS 15 with programmability installed.

I'll post a complete example when the new module is available.

Regards,
Jon Peck


-----Original Message-----
From: SPSSX(r) Discussion on behalf of Richard Ristow
Sent: Wed 11/22/2006 11:51 AM
To: [hidden email]
Subject:      Re: [SPSSX-L] Basic loop-vector command help?
 
OK, this one seems to have got a little tangled.

At 02:56 PM 11/21/2006, Brian Kelley wrote:

>I would like to be able to read in a list of state FIPS codes
>and   abbreviations, then use that in a "Do if" statement to computer
>another variable in a large SPSS file that I have. [Something like]
>
>Do if (To_St = "00") & (To_State_Abbr = "AL").
>  Compute To_St_2 = "01".
>Else if (To_St = "00") & (To_State_Abbr = "AK").
>  Compute To_St_2 = "02".
>End if.

Forget "DO IF" and "VECTOR"; those are the means, not the end you're
looking for. It doesn't look like it's ever been clear what you're
trying to do, and without that, there'll never be a solution.

It seems clear you want to do a table look-up of some kind. You have a
table that matches FIPS codes and state abbreviations (in technical
use, "postal codes") for the states; it has 52 entries, or thereabouts.
In your very large file, you have a FIPS variable called "To_St" and a
postal-code variable called "To_State_Abbr". They *should* match, i.e.
be for the same state; goodness knows what you want to do, if they
don't. If they do match, you're mapping the state into another state
code, "To_St_2".

But I may not have this right. Your second clause has

>Else if (To_St = "00") & (To_State_Abbr = "AK").
>  Compute To_St_2 = "02".

If you really mean 'To_St = "00"' here, then To_St *isn't* the FIPS
code, and I've no idea what you want the FIPS code for.

(If you'll excuse me, why the project in the first place? You have two
perfectly good state codes already: FIPS and postal. And if you have
two different 2-digit state codes in your file - FIPS, in which Alabama
is "00", and yours in which Alabama is "01" - you've handed yourself
great opportunity for confusion.)

Anyway, to do a table lookup in SPSS, you can

1. SORT CASES and MATCH FILES, as Art Kendall suggested. That's the
most standard, SPSS-idiomatic technique; but if the file to be sorted,
and then probably re-sorted, has 1.5 million records, that may be
inefficient.

2. If you don't mind data in code, you can write your table as a RECODE
statement (preferred) or a DO IF chain (as you started to write). For
the trouble you're putting yourself through, you probably could have
written the DO IF chain by hand, by now.

3. Or, you can use AUTORECODE with a template file that you generate.
Efficient lookup, and the data isn't in the code.

4. Or, finally, have the table in a set of variables IN EVERY CASE.
They you can use VECTOR/LOOP logic, though this can be a roundabout way
of doing it. It seems to be what you're thinking about.

Code below is not tested.

1.   MATCH FILES: see Art's code, modify as necessary.

2.   Data in code, using RECODE:
*  The variable "#To_St_All" is useful in .
*  all of the approaches, except maybe    .
*  MATCH FILES.                           .
STRING  #To_St_All (A4).
COMPUTE #To_St_All = CONCAT(To_St,
                             To_State_Abbr).

RECODE  #To_St_All
   ("    " = "99")
   ("00AL" = "01")
   ("01AK" = "02") /* 01AK, not 00AK */
...
   (ELSE   = "98") INTO To_St_2.


I'll skip code for other methods now - my energy is not what I'd hoped
for. If you're determined, and really don't want data in code, look at
AUTORECODE with a template; that may be your best bet. If you like, AND
if I understand what you want, I'll try it later. Or the VECTOR/LOOP
logic, mainly to illustrate its complexity and general awkwardness. (It
takes a CASESTOVARS followed by a MATCH FILES just to make the lookup
table available for VECTOR/LOOP.)

-Good luck,
  Richard Ristow
Reply | Threaded
Open this post in threaded view
|

Re: Basic loop-vector command help?

Brian-31
In reply to this post by Brian-31
Richard, thanks very much; I will pursue the recode angle. I have also
tried to use a macro, which according to the manual should work, but it
does not. So, I'm still struggling to make this code work.

Because you asked ... the reason that I am having to do this is that some
of the FIPS codes are correct, but some are not due to the history of this
data. In some cases, the state FIPS codes were set to "00" and I'm trying
to fix that, so that I can use the data. I can't just do a bulk replace,
unfortunately.

I'll try another way to describe the situation by showing
syntax that DOES work and syntax that DOES NOT work, and maybe someone can
help me, and I'm trying to use a macro instead of vectors which may work
more effectively.

1. GOOD Code. This basic code works by first checking the values in two
fields of my main database (Master_spare) and then depending on the
results of that process, it computes a new value for a third field.

GET FILE='E:\IRS Migration Data\County Migration
Files\Co_9304_Master_spare.sav'.
DATASET NAME Master_IRS WINDOW=FRONT.

If (TO_ST = "00"  And TO_ST_ABBR = "AL") To_St_2 = "01".
If (TO_ST = "00" And TO_ST_ABBR = "AK") To_St_2 = "02".

Execute.

2. BAD Code. Because I'm hoping to avoid having to write a single line for
each of the 50 states, and I want to learn more about how to tap the power
of SPSS syntax, I've tried to use vectors, loops and now I'm trying a
macro. Unfortnately, the macro below does not work, although the SPSS file
does have "transformations pending" on the bottom of the screen; when I
choose "run transformations" nothing happens, and my data options are set
on "run transformations immediately". So I don't know what transformations
are pending, but I can't get them to show up, in any event. I've reworked
this macro many times, and it should work given what I can read, but it
does not.


GET FILE='E:\IRS Migration Data\County Migration
Files\Co_9304_Master_spare.sav'.
DATASET NAME Master_IRS WINDOW=FRONT.

DEFINE !No_Zeros (StateFips = !Tokens(1) /StateAbbr = !Tokens(1))
 Do if (TO_ST = "00" and TO_ST_ABBR = "AL").
  Compute TO_ST_2 = "01".
 End If.
!ENDDEFINE.

!No_Zeros "01"  "AL".
EXECUTE.

(Result: no change in the main spss file, notification that
Transformations are pending).

Any help would be much appreciated!

best,

Brian
Reply | Threaded
Open this post in threaded view
|

Re: Basic loop-vector command help?

Maguin, Eugene
Brian,

From your reply to Richard (Richard, thanks for asking the right question!):

>>Because you asked ... the reason that I am having to do this is that some
of the FIPS codes are correct, but some are not due to the history of this
data. In some cases, the state FIPS codes were set to "00" and I'm trying
to fix that, so that I can use the data. I can't just do a bulk replace,
unfortunately.

Stated differently, the problem is that the FIPS code may be wrong given a
state abbreviation. The way to do this is to create a new file with two
columns: one is the FIPS code, call this variable CORRECTFIPS and the other
is the state abbreviation, which must be called TO_ST_ABBR. Sort this file
by TO_ST_ABBR and save. Get your master file and sort it by TO_ST_ABBR. Then
do a match files using the table option to add the correct fips code
variable (CORRECTFIPS) to each record. You can then compare the the two
values and mark your third variable as you show in your 'good code'.

However, you can do this in one step if you use the Update command and
change the name of the variable previously called CORRECTFIPS to TO_FIPS.

Gene Maguin
Reply | Threaded
Open this post in threaded view
|

Re: Basic loop-vector command help?

hillel vardi
In reply to this post by Brian-31
Shalom


 I believe you don't need any  loops or macro to do the job. you can do
it the way it is done in relational data base like access or oracle .

First  you aggregate your big file on    State_FIPS  State_Abbr2 .

that will give you all the combination  of State_FIPS  and  State_Abbr2
the corrected ones and the mistakes

If your data is OK you will get  51 lines for the  stats .

If there are mistakes in the data you will get more lines.

Now  to that table you add a column of the new code (the corrected code)
  and use that table as an intermediate table between your big file and
the lookup table.


Here is an example

_
Your original data (the big file)_

FIP Abbr

01 AL
01 Al
04 AZ
06 CA
27 ii
27 ii
27 ii
33 cr

_ After aggregate_

FIP Abbr  n

01 AL   2
04 AZ   1
06 CA   1
27 ii   3
33 cr   1

_Now add new cod/coulm
_
FIP Abbr n newcode

01 AL   2     Alabama
04 AZ   1     arizona
06 CA   1     california
27 ii   3     Alabama
33 cr   1     Alabama

***  here you corrected the mistake of 27 ii  and 33 cr and give the the code State of  Alabama   ****
_
Now you do  the match_

match files   file=State_FIPS
              /table=*
              /by  State_FIPS2 State_Abbr2 .
sort cases    by newcode.
match files   file=*
              /table=codetable
              /by  newcode .

Hillel Vardi



Brian Kelley wrote:

> I'm new to SPSS, and would appreciate some help on a very basic question.
>
> I would like to be able to read in a list of state FIPS codes and
> abbreviations, then use that in a "Do if" statement to computer another
> variable in a large SPSS file that I have. I know that this is simple, but
> after many hours I'm just stuck. I can't seem to understand how the vector
> command works, its not like any array variable I've ever used before. I'm
> trying to create a two-dimensional array, then read from the array,
> replace some values in the "Do if" loop, and then compute a new variable,
> but my I'm missing something basic here. Here's the code. Any help would
> be greatly appreciated.
> thanks, Brian
>
>
> Data List /State_FIPS2 1-2 (A) State_Abbr2 3-4 (A).
> Begin Data
> 01AL
> 02AK
> 04AZ
> 05AR
> 06CA
> 08CO
> End Data.
>
> *Some kind of vector command?.
> *Some kind of loop comand?.
> *I would like to be able to use vectors and loops to insert the
> appropriate state abbreviation and state fips code into the following if
> loop, rather than type out the following replacement/compute command 51
> times.
> *Help!.
>
>
> Do if (To_St = "00") & (To_State_Abbr = "AL").
>  Compute To_St_2 = "01".
> Else if (To_St = "00") & (To_State_Abbr = "AK").
>  Compute To_St_2 = "02".
> End if.
>
> Execute.
>
>