I'm new to SPSS, and would appreciate some help on a very basic question.
I would like to be able to read in a list of state FIPS codes and abbreviations, then use that in a "Do if" statement to computer another variable in a large SPSS file that I have. I know that this is simple, but after many hours I'm just stuck. I can't seem to understand how the vector command works, its not like any array variable I've ever used before. I'm trying to create a two-dimensional array, then read from the array, replace some values in the "Do if" loop, and then compute a new variable, but my I'm missing something basic here. Here's the code. Any help would be greatly appreciated. thanks, Brian Data List /State_FIPS2 1-2 (A) State_Abbr2 3-4 (A). Begin Data 01AL 02AK 04AZ 05AR 06CA 08CO End Data. *Some kind of vector command?. *Some kind of loop comand?. *I would like to be able to use vectors and loops to insert the appropriate state abbreviation and state fips code into the following if loop, rather than type out the following replacement/compute command 51 times. *Help!. Do if (To_St = "00") & (To_State_Abbr = "AL"). Compute To_St_2 = "01". Else if (To_St = "00") & (To_State_Abbr = "AK"). Compute To_St_2 = "02". End if. Execute. |
Brian,
I may not understand you at all but it seems to me that you might be trying to work with two files. The first contains the FIPS (whatever they are) and state abbreviations. The second is the 'large spss file' you mention having. If this is true and the second file has the FIPS codes as a variable but not the abbreviations, then you should use the table keyword of the match files command. Arrays and vectors aren't involved. If you don't have two files then, perhaps you can better explain what you do have because I can't imagine what you are describing. Gene Maguin |
In reply to this post by Brian-31
Hi Gene, thanks for the reply. I'll try to provide more information.
I am writing a syntax file that I will use the run command to perform operations on my larger file (Master_SPSS). My intent with the syntax file is to automate some data changes in some fields in the Master_SPSS file. I can currently "run" the Syntax in the syntax file that I submitted and it will make changes on the Master file, but I'd perfer not to have to write two lines of code for every state. I was hoping that I could read into a vector file (that I seen as an array, maybe I'm wrong) the state FIPS codes and abbreviations, then use the data in the vector to fill in a variable in the Do if loop that would test for two values in the larger file (1.5million records) and then if correct, a value in a third field would be changed. I'm keeping the names of the internal syntax fields different than the names of the fields for the larger SPSS file. In other languages, I'd write some type of array and then use a substitution variable, and I could iterate the Do if loop through the different records in the array. I think that I must have some fundamental misunderstanding of how vectors and arrays interact. I hope that this helps? best, Brian |
In reply to this post by Brian-31
HI Gene, this is more detail. yes, the code that I sent you has variables
in the Do if loop that belong to the larger Master SPSS file. What I want to do is to substitute some of those values for values from the vector, so I don't have to have a separate line for each state (AL FIPS 01, AK FIPS 02, etc.) I tried various ways to assign the value of the vector to the do if loop, as follows, but clearly it's not working. I don't understand how to assign the first and second value of a vector/array to a value, and the SPSS reference documents don't help. Thanks for any help you can provide. Brian ========== Data List /State_FIPS2 1-2 (A) State_Abbr2 3-4 (A). Begin Data 01AL 02AK 04AZ 08CO End Data. Vector V=State_FIPSV to State_AbbrV. Loop #Counter 1 to 4. Do if (To_St = "00") & (To_State_Abbr = (V(#Counter)). Compute To_St_2 = ("State_FIPS2 ... this is one place of confusion. End if. End Loop. Execute. |
In reply to this post by Brian-31
Brian,
I still don't really get what you are doing. So now it sounds like you have a single file, the 'master file', and that file contains a FIPS variable and a state abbreviation variable and some other variables. You say: I was hoping that I could read into a vector file (that I seen as an array, maybe I'm wrong) the state FIPS codes and abbreviations, then use the data in the vector to fill in a variable in the Do if loop that would test for two values in the larger file (1.5million records) and then if correct, a value in a third field would be changed. In spss you could do this. Numeric FIPS1 to FIPS50(f2.0). String ABBR1 to ABBR50(A2). Do repeat x=FIPS1 to FIPS50/ y=1 2 3 4 5 ..... 50. + compute x=y. End repeat. Do repeat x=ABBR1 to ABBR50/ y='AL' 'AR' ..... 'WY'. + compute x=y. End repeat. But, you have now added 100 variables to every record in your master file. Now to use use those variables, I'll bet that you have to do search. Let M_FIPS and M_ABBR be the variables in your master file. It could be that you are wanting to check whether the M_FIPS and M_ABBR are both on the 'list' represented by FIPS1-50 and ABBR1-50. If so, you could do Vector FIPS=FIPS1 to FIPS50/ABBR=ABBR1 to ABBR50. Do if (ABBR(M_FIPS) eq M_ABBR). + some computations follow. Else. + some other computations follow. End if. Of couse, if M_FIPS does not go from 1 to 50, then you have to search FIPS to find the correct value which then tells you the index to use to compare abbreviations. Having said all that, let me add that spss can handle matrix data. Look at and read the section of the documentation on the Matrix-End matrix command set. If this is really what you are after, then there are others on the list that are especially good with that command set. Gene Maguin |
In reply to this post by Brian-31
somebody asked what FIPS codes are. Federal Information Processing
Standards are used to assign standard values to variables. In this instance the Geography Division of Census has assigned codes to every state, county, place, tract and block. There are also internationally agreed standard codes for countries, provinces, and cities. The use of these standard codes allows data from different sources to be matched. Many mapping and Geographic Information Systems (GIS) use these to match user data to geographic data. If I understand the OP's query. there is a master file. say MASTER.SAV Sometimes the FIPS state code is '00' or missing. There is also a file of FIPS and post office state codes. say FIPS_Table.SAV already in post office code order. Something like this untested syntax should work. The rename line and save into a different name are there to avoid the "sin" of creating a situation in which one cannot go back and do it over correctly this time. get file= 'd:\project\master.sav. rename vars(state_fips = old_state_fips). sort cases by State_Abbr2. match files file=* /table = 'd:\project\FIPS_Table.SAV' by State_Abbr2. crosstabs tables= old_state_fips by state_fips /missing = include. save outfile= 'd:project\master2.sav'. I have some boilerplate syntax files for states and roughly equivalent entities that contains the FIPS codes, post office codes, and long string names, to use for value labels. If anyone would like this file send me an email, I can send them. (After the holidays, I'll check whether this syntax should go on Developer Central.) Art Kendall Social Research Consultants Brian Kelley wrote: >HI Gene, this is more detail. yes, the code that I sent you has variables >in the Do if loop that belong to the larger Master SPSS file. What I want >to do is to substitute some of those values for values from the vector, so >I don't have to have a separate line for each state (AL FIPS 01, AK FIPS >02, etc.) > >I tried various ways to assign the value of the vector to the do if loop, >as follows, but clearly it's not working. I don't understand how to assign >the first and second value of a vector/array to a value, and the SPSS >reference documents don't help. >Thanks for any help you can provide. Brian > >========== > >Data List /State_FIPS2 1-2 (A) State_Abbr2 3-4 (A). >Begin Data >01AL >02AK >04AZ >08CO >End Data. > >Vector V=State_FIPSV to State_AbbrV. > >Loop #Counter 1 to 4. > >Do if (To_St = "00") & (To_State_Abbr = (V(#Counter)). > Compute To_St_2 = ("State_FIPS2 ... this is one place of confusion. >End if. >End Loop. > >Execute. > > > > |
In reply to this post by Brian-31
OK, this one seems to have got a little tangled.
At 02:56 PM 11/21/2006, Brian Kelley wrote: >I would like to be able to read in a list of state FIPS codes >and abbreviations, then use that in a "Do if" statement to computer >another variable in a large SPSS file that I have. [Something like] > >Do if (To_St = "00") & (To_State_Abbr = "AL"). > Compute To_St_2 = "01". >Else if (To_St = "00") & (To_State_Abbr = "AK"). > Compute To_St_2 = "02". >End if. Forget "DO IF" and "VECTOR"; those are the means, not the end you're looking for. It doesn't look like it's ever been clear what you're trying to do, and without that, there'll never be a solution. It seems clear you want to do a table look-up of some kind. You have a table that matches FIPS codes and state abbreviations (in technical use, "postal codes") for the states; it has 52 entries, or thereabouts. In your very large file, you have a FIPS variable called "To_St" and a postal-code variable called "To_State_Abbr". They *should* match, i.e. be for the same state; goodness knows what you want to do, if they don't. If they do match, you're mapping the state into another state code, "To_St_2". But I may not have this right. Your second clause has >Else if (To_St = "00") & (To_State_Abbr = "AK"). > Compute To_St_2 = "02". If you really mean 'To_St = "00"' here, then To_St *isn't* the FIPS code, and I've no idea what you want the FIPS code for. (If you'll excuse me, why the project in the first place? You have two perfectly good state codes already: FIPS and postal. And if you have two different 2-digit state codes in your file - FIPS, in which Alabama is "00", and yours in which Alabama is "01" - you've handed yourself great opportunity for confusion.) Anyway, to do a table lookup in SPSS, you can 1. SORT CASES and MATCH FILES, as Art Kendall suggested. That's the most standard, SPSS-idiomatic technique; but if the file to be sorted, and then probably re-sorted, has 1.5 million records, that may be inefficient. 2. If you don't mind data in code, you can write your table as a RECODE statement (preferred) or a DO IF chain (as you started to write). For the trouble you're putting yourself through, you probably could have written the DO IF chain by hand, by now. 3. Or, you can use AUTORECODE with a template file that you generate. Efficient lookup, and the data isn't in the code. 4. Or, finally, have the table in a set of variables IN EVERY CASE. They you can use VECTOR/LOOP logic, though this can be a roundabout way of doing it. It seems to be what you're thinking about. Code below is not tested. 1. MATCH FILES: see Art's code, modify as necessary. 2. Data in code, using RECODE: * The variable "#To_St_All" is useful in . * all of the approaches, except maybe . * MATCH FILES. . STRING #To_St_All (A4). COMPUTE #To_St_All = CONCAT(To_St, To_State_Abbr). RECODE #To_St_All (" " = "99") ("00AL" = "01") ("01AK" = "02") /* 01AK, not 00AK */ ... (ELSE = "98") INTO To_St_2. I'll skip code for other methods now - my energy is not what I'd hoped for. If you're determined, and really don't want data in code, look at AUTORECODE with a template; that may be your best bet. If you like, AND if I understand what you want, I'll try it later. Or the VECTOR/LOOP logic, mainly to illustrate its complexity and general awkwardness. (It takes a CASESTOVARS followed by a MATCH FILES just to make the lookup table available for VECTOR/LOOP.) -Good luck, Richard Ristow |
The need for this sort of table lookup function comes up a lot, so we have added a lookup function to the extendedTransforms programmability module that will be on SPSS Developer Central at the end of November. With that, you can read in a table - from text or an SPSS dataset or any other source, including just creating a little Python dictionary in the code, and then you can just do the lookup with the variable as a key. Doesn't matter whether it is string or numeric, and you can supply a value to use if the key is not in the table. This will require SPSS 15 with programmability installed.
I'll post a complete example when the new module is available. Regards, Jon Peck -----Original Message----- From: SPSSX(r) Discussion on behalf of Richard Ristow Sent: Wed 11/22/2006 11:51 AM To: [hidden email] Subject: Re: [SPSSX-L] Basic loop-vector command help? OK, this one seems to have got a little tangled. At 02:56 PM 11/21/2006, Brian Kelley wrote: >I would like to be able to read in a list of state FIPS codes >and abbreviations, then use that in a "Do if" statement to computer >another variable in a large SPSS file that I have. [Something like] > >Do if (To_St = "00") & (To_State_Abbr = "AL"). > Compute To_St_2 = "01". >Else if (To_St = "00") & (To_State_Abbr = "AK"). > Compute To_St_2 = "02". >End if. Forget "DO IF" and "VECTOR"; those are the means, not the end you're looking for. It doesn't look like it's ever been clear what you're trying to do, and without that, there'll never be a solution. It seems clear you want to do a table look-up of some kind. You have a table that matches FIPS codes and state abbreviations (in technical use, "postal codes") for the states; it has 52 entries, or thereabouts. In your very large file, you have a FIPS variable called "To_St" and a postal-code variable called "To_State_Abbr". They *should* match, i.e. be for the same state; goodness knows what you want to do, if they don't. If they do match, you're mapping the state into another state code, "To_St_2". But I may not have this right. Your second clause has >Else if (To_St = "00") & (To_State_Abbr = "AK"). > Compute To_St_2 = "02". If you really mean 'To_St = "00"' here, then To_St *isn't* the FIPS code, and I've no idea what you want the FIPS code for. (If you'll excuse me, why the project in the first place? You have two perfectly good state codes already: FIPS and postal. And if you have two different 2-digit state codes in your file - FIPS, in which Alabama is "00", and yours in which Alabama is "01" - you've handed yourself great opportunity for confusion.) Anyway, to do a table lookup in SPSS, you can 1. SORT CASES and MATCH FILES, as Art Kendall suggested. That's the most standard, SPSS-idiomatic technique; but if the file to be sorted, and then probably re-sorted, has 1.5 million records, that may be inefficient. 2. If you don't mind data in code, you can write your table as a RECODE statement (preferred) or a DO IF chain (as you started to write). For the trouble you're putting yourself through, you probably could have written the DO IF chain by hand, by now. 3. Or, you can use AUTORECODE with a template file that you generate. Efficient lookup, and the data isn't in the code. 4. Or, finally, have the table in a set of variables IN EVERY CASE. They you can use VECTOR/LOOP logic, though this can be a roundabout way of doing it. It seems to be what you're thinking about. Code below is not tested. 1. MATCH FILES: see Art's code, modify as necessary. 2. Data in code, using RECODE: * The variable "#To_St_All" is useful in . * all of the approaches, except maybe . * MATCH FILES. . STRING #To_St_All (A4). COMPUTE #To_St_All = CONCAT(To_St, To_State_Abbr). RECODE #To_St_All (" " = "99") ("00AL" = "01") ("01AK" = "02") /* 01AK, not 00AK */ ... (ELSE = "98") INTO To_St_2. I'll skip code for other methods now - my energy is not what I'd hoped for. If you're determined, and really don't want data in code, look at AUTORECODE with a template; that may be your best bet. If you like, AND if I understand what you want, I'll try it later. Or the VECTOR/LOOP logic, mainly to illustrate its complexity and general awkwardness. (It takes a CASESTOVARS followed by a MATCH FILES just to make the lookup table available for VECTOR/LOOP.) -Good luck, Richard Ristow |
In reply to this post by Brian-31
Richard, thanks very much; I will pursue the recode angle. I have also
tried to use a macro, which according to the manual should work, but it does not. So, I'm still struggling to make this code work. Because you asked ... the reason that I am having to do this is that some of the FIPS codes are correct, but some are not due to the history of this data. In some cases, the state FIPS codes were set to "00" and I'm trying to fix that, so that I can use the data. I can't just do a bulk replace, unfortunately. I'll try another way to describe the situation by showing syntax that DOES work and syntax that DOES NOT work, and maybe someone can help me, and I'm trying to use a macro instead of vectors which may work more effectively. 1. GOOD Code. This basic code works by first checking the values in two fields of my main database (Master_spare) and then depending on the results of that process, it computes a new value for a third field. GET FILE='E:\IRS Migration Data\County Migration Files\Co_9304_Master_spare.sav'. DATASET NAME Master_IRS WINDOW=FRONT. If (TO_ST = "00" And TO_ST_ABBR = "AL") To_St_2 = "01". If (TO_ST = "00" And TO_ST_ABBR = "AK") To_St_2 = "02". Execute. 2. BAD Code. Because I'm hoping to avoid having to write a single line for each of the 50 states, and I want to learn more about how to tap the power of SPSS syntax, I've tried to use vectors, loops and now I'm trying a macro. Unfortnately, the macro below does not work, although the SPSS file does have "transformations pending" on the bottom of the screen; when I choose "run transformations" nothing happens, and my data options are set on "run transformations immediately". So I don't know what transformations are pending, but I can't get them to show up, in any event. I've reworked this macro many times, and it should work given what I can read, but it does not. GET FILE='E:\IRS Migration Data\County Migration Files\Co_9304_Master_spare.sav'. DATASET NAME Master_IRS WINDOW=FRONT. DEFINE !No_Zeros (StateFips = !Tokens(1) /StateAbbr = !Tokens(1)) Do if (TO_ST = "00" and TO_ST_ABBR = "AL"). Compute TO_ST_2 = "01". End If. !ENDDEFINE. !No_Zeros "01" "AL". EXECUTE. (Result: no change in the main spss file, notification that Transformations are pending). Any help would be much appreciated! best, Brian |
Brian,
From your reply to Richard (Richard, thanks for asking the right question!): >>Because you asked ... the reason that I am having to do this is that some of the FIPS codes are correct, but some are not due to the history of this data. In some cases, the state FIPS codes were set to "00" and I'm trying to fix that, so that I can use the data. I can't just do a bulk replace, unfortunately. Stated differently, the problem is that the FIPS code may be wrong given a state abbreviation. The way to do this is to create a new file with two columns: one is the FIPS code, call this variable CORRECTFIPS and the other is the state abbreviation, which must be called TO_ST_ABBR. Sort this file by TO_ST_ABBR and save. Get your master file and sort it by TO_ST_ABBR. Then do a match files using the table option to add the correct fips code variable (CORRECTFIPS) to each record. You can then compare the the two values and mark your third variable as you show in your 'good code'. However, you can do this in one step if you use the Update command and change the name of the variable previously called CORRECTFIPS to TO_FIPS. Gene Maguin |
In reply to this post by Brian-31
Shalom
I believe you don't need any loops or macro to do the job. you can do it the way it is done in relational data base like access or oracle . First you aggregate your big file on State_FIPS State_Abbr2 . that will give you all the combination of State_FIPS and State_Abbr2 the corrected ones and the mistakes If your data is OK you will get 51 lines for the stats . If there are mistakes in the data you will get more lines. Now to that table you add a column of the new code (the corrected code) and use that table as an intermediate table between your big file and the lookup table. Here is an example _ Your original data (the big file)_ FIP Abbr 01 AL 01 Al 04 AZ 06 CA 27 ii 27 ii 27 ii 33 cr _ After aggregate_ FIP Abbr n 01 AL 2 04 AZ 1 06 CA 1 27 ii 3 33 cr 1 _Now add new cod/coulm _ FIP Abbr n newcode 01 AL 2 Alabama 04 AZ 1 arizona 06 CA 1 california 27 ii 3 Alabama 33 cr 1 Alabama *** here you corrected the mistake of 27 ii and 33 cr and give the the code State of Alabama **** _ Now you do the match_ match files file=State_FIPS /table=* /by State_FIPS2 State_Abbr2 . sort cases by newcode. match files file=* /table=codetable /by newcode . Hillel Vardi Brian Kelley wrote: > I'm new to SPSS, and would appreciate some help on a very basic question. > > I would like to be able to read in a list of state FIPS codes and > abbreviations, then use that in a "Do if" statement to computer another > variable in a large SPSS file that I have. I know that this is simple, but > after many hours I'm just stuck. I can't seem to understand how the vector > command works, its not like any array variable I've ever used before. I'm > trying to create a two-dimensional array, then read from the array, > replace some values in the "Do if" loop, and then compute a new variable, > but my I'm missing something basic here. Here's the code. Any help would > be greatly appreciated. > thanks, Brian > > > Data List /State_FIPS2 1-2 (A) State_Abbr2 3-4 (A). > Begin Data > 01AL > 02AK > 04AZ > 05AR > 06CA > 08CO > End Data. > > *Some kind of vector command?. > *Some kind of loop comand?. > *I would like to be able to use vectors and loops to insert the > appropriate state abbreviation and state fips code into the following if > loop, rather than type out the following replacement/compute command 51 > times. > *Help!. > > > Do if (To_St = "00") & (To_State_Abbr = "AL"). > Compute To_St_2 = "01". > Else if (To_St = "00") & (To_State_Abbr = "AK"). > Compute To_St_2 = "02". > End if. > > Execute. > > |
Free forum by Nabble | Edit this page |