|
Hi to you all,
I have been reading this group for a while, and have had great fun with it. Now I have met a problem that I do not see a elegant solution for within SPSS. I need to make a decision on how to restructure a datafile that contains thousands of cases and around 20 variables. The data comes from a database, and the one of the fields, "Code", is a table mapped to the rest of the data. Thus this field can contain from 0 to 20 pieces of information, each one with on a new line, in no particular order. There is a matching list (.xls) of total close to 600 codes, which I would like to convert to a variables and their values in my datafile. Unfortunately, I do not have access to the database, only excel files exported out of the database. OLDstructure (in excel) CaseID Code VariableA 08/000125 1--***-*** a 08/000124 1-SF- a 08/000117 1-SF- a 08/000116 1-SF-411; a 2-SF-512.2; 3-SF-512.3; 4-SF-902; 5-SF-711 08/000115 1--***-***; a 2--***-***; 3--***-***; 4--***-*** 08/000114 1-SF-300; a 2-SF-902; 3-SF-710 08/000113 1-SF-325; a 2-SF-901 08/000112 1-SF- a I want keep all information. One way would be to restructure the code so that "code" is transformed from several lines to only one line. Example of this is below. Then I can use IF INDEX(NewCode)... to give values to variabels. Newstructure (suggestion) CaseID NewCode 08/000125 1--***-*** a 08/000124 1-SF- a 08/000117 1-SF- a 08/000116 1-SF-411;2-SF-512.2;3-SF-512.3;4-SF-902;5-SF-711 a 08/000115 1--***-***; 2--***-***;3--***-***;4--***-*** a So my questions are 1) What is the best way to restructure the datafile? 2) or perhaps there is a elegant way to work on files of irregular structre? I have seen datafiles where one case is allways on several lines, but never on a random number of lines. 3) Are the other more elegant ways to recode "newCode" to a set of variables. I am thinking about taking the list of 600 codes and turning it into a table with Code, Variable, Value. And then somehow matching this with the restructured datafile (but I suspect that a regular MATCH FILES would not succeed, because there is no unique ID). Sincerely, Eero Olli ________________________________________ Eero Olli Advisor the Equality and Anti-discrimination Ombud +47 2405 5951 POB 8048 Dep, N-0031 Oslo, Norway ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Eero,
May I assume that you have read in the excel file and that the spss data file looks like you have shown it. CaseID Code VariableA 08/000117 1-SF- a 08/000116 1-SF-411; a 2-SF-512.2; 3-SF-512.3; 4-SF-902; 5-SF-711 So, in the above snippet, is the caseid value and the variablea value blank for the following records: 2-SF-512.2; 3-SF-512.3; 4-SF-902; 5-SF-711 If so, then I think this is a Casestovars problem assuming you want to go from the current layout to CaseID Code1 VariableA Code2 Code3 Code4 Code5 08/000117 1-SF- a 08/000116 1-SF-411; a 2-SF-512.2; 3-SF-512.3; 4-SF-902; 5-SF-711 Assuming so, then you need to carry the caseid down the records. If (caseid eq ' ') caseid=lag(caseid). Then use the casestovars command. (The naming operation will look cleaner if you rename code1 to code. Rename variables (code1=code). Casestovars /id=caseid/index=code. Does this help? Gene Maguin ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Eero Olli
Shalom
Here is a syntax to restructure your data from long to wide . If the 600 code list have some information on the code you should match them before the casestovars command . no problem of unique ID , the 600code table should be unique on variable "code" and you should declare it as table in the match files command (see the syntax ) . you can use the autorecode command to transform your code to numeric code . data list fixed / aseID Code VariableA(a9 a15 a4). begin data 08/000125 1--***-*** a 08/000124 1-SF- a 08/000117 1-SF- a 08/000116 1-SF-411; a 2-SF-512.2; 3-SF-512.3; 4-SF-902; 5-SF-711 08/000115 1--***-***; a 2--***-***; 3--***-***; 4--***-*** 08/000114 1-SF-300; a 2-SF-902; 3-SF-710 08/000113 1-SF-325; a 2-SF-901 08/000112 1-SF- a end data . execute . autorecode code/ into icode /print . *** if you need to match the 600 code do it here . * sort cases by code . * match files file=* / table =cod600.sav/ by code . string tmpid(a9). leave tmpid. if aseID ne ' ' tmpid=aseID . if aseID eq ' ' aseID=tmpid. SORT CASES BY aseID . CASESTOVARS /ID = aseID /drop=tmpid /GROUPBY = VARIABLE . Hillel Vardi BGU Eero Olli wrote: > Hi to you all, > > I have been reading this group for a while, and have had great fun with > it. Now I have met a problem that I do not see a elegant solution for > within SPSS. > > I need to make a decision on how to restructure a datafile that contains > thousands of cases and around 20 variables. The data comes from a > database, and the one of the fields, "Code", is a table mapped to the > rest of the data. Thus this field can contain from 0 to 20 pieces of > information, each one with on a new line, in no particular order. There > is a matching list (.xls) of total close to 600 codes, which I would > like to convert to a variables and their values in my datafile. > > Unfortunately, I do not have access to the database, only excel files > exported out of the database. > > OLDstructure (in excel) > CaseID Code VariableA > 08/000125 1--***-*** a > 08/000124 1-SF- a > 08/000117 1-SF- a > 08/000116 1-SF-411; a > 2-SF-512.2; > 3-SF-512.3; > 4-SF-902; > 5-SF-711 > 08/000115 1--***-***; a > 2--***-***; > 3--***-***; > 4--***-*** > 08/000114 1-SF-300; a > 2-SF-902; > 3-SF-710 > 08/000113 1-SF-325; a > 2-SF-901 > 08/000112 1-SF- a > > > I want keep all information. One way would be to restructure the code > so that "code" is transformed from several lines to only one line. > Example of this is below. Then I can use IF INDEX(NewCode)... to give > values to variabels. > > Newstructure (suggestion) > CaseID NewCode > 08/000125 1--***-*** > a > 08/000124 1-SF- > a > 08/000117 1-SF- > a > 08/000116 1-SF-411;2-SF-512.2;3-SF-512.3;4-SF-902;5-SF-711 a > 08/000115 1--***-***; 2--***-***;3--***-***;4--***-*** a > > So my questions are > 1) What is the best way to restructure the datafile? > 2) or perhaps there is a elegant way to work on files of irregular > structre? I have seen datafiles where one case is allways on several > lines, but never on a random number of lines. > 3) Are the other more elegant ways to recode "newCode" to a set of > variables. I am thinking about taking the list of 600 codes and turning > it into a table with Code, Variable, Value. And then somehow matching > this with the restructured datafile (but I suspect that a regular MATCH > FILES would not succeed, because there is no unique ID). > > Sincerely, > Eero Olli > > > ________________________________________ > Eero Olli > Advisor > the Equality and Anti-discrimination Ombud > +47 2405 5951 > POB 8048 Dep, N-0031 Oslo, Norway > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD > > ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
| Free forum by Nabble | Edit this page |
