How to best restructure a datafile to include a multiline field

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

How to best restructure a datafile to include a multiline field

Eero Olli
Hi to you all,

I have been reading this group for a while, and have had great fun with
it. Now I have met a problem that I do not see a elegant solution for
within SPSS.

I need to make a decision on how to restructure a datafile that contains
thousands of cases and around 20 variables. The data comes from a
database, and the one of the fields, "Code",  is a table mapped to the
rest of the data. Thus this field can contain from 0 to 20 pieces of
information, each one with on a new line, in no particular order.  There
is a matching list (.xls) of total close to 600 codes, which I would
like to convert to a variables and their values in my datafile.

Unfortunately, I do not have access to the database, only excel files
exported out of the database.

OLDstructure (in excel)
CaseID          Code         VariableA
08/000125       1--***-***   a
08/000124       1-SF-        a
08/000117       1-SF-        a
08/000116       1-SF-411;    a
        2-SF-512.2;
        3-SF-512.3;
        4-SF-902;
        5-SF-711
08/000115       1--***-***;  a
        2--***-***;
        3--***-***;
        4--***-***
08/000114       1-SF-300;    a
        2-SF-902;
        3-SF-710
08/000113       1-SF-325;    a
        2-SF-901
08/000112       1-SF-        a


I want keep all information.  One way would be to restructure the code
so that "code" is transformed from several lines to only one line.
Example of this is below. Then I can use IF INDEX(NewCode)... to give
values to variabels.

Newstructure (suggestion)
CaseID          NewCode
08/000125       1--***-***
a
08/000124       1-SF-
a
08/000117       1-SF-
a
08/000116   1-SF-411;2-SF-512.2;3-SF-512.3;4-SF-902;5-SF-711 a
08/000115       1--***-***; 2--***-***;3--***-***;4--***-***     a

So my questions are
1) What is the best way to restructure the datafile?
2) or perhaps there is a elegant way to work on files of irregular
structre? I have seen datafiles where one case is allways on several
lines, but never on a random number of lines.
3) Are the other more elegant ways to recode "newCode" to a set of
variables. I am thinking about taking the list of 600 codes and turning
it into a table with Code, Variable, Value. And then somehow matching
this with the restructured datafile (but I suspect that a regular MATCH
FILES would not succeed, because there is no unique ID).

Sincerely,
Eero Olli


________________________________________
Eero Olli
Advisor
the Equality and Anti-discrimination Ombud
                   +47 2405 5951
POB 8048 Dep,     N-0031 Oslo,      Norway

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: How to best restructure a datafile to include a multiline field

Maguin, Eugene
Eero,

May I assume that you have read in the excel file and that the spss data
file looks like you have shown it.

CaseID          Code         VariableA
08/000117       1-SF-        a
08/000116       1-SF-411;    a
                2-SF-512.2;
                3-SF-512.3;
                4-SF-902;
                5-SF-711

So, in the above snippet, is the caseid value and the variablea value blank
for the following records:
                2-SF-512.2;
                3-SF-512.3;
                4-SF-902;
                5-SF-711

If so, then I think this is a Casestovars problem assuming you want to go
from the current layout to
CaseID          Code1     VariableA  Code2        Code3       Code4
Code5
08/000117       1-SF-        a
08/000116       1-SF-411;    a       2-SF-512.2;  3-SF-512.3; 4-SF-902;
5-SF-711

Assuming so, then you need to carry the caseid down the records.

If (caseid eq '         ') caseid=lag(caseid).

Then use the casestovars command. (The naming operation will look cleaner if
you rename code1 to code.
Rename variables (code1=code).
Casestovars /id=caseid/index=code.


Does this help?

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: How to best restructure a datafile to include a multiline field

hillel vardi
In reply to this post by Eero Olli
Shalom

Here is a syntax to restructure  your data from long to wide .
If the 600 code list have some information on the code you should match
them before the casestovars command .
no problem of  unique ID ,   the 600code table should be unique on
variable "code" and you should declare  it as table in the
match files  command (see the syntax ) .
you can use the autorecode command to transform your code to numeric code .

data list    fixed / aseID  Code   VariableA(a9 a15 a4).
begin data
08/000125    1--***-***   a
08/000124    1-SF-        a
08/000117    1-SF-        a
08/000116    1-SF-411;    a
             2-SF-512.2;
             3-SF-512.3;
             4-SF-902;
             5-SF-711
08/000115    1--***-***;  a
             2--***-***;
             3--***-***;
             4--***-***
08/000114    1-SF-300;    a
             2-SF-902;
             3-SF-710
08/000113    1-SF-325;    a
             2-SF-901
08/000112    1-SF-        a
end data .
execute .
autorecode   code/ into icode /print .

*** if you need to match the 600 code do it here .
* sort cases by code .
* match files   file=* / table =cod600.sav/ by code .

string       tmpid(a9).
leave     tmpid.
if           aseID ne '         '   tmpid=aseID .
if           aseID eq '         '   aseID=tmpid.
SORT CASES BY aseID .
CASESTOVARS
 /ID = aseID
 /drop=tmpid
 /GROUPBY = VARIABLE .


Hillel Vardi
BGU
Eero Olli wrote:

> Hi to you all,
>
> I have been reading this group for a while, and have had great fun with
> it. Now I have met a problem that I do not see a elegant solution for
> within SPSS.
>
> I need to make a decision on how to restructure a datafile that contains
> thousands of cases and around 20 variables. The data comes from a
> database, and the one of the fields, "Code",  is a table mapped to the
> rest of the data. Thus this field can contain from 0 to 20 pieces of
> information, each one with on a new line, in no particular order.  There
> is a matching list (.xls) of total close to 600 codes, which I would
> like to convert to a variables and their values in my datafile.
>
> Unfortunately, I do not have access to the database, only excel files
> exported out of the database.
>
> OLDstructure (in excel)
> CaseID          Code         VariableA
> 08/000125       1--***-***   a
> 08/000124       1-SF-        a
> 08/000117       1-SF-        a
> 08/000116       1-SF-411;    a
>         2-SF-512.2;
>         3-SF-512.3;
>         4-SF-902;
>         5-SF-711
> 08/000115       1--***-***;  a
>         2--***-***;
>         3--***-***;
>         4--***-***
> 08/000114       1-SF-300;    a
>         2-SF-902;
>         3-SF-710
> 08/000113       1-SF-325;    a
>         2-SF-901
> 08/000112       1-SF-        a
>
>
> I want keep all information.  One way would be to restructure the code
> so that "code" is transformed from several lines to only one line.
> Example of this is below. Then I can use IF INDEX(NewCode)... to give
> values to variabels.
>
> Newstructure (suggestion)
> CaseID          NewCode
> 08/000125       1--***-***
> a
> 08/000124       1-SF-
> a
> 08/000117       1-SF-
> a
> 08/000116   1-SF-411;2-SF-512.2;3-SF-512.3;4-SF-902;5-SF-711 a
> 08/000115       1--***-***; 2--***-***;3--***-***;4--***-***     a
>
> So my questions are
> 1) What is the best way to restructure the datafile?
> 2) or perhaps there is a elegant way to work on files of irregular
> structre? I have seen datafiles where one case is allways on several
> lines, but never on a random number of lines.
> 3) Are the other more elegant ways to recode "newCode" to a set of
> variables. I am thinking about taking the list of 600 codes and turning
> it into a table with Code, Variable, Value. And then somehow matching
> this with the restructured datafile (but I suspect that a regular MATCH
> FILES would not succeed, because there is no unique ID).
>
> Sincerely,
> Eero Olli
>
>
> ________________________________________
> Eero Olli
> Advisor
> the Equality and Anti-discrimination Ombud
>                    +47 2405 5951
> POB 8048 Dep,     N-0031 Oslo,      Norway
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
>

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD