SPSSX Discussion

How to best restructure a datafile to include a multiline field

Classic

List

Threaded

3 messages Options

Eero Olli

How to best restructure a datafile to include a multiline field

Hi to you all,

I have been reading this group for a while, and have had great fun with
it. Now I have met a problem that I do not see a elegant solution for
within SPSS.

I need to make a decision on how to restructure a datafile that contains
thousands of cases and around 20 variables. The data comes from a
database, and the one of the fields, "Code", is a table mapped to the
rest of the data. Thus this field can contain from 0 to 20 pieces of
information, each one with on a new line, in no particular order. There
is a matching list (.xls) of total close to 600 codes, which I would
like to convert to a variables and their values in my datafile.

Unfortunately, I do not have access to the database, only excel files
exported out of the database.

OLDstructure (in excel)
CaseID Code VariableA
08/000125 1--***-*** a
08/000124 1-SF- a
08/000117 1-SF- a
08/000116 1-SF-411; a
2-SF-512.2;
3-SF-512.3;
4-SF-902;
5-SF-711
08/000115 1--***-***; a
2--***-***;
3--***-***;
4--***-***
08/000114 1-SF-300; a
2-SF-902;
3-SF-710
08/000113 1-SF-325; a
2-SF-901
08/000112 1-SF- a

I want keep all information. One way would be to restructure the code
so that "code" is transformed from several lines to only one line.
Example of this is below. Then I can use IF INDEX(NewCode)... to give
values to variabels.

Newstructure (suggestion)
CaseID NewCode
08/000125 1--***-***
a
08/000124 1-SF-
a
08/000117 1-SF-
a
08/000116 1-SF-411;2-SF-512.2;3-SF-512.3;4-SF-902;5-SF-711 a
08/000115 1--***-***; 2--***-***;3--***-***;4--***-*** a

So my questions are
1) What is the best way to restructure the datafile?
2) or perhaps there is a elegant way to work on files of irregular
structre? I have seen datafiles where one case is allways on several
lines, but never on a random number of lines.
3) Are the other more elegant ways to recode "newCode" to a set of
variables. I am thinking about taking the list of 600 codes and turning
it into a table with Code, Variable, Value. And then somehow matching
this with the restructured datafile (but I suspect that a regular MATCH
FILES would not succeed, because there is no unique ID).

Sincerely,
Eero Olli

________________________________________
Eero Olli
Advisor
the Equality and Anti-discrimination Ombud
+47 2405 5951
POB 8048 Dep, N-0031 Oslo, Norway

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Maguin, Eugene

Re: How to best restructure a datafile to include a multiline field

Eero,

May I assume that you have read in the excel file and that the spss data
file looks like you have shown it.

CaseID Code VariableA
08/000117 1-SF- a
08/000116 1-SF-411; a
2-SF-512.2;
3-SF-512.3;
4-SF-902;
5-SF-711

So, in the above snippet, is the caseid value and the variablea value blank
for the following records:
2-SF-512.2;
3-SF-512.3;
4-SF-902;
5-SF-711

If so, then I think this is a Casestovars problem assuming you want to go
from the current layout to
CaseID Code1 VariableA Code2 Code3 Code4
Code5
08/000117 1-SF- a
08/000116 1-SF-411; a 2-SF-512.2; 3-SF-512.3; 4-SF-902;
5-SF-711

Assuming so, then you need to carry the caseid down the records.

If (caseid eq ' ') caseid=lag(caseid).

Then use the casestovars command. (The naming operation will look cleaner if
you rename code1 to code.
Rename variables (code1=code).
Casestovars /id=caseid/index=code.

Does this help?

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

hillel vardi

Re: How to best restructure a datafile to include a multiline field

In reply to this post by Eero Olli

Shalom

Here is a syntax to restructure your data from long to wide .
If the 600 code list have some information on the code you should match
them before the casestovars command .
no problem of unique ID , the 600code table should be unique on
variable "code" and you should declare it as table in the
match files command (see the syntax ) .
you can use the autorecode command to transform your code to numeric code .

data list fixed / aseID Code VariableA(a9 a15 a4).
begin data
08/000125 1--***-*** a
08/000124 1-SF- a
08/000117 1-SF- a
08/000116 1-SF-411; a
2-SF-512.2;
3-SF-512.3;
4-SF-902;
5-SF-711
08/000115 1--***-***; a
2--***-***;
3--***-***;
4--***-***
08/000114 1-SF-300; a
2-SF-902;
3-SF-710
08/000113 1-SF-325; a
2-SF-901
08/000112 1-SF- a
end data .
execute .
autorecode code/ into icode /print .

*** if you need to match the 600 code do it here .
* sort cases by code .
* match files file=* / table =cod600.sav/ by code .

string tmpid(a9).
leave tmpid.
if aseID ne ' ' tmpid=aseID .
if aseID eq ' ' aseID=tmpid.
SORT CASES BY aseID .
CASESTOVARS
/ID = aseID
/drop=tmpid
/GROUPBY = VARIABLE .

Hillel Vardi
BGU
Eero Olli wrote:

> Hi to you all,
>
> I have been reading this group for a while, and have had great fun with
> it. Now I have met a problem that I do not see a elegant solution for
> within SPSS.
>
> I need to make a decision on how to restructure a datafile that contains
> thousands of cases and around 20 variables. The data comes from a
> database, and the one of the fields, "Code", is a table mapped to the
> rest of the data. Thus this field can contain from 0 to 20 pieces of
> information, each one with on a new line, in no particular order. There
> is a matching list (.xls) of total close to 600 codes, which I would
> like to convert to a variables and their values in my datafile.
>
> Unfortunately, I do not have access to the database, only excel files
> exported out of the database.
>
> OLDstructure (in excel)
> CaseID Code VariableA
> 08/000125 1--***-*** a
> 08/000124 1-SF- a
> 08/000117 1-SF- a
> 08/000116 1-SF-411; a
> 2-SF-512.2;
> 3-SF-512.3;
> 4-SF-902;
> 5-SF-711
> 08/000115 1--***-***; a
> 2--***-***;
> 3--***-***;
> 4--***-***
> 08/000114 1-SF-300; a
> 2-SF-902;
> 3-SF-710
> 08/000113 1-SF-325; a
> 2-SF-901
> 08/000112 1-SF- a
>
>
> I want keep all information. One way would be to restructure the code
> so that "code" is transformed from several lines to only one line.
> Example of this is below. Then I can use IF INDEX(NewCode)... to give
> values to variabels.
>
> Newstructure (suggestion)
> CaseID NewCode
> 08/000125 1--***-***
> a
> 08/000124 1-SF-
> a
> 08/000117 1-SF-
> a
> 08/000116 1-SF-411;2-SF-512.2;3-SF-512.3;4-SF-902;5-SF-711 a
> 08/000115 1--***-***; 2--***-***;3--***-***;4--***-*** a
>
> So my questions are
> 1) What is the best way to restructure the datafile?
> 2) or perhaps there is a elegant way to work on files of irregular
> structre? I have seen datafiles where one case is allways on several
> lines, but never on a random number of lines.
> 3) Are the other more elegant ways to recode "newCode" to a set of
> variables. I am thinking about taking the list of 600 codes and turning
> it into a table with Code, Variable, Value. And then somehow matching
> this with the restructured datafile (but I suspect that a regular MATCH
> FILES would not succeed, because there is no unique ID).
>
> Sincerely,
> Eero Olli
>
>
> ________________________________________
> Eero Olli
> Advisor
> the Equality and Anti-discrimination Ombud
> +47 2405 5951
> POB 8048 Dep, N-0031 Oslo, Norway
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
>

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD