Restructure datafile

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Restructure datafile

Eero Olli
Dear list members,

I am struggeling with finding a efficient and foolproof way to
restructure a large datafile (10Mb).

My data looks like this. For some cases there are extra information,
that is spread over several lines. I want to restructure the datafile so
that every case is on one line.
The number of pieces of extra information varies (class), and I do not
know what will be the maximum number. Common will from 0 to 10, but I
suspect 20 pieces, or more, of extra info could take place, too.

DATA LIST LIST (",")/ casenum(A8) class (A12) .
begin data
09/1     ,1-SF-007
08/2010  ,1-SF-
08/2010  ,1-SF-
08/2009  ,1-SF-513.2;
         ,2-SF-711;
         ,3-SF-822;
         ,4-SF-902
08/2008  ,1-SF-471;
         ,2-SF-551;
         ,3-SF-711;
         ,4-SF-821;
         ,5-SF-902
END DATA.
LIST.


I would like to have the data in a format like this, where all extra
information is joined on one long text string (Class_long). And I would
like to move the extra pieces to the line with case number (because
there are actually 50 more variables on this line, while the pieces of
extra information are each alone on their line).

CASENUM CLASS_LONG
09/1    1-SF-007
08/2010 1-SF-
08/2010 1-SF-
08/2009 1-SF-513.2;2-SF-711;3-SF-822;4-SF-902
08/2008 1-SF-471;2-SF-551;3-SF-711;4-SF-821;5-SF-902

What do you think is the best way to approach this? I can see three ways
to restructure
A) some LOOP with LAG and CONCAT
B) first CASESTOVARS and then CONCAT
C) python

I am sure that there is a simple and elegant solution, that I am
missing.

Sincerely,

Eero
________________________________________
Eero Olli
Advisor
the Equality and Anti-discrimination Ombud
[hidden email]                   +47 2405 5951
POB 8048 Dep,     N-0031 Oslo,      Norway

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Restructure datafile

Daciuk, Tim
Try this:
if (casenum ne '        ') x=$casenum.
exe.
if (casenum = '        ') x = lag(x).
exe.
SORT CASES BY x .
CASESTOVARS
  /ID=x
  /GROUPBY=VARIABLE.

Tim Daciuk
Director, Worldwide Demo Resources
SPSS Inc.




-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Eero Olli
Sent: Tuesday, January 06, 2009 6:54 AM
To: [hidden email]
Subject: Restructure datafile

Dear list members,

I am struggeling with finding a efficient and foolproof way to
restructure a large datafile (10Mb).

My data looks like this. For some cases there are extra information,
that is spread over several lines. I want to restructure the datafile so
that every case is on one line.
The number of pieces of extra information varies (class), and I do not
know what will be the maximum number. Common will from 0 to 10, but I
suspect 20 pieces, or more, of extra info could take place, too.

DATA LIST LIST (",")/ casenum(A8) class (A12) .
begin data
09/1     ,1-SF-007
08/2010  ,1-SF-
08/2010  ,1-SF-
08/2009  ,1-SF-513.2;
         ,2-SF-711;
         ,3-SF-822;
         ,4-SF-902
08/2008  ,1-SF-471;
         ,2-SF-551;
         ,3-SF-711;
         ,4-SF-821;
         ,5-SF-902
END DATA.
LIST.


I would like to have the data in a format like this, where all extra
information is joined on one long text string (Class_long). And I would
like to move the extra pieces to the line with case number (because
there are actually 50 more variables on this line, while the pieces of
extra information are each alone on their line).

CASENUM CLASS_LONG
09/1    1-SF-007
08/2010 1-SF-
08/2010 1-SF-
08/2009 1-SF-513.2;2-SF-711;3-SF-822;4-SF-902
08/2008 1-SF-471;2-SF-551;3-SF-711;4-SF-821;5-SF-902

What do you think is the best way to approach this? I can see three ways
to restructure
A) some LOOP with LAG and CONCAT
B) first CASESTOVARS and then CONCAT
C) python

I am sure that there is a simple and elegant solution, that I am
missing.

Sincerely,

Eero
________________________________________
Eero Olli
Advisor
the Equality and Anti-discrimination Ombud
[hidden email]                   +47 2405 5951
POB 8048 Dep,     N-0031 Oslo,      Norway

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD