Restructuring data

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Restructuring data

Jan Veenstra
I have a very large data file (over 1 million records) that contains
information listed in multiple rows that I need to restructure so that the
rows for each ID are combined into a single row.  Please note that duplicates
are also allowed in this file.  I have included a sample below to see if
someone can help me as I am a beginner and have not been able to get the
CASETOVARS syntax to work.  I also tried to aggregate the data and then use
CASETOVARS but it only moved the row number around and not the data.  The
training I've had has been almost completely menu driven so am not familiar
with using syntax in SPSS.

Below is my sample.


Billing_ID      Line No.        HCPC Code       Modifier_1      Modifer_2
Many other variables ->->->->->->->->->->->->->->->->->->->->->

7707            1               9567            03              01
7710            1               9567                            
7710            2               9658    
7710            3               9567            


What I need them to look like is as follows.  It basically moves line numbers
2 and 3 to new variables.  I also have multiple line counts for each ID -
some have 3 lines, some have 14 lines, some have less, some have more.

Billing_ID      Line No.        HCPC Code       HCPC_2  HCPC_3  Modifier_1
Many other variables ->->->->->->->->->->->->->->->->->->->->->

7707            1               9567
03      
7710            1               9567            9658            9567


Thank you for any help you can provide me.              






> Division of Medical Economics
> TEXAS MEDICAL ASSOCIATION
> 401 West 15th Street
> Austin, TX 78701-1680
> 512/370-1461 Phone
> 512/370-1693 FAX
>
>
 
--------------------------------------------------------

Improving the Health of All Texans

 
Texas Medical Association
www.texmed.org
401 W. 15th Street
Austin, TX 78701
Reply | Threaded
Open this post in threaded view
|

Re: Restructuring data

Richard Ristow
At 01:22 PM 8/2/2007, Jan Veenstra wrote:

>I have a data file that contains information listed in multiple rows
>that I need to restructure so that the rows for each ID are combined
>into a single row.  Please note that duplicates are also allowed in
>this file.  I have included a sample below. [I]have not been able to
>get the CASETOVARS syntax to work.  I also tried to aggregate the data
>and then use CASETOVARS but it only moved the row number around and
>not the data.

In the following, I haven't addressed the problem of duplicates. I'm
not sure what is 'duplicated' - just the HCPC_Code, the whole record,
or what? Anyway,

Input data (SPSS 15 draft output):
|-----------------------------|---------------------------|
|Output Created               |02-AUG-2007 14:31:28       |
|-----------------------------|---------------------------|
Billing_ID Line_No HCPC_Code Modifier_1 Modifier_2

    7707        1      9567       03         01
    7710        1      9567        .          .
    7710        2      9658        .          .
    7710        3      9567        .          .

Number of cases read:  4    Number of cases listed:  4

>What I need them to look like is as follows.  It basically moves line
>numbers 2 and 3 to new variables.  I have multiple line counts for
>each ID - some have 3 lines, some have more.
>
>Billing_ID Line_No HCPC_Code HCPC_2 HCPC_3 Modifier_1 Modifier_2
>7707        1       9567                     03         01
>7710        1       9567      9658   9567

So far, except for the 'Modifier' variables, and the "Many other
variables" that I'm ignoring, this is straightforward CASESTOVARS that
could be clicked up from the menus. I clicked this one up, then added
the /RENAME and /SEPARATOR subcommands. It isn't *quite* the above
format, but it's very close. (You can use a /KEEP= list on ADD FILES to
reorder the variables, if you like.) Continuing SPSS 15 draft output
(WRR:not saved separately):

SORT CASES BY Billing_ID Line_No .
CASESTOVARS
  /ID        = Billing_ID
  /INDEX     = Line_No
  /RENAME      HCPC_Code=HCPC
  /SEPARATOR = '_'
  /GROUPBY   = VARIABLE .

Cases to Variables
|----------------------------|---------------------------|
|Output Created              |02-AUG-2007 14:55:02       |
|----------------------------|---------------------------|
Generated Variables
|---------|-------|------|
|Original |Line_No|Result|
|Variable |       |------|
|         |       |Name  |
|---------|-------|------|
|HCPC_Code|1      |HCPC_1|
|         |2      |HCPC_2|
|         |3      |HCPC_3|
|---------|-------|------|

Processing Statistics
|---------------|---|
|Cases In       |4  |
|Cases Out      |2  |
|---------------|---|
|Cases In/Cases |2.0|
|Out            |   |
|---------------|---|
|Variables In   |5  |
|Variables Out  |6  |
|---------------|---|
|Index Values   |3  |
|---------------|---|

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |02-AUG-2007 14:55:02       |
|-----------------------------|---------------------------|
Billing_ID Modifier_1 Modifier_2 HCPC_1 HCPC_2 HCPC_3

    7707        03         01      9567      .      .
    7710         .          .      9567   9658   9567

Number of cases read:  2    Number of cases listed:  2
...............

Now, what of Modifier_1, Modifier_2, and the "many other variables"? If
they appear on every line for each ID, and have the same values on
every line, this code will work as is, though always putting the HCPC
codes at the end of the records. If, say, they only appear on Line 1,
you need something more, possibly using AGGREGATE but I can't say at
this point.

-Good luck,
  Richard
===================
APPENDIX: Test data
===================
(This code works, but gives warning messages.)

DATA LIST LIST SKIP=1
    /Billing_ID  Line_No   HCPC_Code   Modifier_1   Modifier_2
     (F4,         F2,        F4,         N2,           N2).
BEGIN DATA
     Billing_ID  Line No.  HCPC Code   Modifier_1   Modifier_2
     7707           1        9567         03           01
     7710           1        9567
     7710           2        9658
     7710           3        9567
END DATA.
FORMATS Modifier_1 Modifier_2 (N2).
LIST.
Reply | Threaded
Open this post in threaded view
|

Transforming blood pressure measures

Frommelt, Robert A.
In reply to this post by Jan Veenstra
I have blood pressure measures that are stored as a string variable
120/80.  I would like to convert that measure to two numeric variables
so that I will sysbp 120 and diabp 80.

Is there anyway to do that in SPSS using syntax?

Thanks,

Allen

This email and all attachments are confidential and intended solely
for the use of the individual or entity to which they are addressed.
If you have received this email in error please notify the sender
by replying to this message. If you are not the intended recipient,
please delete this message and all attachments immediately.  Do not
copy, disclose, use or act upon the information contained. Please
note that any views or opinions presented in this email are solely
those of the author and do not necessarily represent those of the
company. Finally, the recipient should check this email and any
attachments for the presence of viruses. While every attempt is made
to verify that the contents are safe, the company accepts no liability
for any damage caused by any virus transmitted by this email.
Reply | Threaded
Open this post in threaded view
|

Re: Transforming blood pressure measures

Richard Ristow
At 03:54 PM 8/2/2007, Frommelt, Robert A. wrote:

>I have blood pressure measures that are stored as a string variable
>120/80.  I would like to convert that measure to two numeric variables
>so that I will sysbp 120 and diabp 80.

What the heck. Not tested, but if the string variable is named "bp":

NUMERIC sysbp diabp(F3).
COMPUTE #Break = INDEX(bp,'/').
COMPUTE sysbp  = NUMBER(substr(bp,1,#Break-1),F4).
COMPUTE diabp  = NUMBER(substr(bp,#Break+1)  ,F4).
Reply | Threaded
Open this post in threaded view
|

go to in spss15 vs spss13

JOSE MARIA GARCIA TULICH
In reply to this post by Jan Veenstra
In spss 15 I cannot make (go to) as it made it in version 13, that is to
say, with the window of go to digit the variable to which queria to go and
takes to me automatically.

there is some recommendable option for this case
Reply | Threaded
Open this post in threaded view
|

Re: go to in spss15 vs spss13

Hector Maletta
         Jose Maria,
         Your problem is not clear. It may or may not be the same as mine.
You seem to mean that in the DATA Editor window you want to FIND some
particular value of a variable. In that case, you should position yourself
in the corresponding column, click on Edit - Find (that is "Editar - Buscar"
in Spanish), and that works perfectly in my own V.15 (mine is in English).
Another possible meaning of your question is that you cannot go to a certain
target (a digit, a word) in a SYNTAX editor window. In this case you click
on Edit - Find (or Replace). A third different problem is finding a
particular VARIABLE in the data editor: open the variable information
dialog, find a variable, and click Go To. I am not clear which one is your
trouble. My own problem is limited to the syntax editor. The other two work
fine in my version.

         Hector


         -----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
JOSE MARIA GARCIA TULICH
Sent: 02 August 2007 19:28
To: [hidden email]
Subject: go to in spss15 vs spss13

         In spss 15 I cannot make (go to) as it made it in version 13, that
is to
         say, with the window of go to digit the variable to which queria to
go and
         takes to me automatically.

         there is some recommendable option for this case