Unequal Number of Lines Per Case with ID listed only in First Line

Whanger, J. Mr. CTR
Dear SPSS-L,

I am working on a project in which the data I receive is available only in
Excel format and contains an unequal number of lines per case, but lists
the ID number in only the first line of the case.  I would like to know
how I might accomplish the following with syntax:

1. Add the ID number to each line of the unequal number of lines per case.

ID  X1
111 333
    334
    333
112 333
    333
113 333
    333
    333
    333

2. Create an additional variable (X2) within the dataset in which the
value computed is the sum of lines within a case possessing the value 334
for (X1). For example, for case 111 in the data above, the value of X2
would be 2, while the value for case 113 would be 4.

3. Create a separate dataset with a single case per variable, in which the
line chosen as the case for the new dataset is determined by the value in
(X1).  For example, in the data above, (X2)=1 if (X1)=334 in ANY lines of
the case and (X2)=0 if (X1)=333 in ALL lines of the case.

Thanks in advance for your help with this,

Jim

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: Unequal Number of Lines Per Case with ID listed only in First Line

hillel vardi
Shalom

Here is syntax for your first two requests; your third one is not clear.


title      'Unequal Number of Lines Per Case with ID listed only in First Line' .
DATA LIST / id x1 (2f4) .
BEGIN DATA
111 333
    334
    333
112 333
    333
113 333
    333
    333
    333
END DATA.
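* x2 holds the current ID; sum333 and sum334 are running counts within a case.
numeric    x2 sum333 sum334 (f4) .
* LEAVE keeps these values from one line to the next instead of resetting them.
leave   x2  sum333 sum334 .
do if    sysmis(id) eq 0 .
* First line of a case: remember the ID and restart the counts.
compute   x2=id.
compute   sum333=0.
compute   sum334=0.
else .
* Continuation line: fill in the remembered ID.
compute   id=x2.
end if.
if         x1 eq 333      sum333=sum(sum333,1).
if         x1 eq 334      sum334=sum(sum334,1).
* The last line of each case carries the totals for that case.
execute .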

Hillel Vardi
BGU



Re: Unequal Number of Lines Per Case with ID listed only in First Line

Maguin, Eugene
In reply to this post by Whanger, J. Mr. CTR
James,

>>I am working on a project in which the data I receive is available only in
Excel format and contains an unequal number of lines per case, but lists
the ID number in only the first line of the case.

I'm going to assume that you have read, or can read, the data into SPSS from
the Excel file. From this point on, unless stated otherwise, I assume the
data are in an SPSS data file.


>>1. Add the ID number to each line of the unequal number of lines per case.

ID  X1
111 333
    334
    333
112 333
    333
113 333
    333
    333
    333

Simple enough.

If (sysmis(id)) id=lag(id).
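
A self-contained sketch of the same fill-down, for anyone who wants to try it before touching the real file. The sample data from the question are read inline here instead of from Excel, which is an assumption made only so the example stands on its own. Since LAG looks at the previous line after the IF has already been applied to it, the ID should carry down through any run of blank lines. The EXECUTE forces the data pass, and LIST is there only to check the result.

```spss
* Sample data from the question, read inline (assumption: the real data come from Excel).
DATA LIST FIXED / id 1-3 x1 5-7.
BEGIN DATA
111 333
    334
    333
112 333
    333
113 333
    333
    333
    333
END DATA.

* Where the ID is blank, copy it from the line above.
IF (SYSMIS(id)) id = LAG(id).
EXECUTE.

* Inspect the filled-in IDs.
LIST.
```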


>>2. Create an additional variable (X2) within the dataset in which the
value computed is the sum of lines within a case possessing the value 334
for (X1). For example, for case 111 in the data above, the value of X2
would be 2, while the value for case 113 would be 4.

This doesn't make sense to me. If case 111 is the only example, a better
definition of X2 would be that X2 is the sum of the line numbers within a
case on which the value of X1 is 334. For case 111, x1=334 appears on line 2
and only line 2. Thus, X2=2. X1=334 does not appear in either case 112 or
case 113. Please explain.
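
Pending that explanation, here is a sketch of one possible reading, which the original poster has not confirmed: X2 as the number of lines within a case where X1 is 334. It assumes the ID has already been filled in on every line, and the flag variable is334 is a name made up for this example. AGGREGATE with MODE=ADDVARIABLES writes the per-case total back onto every line of the case.

```spss
* Sample data with the ID already filled in on every line (assumption).
DATA LIST FIXED / id 1-3 x1 5-7.
BEGIN DATA
111 333
111 334
111 333
112 333
112 333
113 333
113 333
113 333
113 333
END DATA.

* is334 is a hypothetical helper: 1 when x1 is 334, otherwise 0.
COMPUTE is334 = (x1 EQ 334).

* Add the per-ID count of flagged lines to every line of the case.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id
  /x2 = SUM(is334).

LIST.
```

Under this reading x2 would be 1 for case 111 and 0 for cases 112 and 113. That does not match the values given in the question, which is why the definition still needs clarifying.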


>>3. Create a separate dataset with a single case per variable, in which the
line chosen as the case for the new dataset is determined by the value in
(X1).  For example, in the data above, (X2)=1 if (X1)=334 in ANY lines of
the case and (X2)=0 if (X1)=333 in ALL lines of the case.

X2 is already used in Question 2. Let us say the new variable is X3 and
defined as described above.

Having multiple records or lines per case is a real pain. I understand that
you can't do anything about the structure of the incoming data file.
However, for question 3, I'd restructure your dataset from 'long' to 'wide'
and then work with variables rather than records. I'll assume that x1 may
have other values besides 333 and 334. One other comment: I haven't tested
this code, and I'm a bit skeptical that SPSS can handle a variable name of x1
in a vector structure. Thus it may be necessary to rename x1 to, for
instance, y. I do so here. Also, I assume that the maximum number of lines or
records per case is 5. You will need to adjust that number based on the
value of recs from the frequencies command.


Rename variables (x1=y).
* separator="" names the new variables y1, y2, ... rather than y.1, y.2, ...
Casestovars /id=id/separator=""/count=recs.

Frequencies recs.

Compute x3a=0.
Compute x3b=0.
* y5 assumes a max of 5 lines per case, which may be too small.
Vector y=y1 to y5.
Loop #i=1 to recs.
+   if (y(#i) eq 334) x3a=x3a+1.
+   if (y(#i) eq 333) x3b=x3b+1.
End loop.
Compute x3=9.
If (x3a ge 1) x3=1.
If (x3b eq recs) x3=0.

Save outfile='<file name string>'/keep=id x3.
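
For comparison, a sketch of a different route to the same one-line-per-case result that skips the restructure and uses AGGREGATE instead. This is an alternative technique, not the approach above. It assumes the ID is already filled in on every line and, unlike the code above, that x1 only ever takes the values 333 and 334, so there is no code 9. The dataset name percase and the flag is334 are made up for this example.

```spss
* Sample data with the ID already filled in on every line (assumption).
DATA LIST FIXED / id 1-3 x1 5-7.
BEGIN DATA
111 333
111 334
111 333
112 333
112 333
113 333
113 333
113 333
113 333
END DATA.

* is334 is a hypothetical helper: 1 when x1 is 334, otherwise 0.
COMPUTE is334 = (x1 EQ 334).

* percase is a hypothetical dataset name for the one-line-per-ID result.
DATASET DECLARE percase.
AGGREGATE
  /OUTFILE='percase'
  /BREAK=id
  /x3 = MAX(is334).

* Switch to the new dataset and inspect it.
DATASET ACTIVATE percase.
LIST.
```

MAX of a 0/1 flag is 1 when any line in the case has x1 equal to 334 and 0 when none does, so with the sample data x3 would be 1 for case 111 and 0 for cases 112 and 113.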


Gene Maguin
