Suggestion: MVGROUPS for logical sets of variables

Richard Ristow
Below is typical VARSTOCASES syntax to unroll a dataset from 'wide' to
'long' organization:

>  ID Brand1 B1Q1 B1Q2 Brand2 B2Q1 B2Q2 Age Gender
>
>   1     5     1    2     4     2    1  25     1
>   2     4     2    2     3     2    1  26     2
>   3     3     1    2     1     1    1  27     1
>Number of cases read:  3    Number of cases listed:  3
>
>VARSTOCASES  /MAKE Brand FROM Brand1 Brand2
>  /MAKE Q1 FROM B1Q1 B2Q1
>  /MAKE Q2 FROM B1Q2 B2Q2
>  /KEEP =  ID Age Gender
>  /NULL = KEEP.
>
>  ID Age Gender Brand  Q1  Q2
>
>   1  25     1     5    1   2
>   1  25     1     4    2   1
>   2  26     2     4    2   2
>   2  26     2     3    2   1
>   3  27     1     3    1   2
>   3  27     1     1    1   1
>
>Number of cases read:  6    Number of cases listed:  6

In the 'wide' data, variables Brand1, B1Q1, and B1Q2 are logically
parallel to Brand2, B2Q1, and B2Q2. Each set of three has corresponding
data for one brand, and each set becomes a record in the 'long' file.

But to unroll them takes a separate '/MAKE' for each set of logically
equivalent variables (Brand, Q1 and Q2 in the output). And 'TO' can't
be used for the variable list; every one of the variables must be named
individually. It's not very elegant, and it would be genuinely awkward
if there were many more than two of the groups.
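
To make the awkwardness concrete, here is a sketch of what the same command would look like with four brands instead of two. The variables Brand3, B3Q1, B3Q2, Brand4, B4Q1 and B4Q2 are hypothetical, named by analogy with the example; they are not in the dataset listed above.

```spss
* Sketch only: Brand3 through B4Q2 are hypothetical variables.
VARSTOCASES  /MAKE Brand FROM Brand1 Brand2 Brand3 Brand4
  /MAKE Q1 FROM B1Q1 B2Q1 B3Q1 B4Q1
  /MAKE Q2 FROM B1Q2 B2Q2 B3Q2 B4Q2
  /KEEP =  ID Age Gender
  /NULL = KEEP.
```

Every added brand lengthens each '/MAKE' list by one name, and every added question adds another '/MAKE' line.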

Analogous to MRSETS, which records one kind of relationship among
variables as a dataset attribute, I'd like to see something like this
(paralleling MRSETS syntax) for 'wide' relationships like the above:

MVGROUPS
     NAME=(Brand,Q1,Q2)
     VARIABLES= Brand1 TO B2Q2
     LABELS='Brand'
            'Quality measure 1'
            'Quality measure 2'
or
     LABELSOURCE  = VARLABEL
     FORMATSOURCE = VARFORMAT
or
     FORMATS      = (F3,F2,F2)

*   VARIABLES may only specify a 'TO' list
*   The number of variables on the list must be an exact multiple of
the number of groups named
*   Every corresponding variable on the TO list (every 'nth' variable,
if 'n' groups are named) must be the same type; if string, all must be
the same length.

EFFECT:
The above specifies three variable groups:
Brand  vars Brand1 Brand2
Q1     vars  B1Q1   B2Q1
Q2     vars  B1Q2   B2Q2

USES:
*   The name of a group may be used in syntax where a list of variables
would be accepted, and expands to the set of variables:

MVGROUPS
     NAME=(Brand,Q1,Q2)
     VARIABLES= Brand1 TO B2Q2.

DO REPEAT
    M1 = Q1
   /M2 = Q2.
.  IF MISSING (M2) M2 = M1.
END REPEAT.

VARSTOCASES  /MAKE Brand FROM Brand
  /MAKE Q1 FROM Q1
  /MAKE Q2 FROM Q2
  /KEEP =  ID Age Gender
  /NULL = KEEP.

In both of these cases, the MVGROUPS syntax would be much more compact
and readable than the direct syntax, if there were many more than two
sets of the variables.
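
For comparison, the direct form of the DO REPEAT example above, in existing syntax, would be roughly as follows. It uses only the variables from the listing at the top; each stand-in list has to name its variables one by one, because the members of a group are not adjacent in the file.

```spss
* Direct equivalent of the DO REPEAT example, without MVGROUPS.
DO REPEAT
    M1 = B1Q1 B2Q1
   /M2 = B1Q2 B2Q2.
.  IF MISSING (M2) M2 = M1.
END REPEAT.
```

With two brands the difference is small; the lists grow by one name per brand, while the MVGROUPS form stays the same length.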

*   A group may be indexed as a vector is

This relaxes the rule that all elements of a vector must be
contiguous. However, since the elements of any group are equally spaced
in the dataset, it requires only a small extension to vector indexing
logic.
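
As a sketch of how indexing might look (hypothetical syntax: neither MVGROUPS nor indexing a group by name exists in SPSS today), the DO REPEAT example could be written as a loop over the two brands, assuming a group accepts an index in the same way a vector declared with VECTOR does:

```spss
* Hypothetical: assumes MVGROUPS exists and groups Q1, Q2 index like vectors.
MVGROUPS
     NAME=(Brand,Q1,Q2)
     VARIABLES= Brand1 TO B2Q2.

LOOP #I = 1 TO 2.
.  IF MISSING (Q2(#I)) Q2(#I) = Q1(#I).
END LOOP.
```

Here Q1(#I) would refer to B1Q1 when #I is 1 and to B2Q1 when #I is 2, that is, to every third variable on the 'TO' list starting from the group's first member.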