Restructuring Long Data based on multiple ID or index variables

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Restructuring Long Data based on multiple ID or index variables

starkeel
Hi there. I am trying to restructure long data into wide, I think. I have a
dataset that looks like this (with all the following vars on one row):

InstitutionID DegreeAwardLevel DegreeField TotalWhiteMen Awarded
TotalWhiteWomenAwarded TotalBlackMenAwarded TotalBlackWomenAwarded....

I'd like to transform it so that I have one row per type of person, so one
row that tells me the total number of white men awarded a given type of
degree in a given field at a given institution, and then another row that
tells me the total number of white women awarded that same type of degree in
that same field at that same institution, etc through all the race/gender
category combinations (two for gender, 9 for race).

I've been trying to use the restructure wizard to put variables into cases.
I can't figure out if I can use this to achieve what I want, however. I've
tried listing all the race/gender values as the target variables, with
institutionid as the identification variable, and DegreeField and
DegreeAwardLevel as the index variables, but I can't figure it out. If this
is the way to achieve this data restructuring, I'm confused regarding how to
specify the levels for the index variables. This dataset has thousands of
cases, and there are hundreds of different degree fields and over 100
different institutions. There are three different types of degree award
levels.  

thanks in advance for any help anyone can offer!



--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Restructuring Long Data based on multiple ID or index variables

Rich Ulrich
You might want "long to wide" eventually, but your first
step must be to go from "wide to long", VarsToCases -

"I'd like to transform it so that I have one row per type of person, so one
row that tells me the total number of white men awarded a given type of
degree in a given field at a given institution, and then another row that
tells me the total number of white women awarded that same type of degree in
that same field at that same institution, etc through all the race/gender
category combinations (two for gender, 9 for race)."

You are starting with many race/gender combinations in a line,
and you are asking to have a separate line for each.  Wide to long.

I don't know what you intend to produce after that, but I suspect
that "wide" will not be very useful unless you narrow the "fields" to
a handful (from hundreds).  The "long" form will be more useful
for doing something like an unbalanced design, mixed model ANOVA. 

I would try to recode into groups that preserve the sex-race inequalities
and aggregate to get a manageably small set of totals.

Hope this helps.

Rich Ulrich



From: SPSSX(r) Discussion <[hidden email]> on behalf of starkeel <[hidden email]>
Sent: Thursday, June 20, 2019 11:06 AM
To: [hidden email]
Subject: Restructuring Long Data based on multiple ID or index variables
 
Hi there. I am trying to restructure long data into wide, I think. I have a
dataset that looks like this (with all the following vars on one row):

InstitutionID DegreeAwardLevel DegreeField TotalWhiteMen Awarded
TotalWhiteWomenAwarded TotalBlackMenAwarded TotalBlackWomenAwarded....

I'd like to transform it so that I have one row per type of person, so one
row that tells me the total number of white men awarded a given type of
degree in a given field at a given institution, and then another row that
tells me the total number of white women awarded that same type of degree in
that same field at that same institution, etc through all the race/gender
category combinations (two for gender, 9 for race).

I've been trying to use the restructure wizard to put variables into cases.
I can't figure out if I can use this to achieve what I want, however. I've
tried listing all the race/gender values as the target variables, with
institutionid as the identification variable, and DegreeField and
DegreeAwardLevel as the index variables, but I can't figure it out. If this
is the way to achieve this data restructuring, I'm confused regarding how to
specify the levels for the index variables. This dataset has thousands of
cases, and there are hundreds of different degree fields and over 100
different institutions. There are three different types of degree award
levels. 

thanks in advance for any help anyone can offer!



--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Restructuring Long Data based on multiple ID or index variables

starkeel
Thanks for your response! And oops, I labelled what I did incorrectly (long
to wide), but if you see the description you'll see I did attempt to do
variables to cases already (wide to long).

Does anyone have any tips as to how I might manage this vars to cases
transformation in which I have multiple variables that I'm using as index
variables? I can figure out which variables to use (institution, degree
award level, and degree field), I just can't figure out which numerical
values to input for each level of each index variable.

I'd love to see any syntax that anyone has for vars to cases involving index
variables.

Thanks!



--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Restructuring Long Data based on multiple ID or index variables

Rich Ulrich
Why are you speaking of "index variables"? 

As I read it - You are creating 18 rows for each present row --
sex by race (2x9).  That gives you a file format that is easy
to work with.  It does not sound like you have done that, yet.
I would probably want to create the 18 rows with new,
numeric indices that are 1-2  and 1-9. 

[adapted from online manual] 
In VarsToCases, you name "from="  as your 18 vars, then
  • The basic specification is /INDEX=sex(2) race(9)
  • The last index specified varies the fastest.
As I read your mention of index variables, blah-blah-blah,
it suggests that you THEN want to go WIDE, with hundreds
of fields-of-study across each row.  Not usable for much (IMHO),
but it is not difficult to create if you have done the first step
as described above.

--
Rich Ulrich


From: SPSSX(r) Discussion <[hidden email]> on behalf of starkeel <[hidden email]>
Sent: Thursday, June 20, 2019 2:18 PM
To: [hidden email]
Subject: Re: Restructuring Long Data based on multiple ID or index variables
 
Thanks for your response! And oops, I labelled what I did incorrectly (long
to wide), but if you see the description you'll see I did attempt to do
variables to cases already (wide to long).

Does anyone have any tips as to how I might manage this vars to cases
transformation in which I have multiple variables that I'm using as index
variables? I can figure out which variables to use (institution, degree
award level, and degree field), I just can't figure out which numerical
values to input for each level of each index variable.

I'd love to see any syntax that anyone has for vars to cases involving index
variables.

Thanks!



--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Restructuring Long Data based on multiple ID or index variables

David Marso-2
In reply to this post by starkeel
What happens if you do VARSTOCASES / ID = uniquerowidvar/ MAKE values FROM ALL/ INDEX=varnames(values).

Assumes all variables are same type.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Restructuring Long Data based on multiple ID or index variables

Javier Figueroa
Hello!!! experts

starkeel I hope this serves you or gives you an idea of how to create cases of variables.

data list list
/var1 to var10.
begin data.
1 1 5 10 8 1 1 3 10 8
1 2 5 3 9 1 3 6 3 9
1 3 5 3 4 1 2 4 3 4
2 2 10 4 10 2 2 9 4 10
2 1 5 8 10 2 1 2 8 10
2 3 7 5 10 2 3 3 5 10
end data.
execute.

numeric id_xx(f8.0).
compute id_xx=$casenum.
execute.
variable level var1 to var10 (NOMINAL).
variable level ID_xx (scale).
VARIABLE LABELS var1 'COMPRAR'.
VARIABLE LABELS var2 'VAR 1'.
VARIABLE LABELS var3 'VAR 2'.
VARIABLE LABELS var4 'VAR 3'.
VARIABLE LABELS var5 'VAR 4'.
VARIABLE LABELS var6 'VAR 5'.
VARIABLE LABELS var7 'VAR 6'.
VARIABLE LABELS var8 'VAR 7'.
VARIABLE LABELS var9 'VAR 8'.
VARIABLE LABELS var10 'VAR 9'.

VARIABLE LABELS var1 'Sexo'.
VALUE LABELS var1
1 Masculino
2 Femenino.

VARIABLE LABELS var2 'Edad'.
VALUE LABELS var2
1 Edad 1
2 Edad 2
3 Edad 3.

VALUE LABELS var3 to var10
1 Pesimo
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 Excelente.

VARSTOCASES
  /ID=id
/MAKE escala1 FROM var3 var7
/MAKE escala2 FROM var4 var8
/MAKE escala3 FROM var5 var9
  /INDEX=Índice1(2)
  /KEEP=id_xx
var1
var2
  /NULL=DROP.

Sincerely,

El jue., 20 jun. 2019 a las 14:48, David Marso (<[hidden email]>) escribió:
What happens if you do VARSTOCASES / ID = uniquerowidvar/ MAKE values FROM ALL/ INDEX=varnames(values).

Assumes all variables are same type.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


--
Javier Figueroa
Procesamiento y Análisis de bases de datos
Cel: 5927-4748 / 4970-1940
Casa: 2289-0184

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD