Hi Everyone,
I have a gigantic data set (5 million+ lines) that is in long form: each line currently represents a person's behavior with a particular object, and there are a bunch of different variables to run regression analyses on, in tandem with looking at the effects of an experimental variable.
I've been playing around with using "Restructure" to put it into "wide" form with a participant on each line – which is how I've used SPSS the last 8 years – but it takes forever and actually makes it harder to specify variables.
Can anyone point me to some documentation/links or provide information on whether/how I can do the analyses and graphs I want with the data in long form?

Thanks a lot,
Joseph
Useful information is most apt to be relevant if you find it within your own subject area, which is not something that you mention.

What do you have? If it is 5 million subjects with multiple "objects" (and time?), then conventional testing should be largely irrelevant, since the nominal power is huge. And that same comment may apply if it is "only" 5 million lines in long form.

Either way, what I have to suggest assumes that conventional regression testing is largely irrelevant because of N. (Generating useful tests based on specific, selected d.f. is a separate problem, one which usually is ignored.)

An obvious, potential starting point for graphs and analyses would be to explicitly subtract off the Subject effects, which is what Repeated Measures does implicitly:

1) AGGREGATE by Subject with MODE=ADDVARIABLES and subtract in order to get deviation scores within Subject. I would consider preserving that overall level (the Subject mean) as a potential predictor.

2) If the Objects have notably different means, aggregate first (that is, before step (1)) by Object, across Subjects, with MODE=ADDVARIABLES, in order to get deviation scores for Objects, which you then use as the starting place for step (1).

I list (2) as (2) rather than making it (1) because I think I would avoid doing it when the item differences are merely "significant" for the huge N and are not "notable." I say this because the ipsatizing procedure (subtracting means) puts weight on the assumption that the data are, indeed, measured on a good scale, with equal intervals and no basement/ceiling effects.

Repeated Measures does give you a good form of data (pre/post; object1/object2) for examining linearity and equal intervals. Problems are more likely when the ratings do not use the same part of the scale for all Objects.

Hope this helps. Please give more detail, anyway.
-- Rich Ulrich

Date: Sun, 6 Oct 2013 18:46:32 -0700
From: [hidden email]
Subject: How to do analyses (mainly linear regression/repeated measures ANOVA) with data sets in "long" form
To: [hidden email]
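The two aggregation steps in Rich's reply can be sketched outside SPSS to make the arithmetic concrete. The following is only an illustration in Python, not SPSS syntax and not Rich's own code; the field names (`subject`, `obj`, `score`), the helper `add_group_mean`, and the toy values are all hypothetical. It removes Object means first (his optional step 2), then Subject means (step 1), and keeps each Subject's overall level as a possible predictor.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-form rows: one row per subject-object pair.
rows = [
    {"subject": "s1", "obj": "A", "score": 4.0},
    {"subject": "s1", "obj": "B", "score": 6.0},
    {"subject": "s2", "obj": "A", "score": 8.0},
    {"subject": "s2", "obj": "B", "score": 12.0},
]


def add_group_mean(rows, key, value, out):
    """Attach the mean of `value` within each `key` group to every row,
    i.e. add a break-group summary back onto the individual cases."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[value])
    means = {k: mean(v) for k, v in groups.items()}
    for r in rows:
        r[out] = means[r[key]]
    return rows


# Step 2 (optional, done first): deviation from each object's mean across subjects.
add_group_mean(rows, "obj", "score", "obj_mean")
for r in rows:
    r["obj_dev"] = r["score"] - r["obj_mean"]

# Step 1: deviation from each subject's own mean of the object-adjusted scores.
add_group_mean(rows, "subject", "obj_dev", "subj_mean_dev")
# Keep the subject's overall level (raw scale) as a potential predictor.
add_group_mean(rows, "subject", "score", "subj_level")
for r in rows:
    r["within_dev"] = r["obj_dev"] - r["subj_mean_dev"]

for r in rows:
    print(r["subject"], r["obj"], r["subj_level"], r["within_dev"])
```

If the Object means are not notably different, the first block can be skipped and step 1 applied to the raw scores directly, which matches the ordering caveat in the reply.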
Thanks a lot for your help! My subject area is psychology/education, and the 5 million observations are not so much the issue as the 25+ variables that could be relevant. For a couple of reasons, right now I'm trying to understand how to use SPSS to analyze long-form data using tests I often use with wide data (but I plan to consider the statistical issues you're pointing out).

I think the technical question I have is: how do I run a typical ANOVA/regression in SPSS with the data in long form, without restructuring it to wide form? E.g. If the data is in long form as shown below:

How would I tell SPSS to do an ANOVA with ExerciseNumber as the between-subjects variable and AccuracyType (AccuracyType1, AccuracyType2) as the within-subjects variable? Or tell SPSS to regress AccuracyType2 onto AccuracyType1 and ExerciseNumber?
Joseph

Joseph Jay Williams, Ph.D.
Graduate School of Education, Stanford University
On Sun, Oct 6, 2013 at 8:19 PM, Rich Ulrich <[hidden email]> wrote: [...]

To begin, look at the examples for the Mixed command.

Gene Maguin

From the example data you posted, is this a correct description of your data?

You have participants, each with doubly repeated data. The two repeated factors are exercise, with up to 4 levels, and type of accuracy, with 2 levels. Some participants are missing data on some exercises, but have up to 4 exercises. When an exercise is missing, of course its accuracy is missing.

Is exercise 1 always the same exercise? Is exercise 4 always the same exercise?

That gives you a table of 8 means for the interaction, 2 for type of accuracy, and 4 for exercise.

Where do the 25 variables come into the model? The example data only has 4 variables.

It is late in the day, but IIRC repeated measures GLM does not accommodate missing data but MIXED does.

What questions are you asking of the data? It is extremely likely that the means will be statistically distinct (significantly different). But the question then becomes whether the differences are meaningful.

BUT as Rich says, maybe you should just use MEANS or AGGREGATE to get the means and SDs.

On 10/7/2013 5:24 PM, Joseph Jay Williams [via SPSSX Discussion] wrote: [...]

Art Kendall
Social Research Consultants
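
As a rough illustration of the descriptive route mentioned in Art's reply (cell means and SDs rather than significance tests), the following Python sketch tabulates the exercise-by-accuracy-type cells directly from long-form rows. It is not SPSS syntax; the layout (one row per participant and exercise, with two accuracy columns called `acc1` and `acc2` here) is an assumption based on the variables named in the thread, and the values are made up. A missing exercise is simply an absent row, so each cell uses whatever cases it has.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical long-form rows: (participant, exercise, accuracy type 1, accuracy type 2).
rows = [
    ("p1", 1, 0.50, 0.70),
    ("p1", 2, 0.60, 0.80),
    ("p2", 1, 0.70, 0.90),
    ("p2", 2, 0.80, 1.00),
    ("p3", 1, 0.60, 0.80),  # p3 has no row for exercise 2 (missing)
]

# Collect the scores that fall in each (exercise, accuracy type) cell.
cells = defaultdict(list)
for _pid, exercise, acc1, acc2 in rows:
    cells[(exercise, "acc1")].append(acc1)
    cells[(exercise, "acc2")].append(acc2)

# n, mean, and sample SD per cell (SD needs at least two cases).
summary = {
    cell: {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values) if len(values) > 1 else None,
    }
    for cell, values in cells.items()
}

for cell in sorted(summary):
    print(cell, summary[cell])
```

Reporting n alongside each mean makes the unequal cell sizes caused by missing exercises visible, which matters when comparing cells by eye.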