SPSSX Discussion

Repeated measures in large data set

NomiW

Repeated measures in large data set

I have a data set with approximately 1,000,000 people. Medication usage was
recorded monthly throughout one calendar year (i.e. each person has 12 time
points). The variables are numeric and refer to dosage.
I'm interested in comparing use across time, between two different regions and
three different groups. I've run Repeated Measures models with factors and
interactions. Everything is significant because the n is so large. Is there a
better way to do this? The differences between months are very small but all
pairwise comparisons are significant. How do I know which are meaningful?
(I'm particularly interested in comparing one month to the preceding and
following months).

Thanks!

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

MaxJasper

RE: Repeated measures in large data set

· Avoid a random-intercepts-and-slopes model with time. That combination implies an error-covariance structure that may be inappropriate.

· To find the best-fitting analysis, run several analyses and select the one with the lowest -2LL.

· To select an appropriate form for the residual covariance matrix, fit a factorial model with fixed effects only (no random effects) and an unstructured covariance matrix.

· To choose the better covariance structure between two models with the same fixed effects, test whether the change in -2LL is significant.

Max.
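
A minimal sketch of the -2LL comparison described above, assuming the two models are nested, share the same fixed effects, and were fitted elsewhere (for example with SPSS MIXED). The -2LL values are hypothetical. The parameter counts are those of a compound-symmetry structure (2) and an unstructured 12 x 12 matrix (78). The function names are illustrative only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)

def covariance_lrt(neg2ll_simple, k_simple, neg2ll_complex, k_complex):
    """Likelihood-ratio test for two nested covariance structures."""
    change = neg2ll_simple - neg2ll_complex    # drop in -2LL
    df = k_complex - k_simple                  # extra covariance parameters
    return change, df, chi2_sf(change, df)

# Hypothetical -2LL values: compound symmetry vs. unstructured, 12 months.
change, df, p = covariance_lrt(
    neg2ll_simple=50250.0, k_simple=2,
    neg2ll_complex=50100.0, k_complex=78,
)
print(f"change in -2LL = {change:.1f}, df = {df}, p = {p:.3g}")
```

With around a million people, even a trivial improvement in fit is likely to test as significant, so the size of the change relative to the number of added parameters may be more informative than the p-value alone.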

Art Kendall

Re: Repeated measures in large data set

In reply to this post by NomiW

Whether statistically significant differences are meaningful is not a statistical matter but a substantive one. Do the differences pass the "Who cares?" test (aka the "So what?" test)?

Art Kendall
Social Research Consultants

Kornbrot, Diana

Re: Repeated measures in large data set

In reply to this post by MaxJasper

For large and small data sets alike, it is the EFFECT SIZE that matters, not the p-value.
I.e., what proportion of the overall variability is due to particular predictors?
OR, put another way:
How does the magnitude of the effect compare with the standard deviation?
Best
Diana
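
Both effect-size questions above can be sketched in a few lines. This is an illustration only: the dosage values are made up, the function names are hypothetical, and the standardized difference uses a pooled standard deviation that ignores the pairing of months within a person, which is one choice among several for repeated measures.

```python
from statistics import mean, stdev

def standardized_difference(a, b):
    """Mean difference between two sets of scores, in pooled-SD units."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(b) - mean(a)) / pooled_var ** 0.5

def eta_squared(groups):
    """Proportion of total variability accounted for by group membership."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

# Made-up dosages for two adjacent months.
june = [1.0, 2.0, 3.0]
july = [2.0, 3.0, 4.0]
print(standardized_difference(june, july))  # difference in SD units
print(eta_squared([june, july]))            # share of variance explained
```

Unlike the p-value, neither quantity grows just because n is large, so both stay interpretable with a million cases.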

Professor Diana Kornbrot
email:   d.e.kornbrot@...
web:    http://dianakornbrot.wordpress.com/
            http://go.herts.ac.uk/diana_kornbrot
Work
Department of Psychology
School of Life and Medical Sciences
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
voice:   +44 (0) 170 728 4626
Home
19 Elmhurst Avenue
London N2 0LT, UK
voice:   +44 (0) 208 444 2081
mobile: +44 (0) 740 318 1612

Rich Ulrich

Re: Repeated measures in large data set

In reply to this post by NomiW

The loadings to look at are the ones in the factor structure matrix,
which has the item-to-factor correlations. The matrix before rotation
will yield the largest factor as the overall total score, reflecting the
generally-positive correlations among items on a scale. The second
factor will be "bipolar" in the sense of tending to have one set of high
correlations that are positive, and another set that are negative.

--
Rich Ulrich

Bruce Weaver

Re: Repeated measures in large data set

Rich, I'm not sure why you're bringing up factor analysis here. In the first post in the thread, NomiW said, "I've run Repeated Measures models with factors and interactions." I took "factors" there to mean categorical explanatory variables in the RM ANOVA model. Did you read it differently?

Cheers,
Bruce

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Rich Ulrich

Re: Repeated measures in large data set

In reply to this post by NomiW

[many prior attempts.]
There is pretty general disparagement of using the very-high-d.f.
tests that you can have, but I don't know if the survey people or
data-miners have come up with the proper replacements. (Someone?)
When they do, one recommended solution will be something like this.

Given your set of data, I think that for Within comparisons (which
you seem to be talking about) I would consider using a conservative
error term constructed as follows:

For each of the 6 samples (2 regions x 3 groups), find the 11 d.f. error
term for the linear trend across 12 months; pool these, resulting in a
66 d.f. error. Sixty-six gives pretty good robustness. "Deviation from
linear trend" should give a fairly practical basis for being meaningful.
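
One possible reading of that pooled error term, sketched under stated assumptions: `cell_means` is a hypothetical layout mapping each region-by-group cell to its 12 monthly mean dosages, and the numbers are made up. Note that fitting a two-parameter straight line to 12 means leaves 10 residual d.f. per cell (60 pooled), so the d.f. count in this sketch differs from the 11 and 66 given above; treat that choice as an assumption of the sketch.

```python
def linear_trend_residual_ss(monthly_means):
    """Sum of squared deviations from a least-squares line across months."""
    n = len(monthly_means)
    months = range(1, n + 1)
    x_bar = (n + 1) / 2.0
    y_bar = sum(monthly_means) / n
    sxx = sum((x - x_bar) ** 2 for x in months)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, monthly_means))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(months, monthly_means))

def pooled_deviation_error(cell_means):
    """Pool deviation-from-linear-trend sums of squares over all cells."""
    ss = sum(linear_trend_residual_ss(m) for m in cell_means.values())
    df = sum(len(m) - 2 for m in cell_means.values())  # slope + intercept per cell
    return ss / df, df

# Made-up monthly means: a gentle upward trend with a bump in month 7.
cell_means = {
    (region, group): [10.0 + 0.1 * m + (0.5 if m == 7 else 0.0)
                      for m in range(1, 13)]
    for region in ("region1", "region2")
    for group in ("groupA", "groupB", "groupC")
}
mean_square, df = pooled_deviation_error(cell_means)
print(f"pooled mean square = {mean_square:.4f} on {df} d.f.")
```

A month-to-month difference in means can then be judged against this mean square rather than against the tiny within-person error that a million cases produce.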

One piece of pragmatic advice for large N, which has been around for a
very long time, is that you should simply ignore all tests and focus on the
effect sizes that are meaningful in some other sense.

- When you have dozens or hundreds of tests, you can always sort them
from largest to smallest, and talk about the largest. That gets you the
right set to talk about, anyway. (I have seen the error where some PI
spends far too much time over-interpreting some diddly 0.05 test
while he ignores various < 0.001 results.)
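
As a sketch of that sorting idea applied to the adjacent-month comparisons the original question asks about: the layout (one list of dosages per month) and the numbers are hypothetical, and the ranking here is by absolute standardized change rather than by test statistic, which is a substitution made for this illustration.

```python
from statistics import mean, stdev

def rank_adjacent_month_changes(monthly_doses):
    """Rank month-to-month changes by absolute standardized size.

    monthly_doses: list of lists, one list of dosages per month.
    Returns tuples (from_month, to_month, raw_change, standardized_change),
    largest absolute standardized change first.
    """
    changes = []
    for i in range(len(monthly_doses) - 1):
        prev, nxt = monthly_doses[i], monthly_doses[i + 1]
        raw = mean(nxt) - mean(prev)
        # Average the two variances to get a common SD for the pair of months.
        sd = ((stdev(prev) ** 2 + stdev(nxt) ** 2) / 2.0) ** 0.5
        changes.append((i + 1, i + 2, raw, raw / sd))
    return sorted(changes, key=lambda c: abs(c[3]), reverse=True)

# Made-up dosages for four months.
doses = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
    [2.5, 3.5, 4.5],
]
for start, end, raw, std in rank_adjacent_month_changes(doses):
    print(f"month {start} -> {end}: change {raw:+.2f} ({std:+.2f} SD)")
```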

--
Rich Ulrich
