I am looking for some direction on how to proceed.
We work with adult offenders and I have been given a project to determine if there is anything of significance with aggregate health data across years for 122 offenders (these are the same offenders each year). Since the data is health data, I can’t get individual data because of data privacy laws. I have several domains across these years, but I’ll just ask about one of them, because I am assuming I can use the same procedure for the other domains. I also have a lot of descriptive statistics: non-annualized means, standard deviations and medians, total months enrolled among the clients and annualized means for clients with one or more hospital visits and annualized means for all clients. I’m only providing the annualized means statistics because I’m assuming that statistic is the one that makes the most sense to analyze.
For Hospital Visits by year, I have the following:
Year                                                          2004   2005   2006   2007   2008
Total # Clients                                                122    122    122    122    122
Mean # of visits (annualized) for all clients                 3.40   3.67   3.72   2.06   2.08
# Clients w/ 1 or more Visit                                    54     58     67     52     53
Mean # of visits for clients w/ 1 or more visit (annualized)  5.17   5.55   5.45   4.32   3.79
The actual question(s). The intervention occurred after 2005. So 2004 and 2005 are pre-intervention years and post intervention is 2006 - 2008. What I am trying to figure out is were there differences between each of the years and, if possible, differences between pre- and post-intervention times, if so how do I test for these differences with the data I have?
Thanks, I could really use some help.
Kara
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Hi Kara,
I think the way to think about your question is in terms of the ideal design, which would be a constant group of people followed over the 5 years, with person-level data on number of visits. As I understand it, you have a constant set of people, N=122, so it's a repeated measures design. Your three variables are variations on number of visits. Given the mean visit numbers, I'd guess that most people have no visits and a very few have a lot. Possibly, the visits data fit a Poisson, negative binomial, or a zero-inflated version of either of those. The number of clients with one or more visits is a proportion.
The problem you have is that you need the DV variance-covariance matrix and you don't have it. If you assumed that the DVs are uncorrelated over time, the problem becomes a between-groups analysis, which is partially doable in SPSS. The proportions can be done in CROSSTABS with weights to reconstruct the dataset. The mean visits still remain a problem for two reasons. The first is the distribution issue, and I don't think there is a way to overcome that. What people did before they could model Poisson distributions was to assume that the visits residuals are normally distributed; that is wildly incorrect, but you could do it. That brings up the second reason, which is that you don't have variances/SDs. Those I think you ought to be able to get without being entangled in HIPAA. Now, if you had those SDs AND you were willing to make a structured set of assumptions about covariances, you could create a data set and analyze it using the old MANOVA command, which I assume still exists, but I haven't checked the reference.
So, you do all this and what do you have? A bunch of results that depend on suppositions and assumptions that are either out-and-out incorrect (distributions, covariances) or just off base (covariance magnitudes).
This is HIPAA business, and I wonder if the providing agency would be willing to provide de-identified data with persons identified by a consistent ID number.
Gene Maguin
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Kara
Sent: Friday, October 21, 2011 11:01 AM
To: [hidden email]
Subject: Not sure if SPSS can help: Testing differences with aggregate data.
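For anyone following along, the independent-groups shortcut Gene describes for the proportions can be sketched outside SPSS as well. The Python below is only an illustration under his stated assumption that the yearly counts are uncorrelated, which is known to be false here because the same 122 people appear every year. The function names (`chi2_sf`, `chi_square_homogeneity`) are made up for this sketch, and the p-value helper only covers the two degrees-of-freedom cases needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability (this sketch handles df = 1 or even df)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch only handles df = 1 or even df")


def chi_square_homogeneity(successes, totals):
    """Pearson chi-square for equal proportions across groups.

    ASSUMPTION: groups are independent. That does not hold for repeated
    measures on the same clients, so treat the result as a rough screen.
    """
    pooled = sum(successes) / sum(totals)
    stat = 0.0
    for s, n in zip(successes, totals):
        expected_yes = n * pooled
        expected_no = n * (1 - pooled)
        stat += (s - expected_yes) ** 2 / expected_yes
        stat += ((n - s) - expected_no) ** 2 / expected_no
    df = len(successes) - 1
    return stat, df, chi2_sf(stat, df)


# Clients with one or more hospital visits, 2004-2008, out of 122 each year.
visited = [54, 58, 67, 52, 53]
clients = [122] * 5

# All five years compared at once.
stat_years, df_years, p_years = chi_square_homogeneity(visited, clients)

# Pre-intervention (2004-2005) pooled against post-intervention (2006-2008).
stat_prepost, df_prepost, p_prepost = chi_square_homogeneity(
    [sum(visited[:2]), sum(visited[2:])],
    [sum(clients[:2]), sum(clients[2:])],
)

print(f"years:    chi2={stat_years:.3f}, df={df_years}, p={p_years:.3f}")
print(f"pre/post: chi2={stat_prepost:.3f}, df={df_prepost}, p={p_prepost:.3f}")
```

Pooling years for the pre/post comparison counts each client more than once, which is one more reason to read these numbers as descriptive rather than as a defensible test.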
In reply to this post by Kara
Gene
Thanks for the reply. Months of begging the providing agency got us just these statistics. They will not give us de-identified data. We tried everything. I wish they would; this would have been easier. But I can get standard deviations. I also have medians, if that allows for some nonparametric options. Let me simplify the data I sent previously a little. The standard deviations are for the mean number of hospital visits among those with one or more hospital visits, so that removes the large number of clients with no hospital visits and increases the mean. But you are right, this is not normally distributed. So, if I only use the average number of hospital visits for those with one or more hospital visits, this is the aggregate data:
Year   N    Ave # of Visits   St Dev   Median
2004   54   5.17              8.47     3.00
2005   58   5.55              7.57     3.00
2006   67   5.45              5.79     3.60
2007   52   4.32              5.48     2.10
2008   53   3.79              4.19     2.00
Now I have those standard deviations, but doing what you suggested is beyond my skill set, so I will need help. But with all the assumptions that will have to be made, it almost sounds like it would be wildly inappropriate and would not stand up to review. Are there any other options?
Kara
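As a rough illustration of what can be computed by hand from the N, mean, and SD columns above, here is a Python sketch of a one-way ANOVA F statistic built purely from summary statistics. It leans on exactly the assumptions flagged earlier in the thread as doubtful: the yearly groups are treated as independent (they overlap, since they are drawn from the same 122 clients) and the visit counts are treated as roughly normal (they are clearly skewed). The function name `anova_from_summary` is hypothetical.

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA F statistic from group sizes, means, and SDs.

    ASSUMPTIONS (both questionable for this data): independent groups and
    approximately normal residuals with similar variances.
    """
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total

    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-groups sum of squares: recovered from each group's sample SD.
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))

    df_between = len(ns) - 1
    df_within = n_total - len(ns)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Clients with one or more visits, 2004-2008 (values from the table above).
ns = [54, 58, 67, 52, 53]
means = [5.17, 5.55, 5.45, 4.32, 3.79]
sds = [8.47, 7.57, 5.79, 5.48, 4.19]

f_stat, df_between, df_within = anova_from_summary(ns, means, sds)
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

The sketch stops at the F statistic. If SciPy is available, `scipy.stats.f.sf(f_stat, df_between, df_within)` would return the corresponding upper-tail p-value. The SDs being larger than the means in every year is itself a sign of how far these data are from the normality the test presumes.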
In reply to this post by Maguin, Eugene
On 21/10/2011 18:02, Gene Maguin wrote:
> I can't get individual data because of data
> privacy laws
I don't know your privacy laws, but what if you propose
- to work on the premises of the hospital, so that the data don't leave it?
- to separate data access and analysis, i.e. you produce the syntax and analyse the output, but a hospital employee does the running?
In some countries statistical workers have to take a pledge to act according to the privacy law and then get access to individual-level data.
Just my 2 centimes.
Frank Thomas
In reply to this post by Kara
I appreciate your 2 centimes, Frank. And your suggestions are the most logical and practical. There is just no way I can get individual-level data. It isn't possible, so going back to the data provider is not an option I can pursue. I was hoping there was a way using the overall means, standard deviations, or medians across the years, maybe through hand calculations. I even looked at a nonparametric approach, Cochran's Q, computing the formula in Excel, but I need mean differences and, well, obviously I can't get those. I also wondered whether I could just do post hoc computations. It may be that it's not possible; that's why I threw this out to the experts on this listserv.
Thanks, though.
Kara
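To illustrate why a hand calculation of Cochran's Q stalls with aggregate counts, the Python sketch below builds two hypothetical individual-level datasets that both reproduce the reported yearly counts of clients with one or more visits (54, 58, 67, 52, 53 out of 122) and then computes Q for each. Neither dataset is real: `nested_dataset` and `spread_dataset` are invented extremes chosen only to show that the same yearly totals are compatible with very different values of Q, because Q also depends on each person's row total.

```python
def cochran_q(rows):
    """Cochran's Q for a subjects-by-occasions matrix of 0/1 values."""
    k = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2)
    denominator = k * total - sum(r * r for r in row_totals)
    return numerator / denominator


def nested_dataset(counts, n_subjects):
    """HYPOTHETICAL extreme: the same clients tend to have visits every year."""
    return [[1 if i < c else 0 for c in counts] for i in range(n_subjects)]


def spread_dataset(counts, n_subjects):
    """HYPOTHETICAL extreme: visits are spread as evenly as possible over clients."""
    rows = [[0] * len(counts) for _ in range(n_subjects)]
    start = 0
    for j, c in enumerate(counts):
        for t in range(c):
            rows[(start + t) % n_subjects][j] = 1
        start += c
    return rows


counts = [54, 58, 67, 52, 53]  # clients with one or more visits, 2004-2008
n_subjects = 122

nested = nested_dataset(counts, n_subjects)
spread = spread_dataset(counts, n_subjects)

# Both constructions reproduce the published yearly counts exactly.
nested_cols = [sum(r[j] for r in nested) for j in range(len(counts))]
spread_cols = [sum(r[j] for r in spread) for j in range(len(counts))]

q_nested = cochran_q(nested)
q_spread = cochran_q(spread)
print(f"Q with nested visits: {q_nested:.2f}")
print(f"Q with spread visits: {q_spread:.2f}")
```

Since the two values differ widely while the yearly counts are identical, the aggregate table alone cannot pin down Q; the person-level pattern is required.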
From: ftr <[hidden email]>
To: [hidden email]
Date: 10/25/2011 11:01 AM
Subject: Re: Not sure if SPSS can help: Testing differences with aggregate data.
Sent by: "SPSSX(r) Discussion" <[hidden email]>