|
Hello all,

I’m trying to run an insurance claims risk adjustment program, but by my calculations it could take several thousand hours to run. The program takes the more than 16,000 ICD9 codes and puts diagnoses into groups. My data has 8 diagnoses per claim, and I have almost 3 million claims. My machine is a dual-core Pentium with 2 GB of RAM running Windows XP, and I'm using the Python plugin.

Does it make sense that it could take so long? Someone asked me if SPSS runs from memory, compared to SAS, which runs from disk. Could this be part of the issue? Bottom line: does this type of analysis sound possible with SPSS?

Thanks,
Matt

Matthew Pirritano, Ph.D.
Research Analyst IV
Medical Services Initiative (MSI)
Orange County Health Care Agency
(714) 568-5648 |
|
Matt,
I can't comment on SPSS vs. SAS or on your computer. It seems to me, though, that your dataset plus the required computations is at a place where the computational algorithm matters, maybe a great deal. I'm pretty sure there are others on the list who have experience with big datasets and can comment better than I can.

I'm wondering whether you have the most efficient algorithm for the required operations. Have you tested alternatives and, if so, was there enough difference to matter? And, if you'd care to, I'd be interested to hear a description of your computational algorithm.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD |
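To put rough numbers on why the algorithm may matter here, the following is a back-of-the-envelope count in Python. It assumes every IF statement is tested against every diagnosis field on every claim, which is an assumption about how the job executes and not a measurement. The figures are the rounded ones from the original post.

```python
# Rounded figures from the original post; the real counts are
# "almost 3 million" claims and "over 16,000" ICD9 codes.
claims = 3_000_000
diagnoses_per_claim = 8
if_statements = 16_000

# Assumption: each IF is tested against each diagnosis field of each claim.
if_chain_comparisons = claims * diagnoses_per_claim * if_statements

# Alternative: a single table lookup per diagnosis field.
table_lookups = claims * diagnoses_per_claim

print(f"IF chain comparisons: {if_chain_comparisons:,}")
print(f"Table lookups:        {table_lookups:,}")
print(f"Ratio:                {if_chain_comparisons // table_lookups:,}")
```

Under that assumption the IF chain does on the order of hundreds of billions of comparisons, while a lookup-style approach touches each diagnosis field once. Actual run time also depends on disk and memory behavior, which this count ignores.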
|
Thanks, Gene,

My syntax looks like this:

Create the 99 numeric variables that are the diagnostic groups.
Run do repeat:

Do repeat diagnosis = diagnosis1 to diagnosis8.
   THE 16,000+ IF STATEMENTS TO GROUP THE 8 DIAGNOSES BASED ON ICD9 CODES.
   IF DIAGNOSIS = XXXXX THEN GROUPx = 1.
End repeat.

If the diagnosis codes are missing after these if statements, then make GROUPx equal to zero.

If more efficient code could speed this up, that'd be great! Maybe an alternative to the 16,000+ if statements? Is there a way to put this in another file that the syntax draws from? The program was originally written in SAS and used a macro to bring in these recodes.

Thanks,
Matt

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
Sent: Monday, March 29, 2010 8:32 AM
To: [hidden email]
Subject: Re: insurance claims risk adjustment takes forever

And, if you'd care to, I'd be interested to hear a description of your computational algorithm. |
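On the question of keeping the recodes in another file: one possible shape, sketched in plain Python since the Python plugin is already in use, is to store the code-to-group table as a CSV file and look each diagnosis up in a dictionary. Everything specific here is hypothetical: the column names, the codes, the group numbers, and the assumption that codes are stored as strings. `io.StringIO` stands in for a real file on disk, and the sketch does not call any SPSS API.

```python
import csv
import io

# Hypothetical stand-in for an external mapping file on disk.
# Column names, codes, and group numbers are made up for illustration.
MAPPING_CSV = """icd9,group
25000,7
4010,12
4011,12
"""

def load_mapping(handle):
    """Return a dict mapping ICD9 code (string) -> group number (int)."""
    return {row["icd9"]: int(row["group"]) for row in csv.DictReader(handle)}

def group_flags(diagnoses, mapping, n_groups=99):
    """Return a list of 0/1 flags, one per diagnostic group, for one claim."""
    flags = [0] * n_groups
    for code in diagnoses:
        group = mapping.get(code)   # one dictionary lookup per diagnosis
        if group is not None:
            flags[group - 1] = 1    # groups are numbered from 1
    return flags

mapping = load_mapping(io.StringIO(MAPPING_CSV))
claim = ["4010", "25000", "", "", "", "", "", ""]  # 8 diagnosis fields
flags = group_flags(claim, mapping)
print([i + 1 for i, f in enumerate(flags) if f])   # groups flagged for this claim
```

Groups that match no diagnosis stay at zero, which mirrors the "make GROUPx equal to zero" step described above. Updating the grouping then means editing the mapping file, not the program.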
|
|
At 11:56 AM 3/29/2010, Pirritano, Matthew wrote:
>My syntax looks like this.
>Create the 99 numeric variables that are the diagnostic groups.
>Run do repeat:
>
>Do repeat diagnosis = diagnosis1 to diagnosis8.
>   THE 16,000+ IF STATEMENTS TO GROUP THE 8 DIAGNOSES BASED ON ICD9 CODES.
>   IF DIAGNOSIS = XXXXX THEN GROUPx = 1.
>End repeat.
>
>If more efficient code could speed this up that'd be great! Maybe an
>alternative to the 16,000+ if statements?

RECODE is very fast, far faster than an IF chain. I've no idea whether you'd need 16,000 RECODE clauses, but I doubt it; surely you have ranges of ICD9 codes that all map into the same diagnostic category. |
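As a rough illustration of the RECODE idea, here is a small Python sketch that turns a code-to-group table into one RECODE command with a single clause per group. The codes, group numbers, and variable names (`diagnosis1`, `dxgroup1`) are hypothetical, the codes are assumed to be stored as strings, and the generated syntax is untested, so it should be checked against the RECODE documentation before use. It lists codes explicitly; whether ranges can be used instead depends on how the codes are stored, which the thread does not say.

```python
from collections import defaultdict

def build_recode(mapping, source="diagnosis1", target="dxgroup1"):
    """Build one RECODE command with a single clause per diagnostic group."""
    codes_by_group = defaultdict(list)
    for code, group in mapping.items():
        codes_by_group[group].append(code)

    clauses = []
    for group in sorted(codes_by_group):
        # Quote each code, since codes are assumed to be strings.
        values = " ".join(f"'{c}'" for c in sorted(codes_by_group[group]))
        clauses.append(f"({values} = {group})")
    clauses.append("(ELSE = 0)")  # unmatched codes get group 0
    return f"RECODE {source} " + " ".join(clauses) + f" INTO {target}."

# Hypothetical code-to-group table.
mapping = {"25000": 7, "4010": 12, "4011": 12}
print(build_recode(mapping))
```

Note that this yields one group number per diagnosis field, not the 99 indicator variables; setting those flags from the eight group numbers would be a separate step.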
|
Matthew,
In keeping with Richard's suggestion, it might help to write out the RECODE statements for each diagnosis rather than performing the DO REPEAT. My experience with ICD9 codes is that you need one RECODE statement for each broader diagnostic category. Once you have written the code for one diagnosis, a global replace in a text editor will let you write the code for the second diagnosis, and so forth. I may be mistaken, but I believe that reading the RECODE commands without going through the DO REPEAT loop will save you time.

HTH,
Steve Brand
www.StatisticsDoc.com

On Mon, Mar 29, 2010 at 1:13 PM, Richard Ristow wrote:
> RECODE is very fast, far faster than an IF chain. I've no idea
> whether you'd need 16,000 RECODE clauses, but I doubt it; surely, you
> have ranges of ICD9 codes that all map into the same diagnostic
> category. |
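The global-replace step can also be scripted. The sketch below, in Python, copies a command written for the first diagnosis into one command per diagnosis field. The template command, its codes, and the variable names `diagnosis1` and `dxgroup1` are hypothetical placeholders, not the actual risk adjustment syntax.

```python
def replicate(template, n=8, source="diagnosis", target="dxgroup"):
    """Copy a command written for field 1 into one command per field 1..n."""
    commands = []
    for i in range(1, n + 1):
        # Same effect as a global replace in a text editor.
        cmd = template.replace(f"{source}1", f"{source}{i}")
        cmd = cmd.replace(f"{target}1", f"{target}{i}")
        commands.append(cmd)
    return commands

# Hypothetical command for the first diagnosis field.
TEMPLATE = "RECODE diagnosis1 ('25000' = 7) ('4010' '4011' = 12) (ELSE = 0) INTO dxgroup1."

commands = replicate(TEMPLATE)
print("\n".join(commands))
```

The eight generated commands can then be pasted into the syntax file in place of the loop. Whether that is actually faster than DO REPEAT would need to be timed on a sample of the data.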
