|
I have a large claims dataset and need help preparing it for analysis. For those of you who are unfamiliar with the structure of claims datasets: each medical claim generates its own row of data, whether or not it belongs to the same person as another claim. As a result, many of my subjects have multiple rows of data, and I would like to collapse each subject's data into a single row.
For the numeric variables, I could simply use the 'Data Aggregate' function and sum the values. However, most of my variables are strings, and I need to concatenate all of their values into one cell. I don't think 'Data Aggregate' can do this, but I may be wrong; if it can, please advise. If not, does anyone know how I can restructure the dataset as outlined above?
Thanks in advance.
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
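For readers who want to see the requested collapse spelled out, here is a minimal sketch of the logic in plain Python. It is not SPSS syntax, and the field names (`subject_id`, `paid`, `dx_code`), the sample rows, and the `collapse` helper are all hypothetical. It sums the numeric fields and joins the string fields so that each subject ends up on one row.

```python
# Hypothetical claim rows: one row per claim, several rows per subject.
claims = [
    {"subject_id": "A1", "paid": 120.0, "dx_code": "250.00"},
    {"subject_id": "B2", "paid": 40.0, "dx_code": "401.9"},
    {"subject_id": "A1", "paid": 75.5, "dx_code": "401.9"},
]


def collapse(rows, key, numeric_fields, string_fields, sep="; "):
    """Return one row per key value: numeric fields summed, string fields joined."""
    merged = {}  # dicts keep insertion order, so subjects appear in first-seen order
    for row in rows:
        subject = row[key]
        if subject not in merged:
            merged[subject] = {key: subject}
            for field in numeric_fields:
                merged[subject][field] = 0.0
            for field in string_fields:
                merged[subject][field] = []
        record = merged[subject]
        for field in numeric_fields:
            record[field] += row[field]
        for field in string_fields:
            record[field].append(row[field])
    # Turn each list of collected strings into a single delimited cell.
    for record in merged.values():
        for field in string_fields:
            record[field] = sep.join(record[field])
    return list(merged.values())


one_row_per_subject = collapse(claims, "subject_id", ["paid"], ["dx_code"])
for record in one_row_per_subject:
    print(record)
```

Note that the joined string grows with the number of claims per subject, so any fixed-width string variable that receives it would have to be wide enough for the subject with the most claims.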
|
I have done tons of this type of analysis.
My advice is to leave the dataset as is and use a combination of aggregating to a separate file, aggregating to the same file, and identifying duplicates to create separate analytic files for the different analyses you'll want to do. These data manipulations can get complicated, but you get the hang of them after a while. The key is to write complete, well-commented syntax. I've gotten to the point where I spend most of my time writing syntax so that it is a self-contained program that handles the common data manipulation and analysis tasks. That way all of the work is up front, and new versions of the data you receive can be run through the programs you've already written with little new work.
Good luck!
matt
Matthew Pirritano, Ph.D.
Research Analyst IV
Medical Services Initiative (MSI)
Orange County Health Care Agency
(714) 568-5648
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Joe
Sent: Monday, August 03, 2009 2:00 PM
To: [hidden email]
Subject: Claims Data Management
|
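The workflow described in the reply above (keep one row per claim, flag duplicates, and attach per-subject aggregates to every row) can be sketched roughly as follows. This is an illustration in plain Python, not SPSS syntax and not the poster's actual program; the field names, the sample rows, and both helper functions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical claim rows; the last one repeats an earlier claim.
claims = [
    {"subject_id": "A1", "claim_id": "c1", "paid": 120.0},
    {"subject_id": "A1", "claim_id": "c2", "paid": 75.5},
    {"subject_id": "B2", "claim_id": "c3", "paid": 40.0},
    {"subject_id": "A1", "claim_id": "c2", "paid": 75.5},
]


def flag_duplicates(rows, keys):
    """Mark the first row for each key combination as primary (1), repeats as 0."""
    seen = set()
    flagged_rows = []
    for row in rows:
        combo = tuple(row[k] for k in keys)
        flagged_rows.append(dict(row, primary=0 if combo in seen else 1))
        seen.add(combo)
    return flagged_rows


def add_subject_totals(rows):
    """Copy each subject's claim count and paid total onto every one of its rows."""
    n_claims = Counter(row["subject_id"] for row in rows)
    total_paid = defaultdict(float)
    for row in rows:
        total_paid[row["subject_id"]] += row["paid"]
    return [
        dict(row,
             n_claims=n_claims[row["subject_id"]],
             total_paid=total_paid[row["subject_id"]])
        for row in rows
    ]


flagged = flag_duplicates(claims, ["subject_id", "claim_id"])
# Build an analytic file from primary rows only; the original rows stay untouched.
analytic = add_subject_totals([row for row in flagged if row["primary"] == 1])
for row in analytic:
    print(row)
```

Because the claim-level rows are never overwritten, a different analysis can start from the same file and apply its own filter or aggregate.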
|
In reply to this post by Joe-256
Hi Joe,
The simple answer is that you can "restructure" the data using the built-in function. But, as Matthew Pirritano has indicated, you could probably do better. Restructuring will yield a hugely messy dataset that will likely prove more difficult to work with. I work with claims data on a daily basis and would not go the route you are considering.
HTH
Mike |
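To illustrate why a restructured file gets messy, here is a small sketch in plain Python (not SPSS syntax) of a long-to-wide reshape. The field names, the sample rows, the `to_wide` helper, and the `field.n` column-naming scheme are all assumptions made for the example. Every subject receives as many column sets as the subject with the most claims, and the unused ones stay empty.

```python
# Hypothetical claim rows: subject A1 has three claims, B2 has one.
claims = [
    {"subject_id": "A1", "dx_code": "250.00", "paid": 120.0},
    {"subject_id": "A1", "dx_code": "401.9", "paid": 75.5},
    {"subject_id": "A1", "dx_code": "272.4", "paid": 10.0},
    {"subject_id": "B2", "dx_code": "401.9", "paid": 40.0},
]


def to_wide(rows, key, fields):
    """Spread each subject's claims across numbered columns (dx_code.1, paid.1, ...)."""
    wide = {}
    counts = {}
    for row in rows:
        subject = row[key]
        counts[subject] = counts.get(subject, 0) + 1
        record = wide.setdefault(subject, {key: subject})
        for field in fields:
            record[f"{field}.{counts[subject]}"] = row[field]
    # Every subject gets the same columns; missing claims are filled with None.
    max_claims = max(counts.values(), default=0)
    columns = [key] + [f"{field}.{i}"
                       for i in range(1, max_claims + 1)
                       for field in fields]
    return [{col: record.get(col) for col in columns} for record in wide.values()]


wide_rows = to_wide(claims, "subject_id", ["dx_code", "paid"])
for record in wide_rows:
    print(record)
```

The column count is one plus the number of fields times the maximum number of claims for any subject, so a single heavy user of services widens the file for everyone.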
