Hello Everybody,
I have a program that builds an error report. IIn addition to a bunch of descriptive variables (ID, Name, Student Type, Major) it is comprised of a series of Boolean variables with descriptive variable names. For example: Missing_Stu_Level Missing_Stu_Type Missing_Stu_Status ---------------------------------------------------------------------------------------------------------- 1 0 1 1 1 0 0 0 0 1 1 1 This program breaks out repeated subgroups from the dataset based on values on a series of error codes and appends them to an Excel file. The final Excel report looks like this: Data Cleaning Report ID Name Student_Type Student_Status Major Timecode Advisor --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- (Missing_Stu_Level) 1234 Joe Doe DP CN BIO FT Smith 2345 Jane Cane PA UF PSY HT White (Missing_Stu_Type) 3456 Will Do DP UT MUS FT Smith 4567 Ima Tired DP UT ART HT White Etc……… As you can see, records are grouped by error code, so if they have multiple error codes they might appear several times at different places in this file. I would like to compute a final variable that indicates which error codes a person has a value of 1 on. This is so the individual who is doing the data cleaning can see what other missing data needs to be fixed for any specific individual. This way they can fix all the errors for each person at once rather than have to call up the record more than one time due to duplicate appearances for more than one type of error. I want to create a single string variable, perhaps 200 characters wide, that will list all the relevant error variables. For example: (The Dataset From Which The Report Is Created) ID Missing_Stu_Level Missing_Stu_Type Missing_Stu_Status Multi_Errors --------------------------------------------------------------------------------------------------------------------------------------------------------------------- #### 1 0 1 Missing_Stu_Level, Missing_Stu_Status #### 1 1 0 Missing_Stu_Level, Missing_Stu_Type #### 0 0 0 #### 1 1 1 Missing_Stu_Level, Missing_Stu_Type, Missing_Stu_Status Any idea are welcome, including better ways of organizing such a report. I'm out of ideas for the moment. Thanks! Gary Moser, Research Associate Institutional Research Dominican University Tel: 415.482.2400 Fax: 415.257.1367 E-Mail: [hidden email] |
At 03:06 AM 8/20/2006, Hal 9000 wrote:
>I have a program that builds an error report. >II addition to a bunch of descriptive variables >(ID, Name, Student Type, Major) it is comprised >of a series of Boolean variables with >descriptive variable names. For example: > >Missing_Stu_Level Missing_Stu_Type Missing_Stu_Status >----------------------------------------------------- >1 0 1 >1 1 0 >0 0 0 >1 1 1 >This program breaks out repeated subgroups from >the dataset based on values on a series of error >codes and appends them to an Excel file. The >final Excel report looks like this: > > >Data Cleaning Report > >ID Name S_type S_Stat Major Timecode Advisor (Msg_Stu_Lvl) >------------------------------------------------------------ >1234 Joe Doe DP BIO FT Smith >2345 Jane Cane PA UF PSY HT White > >(Missing_Stu_Type) >3456 Will Do DP UT MUS FT Smith >4567 Ima Tired DP UT ART HT White >Etc > >As you can see, records are grouped by error >code, so if they have multiple error codes they >might appear several times at different places in this file. > >I would like to compute a final variable that >indicates which error codes a person has a value >of 1 on. This is [one] can fix all the errors >for each person at once. I want to create a >single string variable, perhaps 200 characters >wide, that will list all the relevant error [conditions]. I have, by the way, shortened your variable names and data lines. For reference, the software that distributes postings breaks all lines at 72 characters; posted data and code lines are much more readable if kept within that limit. What you want ought to be very easy. Try this (not tested): STRING ALL_ERRS(A200). DO REPEAT ERR_VBL = Missing_Stu_Level Missing_Stu_Type Missing_Stu_Status /ERR_MSG = 'Msg level' 'Msg Type' 'Msg Status'. . DO IF ERR_VBL EQ 1. . DO IF ALL_ERRS EQ ' '. . COMPUTE ALL_ERRS = LTRIM(ERR_MSG). . ELSE. . COMPUTE ALL_ERRS = CONCAT(RTRIM(ALL_ERRS),', ', LTRIM(ERR_MSG)). . END. . END IF. END REPEAT. SELECT IF ALL_ERRS NE ' '. I'd keep the error messages short, shorter than your variable names, so the error-message string won't get unwieldy. Good luck, Richard >For example: > > > >(The Dataset From Which The Report Is Created) > >ID Missing_Stu_Level Missing_Stu_Type Missing_Stu_Status >Multi_Errors > >---------------------------------------------------------------------------------------------------------------------------------------------------------------------- > >#### 1 0 >1 Missing_Stu_Level, Missing_Stu_Status > >#### 1 1 >0 Missing_Stu_Level, Missing_Stu_Type > >#### 0 0 >0 > >#### 1 1 >1 Missing_Stu_Level, Missing_Stu_Type, >Missing_Stu_Status > > > >Any idea are welcome, including better ways of >organizing such a report. I'm >out of ideas for the moment. Thanks! > > > >Gary Moser, Research Associate > >Institutional Research > >Dominican University > >Tel: 415.482.2400 > >Fax: 415.257.1367 > >E-Mail: [hidden email] |
>
> Thanks to everybody--all the input I receive jogged my mind, and I came > up with something that does what I wanted (listed below). I should have > mentioned that I prefered to leave the values in the error variables as 1's > and 0's because a macro that comes later is set up to expect those values, > therefore the most efficient solution for me leaves the data as it is. > *************************************************************************. > *Key: TRANS = Temporary Transfer Variable T_ALL_ERRS = Temporary All-Error Variable ALL_ERRS = Final Variable Listing All Errors. > string TRANS (a40) ALL_ERRS (a200) T_ALL_ERRS (a200). > exe. > > define OLAP (OL = !charend ('/')) > !do !S !in (!OL) > compute T_ALL_ERRS = ALL_ERRS. > exe. > do if (!S = 1). > compute TRANS = !quote(!S). > compute ALL_ERRS = concat(rtrim(TRANS), ' / ', rtrim(T_ALL_ERRS)). > end if. > exe. > !doend > !enddefine. > > OLAP OL = Missing_Stu_Level Missing_Ethnicity Missing_SSN Missing_Religion > Missing_Citizenship Missing_Handicap Missing_Phone1 Missing_City > Missing_State Missing_Email Missing_Visa_Code Missing_Residence > Missing_Stu_Status > Missing_Timecode Missing_Maj1 Citizen_Eth_Conflict Citizen_Visa_Conflict > /. > |
Free forum by Nabble | Edit this page |