|
I am using SPSS 15 and I am looking for a (semi)automatic procedure that would let me, for a given variable, classify all less frequent occurrences into an "other" category.

For instance, assume a variable deriving from a question such as "What car do you own?" with a list of some 25 possible brands plus a "do not know" option:

BRAND 1
BRAND 2
BRAND 3
...
BRAND 22
BRAND 23
BRAND 24
BRAND 25
DO NOT KNOW

Now, what I am looking for is a procedure that:

(a) identifies the most frequent answers (for instance the top 5, but this number should be a parameter set in the syntax); let's say BRAND 19, BRAND 5, BRAND 12, BRAND 2 and BRAND 8
(b) reclassifies all other occurrences into a generic OTHER
(c) reports DO NOT KNOW as it is, and possibly other auxiliary answers as well (such as I DO NOT OWN A CAR, etc.)
(d) maintains the order in a CTABLES command (leaving the OTHER and auxiliary categories last)

In the example, a CTABLES command with descending count order after the classification should yield:

BRAND 19
BRAND 5
BRAND 12
BRAND 2
BRAND 8
OTHER
DO NOT KNOW

Has any of you already developed something similar? I am doing it manually for each variable and it is becoming a burden.

Thanks,
Luca

Mr. Luca MEYER
Market research, data analysis & more
www.lucameyer.com <http://www.lucameyer.com/> - Tel: +39.339.495.00.21
|
At 05:40 AM 4/21/2007, Luca Meyer wrote:
>I am using SPSS 15 and I am looking for a (semi)automatic procedure
>that should allow me, for a given variable, to classify all less
>frequent occurrences into an "other" occurrence.

This may be a place to start. It's a macro, originally from Raynald Levesque, to which I've added a lot of bells and whistles. It reproduces AUTORECODE, but generates the RECODE and VALUE LABELS syntax. It uses AGGREGATE. As written, it assigns all one-occurrence values to '999 other'.

I haven't tested it for a long time, I think since its last-modified date of 18 Oct 2004, and I don't think I had it in perfect form even then. But it's probably a place to start.

>Now, what I am looking for is a procedure that:
>
>(a) identifies the main - for instance 5, but such a parameter should
>be set in the syntax - most frequent answers (let's say BRAND 19,
>BRAND 5, BRAND 12, BRAND 2 and BRAND 8)

That'd take a little extra code. You would:

1.) AGGREGATE, to count occurrences of each answer (BRAND19, BRAND5, ...)
2.) Sort by frequency. Mark the first 5 (or however many) records to be coded individually, and all others to be coded as 'other'.
3.) Re-sort, by alphabetical order or whatever final order you like, except keeping the values to be individually coded at the head of the list.
4.) Using code like that in the macro, write the RECODE and VALUE LABEL statements to an external file.

See how far this gets you. I don't pretend it's a complete solution, so feel free to post with questions.
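The four steps above can be sketched outside SPSS to make the logic concrete. What follows is only a minimal Python illustration under stated assumptions: it is not the macro and not SPSS syntax, and the function name top_n_recode, its parameters, and the sample answers are all invented for the example. Ties in frequency are broken alphabetically, which is an arbitrary choice.

```python
from collections import Counter


def top_n_recode(values, n=5, auxiliary=("DO NOT KNOW",), other_label="OTHER"):
    """Hypothetical helper: keep the n most frequent answers, send the rest to
    other_label, and leave the auxiliary answers untouched.

    Returns (mapping, order): mapping goes from each original answer to its
    reclassified answer; order is the category order for a table.
    """
    present = set(values)
    # Step 1: count occurrences of each answer (the AGGREGATE step),
    # leaving auxiliary answers out of the ranking.
    counts = Counter(v for v in values if v not in auxiliary)
    # Step 2: sort by descending frequency and mark the first n to be kept.
    ranked = sorted(counts, key=lambda v: (-counts[v], v))
    keep = ranked[:n]
    # Step 3: everything not kept is coded as "other".
    mapping = {v: (v if v in keep else other_label) for v in counts}
    for answer in auxiliary:
        if answer in present:
            mapping[answer] = answer
    # Final order: kept answers by frequency, then OTHER, then auxiliary ones.
    order = list(keep)
    if len(ranked) > n:
        order.append(other_label)
    order.extend(a for a in auxiliary if a in present)
    return mapping, order


answers = (["BRAND 19"] * 6 + ["BRAND 5"] * 5 + ["BRAND 12"] * 4
           + ["BRAND 2"] * 3 + ["BRAND 8"] * 2
           + ["BRAND 1", "BRAND 3"] + ["DO NOT KNOW"] * 4)
mapping, order = top_n_recode(answers, n=5)
print(order)
```

Step 4 (writing the RECODE and VALUE LABELS statements to a file) would then loop over mapping; the category order is what a CTABLES category list would need.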
|
At 04:02 AM 4/22/2007, Luca Meyer wrote:
>Thanks Richard,
>I will work on it. Would you mind giving me a URL link to the macro?

My gosh! I wrote all that, and never gave you the macro text. I must have been sleepier than I thought. (I'm sending this to the list as well, on the principle of generally sharing with the whole list.)

The macro isn't on-line. Here's the text, in this note. If it isn't easy to extract from the note, let me know and I'll send it as an attachment - but the List won't take that.

Good luck!
Richard

In this version,
. Fixed bug that got line of first occurrence wrong
. Parameter Presort=NO to run AGGREGATE without sorting the data first (and without the PRESORTED subcommand). More reliable, and considerably faster, on some large files. (In fact, Presort=NO should probably be the default, or the only option; see posting on "SORT CASES algorithm", with Jon Peck's advice.)
. Improved layout of RECODE specifications in output.
. Shortened lines; I hope they don't wrap so much in the E-message.

/* Macro !AutoRcd, version 18 Oct 2004                      */
/*                                                          */
/* Writes RECODE and VALUE LABEL syntax to AUTORECODE one   */
/* character string variable into a numeric variable.       */
/*                                                          */
/* Parameters (all keyword):                    Default     */
/* ------------------------                     -------     */
/* SrceVar  Variable to AUTORECODE              <required>  */
/*          (must be character-string)                      */
/* NewVar   Variable to AUTORECODE into         <required>  */
/* RunName  Name of output code file            AutoRcd     */
/* OneTime  If not "NO", all once-occurring     NO          */
/*          values are recoded 999, "Other".                */
/* Restore  If "NO", or not "YES", original     YES         */
/*          file is not restored.                           */
/*          ("NO" saves time & disk space                   */
/*          for large files that can                        */
/*          easily be reloaded.)                            */
/* Presort  If "NO", or not "YES", file is      YES         */
/*          not sorted before AGGREGATE.                    */
/*          (May be faster and more reliable                */
/*          on large files with many more                   */
/*          cases than separate values.)                    */
/*                                                          */
/* Path     Directory for output file(*)        c:\Tmp      */
/* Scratch  Directory for scratch files(*)      c:\Tmp      */
/*          (*) No trailing "\"                             */
/*                                                          */
/* Output                 Contains                          */
/* ------                 --------                          */
/* <Path>\<RunName>.GEN   RECODE !SrceVar into !Newvar,     */
/*                        and VALUE LABELS for !Newvar      */
/*                                                          */
/* Purpose: AUTORECODE, allowing post-editing               */
/* -------                                                  */
/* Details of service                                       */
/* ------------------                                       */
/* - Like AUTORECODE, recodes values of variable into       */
/*   integers starting with 1, in alphabetic order, except  */
/*   as below                                               */
/* - On each RECODE line, writes number of occurrences and  */
/*   first occurrence of the value, in comments             */
/* - Recodes blank value to 0.                              */
/* - If parameter "OneTime" is YES, recodes all once-       */
/*   occurring values to 999, with value label "Other".     */
/*                                                          */
/* Side effects                                             */
/* ------------                                             */
/* 1. Creates and leaves on disk,                           */
/*    <Scratch>\AutoRecd-data.SAV (unless RESTORE=NO)       */
/*    <Scratch>\AutoRecd-summary.SAV                        */
/*                                                          */
/* Known deficiencies                                       */
/* ------------------                                       */
/* A. Blank value is not handled properly if any value      */
/*    sorts earlier than blank                              */
/* B. If variable has 999 or more values, value label       */
/*    for the 999th is "Other", whether or not              */
/*    parameter OneTime=YES.                                */
/* C. Error, if directory specified with trailing "\"       */
/* D. First occurrences and occurrence numbers in output    */
/*    comments are 7 digits; may overflow for files of      */
/*    more than 10,000,000 records.                         */
/* E. Overwrites an existing output file without comment    */
/* F. Does not run fast, even on small files. (SORT and     */
/*    AGGREGATE are the time-consuming steps.)              */
/* G. Output RECODE syntax is well formatted, but code      */
/*    to generate it is long and complicated.               */
/* H. Has not undergone exhaustive testing                  */
/*                                                          */
/* Acknowledgement                                          */
/* ---------------                                          */
/* Adapted from macro by Raynald Levesque, SPSSX-L          */
/* "Syntax reproducing AUTORECODE", 11 Jun 2003 19:59:43    */
/*                                                          */
/* . . . . . . . . . . . . . . . . . . . . . . . . . . . .  */
/* Adapted by Richard Ristow                                */

/* - - - - - - -  Start macro definition  - - - - - - -     */
DEFINE !AutoRec (
    RunName=!DEFAULT(AutoRcd)   !TOKENS(1)
   /SrceVar=!TOKENS(1)
   /NewVar=!TOKENS(1)
   /Path   =!DEFAULT('c:\Tmp\') !TOKENS(1)
   /Scratch=!DEFAULT('c:\Tmp\') !TOKENS(1)
   /OneTime=!DEFAULT(NO)        !TOKENS(1)
   /Restore=!DEFAULT(YES)       !TOKENS(1)
   /Presort=!DEFAULT(YES)       !TOKENS(1)).

!LET !SPSSout = !QUOTE(!CONCAT(!UNQUOTE(!Path),'\',
                       !RunName,'.GEN'))
!LET !OrigSAV = !QUOTE(!CONCAT(!UNQUOTE(!Scratch),'\',
                       'AutoRecd-data.SAV'))
!LET !SmrySAV = !QUOTE(!CONCAT(!UNQUOTE(!Scratch),'\',
                       'AutoRecd-summary.SAV'))

* Macro variables for files superseded in this version .
!LET !RecdSPS = !QUOTE(!CONCAT(!UNQUOTE(!Path),'\',
                       !RunName,'-Recode.SPS'))
!LET !LablSPS = !QUOTE(!CONCAT(!UNQUOTE(!Path),'\',
                       !RunName,'-Value lbl.SPS'))

* Save original file, if it's to be restored later .
!IF (!UPCASE(!Restore) !EQ YES) !THEN
-  SAVE OUTFILE= !OrigSAV.
!IFEND

* Develop the data for Recode and Value Labels. .
COMPUTE FrstOccr = $CASENUM.

!IF (!UPCASE(!Presort) !EQ YES) !THEN
-  SORT CASES BY !SrceVar.
-  AGGREGATE OUTFILE=* /PRESORTED/BREAK=!SrceVar
      /Num_Occr = N
      /FrstOccr = MIN(FrstOccr).
!ELSE
-  AGGREGATE OUTFILE=* /BREAK=!SrceVar
      /Num_Occr = N
      /FrstOccr = MIN(FrstOccr).
!IFEND

* Compute recode target values. Assign values in order but:.
* 1. If first value is blank, assign it 0, not 1. .
* 2. If parameter OneTime is set (not "NO"), assign once- .
*    occurring values code 999, "Other". .
NUMERIC #Cur_Tgt (F3).
DO IF ($CASENUM = 1).
-  DO IF (!SrceVar = ' ').
.     COMPUTE #Cur_Tgt = 0.
.     COMPUTE RecNb = 0.
!IF (!UPCASE(!OneTime) !NE NO) !THEN
/* Special-case values with one occurrence */
-  ELSE IF (Num_Occr = 1).
.     COMPUTE #Cur_Tgt = 0.
.     COMPUTE RecNb = 999.
!IFEND
-  ELSE.
.     COMPUTE #Cur_Tgt = 1.
.     COMPUTE RecNb = 1.
-  END IF.
ELSE.
!IF (!UPCASE(!OneTime) !NE NO) !THEN
/* Special-case values with one occurrence */
-  DO IF (Num_Occr = 1).
.     COMPUTE RecNb = 999.
-  ELSE.
.     COMPUTE #Cur_Tgt = #Cur_Tgt + 1.
.     COMPUTE RecNb = #Cur_Tgt.
-  END IF.
!ELSE
/* Regular processing for values with one occurrence */
.  COMPUTE #Cur_Tgt = #Cur_Tgt + 1.
.  COMPUTE RecNb = #Cur_Tgt.
!IFEND
END IF.

* The following was from the Levesque version; the saved .
* file was never used. .
/*-- SAVE OUTFILE = !SmrySAV. /*-*/

* This, added 06 Oct 2004, writes a summary with the .
* records duplicated, for the two output passes needed .
LOOP PASS=1 TO 2.
.  XSAVE OUTFILE = !SmrySAV.
END LOOP.
EXECUTE /* REQUIRED 06 Oct 2004 */.

* Re-load the file with duplicated records, and sort .
* to separate the two "passes" for output .
GET FILE=!SmrySAV.
SORT CASES BY PASS !SrceVar.
MATCH FILES FILE=* /BY=PASS /FIRST=first /LAST=last.

DO IF PASS=1.
* Write syntax to recode values. .
.  NUMERIC #N_OTHR (F3).
.  IF (RecNb = 999) #N_OTHR = #N_OTHR + 1.
.  DO IF first.
.     WRITE OUTFILE=!SPSSout/ 'RECODE ' !QUOTE(!SrceVar).
.  END IF.
/* All records: write individual RECODE specs. */
.  DO IF LENGTH(!SrceVar) LE 35.
*     One-line RECODE specifications .
.     DO IF (SUBSTR(!SrceVar,LENGTH(!SrceVar),1)= ' ').
.        COMPUTE !SrceVar = CONCAT(RTRIM(!SrceVar),'"').
.        IF (!SrceVar = '"') !SrceVar = ' "'.
.        WRITE OUTFILE=!SPSSout/
            ' ("'!SrceVar' ='RecNb(F4)
            ') /* N='Num_Occr(F7)' 1st='FrstOccr(F7)' */'.
.     ELSE.
.        WRITE OUTFILE=!SPSSout/
            ' ("'!SrceVar'" ='RecNb(F4)
            ') /* N='Num_Occr(F7)' 1st='FrstOccr(F7)' */'.
.     END IF.
.  ELSE.
*     Two-line RECODE specifications .
.     DO IF (LENGTH(RTRIM(!SrceVar)) LE 33).
.        COMPUTE !SrceVar = CONCAT(RTRIM(!SrceVar),'"').
.        IF (!SrceVar = '"') !SrceVar = ' "'.
.        WRITE OUTFILE=!SPSSout/
            ' ("'!SrceVar '=' 40 RecNb(F4)
            ') /* N='Num_Occr(F7)' 1st='FrstOccr(F7)' */'.
.     ELSE IF (SUBSTR(!SrceVar,LENGTH(!SrceVar),1)= ' ').
.        COMPUTE !SrceVar = CONCAT(RTRIM(!SrceVar),'"').
.        WRITE OUTFILE=!SPSSout/ ' ("'!SrceVar.
.        WRITE OUTFILE=!SPSSout/
            '=' 40 RecNb(F4)
            ') /* N='Num_Occr(F7)' 1st='FrstOccr(F7)' */'.
.     ELSE.
.        WRITE OUTFILE=!SPSSout/ ' ("'!SrceVar'"'.
.        WRITE OUTFILE=!SPSSout/
            '=' 40 RecNb(F4)
            ') /* N='Num_Occr(F7)' 1st='FrstOccr(F7)' */'.
.     END IF.
.  END IF.
.  DO IF last.
.     WRITE OUTFILE=!SPSSout/ ' INTO '!QUOTE(!NewVar)'.'.
.  END IF.
ELSE IF PASS = 2.
* Write syntax for value labels. .
.  DO IF first.
.     WRITE OUTFILE=!SPSSout/ ' '/
         'ADD VALUE LABELS '!QUOTE(!NewVar).
.  END IF.
.  DO IF NOT last.
-     DO IF RecNb NE 999.
.        WRITE OUTFILE=!SPSSout/ ' 'RecNb(F4)' "'!SrceVar'"'.
-     END IF.
.  ELSE.
-     DO IF #N_OTHR = 0.
.        WRITE OUTFILE=!SPSSout/ ' 'RecNb(F4)' "'!SrceVar'".'.
-     ELSE.
-        DO IF RecNb NE 999.
.           WRITE OUTFILE=!SPSSout/ ' 'RecNb(F4)' "'!SrceVar'"'.
-        END IF.
.        WRITE OUTFILE=!SPSSout/ ' 999 "Other".'.
-     END IF.
.  END IF.
ELSE.
.  WRITE OUTFILE=!SPSSout/ ' BUG! Pass = ' PASS.
END IF.
EXECUTE.

* Restore original file, if it's saved for that purpose .
!IF (!UPCASE(!Restore) !EQ YES) !THEN
-  GET FILE= !OrigSAV .
!IFEND
!ENDDEFINE.
/* - - - - - - -  End macro definition  - - - - - - -       */
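For readers who want to see the shape of the generated file without running the macro, here is a rough Python sketch of the same idea: build RECODE and ADD VALUE LABELS text from a list of string values. It is an illustration only, not a port of the macro. The function name autorecode_syntax and its parameters are invented for the example, the output layout is simplified (no first-occurrence comment, no 35-character line splitting), blank values get code 0 but no value label, and values are assumed not to contain double quotes.

```python
from collections import Counter


def autorecode_syntax(values, src_var, new_var,
                      other_code=999, collapse_singletons=False):
    """Hypothetical helper: return RECODE / ADD VALUE LABELS syntax as text."""
    # Trailing blanks are not significant in SPSS strings, so strip them.
    counts = Counter(v.rstrip() for v in values)
    recode = [f"RECODE {src_var}"]
    labels = [f"ADD VALUE LABELS {new_var}"]
    next_code = 0
    used_other = False
    for value in sorted(counts):            # alphabetic order, like AUTORECODE
        n = counts[value]
        if value == "":
            target, shown = 0, " "          # blank value is recoded to 0
        elif collapse_singletons and n == 1:
            target, shown = other_code, value
            used_other = True
        else:
            next_code += 1
            target, shown = next_code, value
            labels.append(f'  {target} "{value}"')
        recode.append(f'  ("{shown}" = {target})  /* N={n} */')
    recode.append(f"  INTO {new_var}.")
    if used_other:
        labels.append(f'  {other_code} "Other"')
    if len(labels) == 1:                    # nothing to label
        return "\n".join(recode)
    labels[-1] += "."                       # command terminator
    return "\n".join(recode + [""] + labels)


sample = ["FORD", "AUDI", "FORD ", "ZASTAVA", ""]
text = autorecode_syntax(sample, "brand", "brand_n", collapse_singletons=True)
print(text)
```

The returned text is meant to be reviewed and edited by hand before it is run, which is the same "allowing post-editing" purpose the macro header describes.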
|
In reply to this post by Luca Meyer
This is pretty easy to do with programmability. Collapsing small cell counts is straightforward, and there is already a function genCategoryList in the spssaux2 supplementary module that allows you to create a category list that is presorted but has special categories placed at the end.
I'll work something up for this later today (Sunday) or tomorrow to add to the SpecialTransforms module.

-Jon Peck
SPSS
|
In reply to this post by Luca Meyer
As promised (almost), I have added a function to collapse infrequent variable values to the specialtransforms module available on SPSS Developer Central (www.spss.com/devcentral).
The function CollapseInfrequentValues collapses values whose count or percentage of occurrence is below a specified threshold into a specified new value. It deletes any value labels for the collapsed categories and creates a new one for the collapse value. It can also define an SPSS macro listing the values in ascending or descending order of frequency with the collapsed value at the end. User and system missing values are ignored.

Here is an example:

begin program.
import specialtransforms
specialtransforms.CollapseInfrequentValues("educcopy", threshold=.20,
    collapsevalue=99, otherlabel="low count categories")
end program.

In this example, values of the variable educcopy occurring in fewer than 20% of the cases are mapped to the value 99, which is given the value label "low count categories". The collapse is in place, so you may want to copy the variable(s) first using COMPUTE and APPLY DICTIONARY.

If you are using Ctables, you might want the macro to control the order.

specialtransforms.CollapseInfrequentValues("educcopy", threshold=.20,
    collapsevalue=99, otherlabel="low count categories",
    macronames="!educMacro", order="D")

Then you could do

CTABLES /TABLE educcopy
  /CATEGORIES VARIABLE=educcopy [!educMacro].

That would show the values in descending order of frequency with the "other" category at the end.

It can process more than one variable at a time and has a few other useful extras. If the threshold is < 1, it is interpreted as a fraction; if it is >= 1, it is interpreted as a count.

I have only tested this on 15.0.1, but it should also work on 14.0.1 and later. It requires the Python programmability plug-in and supplementary modules spssaux, spssdata, and namedtuple from Developer Central.

I hope you find this useful.

Regards,
Jon Peck
SPSS
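To make the threshold rule concrete (a fraction if below 1, a count otherwise), here is a small stand-alone Python sketch. It does not use or reproduce the specialtransforms module; the function collapse_infrequent and its arguments are hypothetical, None stands in for missing values, and the fraction is assumed to be taken over the non-missing cases.

```python
from collections import Counter


def collapse_infrequent(values, threshold, collapse_value=99, descending=True):
    """Hypothetical helper: map rare values to collapse_value.

    Returns (recoded, order): the recoded list, and the surviving values
    ordered by frequency with collapse_value placed last.
    """
    valid = [v for v in values if v is not None]    # missing values ignored
    counts = Counter(valid)
    # threshold < 1 is read as a fraction of valid cases, otherwise a count.
    cutoff = threshold * len(valid) if threshold < 1 else threshold
    rare = {v for v, c in counts.items() if c < cutoff}
    recoded = [collapse_value if v in rare else v for v in values]
    kept = sorted((v for v in counts if v not in rare),
                  key=lambda v: counts[v], reverse=descending)
    order = kept + ([collapse_value] if rare else [])
    return recoded, order


educ = [1] * 5 + [2] * 3 + [3, 4, None]
recoded, order = collapse_infrequent(educ, threshold=0.20)
print(recoded)
print(order)
```

With ten valid cases and a threshold of 0.20, the cutoff is two cases, so the values 3 and 4 (one case each) are collapsed into 99 while the missing case is left alone. The order list plays the role of the category list that the generated macro would hand to CTABLES.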
|
Jon,

I ran the following on a dataset of about 2500 cases, variable age (scale):

begin program.
import specialtransforms
specialtransforms.CollapseInfrequentValues("age", threshold=.20,
    collapsevalue=99, otherlabel="low count categories")
end program.

I say "ran" but it is really still running. The process line says Running Program>... Like the Energizer bunny, it is still going?

W

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Peck, Jon
Sent: Tuesday, April 24, 2007 10:03 AM
To: [hidden email]
Subject: Re: Occurrences classification

Will
Statistical Services
============
info.statman@earthlink.net
http://home.earthlink.net/~z_statman/
============
|
It's hard to tell from the formatting, but make sure that the

end program.

is on a line by itself. The email probably mangled the example program. It runs quite quickly, so I think the problem is that it hasn't really started, because it hasn't seen the end program yet.

-----Original Message-----
From: Will Bailey [Statman] [mailto:[hidden email]]
Sent: Tuesday, April 24, 2007 1:57 PM
To: Peck, Jon; [hidden email]
Subject: RE: Occurrences classification
|
Simple enough - My oversight, thought it was
Tks,
W

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Peck, Jon
Sent: Tuesday, April 24, 2007 3:02 PM
To: [hidden email]
Subject: Re: Occurrences classification

Will
Statistical Services
============
info.statman@earthlink.net
http://home.earthlink.net/~z_statman/
============
