Guys:
The syntax provided works fine, but I came across a problem: some of my variables have a length of 1 and others a length of 2. For example, V1 = 1 and V2 = 2, but V17 (for example) is V17 = 22, and the code stops working. So far the code works only for variables of fixed length. Any ideas?

Hi:
I have a database of 2500 variables, all numeric, V1 to V2500, and I need to create a 200-character variable that concatenates their values. For example, if V1 = 1, V2 = 7 and V3 = 9, I need a new variable Vnew = 179, and so on. Can I create a 200-character variable like this? Does SPSS have a limit on the length of a variable?

Regards,
(Continues thread begun under heading "How to create a SUPER Variable".)

At 04:32 PM 1/19/2007, Eugenio Grant wrote:

>The syntax provided works fine, but I came across a problem, I have
>some of the variables with a length of 1, and another of 2. For
>example, V1=1, V2=2, but V17 (for example) is V17=22. And the code
>stops working.
>
>The code so far is working only for variables of fixed length.

ALL numeric variables (and don't forget this) are of the same fixed length. Some span different ranges of values, which means they take different numbers of digits to display properly, but they're all stored the same way. That's important here, because it means the transformation from numeric to string is basically arbitrary: there's no conversion that's *a priori* right.

>Any ideas.

Yes. To start with, rethink the project: how do you WANT the string (the 'super' variable) to be defined? Do you want to allocate two spaces for each variable, to accommodate two-digit values? Here's code (untested) to do that; it uses leading zeroes for values less than 10. The changes are: format N2 instead of F1 in the STRING function, and somewhat awkward calculations for the SUBSTR arguments in the code that uses SUBSTR. (Definitely an argument for CONCAT rather than SUBSTR logic.)

    do repeat #T = v1 to v3.
    .  compute SUPER = concat(rtrim(SUPER), rtrim(string(#T, N2))).
    end repeat.

OR

    do repeat VARIABLE = v1 to v3
             /POSITION = 1 to 3.
    .  COMPUTE #CharPos = POSITION*2 - 1.
    .  COMPUTE SUBSTR(SUPER2,#CharPos,2) = STRING(VARIABLE,N2).
    end repeat.
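To see why a fixed number of characters per variable matters, here is a small sketch of the same idea in Python rather than SPSS syntax. It is only an illustration of the arithmetic; the function names `super_string` and `field` are made up for this example, and it assumes non-negative integer values.

```python
def super_string(values, width=2):
    """Concatenate non-negative integers as zero-padded fields of equal width."""
    parts = []
    for v in values:
        text = str(v).zfill(width)  # like format N2: leading zeroes
        if v < 0 or len(text) > width:
            raise ValueError(f"{v} does not fit in {width} digits")
        parts.append(text)
    return "".join(parts)


def field(super_str, position, width=2):
    """Return the field for the 1-based variable position."""
    # 0-based start here; the 1-based SUBSTR start would be (position-1)*width + 1,
    # which for width 2 is POSITION*2 - 1.
    start = (position - 1) * width
    return super_str[start:start + width]


# Without padding, two different records give the same string.
unpadded_a = "".join(str(v) for v in (1, 22))
unpadded_b = "".join(str(v) for v in (12, 2))

# With two characters per variable they stay distinct.
padded_a = super_string((1, 22))
padded_b = super_string((12, 2))
```

Unpadded, both records come out as "122"; padded, they are "0122" and "1202", and each value can still be read back by position.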
At 06:28 PM 1/19/2007, Eugenio Grant wrote, off-list:
>The idea is to take a big piece of information of every record of my
>database, and then be able to aggregate,

So this whole thing is to be a BREAK variable for AGGREGATE? You know that you can give a variable list, a large number of variables (I don't know how many), in BREAK. Try that before something like this combination.

>In order to make sure that there are no duplicates, records that hold
>big chunks of the same info might be suspicious to me. (if I aggregate
>by the Super Variable there should not be equal cases, hence n = 1 in
>all cases)

Yes, I see what you're getting at. I hope you set up your AGGREGATE so it gives you *which* records, by respondent ID or whatever you have, participate in a putative 'duplicate'.

>Because a questionnaire in which 2 respondents answer identically in
>some parts might be pretty strange...

I can see that, too.

>My variables are all numeric but have different width, for example v10
>width is 2, while v77 has a width of 4.

Remember, they're *NOT* of different width; they have different ranges of values. To do what you say you want to, all I can think of is to allow enough character spaces per variable to hold the maximum number of digits needed.

However, better not to fuss with the catenation at all. That is, instead of

    do repeat #T = v1 to v3.
    .  compute SUPER = concat(rtrim(SUPER), rtrim(string(#T, N2))).
    end repeat.
    AGGREGATE OUTFILE=*
       /BREAK=SUPER
       /N_INST 'Number of instances matching vbl grp' = N.

use

    AGGREGATE OUTFILE=*
       /BREAK=v1 to v3
       /N_INST 'Number of instances matching vbl grp' = N.

>If I can take big pieces of my database for every record I might be
>able to find duplicate records.

If you really want to find exact duplicates, you could try the above, naming all variables in your data. I don't know when AGGREGATE will hit a limit on how many variables it can have in a BREAK, but try it.
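The point about breaking on the variable list itself, and keeping track of *which* records match, can be sketched outside SPSS as follows. This is Python, not SPSS syntax, and the records and respondent IDs are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records: (respondent id, v1, v2, v3)
records = [
    (101, 1, 22, 9),
    (102, 12, 2, 9),
    (103, 1, 22, 9),
]

# Group respondent IDs by the tuple of answers; no string concatenation needed.
groups = defaultdict(list)
for rec_id, *answers in records:
    groups[tuple(answers)].append(rec_id)

# Keep only answer patterns shared by more than one respondent.
duplicates = {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Because the key is the tuple of values rather than a concatenated string, (1, 22, 9) and (12, 2, 9) can never be confused, whatever the number of digits.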
If your file is big (hundreds of thousands of records, or more), you'll have many, many break groups, only a little less than one per record. That can slow AGGREGATE badly. If so, SORT CASES to put records in order, and use PRESORTED on AGGREGATE.

As an alternative to

    AGGREGATE OUTFILE=*
       /BREAK=<varlist>
       /N_INST 'Number of instances matching vbl grp' = N.

consider the following (untested), which retains the original records:

    SORT CASES BY <varlist>.
    ADD FILES
       /FILE=*
       /BY <varlist>
       /FIRST=ListFrst
       /LAST =ListLast.
    NUMERIC    List_Dup (F2).
    VAR LABELS List_Dup 'Record is non-unique on list <describe>'.
    VAL LABELS List_Dup 0 'Unique' 1 'Duplicated'.
    COMPUTE List_Dup = 0.
    IF ListFrst EQ 0 List_Dup = 1.
    IF ListLast EQ 0 List_Dup = 1.
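The first/last logic above may be easier to follow in a small sketch. This one is in Python rather than SPSS syntax, uses invented records, and the helper name `flag_duplicates` is hypothetical: a record is unique only if it is both the first and the last in its sorted group.

```python
def flag_duplicates(rows, key):
    """Sort rows by key and pair each with (first, last, dup) flags."""
    ordered = sorted(rows, key=key)
    flagged = []
    for i, row in enumerate(ordered):
        # First in group: no previous row, or previous row has a different key.
        first = i == 0 or key(ordered[i - 1]) != key(row)
        # Last in group: no next row, or next row has a different key.
        last = i == len(ordered) - 1 or key(ordered[i + 1]) != key(row)
        dup = 0 if (first and last) else 1
        flagged.append((row, int(first), int(last), dup))
    return flagged


# Hypothetical records: (respondent id, v1, v2)
rows = [(101, 1, 22), (102, 12, 2), (103, 1, 22), (104, 3, 3), (105, 1, 22)]
result = flag_duplicates(rows, key=lambda r: r[1:])
```

Here records 101, 103 and 105 share the same answers and are flagged as duplicated, while 104 and 102 are each first and last in their own group and stay flagged as unique.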