|
My job is such that I perform the most complicated, subtle task of the
entire year only once per year. I am trying to construct a manual so that when the next year comes around, I don't have to re-invent the wheel. My manual still has gaps in it, however. Please allow me to review, and simplify. In previous years, I spent too much time worrying about duplicate cases, and trying to salvage loose ends. This year, I think I have resolved the duplicate case issues sufficiently, and will deal with odds and ends by discarding them, after duly noting their existence. However, one complication remains: The fundamental unit of observation is not the person, but the case, since a person may be served by this system more than once during a year. Therefore, there may be more than one case per person. So, to review the simplified situation: * I have an aggregated file representing the merger of 4 quarterly case files. Duplicate cases have been removed. Let's call the key ID field the CASEKEY. In addition, the file includes a personal identifier, the SSN. The SSN + Date of Application (DATEAPPL), if correctly specified, should also provide a unique case key. Let's call this file QuExtAll. * I also have an annual case summary file, called RSA911, that is supposed to summarize information on the same cases. However, RSA911 is not derived from QuExtAll. Rather, it is derived independently from the same computerized database. RSA911 does not have the CASEKEY. Instead, it has the SSN and DATEAPPL. I need to combine these two files into one, because although many of the variables are the same, there are important variables in the first file that are not in the second, and important variables in the second file that are not in the first. Wherever there is any conflict in the value of a variable common to both files, I want the value from RSA911 to be used, unless it is "missing" or undefined. I am assuming that the key field to match the two files will be SSN + DATEAPPL. I plan to discard any cases that do not have a match in both files. The question of the hour is, should I be using the syntax for ADD FILES, or for UPDATE? And if either one finds a discrepancy in any of a variable's attributes or characteristics (e.g., field width, type) what will it do? Last year, your choice was UPDATE: At 12:01 PM 5/1/2006, Richard Ristow wrote: >. . . I think you want UPDATE. On the UPDATE command, as I'm outlining it >(not tested), when a variable has data in both input files, the value from >the LATER SPECIFIED one will be taken. "Missing or blank values in the >transaction [i.e., second] files are not used to update values in the >master [first] file." > >So, if >-> You've done the matching of people. You have a variable PRSN_KEY in >files ANNUAL and QTRLY. You're confident that if PRSN_ID is the same in >records in the two files, they represent the same person; and if PRSN_ID >is different, they represent different people. But this is not quite correct, since I will be matching cases, not people. There is a PRSN_ID = SSN, and that will be unique for the person (but not for the case) because I've already thrown out the cases where the SSN is all 0's or all 9's. >-> You've done the selection of quarterly records, so QTRLY contains at >most one record for any PRSN_KEY. Not quite. It can have more than one record for a PRSN_KEY if the DATEAPPL is different. > (That was already true of file ANNUAL, right?) Same thing. >-> You want all variables that appear in either file. For variables that >appear in both files, you want the non-missing value from either. If both >have non-missing values, you want the value found in ANNUAL. Yes! Well stated. >Then, this should work (but very much not tested): > >UPDATE > /FILE=QTRLY > /FILE=ANNUAL > /BY PRSN_KEY. ...except that this should be CASE_KEY, which needs two variables, i.e., SSN and DATEAPPL. How do I designate that syntax? >>But what will it do when it finds a case in one database that is not in >>the other database? > >Take the values from the database the case is in. Variables that are only >in the other database will have missing values. > >What you're doing has a lot of room for glitches; it doesn't sound like >the files are perfectly clean, at all. But I think this should get you >started; I hope, most of the way. > >-Good luck, > Richard Thanks! Bob Schacht Robert M. Schacht, Ph.D. <[hidden email]> Pacific Basin Rehabilitation Research & Training Center 1268 Young Street, Suite #204 Research Center, University of Hawaii Honolulu, HI 96814 |
|
Bob,
I'd like to offer a suggestion. I haven't been following the postings very carefully and I know there have been several back and forths between you and Richard that may have contained important elements that I either didn't read or don't recall. But, based on your most recent email only, this is how I work the problem. It seems that you have two data sets with a common pair of keys. In theory, every case in one dateset should be the other and vice-versa. I would first satisfy myself that this is, in fact, true. It might not be. This is one problem. You have the additional problem that you want to select which variables to save contingently (i.e., look at the values of the same variable in both datasets and decide based on that evaluation which to keep). This is the other problem. This may be overkill but I think it is the only way to be really sure about both problems. First, go into both datasets and rename to a dataset specific name the variables that you are going to want to keep. Then do a Match Files by the two keys. Although I would add a dataset id variable to each dataset before matching, the In subcommand will, I think, do the same thing. If you crosstab the dataset id vars you'll find out whether one dataset has more cases than the other. That might be revealing in and of itself! When you are satisfied that you have the cases right, you can write some code for your contingent variables (I think Do Repeat would work fine here), i.e., to decide whether to use the value from the first dataset or replace it with the value from the other dataset. Finally, do a big rename operation and save the finished dataset. Gene Maguin |
|
In reply to this post by Bob Schacht-3
Thinking along, maybe complementary to Gene and maybe going different
directions, but here goes: At 05:09 PM 6/21/2007, Bob Schacht wrote: >[In my data,] one complication remains: The fundamental unit of >observation is not the person, but the case, since a person may be >served by this system more than once during a year. Therefore, there >may be more than one case per person. So far, so good. Not a difficulty in itself. Data can have any unit of observation, as long as it's clear what it is, and you have keys that identify units. (Data can also, as easily, have several units of observation, in a hierarchy of decreasing aggregation; and that works fine, too.) >So, to review the simplified situation: > * I have an aggregated file representing the merger of 4 quarterly > case files. Duplicate cases have been removed. I take it that this merger is a catenation or interleave. That is, you have the 4 quarterly files, all with the same set of variables, and different records. The merged file has the same set of variables, and all records from all the quarterly files, except (as you say) where duplicates have been removed. (In SPSS, such a merger is usually done with ADD FILES.) And you're all set, here; you already have the merged file with the records from all 4 quarters. >[In this merged file, there is a] key ID field [that uniquely >identifies each case, as defined above]; let's call [it] the CASEKEY. >In addition, the file includes a personal identifier, the SSN. The SSN >+ Date of Application (DATEAPPL), if correctly specified, should also >provide a unique case key. >Let's call this [merged file of the 4 quarters' data] QuExtAll. And, since you've eliminated duplicates, this file has at most one record for any one case. > * I also have an annual case summary file, called RSA911, that is > supposed to summarize information on the same cases. However, RSA911 > is not derived from QuExtAll. Rather, it is derived independently > from the same computerized database. RSA911 does not have the > CASEKEY. Instead, it has the SSN and DATEAPPL. > >I need to combine these two files into one, because although many of >the variables are the same, there are important variables in the first >file that are not in the second, and important variables in the second >file that are not in the first. Wherever there is any conflict in the >value of a variable common to both files, I want the value from RSA911 >to be used, unless it is "missing" or undefined. Clear enough specification here, too,so far. >I am assuming that the key field to match the two files will be SSN + >DATEAPPL. (Grin) A reasonable assumption, since that pair of variables is the only shared key you have. >The question of the hour is, should I be using the syntax for ADD >FILES, or for UPDATE? This one is definitely UPDATE. Like this, but not tested: UPDATE /FILE=QuExtAll /IN=IN_Qtrs /FILE=RSA911 /IN=IN_RSA /BY SSN DATEAPPL. >I plan to discard any cases that do not have a match in both files. That would be, any case for which 'IN_Qtrs' and 'IN_RSA' are not both 1. >And if either one finds a discrepancy in any of a variable's >attributes or characteristics (e.g., field width, type) what will it >do? It will abandon the UPDATE, and give an error message. Then, in the UPDATE statement, you have to rename that variable in one of the files, using the /RENAME subcommand. After you've done that, you can, a. Keep both versions of the variable, with their different types but now also with different names. b. Drop the version of the variable in one of the files. If you do that, rename the version in the file from which you're going to drop the variable, and then drop the renamed version. c. Use COMPUTE to copy the value of the renamed variable to the original name; then, later, drop the renamed version. IF you do that, . Since you want the value from RSA to take precedence, rename the copy in QuExtAll. If the variable is named 'Vbl', rename the QuExtAll copy something like 'QuX_Vbl'. . Then, probably this logic (but not tested) IF MISSING(Vbl) Vbl = QuX_Vbl. You'll probably want to drop 'QuX_Vbl', but *later*, not on the UPDATE statement. -Good night and good luck, Richard |
|
At 07:15 PM 6/21/2007, Richard Ristow wrote:
>Thinking along, maybe complementary to Gene and maybe going different >directions, but here goes: > >At 05:09 PM 6/21/2007, Bob Schacht wrote: > >[snip] > >>So, to review the simplified situation: >> * I have an aggregated file representing the merger of 4 quarterly >>case files. Duplicate cases have been removed. > >I take it that this merger is a catenation or interleave. That is, you >have the 4 quarterly files, all with the same set of variables, and >different records. The merged file has the same set of variables, and >all records from all the quarterly files, except (as you say) where >duplicates have been removed. (In SPSS, such a merger is usually done >with ADD FILES.) > >And you're all set, here; you already have the merged file with the >records from all 4 quarters. Right! >>[In this merged file, there is a] key ID field [that uniquely >>identifies each case, as defined above]; let's call [it] the CASEKEY. >>In addition, the file includes a personal identifier, the SSN. The SSN >>+ Date of Application (DATEAPPL), if correctly specified, should also >>provide a unique case key. >>Let's call this [merged file of the 4 quarters' data] QuExtAll. > >And, since you've eliminated duplicates, this file has at most one record >for any one case. Yes. >> * I also have an annual case summary file, called RSA911, that is >>supposed to summarize information on the same cases. However, RSA911 >>is not derived from QuExtAll. Rather, it is derived independently >>from the same computerized database. RSA911 does not have the >>CASEKEY. Instead, it has the SSN and DATEAPPL. >> >>I need to combine these two files into one, because although many of >>the variables are the same, there are important variables in the first >>file that are not in the second, and important variables in the second >>file that are not in the first. Wherever there is any conflict in the >>value of a variable common to both files, I want the value from RSA911 >>to be used, unless it is "missing" or undefined. > >Clear enough specification here, too,so far. > >>I am assuming that the key field to match the two files will be SSN + >>DATEAPPL. > >(Grin) A reasonable assumption, since that pair of variables is the >only shared key you have. > >>The question of the hour is, should I be using the syntax for ADD FILES, >>or for UPDATE? > >This one is definitely UPDATE. Like this, but not tested: > >UPDATE > /FILE=QuExtAll /IN=IN_Qtrs > /FILE=RSA911 /IN=IN_RSA > /BY SSN DATEAPPL. I didn't think to add the /IN= tags. >>I plan to discard any cases that do not have a match in both files. > >That would be, any case for which 'IN_Qtrs' and 'IN_RSA' are not both 1. This part of my "plan" might be over-hasty. I may not discard these cases after all; depends on the results. In any case, having the 'IN_xxx' tags may come in handy. >>And if either one finds a discrepancy in any of a variable's >>attributes or characteristics (e.g., field width, type) what will it >>do? > >It will abandon the UPDATE, and give an error message. I've already encountered this, as it points out to me half a dozen common variables that are defined differently. Very nice error message. >Then, in the UPDATE statement, you have to rename that variable in one >of the files, using the /RENAME subcommand. After you've done that, you >can, >a. Keep both versions of the variable, with their different types but >now also with different names. >b. Drop the version of the variable in one of the files. If you do >that, rename the version in the file from which you're going to drop >the variable, and then drop the renamed version. >c. Use COMPUTE to copy the value of the renamed variable to the >original name; then, later, drop the renamed version. IF you do that, >. Since you want the value from RSA to take precedence, rename the copy >in QuExtAll. If the variable is named 'Vbl', rename the QuExtAll copy >something like 'QuX_Vbl'. >. Then, probably this logic (but not tested) >IF MISSING(Vbl) Vbl = QuX_Vbl. > >You'll probably want to drop 'QuX_Vbl', but *later*, not on the UPDATE >statement. Rather than do those things, I think I'll fix the files before the merge so that the variables will merge properly in the first place. Thanks for your recommendations! Bob Robert M. Schacht, Ph.D. <[hidden email]> Pacific Basin Rehabilitation Research & Training Center 1268 Young Street, Suite #204 Research Center, University of Hawaii Honolulu, HI 96814 |
|
Colleagues
Over the last year I have gradually moved from using the menus in SPSS v14 to creating macros. The last stage of this has been to create a file to do exploratory data analysis. Below is a summary of my new syntax. I'd appreciate any comments or suggestions for improvements. Essentially I considered that checking missing values was the stage 1. Then in stage 2 seeing the distribution of the survey responses as frequencies using bar graph was next. Then in stage 3 using Tukey style graphics to reconsider the distributions. The in stage 4 to obtain the spot values for central tendency and dispersion. I realise I have not isolated extreme values nor have I separated kurtosis and skew for the variables to make it easy to focus on them (as this seems not to be allowed by the command syntax). There also seems no requirement to do a list of the variables as I just cust and paste the list from the variables view where I have the placeholder <list of variables>. TIA/gary * TITLE "STAGE 1 MISSING VALUES" . MVA <list of variables> EXECUTE . TITLE "STAGE 2 GRAPHICAL DISTRIBUTION OF FREQUENCIES OF KEY UNIVARIATE VARIABLES" . FREQUENCIES VARIABLES= <list of variables> /HISTOGRAM /ORDER= ANALYSIS . EXECUTE . TITLE "STAGE 3 TUKEY GRAPHICAL UNIVARIATE VARIABLES" . EXAMINE VARIABLES= <list of variables> /PLOT STEMLEAF BOXPLOT NPPLOT . EXECUTE . TITLE "STAGE 4 DESCRIPTIVE CENTRAL TENDENCY & DISPERSION UNIVARIATE VARIABLES" . EXAMINE VARIABLES= <list of variables> /STATISTICS DESCRIPTIVE . EXECUTE . |
|
In reply to this post by Richard Ristow
At 07:15 PM 6/21/2007, Richard Ristow wrote:
[snip] >[Bob had written:] >> * I also have an annual case summary file, called RSA911, that is >>supposed to summarize information on the same cases. However, RSA911 >>is not derived from QuExtAll. Rather, it is derived independently >>from the same computerized database. RSA911 does not have the >>CASEKEY. Instead, it has the SSN and DATEAPPL. >> >>I need to combine these two files into one, because although many of >>the variables are the same, there are important variables in the first >>file that are not in the second, and important variables in the second >>file that are not in the first. Wherever there is any conflict in the >>value of a variable common to both files, I want the value from RSA911 >>to be used, unless it is "missing" or undefined. > >Clear enough specification here, too,so far. > >>I am assuming that the key field to match the two files will be SSN + >>DATEAPPL. > >(Grin) A reasonable assumption, since that pair of variables is the >only shared key you have. > >>The question of the hour is, should I be using the syntax for ADD >>FILES, or for UPDATE? > >This one is definitely UPDATE. Like this, but not tested: > >UPDATE > /FILE=QuExtAll /IN=IN_Qtrs > /FILE=RSA911 /IN=IN_RSA > /BY SSN DATEAPPL. This *seemed to* work like a charm. >>I plan to discard any cases that do not have a match in both files. > >That would be, any case for which 'IN_Qtrs' and 'IN_RSA' are not both 1. . >. . I changed my mind about this, because I want to retain information about active cases, and that information is available ONLY from the Qtrs files. In other words, I kept cases where IN_Qtrs = 1 but IN_RSA = 0. However, the Update did not work as expected. That is, a field present in both files but blank in the Qtrs file was NOT filled in by values from the field in the RSA file. Why not? Bob Robert M. Schacht, Ph.D. <[hidden email]> Pacific Basin Rehabilitation Research & Training Center 1268 Young Street, Suite #204 Research Center, University of Hawaii Honolulu, HI 96814 |
|
Ah, ever onward. At 05:34 PM 6/28/2007, Bob Schacht wrote:
>At 07:15 PM 6/21/2007, Richard Ristow wrote: > >> >>This one is definitely UPDATE. Like this, but not tested: >> >>UPDATE >> /FILE=QuExtAll /IN=IN_Qtrs >> /FILE=RSA911 /IN=IN_RSA >> /BY SSN DATEAPPL. > >This *seemed to* work like a charm. > >However, the Update did not work as expected. That is, a field present >in both files but blank in the Qtrs file was NOT filled in by values >from the field in the RSA file. Why not? Well, I'm not sure. May I remind you that, if some code doesn't work, it's hard to diagnose from the bare statement that it doesn't? It's much better if you can give some test data, with a statement of what doesn't work - in your case, which variable this happens to. And in your case, among other things: Are there any string variables that *do* appear to be processed as you desire? Off the top of my head, I don't see it. From the Command Syntax Reference article on "UPDATE", it should work as you desire. This is from the SPSS 15 version, but I think it's been the same for many releases: >If the master and transaction files contain common variables for >matched cases, the values for those variables are taken from the >transaction file, provided that the values are not missing or blanks. >Missing or blank values in the transaction files are not used to >update values in the master file. (p.1840) Running with the /MAP subcommand may help, by at least telling you whether SPSS thinks that the two variables *are* the same. That's all I can think of, with what you've given us to work with. Try /MAP, and ask again if that doesn't solve it. -Good luck, Richard |
|
At 03:49 PM 6/28/2007, Richard Ristow wrote:
>Ah, ever onward. At 05:34 PM 6/28/2007, Bob Schacht wrote: > >>At 07:15 PM 6/21/2007, Richard Ristow wrote: >> >>> >>>This one is definitely UPDATE. Like this, but not tested: >>> >>>UPDATE >>> /FILE=QuExtAll /IN=IN_Qtrs >>> /FILE=RSA911 /IN=IN_RSA >>> /BY SSN DATEAPPL. >> >>This *seemed to* work like a charm. >> >>However, the Update did not work as expected. That is, a field present in >>both files but blank in the Qtrs file was NOT filled in by values from >>the field in the RSA file. Why not? > >Well, I'm not sure. > >May I remind you that, if some code doesn't work, it's hard to diagnose >from the bare statement that it doesn't? It's much better if you can give >some test data, with a statement of what doesn't work - in your case, >which variable this happens to. Sorry, I sent in too much haste. Here's an example. A variable in common to both files is Employment Status at Closure, or CLOEMPST. This is a one digit numeric field coded 1-7. Code '5' is for homemakers. A variable found only in the RSA file is Occupation at Closure. It is coded '599999' for Homemakers. However, when I did a crosstab on these two variables using the combined datafile, I got nuthin': The table was empty: >Warnings > >The crosstabulation of Occupation@Clo * Employment Status @Closure is empty. But when I backed up and used the original RSA file, I got a table populated with 667 cases-- which was the number of cases closed in employment. Does that help? >And in your case, among other things: Are there any string variables that >*do* appear to be processed as you desire? In this case, Occupation at Closure was a string variable, but CLOEMPST was numeric. No problems with Occupation at Closure, but it was only in the RSA file. Why should it make any difference if its a string variable? >Off the top of my head, I don't see it. From the Command Syntax Reference >article on "UPDATE", it should work as you desire. This is from the SPSS >15 version, but I think it's been the same for many releases: Mine is version 12. >>If the master and transaction files contain common variables for matched >>cases, the values for those variables are taken from the transaction >>file, provided that the values are not missing or blanks. Missing or >>blank values in the transaction files are not used to update values in >>the master file. (p.1840) > >Running with the /MAP subcommand may help, by at least telling you whether >SPSS thinks that the two variables *are* the same. Yes, it thinks they're the same. I re-ran the Occupation @ Closure by Employment Status @ closure, and it was empty again. Any suggestions? In any case, thanks with your efforts to help! Bob >That's all I can think of, with what you've given us to work with. Try >/MAP, and ask again if that doesn't solve it. > >-Good luck, > Richard |
|
At 10:55 PM 6/28/2007, Bob Schacht wrote:
>>UPDATE >> /FILE=QuExtAll /IN=IN_Qtrs >> /FILE=RSA911 /IN=IN_RSA >> /BY SSN DATEAPPL. >> >>>The Update did not work as expected. A field present in both files >>>but blank in the Qtrs file was NOT filled in by values from the >>>field in the RSA file. Why not? >> >>If code doesn't work, it's much better if you give some test data, >>with a statement of what doesn't work - in your case, which variable >>this happens to. > >A variable in common to both files is Employment Status at Closure, or >CLOEMPST. This is a one-digit numeric field coded 1-7. Code '5' is for >homemakers. > >A variable found only in the RSA file is Occupation at Closure. It is >coded '599999' for Homemakers. However, when I did a crosstab on these >two variables using the combined datafile, I got nuthin': The table >was empty: > >>Warnings >> >>The crosstabulation of Occupation@Clo * Employment Status @Closure is >>empty. > >But when I backed up and used the original RSA file, I got a table >populated with 667 cases-- which was the number of cases closed in >employment. Does that help? It's closer. Still a couple of remarks, though: - You wrote "A field present in both files but blank in the Qtrs file was NOT filled in by values from the field in the RSA file." Now it looks like the field wasn't blank (which applies to string variables) but system-missing (which applies to numeric variables). That is just the kind of thing it's important to state accurately. If you're not clear about the difference, give some test data, and a listing of the variables' attributes. Use DISPLAY DICTIONARY/VARIABLES=... - You wrote, "when I did a crosstab, the table was empty." Best practice is to 'think globally, debug locally.' Crosstab is global: it gives you overall statistics from the file, but no indication of what records were involved. It would not be hard to list the QuExtAll records, the RSA911 records, and the merged records, for a few cases that have values for both variables in the RSA911 file. If there are too many variables, use the /VARIABLES subcommand on LIST to restrict to the relevant variables. You might even solve your own problem, doing this. Besides, your CROSSTABS doesn't tell you anything about the individual variables, only that they never both have valid values in the same record. What happens if you do FREQUENCIES on both variables, before and after the merge? That'll test whether or not the variables individually seem to have come through. Do you have any user-missing values for either variable, in either file? FREQUENCIES would catch that. In other words, you have some very potent diagnostic tools available easily. I don't see your problem offhand, so I'd just have to run the same diagnostics myself (except you've sent no data, as noted previously). So, give it a go. LIST plus FREQUENCIES will tell you a lot you don't know now; there's a fair chance they'll solve your problem. And it's getting late, here in EDT. Good luck, and let's see what tomorrow brings. -Best wishes, Richard |
|
At 06:03 PM 6/28/2007, Richard Ristow wrote:
>At 10:55 PM 6/28/2007, Bob Schacht wrote: > >>>UPDATE >>> /FILE=QuExtAll /IN=IN_Qtrs >>> /FILE=RSA911 /IN=IN_RSA >>> /BY SSN DATEAPPL. >>> >>>>The Update did not work as expected. A field present in both files but >>>>blank in the Qtrs file was NOT filled in by values from the field in >>>>the RSA file. Why not? [snip] >>A variable in common to both files is Employment Status at Closure, or >>CLOEMPST. This is a one-digit numeric field coded 1-7. Code '5' is for >>homemakers. . . . >> >>But when I backed up and used the original RSA file, I got a table >>populated with 667 cases-- which was the number of cases closed in >>employment. Does that help? > >It's closer. Still a couple of remarks, though: > >- You wrote "A field present in both files but blank in the Qtrs file was >NOT filled in by values from the field in the RSA file." > Now it looks like the field wasn't blank (which applies to string > variables) but system-missing (which applies to numeric variables). . . . You are right. I should not have described the field as blank. >- You wrote, "when I did a crosstab, the table was empty." Best practice >is to 'think globally, debug locally.' Crosstab is global: it gives you >overall statistics from the file, but no indication of what records were >involved. > It would not be hard to list the QuExtAll records, the RSA911 records, > and the merged records, for a few cases that have values for both > variables in the RSA911 file. If there are too many variables, use the > /VARIABLES subcommand on LIST to restrict to the relevant variables. You > might even solve your own problem, doing this. I looked at the original QuExtAll file, the original RSA911 file, and the combined file, for these 17 cases closed "in employment" as "homemakers." Since presenting tables in this forum is awkward, a few limited examples will have to suffice. Three variables, SSN, Comp-Emp, and CLOEMPST were available in both files. Variables OccuCode and CLOSTYPE were present only in the RSA911 file, and STAT was present only in QuExtAll. The 17 cases in question were selected in Quextall using Comp-emp + STAT; the same cases were isolated in the RSA911 file using Comp-Emp + CLOSTYPE. The SSN were cross-checked to align the case information. Selected Selected Fields in Fields in QuExtAll RSA911 ------------- ------------ SSN SSN OccuCode Comp-Emp Comp-Emp STAT CLOSTYPE CLOEMPST CLOEMPST The value of CLOEMPST in these 17 cases was always "5" in the RSA911 file, and always ". " in the QuExtAll file, indicating system missing, right? In the combined file, the value of CLOEMPST was always ". ". Obviously, this is not what I wanted, which was that the result, if the field is present in both files, is the non-missing value. I left the syntax commands from the original message up top to show the order in which they were invoked by my UPDATE command. > Besides, your CROSSTABS doesn't tell you anything about the individual > variables, only that they never both have valid values in the same > record. What happens if you do FREQUENCIES on both variables, before and > after the merge? That'll test whether or not the variables individually > seem to have come through. Frequencies on CLOEMPST, original QuExtAll file (closed + active cases): Employment Status @Closure FrequencyPercent Missing036.5 16488.3 310.1 45.1 517.2 711.1 System704790.6 Total7774100.0 Frequencies on CLOEMPST, original RSA911 File (closed cases only): Employment status@clo FrequencyPercentValid PercentCumulative Percent Valid162628.393.993.9 39.41.395.2 44.2.695.8 517.82.598.4 711.51.6100.0 Total66730.1100.0 MissingSystem154869.9 Total2215100.0 This draws my attention to the fact that for the QuExtAll file, I declared "0" to be a missing value, but did not do the same for the RSA911 file. The frequencies for the combined file were as follows: Employment Status @Closure FrequencyPercent Missing036.5 16488.3 310.1 45.1 517.2 711.1 System704790.6 Total7774100.0 It finally has dawned on me that I was using the missing value specification incorrectly, and had inadvertently specified all possible values as missing. The combination of files process copied this specification along with everything else, so the values all got copied, but were all considered missing! My bad. Thanks for pushing me towards the correct solution. Bob Schacht |
|
At 08:59 PM 6/29/2007, Bob Schacht wrote:
>At 06:03 PM 6/28/2007, Richard Ristow wrote [double-quoted text, >below]: >>>>UPDATE >>>> /FILE=QuExtAll /IN=IN_Qtrs >>>> /FILE=RSA911 /IN=IN_RSA >>>> /BY SSN DATEAPPL. >>- You wrote "A field present in both files but blank in the Qtrs file >>was >>NOT filled in by values from the field in the RSA file." >> Now it looks like the field wasn't blank (which applies to string >>variables) but system-missing (which applies to numeric >>variables). . . . > >You are right. I should not have described the field as blank. Thank you. That kind of accuracy is a useful discipline to remember, for the future. >> It would not be hard to list the QuExtAll records, the RSA911 >> records, and the merged records, for a few cases that have values >> for both variables in the RSA911 file. If there are too many >> variables, use the /VARIABLES subcommand on LIST to restrict to the >> relevant variables. You might even solve your own problem, doing this. > >I looked at the original QuExtAll file, the original RSA911 file, and >the combined file, for these 17 cases closed "in employment" as >"homemakers." Since presenting tables in this forum is awkward Goodness, is it really that bad? I don't see that it's more than 5 variables from any of the files. It would be an awfully useful tool to work with. Is it awkward because there are a lot of *other* variables? There are several ways of getting only the relevant ones, including /VARIABLES= on LIST (see previous posting), and /KEEP on GET FILE. Either way, use LIST to print a table to a draft output file, copy and paste. >Three variables, SSN, Comp-Emp, and CLOEMPST were available in both >files. Variables OccuCode and CLOSTYPE were present only in the RSA911 >file, and STAT was present only in QuExtAll. The 17 cases in question >were selected in Quextall using Comp-emp + STAT; the same cases were >isolated in the RSA911 file using Comp-Emp + CLOSTYPE. The SSN were >cross-checked to align the case information. OK. The following is the /MAP listing from UPDATE, right? >Selected Selected >Fields in Fields in >QuExtAll RSA911 >------------- ------------ >SSN SSN > OccuCode >Comp-Emp Comp-Emp >STAT > CLOSTYPE >CLOEMPST CLOEMPST > >The value of CLOEMPST in these 17 cases was always "5" in the RSA911 >file, and always "." in the QuExtAll file, indicating system missing, >right? Yes, that's how system-missing appears in LIST output. >In the combined file, the value of CLOEMPST was always ".". What I >wanted, which was, if the field is present in both files, the >non-missing value. I left the syntax commands from the original >message ... Namely this, right? >UPDATE > /FILE=QuExtAll /IN=IN_Qtrs > /FILE=RSA911 /IN=IN_RSA > /BY SSN DATEAPPL. One thing raises question whether this is, after all, be the syntax you ran: The UPDATE command includes DATEAPPL as a key variable; but DATEAPPL does not appear in the map from UPDATE, above. What happens if you do FREQUENCIES on both variables, before and >>after the merge? That'll test whether or not the variables >>individually >>seem to have come through. > >Frequencies on CLOEMPST, original QuExtAll file (closed + active >cases): >Employment Status @Closure [...] >This draws my attention to the fact that for the QuExtAll file, I >declared "0" to be a missing value, but did not do the same for the >RSA911 file. > >It finally has dawned on me that I was using the missing value >specification incorrectly, and had inadvertently specified all >possible values as missing. The combination of files process copied >this specification along with everything else, so the values all got >copied, but were all considered missing! My bad. These things happen. To say once more, as we close this: It helps a great deal, it would have helped a great deal, to have all the relevant information, when working on a problem. In this case, giving us complete dictionary information on the relevant variables, and a listing of the data, would probably have got you to the solution several posting cycles, and several days, sooner, which would likely have been to your advantage. In fact, you might well have solved the problem yourself from that information, as in fact you did. >Thanks for pushing me towards the correct solution. You're most welcome. And, with a deep sigh of relief, on to the next. -Wishing you the best of good analysis, Richard Ristow |
| Free forum by Nabble | Edit this page |
