I have two largish databases (250 fields, 80-300 records) that both have data from the same questionnaire-- one is an electronic database collected via Zoomerang, the other built from the pencil & paper version. Most (240?) of the fields should be identical, but some are different. Assume for present purposes that there is no overlap of cases.

My problem is this: the field names are all different between the two databases because they were assigned by different people at different times. Furthermore, both databases are still 'alive'-- cases are being added, although one will be closed as of November 30.

I guess I can use the RENAME VARIABLES command in a syntax file that will change the variable names from one file's set of names to the other, and if more cases come in, I can use the same syntax file with those. I have already generated a list of the variable names, copying from the "variable view" window to columns in an Excel spreadsheet, and from there to a table in Word. Then I mainly have to be sure that the variable names align properly (I already know that in some cases, they don't), so that the RENAME lists are correctly set up. Then I can use Data/Merge Files/Add cases, right?

Any other suggestions, comments, or warnings?

Thanks,
Bob in HI

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814 |
> Bob,
It sounds like you've got the right idea. Once you've got the RENAME VARIABLES command set up, you can re-use it later as long as everything stays the same.

To echo the wise words of others, 'protect the original files' before you add to them or change them. Whenever possible/practical, I like to set permissions on the original file to read-only and make all the changes using syntax, saving out to a separate file. Doing it this way has the benefit of being able to easily undo changes, or scrutinize prior decisions about what data was altered and how. The syntax file is a sort of 'RNA' for producing current, accurate data.

-Gary |
In reply to this post by Bob Schacht-3
At 06:53 PM 11/17/2006, Bob Schacht asked:
>I have two largish databases (250 fields, 80-300 records) that
>have data from the same questionnaire
>
>My problem is this: the 'field' [i.e., variable] names are all
>different between the two databases. Furthermore, both databases are
>still 'alive'-- cases are being added, although one will be closed as
>of November 30.
>
>I guess I can use the RENAME VARIABLES command in a syntax file that
>will change the variable names from one file's set of names to the
>other, and if more cases come in, I can use the same syntax file with
>those.
>
>Any other suggestions, comments, or warnings?

Your way will work; see also the advice from Hal 9000, especially the note about protecting the original data.

My taste would be to use the /RENAME subcommand on my ADD FILES, rather than a separate RENAME VARIABLES. That does mean writing the ADD FILES in syntax, rather than using the menu version "Data/Merge Files/Add cases". |
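The alignment worry raised in the original question (old and new names drifting out of step in a long rename list) can be reduced by keeping the correspondence as explicit old/new pairs and generating the clause text from them. What follows is a minimal sketch in Python, not SPSS syntax: the helper `build_rename_clause` and the variable names are hypothetical, and the generated text would still have to be pasted into an ADD FILES command and checked there.

```python
def build_rename_clause(pairs):
    """Return SPSS-style /RENAME clause text from (old_name, new_name) pairs.

    Raises ValueError if an old or a new name appears twice, since a
    duplicate usually means two rows of the lookup table are misaligned.
    """
    old_names = [old for old, _ in pairs]
    new_names = [new for _, new in pairs]
    for label, names in (("old", old_names), ("new", new_names)):
        seen = set()
        for name in names:
            key = name.upper()  # SPSS variable names are not case-sensitive
            if key in seen:
                raise ValueError(f"duplicate {label} name: {name}")
            seen.add(key)
    return "/RENAME=(" + " ".join(old_names) + " = " + " ".join(new_names) + ")"


# Hypothetical names: Zoomerang-side variables mapped to the paper-side names.
pairs = [("Zoom_Q1", "Q1"), ("Zoom_Q3", "Q3")]
clause = build_rename_clause(pairs)
print(clause)
```

Keeping each old name next to its new name in one pair means a misalignment shows up as a wrong pair on a single line, rather than as an off-by-one shift between two long lists.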
At 11:22 AM 11/20/2006, Richard Ristow wrote:
>At 06:53 PM 11/17/2006, Bob Schacht asked:
>
>>I have two largish databases (250 fields, 80-300 records) that have
>>data from the same questionnaire
>>
>>My problem is this: the 'field' [i.e., variable] names are all different
>>between the two databases. Furthermore, both databases are still
>>'alive'-- cases are being added, although one will be closed as of November 30.

Richard and Hal,
Thank you for your previous responses. I have discovered that my problem is a bit more complicated than this, in that the fields that are supposed to be the same often differ in type, width or "measure" (Nominal or Scale). See more below.

>>. . . I can use the RENAME VARIABLES command in a syntax file that will
>>change the variable names from one file's set of names to the other, and
>>if more cases come in, I can use the same syntax file with those. . . .
>
>Your way will work; see also the advice from Hal 9000, especially the note
>about protecting the original data.
>
>My taste would be to use the /RENAME subcommand on my ADD FILES, rather
>than a separate RENAME VARIABLES. That does mean writing the ADD FILES in
>syntax, rather than using the menu version "Data/Merge Files/Add cases".

Because some of the fields that are supposed to match don't match in type, width, or measure, do I need to handle that issue first, before using the Add Files?

Also, I seem to recall from dialogue here on the L that changing a variable's type is not possible with syntax, but only through the user interface via "Variable view", by clicking on the gray box on the right side of the "variable type" cell, which brings up a menu of variable types. Do I remember correctly?

The way I understand it, then, is that I need to
1. Open the Variable View for the donor database
2. Manually change as needed the type, width and measure of each variable so that it will correspond with the type, width and measure of the intended variables in the receiving database
3. Use the Add Files command with the /Rename subcommand to match the donor and recipient variables, and add the cases from the donor database.

Is this right? Any easier way to do it?

Thanks,
Bob |
At 02:26 PM 11/30/2006, Bob Schacht wrote:
>>>I have two largish databases (250 fields, 80-300 records) that
>>>have data from the same questionnaire [to be concatenated].
>>>
>>>The 'field' [i.e., variable] names are all different between the two
>>>databases.
>
>I have discovered that my problem is a bit more complicated, in that
>the fields that are supposed to be the same often differ in type,
>width or "measure" (Nominal or Scale).

I wrote that
>>my taste would be to use the /RENAME subcommand on my ADD FILES,
>>rather than a separate RENAME VARIABLES.
>
>Because some of the fields that are supposed to match, don't match in
>type, width, or measure, do I need to handle that issue first, before
>using the Add Files?

No. As I wrote before, having incompatible variable names actually makes this problem easier.

Suppose you have File_A and File_B. Variable names for corresponding quantities are different; for example, the answer to question 1 is called A_Ques1 in file A, and B_Ques1 in file B. (Of course, the correspondence of names wouldn't be that neat.) Suppose that you want to keep the variable names and attributes as they are in File_A.

Then,

ADD FILES
   /FILE=File_A
   /FILE=File_B.

For variables with identical types and lengths in the two files (don't worry about measurement level), add RENAME clauses for File_B. If, say, variables for questions 1 and 3 match this way,

ADD FILES
   /FILE=File_A
   /FILE=File_B/RENAME=(B_Ques1, B_Ques3
                      = A_Ques1, A_Ques3).

For variables that don't have identical types and lengths, after the ADD FILES command,

* If they're both strings and the lengths are different, use simple COMPUTE statements:
.  COMPUTE A_Ques2 = B_Ques2.
(But be careful of losing text if the variable from file B is longer.)

* If the variable in file A is numeric and the variable from file B is character, say two spaces long,
.  COMPUTE A_Ques4 = NUMBER(B_Ques4,F2).
(But be careful of any values in file B that can't be converted to numbers.)

* Similarly, if the variable in file A is string and the variable from file B is numeric,
.  COMPUTE A_Ques6 = STRING(B_Ques6,F2).

Apply these COMPUTEs only to the cases that came from File_B: for example, put /IN= with a flag variable (call it FromB, say) after /FILE=File_B, and wrap the COMPUTEs in DO IF FromB ... END IF. Run on every case, they would also overwrite the File_A values with blanks or missing values.

After, you may want to delete all variables from file B that were converted by such COMPUTE statements. But that's as you prefer. |
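The two "be careful" warnings above (text lost when a string is copied into a shorter one, and string values that cannot be read as numbers) can be checked before any conversion is run. The following is a rough Python sketch of such a check, assuming the File_B values have been exported as plain strings; the sample values and both helper functions are hypothetical, and Python's `float()` only approximates what SPSS's NUMBER function accepts.

```python
def unconvertible(values):
    """Return the non-blank strings that cannot be read as a number."""
    bad = []
    for value in values:
        text = value.strip()
        if not text:
            continue  # blank: treated here as missing, not as an error
        try:
            float(text)
        except ValueError:
            bad.append(value)
    return bad


def too_long(values, width):
    """Return the strings that would lose text in a field of the given width."""
    return [v for v in values if len(v.rstrip()) > width]


# Hypothetical exported values for two File_B variables.
b_ques4 = ["12", " 7", "", "n/a", "3x"]
b_ques2 = ["yes", "no", "not sure"]

bad_numbers = unconvertible(b_ques4)
truncated = too_long(b_ques2, 5)
print(bad_numbers)
print(truncated)
```

Listing the offending values first makes it a deliberate decision whether to recode them, widen the target variable, or accept missing values.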
Colleagues
My syntax files are full of comments, either clustering similar calculations on similar variables together or reminding me what I am doing with SPSS commands.

Of course SPSS then does the calculations and they then appear in the output file (and in the explorer view of the output file) as a series of nodes. I would like to insert, before some of those nodes in my output file, a node saying things like "Attitude Variables Measured Directly", "Attitude Variables Measured Indirectly", etc.

Pls can anyone suggest ways for me to achieve this

TIA/gary

Unit Co-ordinator for Business Information Systems
(A Postgraduate Master of Commerce and Master of Business unit)
School of Business
The University of Sydney
------------------------
  ,-_|\      Building H69, Office 437
 /     \     Corner Codrington Street
 \_,-._*     & Rose Street
             Darlington 2006
      @      Australia
--------------------------------------
E-mail: [hidden email]
------------------------
Location details:
Travelling from Broadway, turn south off City Road
Navigate toward the Acquatic Centre
------------------------
University Map: http://db.auth.usyd.edu.au/directories/map/index.stm
University Website: www.usyd.edu.au
Faculty Website: www.econ.usyd.edu.au
------------------------
Faculty Student Information Office
(Timetables, Special Consideration)
Merewether Building
Enter from City Road side
e-mail: [hidden email]
Phone: 9351-3076
----------------------------------
Executive Officer for Business Information Systems
Katy Roy
Room 347, Building H69
E-mail: [hidden email]
Phone: 9036 9432
--------------------------------- |
Have you tried the ECHO command?
----- Original Message -----
From: "Gary Oliver" <[hidden email]>
To: <[hidden email]>
Sent: Thursday, November 30, 2006 6:32 PM
Subject: Inserting Section Headings as nodes in Output file

> [snip] |
In reply to this post by Gary Oliver
If you want these comments to appear in the log items in the Viewer, you can use echo or, with preferences set to show the syntax in the log (which is now the default in SPSS 15), comment commands will do this.
If you want text blocks inserted, which would make the text stand out from syntax echoing and such, and you want this synchronized with your actual output (which you probably do want), then you need a scripting solution. For synchronization, you need SPSS 14.0.2 or later and the Python programmability Plug-In. But then it is very simple to insert these text blocks using the viewer module ViewerText.insert method. I can post details if you want to go this route.

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gary Oliver
Sent: Thursday, November 30, 2006 7:32 PM
To: [hidden email]
Subject: [SPSSX-L] Inserting Section Headings as nodes in Output file

[snip] |
In reply to this post by Richard Ristow
At 12:55 PM 11/30/2006, Richard Ristow wrote:
>At 02:26 PM 11/30/2006, Bob Schacht wrote:
>
>>>>I have two largish databases (250 fields, 80-300 records) that
>>>>have data from the same questionnaire [to be concatenated].

Thanks for your helpful response, Richard. And yes, concatenated is the appropriate word.

What I meant by "same questionnaire" is that most people responded by pencil and paper in the old fashioned way, while others responded to an electronic version of the same questionnaire on Zoomerang. Zoomerang provides an Excel file of the results, whereas I had constructed an SPSS database, based on the printed version. It would have been easier (?) if I had started out by using the initial Zoomerang files to make an Excel file, then making an SPSS file from that, and using that as the database to record the responses from the pencil and paper questionnaires.

Some interesting differences: Electronically, if a question has six possible responses and I want them to choose only one, I can force that in Zoomerang, because if the respondent clicks on a second choice, the first choice mark is removed by the software. But in the pencil and paper version, people can ignore the directions and mark more than one choice, no matter what the directions say. If only one or two people do that, it doesn't matter. But if many people do that, I have a problem. Unless I'm going to choose arbitrarily among their choices, I have to convert the response format to a field for each choice, as if it were a "mark all that apply" question, rather than one field with a code for the different choices. In this sense, although the questions in the electronic and written forms are the same, the responses are not!

>>>>The 'field' [i.e., variable] names are all different between the two
>>>>databases.
>>
>>I have discovered that my problem is a bit more complicated, in that
>>the fields that are supposed to be the same often differ in type,
>>width or "measure" (Nominal or Scale).

I have also discovered since then that if SPSS can't figure out the type of variable it is reading, it automatically defaults to Numeric 8.2. Consequently, many of the inconsistencies in type and width that I have encountered seem to be due to this default designation.

>I wrote that
>>>my taste would be to use the /RENAME subcommand on my ADD FILES,
>>>rather than a separate RENAME VARIABLES.
>>
>>Because some of the fields that are supposed to match, don't match in
>>type, width, or measure, do I need to handle that issue first, before
>>using the Add Files?
>
>No. As I wrote before, having incompatible variable names actually
>makes this problem easier.
>
>Suppose you have File_A and File_B. Variable names for corresponding
>quantities are different; for example, the answer to question 1 is
>called A_Ques1 in file A, and B_Ques1 in file B. (Of course, the
>correspondence of names wouldn't be that neat.) Suppose that you want
>to keep the variable names and attributes as they are in File_A.

Thanks for these detailed examples!

>[snip]
>
>After, you may want to delete all variables from file B that were
>converted by such COMPUTE statements. But that's as you prefer.

Again, thanks for these detailed examples.

Bob |
the format should be whatever you set in the defaults
Go to <Edit> <Options> <Data>; about halfway down on the left of the box, set the default for new variables to whatever you would like. The default I use is zero decimal places.

Or, in syntax, before bringing in the data:
   SET FORMAT=F6.0.

Art Kendall
Social Research Consultants

Bob Schacht wrote:
<parts snipped>
> I have also discovered since then, that if SPSS can't figure out the type
> of variable it is reading, it automatically defaults to Numeric 8.2.
> Consequently, many of the inconsistencies in type and width that I have
> encountered seem to be due to this default designation.
>
> Bob |
In reply to this post by Bob Schacht-3
Bob Schacht raised an interesting further issue:
At 04:51 PM 12/1/2006, Bob Schacht wrote:

>What I meant by "same questionnaire" is that most people responded by
>pencil and paper in the old fashioned way, while others responded to
>an electronic version of the same questionnaire on Zoomerang.
>Zoomerang provides an Excel file of the results, whereas I had
>constructed an SPSS database, based on the printed version. It would
>have been easier (?) if I had started out by using the initial
>Zoomerang files to make an Excel file, then making an SPSS file from
>that, and using that as the database to record the responses from the
>pencil and paper questionnaires.

That's probably true, though see below. Of course, we'd all find our work easier if, at the beginning, we could always make the decisions that, at the end, will turn out to have been the best.

>Some interesting differences: Electronically, if a question has six
>possible responses and I want them to choose only one, I can force
>that in Zoomerang. But in the pencil and paper version, people can
>ignore the directions and mark more than one choice, no matter what
>the directions say. If only one or two people do that, it doesn't
>matter. But if many people do that, I have a problem. Unless I'm going
>to choose arbitrarily among their choices, I have to convert the
>response format to a field for each choice, as if it were a "mark all
>that apply" question, rather than one field with a code for the
>different choices. In this sense, although the questions in the
>electronic and written forms are the same, the responses are not!

Or, in fact, the questions are effectively NOT the same, in that the 'space' of possible responses is different. And that is an interesting difference, in fact a fascinating one. It isn't a data-management matter; it's a substantive observation about your project. What does it seem the multiple responses mean?

There's always the 'ballot' solution: reject any case where more than one was chosen. I don't know; maybe, below about 5% incidence of multiple answers, I'd take this alternative.

Otherwise, consider the nature of the question, and the answers.

. If it's a Likert scale or some such, you could take something like the mid-range of the responses. But I'd want another variable indicating there was more than one response, and maybe the largest and smallest response. If somebody marked both '1' and '5' on a scale, it's fair to think you've no idea what they meant.

. If it seems inherently multiple-response ("What make of car do you own?"), probably represent the paper results as a multiple-response item, and consider changing the electronic version to allow multiple responses.

. If the question invites ambiguity ("Which is your favorite kind of cake?") or uncertainty ("Where are you going for your next vacation?"), I don't know. Probably, consult somebody with deep knowledge about such kinds of survey question, and the underlying psychological theory. Superficially, I seem to recall that forced-choice questions (one response taken) are considered different from the same question with multiple responses allowed, possibly with advantages and disadvantages of each.

. Finally, of course, if there are a significant number of multiple responses, recognize that the questions are NOT the same on the paper and electronic versions, though they're phrased the same, and use caution in interpreting, accordingly.

-Best of luck,
 Richard |
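The Likert-scale suggestion above (take the mid-range, but also record that there was more than one mark, plus the smallest and largest marks) can be sketched as follows. This is an illustration in Python rather than SPSS syntax; the function name `summarize_marks` and the layout of its result are assumptions, not anything prescribed in the thread.

```python
def summarize_marks(marks):
    """Summarize the scale points one respondent marked on a single item.

    Returns the mid-range of the marks, the smallest and largest mark, and
    a flag that is True when more than one distinct point was marked.
    """
    if not marks:
        # Nothing marked: leave everything missing.
        return {"value": None, "low": None, "high": None, "multiple": False}
    low, high = min(marks), max(marks)
    return {
        "value": (low + high) / 2,  # mid-range of the marked points
        "low": low,
        "high": high,
        "multiple": len(set(marks)) > 1,
    }


adjacent = summarize_marks([2, 3])  # two neighbouring points marked
extremes = summarize_marks([1, 5])  # both ends of the scale marked
print(adjacent)
print(extremes)
```

Keeping the low and high marks alongside the mid-range lets the analyst separate respondents who straddled adjacent categories from those who marked both ends of the scale, where the mid-range says little.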
At 01:38 PM 12/4/2006, Richard Ristow wrote:
>Bob Schacht raised an interesting further issue:
>
>At 04:51 PM 12/1/2006, Bob Schacht wrote:
>
>>What I meant by "same questionnaire" is that most people responded by
>>pencil and paper in the old fashioned way, while others responded to an
>>electronic version of the same questionnaire on Zoomerang. Zoomerang
>>provides an Excel file of the results, whereas I had constructed an SPSS
>>database, based on the printed version. It would have been easier (?) if
>>I had started out by using the initial Zoomerang files to make an Excel
>>file, then making an SPSS file from that, and using that as the database
>>to record the responses from the pencil and paper questionnaires.
>[snip]
>>Some interesting differences: Electronically, if a question has six
>>possible responses and I want them to choose only one, I can force that
>>in Zoomerang. But in the pencil and paper version, people can ignore the
>>directions and mark more than one choice, no matter what the directions
>>say. If only one or two people do that, it doesn't matter. But if many
>>people do that, I have a problem. Unless I'm going to choose arbitrarily
>>among their choices, I have to convert the response format to a field for
>>each choice, as if it were a "mark all that apply" question, rather than
>>one field with a code for the different choices. In this sense, although
>>the questions in the electronic and written forms are the same, the
>>responses are not!
>
>Or, in fact, the questions are effectively NOT the same, in that the
>'space' of possible responses is different. And that is an interesting
>difference, in fact a fascinating one. It isn't a data-management matter;
>it's a substantive observation about your project. What does it seem the
>multiple responses mean?

In some cases, it means that the question was poorly designed! That's the trouble with using freshly minted, untested questions sprung full blown from the forehead of Zeus (or, in this case, me). In the case of one of the questions under discussion, I designed what I thought at the time were mutually exclusive alternatives arranged in what was intended to be a scalar dimension. But it turned out that adjacent alternatives were not always mutually exclusive.

>There's always the 'ballot' solution: reject any case where more than one
>was chosen. I don't know; maybe, below about 5% incidence of multiple
>answers, I'd take this alternative.

Me, too. But the incidence seemed to be more than 5%.

>Otherwise, consider the nature of the question, and the answers.
>
>. If it's a Likert scale or some such, you could take something like the
>mid-range of the responses. But I'd want another variable indicating there
>was more than one response, and maybe the largest and smallest response.
>If somebody marked both '1' and '5' on a scale, it's fair to think you've
>no idea what they meant.

Yes on all counts (see above). In fact, we briefly considered this strategy.

>. If it seems inherently multiple-response ("What make of car do you
>own?"), probably represent the paper results as a multiple-response item,
>and consider changing the electronic version to allow multiple responses.

The question was intended to place respondents with disabilities on a scale of independence ranging from "independent" to "dependent." In other words, we wanted to place the respondents on a scale, the underlying value of which was "how independent are you?"

>. If the question invites ambiguity ("Which is your favorite kind of
>cake?") or uncertainty ("Where are you going for your next vacation?"), I
>don't know.

Well, it wasn't *meant* to invite ambiguity, but it did.

>Probably, consult somebody with deep knowledge about such kinds of survey
>question, and the underlying psychological theory. Superficially, I seem
>to recall that forced-choice questions (one response taken) are considered
>different from the same question with multiple responses allowed, possibly
>with advantages and disadvantages of each.
>
>. Finally, of course, if there are a significant number of multiple
>responses, recognize that the questions are NOT the same on the paper and
>electronic versions, though they're phrased the same, and use caution in
>interpreting, accordingly.
>
>-Best of luck,
> Richard

I like your first "Otherwise" alternative best for this situation.

Thanks.
Bob |
In reply to this post by Richard Ristow
At 12:55 PM 11/30/2006, Richard Ristow wrote, in response to my previous question:
[snip]
>Suppose you have File_A and File_B. Variable names for corresponding
>quantities are different; for example, the answer to question 1 is
>called A_Ques1 in file A, and B_Ques1 in file B. (Of course, the
>correspondence of names wouldn't be that neat.) Suppose that you want
>to keep the variable names and attributes as they are in File_A.
>
>Then,
>
>ADD FILES
>   /FILE=File_A
>   /FILE=File_B.
>
>For variables with identical types and lengths in the two files (don't
>worry about measurement level), add RENAME clauses for File_B. If, say,
>variables for questions 1 and 3 match this way,
>
>ADD FILES
>   /FILE=File_A
>   /FILE=File_B/RENAME=(B_Ques1, B_Ques3
>                      = A_Ques1, A_Ques3).
>
>For variables that don't have identical types and lengths, after the
>ADD FILES command,
>
>* If they're both strings and the lengths are different, use simple
>COMPUTE statements:
>.  COMPUTE A_Ques2 = B_Ques2.
>(But be careful of losing text if the variable from file B is longer.)
>
>* If the variable in file A is numeric and the variable from file B is
>character, say two spaces long,
>.  COMPUTE A_Ques4 = NUMBER(B_Ques4,F2).
>(But be careful of any values in file B that can't be converted to
>numbers.)
>
>* Similarly, if the variable in file A is string and the variable from
>file B is numeric,
>.  COMPUTE A_Ques6 = STRING(B_Ques6,F2).

Richard,
Thanks for these suggestions.
Can these be listed in any order?

Could I do the whole thing just with COMPUTE statements? i.e., instead of
   RENAME=(B_Ques1, B_Ques3 = A_Ques1, A_Ques3).
could I use instead

   COMPUTE A_Ques1 = B_Ques1
   COMPUTE A_Ques3 = B_Ques3

etc.???

Thanks,
Bob |
At 03:07 PM 12/8/2006, Bob Schacht wrote:
>At 12:55 PM 11/30/2006, Richard Ristow wrote, in response to my
>previous question:
>
>>Suppose you have File_A and File_B. Suppose that you want
>>to keep the variable names and attributes as they are in File_A.
>>
>>Then,
>>
>>ADD FILES
>>   /FILE=File_A
>>   /FILE=File_B.
>>
>>For variables with identical types and lengths in the two files
>>(don't worry about measurement level), add RENAME clauses for File_B.
>>
>>For variables that don't have identical types and lengths, after the
>>ADD FILES command [copy values using COMPUTE statements].
>
>Thanks for these suggestions.
>Can these be listed in any order?

Sure. You have to be careful about the order you want your variables in. But if all the quantities you want exist in File_A, and the variables in File_A are in the order you want, that will take care of itself.

>Could I do the whole thing just with COMPUTE statements? i.e., instead
>of
>RENAME=(B_Ques1, B_Ques3 = A_Ques1, A_Ques3).
>Could I use instead
>
>COMPUTE A_Ques1 = B_Ques1
>COMPUTE A_Ques3 = B_Ques3

Again, sure you could. Doing it with COMPUTEs leaves variables B_Ques1 and B_Ques3 in the combined file; doing it with RENAME drops those variables. But that's likely not a large drawback, since you'll probably be dropping variables anyway.

You could even do it with a big DO REPEAT:

DO REPEAT A_stuff = <all File_A variables except the keys>
         /B_stuff = <all File_B variables except the keys>.
.  COMPUTE A_stuff = B_stuff.
END REPEAT.

That won't completely work, and will have to be tweaked, if
. There are any variables in one file but not the other.
. There are any variables that are character in one file and numeric in the other.
. The same variables occur in both files, but not in the same order. (This is the most subtle, as the code could run without errors but give badly wrong results.)

Otherwise, it will work fine. The values from File_B will be in the corresponding variables in File_A, with the names and attributes of the File_A variables.

One thing to watch: restrict the COMPUTE to the cases that came from File_B (for example, flag them with /IN= on the ADD FILES and wrap the loop in DO IF); otherwise the File_A cases get their values overwritten, too.

As before, drop the File_B variables when and as you like.

-Cheers and good luck,
 Richard |
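The order problem flagged above as the most subtle (a paired list that runs without errors but matches the wrong variables) can be caught by comparing the positional pairing against the intended correspondence before writing the DO REPEAT. A minimal Python sketch, assuming both name lists and the correspondence table are available as plain data; the function `misaligned_pairs` and the sample lists are hypothetical.

```python
def misaligned_pairs(a_names, b_names, b_to_a):
    """List where pairing the two lists by position disagrees with the
    intended correspondence b_to_a (a dict: File_B name -> File_A name)."""
    problems = []
    if len(a_names) != len(b_names):
        # zip() below stops at the shorter list, so report the mismatch.
        problems.append(("length", len(a_names), len(b_names)))
    for position, (a, b) in enumerate(zip(a_names, b_names)):
        if b_to_a.get(b) != a:
            problems.append((position, a, b))
    return problems


# Intended correspondence, using the example names from the thread.
b_to_a = {"B_Ques1": "A_Ques1", "B_Ques2": "A_Ques2", "B_Ques3": "A_Ques3"}

a_order = ["A_Ques1", "A_Ques2", "A_Ques3"]
b_order = ["B_Ques1", "B_Ques3", "B_Ques2"]  # questions 2 and 3 swapped in File_B

problems = misaligned_pairs(a_order, b_order, b_to_a)
print(problems)
```

An empty result means the two lists can be paired position by position; anything else names the positions to fix first.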
At 12:05 PM 12/8/2006, Richard Ristow wrote:
>At 03:07 PM 12/8/2006, Bob Schacht wrote:
>
>>At 12:55 PM 11/30/2006, Richard Ristow wrote, in response to my previous
>>question:
>>
>>>Suppose you have File_A and File_B. Suppose that you want
>>>to keep the variable names and attributes as they are in File_A.
>>>
>>>Then,
>>>
>>>ADD FILES
>>>   /FILE=File_A
>>>   /FILE=File_B.
>>>
>>>. . . For variables that don't have identical types and lengths, after the
>>>ADD FILES command [copy values using COMPUTE statements such as COMPUTE
>>>A_Ques2 = B_Ques2.]
>>
>>Thanks for these suggestions.
>>Can these be listed in any order?
>
>Sure. You have to be careful about the order you want your variables in.
>But if all the quantities you want exist in File_A, and the variables in
>File_A are in the order you want, that will take care of itself.
>
>>Could I do the whole thing just with COMPUTE statements? i.e., instead of
>>RENAME=(B_Ques1, B_Ques3 = A_Ques1, A_Ques3).
>>Could I use instead
>>
>>COMPUTE A_Ques1 = B_Ques1
>>COMPUTE A_Ques3 = B_Ques3
>
>Again, sure you could. Doing it with COMPUTEs leaves variables B_Ques1 and
>B_Ques3 in the combined file; doing it with RENAME drops those variables.
>But that's likely not a large drawback, since you'll probably be dropping
>variables anyway.

I was at first confused, because in the RENAME statement, FILE_A variable names are on the *RIGHT* of the equal sign, whereas in the COMPUTE statements, they're on the LEFT side of the = sign. I think I understand the implications now.

My next question is about listing RENAME and COMPUTE statements in the same (?) FILE subcommand. Or... should the COMPUTE statements all follow the FILE_A subcommand, rather than the FILE_B subcommand?

To use your examples in more fully fleshed out form, should I use

ADD FILES
   /FILE=File_A/COMPUTE A_Ques3 = B_Ques3
   /FILE=File_B/RENAME=(B_Ques1 = A_Ques1).

or should I use

ADD FILES
   /FILE=File_A
   /FILE=File_B/RENAME=(B_Ques1 = A_Ques1)
   /COMPUTE A_Ques3 = B_Ques3.

Thanks,
Bob

Robert M. Schacht, Ph.D.
There's a 'whoops', an IMPORTANT point that I missed, near the end of this posting; I may have caused you trouble by omitting it.

At 05:06 PM 12/12/2006, Bob Schacht wrote:

>>>Could I do the whole thing just with COMPUTE statements? i.e.,
>>>instead of
>>>RENAME=(B_Ques1, B_Ques3 = A_Ques1, A_Ques3).
>>>Could I use instead
>>>
>>>COMPUTE A_Ques1 = B_Ques1
>>>COMPUTE A_Ques3 = B_Ques3
>>
>>Again, sure you could.
>
>My next question is about listing RENAME and COMPUTE statements in the
>same (?) FILE subcommand. Or... should the COMPUTE statements all follow
>the FILE_A subcommand, rather than the FILE_B subcommand?

OK, and I'll say more below, but: /RENAME is a subcommand of ADD FILES; each /RENAME belongs to a particular /FILE on the list; but, COMPUTE is a separate statement, and has to follow the whole of the ADD FILES command.

>To use your examples in more fully fleshed out form, should I use
>
>ADD FILES
>   /FILE=File_A/COMPUTE A_Ques3 = B_Ques3
>   /FILE=File_B/RENAME=(B_Ques1 = A_Ques1).
>or
>ADD FILES
>   /FILE=File_A
>   /FILE=File_B/RENAME=(B_Ques1 = A_Ques1)
>   /COMPUTE A_Ques3 = B_Ques3.

Neither. In both cases, you're writing /COMPUTE as a subcommand of ADD FILES. You'd write,

ADD FILES
   /FILE=File_A
   /FILE=File_B/RENAME=(B_Ques1 = A_Ques1).

COMPUTE A_Ques3 = B_Ques3.

That is, the COMPUTE is an entirely separate statement. (I've left a blank line, to make that clear.) It's a transformation statement that is part of the transformation program that follows the ADD FILES, and it acts on the data from the files as combined by the ADD FILES.

*** HERE'S WHAT I MISSED: You want to use COMPUTEs to copy values from the File_B variables to the File_A variables *if* the data is in the File_B variables. If it's already in the File_A variables, those COMPUTEs will wipe out the good data. So you have to do the COMPUTEs *only* for records that came from File_B. Like this:

ADD FILES
   /FILE=File_A
   /FILE=File_B/IN=B_Record   /* New subcommand
   /RENAME=(B_Ques1 = A_Ques1).

DO IF B_Record EQ 1.
.  COMPUTE A_Ques3 = B_Ques3.
END IF.

That adds two things:
.  The /IN subcommand, which applies to File_B, creates variable B_Record, which is 1 in records that come from File_B, and 0 in records that come from File_A.
.  The DO IF, which tests variable B_Record, and executes the COMPUTEs only for records that come from File_B.

You can have all the COMPUTEs within a single DO IF, and probably should:

DO IF B_Record EQ 1.
.  COMPUTE A_Ques3 = B_Ques3.
.  COMPUTE A_Ques4 = B_QUES4.
END IF.

And so forth. And you see I've written

.  COMPUTE A_Ques3 = B_Ques3.

instead of

COMPUTE A_Ques3 = B_Ques3.

They're both valid. The first is 'pseudo-indented'; i.e., the command starts in the first space on the line, but the keyword COMPUTE is set back. I use pseudo-indents inside constructs like DO IF, because I think it makes the code easier to read.

-Good luck,
 Richard
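The same guard applies to the DO REPEAT form suggested earlier in the thread. The following is only a minimal sketch, not tested against the real files: it assumes File_A and File_B are dataset names (or file handles) that already exist, that each A/B pair has the same type, and that the A_Ques*/B_Ques* names and the B_Record flag are placeholders. The two lists are paired purely by position, so the Nth name after B_stuff has to be the counterpart of the Nth name after A_stuff.

```spss
* Sketch only: File_A, File_B and all variable names are placeholders.
ADD FILES
   /FILE=File_A
   /FILE=File_B
   /IN=B_Record.

* The two lists are matched by position, not by name.
* The IF copies a value only in records that came from File_B.
DO REPEAT A_stuff = A_Ques2 A_Ques3 A_Ques4
         /B_stuff = B_Ques2 B_Ques3 B_Ques4.
.  IF (B_Record EQ 1) A_stuff = B_stuff.
END REPEAT.
EXECUTE.
```

Writing the lists out by name, rather than with TO ranges, makes a misaligned pair easier to spot when the two files were not built in the same variable order.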
At 02:18 PM 12/12/2006, Richard Ristow wrote:
>There's a 'whoops', an IMPORTANT point that I missed, near the end of this
>posting; I may have caused you trouble by omitting it.
>. . . COMPUTE is a separate statement, and has to follow the whole of the
>ADD FILES command.
>
>. . . You'd write,
>
>ADD FILES
>   /FILE=File_A
>   /FILE=File_B/RENAME=(B_Ques1 = A_Ques1).
>
>COMPUTE A_Ques3 = B_Ques3.
>
>That is, the COMPUTE is an entirely separate statement. (I've left a blank
>line, to make that clear.) It's a transformation statement that is part of
>the transformation program that follows the ADD FILES, and it acts on the
>data from the files as combined by the ADD FILES.

Good point. Thanks for pointing that out.

>*** HERE'S WHAT I MISSED: You want to use COMPUTEs to copy values from
>the File_B variables to the File_A variables *if* the data is in the
>File_B variables. If it's already in the File_A variables, those COMPUTEs
>will wipe out the good data.

Point well taken!

>So you have to do the COMPUTEs *only* for records that came from File_B.
>Like this:
>
>ADD FILES
>   /FILE=File_A
>   /FILE=File_B/IN=B_Record   /* New subcommand
>   /RENAME=(B_Ques1 = A_Ques1).
>
>DO IF B_Record EQ 1.
>.  COMPUTE A_Ques3 = B_Ques3.
>END IF.
>
>That adds two things:
>.  The /IN subcommand, which applies to File_B, creates variable B_Record,
>which is 1 in records that come from File_B, and 0 in records that come
>from File_A.

Thanks for this suggestion. I may already have a variable that can function in that way.

>.  The DO IF, which tests variable B_Record, and executes the COMPUTEs only
>for records that come from File_B.
>
>You can have all the COMPUTEs within a single DO IF, and probably should:
>
>DO IF B_Record EQ 1.
>.  COMPUTE A_Ques3 = B_Ques3.
>.  COMPUTE A_Ques4 = B_QUES4.
>END IF.

Yes, I agree.

What this means, in effect, is that when variables that are supposed to match don't match in type or length, the ADD FILES command will, when I merge the files, add the (somewhat redundant) File_B variables (i.e., all the ones that are not renamed, which will be most of them) as well as add on the new cases. I then use the COMPUTE statements, as above, to correctly fill in all the variables in File_A, and then I can DROP all of the duplicate variables originating from File_B. Right?

Thanks,
Bob
At 08:57 PM 12/12/2006, Bob Schacht wrote:
>>.  The DO IF, which tests variable B_Record, and executes the COMPUTEs
>>only for records that come from File_B.
>>
>>You can have all the COMPUTEs within a single DO IF, and probably
>>should:
>>
>>DO IF B_Record EQ 1.
>>.  COMPUTE A_Ques3 = B_Ques3.
>>.  COMPUTE A_Ques4 = B_QUES4.
>>END IF.
>
>[So] when variables that are supposed to match don't match in type or
>length, when I merge the files, the ADD command should add the
>(somewhat redundant) variables (i.e., all the ones that are not
>renamed, which will be most of them) as well as add on the new cases.

Yes. That's automatic with ADD FILES: all variables from both files are kept, except those explicitly DROPped.

>I then use the COMPUTE statements, as above, to correctly fill in all
>the variables in File_A, and then I can DROP all of the duplicate
>variables originating from File_B. Right?

You've got it. Good luck to you!

Richard
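One way to do that final drop, in line with the earlier advice in this thread to leave the original files untouched and save to a separate file, is a SAVE with a /DROP subcommand. This is a sketch under assumptions: 'combined.sav' is a hypothetical output file name, and B_Ques3, B_Ques4 and B_Record stand for whatever File_B leftovers and /IN flag are in the working file once the COMPUTEs have been defined.

```spss
* Sketch only: the output file name and variable names are placeholders.
* Writes the combined data to a new file without the File_B copies.
* The source files themselves are not modified.
SAVE OUTFILE='combined.sav'
   /DROP=B_Ques3 B_Ques4 B_Record.
```

If it is useful to know later which cases came from which source, the /IN flag can simply be left off the /DROP list.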
At 02:18 PM 12/12/2006, Richard Ristow wrote:
[snip]
>*** HERE'S WHAT I MISSED: You want to use COMPUTEs to copy values from
>the File_B variables to the File_A variables *if* the data is in the
>File_B variables. If it's already in the File_A variables, those COMPUTEs
>will wipe out the good data. So you have to do the COMPUTEs *only* for
>records that came from File_B. Like this:
>
>ADD FILES
>   /FILE=File_A
>   /FILE=File_B/IN=B_Record   /* New subcommand
>   /RENAME=(B_Ques1 = A_Ques1).

Richard,
OK, what I actually wrote was

ADD FILES
   /FILE=Needs_Assessment_rev2
   /FILE=FinalComplete /IN=Zoomrang
   /RENAME=(Question4Ididntgetanyhelpforthis= Serv_2a)
etc.

But this earned me the following error message:

>Error # 5109 in column 1.  Text:
>The first subcommand is not FILE or TABLE.  The entire command is skipped.
>This command not executed.

So, what am I doing wrong here?

Thanks,
Bob
At 04:27 PM 12/18/2006, Bob Schacht wrote:
>OK, what I actually wrote was
>
>ADD FILES
>   /FILE=Needs_Assessment_rev2
>   /FILE=FinalComplete /IN=Zoomrang
>   /RENAME=(Question4Ididntgetanyhelpforthis= Serv_2a)
>etc.
>
>But this earned me the following error message:
>>Error # 5109 in column 1.  Text:
>>The first subcommand is not FILE or TABLE.  The entire command is
>>skipped.
>>This command not executed.
>
>So, what am I doing wrong here?

I'm not sure. It seems to work for me. Below is SPSS draft output. I haven't put in any of the COMPUTE statements.

*  C:\Documents and Settings\Richard\My Documents                  .
*    \Technical\spssx-l\Z-2006d                                     .
*    \2006-12-18 Schacht - Merging databases by renaming variables.SPS.

*  In response to (follow-up) posting            .
*  Date:    Mon, 18 Dec 2006 11:27:46 -1000      .
*  From:    Bob Schacht <[hidden email]>         .
*  Subject: Re: Merging databases by renaming variables .
*  To:      [hidden email]                       .

*  Syntax error using ADD FILES.                 .

*  ===================  .
*  APPENDIX: TEST DATA  .
*  ===================  .
*  Uses 'dataset' data (SPSS 14.0 and above).

NEW FILE.
*  "File_A"   ..........  .
DATA LIST LIST
   /ID (F3) A_Ques1 (A8) A_Ques2 (F2) A_Ques3 (A10) A_Ques4 (F2).
BEGIN DATA
001 Agree  17 Why_not? 42
003 Not_Me 02 Maybe    99
END DATA.
DATASET NAME Needs_Assessment_rev2.

Dataset Name
|-----------------------------|---------------------------|
|Output Created               |19-DEC-2006 03:05:07       |
|-----------------------------|---------------------------|

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |19-DEC-2006 03:05:07       |
|-----------------------------|---------------------------|
[Needs_Assessment_rev2]

 ID A_Ques1  A_Ques2 A_Ques3    A_Ques4

  1 Agree       17   Why_not?      42
  3 Not_Me       2   Maybe         99

Number of cases read:  2    Number of cases listed:  2

NEW FILE.
*  "File_B"   ..........  .
DATA LIST LIST
   /ID (F3) B_Ques1 (A8) B_Ques2 (F2) B_Ques3 (A10) B_Ques4 (F4).
BEGIN DATA
002 Uh_uh 07 Nevermore! 451
004 Cool! 13 Yeah_man   987
END DATA.
DATASET NAME FinalComplete.

Dataset Name
|-----------------------------|---------------------------|
|Output Created               |19-DEC-2006 03:05:08       |
|-----------------------------|---------------------------|

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |19-DEC-2006 03:05:08       |
|-----------------------------|---------------------------|
[FinalComplete]

 ID B_Ques1  B_Ques2 B_Ques3    B_Ques4

  2 Uh_uh        7   Nevermore!   451
  4 Cool!       13   Yeah_man     987

Number of cases read:  2    Number of cases listed:  2

*  ..........   The ADD FILES   ................ .
ADD FILES
   /FILE=Needs_Assessment_rev2
   /FILE=FinalComplete /IN=Zoomrang
   /RENAME=(B_Ques1 = A_Ques1).

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |19-DEC-2006 03:05:24       |
|-----------------------------|---------------------------|

 ID A_Ques1  A_Ques2 A_Ques3    A_Ques4 B_Ques2 B_Ques3    B_Ques4 Zoomrang

  1 Agree       17   Why_not?      42       .                  .       0
  3 Not_Me       2   Maybe         99       .                  .       0
  2 Uh_uh        .                  .       7   Nevermore!   451       1
  4 Cool!        .                  .      13   Yeah_man     987       1

Number of cases read:  4    Number of cases listed:  4
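For completeness, the COMPUTE step left out of the test above might look like the sketch below. It continues from the combined working file produced by that ADD FILES and uses the test variable names and the Zoomrang flag from the listing. The FORMATS line is an addition of this sketch, not part of the original test: A_Ques4 was read as F2 while the File_B values have three digits, so widening the display format should let them print in full. This has not been run, so no output is shown.

```spss
* Sketch only: continues the test above and is not part of the original output.
* Widen the display format so three-digit File_B values can be shown.
FORMATS A_Ques4 (F4).

* Copy the File_B values only in records that came from FinalComplete.
DO IF Zoomrang EQ 1.
.  COMPUTE A_Ques2 = B_Ques2.
.  COMPUTE A_Ques3 = B_Ques3.
.  COMPUTE A_Ques4 = B_Ques4.
END IF.

LIST VARIABLES=ID A_Ques1 A_Ques2 A_Ques3 A_Ques4 Zoomrang.
```

After this step the A_ variables should hold values for all four test cases, and the B_ variables are the leftovers that can be dropped.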