|
Dear Experts,
I'm facing a problem with a string variable that I'd like to transform into dummy variables (1/0). The string variable is actually a comment containing the names of various actors, e.g. "Mike Miller, Steven Baldwin, Hans Meier". I'd like to create separate dummy variables to measure the influence of each actor on DVD sales volume. The dummies might be coded like VAR1: 1=Mike Miller, 0=not Mike Miller; VAR2: 1=Steven Baldwin, 0=not Steven Baldwin; and so on.

Does anybody have an idea how this could be done with SPSS syntax? I've tried various syntax like IF ... EQ but failed because there are three (or more) different actors in one field.

Any help is highly appreciated.
Thanks,
Stephan

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
|
Stephan,
Explain something for me. You say:

>>The string variable is actually a comment containing various names of actors: Mike Miller, Steven Baldwin, Hans Meier.

Are you saying that your dataset has a string variable that contains the name of a different actor in each case? For example,

Id  name
1   Steven Baldwin
2   Hans Meier

OR, are you saying that your dataset has a string variable that contains the names of several actors in each case? For example,

Id  name
1   Mike Miller, Steven Baldwin, Hans Meier
2   George W. Bush, Brad Pitt, Zhao Hu-Jen

OR are you saying something else? What's confusing is this: '... a comment containing various names of actors ...'

Gene Maguin
|
In reply to this post by Stephan-36
Dear Gene,
thanks for the response. The latter is the case; sorry for any confusion. There's a string variable that contains the names of various actors, which I'd like to extract separately. Here is an excerpt from the data set showing the sample ID and the string variable ACTOR:

ID          String VAR ACTOR
IDMovie1;   'Mike Miller, Steven Baldwin, Hans Meier';
IDMovie2;   'Mike Miller, Jessica Alba, Hans Meier';
IDMovie3;   'Mike Miller, Steven Baldwin and Hans Meier';
IDMovie4;   'Mike Miller, Steven Baldwin plus Hans Meier';
IDMovie5;   'Mike Miller, Steven Baldwin and as supporting actor Hans Meier';
IDMovie6;   'Emil Nolde and Catherine Deneuve';

I'd like to create separate dummies for each actor to measure the effect of each actor dummy on DVD sales volume.

Thanks for your help.
Regards,
Stephan
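The excerpt above separates names with commas but also with words such as 'and', 'plus' and 'and as supporting actor'. As a rough sketch of how such a string could be split into individual names, here is some plain Python (not SPSS syntax). The separator list is an assumption taken only from the six sample rows, and the function name `split_actors` is hypothetical; real data would almost certainly need more separators.

```python
import re

# Separators seen in the six sample rows only (an assumption, not a complete list).
# The longer phrase is listed before " and " so it is consumed as a whole.
SEPARATORS = re.compile(
    r"\s*,\s*|\s+and as supporting actor\s+|\s+and\s+|\s+plus\s+"
)

def split_actors(comment):
    """Split a free-text ACTOR comment into a list of individual names."""
    parts = SEPARATORS.split(comment.strip())
    # Drop empty pieces that could arise from doubled separators.
    return [p.strip() for p in parts if p.strip()]

print(split_actors("Mike Miller, Steven Baldwin and as supporting actor Hans Meier"))
print(split_actors("Emil Nolde and Catherine Deneuve"))
```

A name that itself contains one of these words surrounded by spaces would be split wrongly, so the output should be checked against the raw comments.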
|
Stephan,
Thanks, that helps. Another thing needed is what the result is supposed to look like. I've taken your example data and shortened the lines by replacing names with initials. So, is this what the result of the yet-unwritten transformation should be? A1, A2, and A3 are the actor names.

ID  ACTOR                                 A1  A2  A3
1;  'MM, SB, HM';                         MM  SB  HM
2;  'MM, JA, HM';                         MM  JA  HM
3;  'MM, SB and HM';                      MM  SB  HM
4;  'MM, SB plus HM';                     MM  SB  HM
5;  'MM, SB and as supporting actor HM';  MM  SB  HM
6;  'EN and CD';                          EN  CD

Details.

1) Do you have a text file of the dataset or is it only an SPSS data file? I ask because I want to know whether the text strings are really enclosed by apostrophes (') in the dataset and whether the values of ID and Actor are really separated by semicolons.

2) The thing that makes this project difficult is the lack of structure in the values of the Actor variable. If every name were comma-separated, it would be easy to search for commas. However, you show that a wide range of text separates names. That fact makes the job hard, very hard. If the dataset is a text file, I would bring the file into a text editor and search and replace, for instance, ' and ' with ', '.

>>I'd like to create separate dummies for each actor to measure the effect of dummy-actor on DVD sales volume.

I don't know what you mean here. Let's take the restructured data from above. This is how it will now look in SPSS. I now want to add the dummy variables. I'm going to assume that you have also made a master list of actors and have also identified how many actors are in each movie. So the rule is that there are as many dummies as actors listed (6 in this example: MM, SB, HM, JA, EN, CD) and the value is 1 if the actor is listed, 0 if not. So:

ID  A1  A2  A3  D1  D2  D3  D4  D5  D6
1   MM  SB  HM  1   1   1   0   0   0
2   MM  JA  HM  1   0   1   1   0   0
3   MM  SB  HM  1   1   1   0   0   0
4   MM  SB  HM  1   1   1   0   0   0
5   MM  SB  HM  1   1   1   0   0   0
6   EN  CD      0   0   0   0   1   1

Is this where you want to get to?

Gene Maguin
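For what it's worth, once a master list of actors exists, the 1/0 rule described above does not strictly need the names split out first: a substring search of the raw comment for each name on the list gives the same flags for the sample rows. The following is a minimal sketch in plain Python (not SPSS syntax). The master list order follows the D1 to D6 order given above, and the function name `actor_dummies` is hypothetical.

```python
# Master list of actors, in the D1..D6 order used in the example
# (MM, SB, HM, JA, EN, CD). Building this list is assumed to be done by hand.
MASTER = [
    "Mike Miller", "Steven Baldwin", "Hans Meier",
    "Jessica Alba", "Emil Nolde", "Catherine Deneuve",
]

def actor_dummies(comment, master=MASTER):
    """Return one 0/1 flag per master-list actor: 1 if the name occurs in the comment."""
    text = comment.lower()  # case-insensitive match
    return [1 if name.lower() in text else 0 for name in master]

movies = {
    "IDMovie1": "Mike Miller, Steven Baldwin, Hans Meier",
    "IDMovie2": "Mike Miller, Jessica Alba, Hans Meier",
    "IDMovie5": "Mike Miller, Steven Baldwin and as supporting actor Hans Meier",
    "IDMovie6": "Emil Nolde and Catherine Deneuve",
}

for movie_id, comment in movies.items():
    print(movie_id, actor_dummies(comment))
```

This only finds exact spellings, so a comment that says "Steve Baldwin" would not be flagged for "Steven Baldwin"; spelling variants would have to be added to the list or cleaned up beforehand.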
|
In reply to this post by Stephan-36
Hi
I have another suggestion on this. Have you considered using the Internet Movie Database? http://www.imdb.com/

This probably has the information you need, although of course it is not directly available in the form you want (e.g. as a download of the relevant database fields). However, given that people have developed various applications (e.g. the Oracle of Bacon) based, I believe (not checked), on databases derived from the IMDb data, maybe you could ask IMDb for a download.

On the other hand, you may, with Gene's help, have solved the decoding of your strings by now.

regards
Clive.
|
In reply to this post by Stephan-36
Over the weekend, I was writing Get Data statements for nondelimited, i.e., fixed, records and found that the statement counts column 1 as column 0. The example in the syntax reference uses 0 as the starting column. But, so far as I noticed, there was no explanation of this Get Data idiosyncrasy. So far as I know, Data list would not allow a column of 0.

So my question is this. Why was the Get Data code written to define column 1 as column 0? Even though I have an opinion about this, I'm just curious as to the explanation. I also think that a sentence needs to be added to the Get Data documentation to point out that the first column is actually the zeroth column, because it's not obvious. Will Data list be changed to conform to the '1=0' structure?

Gene Maguin
|
There may not be an explanation of the difference, but the difference between the commands is documented. The topic "VARIABLES Subcommand for ARRANGEMENT = FIXED" contains the following statement:

"Column numbering starts with 0, not 1 (in contrast to DATA LIST)."

As far as I know, there are no plans to change the convention used by either command, since changing the behavior of Data List to match Get Data (or vice-versa) would break existing jobs.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
Sent: Monday, December 01, 2008 8:38 AM
To: [hidden email]
Subject: Column specification in get data
|
In reply to this post by Maguin, Eugene
As Rick Oliver has commented, this difference is documented. As he was too polite to say, though, this was a goof. When GET DATA was designed, it was implicit that the first column would be numbered 1 as with DATA LIST, but the programmer who implemented this, being a C programmer not aware of this convention, just assumed 0, and this was not caught until after the command was released. At that point, it couldn't be fixed as it would have broken existing usage.
Since the syntax is usually pasted from the dialog, most users would never notice, but it is an unintentional inconsistency.

Regards,
Jon Peck
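To make the off-by-one discussed in this thread concrete, here is a small sketch in plain Python (not SPSS syntax) that converts a column range between the two numbering schemes. It assumes that both ends of a range are inclusive in both commands, so that only the origin differs and each end shifts by one; the function names are hypothetical.

```python
def data_list_to_get_data(start, end):
    """Convert a 1-based (DATA LIST style) column range to 0-based (GET DATA style)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end - 1

def get_data_to_data_list(start, end):
    """Convert a 0-based (GET DATA style) column range to 1-based (DATA LIST style)."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return start + 1, end + 1

# A made-up fixed-width record: a 4-character ID followed by a name.
record = "0042Mike Miller"

start0, end0 = data_list_to_get_data(1, 4)   # columns 1-4 in 1-based terms
print(start0, end0)                          # 0 3
print(record[start0:end0 + 1])               # 0042
```

Python string indexing also starts at 0, which is why the 0-based range can be used directly as a slice once the inclusive end is bumped by one.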
