Well, I had imagined as much, and am suggesting you back out of that slow-motion train wreck and think about normalizing the data. Any analysis you attempt with 27,000 columns will be SLOW! Any dialog boxes will be virtually inoperable. TRAIN WRECK!
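[A hedged sketch of the restructuring being advocated here, using VARSTOCASES. This toy uses 2 brands x 2 attributes in place of the 350 x 60 described later in the thread; the variable names follow the BrandK_AttribJ pattern mentioned below but are otherwise hypothetical.]

```spss
* Toy version of the wide layout under discussion.
DATA LIST FREE / CaseID Brand1_Attrib01 Brand1_Attrib02 Brand2_Attrib01 Brand2_Attrib02.
BEGIN DATA
1 1 1 0 1
2 0 1 1 0
END DATA.
* One MAKE group per attribute turns the BrandK_AttribJ columns into
* one row per case per brand, keyed by the Brand index.
VARSTOCASES
  /MAKE Attrib01 FROM Brand1_Attrib01 Brand2_Attrib01
  /MAKE Attrib02 FROM Brand1_Attrib02 Brand2_Attrib02
  /INDEX=Brand(2).
LIST.
```

With the real file this would be 60 MAKE groups of 350 variables each, yielding an 800 x 350 = 280,000-row file with 62 columns (CaseID, Brand, Attrib01-Attrib60) instead of a 27,000-column one.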
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Hey David, you've used the term 'normalize' a number of times in recent days and I'm unclear what exactly you mean by it. Would you educate me a bit? (Gently with the clue stick, which is probably needed here :-).)
Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Monday, December 17, 2012 2:09 PM
To: [hidden email]
Subject: Re: SPSS running slow

David Marso wrote:
> <SNIP — the "train wreck" message quoted above>

GauravSrivastava wrote:
> it's a bit different .. see below
>
> CaseID  Brand1_Attrib01  Brand1_Attrib02  ...  Brand350_Attrib60
> 1       1                1                ...  0
> 2       0                1                ...  1
> 3       1                0                ...  1
> ...
> 800     1                1                ...  .
>
> Regards,
> Gaurav

David Marso wrote:
> That is a completely unmanageable data arrangement.
> Consider, partially normalized:
>
> CaseID  Brand  Attrib01 ... Attrib60
> 1       1
> 1       ...
> 1       350
> 2       1
> 2       ...
> 2       350
>
> OR fully normalized:
>
> CASEID  Brand  Attrib  Value
> 1       1      1       00100101
> 1       ...
> 1       1      60      00100160
> ...
> 1       350    60      00135060
> ...
> 800     350    60      80035060

GauravSrivastava wrote:
> Hi Gene,
> Yes, there are 27K variables, but not too many cases: only approx. 800. My data is in a loop of brand (350 brands) by brand attribute (approx. 60). Since it's a tracker, there are many variables we kept to keep my data consistent. Hope this gives you a clear picture.
> Regards,
> Gaurav

Maguin, Eugene wrote:
> Gaurav, I'm curious about this problem you're having with your dataset. Let's talk about the dataset. Are you saying you have 27,000 (thousand) variables in the file? How many cases in the file?
> Gene Maguin

I Am Gaurav wrote:
> Thanks for all your responses. I am not sure if there is any specific requirement with MATCH FILES syntax. I did it easily using:
>
> SAVE OUTFILE = "C:\Users\GGAURAVS\Downloads\abc.sav"
>   /KEEP respid ... till 27K variable.
> exe.
>
> Regards,
> Gaurav

David Marso wrote:
> See item 02 of my horrible practices list.

GauravSrivastava wrote:
> Hi David,
> Still I couldn't figure out the problem with my SPSS. I am trying to reorder my SPSS file using the syntax below:
>
> MATCH FILES FILE=* /KEEP respid ..... till all 27K variable.
> exe.
>
> But my SPSS is running very slow; it has been running for the last 2 hours with no outcome. Can you suggest any help?
> Regards,
> Gaurav

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD. |
In reply to this post by John F Hall
Dear John,
The square brackets enclosing the second data source indicate that it's optional rather than mandatory. It's explained right at the start of Universals in the FM, which indeed is sometimes more F than at other times...

Best,
Ruben

John F Hall wrote (Mon, 17 Dec 2012 13:04:21 +0100):
> FM not so F then?
John F Hall (Mr)
Email: [hidden email] Website: www.surveyresearch.weebly.com
-----Original Message-----
Using MATCH FILES on a single file is perfectly fine. It can be used to RENAME, DROP, flag FIRST/LAST occurrences, etc.
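[For instance, a self-match can rename a variable and flag the first and last case within each ID, with no second file involved. Toy data; the variable names are hypothetical.]

```spss
DATA LIST FREE / id score.
BEGIN DATA
1 10
1 20
2 30
END DATA.
* Self-match: rename on the fly and flag first/last cases within each id.
* (The file must already be sorted by the BY variable.)
MATCH FILES FILE=*
  /RENAME (score = value)
  /BY id
  /FIRST=first_id
  /LAST=last_id.
LIST.
```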
John F Hall wrote:
> MATCH FILES needs at least one more file. See page 1141 of the FM.
>
> John F Hall (Mr)
> Email: [hidden email]
> Website: www.surveyresearch.weebly.com
>
> <SNIP FM details>
|
In reply to this post by David Marso
Dear all,

I think this post may be relevant as well: http://listserv.uga.edu/cgi-bin/wa?A2=ind0506&L=spssx-l&P=45672. Especially the last bit is interesting, as it suggests that vast numbers of variables are more problematic than vast numbers of cases. So indeed, "long" may be a better idea than "wide", as David already indicated.

Best,
Ruben

P.s. I think "normalization" here refers to Database Normalization; see http://en.wikipedia.org/wiki/Database_normalization

<SNIP quoted thread> |
Bear in mind that these posts are all quite old, so the places where you might run into trouble have moved farther out, especially if you are using a 64-bit version of Statistics. But the general message still applies: there is a lot of overhead in lugging around huge numbers of variables, not to mention the maintenance and management of extremely wide datasets. Narrow is good.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

From: Ruben van den Berg <[hidden email]>
Date: 12/17/2012 12:51 PM
Subject: Re: [SPSSX-L] SPSS running slow

Ruben van den Berg wrote:
> Dear all, I think this post may be relevant as well: http://listserv.uga.edu/cgi-bin/wa?A2=ind0506&L=spssx-l&P=45672. Especially the last bit is interesting, as it suggests that vast numbers of variables are more problematic than vast numbers of cases.
> <SNIP quoted thread> |
In reply to this post by Maguin, Eugene
In this case I mean merely a long rather than a wide data representation.
However, consider a case where there is also information about the question (say, the sub-form of a questionnaire). Encoding that in the current format would require 350 x 60 completely redundant new columns. In the long format, one would instead create a table with 350 rows keyed by question number and use MATCH FILES with a TABLE subcommand to associate the additional info. If there were also subject info, that would be another table with 800 rows. If one simply stores the subID, the question number, and the attributes with value 1, it is a simple matter to use CASESTOVARS or VECTOR -> AGGREGATE to build out the wide version if and when one might (can't imagine why) require it. Ultimately it depends upon the end use of the data, but it is rarely (if ever) a great idea to build out 20K+ columns.
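[A hedged, toy-sized sketch of the TABLE lookup described here; the real lookup would have 350 rows, and all names are hypothetical.]

```spss
* Long data: one row per case per brand.
DATA LIST FREE / CaseID Brand Value.
BEGIN DATA
1 1 1
1 2 0
2 1 1
2 2 1
END DATA.
SORT CASES BY Brand.
DATASET NAME long.
* Question-level info keyed by Brand (e.g. sub-form of the questionnaire).
DATA LIST FREE / Brand SubForm.
BEGIN DATA
1 1
2 3
END DATA.
SORT CASES BY Brand.
DATASET NAME lookup.
* TABLE spreads each lookup row across all matching rows of the long file.
DATASET ACTIVATE long.
MATCH FILES FILE=* /TABLE=lookup /BY Brand.
LIST.
```

Storing the extra information once per question rather than once per case x question is exactly the redundancy the normalized layout avoids.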
|
In reply to this post by GauravSrivastava
Hi Gaurav,

You can learn more about the principles behind normalization here, for example: http://en.wikipedia.org/wiki/Database_normalization

SPSS is not a relational database. However, you should try to normalize the dataset you want to analyze (in database language, a dataset is called a table). The suggestions given here by other list members are sound: make one long file (or several). I would also put effort into making good value labels.

You have given very little information about what you are trying to achieve with your analysis. It is possible that creating multiple data files would suit your needs better. I will give one example of how the analytical interest can guide the process of finding a useful form for the data file. I was once trying to find out which assumptions behind a theory were the most useful. Basically, I wanted to compare 800 cross tables with each other. I created a process where (one set of assumptions => coding of data => a unique data file with 18 variables of interest => analysis of data => save selected results in a common data file) was looped 40 times, once for each possible combination of assumptions.

In other words, I had 40 temporary data files, each consisting of 5000 cases from a survey, while saving only 20 new lines in the common data file. Each case in the common data file was a representation of a cross table: the names of the variables used; the number of categories in the variables used; the number of categories with fewer than 20 respondents; a measure of the strength of the relationship between the two variables; and a description of the assumptions used. Afterwards, I analyzed the common data file for patterns that allowed me to make informed decisions about the usefulness of the assumptions. Later the same procedure was replicated using a different survey. (After I was happy with the assumptions and the corresponding coding of variables, I created a proper data file with all 250 variables for more typical analyses.)

Another way to do the same would have been to keep 40 variations of the 18 variables I was using for this analysis, which would have led to a data file with 720 variables. However, the coding of the data was complicated and isolated into separate syntax files in order to ensure that every stage of the analysis used the same coding, which would have been hard to achieve with a wide data file. In addition, analyzing a wide data file would have been hard, even though it has a conceptual simplicity (the cases are the same 5000, even if missingness differs and each variation of a variable is present in the data file), because one must be consistent with the assumptions in each analysis: out of the 720 variables I would need to find the 18 that can be used together. I am a strong believer in human error: if it is possible to make mistakes, one will. Therefore, all in all, it was much better to create 40 temporary data files than one data file 40 times as wide.

Best,
Eero Olli
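[One pass of the loop described above might look roughly like this in syntax. The file names, variables, and summary statistics are hypothetical stand-ins, not Eero's actual code.]

```spss
* One assumption set has been applied; toy stand-in for the temporary file.
DATA LIST FREE / v1 v2.
BEGIN DATA
1 2
3 4
END DATA.
* Collapse the temporary file to a single summary case.
COMPUTE const = 1.
AGGREGATE OUTFILE=* /BREAK=const /n_cases=N /mean_v1=MEAN(v1).
COMPUTE assumption_set = 1.
* On later passes, pick up the accumulating results file first:
* ADD FILES FILE=* /FILE='results.sav'.
SAVE OUTFILE='results.sav'.
```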
Eero Olli
Phone: +47 23 15 73 44
Senior Adviser at the Equality and Anti-Discrimination Ombud's office
Mail: Post office box 8048 Dep, 0031 Oslo
Visits: Mariboesgate 13, Oslo |