I am currently working on updating a tutorial on merging files. The old version of it showed how one could exit the GUI via PASTE and obtain MATCH FILES syntax. But in v22, exiting the GUI via PASTE gives me STAR JOIN syntax, and I don't see any check-box or whatever that allows me to choose MATCH FILES instead (i.e., no equivalent of the "legacy dialog" you see in other areas of the GUI). So, as my subject line asks, can one still obtain MATCH FILES syntax by pasting from the GUI in v22? If so, how?
Supplementary question: What are the advantages of STAR JOIN over MATCH FILES?

Thanks,
Bruce
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
It's incredible how frequently formulating a question leads to finding the answer yourself. Here's what I've discovered about pasting from the GUI for Data > Merge Files > Add Variables:
1. If I presort both datasets on the key variables, and check the box indicating that they are already sorted, PASTE gives me MATCH FILES syntax.
2. If I do not check the box indicating that the datasets are already sorted on the key variables, PASTE gives me STAR JOIN syntax (without SORT commands, because it does not require them).

The supplementary question still stands, though: What are the main advantages of STAR JOIN over MATCH FILES (besides not needing to sort on the key variables first)?
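One way to see why sorting matters: MATCH FILES expects every input to be sorted on the BY variables, which fits a single-pass merge that walks the files in key order. The Python below is a hypothetical, conceptual sketch of such a merge for a keyed table lookup (as with the /TABLE subcommand); it is not SPSS code and not a description of how MATCH FILES is implemented. The names `match_files_table` and `is_sorted_by`, the list-of-dicts layout, and the choice of None for unmatched cases are all assumptions made for illustration.

```python
from typing import Any

Record = dict[str, Any]  # one case: variable name -> value


def is_sorted_by(rows: list[Record], key: str) -> bool:
    """True if rows are in non-decreasing order of `key`."""
    return all(rows[i][key] <= rows[i + 1][key] for i in range(len(rows) - 1))


def match_files_table(cases: list[Record], table: list[Record], key: str) -> list[Record]:
    """Hypothetical sketch of a one-to-many table lookup done as a sorted merge.

    Both inputs must already be sorted by `key`. Each case picks up the
    non-key fields of the table row with the same key; unmatched cases get
    None for those fields. Assumes the inputs share no variable except `key`.
    """
    if not (is_sorted_by(cases, key) and is_sorted_by(table, key)):
        raise ValueError(f"both inputs must be sorted by {key!r}")
    table_fields = {k for row in table for k in row if k != key}
    out: list[Record] = []
    j = 0  # position in the table; it only moves forward, so one pass suffices
    for case in cases:
        while j < len(table) and table[j][key] < case[key]:
            j += 1
        merged = dict(case)
        for f in table_fields:
            merged[f] = None
        if j < len(table) and table[j][key] == case[key]:
            merged.update({k: v for k, v in table[j].items() if k != key})
        out.append(merged)
    return out


# Unsorted input is rejected, so sort first (the analogue of SORT CASES).
cases = sorted([{"id": 3, "x": 12}, {"id": 1, "x": 10}], key=lambda r: r["id"])
table = [{"id": 1, "grp": "a"}, {"id": 2, "grp": "b"}]
print(match_files_table(cases, table, "id"))
# [{'id': 1, 'x': 10, 'grp': 'a'}, {'id': 3, 'x': 12, 'grp': None}]
```

The explicit sort step in the usage lines mirrors Bruce's first finding: the merge itself is cheap, but only once both inputs are in key order.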
For some people, the ability to merge without
pre-sorting is a fairly significant feature, but additionally: "You
can also merge data files based on string keys of different defined lengths
in each file and merge a case data file with multiple table-lookup files
with different keys in each table-lookup file."
The latter feature is not available in the dialog UI because the UI only allows you to merge two data files.

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)

From: Bruce Weaver
Date: 09/16/2014 01:51 PM
Subject: Re: Can MATCH FILES syntax be PASTED from the GUI in v22?
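The two capabilities in the quoted documentation (string keys of different defined lengths, and several table-lookup files each with its own key) can be illustrated with a hypothetical Python sketch. It is not SPSS code and makes no claim about how STAR JOIN works internally: each lookup table is indexed in a dictionary on its own key, so nothing needs sorting, and string keys are compared after stripping trailing blanks, on the assumption that differing defined widths amount to different amounts of blank padding. The names `star_join` and `norm_key` and the sample variables are invented for illustration.

```python
from typing import Any

Record = dict[str, Any]


def norm_key(value: Any) -> Any:
    """Treat 'C1' and 'C1    ' as the same key.

    Assumption: fixed-width string keys are right-padded with blanks, so keys
    of different defined widths differ only in trailing spaces.
    """
    return value.rstrip(" ") if isinstance(value, str) else value


def star_join(cases: list[Record], lookups: list[tuple[list[Record], str, str]]) -> list[Record]:
    """Hypothetical sketch: attach fields from several lookup tables.

    Each entry in `lookups` is (table, case_key, table_key), so every table is
    matched on its own key. Assumes table keys are unique within each table.
    No input needs to be sorted, because each table is indexed in a dict first.
    """
    indexes = []
    for table, case_key, table_key in lookups:
        index = {norm_key(row[table_key]): row for row in table}
        fields = {k for row in table for k in row if k != table_key}
        indexes.append((index, case_key, fields))

    out: list[Record] = []
    for case in cases:  # cases keep their original order
        merged = dict(case)
        for index, case_key, fields in indexes:
            row = index.get(norm_key(case[case_key]), {})
            for f in fields:
                merged[f] = row.get(f)  # None when there is no match
        out.append(merged)
    return out


sales = [{"cust": "C2", "prod": "P1  ", "amt": 5},
         {"cust": "C1    ", "prod": "P9", "amt": 7}]
customers = [{"cust_id": "C1", "name": "Ann"}, {"cust_id": "C2", "name": "Bo"}]
products = [{"prod_id": "P1", "label": "Widget"}]
for row in star_join(sales, [(customers, "cust", "cust_id"), (products, "prod", "prod_id")]):
    print(row)
```

The second lookup table is what the two-file dialog cannot express, which matches Rick's point that this feature is only reachable from syntax.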
In reply to this post by Bruce Weaver
With STAR JOIN you can also match different files on different variables (you can even match the same table on different keys). Besides that, someone who is more used to SQL may simply prefer STAR JOIN for familiarity.
I think STAR JOIN immediately writes the results to disk, so there is a tradeoff: you don't need to sort the cases on the key variables beforehand, but it forces a pass through the data. In some (anecdotal) experimentation with fairly large files in my own work (around 10 million+ cases), SORT followed by MATCH FILES ran much faster than STAR JOIN alone. With smaller files, of course, this won't be a big deal either way.
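Matching the same table on different keys is easiest to see with an example. The sketch below is hypothetical Python, not SPSS syntax: one city lookup table is attached twice to a trip file, once through an origin code and once through a destination code, with prefixes keeping the two copies apart. The function name `lookup_by_roles`, the field names, and the prefix scheme are illustrative assumptions. With MATCH FILES, the same result would typically take two rounds: sort by the origin code and match (renaming the table's key to the BY variable), then sort by the destination code and match again with the table's other variables renamed.

```python
from typing import Any

Record = dict[str, Any]


def lookup_by_roles(cases: list[Record], table: list[Record], table_key: str,
                    roles: dict[str, str]) -> list[Record]:
    """Hypothetical sketch: join one lookup table once per role.

    `roles` maps an output prefix to the case variable holding the key for
    that role, e.g. {"orig_": "from_city", "dest_": "to_city"}. The prefix
    keeps the two copies of each table field apart.
    """
    index = {row[table_key]: row for row in table}  # built once, reused for every role
    fields = sorted({k for row in table for k in row if k != table_key})
    out: list[Record] = []
    for case in cases:
        merged = dict(case)
        for prefix, case_key in roles.items():
            row = index.get(case[case_key], {})
            for f in fields:
                merged[prefix + f] = row.get(f)  # None when the code is not in the table
        out.append(merged)
    return out


trips = [{"trip": 1, "from_city": "YQT", "to_city": "YYZ"},
         {"trip": 2, "from_city": "YYZ", "to_city": "XXX"}]
cities = [{"code": "YQT", "name": "Thunder Bay"}, {"code": "YYZ", "name": "Toronto"}]
for row in lookup_by_roles(trips, cities, "code", {"orig_": "from_city", "dest_": "to_city"}):
    print(row)
```

This in-memory lookup says nothing about the timing observation on very large files; it only shows the matching logic.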
That's useful info. Thanks Andy. And thanks Rick for your earlier reply.
Bruce,
The motivation for STAR JOIN was the popular star schema often used in databases/data warehouses. Although what Statistics has isn't quite the same, you might want to look at the Wikipedia article on STAR JOIN for some background.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
phone: 720-342-5621

From: Bruce Weaver
Date: 09/16/2014 01:47 PM
Subject: Re: [SPSSX-L] Can MATCH FILES syntax be PASTED from the GUI in v22?