OK, so I have several large CTables runs, on the order of 500 to 1,000 tables each. I can use SPSSINC MODIFY TABLES to hide the footnotes, but it takes anywhere from 20 minutes to half an hour per file, and if I need to run 30 decks of these the total time is enormous. I have found it is faster to export as HTML and search and replace the superscripts by searching for the tags, i.e., <sup> text </sup>. The tags take various forms, so this takes some time, but it is still only 5-10 minutes (so maybe a third of the time of the SPSS script). I am thinking that a Python script that opens the HTML (it's only text), searches for the <sup> tag, and deletes it and everything up to and including the end tag </sup> might be faster still. The HTML files are about 30 MB. Any comments? Thanks!
MODIFY TABLES uses the scripting APIs, and it can be slow, although this depends a lot on the table sizes and other factors. However, if you export the HTML, the following code will strip all the superscripts. Set inputdir to the directory holding the files to convert and outputdir to a directory to hold the converted files. The output directory must already exist.

The search expression uses non-greedy matching. Otherwise you would remove everything from the first <sup> through the last </sup>. It uses the re.S flag so that "." also matches newlines and a match can span lines. The files are read and written as bytes, so no text encoding has to be assumed.

import re, glob, os

inputdir = "c:/temp"
outputdir = "c:/tempout"  # must already exist

for f in glob.glob(os.path.join(inputdir, "*.htm")):
    outpath = os.path.join(outputdir, os.path.basename(f))
    # Binary mode on both sides leaves bytes and line endings untouched.
    with open(f, "rb") as infile, open(outpath, "wb") as outfile:
        data = infile.read()
        # Non-greedy match; re.S lets "." match newlines too.
        data = re.sub(br"<sup>.*?</sup>", b"", data, flags=re.S)
        outfile.write(data)
print("done")

-- Jon K Peck

On Wed, Jun 8, 2016 at 12:53 PM, Timothy Hennigar <[hidden email]> wrote:
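To see why the quantifier and the flag matter, here is a minimal sketch on a made-up two-row snippet. The sample string and the helper name strip_sup are hypothetical and not taken from any SPSS export. It compares non-greedy and greedy matching, and re.S against re.M, on a superscript that spans a line break.

```python
import re

# Hypothetical sample: the second <sup> element spans a line break.
html = "<td>12<sup>a</sup></td>\n<td>34<sup>\nb</sup></td>"

def strip_sup(text, greedy=False, flags=re.S):
    """Remove <sup>...</sup> spans from text (illustrative helper)."""
    pattern = r"<sup>.*</sup>" if greedy else r"<sup>.*?</sup>"
    return re.sub(pattern, "", text, flags=flags)

# Non-greedy with re.S: each superscript is removed on its own.
non_greedy = strip_sup(html)

# Greedy with re.S: one match runs from the first <sup> to the last </sup>.
greedy = strip_sup(html, greedy=True)

# re.M only changes how ^ and $ behave, so "." still stops at newlines
# and the superscript that spans two lines is left in place.
multiline_only = strip_sup(html, flags=re.M)

print(repr(non_greedy))
print(repr(greedy))
print(repr(multiline_only))
```

On this sample the greedy version also removes the cell content between the two superscripts, and the re.M version leaves the multi-line superscript in place, which is why the non-greedy pattern combined with re.S is the safer choice for exported tables.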
Wow- that’s super fast .. (only a few seconds) .. FANTASTIK (I can likely modify that for other uses also!) Thanks!

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD