|
I am trying to write syntax to import a worksheet of an Excel file
starting at row 30. The Excel file currently is a template that will ultimately have an unknown number of rows. GET DATA /TYPE=XLS /FILE='D:\Template file.xls' /SHEET=name 'Worksheet 1' /CELLRANGE=range 'A30:BO3000' /READNAMES=on /ASSUMEDSTRWIDTH=32767. I noticed that when I choose the second entry on the CELLRANGE (currently at BO3000) I get an error message if there are fewer rows with data than included in the CELLRANGE. For instance, I will get an error message if the data end in row 2999. However, the data are imported properly if row 3000 is added to the data set. Is there any way to write syntax that can accomodate an unknown number of rows in an Excel worksheet assuming I don't want to start importing at row 1? Thank you, Warren Chrusciel Municipal Property Assessment Corporation 1420 Blair Place, Suite 300 Ottawa, Ontario K1J 9L8 ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
|
In reply to this post by Warren Chrusciel
Warren-
I believe that the only way to deal with an unknown number of rows is to use /CELLRANGE=FULL -- but that starts that data import at row 1 of course. What version of SPSS are you using? One idea, though not elegant, is to brute force it together and set the formats using the ALTER TYPE syntax -- which was introduced in version 16. Two additional pieces to help the 'durability' of your code would be to use the FILE HANDLE command and to change the SHEET subcommand. By using the FILE HANDLE command, the GET DATA syntax is more generalized. By using the /SHEET=INDEX subcommand, it refers to the first worksheet in the Excel file rather than the worksheet named "Worksheet 1" -- that way you don't get an error if someone edits the worksheet name. So here's the syntax: FILE HANDLE TemplateFile /NAME='D:\Template file.xls'. GET DATA /TYPE=XLS /FILE= TemplateFile /SHEET=index 1. That will bring *everything* into Excel. And here's where the brute force begins ... If we brought everything in, then we have a couple of things to fix: 1) The first 29 rows need to be deleted. 2) The variable names need to be set. 3) The variable formats need to be set. 1) The first 29 rows need to be deleted. Here's the syntax that will delete the first 29 rows: COMPUTE case=$casenum. EXECUTE. SELECT IF case>=30. EXECUTE. 2) The variable names need to be set. The variable names will default to V1, V2, etc if there isn't any text in the first row of the Excel file. Suppose that the third column had "STUFF" in the first row and the rest were blank, then the variable names would be V1, V2, STUFF, V4 -- it skips V3 because it found a name. I'm crossing my fingers for you that the first row will have consistent values due to the template. You'll need to customize the syntax below to fit your variables and I've written it a couple different ways to give you examples: * If you just want to paste a column of variables from SPSS and then paste a column of variables from Excel, this is the best bet. The first variable in the list before the equal sign to be renamed to the first variable in the list after the equal sign. Not much effort for you to write, but if someone else looks at the syntax it might be hard to see what the 14th variable would be renamed to. RENAME VARIABLES (V1 V2 = Var1 Var2). * If you want to quickly see that V1 will be Var1, this is the best bet. This makes it easy to see what the 14th variable would be named to (per my above example). RENAME VARIABLES (v1=Var1) (v2=Var2). * If either of the above give you errors (because it applies all the name changes at once), then you use one (it applies the name changes one at a time). RENAME VARIABLES (V1=Var1). RENAME VARIABLES (V2=Var2). 3) The variable formats need to be set. Normally this would be set, but since we're brute forcing it we have to do this manually. And it had the potential to get a bit thorny -- specifically, if you're dealing with dates and times. I am guessing that the first 29 rows certainly have text in them and the only way to syntactically change a string variable to a numeric variable is with the ALTER TYPE command. I've included the syntax that will convert an Excel serial number to an SPSS serial number. For example, if you type "1/1/2010" into Excel and then format it as a number, it would be 40179. I've also included the syntax that will convert an Excel serial time to an SPSS serial time. If the dates or times get imported as text rather than serial numbers ... then it gets thorny because the date conversion is a bit more labor intensive. I've given one example of how to change a text value of "1/1/2010" (mm/dd/yyyy) into an SPSS date -- it doesn't require a leading zero on the month or day. * If you just want to change something to a number. ALTER TYPE Var1 (F8.2). * If the variable has values with a % sign. ALTER TYPE Var1 (PCT8.2). * This will convert an Excel date serial number to an SPSS date serial number, which can then be formatted as a date. compute SPSS_Date_Time = date.mdy(01,01,1900) + Time.days( Excel_Serial_Value - 2 ). formats SPSS_Date_Time (date11). execute. *This will convert an Excel time serial number to an SPSS time serial number, which can then be formatted as time. compute SPSS_Time = Excel_Serial_Value * 24 * 60 * 60. formats SPSS_Time (time5). execute. * Convert a string date (mm/dd/yyyy) to SPSS date. compute mo = number(substr(STRING_DATE,1,index(STRING_DATE,"/")-1),F2.0). compute day = number(substr(STRING_DATE,index(STRING_DATE,"/")+1,index(substr(STRING_D ATE,index(STRING_DATE,"/")+1,50),"/")-1),F2.0). compute year = number(substr(STRING_DATE,char.rindex(STRING_DATE,"/")+1,5),F4.0). compute SPSS_Date = date.mdy(mo, day, year). formats SPSS_Date (date11). exe. Since I'm not sure exactly what your data will look like when it's brought into SPSS, I've tried to cover the road bumps that I typically run into, but that's far from exhaustive. Please let me know if you have any issues and I'll try to help. Good luck! -Eric Background on the date conversion: Excel measures dates as the number of days after Jan 1, 1900 and counts 1900 as if it were a leap year (which it wasn't). That was the easiest way for Microsoft to make Excel compatible with Lotus 1-2-3. SPSS measures dates as the number of seconds since midnight, October 14, 1582 - the first day of the Gregorian Calendar. The first part of the compute generates the number of seconds from Oct 14, 1582 to Jan 1, 1900. The second part generates the number of seconds after Jan 1, 1900 - if Excel correctly assumed that 1900 was not a leap year, then it would only be necessary to subtract 1. For reference, if you have an SPSS serial value that you want to convert to an Excel serial value -- if you copy and paste output from SPSS to Excel. Here's an Excel formula that will give convert it -- you'll still need to format the result as a date (Keyboard shortcut Ctrl + Shift + 3) =(B3/86400)-115859 -----Original Message----- From: Warren Chrusciel [mailto:[hidden email]] Sent: Thursday, May 13, 2010 7:26 AM Subject: Importing Excel file -- unknown number of rows I am trying to write syntax to import a worksheet of an Excel file starting at row 30. The Excel file currently is a template that will ultimately have an unknown number of rows. GET DATA /TYPE=XLS /FILE='D:\Template file.xls' /SHEET=name 'Worksheet 1' /CELLRANGE=range 'A30:BO3000' /READNAMES=on /ASSUMEDSTRWIDTH=32767. I noticed that when I choose the second entry on the CELLRANGE (currently at BO3000) I get an error message if there are fewer rows with data than included in the CELLRANGE. For instance, I will get an error message if the data end in row 2999. However, the data are imported properly if row 3000 is added to the data set. Is there any way to write syntax that can accomodate an unknown number of rows in an Excel worksheet assuming I don't want to start importing at row 1? Thank you, Warren Chrusciel Municipal Property Assessment Corporation 1420 Blair Place, Suite 300 Ottawa, Ontario K1J 9L8 ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Warren Chrusciel
We had come up with a âbrute forceâ syntax already. Yours is actually
less âbrute forceâ than ours so I intend to borrow liberally from yours. In particular, I like: 1) The way you deleted the first 29 rows. 2) The systematic way you renamed the variables. 3) The suggestion to use the âALTER TYPEâ command. 4) Your detailed explanation of the way to convert dates from Excel to SPSS. 5) The suggestion to use the /SHEET Index subcommand. Even though our syntax is not yet elegant it is much improved with the changes. Thanks for your assistance. Warren Chrusciel ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
| Free forum by Nabble | Edit this page |
