I have a client who receives an Excel data set and she has no control over how it comes to her. The data comes to her in one spreadsheet. It contains different groups separated by blank rows. Is there a way to use syntax/Python/R to import the data and then create a new column called group based on where there are blank rows to delineate where one group stops and another starts? So, if my first 5 rows are followed by a blank line, I'd like those first 5 rows to be Group 1.
Comes this way: Bob 5 Sam 6 Lou 4 Pam 3 Want: 1 Bob 5 1 Sam 6 2 Lou 4 2 Pam 3 Thanks to all for being willing to assist. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
I created a small Excel file to mimic what you describe, but added a bit more
data than in your example. NEW FILE. DATASET CLOSE ALL. GET DATA /TYPE=XLSX /FILE='C:\Temp\Junk.xlsx' /SHEET=name 'Sheet1' /CELLRANGE=FULL /READNAMES=ON /DATATYPEMIN PERCENTAGE=95.0 /HIDDEN IGNORE=YES. LIST. Here is the output from that LIST command: name x Bob 5 Sam 6 . Lou 4 Pam 3 . Alphonse 7 Gaston 2 Tom 3 Jerry 8 Now the rest of the code. Notice the use of MISSING and LAG() in the COMPUTE command. NUMERIC Group (F5.0). COMPUTE Group = SUM(MISSING(x),LAG(Group)). * Rare situation where EXECUTE is needed. * If I omit it, Group = 0 on all rows. EXECUTE. SELECT IF NOT MISSING(x). LIST Group name x. Here is the output from that last LIST command: Group name x 0 Bob 5 0 Sam 6 1 Lou 4 1 Pam 3 2 Alphonse 7 2 Gaston 2 2 Tom 3 2 Jerry 8 If you're okay with the first group being Group 0, you're done. If you want the first group to be Group 1, add 1 to group. * If you're happy with groups from 0-2, stop here. * If you need groups 1-3, add 1 to group. COMPUTE Group = Group + 1. LIST Group name x. ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Free forum by Nabble | Edit this page |