Creating new variable based on blank rows as delineators

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Creating new variable based on blank rows as delineators

Nancy Rusinak-2
I have a client who receives an Excel data set and she has no control over how it comes to her. The data comes to her in one spreadsheet. It contains different groups separated by blank rows. Is there a way to use syntax/Python/R to import the data and then create a new column called group based on where there are blank rows to delineate where one group stops and another starts? So, if my first 5 rows are followed by a blank line, I'd like those first 5 rows to be Group 1.

Comes this way:

Bob   5
Sam  6

Lou   4
Pam  3

Want:

1  Bob   5
1  Sam  6
2  Lou   4
2  Pam  3

Thanks to all for being willing to assist.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Creating new variable based on blank rows as delineators

Bruce Weaver
Administrator
I created a small Excel file to mimic what you describe, but added a bit more
data than in your example.  

NEW FILE.
DATASET CLOSE ALL.
GET DATA
  /TYPE=XLSX
  /FILE='C:\Temp\Junk.xlsx'
  /SHEET=name 'Sheet1'
  /CELLRANGE=FULL
  /READNAMES=ON
  /DATATYPEMIN PERCENTAGE=95.0
  /HIDDEN IGNORE=YES.

LIST.

Here is the output from that LIST command:

name     x
 
Bob      5
Sam      6
         .
Lou      4
Pam      3
         .
Alphonse 7
Gaston   2
Tom      3
Jerry    8


Now the rest of the code.  Notice the use of MISSING and LAG() in the
COMPUTE command.


NUMERIC Group (F5.0).
COMPUTE Group = SUM(MISSING(x),LAG(Group)).
* Rare situation where EXECUTE is needed.
* If I omit it, Group = 0 on all rows.
EXECUTE.
SELECT IF NOT MISSING(x).
LIST Group name x.


Here is the output from that last LIST command:

Group name     x
 
    0 Bob      5
    0 Sam      6
    1 Lou      4
    1 Pam      3
    2 Alphonse 7
    2 Gaston   2
    2 Tom      3
    2 Jerry    8

If you're okay with the first group being Group 0, you're done.  If you want
the first group to be Group 1, add 1 to group.  


* If you're happy with groups from 0-2, stop here.
* If you need groups 1-3, add 1 to group.
COMPUTE Group = Group + 1.
LIST Group name x.





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).