Creating a new file for a subset of cases/IDs

Oliver
Hi everyone,

I have a dataset (Dataset 1) with 3000 different cases (i.e., IDs) and, let's
say, two variables (y1, y2). I would like to create a new dataset (Dataset
2) that includes data for y1 and y2, but only for a subset of IDs from
Dataset 1. I've created Dataset 2 with only the subset (i.e., n = 1500) of
IDs of interest and then tried to use "Merge Files --> Add Variables", but
the resulting dataset always has 3000 cases, not
just the 1500 that I need.

Any assistance would be greatly appreciated.
Thanks in advance.
O.



--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Creating a new file for a subset of cases/IDs

Alejandro González Heras
Hi Oliver,

I don't fully understand why you are trying to add variables after creating the subset you are interested in. Could you please elaborate on that? Keep in mind that you are subsetting cases, whereas Add Variables adds variables, not cases.

What you need is to create a subset based on some criterion, is that correct?

Have you tried the SELECT IF command? I believe that is the command that selects cases based on a condition on any variable and deletes the unselected cases. Through the menus, you could go to Data > Select Cases, choose "If condition is satisfied", and then pick "Delete unselected cases", if I'm remembering it correctly.
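
A minimal sketch of the SELECT IF route, in case it helps. It only applies if the subset can be described by a rule on a variable; the dataset name Dataset1, the copy name SubsetCopy, and the condition on id are all placeholders I made up for illustration, not something from Oliver's data.

```spss
* Work on a copy so the full dataset stays available under its own name.
dataset activate Dataset1.
dataset copy SubsetCopy.
dataset activate SubsetCopy.
* Hypothetical criterion, replace it with the real rule that defines the subset.
select if (id <= 1500).
execute.
```

Unlike FILTER, SELECT IF permanently drops the unselected cases from the active dataset, which is why the sketch runs it on a copy.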


All the best,
A


(In reply to Oliver's message of 5 Apr 2021, 20:09)

Re: Creating a new file for a subset of cases/IDs

Jon Peck
In reply to this post by Oliver
Here is a simple solution using the SPSSINC TRANS extension command. If you don't already have it, you can install it from the Extensions > Extension Hub menu.

Suppose you have a dataset named subset containing the cases you want to select and an id variable named id in both datasets. I'll assume that it is numeric, but if it is a string, that's an easy adjustment.
Then, with the main dataset active, run this command:

spssinc trans result=insubset
/initial "extendedTransforms.vlookup ('id','id','subset')"
/formula func(id).

This produces a variable named insubset that holds the id number if the case is in the subset dataset and is system-missing otherwise. You can then select on a not-missing condition for that variable in the main dataset (and drop any variables you don't want). Just be sure to save the result under a different file name so that you don't lose the main data.
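
The follow-up step could look roughly like this. It is a sketch that assumes the SPSSINC TRANS command above has already run with the main dataset active; the output file name is a hypothetical placeholder.

```spss
* Keep only the cases whose id was found in the subset dataset.
select if not missing(insubset).
execute.
* Save under a new (hypothetical) file name and drop the helper variable.
save outfile="dataset2_subset.sav" /drop=insubset.
```

Saving to a new file name leaves the original data file on disk untouched, even though SELECT IF has removed cases from the active dataset.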

--
Jon K Peck
[hidden email]

Re: Creating a new file for a subset of cases/IDs

MLIves
In reply to this post by Oliver
Hi Oliver,

Sounds like you should look at the /IN= or the /TABLE= subcommands of MATCH FILES (Merge Files --> Add Variables).
If you include /in=inDataset2 after naming Dataset2, the resulting file will contain a variable inDataset2 that is 0 for the records you don't want and 1 for those in Dataset2 that you do want. So in the resulting file, you would need one more step:
        Select if inDataset2.
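
Put together, the /IN route might look like the sketch below. It assumes both datasets are open under the names Dataset1 and Dataset2 and share a key variable called id; those names are my assumptions, so adjust them to your files.

```spss
* MATCH FILES with BY needs every input sorted by the key variable.
dataset activate Dataset2.
sort cases by id.
dataset activate Dataset1.
sort cases by id.
* inDataset2 is 1 for cases whose id appears in Dataset2 and 0 otherwise.
match files /file=*
  /file=Dataset2 /in=inDataset2
  /by id.
select if (inDataset2 = 1).
execute.
```

It is probably safest to save the result under a new file name afterwards so the full data file is not overwritten.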

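/TABLE could also be used: if Dataset1 is named as the lookup table (/TABLE) and Dataset2 as the /FILE, the result keeps only the cases in Dataset2, with y1 and y2 looked up from Dataset1.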
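
A possible one-step version with /TABLE is sketched here. MATCH FILES keeps the cases that come from /FILE inputs and uses /TABLE inputs only for lookup, so in this sketch the ID-only dataset is the /FILE and the full dataset is the /TABLE. The dataset names and the key variable id are assumptions, and the table dataset must have unique id values (which holds here, since Dataset 1 has 3000 different IDs).

```spss
* Both inputs must be sorted by the key variable.
dataset activate Dataset1.
sort cases by id.
dataset activate Dataset2.
sort cases by id.
* Dataset2 supplies the cases and Dataset1 only contributes y1 and y2.
match files /file=*
  /table=Dataset1
  /by id.
execute.
```

With this arrangement no SELECT IF step is needed, because ids that exist only in Dataset1 never enter the result.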

Melissa