SPSSX Discussion

Re: reading pdf files directly via spss

Jon Peck

Re: reading pdf files directly via spss

No, PDF files cannot be read directly into Statistics, but there are various converters available (free and paid). See

https://pdf.iskysoft.com/pdf-converter/list-of-best-pdf-to-excel-converter.html

for some.

Also, Microsoft Word can open PDF files, which you can then save in a format that Statistics can read.

On Mon, Jan 14, 2019 at 9:25 AM Maguin, Eugene <[hidden email]> wrote:

Is it possible for SPSS to read PDF files directly, as in a GET FILE command or a DATA LIST command? Without looking, I suspect not, but might there be some code that would do so and would run through SPSS, like a Python routine?

If not that, then are there any options for this?

Thanks, Gene Maguin

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Jon K Peck
[hidden email]

Art Kendall

Re: reading pdf files directly via spss

I'll look into those. They may help with data from applicants for
scholarships from our retirement community.
Right now we transcribe from paper to Excel.

WordPerfect can also read many kinds of PDF files.

WordPerfect has built-in ways to create fillable PDF forms.
However, it does not have a way to extract only the filled-in data to a
comma-, tab-, or pipe-separated file.

I was wondering about 2 possible Python ideas.
(1) Compare the unfilled PDF form against each of a pile of filled-in PDF
forms and extract the filled-in data into something that can be read into
SPSS, such as comma-, tab-, or pipe-separated fields. Comma-separated
output would likely need quote marks around fields.

(2) Suppose I create a fillable PDF form and put in something ugly like
{{filled in stuff}}, with double curly brackets around the filled-in
fields.
Could Python extract the contents between pairs of brackets into a set of
fields, with one line per returned form?
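
Idea (2) is simple to sketch once the text of each form is available as a plain string. The example below assumes that some converter has already pulled the text out of each PDF and that the double curly brackets survive that step, which has not been checked here. The function names and the sample form text are made up for illustration.

```python
import csv
import io
import re

# Non-greedy match of whatever sits between {{ and }}, across line breaks.
FIELD_PATTERN = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def extract_fields(form_text):
    """Return the contents of each {{...}} pair, in order of appearance."""
    # Collapse internal whitespace so a wrapped answer stays on one line.
    return [" ".join(m.split()) for m in FIELD_PATTERN.findall(form_text)]


def forms_to_delimited(form_texts, delimiter="|"):
    """Build one delimited line per returned form, quoting every field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter,
                        quoting=csv.QUOTE_ALL, lineterminator="\n")
    for text in form_texts:
        writer.writerow(extract_fields(text))
    return buffer.getvalue()


# Hypothetical text extracted from two returned forms.
forms = [
    "Name: {{Pat Smith}} Age: {{71}} Essay: {{I would use,\nthe award}}",
    "Name: {{Lee Jones}} Age: {{68}} Essay: {{}}",
]
print(forms_to_delimited(forms))
```

Quoting every field means a comma or pipe inside an answer does not split it. A file like this should be readable in Statistics with GET DATA /TYPE=TXT, naming the delimiter and the quote character as the qualifier.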

-----
Art Kendall
Social Research Consultants

Art Kendall-4

Re: reading pdf files directly via spss

I'll check on that.

On 1/14/2019 3:51 PM, Jon Peck wrote:

There is a Python library called PyPDF2 that reads PDF files. I have not used it, but I expect that it could do the job. It would probably be easier if the PDF file had some invisible markers around the data fields. I would need some examples of the PDF files in question to see what could easily be done.
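
For forms that are true fillable PDFs, a rough sketch of idea (1) with PyPDF2 might look like the following. It assumes PyPDF2 2.x or later, where the reader class is PdfReader and get_form_text_fields() returns the text fields as a dictionary (older releases spell these differently), and it has not been tried against the actual scholarship forms. The helper names, field names, and sample values are hypothetical.

```python
import csv
import io


def read_form_fields(pdf_path):
    """Return {field name: value} for the text fields of a fillable PDF.

    Assumes PyPDF2 2.x or later. The import is local so the rest of the
    sketch runs even where PyPDF2 is not installed.
    """
    from PyPDF2 import PdfReader  # third-party: pip install PyPDF2

    fields = PdfReader(pdf_path).get_form_text_fields() or {}
    # Empty fields may come back as None; use "" instead.
    return {name: (value or "") for name, value in fields.items()}


def filled_in_only(blank, filled):
    """Keep only the fields whose value differs from the unfilled form."""
    return {name: value for name, value in filled.items()
            if value != blank.get(name, "")}


def to_csv(rows, field_names):
    """One quoted CSV line per form, with a header row of field names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=field_names, restval="",
                            extrasaction="ignore", quoting=csv.QUOTE_ALL,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Stand-ins for read_form_fields("blank.pdf") and read_form_fields("form1.pdf").
blank = {"name": "", "age": "", "state": "NY"}
filled = {"name": "Pat Smith", "age": "71", "state": "NY"}
print(to_csv([filled_in_only(blank, filled)], ["name", "age", "state"]))
```

The comparison against the blank form drops anything preprinted, such as a default value, so only what the applicant typed ends up in the file.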

--

Jon K Peck
[hidden email]