SearchWP

This Documentation is for SearchWP Version 3

PDF FAQs

For a detailed look at how PDF parsing works, see the KB article "How does PDF parsing and indexing work?"

PDF parsing is arguably the most advanced feature of SearchWP. PDFs can be generated in many different ways, and many of the resulting files don’t conform to any established PDF standard, which can cause SearchWP’s PDF parser to fail.

When SearchWP isn’t able to index a PDF, you will see a notice at the top of your screen similar to this one (the number varies):

SearchWP failed to index 4 posts

The notice includes a link that lists the problematic PDFs. For each one you can either copy the PDF’s text and paste it into the SearchWP File Content meta box on that file’s edit screen in the Media section of the WordPress Dashboard, or flag the PDF so that it is not indexed, using the Exclude UI or searchwp_exclude.

What size PDFs can SearchWP index?

There is no fixed file size limit that determines whether a PDF can be indexed. Every PDF is different, and parsing depends on both the internal structure of the PDF (how it was generated) and the limitations of the server (PHP memory limit, PHP time limit, and other server-level configuration). When a PDF fails to index, you have two options: populate the content manually by copying and pasting it from the PDF into the SearchWP File Content box on the Media edit screen, or try the Xpdf Integration.

How many pages can a PDF have to be indexed?

There is no hard limit on the number of pages a PDF can have in order to be parsed and indexed by SearchWP. As with file size, everything depends on how the PDF was generated and on the limitations of the server itself. When a PDF fails to index, you have two options: populate the content manually by copying and pasting it from the PDF into the SearchWP File Content box on the Media edit screen, or try the Xpdf Integration.
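
For long PDFs, copying the text by hand can be tedious. One way to get the text is to extract it on your own computer with the pdftotext command-line tool (shipped with Xpdf and Poppler) and then paste the result into the SearchWP File Content box. The Python sketch below is only an illustration and is not part of SearchWP or its Xpdf Integration. It assumes pdftotext is installed locally, and the function names build_pdftotext_command and extract_pdf_text are hypothetical helpers invented for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, binary="pdftotext"):
    """Return the argument list that asks pdftotext for UTF-8 text on stdout."""
    # "-enc UTF-8" sets the output encoding; the trailing "-" writes to stdout
    # instead of creating a .txt file next to the PDF.
    return [binary, "-enc", "UTF-8", str(Path(pdf_path)), "-"]


def extract_pdf_text(pdf_path, binary="pdftotext", timeout=60):
    """Run pdftotext and return the extracted text, or None if extraction fails."""
    if shutil.which(binary) is None:
        # Assumption: pdftotext must be on the PATH of the local machine.
        return None
    try:
        result = subprocess.run(
            build_pdftotext_command(pdf_path, binary),
            capture_output=True,
            timeout=timeout,
            check=True,
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # The PDF could not be read (or took too long); fall back to manual copying.
        return None
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_pdf_text("report.pdf") returns the text as a string, or None when the tool is missing or the file cannot be parsed. Scanned PDFs that contain only images will typically yield little or no text this way.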
