Xpdf Integration

Current version: 1.1.1

Download available with active license

Warning: This Extension requires PHP's exec() function to be available, and you must install Xpdf yourself by uploading its binary to a non-public location on your server.


SearchWP offers the unique feature of extracting plain text from PDF files uploaded to your WordPress website. Out of the box, SearchWP attempts to do this using only PHP, but because the PDF format is complex and varies widely, content is sometimes not extracted accurately. Enter Xpdf.

Xpdf is a command line utility that must be installed on your server in order for this Extension to work. Installation is simple, and instructions are included.

The Xpdf Integration Extension offloads all of the PDF processing that PHP would otherwise do to Xpdf, which is extremely fast and accurate when extracting content from your PDFs. After activating the Extension, follow the installation instructions. Once Xpdf is installed, SearchWP hands PDF content extraction over to Xpdf.
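To make the offloading concrete, here is a minimal standalone PHP sketch of how a script might call pdftotext through exec(). It is not the Extension's actual code, and the function names are hypothetical. It relies on pdftotext writing the extracted text to standard output when the output file is given as - and exiting with status 0 on success.

```php
<?php
// Illustrative sketch only, not the Extension's actual implementation.
// Both function names are hypothetical.

// Build the shell command: pdftotext <pdf> -   ("-" sends the text to stdout).
function my_build_pdftotext_command( $binary, $pdf ) {
	return escapeshellarg( $binary ) . ' ' . escapeshellarg( $pdf ) . ' -';
}

// Run pdftotext and return the extracted text, or false on failure.
function my_extract_pdf_text( $binary, $pdf ) {
	// Bail out early if exec() is disabled or the files are not usable.
	if ( ! function_exists( 'exec' ) || ! is_executable( $binary ) || ! is_readable( $pdf ) ) {
		return false;
	}

	$lines  = array();
	$status = 1;
	exec( my_build_pdftotext_command( $binary, $pdf ), $lines, $status );

	// A non-zero exit status means pdftotext reported an error.
	return 0 === $status ? implode( "\n", $lines ) : false;
}
```

Calling my_extract_pdf_text( '/path/to/pdftotext', '/path/to/file.pdf' ) would return the PDF's text as a single string, or false if anything went wrong. Both arguments are passed through escapeshellarg() so that paths containing spaces do not break the command.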


Installing Xpdf

Using this extension you can utilize Xpdf to extract the content from your PDFs.

IMPORTANT: Xpdf is not provided in this download. You must download Xpdf and upload it to a non-public location (outside your Web root).

Xpdf offers binary distributions for both Windows and Linux at http://www.foolabs.com/xpdf/download.html.

Installation

Once downloaded:

  1. Extract xpdfbin-linux-3.03.tar.gz (the version number may be different).
  2. Upload the pdftotext binary (found in either the bin32 or bin64 directory after extracting) to a non-public location, outside your Web root.
  3. Ensure the file's permissions allow the user your Web server runs as to execute it.

The last step is to tell SearchWP Xpdf Integration where you installed Xpdf. Add the following to your theme’s functions.php, replacing /path/to/pdftotext with the actual path to the pdftotext binary (not the folder) on your server.

function my_searchwp_xpdf_path() {
	return '/path/to/pdftotext'; // path to the binary NOT A FOLDER
}

add_filter( 'searchwp_xpdf_path', 'my_searchwp_xpdf_path' );

That’s it!
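If extraction does not work after this, the usual suspects are the ones the steps above warn about: a path that points at a folder, a binary that is not executable, a location inside the Web root, or a disabled exec(). The following standalone PHP sketch checks each of them. The helper name my_check_pdftotext_path is hypothetical and is not part of SearchWP or the Extension.

```php
<?php
// Hypothetical diagnostic helper; not part of SearchWP or the Extension.
// Returns an array of booleans, one per installation requirement.
function my_check_pdftotext_path( $path, $web_root ) {
	$real_path = realpath( $path );     // false when the path does not exist
	$real_root = realpath( $web_root ); // false when the Web root does not exist

	// True when the resolved path sits inside the resolved Web root.
	$inside_root = false !== $real_path
		&& false !== $real_root
		&& 0 === strpos( $real_path, rtrim( $real_root, DIRECTORY_SEPARATOR ) . DIRECTORY_SEPARATOR );

	return array(
		'is_file'          => false !== $real_path && is_file( $real_path ),
		'is_executable'    => false !== $real_path && is_executable( $real_path ),
		'outside_web_root' => false !== $real_path && ! $inside_root,
		'exec_available'   => function_exists( 'exec' ),
	);
}
```

Called as my_check_pdftotext_path( '/path/to/pdftotext', $_SERVER['DOCUMENT_ROOT'] ), every value in the returned array should be true for a working setup. Any false value names the requirement to revisit.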

See also: Adding PDF password support


Manually Testing Xpdf Integration

After uploading and activating the Xpdf Integration Extension and defining your path to pdftotext, you can manually confirm that Xpdf text extraction is working as expected on specific PDFs uploaded to your Media library. Begin by going to the SearchWP Settings screen (Settings > SearchWP) and finding the Xpdf Integration link in the Extensions menu.

On the Xpdf Integration Testing screen, enter the ID of the PDF you’d like to test.

To find the ID, navigate to your Media section and click the Edit link of your PDF. The ID is the number that follows post= in the URL.
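As an illustration of where the ID sits in that URL, the standalone PHP sketch below pulls it out of an edit link. The helper name and the example.com address are made up for the example, and the ID 123 is arbitrary.

```php
<?php
// Hypothetical helper: read the attachment ID from an Edit Media URL.
// Returns 0 when the URL has no numeric post= parameter.
function my_attachment_id_from_edit_url( $url ) {
	$query = parse_url( $url, PHP_URL_QUERY );
	if ( ! is_string( $query ) ) {
		return 0;
	}

	$args = array();
	parse_str( $query, $args );

	$is_numeric_id = isset( $args['post'] ) && is_string( $args['post'] ) && ctype_digit( $args['post'] );

	return $is_numeric_id ? (int) $args['post'] : 0;
}

// Example edit link (made-up site); prints 123.
echo my_attachment_id_from_edit_url( 'https://example.com/wp-admin/post.php?post=123&action=edit' );
```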

After submitting a valid ID, you will be given a detailed log of the steps taken by the Xpdf Integration Extension, including any points of failure that occurred. You’re also shown the exact content Xpdf extracted from the PDF.

If the log displays a point of failure, please include that in any support requests you submit.

Changelog

1.1.1

  • [New] New filter: searchwp_xpdf_command allowing manipulation of Xpdf command

1.1

  • [Improvement] Added support for auto-updates based on SearchWP license key

0.7.2

  • [Fix] Better handling of Windows directory separators

0.7

  • Initial release