Append Document Content to Parent Post Content
This article expands on Adding extra data to indexed entries.
SearchWP can transfer relevance weight in a number of directions. One direction is particularly applicable to Media: you can have SearchWP transfer weight for Media entries to their respective ‘Uploaded to’ post. In the database this relationship is established using the post_parent column.
When this weight transfer has been set up in the Engine settings, a search for content that appears in an attached document (a PDF, for example) returns the post to which the PDF is attached rather than the PDF itself.
Depending on your site and circumstances this can be very beneficial!
There are some edge cases, however, and weight transfer is not always something you want to apply across the board. We can take advantage of SearchWP’s adaptability and customize our implementation to instead:
- Disable Media from the SearchWP Engine entirely
- Hook into SearchWP’s indexer and retrieve any ‘child’ PDFs for posts as they’re indexed
This dynamic application allows us to better evaluate the situation on a post-by-post basis.
All hooks should be added to your custom SearchWP Customizations Plugin.
<?php

// @link https://searchwp.com/documentation/knowledge-base/append-document-content-to-parent-post-content/
// Retrieve child PDF content and add as 'extra' data to a SearchWP Entry.
add_filter( 'searchwp\entry\data', function( $data, \SearchWP\Entry $entry ) {
	// Convert the SearchWP Entry into its native object type.
	$entry = $entry->native();

	// We only want to consider WP_Post objects.
	if ( ! $entry instanceof \WP_Post ) {
		return $data;
	}

	// Retrieve PDFs that have been uploaded to this Entry.
	$pdfs = get_posts( [
		'post_type'      => 'attachment',
		'post_mime_type' => 'application/pdf',
		'post_status'    => 'inherit',
		'nopaging'       => true,
		'post_parent'    => $entry->ID,
	] );

	if ( empty( $pdfs ) ) {
		return $data;
	}

	// Retrieve PDF content for PDFs and store as extra data.
	$data['meta']['searchwp_child_pdf_content'] = array_map( function( $pdf ) {
		return \SearchWP\Document::get_content( $pdf );
	}, $pdfs );

	return $data;
}, 20, 2 );

// Add "Attached PDF Content" as available option to SearchWP Source Attributes.
add_filter( 'searchwp\source\attribute\options', function( $keys, $args ) {
	if ( $args['attribute'] !== 'meta' ) {
		return $keys;
	}

	// This key must be the same as the one used in the searchwp\entry\data hook above.
	$pdf_content_key = 'searchwp_child_pdf_content';

	// Add "Attached PDF Content" Option if it does not exist already.
	if ( ! in_array(
		$pdf_content_key,
		array_map( function( $option ) { return $option->get_value(); }, $keys )
	) ) {
		$keys[] = new \SearchWP\Option( $pdf_content_key, 'Attached PDF Content' );
	}

	return $keys;
}, 20, 2 );
These hooks tap into SearchWP’s indexing process. For each post (e.g. Posts, Pages, Custom Post Types) the first hook looks for any PDFs that have been ‘uploaded to’ that post (using the established post_parent relationship) and then parses each of those PDFs for content.
The content of all ‘child’ PDFs is stored as an extra Custom Field named searchwp_child_pdf_content. The second hook adds this extra Custom Field to the Custom Fields dropdown when managing post Attributes in your SearchWP Engine. It appears there as Attached PDF Content, which allows you to give the extracted PDF content its own relevance weight.
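Because the PDF content is attached per post, you can also decide post by post whether it should be kept. The following is a minimal sketch of one way to do that, not part of SearchWP itself: the helper name my_searchwp_keep_pdf_content, the my_exclude_pdf_content Custom Field, and the list of allowed post types are all assumptions made for illustration. It registers a second searchwp\entry\data callback at priority 30, so it runs after the priority 20 callback above and can drop the extra data again.

```php
<?php

// Hypothetical helper: decide whether attached PDF content should be kept for a post.
// The opt-out flag and the allowed post types are assumptions for illustration.
function my_searchwp_keep_pdf_content( string $post_type, bool $opted_out, array $allowed_types = [ 'post', 'page' ] ): bool {
	return ! $opted_out && in_array( $post_type, $allowed_types, true );
}

// Only register the hook inside WordPress, where add_filter() is available.
if ( function_exists( 'add_filter' ) ) {
	// Priority 30 runs after the priority 20 callback that adds the PDF content.
	add_filter( 'searchwp\entry\data', function( $data, \SearchWP\Entry $entry ) {
		$post = $entry->native();

		// Nothing to do for non-posts or posts without attached PDF content.
		if ( ! $post instanceof \WP_Post || ! isset( $data['meta']['searchwp_child_pdf_content'] ) ) {
			return $data;
		}

		// 'my_exclude_pdf_content' is a hypothetical Custom Field used as a per-post opt-out.
		$opted_out = (bool) get_post_meta( $post->ID, 'my_exclude_pdf_content', true );

		if ( ! my_searchwp_keep_pdf_content( $post->post_type, $opted_out ) ) {
			unset( $data['meta']['searchwp_child_pdf_content'] );
		}

		return $data;
	}, 30, 2 );
}
```

With this approach the PDFs are still parsed by the first callback before the content is discarded. If parsing cost matters on your site, the same check could instead be placed inside the first callback, before the get_posts() call.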
Note: In order for this change to take effect you will need to rebuild your index using the button on the Engines tab of the SearchWP settings screen.