Customizing (and verifying) document content
When SearchWP’s indexer processes documents, the extracted content is stored and subsequently indexed. You have full access to this content by navigating to the edit screen for any document within the Media library.
There are two views for Media: grid view (default) and list view.
Grid view
When viewing Media as a grid, locate and select your PDF to bring up the details modal. The sidebar of that modal contains a link titled Edit more details, which opens the edit screen.
List view
When using List view, click either the title or the Edit link as you would for any other post type.
SearchWP File Content
The indexed file content is displayed in the SearchWP File Content meta box.
You are free to customize this content by hand, and upon updating the post, SearchWP will give your edited version priority over the extracted content. This way you can make any edits you wish and SearchWP’s indexer will index it accordingly.
The content contained in the SearchWP File Content box is the content indexed by and searchable through SearchWP.
Supported File Formats
SearchWP will extract the text from many common file types including:
- Plain text
- CSV
- Rich text (RTF)
- PDFs (that have readable text*)
- Office Documents (.docx, .xlsx, .pptx, NOT .doc)
- OpenOffice Documents (.odt, .ods, .odp)
* To verify your PDF has readable text, try to copy a sentence to your clipboard and paste it somewhere. If you cannot select or paste it, the PDF does not have readable text.
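If you want to check a batch of files before uploading them, the list above can be turned into a simple extension filter. The following is a minimal Python sketch and is not part of SearchWP: the function name `is_supported_for_extraction` and the `SUPPORTED_EXTENSIONS` set are hypothetical, and `.txt` is assumed as the extension for plain text. The check looks only at the file name, so it cannot tell whether a PDF actually contains readable text.

```python
from pathlib import Path

# Extensions taken from the list of supported formats above.
# Assumption: plain text files use the ".txt" extension.
SUPPORTED_EXTENSIONS = {
    ".txt", ".csv", ".rtf", ".pdf",
    ".docx", ".xlsx", ".pptx",
    ".odt", ".ods", ".odp",
}


def is_supported_for_extraction(filename: str) -> bool:
    """Return True if the file's extension appears in the supported list.

    The comparison is case-insensitive, so "report.PDF" is accepted.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["report.PDF", "legacy.doc", "budget.xlsx"]:
        print(name, is_supported_for_extraction(name))
```

A file such as `legacy.doc` is rejected by this filter, which matches the note above that the older .doc format is not supported.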