Available since: 1.9
searchwp_term_pattern_whitelist
Note: Use of this hook will require a manual reindex.
SearchWP is a token-based indexer and search algorithm. That means that all content is tokenized and broken up by both whitespace and special characters. To avoid breaking apart specially formatted strings like SKUs, version numbers, and dates, SearchWP implements the idea of a Regex Whitelist. The Regex Whitelist is an array of regular expression patterns that the indexer uses to extract content before it gets tokenized.
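To make the effect of extracting before tokenizing concrete, here is a minimal sketch. It is not SearchWP's internal code: `naive_tokenize()` and `tokenize_with_whitelist()` are hypothetical helpers written only for this illustration, and the two patterns are borrowed from the default list on this page.

```php
<?php
// Hypothetical helper (not part of SearchWP): split on anything
// that is not a letter or a digit.
function naive_tokenize( $content ) {
	return preg_split( '/[^a-z0-9]+/i', $content, -1, PREG_SPLIT_NO_EMPTY );
}

// Hypothetical helper (not part of SearchWP): pull out whitelisted
// matches first, then tokenize whatever is left.
function tokenize_with_whitelist( $content, array $patterns ) {
	$kept = array();
	foreach ( $patterns as $pattern ) {
		if ( preg_match_all( $pattern, $content, $matches ) ) {
			// keep the full matches intact
			$kept = array_merge( $kept, $matches[0] );
			// remove them so looser patterns cannot match them again
			$content = preg_replace( $pattern, ' ', $content );
		}
	}
	return array_merge( $kept, naive_tokenize( $content ) );
}

$content  = 'Upgrade to 1.0.4 on 2014-06-01';
$patterns = array(
	'/([0-9]{4}-[0-9]{1,2}-[0-9]{1,2})/is', // date: YYYY-MM-DD
	'/([a-z0-9]+(?:\.[a-z0-9]+)+)/is',      // version numbers
);

// Without a whitelist the version and date are broken apart:
// Upgrade, to, 1, 0, 4, on, 2014, 06, 01
print_r( naive_tokenize( $content ) );

// With the whitelist they survive as single terms:
// 2014-06-01, 1.0.4, Upgrade, to, on
print_r( tokenize_with_whitelist( $content, $patterns ) );
```

The real indexer may differ in how it stores and weights extracted terms. The sketch only shows why a version number or date has to be captured before the content is split.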
The default regex patterns are as follows (they are ordered from most strict to least strict):
<?php
// THE DEFAULT SEARCHWP REGEX WHITELIST
private $term_pattern_whitelist = array(
	// these should go from most strict to most loose
	// functions
	"/(\\w+?)?\\(|[\\s\\n]\\(/is",
	// Date formats
	"/([0-9]{4}-[0-9]{1,2}-[0-9]{1,2})/is", // date: YYYY-MM-DD
	"/([0-9]{1,2}-[0-9]{1,2}-[0-9]{4})/is", // date: MM-DD-YYYY
	"/([0-9]{4}\\/[0-9]{1,2}\\/[0-9]{1,2})/is", // date: YYYY/MM/DD
	"/([0-9]{1,2}\\/[0-9]{1,2}\\/[0-9]{4})/is", // date: MM/DD/YYYY
	// IP
	"/(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})/is", // IPv4
	// initials
	"/\\b((?:[A-Za-z]\\.\\s{0,1})+)/isu",
	// version numbers: 1.0 or 1.0.4 or 1.0.5b1
	"/([a-z0-9]+(?:\\.[a-z0-9]+)+)/is",
	// serial numbers
	"/(\\b[-_]?[0-9a-zA-Z]+(?:[-_]+[0-9a-zA-Z]+)+[-_]?)/isu", // hyphen/underscore separator
	// strings of digits
	"/\\b(\\d{1,})\\b/is",
	// e.g. M&M, M & M
	"/\\b([[:alnum:]]+\\s?(?:&\\s?[[:alnum:]]+)+)/isu",
);
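The strict-to-loose ordering matters because some strings satisfy more than one pattern. As a small sketch, `matching_labels()` below is a hypothetical helper (not part of SearchWP) that reports which of two default patterns match a given string. The array keys are labels invented for this example.

```php
<?php
// Two of the default patterns, keyed by labels made up for this example.
$patterns = array(
	'ipv4'    => "/(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})/is",
	'version' => "/([a-z0-9]+(?:\\.[a-z0-9]+)+)/is",
);

// Hypothetical helper: return the label of every pattern that matches.
function matching_labels( $string, array $patterns ) {
	$labels = array();
	foreach ( $patterns as $label => $pattern ) {
		if ( preg_match( $pattern, $string ) ) {
			$labels[] = $label;
		}
	}
	return $labels;
}

// An IP address also looks like a version number: both patterns match.
print_r( matching_labels( '192.168.0.1', $patterns ) );

// A three-part version number only matches the looser pattern.
print_r( matching_labels( '1.0.4', $patterns ) );
```

Since `192.168.0.1` matches both, the more specific IPv4 pattern is listed first so that it gets the first chance to match.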
If you would like to modify the Regex Whitelist, add something like the following to your theme’s functions.php, retaining the most-strict-to-least-strict order:
<?php
function my_searchwp_term_pattern_whitelist( $whitelist ) {
	$my_whitelist = array(
		"/\\b(IT)\\b/u", // always keep "IT" (all caps only!)
	);
	// we want our pattern to be considered the most specific
	// so that false positive matches do not interfere
	$whitelist = array_merge( $my_whitelist, $whitelist );
	return $whitelist;
}
add_filter( 'searchwp_term_pattern_whitelist', 'my_searchwp_term_pattern_whitelist' );
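Before adding a custom pattern and reindexing, it can help to check what the pattern actually matches. The following is a standalone sketch that runs outside WordPress. The sample sentence is made up, and only the pattern itself comes from the snippet above.

```php
<?php
// The custom pattern from the filter example: no "i" modifier,
// so the match is case-sensitive.
$pattern = "/\\b(IT)\\b/u";

// Made-up sample content for testing.
$content = 'Our IT team said it was fixed by IT. See ITEM 4.';

preg_match_all( $pattern, $content, $matches );

// $matches[1] holds the captured group for each match.
// Lowercase "it" is skipped, and "ITEM" is skipped because
// of the \b word boundaries, leaving two matches of "IT".
print_r( $matches[1] );
```

If the pattern matches more or less than intended here, adjust it before adding it to the whitelist, since each change to the whitelist calls for another manual reindex.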
Parameters
Parameter | Type | Description |
---|---|---|
$patterns | Array | Regular expression patterns to match against, from most strict to least strict |