Function Fast_custom_index->tokenise_text
Definitions
sources/database_search.php
- Tokenise some text, so it can be indexed by token.
- Visibility: protected
- Is abstract?: No
- Is static?: No
- Is final?: No
- Returns: array
Parameters
Name | Type | Passed by reference? | Variadic? | Default | Set | Range | Description |
---|---|---|---|---|---|---|---|
$text | string | No | No | required parameter | N/A | N/A | The text |
$lang | LANGUAGE_NAME | No | No | required parameter | N/A | N/A | Language codename |
$ngrams_exclude | ?array | No | No | Null | N/A | N/A | A list of ngrams to explicitly exclude. Used internally to stop repetitions across multiple APPEARANCE_CONTEXTs, which is ultimately required to stop row repetition in output (null: none) |
&$total_singular_ngram_tokens | ?integer | Yes | No | Null | N/A | N/A | A count of singular ngrams (typically words) is maintained in this variable (null: do not maintain a count) |
&$statistics_map | ?array | Yes | No | Null | N/A | N/A | A map of singular ngrams (typically words) to number of occurrences, written into by the function (null: do not maintain a map) |
Returns
- Map between ngrams and number of occurrences
- Type: array
- Set: N/A
- Range: N/A
Preview
Code (PHP)
/**
 * Tokenise some text, so it can be indexed by token.
 *
 * @param string $text The text
 * @param LANGUAGE_NAME $lang Language codename
 * @param ?array $ngrams_exclude A list of ngrams to explicitly exclude (used internally to stop repetitions across multiple APPEARANCE_CONTEXTs, ultimately required to stop row repetition in output) (null: none)
 * @param ?integer $total_singular_ngram_tokens Maintain a count of singular ngrams (typically words) in here (null: do not maintain)
 * @param ?array $statistics_map Write into this map of singular ngram (typically, words) to number of occurrences (null: do not maintain a map)
 * @return array Map between ngrams and number of occurrences
 */
protected function tokenise_text(string $text, string $lang, ?array $ngrams_exclude = null, ?int &$total_singular_ngram_tokens = null, ?array &$statistics_map = null) : array
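The contract described above (an ngram-to-count map as the return value, an optional exclusion list, and two optional by-reference accumulators) can be illustrated with a small standalone sketch. This is not the Composr implementation: the function name `sketch_tokenise_text`, the `$max_ngram_size` parameter, the lowercase-and-split-on-non-alphanumerics rule, and the choice to leave the word statistics unaffected by the exclusion list are all assumptions made for illustration, and the `$lang` parameter is omitted because the language-specific behaviour is not documented here.

```php
<?php
// Hypothetical stand-in for illustration only; NOT the real Fast_custom_index::tokenise_text.
function sketch_tokenise_text(
    string $text,
    ?array $ngrams_exclude = null,
    ?int &$total_singular_ngram_tokens = null,
    ?array &$statistics_map = null,
    int $max_ngram_size = 2 // Assumed limit; the real method's ngram size is not documented here
): array {
    // Assumption: lowercase (ASCII only), then split on anything that is not a letter or digit
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $ngrams = [];
    $count = count($words);
    for ($i = 0; $i < $count; $i++) {
        // Only maintain the by-reference accumulators if the caller passed non-null variables
        if ($total_singular_ngram_tokens !== null) {
            $total_singular_ngram_tokens++;
        }
        if ($statistics_map !== null) {
            $statistics_map[$words[$i]] = ($statistics_map[$words[$i]] ?? 0) + 1;
        }

        // Build every ngram of 1..$max_ngram_size words starting at position $i
        for ($n = 1; $n <= $max_ngram_size && $i + $n <= $count; $n++) {
            $ngram = implode(' ', array_slice($words, $i, $n));
            if ($ngrams_exclude !== null && in_array($ngram, $ngrams_exclude, true)) {
                continue; // Explicitly excluded by the caller
            }
            $ngrams[$ngram] = ($ngrams[$ngram] ?? 0) + 1;
        }
    }

    return $ngrams;
}

// Usage: initialise the accumulators to non-null values to have them maintained
$total = 0;
$stats = [];
$ngrams = sketch_tokenise_text('The cat saw the cat', ['the cat'], $total, $stats);
// $ngrams: the => 2, cat => 2, "cat saw" => 1, saw => 1, "saw the" => 1
// $total:  5
// $stats:  the => 2, cat => 2, saw => 1
```

Passing initialised variables (`0` and `[]`) opts in to the bookkeeping, while omitting them leaves both accumulators as null and skips it, mirroring the "null: do not maintain" convention in the parameter table.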