Sphinx Fulltext Search

Sphinx fulltext search lets phpBB 3.1 use the Sphinx Open Source Search Server as its search backend. Using Sphinx improves both search and indexing performance, particularly on boards with large databases. Because the Sphinx server is both flexible and fast, it is a better alternative as a search backend.

Minimum Requirements
Sphinx Search server 2.0.1+ and a phpBB 3.1 board running on either a MySQL or a PostgreSQL database.

Sphinx Installation
Follow the Sphinx installation instructions. Only the actual installation is required; there is no need to follow the "Sphinx Quick Usage Tour" for phpBB search.

Sphinx Configuration
The contents of the Sphinx configuration file can either be generated through the ACP and pasted into sphinx.conf, or phpBB/docs/sphinx.sample.conf can be edited manually and used. The following directories need to be created and defined in sphinx.conf:
 * Config directory, which holds sphinx.conf and stopwords.txt (if defined).
 * Data directory, which holds the binary and index files.
 * Log directory, a subdirectory of the data directory, which holds all logs of the Sphinx search server.

Creating Required Directories
 * Data Directory: mkdir -p {DATA_PATH}
 * Log Directory: mkdir -p {DATA_PATH}/log
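
The two mkdir commands can be wrapped in a small helper. This is only a sketch: the function name make_sphinx_dirs is made up for illustration, and the data path is passed as an argument instead of the {DATA_PATH} placeholder.

```shell
# Create the Sphinx data directory and its log subdirectory.
# make_sphinx_dirs is an illustrative name, not part of phpBB or Sphinx.
make_sphinx_dirs() (
    data_path=${1:-}
    if [ -z "$data_path" ]; then
        echo "usage: make_sphinx_dirs DATA_PATH" >&2
        exit 1
    fi
    # -p creates missing parent directories and does not fail if the
    # directory already exists, so this also creates the data directory.
    mkdir -p "$data_path/log"
)
```

Called as make_sphinx_dirs /path/to/data, it leaves both /path/to/data and /path/to/data/log in place and can be run again safely.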

Indexing
The board administrator needs to select Sphinx Fulltext Search as the search backend and use Create Search Index in the ACP. This creates a SPHINX_TABLE in the database. The Sphinx indexer then has to be run manually from the shell.

 * Index Main: indexer --config {CONFIG_PATH}/sphinx.conf index_phpbb_{SPHINX_ID}_main >> {DATA_PATH}/log/indexer.log 2>&1 &
 * Index Delta: indexer --config {CONFIG_PATH}/sphinx.conf index_phpbb_{SPHINX_ID}_delta >> {DATA_PATH}/log/indexer.log 2>&1 &
 * Re-Index: indexer --rotate --config {CONFIG_PATH}/sphinx.conf index_phpbb_{SPHINX_ID}_delta >> {DATA_PATH}/log/indexer.log 2>&1 &
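
The indexer commands differ only in the index name and the --rotate flag, so they can be assembled from their parts. The sketch below only prints the command line and does not run indexer. The function name indexer_command and its argument order are assumptions made for illustration.

```shell
# Print the indexer command line for one phpBB Sphinx index.
# Arguments: CONFIG_PATH SPHINX_ID main|delta [rotate]
# indexer_command is an illustrative helper; it never runs indexer itself.
indexer_command() (
    config_path=${1:-}
    sphinx_id=${2:-}
    kind=${3:-}
    rotate=${4:-}
    case $kind in
        main|delta) ;;
        *) echo "kind must be 'main' or 'delta'" >&2; exit 1 ;;
    esac
    index="index_phpbb_${sphinx_id}_${kind}"
    if [ "$rotate" = "rotate" ]; then
        # --rotate is used when searchd is already serving the index.
        printf 'indexer --rotate --config %s/sphinx.conf %s\n' "$config_path" "$index"
    else
        printf 'indexer --config %s/sphinx.conf %s\n' "$config_path" "$index"
    fi
)
```

For example, indexer_command /etc/sphinx abc123 delta rotate prints the re-index command for a board whose Sphinx ID is abc123. Both values are placeholders.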

Test Sphinx
Test whether Sphinx is working. The following command returns the search results for the given search string:
search --config {CONFIG_PATH}/sphinx.conf search string

Incremental Updates
The crontab file on most Unix systems can be edited with crontab -e

Add this line to update the delta index every five minutes:
*/5 * * * * indexer --rotate --config {CONFIG_PATH}/sphinx.conf index_phpbb_{SPHINX_ID}_delta >> {DATA_PATH}/log/indexer.log 2>&1 &

Add this line to set up a cron job for a full index once every night:
0 3 * * * indexer --rotate --config {CONFIG_PATH}/sphinx.conf index_phpbb_{SPHINX_ID}_main >> {DATA_PATH}/log/indexer.log 2>&1 &
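
To avoid typing the paths twice, the two crontab entries can be generated from the config path, data path and Sphinx ID. This is a sketch under the assumption that both jobs log to the same indexer.log. The function name sphinx_cron_lines is hypothetical.

```shell
# Print the two crontab entries for the delta and main indexes.
# Arguments: CONFIG_PATH DATA_PATH SPHINX_ID
# sphinx_cron_lines is an illustrative name; the output is meant to be
# reviewed and pasted into the editor opened by crontab -e.
sphinx_cron_lines() (
    config_path=${1:-}
    data_path=${2:-}
    sphinx_id=${3:-}
    log="$data_path/log/indexer.log"
    # Delta index: every five minutes.
    printf '*/5 * * * * indexer --rotate --config %s/sphinx.conf index_phpbb_%s_delta >> %s 2>&1 &\n' \
        "$config_path" "$sphinx_id" "$log"
    # Main index: once every night at 03:00.
    printf '0 3 * * * indexer --rotate --config %s/sphinx.conf index_phpbb_%s_main >> %s 2>&1 &\n' \
        "$config_path" "$sphinx_id" "$log"
)
```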

Start Searchd
Start the Sphinx daemon:
searchd --config {CONFIG_PATH}/sphinx.conf >> {DATA_PATH}/log/searchd-startup.log 2>&1 &

Troubleshooting
Log files present in the {DATA_PATH}/log/ directory can be checked for errors. See Sphinx Documentation for details.
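
A quick way to check the logs is to search them for problem keywords. The sketch below assumes that relevant lines contain ERROR, FATAL or WARNING; that keyword list is a guess and may need adjusting. The function name scan_sphinx_logs is hypothetical.

```shell
# Print lines that look like problems from every *.log file in a directory.
# The keyword list (ERROR, FATAL, WARNING) is an assumption.
scan_sphinx_logs() (
    log_dir=${1:-}
    if [ ! -d "$log_dir" ]; then
        echo "no such directory: $log_dir" >&2
        exit 1
    fi
    # -H prefixes each match with its file name, -n with its line number.
    # grep exits non-zero when nothing matches, which is not an error here.
    grep -HnE 'ERROR|FATAL|WARNING' "$log_dir"/*.log 2>/dev/null || true
)
```

Running scan_sphinx_logs {DATA_PATH}/log with the real data path prints matches as file:line:text, or nothing if the logs look clean.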

Manual Configuration
A sample Sphinx config file for the phpBB Sphinx search backend is available at phpBB/docs/sphinx.sample.conf. Its options include the database details as well as the directories for the Sphinx data and config files.

Database Details
Connection details for the database that the board runs on and that the Sphinx daemon indexes.
 * type - database type, default mysql.
 * sql_host - hostname, default localhost
 * sql_user - database user name
 * sql_pass - database password
 * sql_port - database port, default 3306 for mysql
 * sql_db - database name
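
The database details end up in a source block of sphinx.conf. The sketch below prints such a block from its arguments, using the defaults listed above. It assumes that the database name is set with Sphinx's sql_db directive. The function name render_source_block and the source name in the usage note are placeholders, not values taken from the sample file.

```shell
# Print a sphinx.conf source block from the database details.
# Arguments: SOURCE_NAME USER PASS DB_NAME [TYPE] [HOST] [PORT]
# render_source_block is an illustrative helper, not part of phpBB.
render_source_block() (
    name=${1:-}
    user=${2:-}
    pass=${3:-}
    db=${4:-}
    type=${5:-mysql}      # default database type
    host=${6:-localhost}  # default hostname
    port=${7:-3306}       # default port for mysql
    cat <<EOF
source $name
{
    type = $type
    sql_host = $host
    sql_user = $user
    sql_pass = $pass
    sql_db = $db
    sql_port = $port
}
EOF
)
```

For example, render_source_block source_example phpbb_user secret phpbb prints a block for a MySQL database on localhost, port 3306.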

Searchd Details

 * listen - IP address and port of the Sphinx daemon in the form address:port, default 127.0.0.1:3312
 * read_timeout - network client request read timeout in seconds, default 5
 * max_children - maximum number of children to fork (concurrent searches to run in parallel), default 30
 * max_matches - maximum number of matches the daemon keeps in memory for each index and can return to the client, default 20000
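
These four settings belong to the searchd block of sphinx.conf. The following sketch prints that block with the defaults listed above and rejects a listen value whose port part is not numeric. The function name render_searchd_block is hypothetical, and the check is deliberately simple.

```shell
# Print a searchd block using the defaults listed above.
# Argument: [LISTEN], given as ADDRESS:PORT
# render_searchd_block is an illustrative helper, not part of phpBB.
render_searchd_block() (
    listen=${1:-127.0.0.1:3312}
    # Everything after the last colon is treated as the port.
    port=${listen##*:}
    case $port in
        ''|*[!0-9]*) echo "listen must end in a numeric port" >&2; exit 1 ;;
    esac
    cat <<EOF
searchd
{
    listen = $listen
    read_timeout = 5
    max_children = 30
    max_matches = 20000
}
EOF
)
```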

Wildcard searching
By default, wildcard searching is DISABLED and the * operator will not work. To enable wildcard searching, configure the following parameters:


 * ignore_chars - characters (given as Unicode code points) that are ignored and stripped from the search index, default none. For example, ignore_chars = U+00AD, U+002D joins hyphenated words into a single word, so "re-establish" is indexed as "reestablish". Ignored characters cannot also be listed in charset_table.
 * min_prefix_len - minimum prefix length to index, default 0 (wildcards disabled). A value greater than 0 enables partial word matches with the wordstart* wildcard. Suggested value: 3 (tes* will find test, tested, testing, etc.).
 * min_infix_len - minimum infix length to index, default 0 (wildcards disabled). A value greater than 0 enables partial word matches with the 'start*', '*end', and '*middle*' wildcards. Suggested value: 3 (*est* will find test, tested, testing, estimated, shortest, etc.).

NOTE: Use only one of min_prefix_len and min_infix_len, not both, and set the unused parameter to 0. Enabling wildcard indexing increases the size of the search index.
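
The rule in the note can be checked mechanically. The sketch below reads both values from a config file and fails if both are greater than 0. It is a simplification: it looks at the last assignment of each key in the whole file, so a file with several index sections would need a more careful parser. The function name check_wildcard_settings is hypothetical.

```shell
# Succeed if at most one of min_prefix_len / min_infix_len is non-zero.
# Argument: path to a sphinx.conf file
# Simplification: uses the last assignment of each key in the whole file.
check_wildcard_settings() (
    conf=${1:-}
    get_len() {
        # Extract the numeric value of "key = N"; a missing key counts as 0.
        value=$(sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p" "$conf" | tail -n 1)
        echo "${value:-0}"
    }
    prefix=$(get_len min_prefix_len)
    infix=$(get_len min_infix_len)
    if [ "$prefix" -gt 0 ] && [ "$infix" -gt 0 ]; then
        echo "both min_prefix_len ($prefix) and min_infix_len ($infix) are set" >&2
        exit 1
    fi
    echo "min_prefix_len=$prefix min_infix_len=$infix"
)
```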

Stopwords
The Sphinx config file provides an option for specifying a file that contains search stop words. Stop words are common words such as 'a' and 'the' that appear frequently in text and should be ignored when searching. A fairly complete list of English stop words can be found [# here]. These words can be copied into a text file and added to sphinx.conf in the index_phpbb section as stopwords = path/to/stopwords.txt
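
A copied word list often contains mixed case and duplicates. The sketch below tidies such a list into one lowercase word per line. The function name normalize_stopwords is hypothetical, and lowercasing is an assumption about how the list should be stored.

```shell
# Read words from standard input in any whitespace-separated layout and
# print a lowercase, de-duplicated, sorted list with one word per line.
normalize_stopwords() {
    tr -s '[:space:]' '\n' |
        tr '[:upper:]' '[:lower:]' |
        sed '/^$/d' |
        sort -u
}
```

For example, normalize_stopwords < copied-list.txt > stopwords.txt writes the cleaned list to the file named by the stopwords option. Both file names are placeholders.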