Lucene PHP - Important Notes
Sunday, August 16th, 2009
*** Lucene - Use integers ***
The book ‘Practical Symfony’ (version jobeet-1.2-propel-en-2009-02-05) is not accurate on two points:
1) the primary key should be defined as a Keyword
$doc->addField(Zend_Search_Lucene_Field::Keyword('pk', $this->getId()));
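2) By default, Lucene can’t find numbers in its index (and a primary key is a number!), because the default analyzer treats only letters as tokens. So, *before any* call to the find method that searches for a primary key, we need to set a default analyzer that also handles numbers: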
$currentWorkingDirectory = getcwd();
chdir(sfConfig::get('sf_root_dir') . "/lib/vendor");
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive());
chdir($currentWorkingDirectory);
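The save, chdir, restore sequence above can be wrapped so that the working directory is put back even when the wrapped code throws. This is only a minimal sketch: `withWorkingDirectory` is a hypothetical helper name, not part of Symfony or Zend Framework, and the demo uses the system temp directory so it runs without either library.

```php
<?php
// Hypothetical helper (not a Symfony or Zend API): run $callback with
// $directory as the current working directory, then restore the old one.
function withWorkingDirectory($directory, callable $callback)
{
    $previous = getcwd();
    if ($previous === false || !chdir($directory)) {
        throw new RuntimeException("Cannot change directory to $directory");
    }
    try {
        return $callback();
    } finally {
        // Runs on normal return and on exceptions alike.
        chdir($previous);
    }
}

// Stand-alone demo: the callback sees the temp directory as its cwd.
$seenInside = withWorkingDirectory(sys_get_temp_dir(), function () {
    return getcwd();
});
echo "Inside callback: $seenInside\n";
echo "After callback:  " . getcwd() . "\n";

// In a project laid out as in the notes above, the call would look like:
// withWorkingDirectory(sfConfig::get('sf_root_dir') . '/lib/vendor', function () {
//     Zend_Search_Lucene_Analysis_Analyzer::setDefault(
//         new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive()
//     );
// });
```

The `finally` block requires PHP 5.5 or later; on older versions the explicit four-line form shown above is the safer choice.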
*** Lucene Character Escaping ***
Lucene escapes special characters automatically, so you only need to make sure that the query coming from users doesn’t contain the character that you use to build the query internally in PHP (i.e. the double quote):
$userQuerySector = str_replace('"', ' ', $querySector);
$luceneSubqueriesArray[] = " (category:\"$userQuerySector\" OR title:\"$userQuerySector\") ";
That means you don’t need to implement any other data sanitization.
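As a minimal sketch of the quote-stripping step, here it is wrapped in a function so it can be run without Lucene. The name `buildSectorSubquery` and the sample input are assumptions introduced only for illustration; the field names `category` and `title` come from the snippet above.

```php
<?php
// Hypothetical helper: build one Lucene subquery string from raw user input.
function buildSectorSubquery($querySector)
{
    // Replace the double quote with a space: it is the only character
    // used as a delimiter when the query string is assembled below.
    $userQuerySector = str_replace('"', ' ', $querySector);

    return " (category:\"$userQuerySector\" OR title:\"$userQuerySector\") ";
}

$luceneSubqueriesArray = array();
// Sample input containing double quotes that would otherwise close the phrase early.
$luceneSubqueriesArray[] = buildSectorSubquery('web "design"');

// The subquery now contains exactly the four double quotes added by the builder.
echo $luceneSubqueriesArray[0], "\n";
```

Keeping the replacement and the string building in one place makes it harder to pass the unsanitized variable into the query by mistake.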

