Tuesday, September 16, 2014

Full text Search Using MongoDB and Pymongo

MongoDB has supported text indexes for searching string content in documents since the 2.3.2 development release (the corresponding production release is 2.4). A text index can include any field whose value is a string or an array of strings. In this post I will show how to create and use text indexes in MongoDB, from the mongo console and from pymongo, to run a full text search.
Features of MongoDB full text search:
  • Full text search is available as an index type when creating new indexes.
  • Advanced queries such as negation and phrase matching are supported.
  • Multiple fields can be indexed, with weights to give some fields higher priority.
  • Stop words are ignored.
  • Stemming is applied, to deal with plurals.
  • Multiple languages are supported, initially Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish.
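
The negation and phrase features are driven by the search string itself: a phrase is wrapped in double quotes and a term prefixed with a hyphen is excluded. The sketch below only builds the filter document for a MongoDB 2.6 `$text` query; the helper names and the sample terms are my own, not part of MongoDB or pymongo.

```python
def build_search_string(terms, phrases=(), excluded=()):
    """Combine plain terms, exact phrases and negated terms into one string."""
    parts = list(terms)
    # An exact phrase is wrapped in double quotes.
    parts += ['"{0}"'.format(phrase) for phrase in phrases]
    # A leading hyphen excludes documents that contain the term.
    parts += ["-{0}".format(term) for term in excluded]
    return " ".join(parts)


def build_text_filter(terms, phrases=(), excluded=(), language=None):
    """Build a $text filter document (MongoDB 2.6 and later)."""
    text = {"$search": build_search_string(terms, phrases, excluded)}
    if language is not None:
        # Optional: choose the stemmer and stop word list for this query.
        text["$language"] = language
    return {"$text": text}


# Hypothetical sample terms, for illustration only.
query = build_text_filter(["correct"], phrases=["which one"], excluded=["physics"])
print(query)
```

On MongoDB 2.4 the same search string can be passed as the `search` argument of the `text` command instead.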


Creating MongoDB text Indexes

The first step in creating a MongoDB text index is to enable text search. In MongoDB 2.4 it is disabled by default (from 2.6 onward it is enabled by default), and you can enable it with the following command in the mongo console.

Enabling Index Support

use <DB_NAME>
db.adminCommand ({ setParameter : "*", textSearchEnabled : true });
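
The same parameter can probably be set from pymongo as well. This is a minimal sketch, assuming a MongoDB 2.4 server and a connected `MongoClient`; the function name is my own, and the host and port in the usage note are placeholders.

```python
def enable_text_search(client):
    """Turn on text search at runtime (only needed on MongoDB 2.4)."""
    # Sends the setParameter command with textSearchEnabled=true
    # to the admin database and returns the server's reply.
    return client.admin.command("setParameter", 1, textSearchEnabled=True)
```

Usage would look like `enable_text_search(MongoClient("localhost", 27017))`.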

After this we can insert data into the database and start using the text index for search. Here I use a database of exam questions: I create an index on the question text and then search the questions. The structure of a document in the "question" collection is as follows:


{
    "_id" : ObjectId("53a71fb3421aa9422f49ac8c"),
    "answer" : {
        "a" : {
            "text" : "$ sin1^{0} > sin1 $",
            "image" : ""
        },
        "b" : {
            "image" : "",
            "text" : "$ sin1^{0} < sin1 $"
        },
        "c" : {
            "image" : "",
            "text" : "$ sin1^{0} = sin1 $"
        },
        "correct" : "b",
        "d" : {           
            "image" : "",
            "text" : "$ sin1^{0} = \\frac {\\pi}{180} sin1 $"
        }
    },
    "exam_code" : 201,
    "exam_type" : "ENGINEERING",
    "marks" : 1,
    "question" : {
        "html" : "Which one of the following is correct?",
        "image" : "",
        "text" : "Which one of the following is correct?"
    },
    "question_number" : 1,
    "subject" : "mathematics"
}

Creating the Index

We will create a text index on the text field of question in the question collection, using the following command in the mongo console:

db.question.ensureIndex({ "question.text": "text" });

The index created above is a single-field index; we can also create a compound index on the collection.
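
As one possible way to index more than one field from pymongo, the sketch below builds a text index over two fields with weights. The weights (5 and 1), the choice of the `subject` field and the helper names are assumptions for illustration; `collection` is expected to be a pymongo collection such as `db.question`.

```python
def weighted_text_index(weights, language="english"):
    """Return (keys, options) describing a text index over several fields."""
    # Every weighted field becomes a "text" key of the index.
    keys = [(field, "text") for field in sorted(weights)]
    options = {"weights": dict(weights), "default_language": language}
    return keys, options


def create_question_index(collection):
    """Create the weighted index on a pymongo collection."""
    # Hypothetical weighting: a match in the question text counts
    # five times as much as a match in the subject.
    keys, options = weighted_text_index({"question.text": 5, "subject": 1})
    return collection.create_index(keys, **options)
```

MongoDB allows only one text index per collection, so an existing text index has to be dropped before a different one can be created.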

Using the index 

Now our index is ready to be used. For pymongo with MongoDB version 2.4:
from pymongo import MongoClient
client = MongoClient(<host>, <port>)
db = client["db name"]
# fields is a projection document; limit is the maximum number of results
result = db.command("text", "question", search="query text", project=fields, limit=limit)

Command execution in the mongo console:
result = db.question.runCommand("text", { search: "query text", project: fields, limit: limit })
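
The reply of the 2.4 `text` command is a single document rather than a cursor; the matches sit in its `results` array, each entry holding a `score` and the matched document under `obj`. A small sketch for unpacking it follows. The helper name, the `min_score` cut-off and the sample reply are made up for illustration.

```python
def extract_hits(result, min_score=0.0):
    """Return (score, document) pairs from a 2.4 "text" command reply."""
    hits = []
    for entry in result.get("results", []):
        # Each entry carries the relevance score and the matched document.
        if entry["score"] >= min_score:
            hits.append((entry["score"], entry["obj"]))
    return hits


# Made-up reply in the shape described above, not real server output.
sample = {
    "results": [
        {"score": 1.5, "obj": {"question_number": 1}},
        {"score": 0.6, "obj": {"question_number": 7}},
    ],
    "ok": 1.0,
}
print(extract_hits(sample, min_score=1.0))
```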

For pymongo with MongoDB version 2.6:

client = MongoClient(<host>, <port>)
db = client[db_name]
# the second argument is the projection; here it hides _id
result = db.question.find({"$text": {"$search": "query text"}}, {"_id": 0}).limit(10)

Command execution in the mongo console:
result = db.question.find({ "$text": { "$search": "query text" } }, { "_id": 0 }).limit(10)

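Using limit we can cap the number of documents returned for a search query. The 2.4 text command returns its results ordered by score, so the limit yields the top matches; a $text query in 2.6 returns matches in no particular order unless it is sorted on the textScore metadata.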
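
To get the best matches first with `$text`, the query can project and sort on the `textScore` metadata before the limit is applied. This is a minimal sketch, assuming MongoDB 2.6 or later and a pymongo collection that has a text index; the function name and the `score` field name are my own choices.

```python
def ranked_text_search(collection, query_text, limit=10):
    """Return up to `limit` matches for query_text, best match first."""
    score = {"$meta": "textScore"}
    cursor = collection.find(
        {"$text": {"$search": query_text}},
        {"_id": 0, "score": score},  # expose the relevance score, hide _id
    )
    # Sorting on the projected score puts the most relevant documents first.
    return list(cursor.sort([("score", score)]).limit(limit))
```

It would be called as `ranked_text_search(db.question, "query text", limit=10)`.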