Thursday, April 30, 2015

MongoDB Java Driver Breaks the Principle of Least Astonishment

I have been going through the MongoDB University introductory course for Java Developers these past few weeks and am happy to report that I have learned quite a few things about MongoDB.

The course I'm taking is pretty decent as an intro and it's free, so be sure to check it out if you've been wanting to learn a little more about MongoDB. The video lectures are short and sweet, averaging about two to three minutes each, with only a few that go a little longer but no more than ten minutes or so. Each week there are a dozen or so such lectures, with each lecture usually ending in a quick quiz that doesn't factor into your final course grade.

Despite its title, the course, designated as M101J, doesn't focus on using MongoDB from Java as much as it does on introducing MongoDB concepts in general. In fact, most of the examples shown in the lectures are done in the MongoDB shell, which takes JavaScript commands, and there are even a few mentions of PyMongo, the MongoDB driver for Python, in the Week 6 lectures.

The session I'm in right now ends on May 5 and all I have left to do before I complete this course and, hopefully, earn a certificate of completion (yay!) is to score at least 65% on the final exam. Actually, I could get less and still qualify for a certificate since I pretty much aced all the homework but I'm not the kind to set the bar low like that.

Anyway, to the point of this post: I discovered that the MongoDB Java Driver has violated the Principle of Least Astonishment in the MongoCollection.insertOne() method.

I came across the surprising behavior in the MongoDB Java Driver as I was answering one of the final exam questions in the M101J course. I am using the most current version of the driver, 3.0.0, which was released a few weeks ago as of this writing (April 2015).

There doesn't appear to be any equivalent of MongoCollection.insertOne() in the Mongo shell, which is fine. That's not a biggie. To add a new document to a collection in the Mongo shell, you would just use the collection.insert() method. Figure 1 below shows an example of how that's done with the fubar collection in the test database.


Figure 1. Inserting a document in Mongo Shell

$ mongo

> use test
switched to db test

> doc = {a:5, b:5, c:5555}
{ "a" : 5, "b" : 5, "c" : 5555 }

> db.fubar.insert(doc)
WriteResult({ "nInserted" : 1 })

> doc
{ "a" : 5, "b" : 5, "c" : 5555 }

> db.fubar.find({a:5})
{ "_id" : ObjectId("5542234eb8aa3c1b4dd7b89b"), "a" : 5, "b" : 5, "c" : 5555 }

The Mongo Shell commands are pretty straightforward and the results are just what you would expect. Notice that after I successfully insert the document and display it again, nothing in the document changes. "Duh," you might say, "Why would it?" Exactly. When I run find() on the fubar collection to retrieve the document I just inserted, the result has an extra key that I didn't specify in the document I passed to the insert method. Again, this is not surprising since MongoDB automatically adds an _id key with a unique value to any document you insert without one.

Now here's where the Mongo Shell and the MongoDB Java Driver diverge in behavior. Figure 2 below shows what happens when I use the same doc variable to insert yet another document. The command succeeds, doc is unchanged as before, and when I do a find, I find that I now have two documents which are identical except for their _id. Again, nothing really surprising about this.


Figure 2. Inserting a Document multiple times

// continuing from before...

> doc
{ "a" : 5, "b" : 5, "c" : 5555 }

> db.fubar.insert(doc)
WriteResult({ "nInserted" : 1 })

> doc
{ "a" : 5, "b" : 5, "c" : 5555 }

> db.fubar.find({a:5})
{ "_id" : ObjectId("5542234eb8aa3c1b4dd7b89b"), "a" : 5, "b" : 5, "c" : 5555 }
{ "_id" : ObjectId("55422366b8aa3c1b4dd7b89c"), "a" : 5, "b" : 5, "c" : 5555 }

The MongoDB Java Driver behavior for inserting documents is quite different though and this is very surprising. Figure 3 shows how you would insert a document into the same collection.


Figure 3. Inserting a Document in Java

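import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class InsertOneDemo {
    public static void main(String[] args) {
        // Connects to a mongod running on localhost with the default port
        MongoClient c = new MongoClient();
        MongoDatabase db = c.getDatabase("test");
        MongoCollection<Document> fubar = db.getCollection("fubar");

        Document foo = new Document("a", 5)
                .append("b", 5)
                .append("c", 555);

        // Print the document before and after the insert to see what changes
        System.out.println("Before: " + foo);
        fubar.insertOne(foo);
        System.out.println("After: " + foo);
    }
}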

The console output from running this Java code looks something like this:

Before: Document{{a=5, b=5, c=555}}
After: Document{{a=5, b=5, c=555, _id=5542351b5f6987eb755b795a}}

This is quite surprising since the driver mutates the Document that you passed in as a parameter to the insertOne method. What's more, the insertOne method does not return anything: it's declared as a void method. This spells trouble for the unwary Java developer who has come to expect otherwise from experience with the Mongo Shell. There are a couple of consequences of this design flaw, neither of which is good.

First, you can't readily use the same Document instance as a template to insert multiple documents into a collection. That is, you can't just initialize a bunch of keys with common values and then loop, changing only the keys that vary. Doing this results in a MongoWriteException with a message telling you that a duplicate key was found in the _id index. Second, to work around this problem, you need to remove the _id key before each subsequent call to insertOne with the same document instance, as shown in Figure 4.


Figure 4. Avoiding duplicate key error with MongoDB for Java

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class InsertTwiceDemo {
    public static void main(String[] args) {
        MongoClient c = new MongoClient();
        MongoDatabase db = c.getDatabase("test");
        MongoCollection<Document> fubar = db.getCollection("fubar");

        Document foo = new Document("a", 5)
                .append("b", 5)
                .append("c", 555);

        System.out.println("Before: " + foo);
        fubar.insertOne(foo);
        System.out.println("After: " + foo);

        // fubar.insertOne(foo);  <== duplicate _id error!

        // Drop the _id the driver added so the next insert gets a fresh one
        foo.remove("_id");
        foo.remove("a");
        foo.append("a", 55);

        System.out.println("Before: " + foo);
        fubar.insertOne(foo);
        System.out.println("After: " + foo);
    }
}

The output of this program looks something like this:

Before: Document{{a=5, b=5, c=555}}
After: Document{{a=5, b=5, c=555, _id=5542392f5f6987019de3a75b}}
Before: Document{{b=5, c=555, a=55}}
After: Document{{b=5, c=555, a=55, _id=5542392f5f6987019de3a75c}}

Not nice, MongoDB Java Driver!
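
An alternative to stripping the _id by hand would be to leave the template alone and insert a fresh copy each time. Here is a minimal sketch of that idea that runs without a database. Plain maps stand in for Document, and fakeInsertOne() is a made-up stand-in that only imitates the mutation I saw above. None of these names come from the driver. With the real driver I would expect new Document(template) to play the role of the copy, since Document can be constructed from a Map, but I have not verified that variant here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Stand-in sketch only: maps play the role of org.bson.Document, and
// fakeInsertOne() is a hypothetical imitation of the mutating insert.
class TemplateCopySketch {

    // Hypothetical in-memory "collection"
    static final List<Map<String, Object>> stored = new ArrayList<>();

    // Imitates the observed behavior: adds an _id to the caller's object
    // and rejects a document whose _id has already been stored.
    static void fakeInsertOne(Map<String, Object> doc) {
        if (!doc.containsKey("_id")) {
            doc.put("_id", UUID.randomUUID().toString());
        }
        for (Map<String, Object> existing : stored) {
            if (existing.get("_id").equals(doc.get("_id"))) {
                throw new IllegalStateException("duplicate key: _id");
            }
        }
        stored.add(new LinkedHashMap<>(doc));
    }

    // Copies the template, sets the one key that varies, inserts the copy.
    static void insertFromTemplate(Map<String, Object> template,
                                   String key, Object value) {
        Map<String, Object> copy = new LinkedHashMap<>(template);
        copy.put(key, value);
        fakeInsertOne(copy);
    }

    public static void main(String[] args) {
        Map<String, Object> template = new LinkedHashMap<>();
        template.put("b", 5);
        template.put("c", 555);

        for (int a : new int[] {5, 55, 555}) {
            insertFromTemplate(template, "a", a);
        }

        System.out.println("Template after inserts: " + template);
        System.out.println("Documents stored: " + stored.size());
    }
}
```

Because only the copies are ever handed to the insert, the template never picks up an _id and the loop needs no cleanup step. The cost is one extra shallow copy per insert.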

In my opinion, it would have been more reasonable for the insertOne method to instead return the Document that was inserted, the one with the _id key in it. The original Document passed in as a parameter should not experience any change at all. This would have been the least astonishing behavior for this method and the behavior most symmetrical with the MongoDB shell behavior for inserting documents.

(Update #1) Upon further investigation, I found that this is the behavior in the Mongo Shell:

Figure 5. Value returned by collection.insert() in Mongo Shell

> newdoc = db.fubar.insert(doc)
WriteResult({ "nInserted" : 1 })

> newdoc
WriteResult({ "nInserted" : 1 })

> typeof newdoc
object

Which means that the insert method in the Mongo Shell doesn't return the inserted document either. No biggie, I suppose that's a reasonable design choice, too. So, I guess if you really wanted to align the MongoDB Java Driver with this behavior and adhere to the Principle of Least Astonishment, you wouldn't return the inserted Document as I suggested above but instead return some kind of result object, like a WriteResult maybe.
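
To make that idea a bit more concrete, here is a rough sketch of the shape I have in mind: an insert that leaves the caller's document untouched and reports what happened through a small result object. Everything in it is hypothetical. InsertResult, the in-memory store, and this insertOne are my own illustration using plain maps, not the driver's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: none of these types belong to the MongoDB Java Driver.
class InsertResultSketch {

    // Made-up result type, loosely modeled on the shell's WriteResult
    static final class InsertResult {
        final int insertedCount;
        final Object insertedId;

        InsertResult(int insertedCount, Object insertedId) {
            this.insertedCount = insertedCount;
            this.insertedId = insertedId;
        }

        @Override
        public String toString() {
            return "InsertResult{nInserted=" + insertedCount
                    + ", insertedId=" + insertedId + "}";
        }
    }

    // Hypothetical in-memory "collection", keyed by _id
    static final Map<Object, Map<String, Object>> store = new LinkedHashMap<>();

    // Works on a private copy so the caller's map is never modified,
    // then hands back the generated _id inside the result object.
    static InsertResult insertOne(Map<String, Object> doc) {
        Map<String, Object> copy = new LinkedHashMap<>(doc);
        Object id = copy.computeIfAbsent("_id", k -> UUID.randomUUID().toString());
        if (store.putIfAbsent(id, copy) != null) {
            throw new IllegalStateException("duplicate key: _id");
        }
        return new InsertResult(1, id);
    }

    public static void main(String[] args) {
        Map<String, Object> foo = new LinkedHashMap<>();
        foo.put("a", 5);
        foo.put("b", 5);
        foo.put("c", 555);

        InsertResult first = insertOne(foo);
        InsertResult second = insertOne(foo); // same object, no duplicate key

        System.out.println("Caller's document: " + foo);
        System.out.println(first);
        System.out.println(second);
    }
}
```

With this shape the caller's document stays exactly as it was, so reusing it works the way it does in the shell, and anyone who needs the generated _id can read it from the result.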

(Update #2) Looking at the PyMongo Driver, the insert_one method returns a pymongo.results.InsertOneResult object, so something similar in the Java driver would make more sense.