Wednesday, March 2, 2011

Translating Many to Many relationship in a relational database to schema-less structure like Solr

In a relational schema many to many relationships are generally implemented using intersection tables. Lets say you have two entities Cases and Actions that have many to many relationship. Relationally, you'd create a Case_Action table that stores action_id and case_id. Now, what if you have to represent this relationship in a schema-less data-store like Solr? To start, you'd try to index so that you can get a document of type Case which stores the list of Case fields. This will be your base entity.

(NOTE: In Solr the term document is equivalent to a database row)
Well, there are several ways of doing this:
1. OPTION A: Create a multivalued field called action_id_list that lists all actions for a particular case. Populate this field for every document of type Case.
2. OPTION B: Create a multivalued field called case_id_list that lists all cases for a particular action. Populate this field for every document of type Action.
3. OPTION C: Don't create a multivalued field, instead, create multiple action documents for each case that the action is related to. This means your id will be a composite string of action_id and case_id

The option you choose depends on how you want to access the entities in a flattened structure. Option A and Option B are cleaner and easier to implement than C. But they have a limitation.
  • Option B should be used only when an action document is expected to store only the case_id and no other case related field. Otherwise, an extra query needs to fetch the required fields from a case document.
  • Similarly, Option A should be used if no other action field needs to be queried when you query a case document except for the field action_id. 
  • Option C is like a intersection table where it gives you the flexibility to pick whichever fields of the case and action entity. The only drawback is that it makes the id field a little more complicated and perhaps difficult to use in your query.
If you find this article confusing, let me know. If enough people are interested, I can illustrate it with a diagram.

UPDATE: Here are the diagrams as promised. Again,

1. OPTION A: Create a multivalued field called action_id_list that lists all actions for a particular case. Populate this field for every document of type Case.
2. OPTION B: Create a multivalued field called case_id_list that lists all cases for a particular action. Populate this field for every document of type Action.
3. OPTION C: Don't create a multivalued field, instead, create multiple action documents for each case that the action is related to. This means your id will be a composite string of action_id and case_id





3 comments:

  1. Hi, could you illustrate with a diagram? it is still hard to understand. thanks!

    ReplyDelete
  2. could you illustrate? please!

    ReplyDelete
  3. can you please provide a code example like database-config.xml schema. it would be very helpful.
    thanks

    ReplyDelete