Data Catalyst: Translating Many to Many relationship in a relational database to schema-less structure like Solr

In a relational schema many to many relationships are generally implemented using intersection tables. Lets say you have two entities Cases and Actions that have many to many relationship. Relationally, you'd create a Case_Action table that stores action_id and case_id. Now, what if you have to represent this relationship in a schema-less data-store like Solr? To start, you'd try to index so that you can get a document of type Case which stores the list of Case fields. This will be your base entity.

(NOTE: In Solr the term document is equivalent to a database row)
Well, there are several ways of doing this:
1. OPTION A: Create a multivalued field called action_id_list that lists all actions for a particular case. Populate this field for every document of type Case.
2. OPTION B: Create a multivalued field called case_id_list that lists all cases for a particular action. Populate this field for every document of type Action.
3. OPTION C: Don't create a multivalued field, instead, create multiple action documents for each case that the action is related to. This means your id will be a composite string of action_id and case_id

The option you choose depends on how you want to access the entities in a flattened structure. Option A and Option B are cleaner and easier to implement than C. But they have a limitation.

Option B should be used only when an action document is expected to store only the case_id and no other case related field. Otherwise, an extra query needs to fetch the required fields from a case document.
Similarly, Option A should be used if no other action field needs to be queried when you query a case document except for the field action_id.
Option C is like a intersection table where it gives you the flexibility to pick whichever fields of the case and action entity. The only drawback is that it makes the id field a little more complicated and perhaps difficult to use in your query.

If you find this article confusing, let me know. If enough people are interested, I can illustrate it with a diagram.

UPDATE: Here are the diagrams as promised. Again,

1. OPTION A: Create a multivalued field called action_id_list that lists all actions for a particular case. Populate this field for every document of type Case.

2. OPTION B: Create a multivalued field called case_id_list that lists all cases for a particular action. Populate this field for every document of type Action.

3. OPTION C: Don't create a multivalued field, instead, create multiple action documents for each case that the action is related to. This means your id will be a composite string of action_id and case_id

4 comments:

AnonymousMarch 22, 2011 at 3:15 AM
Hi, could you illustrate with a diagram? it is still hard to understand. thanks!
AnonymousApril 26, 2011 at 11:45 PM
could you illustrate? please!
Rizwan UllahDecember 19, 2016 at 9:39 AM
can you please provide a code example like database-config.xml schema. it would be very helpful.
thanks
EvrenSeptember 6, 2023 at 11:28 PM
Translating a many-to-many relationship in database design requires an intermediary table to link related entities. Is Arcade Snake This ensures accurate data representation.

Wednesday, March 2, 2011

Translating Many to Many relationship in a relational database to schema-less structure like Solr

4 comments: