Is there a canonical way to handle JSON data format changes?


Problem



Say we have a C# class which is serialized to JSON (currently via Newtonsoft's Json.NET) and stored in a database:



public class User
{
    public string authInfo;
}



If the class definition changes, the old data will fail to load. Even if we try to update the database by hand, data may be lost unless we have server downtime during conversion.



public class User
{
    public string username;
    public string token;
}



Solution (my attempt)



We may use a callback, run after deserialization, that converts the old data to the new format. (The attribute and parameters need to be adapted to whichever serialization framework is in use.)



public class User
{
    public string username;
    public string token;
    [Obsolete] public string authInfo;

    [OnDeserialized]
    public void FixData()
    {
        if (username == null)
        {
            var parts = authInfo.Split("/");
            username = parts[0];
            token = parts[1];
            authInfo = null;
        }
    }
}





• If a field's format needs to change from a list to an object (or number), or vice versa, the new field should be called authInfo_2, with the suffix incremented each time the type changes again.
• If a field's format needs to change from a list of one type to a list of another type, a new field must likewise be created.



public class User
{
    [Obsolete] public List<string> address;
    public List<AddressLine> address_2;
    // FixData() will convert from address to address_2
}



Problem: If null is a valid value for the old or new data, we can't determine whether the data has been migrated to the newer format. The following workaround tracks whether the new data has ever been set:



public class User
{
    [Obsolete] public List<string> name; // serialized old data
    private string _familyName;          // serialized
    private bool _isFamilyNameSet;       // serialized

    // not serialized
    public string familyName
    {
        get { return _familyName; }
        set { _familyName = value; _isFamilyNameSet = true; }
    }

    // FixData() will convert from name to familyName
}



Question



This procedure is a bunch of rules I made up, and I've probably missed something important. Is there an accepted best practice for versioning serialized data? (Including a version number seems like it would lead to a lot of problems.)

c# json serialization

asked Apr 15 at 7:25 by piojo · edited Apr 15 at 12:31 by Hobbamok

• I usually avoid storing JSON directly in the database for this reason. Most of the time, changing the database schema to handle small column changes is less of a headache than trying to handle the same issue in the data layer of your application.
  – T. Sar, Apr 15 at 11:42

• I've been looking into Avro for similar reasons.
  – Jared Goguen, Apr 15 at 12:45

• It might interest
  – Laiv, Apr 21 at 21:23

2 Answers

Problems



Generally speaking, handling different versions of the same data model in the same code results in extra unwanted complexity. Some common issues include:



  • Fields renamed

  • Data types changed

  • Old fields removed

  • New fields added

  • Existing data refactored into multiple fields

  • Existing data combined together into a single field

  • Semantics of existing fields redefined

None of these are things which you want to have creeping into your core/domain logic if you can at all help it.



Furthermore, if you have other version changes planned in the future, then by holding on to old formats, you're potentially looking at an explosion in complexity once you've been through multiple evolutions of the data format.




Ideally, migrate all old data into the new format and retire the old format entirely



The ideal way to handle this scenario is to make sure that your domain logic is never bothered by different data formats in the first place. Every time you add a new format, complexity increases; by migrating the data instead, it can be a 'one-off' operation.



When performing Data Migration, it's important to create a 'rollback' path - i.e. put appropriate backup/restore procedures in place so that you can prevent data loss if anything goes wrong during the migration.



Also ensure that you have appropriate sanity checks and data verification in place to make sure the data is in a good state following the migration.



Of course, this is not always an option. Multiple data versions are sometimes an unavoidable, necessary evil.




If migration is not an option, separate your persistence format from your domain models



The logic would be somewhat similar to the migration code, except that it runs at run-time, the 'migration logic' sticks around long-term until the data is either fully migrated or retired, and extra care is needed to decouple it from the rest of the application.



Any concerns regarding different versions or variations in the shape of the same data stored in different formats within your database should be handled in one place away from the rest of your code; hidden behind a standard Data Layer interface which contains everything that the rest of the logic needs. This can minimise the complexity and impact of storing multiple data formats in your database.



Avoid exposing multiple different formats to your core logic wherever possible. The rest of your code should be agnostic to the actual shape or format of your persistent data.



Internally to your data layer, keep different 'model' structures which you can deserialise into with JSON.NET. Have a look at AutoMapper for switching between your 'persistence' models and the domain model -- don't use these JSON serialiser models anywhere in your core logic, because they represent knowledge of your persistence format.
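As a rough illustration, the mapping setup might look like the following. MapperConfiguration, CreateMap and CreateMapper are the real AutoMapper calls; the model types are this answer's placeholders (the repository snippet further down uses AutoMapper's older static Mapper API, which amounts to the same thing):

var config = new MapperConfiguration(cfg =>
{
    // One map per persistence-model version, all converging on the domain model.
    cfg.CreateMap<MyDataModelVersion1, MyStandardModel>();
    cfg.CreateMap<MyDataModelVersion2, MyStandardModel>();
});
IMapper mapper = config.CreateMapper();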



Some form of versioning will be necessary for this - your repository/serialiser will need to know which internal JSON model format to deserialise into, so you'd probably need to store a version number within the database alongside the serialised data, or otherwise have some way of unambiguously distinguishing between different data versions.



Avoid using a boolean field to switch between your versions -- if your way of distinguishing data formats ends up being a true/false value such as "isNewVersion" then that'll be a problem if you ever happen to introduce version 3 in the future.



For example:



internal class MyDataModelVersion1 { /* Old JSON persistence format POCO */ }

internal class MyDataModelVersion2 { /* New JSON persistence format POCO */ }

public class MyStandardModel { /* Common/domain model */ }

public class MyRepository
{
    public MyStandardModel GetData(int id)
    {
        var row = ReadFromDatabase(id);
        MyStandardModel model = null;

        if (row.Version == 1)
        {
            var data = JsonConvert.DeserializeObject<MyDataModelVersion1>(row.Json);
            model = Mapper.Map<MyStandardModel>(data);
        }
        else if (row.Version == 2)
        {
            var data = JsonConvert.DeserializeObject<MyDataModelVersion2>(row.Json);
            model = Mapper.Map<MyStandardModel>(data);
        }
        else
        {
            /* throw exception */
        }

        return model;
    }
}
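The write side implied here is the mirror image: always persist the newest shape, stamping the current version number alongside the JSON. A hedged sketch, where the row type and WriteToDatabase are assumptions made to match the placeholders above:

public void SaveData(int id, MyStandardModel model)
{
    // Always write the newest persistence format.
    var data = Mapper.Map<MyDataModelVersion2>(model); // assumes a reverse map is configured

    var row = new SerializedRow   // hypothetical row type matching ReadFromDatabase
    {
        Id = id,
        Version = 2,              // bump this whenever a new shape is introduced
        Json = JsonConvert.SerializeObject(data)
    };
    WriteToDatabase(row);         // hypothetical persistence call
}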




The main reason for this approach is to ensure that the only part of your code which needs to change when you introduce a new data shape is in the Repository/Data layer -- the rest of your code shouldn't need to care.



It's still less ideal than migration but it encapsulates the data version switching into one place and avoids polluting your core logic.






answered Apr 15 at 9:45 (edited Apr 15 at 9:51) by Ben Cottrell

    I would avoid having the new class know about the old class.



If the class name changes, you can have:



class OldRepository
{
    public List<OldUser> GetAll() { /* ... */ }
}

class Converter
{
    public NewUser Convert(OldUser oldUser) { /* ... */ }
}

class NewRepository
{
    public void Add(NewUser newUser) { /* ... */ }
}



You can then convert the whole DB to the new format with a script (see the sketch below), or do on-the-fly conversion, without having a dependency on the old class in the new class.
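A hedged sketch of that one-shot migration script, using the outline above (the three classes' implementations are assumed, not given in this answer):

var oldRepo = new OldRepository();
var converter = new Converter();
var newRepo = new NewRepository();

// One-shot migration: read every old-format user, convert it, and write it
// back in the new format. Wrap in a transaction/backup in real use.
foreach (var oldUser in oldRepo.GetAll())
{
    newRepo.Add(converter.Convert(oldUser));
}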



Generally, if you have to store serialised data in a DB like this rather than splitting out the fields, you should include some sort of data versioning, so you know what version of the data is stored in a particular row.



As @Hans-Martin says in the comments below, having multiple data versions hanging around for a long time can cause unforeseen issues. If you can do a clean break and upgrade all the data to the new structure, that's a good thing.



The main problem is in handling the changeover with zero downtime.






answered Apr 15 at 7:42 (edited Apr 15 at 8:21) by Ewan

• While converting data on the fly on deserialization looks attractive, it can cause more problems than it solves. Least hassle is to convert all the data at once and make sure only clients which use the new definition will connect to the database. Schema versioning can be used to achieve this.
  – Hans-Martin Mosner, Apr 15 at 7:56

• If for some reason you can't force all users to switch to the new client version, a better architecture would be a service that supports different API versions while clients using those API versions are in the wild.
  – Hans-Martin Mosner, Apr 15 at 7:58

• @Hans-MartinMosner sounds like you have some recent experience of this. If you wanna do an answer about it, I think it would be cool.
  – Ewan, Apr 15 at 8:09

• Your answer already contains all the helpful advice, I was just adding a few bits of experience (actually, not recent but collected throughout the years 😀).
  – Hans-Martin Mosner, Apr 15 at 8:13

• Is versioning a data row compatible with nested structures, all of which can change? Because if Foo contains Bar, then bumping the version of Bar or any other member would also bump the version of Foo. This could eventually cause merge hell with more than one person changing the classes. Whereas if each class had its own version it could work more nicely, but at the cost of us needing to write our own serializer. Any thoughts on this?
  – piojo, May 28 at 5:57












