Is there a canonical way to handle JSON data format changes?
Problem
Say we have a C# class which is serialized to JSON (currently via Newtonsoft's Json.NET) and stored in a database:
    public class User
    {
        public string authInfo;
    }
If the class definition changes, the old data will fail to load. Even if we try to update the database by hand, data may be lost unless we have server downtime during conversion.
    public class User
    {
        public string username;
        public string token;
    }
Solution (my attempt)
We may use a callback which is run after deserialization that converts the old data to the new data format. (The attribute and parameters need to be adapted based on which serialization framework is being used.):
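    public class User
    {
        public string username;
        public string token;
        [Obsolete] public string authInfo;
        [OnDeserialized]
        public void FixData(StreamingContext context) // Json.NET expects this parameter
        {
            // Only old-format records still carry authInfo ("username/token").
            if (username == null && authInfo != null)
            {
                var parts = authInfo.Split('/');
                username = parts[0];
                token = parts[1];
                authInfo = null;
            }
        }
    }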
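A self-contained sketch of the callback approach, assuming Json.NET (the `Newtonsoft.Json` package) is referenced. The only additions that are not in the question are the `StreamingContext` parameter, which Json.NET requires on `[OnDeserialized]` methods, and the guards for records that have no `authInfo` or no `/` separator:

```csharp
using System;
using System.Runtime.Serialization;
using Newtonsoft.Json;

public class User
{
    public string username;
    public string token;

    // Old combined field, kept only so that old records still load.
    [Obsolete] public string authInfo;

    // Json.NET calls this once the object has been populated from JSON.
    [OnDeserialized]
    internal void FixData(StreamingContext context)
    {
        // New-format records have no authInfo, so there is nothing to do.
        if (username != null || authInfo == null)
            return;

        // Old format: "username/token". Split on the first '/' only,
        // so that a token containing '/' stays in one piece.
        var parts = authInfo.Split(new[] { '/' }, 2);
        username = parts[0];
        token = parts.Length > 1 ? parts[1] : null;
        authInfo = null;
    }
}
```

With this class, `JsonConvert.DeserializeObject<User>("{\"authInfo\":\"alice/abc123\"}")` yields a `User` whose `username` is `alice` and whose `token` is `abc123`, while a new-format record passes through unchanged.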
If a field's format needs to change from a list to an object (or number) or vice versa, the newer field should get a numeric suffix, e.g. authInfo_2, and the suffix should be incremented each time the type changes again.
If a field's format needs to change from a list of one type to a list of another type, a new field must also be created.
    public class User
    {
        [Obsolete] public List<string> address;
        public List<AddressLine> address_2;
        // FixData() will convert from address to address_2
    }
Problem: If null is a valid value for the old or new data, we can't determine whether the data has been migrated to the newer format. The following is a workaround that will track whether new data has been added:
    public class User
    {
        [Obsolete] public List<string> name;              // serialized old data
        [JsonProperty] private string _familyName;        // serialized
        [JsonProperty] private bool _isFamilyNameSet;     // serialized
        [JsonIgnore] public string familyName             // not serialized
        {
            get { return _familyName; }
            set { _familyName = value; _isFamilyNameSet = true; }
        }
        // FixData() will convert from name to familyName
    }
Question
This procedure is a bunch of rules I made up, and I've probably missed something important. Is there an accepted best practice that deals with versioning in serialized data? (Including a version number seems like it would lead to a lot of problems.)
c# json serialization
4
I usually avoid storing JSON directly to the database for this reason. Most of the time, changing the database schema to handle small column changes is less of a headache than trying to handle the same issue on the data layer of your application.
– T. Sar
Apr 15 at 11:42
I've been looking into Avro for similar reasons
– Jared Goguen
Apr 15 at 12:45
It might interest
– Laiv
Apr 21 at 21:23
edited Apr 15 at 12:31 by Hobbamok
asked Apr 15 at 7:25 by piojo
2 Answers
Problems
Generally speaking, handling different versions of the same data model in the same code results in extra unwanted complexity. Some common issues include:
- Fields renamed
- Data types changed
- Old fields removed
- New fields added
- Existing data refactored into multiple fields
- Existing data combined together into a single field
- Semantics of existing fields redefined
None of these are things which you want creeping into your core/domain logic if you can at all help it.
Furthermore, if you have other version changes planned in the future, then by holding on to old formats, you're potentially looking at an explosion in complexity once you've been through multiple evolutions of the data format.
Ideally, migrate all old data into the new format and retire the old format entirely
The ideal way to handle this scenario is to make sure that your domain logic is never bothered by different data formats in the first place. Every time you add a new format, complexity increases, but by migrating data it can be a 'one off' operation.
When performing Data Migration, it's important to create a 'rollback' path - i.e. put appropriate backup/restore procedures in place so that you can prevent data loss if anything goes wrong during the migration.
Also ensure that you have appropriate sanity checks and data verification in place to make sure the data is in a good state following the migration.
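As a rough illustration of the rollback and verification points above, here is a small in-memory sketch. Everything in it is hypothetical: `SafeMigration`, the dictionary standing in for the table, and the `convert` / `isValid` delegates are placeholders, and a real migration would rely on database backups or transactions rather than a copied dictionary.

```csharp
using System;
using System.Collections.Generic;

public static class SafeMigration
{
    // Converts every value in `store` in place and verifies each result.
    // If any record fails to convert or verify, the store is restored
    // from the snapshot and false is returned.
    public static bool MigrateAll(
        IDictionary<int, string> store,
        Func<string, string> convert,
        Func<string, bool> isValid)
    {
        // Rollback path: take a snapshot before touching anything.
        var backup = new Dictionary<int, string>(store);
        try
        {
            // Iterate over a copy of the keys so the store can be updated.
            foreach (var id in new List<int>(store.Keys))
            {
                var migrated = convert(store[id]);
                if (!isValid(migrated))
                    throw new InvalidOperationException("Record " + id + " failed verification");
                store[id] = migrated;
            }
            return true;
        }
        catch (Exception)
        {
            // Something went wrong: put the original data back.
            store.Clear();
            foreach (var pair in backup)
                store[pair.Key] = pair.Value;
            return false;
        }
    }
}
```

The point of the shape is that the conversion and the check are separate steps, and that a failure anywhere leaves the data exactly as it was before the migration started.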
Of course, this is not always an option. Multiple data versions are sometimes an unavoidable, necessary evil.
If migration is not an option, separate your persistence format from your domain models
The logic would be somewhat similar to the migration code, except that it would run at run-time instead. The 'migration logic' would stick around long-term, until the data is either fully migrated or retired, so extra care is needed to decouple it from the rest of the application.
Any concerns regarding different versions or variations in the shape of the same data stored in different formats within your database should be handled in one place away from the rest of your code; hidden behind a standard Data Layer interface which contains everything that the rest of the logic needs. This can minimise the complexity and impact of storing multiple data formats in your database.
Avoid exposing multiple different formats to your core logic wherever possible. The rest of your code should be agnostic to the actual shape or format of your persistent data.
Internally to your data layer, keep different 'model' structures which you can use to deserialise into with JSON.NET. Have a look at AutoMapper for switching between your 'persistence' models and the domain model -- don't use these JSON serialiser models anywhere in your core logic because they represent knowledge of your persistence format.
Some form of versioning will be necessary for this - your repository/serialiser will need to know which internal JSON model format to deserialise into, so you'd probably need to store a version number within the database alongside the serialised data, or otherwise have some way of unambiguously distinguishing between different data versions.
Avoid using a boolean field to switch between your versions -- if your way of distinguishing data formats ends up being a true/false value such as "isNewVersion" then that'll be a problem if you ever happen to introduce version 3 in the future.
For example:
    internal class MyDataModelVersion1 { /* Old JSON Persistence Format POCO */ }
    internal class MyDataModelVersion2 { /* New JSON Persistence Format POCO */ }
    public class MyStandardModel { /* Common/Domain Model */ }
    public class MyRepository
    {
        public MyStandardModel GetData(int id)
        {
            var row = ReadFromDatabase(id);
            MyStandardModel model = null;
            if (row.Version == 1)
            {
                var data = JsonConvert.DeserializeObject<MyDataModelVersion1>(row.Json);
                model = Mapper.Map<MyStandardModel>(data);
            }
            else if (row.Version == 2)
            {
                var data = JsonConvert.DeserializeObject<MyDataModelVersion2>(row.Json);
                model = Mapper.Map<MyStandardModel>(data);
            }
            else
            {
                throw new NotSupportedException("Unknown data version: " + row.Version);
            }
            return model;
        }
    }
The main reason for this approach is to ensure that the only part of your code which needs to change when you introduce a new data shape is in the Repository/Data layer -- the rest of your code shouldn't need to care.
It's still less ideal than migration but it encapsulates the data version switching into one place and avoids polluting your core logic.
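For something that can be compiled as-is, here is a self-contained variant of the same idea. It is only a sketch under a few assumptions: Json.NET (`Newtonsoft.Json`) is referenced, the mapping is written by hand instead of using AutoMapper, and `UserV1`, `UserV2`, `UserModel`, `StoredRow` and `UserReader` are made-up names, with `StoredRow` standing in for whatever the database read returns.

```csharp
using System;
using Newtonsoft.Json;

// Persistence models: one per stored JSON shape.
internal class UserV1 { public string authInfo { get; set; } }
internal class UserV2
{
    public string username { get; set; }
    public string token { get; set; }
}

// Domain model used by the rest of the application.
public class UserModel
{
    public string Username { get; set; }
    public string Token { get; set; }
}

// Hypothetical row shape: a version column stored next to the JSON text.
public class StoredRow
{
    public int Version { get; set; }
    public string Json { get; set; }
}

public static class UserReader
{
    // The only place that knows which JSON shapes exist.
    public static UserModel Read(StoredRow row)
    {
        switch (row.Version)
        {
            case 1:
            {
                var v1 = JsonConvert.DeserializeObject<UserV1>(row.Json);
                // Old format: "username/token" in a single field.
                var parts = v1.authInfo.Split(new[] { '/' }, 2);
                return new UserModel
                {
                    Username = parts[0],
                    Token = parts.Length > 1 ? parts[1] : null
                };
            }
            case 2:
            {
                var v2 = JsonConvert.DeserializeObject<UserV2>(row.Json);
                return new UserModel { Username = v2.username, Token = v2.token };
            }
            default:
                throw new NotSupportedException($"Unknown data version {row.Version}");
        }
    }
}
```

Adding a version 3 later means adding one persistence class and one `case`; nothing that consumes `UserModel` has to change.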
I would avoid having the new class know about the old class.
If the class name changes you can have
    OldRepository
        public List<OldUser> GetAll()
    Converter
        public NewUser Convert(OldUser)
    NewRepository
        public void Add(NewUser)
You can then convert the whole DB to the new format with a script, or do on the fly conversion without having a dependency on the old class in the new class.
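A minimal sketch of that outline, with in-memory lists standing in for the two stores. The `OldUser` / `NewUser` shapes are borrowed from the question's `authInfo` example, and `UserConverter` and `Migration` are illustrative names rather than anything prescribed:

```csharp
using System;
using System.Collections.Generic;

public class OldUser { public string AuthInfo { get; set; } }

public class NewUser
{
    public string Username { get; set; }
    public string Token { get; set; }
}

// The only class that knows about both shapes.
public static class UserConverter
{
    public static NewUser Convert(OldUser old)
    {
        if (old == null || old.AuthInfo == null)
            throw new ArgumentException("Old user has no authInfo", nameof(old));

        // Old format: "username/token"; split on the first '/' only.
        var parts = old.AuthInfo.Split(new[] { '/' }, 2);
        if (parts.Length != 2)
            throw new FormatException("authInfo is not in \"username/token\" form");

        return new NewUser { Username = parts[0], Token = parts[1] };
    }
}

// In-memory stand-ins for the two repositories.
public class OldRepository
{
    private readonly List<OldUser> _rows;
    public OldRepository(IEnumerable<OldUser> rows) { _rows = new List<OldUser>(rows); }
    public List<OldUser> GetAll() { return new List<OldUser>(_rows); }
}

public class NewRepository
{
    public List<NewUser> Rows { get; } = new List<NewUser>();
    public void Add(NewUser user) { Rows.Add(user); }
}

public static class Migration
{
    // Converts every old record and returns how many were migrated.
    public static int Run(OldRepository source, NewRepository target)
    {
        int count = 0;
        foreach (var old in source.GetAll())
        {
            target.Add(UserConverter.Convert(old));
            count++;
        }
        return count;
    }
}
```

`Migration.Run` is the "script" variant; for on-the-fly conversion the same `UserConverter.Convert` call can sit in the read path, and `NewUser` still has no reference to `OldUser`.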
Generally if you have to store serialised data in a DB like this, rather than splitting out the fields you should include some sort of data versioning, to allow you to know what version of the data is stored in a particular row.
As @Hans-Martin says below, having multiple data versions hanging around for a long time can cause unforeseen issues. If you can make a clean break and upgrade all the data to the new structure, that's a good thing.
The main problem is handling the changeover with zero downtime.
While converting data on the fly on deserialization looks attractive, it can cause more problems than it solves. Least hassle is to convert all data at once and make sure only clients which use the new definition will connect to the database. Schema versioning can be used to achieve this.
– Hans-Martin Mosner
Apr 15 at 7:56
If for some reason you can't force all users to switch to the new client version, a better architecture would be a service that supports different api versions while clients using these api versions are in the wild.
– Hans-Martin Mosner
Apr 15 at 7:58
@Hans-MartinMosner sounds like you have some recent experience of this. If you want to write an answer about it, I think it would be cool
– Ewan
Apr 15 at 8:09
Your answer already contains all the helpful advice, I was just adding a few bits of experience (actually, not recent but collected throughout the years 😀)
– Hans-Martin Mosner
Apr 15 at 8:13
Is versioning a data row compatible with nested structures, all of which can change? Because if Foo contains Bar, then bumping the version of Bar or any other member would also bump the version of Foo. This could eventually cause a merge hell with more than one person changing the classes. Whereas if each class has its own version, it could work more nicely but at the cost of us needing to write our own serializer. Any thoughts on this?
– piojo
May 28 at 5:57
|
show 1 more comment
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "131"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsoftwareengineering.stackexchange.com%2fquestions%2f390374%2fis-there-a-canonical-way-to-handle-json-data-format-changes%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Problems
Generally speaking, handling different versions of the same data model in the same code results in extra unwanted complexity. Some common issues include:
- Fields renamed
- Data types changed
- Old fields removed
- New fields added
- Existing data refactored into multiple fields
- Existing data combined together into a single field
- Semantics of existing fields redefined
None of these are things which you want to have creeping in to your core/domain logic if you can at all help it.
Furthermore, if you have other version changes planned in the future, then by holding on to old formats, you're potentially looking at an explosion in complexity once you've been through multiple evolutions of the data format.
Ideally, migrate all old data into the new Format and disband the old format entirely
The ideal way to handle this scenario is to make sure that your domain logic is never bothered by different data formats in the first place. Every time you add a new format, complexity increases, but by migrating data it can be a 'one off' operation.
When performing Data Migration, it's important to create a 'rollback' path - i.e. put appropriate backup/restore procedures in place so that you can prevent data loss if anything goes wrong during the migration.
Also ensure that you have appropriate sanity checks and data verification in-place to make sure the data is in a good state following the migration.
Of course, this is not always an option. Multiple data versions are sometimes an unavoidable, necessary evil.
If Migration is not an option, separate your Persistence format away from your Domain Models
The logic would be the somewhat similar to the migration code, except it would occur at run-time instead, and the 'migration logic' would be sticking around long-term until the data is either fully migrated or retired, and extra care is needed to decouple it from the rest of the application.
Any concerns regarding different versions or variations in the shape of the same data stored in different formats within your database should be handled in one place away from the rest of your code; hidden behind a standard Data Layer interface which contains everything that the rest of the logic needs. This can minimise the complexity and impact of storing multiple data formats in your database.
Avoid exposing multiple different formats to your core logic wherever possible The rest of your code should be agnostic to the actual shape or format of your persistent data.
Internally to your data layer, keep different 'model' structures which you can use to deserialise into with JSON.NET. Have a look at AutoMapper for switching between your 'persistence' models and the domain model -- don't use these JSON serialiser models anywhere in your core logic because they represent knowledge of your persistence format.
Some form of versioning will be necessary for this - your repository/serialiser will need to know which internal JSON model format to deserialise into, so you'd probably need to store a version number within the database alongside the serialised data, or otherwise have some way of unambiguously distinguishing between different data versions.
Avoid using a boolean field to switch between your versions -- if your way of distinguishing data formats ends up being a true/false value such as "isNewVersion" then that'll be a problem if you ever happen to introduce version 3 in the future.
For example:
internal class MyDataModelVersion1 /* Old JSON Persistence Format POCO */
internal class MyDataModelVersion2 /* New JSON Persistence Format POCO */
public class MyStandardModel /* Common/Domain Model */
public class MyRepository
public MyStandardModel GetData(int id)
var row = ReadFromDatabase(id);
MyStandardModel model = null;
if (row.Version == 1)
var data = Json.DeserializeObject<MyDataModelVersion1>(row.Json);
model = Mapper.Map<MyStandardModel>(data);
else if (row.Version == 2)
var data = Json.DeserializeObject<MyDataModelVersion2>(row.Json);
model = Mapper.Map<MyStandardModel>(data);
else /* throw exception */
return model;
The main reason for this approach is to ensure that the only part of your code which needs to change when you introduce a new data shape is in the Repository/Data layer -- the rest of your code shouldn't need to care.
It's still less ideal than migration but it encapsulates the data version switching into one place and avoids polluting your core logic.
add a comment
|
Problems
Generally speaking, handling different versions of the same data model in the same code results in extra unwanted complexity. Some common issues include:
- Fields renamed
- Data types changed
- Old fields removed
- New fields added
- Existing data refactored into multiple fields
- Existing data combined together into a single field
- Semantics of existing fields redefined
None of these are things which you want to have creeping in to your core/domain logic if you can at all help it.
Furthermore, if you have other version changes planned in the future, then by holding on to old formats, you're potentially looking at an explosion in complexity once you've been through multiple evolutions of the data format.
Ideally, migrate all old data into the new Format and disband the old format entirely
The ideal way to handle this scenario is to make sure that your domain logic is never bothered by different data formats in the first place. Every time you add a new format, complexity increases, but by migrating data it can be a 'one off' operation.
When performing Data Migration, it's important to create a 'rollback' path - i.e. put appropriate backup/restore procedures in place so that you can prevent data loss if anything goes wrong during the migration.
Also ensure that you have appropriate sanity checks and data verification in-place to make sure the data is in a good state following the migration.
Of course, this is not always an option. Multiple data versions are sometimes an unavoidable, necessary evil.
If Migration is not an option, separate your Persistence format away from your Domain Models
The logic would be the somewhat similar to the migration code, except it would occur at run-time instead, and the 'migration logic' would be sticking around long-term until the data is either fully migrated or retired, and extra care is needed to decouple it from the rest of the application.
Any concerns regarding different versions or variations in the shape of the same data stored in different formats within your database should be handled in one place away from the rest of your code; hidden behind a standard Data Layer interface which contains everything that the rest of the logic needs. This can minimise the complexity and impact of storing multiple data formats in your database.
Avoid exposing multiple different formats to your core logic wherever possible The rest of your code should be agnostic to the actual shape or format of your persistent data.
Internally to your data layer, keep different 'model' structures which you can use to deserialise into with JSON.NET. Have a look at AutoMapper for switching between your 'persistence' models and the domain model -- don't use these JSON serialiser models anywhere in your core logic because they represent knowledge of your persistence format.
Some form of versioning will be necessary for this - your repository/serialiser will need to know which internal JSON model format to deserialise into, so you'd probably need to store a version number within the database alongside the serialised data, or otherwise have some way of unambiguously distinguishing between different data versions.
Avoid using a boolean field to switch between your versions -- if your way of distinguishing data formats ends up being a true/false value such as "isNewVersion" then that'll be a problem if you ever happen to introduce version 3 in the future.
For example:
internal class MyDataModelVersion1 /* Old JSON Persistence Format POCO */
internal class MyDataModelVersion2 /* New JSON Persistence Format POCO */
public class MyStandardModel /* Common/Domain Model */
public class MyRepository
public MyStandardModel GetData(int id)
var row = ReadFromDatabase(id);
MyStandardModel model = null;
if (row.Version == 1)
var data = Json.DeserializeObject<MyDataModelVersion1>(row.Json);
model = Mapper.Map<MyStandardModel>(data);
else if (row.Version == 2)
var data = Json.DeserializeObject<MyDataModelVersion2>(row.Json);
model = Mapper.Map<MyStandardModel>(data);
else /* throw exception */
return model;
The main reason for this approach is to ensure that the only part of your code which needs to change when you introduce a new data shape is in the Repository/Data layer -- the rest of your code shouldn't need to care.
It's still less ideal than migration but it encapsulates the data version switching into one place and avoids polluting your core logic.
edited Apr 15 at 9:51
answered Apr 15 at 9:45
Ben Cottrell
I would avoid having the new class know about the old class.
If the class name changes you can have:

    OldRepository
        public List<OldUser> GetAll()

    Converter
        public NewUser Convert(OldUser)

    NewRepository
        public void Add(NewUser)
You can then convert the whole DB to the new format with a script, or do on-the-fly conversion, without the new class having any dependency on the old class.
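As a rough illustration of the one-off script variant, the sketch below wires the three pieces together. It is not a definitive implementation: the repositories are in-memory stand-ins for real database access, the `Name`, `FirstName` and `LastName` fields and the name-splitting rule are assumptions made up for the example, `MigrationScript` is a hypothetical name, and `Convert` is written as a static method for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical old and new shapes.
public class OldUser { public string Name { get; set; } }
public class NewUser { public string FirstName { get; set; } public string LastName { get; set; } }

// In-memory stand-in for the repository that reads the old format.
public class OldRepository
{
    private readonly List<OldUser> _users;
    public OldRepository(List<OldUser> users) { _users = users; }
    public List<OldUser> GetAll() => _users;
}

// In-memory stand-in for the repository that writes the new format.
public class NewRepository
{
    public List<NewUser> Users { get; } = new List<NewUser>();
    public void Add(NewUser user) => Users.Add(user);
}

// The only place that knows about both shapes.
public static class Converter
{
    public static NewUser Convert(OldUser old)
    {
        // Assumed rule: split on the first space; the remainder is the last name.
        var parts = (old.Name ?? "").Split(new[] { ' ' }, 2);
        return new NewUser
        {
            FirstName = parts[0],
            LastName = parts.Length > 1 ? parts[1] : ""
        };
    }
}

public static class MigrationScript
{
    // One-off conversion: read every old record, write its converted form.
    public static int Run(OldRepository source, NewRepository target)
    {
        var count = 0;
        foreach (var old in source.GetAll())
        {
            target.Add(Converter.Convert(old));
            count++;
        }
        return count;
    }
}
```

Neither `NewUser` nor `NewRepository` references `OldUser`, so once the migration has run, `OldUser`, `OldRepository` and `Converter` can be deleted without touching the new code.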
Generally, if you have to store serialised data in a DB like this rather than splitting out the fields, you should include some sort of data versioning so that you know which version of the data is stored in a particular row.
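One possible way to carry that version is inside the stored JSON itself, as a small envelope around the payload. The sketch below assumes System.Text.Json; the `version` and `data` property names, the `VersionedJson` helper, and the rule that rows without a version count as version 1 are all assumptions for illustration, not something prescribed above.

```csharp
using System;
using System.Text.Json;

public static class VersionedJson
{
    // Wraps a payload in an envelope of the form {"version": n, "data": {...}}.
    public static string Wrap<T>(int version, T payload)
    {
        return JsonSerializer.Serialize(new { version, data = payload });
    }

    // Reads only the version number, without binding to any model class.
    public static int ReadVersion(string json)
    {
        using (var doc = JsonDocument.Parse(json))
        {
            // Assumption: rows written before versioning existed have no
            // "version" property and are treated as version 1.
            return doc.RootElement.TryGetProperty("version", out var v) ? v.GetInt32() : 1;
        }
    }

    // Returns the raw JSON of the payload so the caller can pick the matching
    // model class. Throws if the row has no "data" property.
    public static string ReadData(string json)
    {
        using (var doc = JsonDocument.Parse(json))
        {
            return doc.RootElement.GetProperty("data").GetRawText();
        }
    }
}
```

A separate version column in the row works just as well; the envelope is mainly useful when the JSON is also passed around outside the database.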
As @Hans-Martin says below, having multiple data versions hanging around for a long time can cause unforeseen issues. If you can make a clean break and upgrade all the data to the new structure, that's a good thing.
The main problem is handling the changeover with zero downtime.
While converting data on the fly on deserialization looks attractive, it can cause more problems than it solves. Least hassle is to convert all data at once and make sure only clients which use the new definition will connect to the database. Schema versioning can be used to achieve this.
– Hans-Martin Mosner
Apr 15 at 7:56
If for some reason you can't force all users to switch to the new client version, a better architecture would be a service that supports different api versions while clients using these api versions are in the wild.
– Hans-Martin Mosner
Apr 15 at 7:58
@Hans-MartinMosner sounds like you have some recent experience of this. If you wanna do an answer about it I think it would be cool
– Ewan
Apr 15 at 8:09
Your answer already contains all the helpful advice, I was just adding a few bits of experience (actually, not recent but collected throughout the years 😀)
– Hans-Martin Mosner
Apr 15 at 8:13
Is versioning a data row compatible with nested structures, all of which can change? Because if Foo contains Bar, then bumping the version of Bar or any other member would also bump the version of Foo. This could eventually cause a merge hell with more than one person changing the classes. Whereas if each class has its own version, it could work more nicely but at the cost of us needing to write our own serializer. Any thoughts on this?
– piojo
May 28 at 5:57
edited Apr 15 at 8:21
answered Apr 15 at 7:42
Ewan
I usually avoid storing JSON directly in the database for this reason. Most of the time, changing the database schema to handle small column changes is less of a headache than trying to handle the same issue in the data layer of your application.
– T. Sar
Apr 15 at 11:42
I've been looking into Avro for similar reasons
– Jared Goguen
Apr 15 at 12:45
It might interest
– Laiv
Apr 21 at 21:23