Pivotal GemFire® v8.2

Recovering from ConfictingPersistentDataExceptions

Recovering from ConfictingPersistentDataExceptions

A ConflictingPersistentDataException while starting up persistent members indicates that you have multiple copies of some persistent data, and GemFire cannot determine which copy to use.

Normally GemFire uses metadata to determine automatically which copy of persistent data to use. Along with the region data, each member persists a list of other members that are hosting the region and whether their data is up to date. A ConflictingPersistentDataException happens when two members compare their metadata and find that it is inconsistent. The members either don’t know about each other, or they both think the other member has stale data.

The following sections describe scenarios that can cause ConflictingPersistentDataExceptions in GemFire and how to resolve the conflict.

Independently Created Copies

Trying to merge two independently created distributed systems into a single distributed system will cause a ConflictingPersistentDataException.

There are a few ways to end up with independently created systems.

  • Create two different distributed systems by having members connect to different locators that are not aware of each other.
  • Shut down all persistent members and then start up a different set of brand new persistent members.

GemFire will not automatically merge independently created data for the same region. Instead, you need to export the data from one of the systems and import it into the other system. See the section Cache and Region Snapshots for instructions on how to export data from one system and import it into another.

Starting New Members First

Starting a brand new member that has no persistent data before starting older members with persistent data can cause a ConflictingPersistentDataException.

One accidental way this can happen is to shut the system down, add a new member to the startup scripts, and start all members in parallel. By chance, the new member may start first. The issue is that the new member will create an empty, independent copy of the data before the older members start up. GemFire will be treat this situation like the Independently Created Copies case.

In this case the fix is simply to move aside or delete the persistent files for the new member, shut down the new member and then restart the older members. When the older members have fully recovered, then restart the new member.

A Network Failure Occurs and Network Partitioning Detection is Disabled

When enable-network-partition-detection is set to true, GemFire will detect a network partition and shut down unreachable members to prevent a network partition ("split brain") from occurring. No conflicts should occur when the system is healed.

However if enable-network-partition-detection is false, GemFire will not detect the network partition. Instead, each side of the network partition will end up recording that the other side of the partition has stale data. When the partition is healed and persistent members are restarted, the members will report a conflict because both sides of the partition think the other members are stale.

In some cases it may be possible to choose between sides of the network partition and just keep the data from one side of the partition. Otherwise you may need to salvage data and import it into a fresh system.

Salvaging Data

If you receive a ConflictingPersistentDataException, you will not be able to start all of your members and have them join the same distributed system. You have some members with conflicting data.

First, see if there is part of the system that you can recover. For example if you just added some new members to the system, try to start up without including those members.

For the remaining members you can extract data from the persistent files on those members and import the data.

To extract data from the persistent files, use the gfsh export offline-disk-store command.
gfsh> export offline-disk-store --name=MyDiskStore --disk-dirs=./mydir --dir=./outputdir
This will produce a set of snapshot files. Those snapshot files can be imported into a running system using:
gfsh> import data --region=/myregion --file=./outputdir/snapshot-snapshotTest-test0.gfd --member=server1