Multi-thread cache corruption


Multi-thread cache corruption

Benoit Allard

Hi all,

 

I believe we have some case of cache corruption. We can reproduce it within a test-case of our application with ~70% reliability.

 

How the issue manifests itself: within a transaction, a flush of the EntityManager triggers a foreign key violation referring to another entry that was added (and successfully updated) within the same transaction.

 

What we did to trigger the issue: we started passing IDs to the asynchronous workers (letting them fetch the entity themselves) instead of passing them the entity directly. This puts a lot more work on the entity manager, causing it (I believe) to corrupt its cache. When we revert that modification, the issue does not happen, which leads us to believe that using EclipseLink asynchronously more than before is what triggers the issue.

 

Enclosed is an example of the logs generated while the issue occurs. The log is trimmed to the period in which the erroneous unit of work (1298690816) is active. Some other units of work are visible; those are the asynchronous workers mentioned above. Let me go through it briefly:

 

An entity of type slaughter_party is first inserted (id: 640). That entity is then queried and updated a few times. Later, another entity of the same type is created with a (supposed) reference to the first one (old_party_id). The flush of that second entity claims that the first one does not exist.

 

Sometimes, that same issue (a foreign key violation) happens with different objects, always related to a missing entity of type slaughter_party.

 

There is another cache error happening in this log (probably related): an entity of type join_partner_business (partner 635, business 607) is inserted with id 639 after checking that it doesn’t exist (select count …). The same procedure runs a few moments later and decides to insert a new join_partner_business (id 647) for the same relation. In the cases where it works, that second join entity is (rightfully) never inserted.

 

I tried various DB backends (HSQL in-memory, H2 in-memory and H2 file), as those are quite easy to set up; they all trigger the same issue.

 

I hope you can shed some light on our issue and point me to my mistake, as I don’t really want to believe that multi-threaded EclipseLink is fundamentally broken.


Kind regards


Benoit Allard
Software Engineer


SOFTWARE. APPLICATIONS. SERVICES.

SLA Software Logistik Artland GmbH

ARTLAND
Friedrichstr. 30, D-49610 Quakenbrück
Phone: +49 5431 9480 - 379
Fax: +49 5431 9480 - 979
[hidden email]
sla.de

BERLIN
Keithstraße 14, D-10787 Berlin

OSNABRÜCK
Dinglingsweg 1a, D-49565 Bramsche

Support
DE +49 5431 9480 - 77
AT +43 720 115300

Managing directors: Jörg Brezl (CEO) | Hermann Grevemeyer (COO)
Commercial register: Amtsgericht Osnabrück - HRB 20381


_______________________________________________
eclipselink-users mailing list
[hidden email]
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.eclipse.org/mailman/listinfo/eclipselink-users

red-trimmed.txt (176K) Download Attachment

Re: Multi-thread cache corruption

Christopher Delahunt
Why do you think this is cache corruption within EclipseLink? The database is stating the entry isn't there (not a cache), and you are stating it was added in the same transaction as the insert statement referring to it that is failing. One of the two statements must be wrong, or you have an issue at the database level.

The only other explanation I can come up with is that the transaction is not the same, so that the first insert is no longer visible to your existing UnitOfWork/EntityManager. You mention you are doing work asynchronously: check that the datasource isn't returning a new connection/transaction to the thread processing your work. From the EclipseLink logging you've shown, I don't see a reason for the exception, but you are using an external connection pool with JTA, so it all depends on how the transaction is demarcated and handled underneath EclipseLink.

Best Regards,
Chris



On Nov 8, 2018, at 7:20 AM, Benoit Allard <[hidden email]> wrote:
> I believe we have some case of cache corruption. [...]

Re: Multi-thread cache corruption

Benoit Allard

Hi Chris,

 

Thanks a lot; you are indeed correct: caching doesn’t seem to be even remotely involved in my issue. My biggest misunderstanding was that, until now, I believed a unit of work was the same thing as a transaction. In the failing log I can see a new “begin transaction” just before the issue starts to manifest itself, without any “commit transaction” for the previous one. That clearly explains both the foreign key violation and the missing join entity, as those entities are not there in the new transaction. That new transaction doesn’t appear when the test succeeds.
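
That kind of check (a “begin transaction” that shows up while the previous one has not been committed) can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class `UnmatchedBeginFinder` and the method `findNestedBegins` are hypothetical names, the matching is a plain substring test on the phrases quoted above, “rollback transaction” is assumed to be the phrase that closes a rolled-back transaction, and the lines are assumed to have been filtered beforehand to a single unit of work, since the real log interleaves several workers. The sample lines in `main` are placeholders, not real log output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustration only: flags a "begin transaction" seen while another one is still open. */
public class UnmatchedBeginFinder {

    /**
     * Returns the 0-based indexes of lines containing "begin transaction"
     * that appear before the previous transaction was committed or rolled back.
     * Assumes the lines belong to a single unit of work (pre-filtered by the caller).
     */
    public static List<Integer> findNestedBegins(List<String> logLines) {
        List<Integer> suspicious = new ArrayList<>();
        boolean open = false;
        for (int i = 0; i < logLines.size(); i++) {
            String line = logLines.get(i).toLowerCase(Locale.ROOT);
            if (line.contains("begin transaction")) {
                if (open) {
                    suspicious.add(i); // a second begin without a commit in between
                }
                open = true;
            } else if (line.contains("commit transaction")
                    || line.contains("rollback transaction")) { // assumed phrase
                open = false;
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        // Placeholder lines, not real EclipseLink output.
        List<String> sample = List.of(
                "begin transaction",
                "insert A",
                "begin transaction",
                "insert B",
                "commit transaction");
        System.out.println(findNestedBegins(sample)); // prints [2]
    }
}
```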

 

We are using Spring transaction management, so I guess my next step is to turn on logging there and have Spring explain to me why the new transaction is necessary.

 

Thanks again for the eye-opener!

 

Regards,

Ben.

 



From: [hidden email] <[hidden email]> On behalf of Christopher Delahunt
Sent: Thursday, 8 November, 2018 16:53
To: EclipseLink User Discussions <[hidden email]>
Subject: Re: [eclipselink-users] Multi-thread cache corruption

 

> The database is stating the entry isn't there (not a cache) [...]

Re: Multi-thread cache corruption

Mauro Molinari
On 09/11/2018 09:02, Benoit Allard wrote:
> We are using Spring transaction management, so I guess my next step is to turn on logging there and have Spring explain to me why the new transaction is necessary.

I haven't read your logs; however, please note that Spring transaction management is thread-bound, so if you have multiple workers on different threads, they cannot share the same transaction, at least not in the default implementation.
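
To illustrate what thread-bound means here, the sketch below uses a plain `ThreadLocal` as a stand-in for the per-thread transaction context. It does not use Spring at all; the class `ThreadBoundTxDemo` and its methods `begin`, `current`, `end` and `currentInWorker` are hypothetical names made up for the example (Spring's `TransactionSynchronizationManager` keeps its state in thread-locals in a comparable way). A worker thread asking for the current transaction finds none, which is the situation in which a transaction manager would have to open a new one.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustration only: a ThreadLocal standing in for a thread-bound transaction context. */
public class ThreadBoundTxDemo {

    // Each thread sees its own value; a value set on one thread is invisible to the others.
    private static final ThreadLocal<String> CURRENT_TX = new ThreadLocal<>();

    /** Binds a (hypothetical) transaction name to the calling thread. */
    public static void begin(String txName) {
        CURRENT_TX.set(txName);
    }

    /** Returns the transaction bound to the calling thread, or null if there is none. */
    public static String current() {
        return CURRENT_TX.get();
    }

    /** Unbinds the transaction from the calling thread. */
    public static void end() {
        CURRENT_TX.remove();
    }

    /** Asks a freshly created worker thread which transaction it sees. */
    public static String currentInWorker() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return CompletableFuture.supplyAsync(ThreadBoundTxDemo::current, worker).join();
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        begin("tx-1");
        System.out.println("caller sees: " + current());         // caller sees: tx-1
        System.out.println("worker sees: " + currentInWorker()); // worker sees: null
        end();
    }
}
```

In this toy model, passing an already loaded entity to the worker needs no transaction on the worker's side, whereas a worker that fetches by ID has to obtain a transaction context of its own.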

Mauro

