Skip to content

MDEV-7270: Fix replicas not honoring slave_skip_errors for 1677#4653

Open
OmarGamal10 wants to merge 1 commit intoMariaDB:12.3from
OmarGamal10:mdev-7270
Open

MDEV-7270: Fix replicas not honoring slave_skip_errors for 1677#4653
OmarGamal10 wants to merge 1 commit intoMariaDB:12.3from
OmarGamal10:mdev-7270

Conversation

@OmarGamal10
Copy link
Copy Markdown
Contributor

@OmarGamal10 OmarGamal10 commented Feb 15, 2026

Summary

  • This PR fixes replicas crashing on column type mismatch for row-based replication, not honoring the slave_skip_errors configuration.
  • The issue comments on JIRA pointed to an upstream (mysql) test, which was helpful for tracking the bug and eventually implementing the fix.
  • The replica now skips the event gracefully (in case the error is actually caught and the column is not convertible), logs a warning and does not abort.

Fix: MDEV-7270

@ParadoxV5 ParadoxV5 requested a review from bnestere February 15, 2026 18:13
@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Feb 16, 2026
Copy link
Copy Markdown
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! This is a preliminary review.

Please squash all of your commits into a single one and add a commit message to it that complies with CODING_STANDARDS.md.

@bnestere bnestere added the Replication Patches involved in replication label Mar 13, 2026
Copy link
Copy Markdown
Contributor

@bnestere bnestere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @OmarGamal10 !

Referencing your Zulip note

Though, I did this on main, then found out I had to do this on the oldest maintained version, which is 10.6 . Now there's some changes in the Rows_log_event::do_apply_event function, upon which I depended on with my fix, these changes do not exist in 10.6 along with some others, what should I do in this case?
...
I expect it would be to find a way to do the fix for 10.6, but the thing is the main has some helper functions and some modifications ~9 months ago-ish that makes handling the row errors much cleaner.

Let's actually keep this change on 12.3 where such changes exist (that should reduce the size of the patch). --skip-slave-errors is mostly used for testing purposes, not production, so missing this in some earlier versions won't be a big deal, and it reduces risk of accidental bugs in early GA versions. Also note, 10.6 is restricted to critical bug fixes only, so we couldn't put this in 10.6 anyway.

Sorry I wasn't able to answer earlier :( I was on vacation when you initially asked the question on Zulip, and have only just been able to take a look at your PR now.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 29, 2026

CLA assistant check
All committers have signed the CLA.

@OmarGamal10 OmarGamal10 changed the base branch from 10.6 to 12.3 March 29, 2026 14:32
@OmarGamal10
Copy link
Copy Markdown
Contributor Author

Hi @OmarGamal10 !

Referencing your Zulip note

Though, I did this on main, then found out I had to do this on the oldest maintained version, which is 10.6 . Now there's some changes in the Rows_log_event::do_apply_event function, upon which I depended on with my fix, these changes do not exist in 10.6 along with some others, what should I do in this case?
...
I expect it would be to find a way to do the fix for 10.6, but the thing is the main has some helper functions and some modifications ~9 months ago-ish that makes handling the row errors much cleaner.

Let's actually keep this change on 12.3 where such changes exist (that should reduce the size of the patch). --skip-slave-errors is mostly used for testing purposes, not production, so missing this in some earlier versions won't be a big deal, and it reduces risk of accidental bugs in early GA versions. Also note, 10.6 is restricted to critical bug fixes only, so we couldn't put this in 10.6 anyway.

Sorry I wasn't able to answer earlier :( I was on vacation when you initially asked the question on Zulip, and have only just been able to take a look at your PR now.

No worries :). I noticed error handling changed from 10.6 to 12.3, and that comptaible_with() does not handle the error itself anymore but marks it for handling later. So I just added a bool to the thd, set in compatible_with() to skip the event in do_apply_event().

Copy link
Copy Markdown
Contributor

@bnestere bnestere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for rebasing and updating the patch, @OmarGamal10 ! I've left some notes.

@OmarGamal10 OmarGamal10 force-pushed the mdev-7270 branch 4 times, most recently from ca21a82 to c32a9d0 Compare March 31, 2026 22:06
@OmarGamal10 OmarGamal10 requested a review from bnestere March 31, 2026 22:14
Copy link
Copy Markdown
Contributor

@bnestere bnestere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the continued work, @OmarGamal10 !

I've left another round of feedback. It is quite close!

Another note for your git commit, please mention your reviewer (me):

Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>

call mtr.add_suppression("Slave SQL.*Error executing row event: .Table .test.t. doesn.t exist., Error_code: 1146");
call mtr.add_suppression("Slave SQL.*Column 0 of table .test.t. cannot be converted from type.* Error_code: 1677");
call mtr.add_suppression("The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state");
call mtr.add_suppression("Got error 1 during COMMIT");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nor this nor the one above it. You should only need

call mtr.add_suppression("Slave SQL.*Column 0 of table .test.t. cannot be converted from type.* Error_code: 1677");

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need any. The pattern for suppressing warnings is added to global_suppressions.

Field *field= table->field[slave_idx];
set_field_to_null(field);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want to always override to NULL. The slave can make it's own

ALTER TABLE <table> ALTER <broken_column> set default <whatever>;

which could be set to NULL if a user wants that behavior, and set differently otherwise. I think we can just scrap this change so the default behavior is taken.

It'd be good to test that as well so it is explicit that behavior is expected, e.g.

connection slave;
ALTER TABLE t ALTER name set default "slave_default";

and also to update the git commit message.

Though I think I recall mentioning at some point "the slave would use NULL", or something like that, but I suppose here I meant whatever its default is, sorry for the miscommunication

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, yes I set it to NULL based on your suggestion, but it does make more sense to set it to the default field. I'll add this and add another row in the test.

Copy link
Copy Markdown
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please keep working with Brandon on the final review.

Setting slave_skip_errors to 1677 (ER_SLAVE_CONVERSION_FAILED) doesn't work
as intended. The slave aborts replication on column mismatch. This fix
makes it log a warning and set the problematic field to the default
value gracefully.
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. Replication Patches involved in replication

Development

Successfully merging this pull request may close these issues.

4 participants