-
Notifications
You must be signed in to change notification settings - Fork 18
don't cache if the schema registry is unavailable #143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 14 commits
3a358e6
be6320d
d998a3e
39fa195
2f296b8
9c67d20
b0005e5
e28e23d
1bf8865
4efb647
1bc1c70
17aafff
452ce9c
64a1756
6160ecd
50d1e0b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,5 @@ | ||
| ! go2avro Foo | ||
| stderr 'undefined: bar.Foo' | ||
| stderr 'undefined: pkg.Foo' | ||
|
|
||
| -- bar.go -- | ||
| package bar | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,32 +1,32 @@ | ||
| module github.com/heetch/avro | ||
|
|
||
| go 1.19 | ||
| go 1.24 | ||
|
|
||
| require ( | ||
| github.com/actgardner/gogen-avro/v10 v10.2.1 | ||
| github.com/frankban/quicktest v1.14.0 | ||
| github.com/google/uuid v1.3.0 | ||
| github.com/google/uuid v1.6.0 | ||
| github.com/kr/pretty v0.3.0 | ||
| github.com/linkedin/goavro/v2 v2.11.1 | ||
| github.com/rogpeppe/go-internal v1.9.0 | ||
| github.com/rogpeppe/go-internal v1.14.1 | ||
| github.com/sebdah/goldie/v2 v2.5.5 | ||
| github.com/stretchr/testify v1.7.1 | ||
| golang.org/x/text v0.3.0 | ||
| golang.org/x/text v0.23.0 | ||
| gopkg.in/httprequest.v1 v1.2.1 | ||
| gopkg.in/retry.v1 v1.0.3 | ||
| ) | ||
|
|
||
| require ( | ||
| github.com/davecgh/go-spew v1.1.1 // indirect | ||
| github.com/golang/snappy v0.0.4 // indirect | ||
| github.com/google/go-cmp v0.5.6 // indirect | ||
| github.com/google/go-cmp v0.6.0 // indirect | ||
| github.com/julienschmidt/httprouter v1.3.0 // indirect | ||
| github.com/kr/text v0.2.0 // indirect | ||
| github.com/pkg/diff v0.0.0-20210226163009-20ebb0f2a09e // indirect | ||
| github.com/pmezard/go-difflib v1.0.0 // indirect | ||
| github.com/sergi/go-diff v1.0.0 // indirect | ||
| golang.org/x/net v0.0.0-20200505041828-1ed23360d12c // indirect | ||
| golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543 // indirect | ||
| golang.org/x/net v0.33.0 // indirect | ||
| golang.org/x/sys v0.28.0 // indirect | ||
| golang.org/x/tools v0.26.0 // indirect | ||
| gopkg.in/errgo.v1 v1.0.0 // indirect | ||
| gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b // indirect | ||
| ) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,6 +4,7 @@ import ( | |
| "context" | ||
| "fmt" | ||
| "reflect" | ||
| "strings" | ||
| "sync" | ||
| ) | ||
|
|
||
|
|
@@ -117,9 +118,12 @@ func (c *SingleDecoder) getProgram(ctx context.Context, vt reflect.Type, wID int | |
| } else { | ||
| // We haven't seen the writer schema before, so try to fetch it. | ||
| wType, err = c.registry.SchemaForID(ctx, wID) | ||
| // TODO look at the SchemaForID error | ||
| // and return an error without caching it if it's temporary? | ||
| // See https://github.com/heetch/avro/issues/39 | ||
| // do not cache the error when schema registry is unavailable | ||
| // we can't import avroregistry, to compare the error, so we're looking at the error message to see if the | ||
| // error is of type `UnavailableError` (avroregistry/errors.go) | ||
| if err != nil && strings.HasPrefix(err.Error(), "schema registry unavailability caused by") { | ||
| return nil, err | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can it lead to an overload of the registry if we don't cache anything? The other open PR that addresses the same issue caches the error for 1 minute: #127 Can we just merge that one, or maybe decrease the duration a bit if necessary?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Merging mentioned PR would be nice. 1min might be a bit too much in our case indeed. How about we make that duration configurable (e.g w/ default and overridable through env var)?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When the schema registr\y is unavailable the request doesn't reach the schema registry pod at all, so it's not stressed. If i'm understanding this correctly, this should only be called once per pod per topic (unless there's a race condition where it's called for each partition on the same pod at the same time). In any case, the number of requests to the schema registry is very low.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
If the registry is unavailable for 10 minutes, and in that timeframe one pod after another is started (e.g. autoscaling, node consolidation) and runs into the error, then the registry comes back up, won't all pods send their requests to the registry at the same time? Could that lead to issues? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ideally we could check the exact issue
But this current PR is good enough as a first step IMO
This is already the current behavior when a deploy happens, the registry should accommodate this (as Adrian mentioned the number of requests per service should be low, basically 1 per topic consumed/produced in per instance)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My PR explicitly only doesn't cache
I believe that it would be a mistake to cache the error in either of those cases. If we want to limit the cache time for other types of errors, that would be beyond the scope of this PR.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Indeed. Then no blocker, and we can consider improving it in the future if necessary |
||
| } | ||
| } | ||
| c.mu.Lock() | ||
| defer c.mu.Unlock() | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.