- ABP Framework version: v8.3.0
- UI Type: MVC
- Database System: EF Core (SQL Server)
- Tiered (for MVC) or Auth Server Separated (for Angular): yes
We are getting this exception very often and it's causing the system to be unusable. We had a failed go-live because of this and we've not been successful at locating the reason:
[2025-02-20 15:11:51.633] [Error] wn0mdwk000176 (77) A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at Volo.Abp.Threading.SemaphoreSlimExtensions.LockAsync(SemaphoreSlim semaphoreSlim, CancellationToken cancellationToken)
at Volo.Abp.Caching.DistributedCache`2.GetOrAddAsync(TCacheKey key, Func`1 factory, Func`1 optionsFactory, Nullable`1 hideErrors, Boolean considerUow, CancellationToken token)
at Volo.Abp.LanguageManagement.DynamicResourceLocalizer.FillAsync(LocalizationResourceBase resource, String cultureName, Dictionary`2 dictionary)
at Volo.Abp.Localization.LocalizationResourceContributorList.FillAsync(String cultureName, Dictionary`2 dictionary, Boolean includeDynamicContributors)
at Volo.Abp.Localization.AbpDictionaryBasedStringLocalizer.GetAllStringsAsync(String cultureName, Boolean includeParentCultures, Boolean includeBaseLocalizers, Boolean includeDynamicContributors)
at Volo.Abp.Localization.AbpDictionaryBasedStringLocalizer.GetAllStringsAsync(Boolean includeParentCultures, Boolean includeBaseLocalizers, Boolean includeDynamicContributors)
at Volo.Abp.Localization.AbpStringLocalizerExtensions.GetAllStringsAsync(IStringLocalizer stringLocalizer, Boolean includeParentCultures, Boolean includeBaseLocalizers, Boolean includeDynamicContributors)
at Volo.Abp.AspNetCore.Mvc.ApplicationConfigurations.AbpApplicationLocalizationAppService.GetAsync(ApplicationLocalizationRequestDto input)
at Castle.DynamicProxy.AsyncInterceptorBase.ProceedAsynchronous[TResult](IInvocation invocation, IInvocationProceedInfo proceedInfo)
at Volo.Abp.Castle.DynamicProxy.CastleAbpMethodInvocationAdapterWithReturnValue`1.ProceedAsync()
at Volo.Abp.GlobalFeatures.GlobalFeatureInterceptor.InterceptAsync(IAbpMethodInvocation invocation)
at Volo.Abp.Castle.DynamicProxy.CastleAsyncAbpInterceptorAdapter`1.InterceptAsync[TResult](IInvocation invocation, IInvocationProceedInfo proceedInfo, Func`3 proceed)
at Castle.DynamicProxy.AsyncInterceptorBase.ProceedAsynchronous[TResult](IInvocation invocation, IInvocationProceedInfo proceedInfo)
at Volo.Abp.Castle.DynamicProxy.CastleAbpMethodInvocationAdapterWithReturnValue`1.ProceedAsync()
at Volo.Abp.Auditing.AuditingInterceptor.ProceedByLoggingAsync(IAbpMethodInvocation invocation, AbpAuditingOptions options, IAuditingHelper auditingHelper, IAuditLogScope auditLogScope)
at Volo.Abp.Auditing.AuditingInterceptor.ProcessWithNewAuditingScopeAsync(IAbpMethodInvocation invocation, AbpAuditingOptions options, ICurrentUser currentUser, IAuditingManager auditingManager, IAuditingHelper auditingHelper, IUnitOfWorkManager unitOfWorkManager)
at Volo.Abp.Auditing.AuditingInterceptor.ProcessWithNewAuditingScopeAsync(IAbpMethodInvocation invocation, AbpAuditingOptions options, ICurrentUser currentUser, IAuditingManager auditingManager, IAuditingHelper auditingHelper, IUnitOfWorkManager unitOfWorkManager)
at Volo.Abp.Auditing.AuditingInterceptor.InterceptAsync(IAbpMethodInvocation invocation)
at Volo.Abp.Castle.DynamicProxy.CastleAsyncAbpInterceptorAdapter`1.InterceptAsync[TResult](IInvocation invocation, IInvocationProceedInfo proceedInfo, Func`3 proceed)
at Castle.DynamicProxy.AsyncInterceptorBase.ProceedAsynchronous[TResult](IInvocation invocation, IInvocationProceedInfo proceedInfo)
at Volo.Abp.Castle.DynamicProxy.CastleAbpMethodInvocationAdapterWithReturnValue`1.ProceedAsync()
at Volo.Abp.Validation.ValidationInterceptor.InterceptAsync(IAbpMethodInvocation invocation)
at Volo.Abp.Castle.DynamicProxy.CastleAsyncAbpInterceptorAdapter`1.InterceptAsync[TResult](IInvocation invocation, IInvocationProceedInfo proceedInfo, Func`3 proceed)
at Castle.DynamicProxy.AsyncInterceptorBase.ProceedAsynchronous[TResult](IInvocation invocation, IInvocationProceedInfo proceedInfo)
at Volo.Abp.Castle.DynamicProxy.CastleAbpMethodInvocationAdapterWithReturnValue`1.ProceedAsync()
at Volo.Abp.Uow.UnitOfWorkInterceptor.InterceptAsync(IAbpMethodInvocation invocation)
at Volo.Abp.Castle.DynamicProxy.CastleAsyncAbpInterceptorAdapter`1.InterceptAsync[TResult](IInvocation invocation, IInvocationProceedInfo proceedInfo, Func`3 proceed)
at Volo.Abp.AspNetCore.Mvc.ApplicationConfigurations.AbpApplicationLocalizationController.GetAsync(ApplicationLocalizationRequestDto input)
at lambda_method3566(Closure, Object)
at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.AwaitableObjectResultExecutor.Execute(ActionContext actionContext, IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Logged|12_1(ControllerActionInvoker invoker)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Awaited|13_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Awaited|26_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
Pretty standard deployment: web, auth, and API all point to the same Redis instance on Azure.
Redis health check is good:
Redis latency seems ok:
Help!
7 Answer(s)
-
I've spent the morning trying to produce a simple project to show the issue, but it's hard. I never see this issue locally, only when deployed.
https://github.com/kfrancis/abp-cache-issue-repo
I've tried to get something similar (run TestConcurrentLocks.ps1), but so far I've not been able to reproduce it locally.
It's just curious that all of the similar exceptions are the same: DynamicResourceLocalizer, with LockAsync causing the cancellation. And generally, it feels like something is wrong with caching. We regularly see issues that we associate with caching, such as cached results quickly getting thrown out and permissions that seem to flip-flop (sometimes the menu items that should be there are present, sometimes not).
-
Hi,
I'm forwarding this issue to our core team and they'll start investigating. I've just created an internal issue for it.
While the framework-level investigation is under way, I'd like to ask for some more details.
- Is your Redis instance under high load?
- If yes, can you try increasing maxmemory for the Redis instance? If you have a custom redis.conf, it might be limited.
-
No, there's barely any usage yet because the system can't handle it.
It feels like a race condition, IMHO.
In SemaphoreSlimExtensions.LockAsync:
- There's a small window between WaitAsync completing and GetDispose being called where cancellation could occur.
- If cancellation happens in this window, the semaphore would be acquired but the IDisposable might not be returned, potentially leaving the semaphore locked.
In GetOrAddAsync:
- The double-check pattern used here assumes the first GetAsync result remains valid when entering the lock.
- Between the first GetAsync and obtaining the lock, another thread could have modified or removed the cache value.
- Within the lock, after the factory() call and before SetAsync, an exception/cancellation could leave the cache in an inconsistent state.
The most concerning race condition is in GetOrAddAsync, where cache reads aren't transactional with respect to the lock. The code assumes that the cache state observed before taking the lock remains valid inside the lock, which may not be true in a distributed system. That might explain why this issue isn't happening in dev, where the instances are running on the same system.
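To make the concern concrete, here is a minimal sketch of a double-checked get-or-add guarded by a SemaphoreSlim. It is not ABP's code: SketchCache and its in-memory store are hypothetical stand-ins, and only SemaphoreSlim.WaitAsync, Release, and ConcurrentDictionary are real .NET APIs. It shows the two properties discussed above: the value is re-read inside the lock, and the semaphore is released in a finally block entered immediately after the wait succeeds.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a cache; NOT ABP's DistributedCache.
public sealed class SketchCache
{
    private readonly ConcurrentDictionary<string, string> _store = new();
    private readonly SemaphoreSlim _gate = new(1, 1);

    public async Task<string> GetOrAddAsync(
        string key, Func<Task<string>> factory, CancellationToken token = default)
    {
        // First check, outside the lock.
        if (_store.TryGetValue(key, out var cached))
        {
            return cached;
        }

        // A caller whose token is cancelled while it waits here fails with an
        // OperationCanceledException (TaskCanceledException derives from it).
        // The semaphore was never acquired in that case, so nothing leaks.
        await _gate.WaitAsync(token);
        try
        {
            // Second check, inside the lock: do not trust the earlier read.
            if (_store.TryGetValue(key, out cached))
            {
                return cached;
            }

            var value = await factory();
            _store[key] = value;
            return value;
        }
        finally
        {
            // Always runs once the wait has succeeded, even if factory() throws.
            _gate.Release();
        }
    }
}
```

In this sketch, a cancelled waiter surfaces the same kind of exception as the top frame of the trace above, which would be consistent with request tokens being cancelled while many callers queue behind one slow factory call. Whether that is what happens inside ABP is still an open question.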
-
Just a heads up while you look into this, we are working on a bit of an "abp caching playground" to assist from our side: https://github.com/Clinical-Support-Systems/abp-caching-playground
It's meant to help us determine how changes in the caching implementation change the overall caching health/throughput, but also (hopefully) expose possible issues with the caching implementation.
Cool things:
- We've figured out how to support k6 for load testing in Aspire; this is how we wanted to "stress" the caching implementation to make sure it's working.
- We've added Redis caching metrics to the Aspire dashboard, even though the Aspire documentation says that metrics aren't possible.
- This approach is also helping us do more representative load/burst testing, as one issue we're struggling with is that load testing locally produces wildly different results than any deployed release.
-
Sorry for the late reply. We missed this question because the colleague handling this topic was on vacation. They will get back to you next Monday. Thank you for your patience.
-
Please don't close this ticket. It is unresolved. Any news?
I'll note, though, while I'm responding, that I see very similar caching issues on a completely separate project. It feels like a cache stampede issue causing a race condition on the lock internal to the implementation, which then causes the cache to need to be loaded again, etc.
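For readers unfamiliar with the term, a cache stampede is many callers missing the cache at once and all starting the same expensive load. One common mitigation is to coalesce concurrent loads per key. The sketch below illustrates that idea only: SingleFlight is a hypothetical helper, not part of ABP, and it assumes .NET 5 or later for the TryRemove(KeyValuePair) overload.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper, not part of ABP: coalesces concurrent loads per key.
public sealed class SingleFlight<TValue>
{
    private readonly ConcurrentDictionary<string, Lazy<Task<TValue>>> _inFlight = new();

    public async Task<TValue> RunAsync(string key, Func<Task<TValue>> load)
    {
        // GetOrAdd may build several Lazy instances under contention, but only
        // one is stored, and Lazy starts `load` at most once for that instance.
        var lazy = _inFlight.GetOrAdd(key, _ => new Lazy<Task<TValue>>(load));
        try
        {
            return await lazy.Value;
        }
        finally
        {
            // Remove only the entry we awaited, so a newer load is not dropped.
            _inFlight.TryRemove(new KeyValuePair<string, Lazy<Task<TValue>>>(key, lazy));
        }
    }
}
```

With this shape, ten simultaneous callers for the same key share one load instead of starting ten. It only coalesces within a single process, so in a tiered deployment each instance would still load once.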
I've been sidetracked on the AbpCachingPlayground while we separated out the Aspire k6 components (https://github.com/kfrancis/k6-aspire-hosting), but I'll be getting back to it shortly to test my hypothesis.
-
There may be a race condition while acquiring the lock, but it's hard to determine.
Can you reproduce it with a newly created solution? Can you share minimal reproduction steps, or a minimal application that reproduces the problem, so we can check whether there is a problem at the framework level?