Database exceptions while running in Azure, how to handle without EnableRetryOnFailure #7848


User avatar
0
okains created
  • ABP Framework version: v8.2.3
  • UI Type: Blazor Server
  • Database System: EF Core (SQL Server)
  • Tiered (for MVC) or Auth Server Separated (for Angular): yes
  • Exception message and full stack trace:
  • Steps to reproduce the issue:

Hi,

I have been deploying to Azure Deployment Slots, and I am getting transient db errors in the logs on the Auth project that are causing me to often get 400 Errors returned. The log errors are initially in this format:

2024-09-06 15:20:30.990 +00:00 [ERR] An error occurred using the connection to database 'ESv2-testing' on server 'tcp:hathor-hk.database.windows.net,1433'.

Resulting in this:

Following the suggestion of ChatGPT, I added EnableRetryOnFailure:

Configure<AbpDbContextOptions>(options =>
{
    /* The main point to change your DBMS.
     * See also ESv2DbContextFactory for EF Core tooling. */
    options.UseSqlServer(sqlOptions =>
        sqlOptions.EnableRetryOnFailure(5, TimeSpan.FromSeconds(10), null));
});

But it seems this may not be supported in ABP.

I am now getting the following in my logs :

2024-09-06 13:08:13.988 +00:00 [ERR] An exception occurred while iterating over the results of a query for context type 'Volo.Abp.TextTemplateManagement.EntityFrameworkCore.TextTemplateManagementDbContext'.
System.InvalidOperationException: The configured execution strategy 'SqlServerRetryingExecutionStrategy' does not support user-initiated transactions. Use the execution strategy returned by 'DbContext.Database.CreateExecutionStrategy()' to execute all the operations in the transaction as a retriable unit.
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.OnFirstExecution()
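
For context, the pattern that this exception message points to looks roughly like the sketch below in plain EF Core. This is only an illustration of what "execute all the operations in the transaction as a retriable unit" means: the class and method names (`ExecutionStrategyExample`, `RunInRetriableTransactionAsync`) and the `work` delegate are hypothetical, and it assumes you own the transaction yourself. ABP opens its own transactions through its unit of work, which is why, per the replies below, this does not drop into an ABP application as-is.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical helper, not part of ABP or EF Core.
public static class ExecutionStrategyExample
{
    public static Task RunInRetriableTransactionAsync(
        DbContext db,
        Func<Task> work,
        CancellationToken ct = default)
    {
        // Ask EF Core for the configured strategy (the retrying one when
        // EnableRetryOnFailure is set) instead of opening a transaction directly.
        var strategy = db.Database.CreateExecutionStrategy();

        // Everything inside the delegate is retried as a single unit,
        // including beginning and committing the transaction.
        return strategy.ExecuteAsync(async () =>
        {
            await using var tx = await db.Database.BeginTransactionAsync(ct);
            await work();
            await db.SaveChangesAsync(ct);
            await tx.CommitAsync(ct);
        });
    }
}
```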

Is there another approach to handling these errors?

This particular exception that is logged: [ERR] An exception occurred while iterating over the results of a query for context type 'Volo.Abp.TextTemplateManagement.EntityFrameworkCore.TextTemplateManagementDbContext'.

Is this maybe the root cause here? This doesn't happen all of the time though, any idea on what is going on here?

Thanks,

Karim Ainsworth


34 Answer(s)
  • User Avatar
    0
    maliming created
    Support Team Fullstack Developer

    hi

    ABP doesn't support the connection resiliency (EnableRetryOnFailure) feature.

  • User Avatar
    0
    okains created

    Hi,

    Yeah I understand that, question is how to handle this without being able to use EnableRetryOnFailure. Is there a best practice with ABP for this?

    Is there any plan on implementing this in a future version?

    Thanks,

    Karim

  • User Avatar
    0
    maliming created
    Support Team Fullstack Developer

    hi

    You can only write code to catch the exception and retry.

    EnableRetryOnFailure is not compatible with abp design.
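
    A minimal sketch of what "catch the exception and retry" could look like, under stated assumptions. The `TransientRetry` class, its parameters, and the `isTransient` predicate are hypothetical names, not ABP or EF Core APIs; you decide which exceptions count as transient. It is also an assumption that the delegate is safe to run more than once.

    ```csharp
    using System;
    using System.Threading;
    using System.Threading.Tasks;

    // Hypothetical helper, not part of ABP or EF Core.
    public static class TransientRetry
    {
        public static async Task<T> ExecuteAsync<T>(
            Func<CancellationToken, Task<T>> operation,
            Func<Exception, bool> isTransient,
            int maxAttempts = 3,
            TimeSpan? baseDelay = null,
            CancellationToken cancellationToken = default)
        {
            if (maxAttempts < 1)
                throw new ArgumentOutOfRangeException(nameof(maxAttempts));

            var delay = baseDelay ?? TimeSpan.FromMilliseconds(200);

            for (var attempt = 1; ; attempt++)
            {
                try
                {
                    return await operation(cancellationToken);
                }
                // Retry only while attempts remain and the caller classifies the
                // exception as transient; otherwise the exception propagates.
                catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
                {
                    await Task.Delay(delay, cancellationToken);
                    delay = TimeSpan.FromTicks(delay.Ticks * 2); // double the wait each time
                }
            }
        }
    }
    ```

    Because a failed attempt can leave the current DbContext and transaction unusable, the delegate would likely need to wrap a whole unit of work (for example, one started with `IUnitOfWorkManager.Begin(requiresNew: true)`) rather than a single query inside an existing one.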

  • User Avatar
    0
    okains created

    OK but this is happening during the login process, seems to be on the db connection that pulls the OpenIddictApplications record for the BlazorWebAppTiered client.

    So I would need to pull the source code for login and try to handle it in there? Any info on which method I should focus on, or any other ideas on how best to do this?

  • User Avatar
    0
    maliming created
    Support Team Fullstack Developer

    hi

    OK but this is happening during the login process, seems to be on the db connection that pulls the OpenIddictApplications record for the BlazorWebAppTiered client.

    Can you share full logs?

    liming.ma@volosoft.com

  • User Avatar
    0
    okains created

    Hi,

    I have shared the full debug logs of the site load and login attempt. Here is a screenshot of the db error that I believe is causing these 400 errors:

    Thanks,

    Karim

  • User Avatar
    0
    maliming created
    Support Team Fullstack Developer

    hi

    The HTTP 400 comes from OpenIddict:

    [INF] Client validation failed because 'https://esv2-test.azurewebsites.net/signin-oidc' was not a valid redirect_uri for ESv2_BlazorWebAppTiered.
    [INF] The authorization request was rejected because the redirect_uri was invalid: 'https://esv2-test.azurewebsites.net/signin-oidc'.
    
  • User Avatar
    0
    okains created

    That "client validation failed" INF entry is only present when the DB connection ERR happens. The configuration is fine: it works occasionally, and I have double- and triple-checked the environment variables and the redirect URLs in the OpenIddictApplications table. Everything is as it should be.

    This is why I said that I think the DB error is happening when trying to read the OpenIddictApplications table, so no data comes back and therefore the redirect_uri mismatch. Just a guess, but I have been testing this for days and that seems to be the one thing that makes a bit of sense. That is why I wanted to retry that DB call in the first place.

    This is happening on an App Service in Azure with 3 deployment slots: Test / Staging / Production. It seems that only 1 deployment slot will work at a time; the other 2 then throw the 'error using the connection to database' error and then the redirect_uri mismatch. And there are no config changes at all being made.

  • User Avatar
    0
    maliming created
    Support Team Fullstack Developer

    hi

    Maybe the index error causes the system to cache the wrong data.

    You can try removing the await OpenIdApplicationRepository.GetListAsync(); from the index page and testing again.

  • User Avatar
    0
    okains created

    Hi,

    OK I commented out the // Application = await OpenIdApplicationRepository.GetListAsync(); line,

    I am still getting the : [ERR] An error occurred using the connection to database 'ESv2-testing' on server 'tcp:hathor-hk.database.windows.net,1433'. error intermittently.

    Any idea of what I can try next on this? This is a pretty critical error; we can't deploy our app reliably at this point. Any help would be much appreciated.

    Thanks,

    Karim

  • User Avatar
    0
    maliming created
    Support Team Fullstack Developer

    Can you share a project so I can reproduce the problem locally?

  • User Avatar
    0
    okains created

    Hi,

    You have access to this repo in GitHub, it is the same repo as before. The problem is though that this doesn't happen locally, it is only showing up when we deploy to Azure. And not all the time.

    I just need to figure out a fix or workaround for this in some way so that we can deploy properly. If you have any ideas or can look at the code and see if there is anything obviously wrong with the Auth project that would be great.

    Thanks,

    Karim

  • User Avatar
    0
    maliming created
    Support Team Fullstack Developer

    Can the problem be reproduced by connecting to the remote database locally?

  • User Avatar
    0
    okains created

    Hi,

    OK, let me try that. It actually worked for about a day with no 400 errors; however, there was another Auth error on the web app that I created a separate ticket for. But now we are back to the 400 error on the login page. I think all of these issues are down to this DB connection problem we are having.

    I will try locally and let you know.

    Thanks,

    Karim

  • User Avatar
    0
    okains created

    OK I made a copy of the ESv2-testing db on Azure, called it ESv2-support. Updated the OpenIddictApplications table and added my localhost urls where needed. Set the db connection locally on Auth / API projects to the ESv2-support db on Azure.

    Was able to get to the login page, logged in, then got an auth error but there were no [ERR] entries in the Auth log, so no issues with the connection it seems, though only ran it one time for now.

    Can check back on this in a few hours. If any further ideas please let me know.

    Thanks for your help,

    Karim

  • User Avatar
    0
    maliming created
    Support Team Fullstack Developer

    hi

    If the local production environment cannot reproduce the problem using a remote server, it means that there is a problem with Azure Deployment Slots.

    I'm not familiar with Azure. Can you change the running environment for testing? For example, Azure App Service

  • User Avatar
    0
    okains created

    Hi,

    Yeah I thought last week that this would be because of the Deployment Slots, so I removed all Deployment Slots. I now have 2 identical environments, Test and Production, but not using Deployment Slots at all. But still getting the same errors, I had hoped that would have fixed things.

    Currently the Production environment has been running without any issues, for the past few days, but Test has not, same codebase, same setup / configuration.

  • User Avatar
    0
    maliming created
    Support Team Fullstack Developer

    This must be an issue with the running environment. Are there any logs in the database when errors occur?

  • User Avatar
    0
    okains created

    Hi,

    I have full AppService logs here:

    https://1drv.ms/u/s!AkJmGuHQuob7kqUl46X837Ty84Ga_Q?e=zyRTum

    If you are after database logs can you be more specific about what you need?

    Thanks,

    Karim

  • User Avatar
    0
    maliming created
    Support Team Fullstack Developer

    hi

    The error logs of the An error occurred using the connection to database are caused by TaskCanceledException: A task was canceled in the Index page.

    OK I commented out the // Application = await OpenIdApplicationRepository.GetListAsync(); line, I am still getting the : [ERR] An error occurred using the connection to database 'ESv2-testing' on server 'tcp:hathor-hk.database.windows.net,1433'. error intermittently.

    Can you share these logs?

    Thanks

  • User Avatar
    0
    okains created

    Hi,

    The error logs of the An error occurred using the connection to database are caused by TaskCanceledException: A task was canceled in the Index page.

    I thought it was the other way around: the DB error was causing the TaskCanceledException. If the TaskCanceledException is causing the DB error, then what is the root cause of the TaskCanceledException?

    OK I commented out the // Application = await OpenIdApplicationRepository.GetListAsync(); line,

    I uncommented that line, put it back in as it didn't seem to have any effect on the error. Are you asking me to comment out again and send logs?

    Thanks,

    Karim

  • User Avatar
    0
    maliming created
    Support Team Fullstack Developer

    The TaskCanceledException may be caused by the browser canceling the request.


    Are you asking me to comment out again and send logs?

    Yes, please. I can see that the error is caused by the Index page, so please try removing the DB call from the Index page.

  • User Avatar
    0
    okains created

    Hi,

    I have removed the db call from the Index Page. Just hardcoding the ID needed now. Still getting many of the same db connection issues in the logs, both AUTH and API. At this point though the behavior is that the App loses authentication after a few seconds, this is the same issue as documented in my other open ticket.

    Here are the logs for this deployment to https://test-dashboard.hathor.events :

    https://1drv.ms/u/s!AkJmGuHQuob7kqZeKS0nlUov2TXVHQ?e=ComGjG

    Thanks,

    Karim

  • User Avatar
    0
    maliming created
    Support Team Fullstack Developer

    But I can still see that the Index page still uses the OpenIddictProDbContext:

    at ESv2.Pages.IndexModel.OnGetAsync() in D:\a\Hathor\Hathor\aspnet-core\src\ESv2.AuthServer\Pages\Index.cshtml.cs:line 30
    
    2024-09-21 13:53:36.610 +00:00 [ERR] An error occurred using the connection to database 'ESv2-testing' on server 'tcp:hathor-hk.database.windows.net,1433'.
    2024-09-21 13:53:36.610 +00:00 [DBG] A query was canceled for context type 'Volo.Abp.OpenIddict.EntityFrameworkCore.OpenIddictProDbContext'.
    2024-09-21 13:53:36.620 +00:00 [ERR] ---------- RemoteServiceErrorInfo ----------
    {
      "code": null,
      "message": "An internal error occurred during your request!",
      "details": null,
      "data": {},
      "validationErrors": null
    }
    
    2024-09-21 13:53:36.620 +00:00 [ERR] A task was canceled.
    System.Threading.Tasks.TaskCanceledException: A task was canceled.
       at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternalAsync(Boolean errorsExpected, CancellationToken cancellationToken)
       at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternalAsync(Boolean errorsExpected, CancellationToken cancellationToken)
       at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenAsync(CancellationToken cancellationToken, Boolean errorsExpected)
       at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
       at Microsoft.EntityFrameworkCore.Query.Internal.SplitQueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken)
       at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
       at Microsoft.EntityFrameworkCore.Query.Internal.SplitQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()
       at Microsoft.EntityFrameworkCore.EntityFrameworkQueryableExtensions.ToListAsync[TSource](IQueryable`1 source, CancellationToken cancellationToken)
       at Microsoft.EntityFrameworkCore.EntityFrameworkQueryableExtensions.ToListAsync[TSource](IQueryable`1 source, CancellationToken cancellationToken)
       at Volo.Abp.Domain.Repositories.EntityFrameworkCore.EfCoreRepository`2.GetListAsync(Boolean includeDetails, CancellationToken cancellationToken)
       at Castle.DynamicProxy.AsyncInterceptorBase.ProceedAsynchronous[TResult](IInvocation invocation, IInvocationProceedInfo proceedInfo)
       at Volo.Abp.Castle.DynamicProxy.CastleAbpMethodInvocationAdapterWithReturnValue`1.ProceedAsync()
       at Volo.Abp.Uow.UnitOfWorkInterceptor.InterceptAsync(IAbpMethodInvocation invocation)
       at Volo.Abp.Castle.DynamicProxy.CastleAsyncAbpInterceptorAdapter`1.InterceptAsync[TResult](IInvocation invocation, IInvocationProceedInfo proceedInfo, Func`3 proceed)
       at ESv2.Pages.IndexModel.OnGetAsync() in D:\a\Hathor\Hathor\aspnet-core\src\ESv2.AuthServer\Pages\Index.cshtml.cs:line 30
    
  • User Avatar
    0
    maliming created
    Support Team Fullstack Developer

    By the way, you can try using NullCancellationTokenProvider instead of HttpContextCancellationTokenProvider to troubleshoot the "A task was canceled." problem:

    
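    // Needs: using Microsoft.Extensions.DependencyInjection;
    //        using Microsoft.Extensions.DependencyInjection.Extensions;
    //        using Volo.Abp.Threading;
    public override void ConfigureServices(ServiceConfigurationContext context)
    {
        // Troubleshooting only: replace the cancellation token provider so that
        // an aborted HTTP request no longer cancels in-flight database calls.
        context.Services.Replace(ServiceDescriptor.Transient<ICancellationTokenProvider, NullCancellationTokenProvider>());
    }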
    
Made with ❤️ on ABP v9.1.0-preview. Updated on November 20, 2024, 13:06