- ABP Framework version: v7.4.0
- UI Type: Angular
- Database System: EF Core (PostgreSQL)
- Tiered (for MVC) or Auth Server Separated (for Angular): yes
Hello, I am trying to find the bottlenecks in my app with stress testing. I am preparing the server to handle more than 5000 devices. Each device will send one GET or POST request every 5 seconds. On POST, a device sends a SOAP XML payload; on GET, it receives an XML response from the server. To render the XML for the devices, I use Volo.Abp.TextTemplating.Scriban.
I designed my controllers to hit the database less frequently so that the database does not become a bottleneck. To do that I use Microsoft Orleans: I hold the data in memory and save it to the database every 5 minutes, since the data coming from the devices is not critical. Some critical data does need to be persisted immediately, though; this happens on the first request from each device. I use k6 for the stress test and prepared a test for 3000 devices. I also set up PgBouncer in front of PostgreSQL for connection pooling. Things are fine up to 2000 devices; beyond that number I get this exception.
2023-10-16 22:03:31.077 +02:00 [ERR] The operation was canceled.
System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at System.Threading.SemaphoreSlim.WaitUntilCountOrTimeoutAsync(TaskNode asyncWaiter, Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at Volo.Abp.Threading.SemaphoreSlimExtensions.LockAsync(SemaphoreSlim semaphoreSlim, CancellationToken cancellationToken)
   at Volo.Abp.Caching.DistributedCache`2.GetOrAddAsync(TCacheKey key, Func`1 factory, Func`1 optionsFactory, Nullable`1 hideErrors, Boolean considerUow, CancellationToken token)
   at Volo.Abp.TextTemplateManagement.TextTemplates.DatabaseTemplateContentContributor.GetOrNullAsync(TemplateContentContributorContext context)
   at Volo.Abp.TextTemplating.TemplateContentProvider.GetContentOrNullAsync(ITemplateContentContributor[] contributors, TemplateContentContributorContext context)
   at Volo.Abp.TextTemplating.TemplateContentProvider.GetContentOrNullAsync(TemplateDefinition templateDefinition, String cultureName, Boolean tryDefaults, Boolean useCurrentCultureIfCultureNameIsNull)
   at Volo.Abp.TextTemplating.TemplateRenderingEngineBase.GetContentOrNullAsync(TemplateDefinition templateDefinition)
   at Volo.Abp.TextTemplating.Scriban.ScribanTemplateRenderingEngine.RenderSingleTemplateAsync(TemplateDefinition templateDefinition, Dictionary`2 globalContext, Object model)
   at Volo.Abp.TextTemplating.Scriban.ScribanTemplateRenderingEngine.RenderInternalAsync(String templateName, Dictionary`2 globalContext, Object model)
   at Volo.Abp.TextTemplating.Scriban.ScribanTemplateRenderingEngine.RenderAsync(String templateName, Object model, String cultureName, Dictionary`2 globalContext)
   at Volo.Abp.TextTemplating.AbpTemplateRenderer.RenderAsync(String templateName, Object model, String cultureName, Dictionary`2 globalContext)
   at Doohlink.MagicInfo.Envelopes.Renderers.EnvelopeRenderingService`1.RenderAsync(EnvelopeHeader header, TModel body) in C:\Development\Projects\Examples\Doohlink\aspnet-core\modules\Doohlink.MagicInfo\src\Doohlink.MagicInfo.Domain\Envelopes\Renderers\EnvelopeRenderingService.cs:line 30
   at Doohlink.MagicInfo.Handlers.CommandHandler.HandleAsync(Envelope`1 envelope) in C:\Development\Projects\Examples\Doohlink\aspnet-core\modules\Doohlink.MagicInfo\src\Doohlink.MagicInfo.Application\Handlers\CommandHandler.cs:line 90
   at Doohlink.MagicInfo.Handlers.EnvelopeHandler.HandlePostAsync(String body) in C:\Development\Projects\Examples\Doohlink\aspnet-core\modules\Doohlink.MagicInfo\src\Doohlink.MagicInfo.Application\Handlers\EnvelopeHandler.cs:line 62
After the stress test finished, the application could no longer connect to Redis; I had to flush Redis. You can also see this in the logs.
2023-10-16 22:42:47.163 +02:00 [WRN] The message timed out in the backlog attempting to send because no connection became available, command=HMGET, timeout: 5000, outbound: 0KiB, inbound: 0KiB, inst: 0, qu: 89, qs: 0, aw: True, bw: CheckingForTimeoutComplete, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 0, cur-in: 0, sync-ops: 833, async-ops: 49078, serverEndpoint: localhost:6379, conn-sec: 1484.52, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: DESKTOP-NHAEDKT(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=47,Free=32720,Min=6,Max=32767), POOL: (Threads=52,QueuedItems=441,CompletedItems=2340351,Timers=7635), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
StackExchange.Redis.RedisTimeoutException: The message timed out in the backlog attempting to send because no connection became available, command=HMGET, timeout: 5000, outbound: 0KiB, inbound: 0KiB, inst: 0, qu: 89, qs: 0, aw: True, bw: CheckingForTimeoutComplete, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 0, cur-in: 0, sync-ops: 833, async-ops: 49078, serverEndpoint: localhost:6379, conn-sec: 1484.52, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: DESKTOP-NHAEDKT(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=47,Free=32720,Min=6,Max=32767), POOL: (Threads=52,QueuedItems=441,CompletedItems=2340351,Timers=7635), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server, T defaultValue) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2099
   at StackExchange.Redis.RedisDatabase.HashGet(RedisKey key, RedisValue[] hashFields, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 405
   at Microsoft.Extensions.Caching.StackExchangeRedis.RedisExtensions.HashMemberGet(IDatabase cache, String key, String[] members)
   at Microsoft.Extensions.Caching.StackExchangeRedis.RedisCache.GetAndRefresh(String key, Boolean getData)
   at Volo.Abp.Caching.DistributedCache`2.Get(TCacheKey key, Nullable`1 hideErrors, Boolean considerUow)
2023-10-16 22:42:47.163 +02:00 [WRN] ---------- Exception Data ----------
Redis-Message = HMGET c:Volo.Abp.LanguageManagement.Texts,k:Adzup:AbpExceptionHandling_en
Redis-Timeout = 5000
Redis-Write-State = Idle
Redis-Read-State = ReadAsync
Redis-OutboundDeltaKB = 0KiB
Redis-InboundDeltaKB = 0KiB
Redis-OpsSinceLastHeartbeat = 0
Redis-Queue-Awaiting-Write = 89
Redis-Queue-Awaiting-Response = 0
Redis-Active-Writer = True
Redis-Backlog-Writer = CheckingForTimeoutComplete
Redis-Inbound-Bytes = 0
Redis-Inbound-Pipe-Bytes = 0
Redis-Outbound-Pipe-Bytes = 0
Redis-Last-Result-Bytes = 0
Redis-Inbound-Buffer-Bytes = 0
Redis-Sync-Ops = 833
Redis-Async-Ops = 49078
Redis-Server-Endpoint = localhost:6379
Redis-Server-Connected-Seconds = 1484.52
Redis-Abort-On-Connect = 1
Redis-Multiplexer-Connects = 1/1/0
Redis-Manager = 10 of 10 available
Redis-Client-Name = DESKTOP-NHAEDKT(SE.Redis-v2.6.122.38350)
Redis-ThreadPool-IO-Completion = (Busy=0,Free=1000,Min=1,Max=1000)
Redis-ThreadPool-Workers = (Busy=47,Free=32720,Min=6,Max=32767)
Redis-ThreadPool-Items = (Threads=52,QueuedItems=441,CompletedItems=2340351,Timers=7635)
Redis-Busy-Workers = 47
Redis-Version = 2.6.122.38350
redis-command = HMGET c:Volo.Abp.LanguageManagement.Texts,k:Adzup:AbpExceptionHandling_en
request-sent-status = WaitingInBacklog
redis-server = localhost:6379
So, a couple of questions:
- I understand from the exception that it is trying to get the template text from DatabaseTemplateContentContributor (at Volo.Abp.TextTemplateManagement.TextTemplates.DatabaseTemplateContentContributor.GetOrNullAsync(TemplateContentContributorContext context)). I assume this comes from the Text Template Management Module (https://docs.abp.io/en/commercial/latest/modules/text-template-management#text-template-management-module) and, since it is the last contributor, it is the first one to be tried.
Since my text templates don't use any localization, is it possible to use VirtualFileTemplateContentContributor instead? Of course I can add another ITemplateContentContributor. I just wonder whether there is any other way to skip the last contributor and use only VirtualFileTemplateContentContributor, since I don't want the code to go to the Redis cache or the database at all. All I need is to get the template from the virtual file system and fill it in with the model.
- I also wonder what is happening in this setup. Is the bottleneck in the Redis cache or in the database connection? My first thought was that something is going on with the Redis cache, because if the templates are cached, the code should stop going to the database for the same template after a while. The strange thing is that if I increase the database connection pool size, the server can handle more concurrent users. With 150 max connections and a max pool size of 150 I can handle 2000 users; if I increase the pool size to 250, the server can handle 3000 users with the same test. So the database connection pool seems to play a role here, even though I do not expect the code to open a database connection. Why is this happening? Could it be a bug when it tries to get the template from the cache, so that it goes to the database instead?
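As a back-of-envelope check on the pool numbers above, the sketch below applies Little's law (connections in use ≈ request rate × time each request holds a connection). It assumes, purely for illustration, that every request borrows one pooled connection and that the pool is fully used at the point where the test starts failing; neither is confirmed by the logs. The class and method names are hypothetical.

```csharp
// Hypothetical helper for rough capacity arithmetic; not part of ABP or Npgsql.
public static class PoolEstimate
{
    // Steady-state request rate when each device sends one request per interval.
    public static double RequestsPerSecond(int devices, double intervalSeconds)
    {
        return devices / intervalSeconds;
    }

    // Little's law rearranged: hold time = connections in use / request rate.
    // Assumes the whole pool is busy, which is an assumption, not a measurement.
    public static double ImpliedHoldSeconds(int poolSize, double requestsPerSecond)
    {
        return poolSize / requestsPerSecond;
    }
}
```

Under these assumptions, 2000 devices at one request per 5 seconds is 400 requests per second, and a pool of 150 implies about 0.375 s of connection time per request; 3000 devices with a pool of 250 gives 600 requests per second and about 0.417 s. The two figures being close is consistent with the required pool size growing roughly linearly with load, which would point at connections being held for a long time per request rather than at a fixed limit.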
By the way, I disabled audit logging across the whole application with this configuration. Any help would be appreciated.
Configure<AbpAuditingOptions>(options =>
{
options.IsEnabled = false; //Disables the auditing system
});
7 Answer(s)
-
0
Since my text templates don't use any localization, is it possible to use VirtualFileTemplateContentContributor instead? Of course I can add another ITemplateContentContributor. I just wonder whether there is any other way to skip the last contributor and use only VirtualFileTemplateContentContributor, since I don't want the code to go to the Redis cache or the database at all. All I need is to get the template from the virtual file system and fill it in with the model.
You can remove DatabaseTemplateContentContributor from the AbpTextTemplatingOptions. For example:
public override void ConfigureServices(ServiceConfigurationContext context)
{
    Configure<AbpTextTemplatingOptions>(options =>
    {
        options.ContentContributors.RemoveAll(x => x == typeof(DatabaseTemplateContentContributor));
    });
}
-
0
I also wonder what is happening in this setup. Is the bottleneck in the Redis cache or in the database connection? My first thought was that something is going on with the Redis cache,
This looks like a Redis bottleneck.
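One client-side factor worth ruling out, going by the log above: it reports WORKER: (Busy=47, ..., Min=6) with 441 queued work items and 833 synchronous Redis operations, and the StackExchange.Redis timeouts article linked in that log describes busy workers exceeding the pool minimum as a sign of thread-pool throttling. A minimal sketch of raising the minimum at startup follows, assuming thread-pool starvation is in fact contributing here; the helper name is hypothetical and any concrete value passed to it is illustrative, not a recommendation.

```csharp
using System;
using System.Threading;

// Hypothetical startup helper; only ThreadPool.GetMinThreads/SetMinThreads are real APIs.
public static class ThreadPoolWarmup
{
    // Raises the minimum worker and IOCP thread counts if they are below the
    // requested values, and returns the minimums in effect afterwards.
    public static (int Worker, int Io) RaiseMinThreads(int minWorker, int minIo)
    {
        ThreadPool.GetMinThreads(out var worker, out var io);

        // Never lower an existing minimum.
        var targetWorker = Math.Max(worker, minWorker);
        var targetIo = Math.Max(io, minIo);

        // SetMinThreads returns false if the values are rejected (for example
        // when they exceed the configured maximum); the current values are kept then.
        ThreadPool.SetMinThreads(targetWorker, targetIo);

        ThreadPool.GetMinThreads(out worker, out io);
        return (worker, io);
    }
}
```

Calling something like ThreadPoolWarmup.RaiseMinThreads(200, 200) once, before the host starts accepting requests, and then rerunning the same k6 test would show whether the Redis timeouts move. If they do not, the thread pool is probably not the limiting factor.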
-
0
Hello, thanks for the answer.
public override void ConfigureServices(ServiceConfigurationContext context)
{
    Configure<AbpTextTemplatingOptions>(options =>
    {
        options.ContentContributors.RemoveAll(x => x == typeof(DatabaseTemplateContentContributor));
    });
}
I only want to remove this contributor for specific templates; for the other templates, such as emailing, I think I need it. I don't think there is an option for that right now?
To work around the problem for now, I add a prefix to the names of the templates I want to exclude, and I added another template content contributor:
public class MyVirtualFileTemplateContentContributor : VirtualFileTemplateContentContributor, ITransientDependency
{
    private readonly ILocalizedTemplateContentReaderFactory _localizedTemplateContentReaderFactory;

    public MyVirtualFileTemplateContentContributor(
        ILocalizedTemplateContentReaderFactory localizedTemplateContentReaderFactory)
        : base(localizedTemplateContentReaderFactory)
    {
        _localizedTemplateContentReaderFactory = localizedTemplateContentReaderFactory;
    }

    public override async Task<string> GetOrNullAsync(TemplateContentContributorContext context)
    {
        // Only handle templates whose name contains the group prefix;
        // returning null lets the other contributors handle everything else.
        if (!context.TemplateDefinition.Name.Contains(MyTemplateDefinitionProvider.TemplateGroup))
        {
            return null;
        }

        var localizedReader = await _localizedTemplateContentReaderFactory
            .CreateAsync(context.TemplateDefinition);

        // Read the culture-neutral content straight from the virtual file system.
        return localizedReader.GetContentOrNull(null);
    }
}
That is how I worked around the problem.
For the second question, I am not so sure it is a Redis problem. I do not understand why Redis would become a bottleneck at 2000 requests. I am also not sure why the application is trying to get the value of AbpExceptionHandling_en from Volo.Abp.LanguageManagement.Texts:
redis-command = HMGET c:Volo.Abp.LanguageManagement.Texts,k:Adzup:AbpExceptionHandling_en
With DatabaseTemplateContentContributor, the code tries to get the localized value from the repository, so every request that hits the backend tries the database first.
Here you can see that the code checks the database and, since it is a static store, returns null. This is first tried with the culture "en", which finds nothing. The next try is with null, and the same thing happens in the database contributor; with an en-En culture it happens 3 times.
So it looks in the database at least twice. I was thinking that, if every request creates 2 separate database connections, it takes longer to get the result, or maybe something else is going on. This then triggers the Redis overload, because it seems that when an exception occurs, ABP tries to read some localization texts from the Redis cache.
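The lookup sequence described above can be sketched as follows. This is a simplified illustration of the behaviour as described in this post (exact culture, then the language part, then a culture-neutral lookup), not ABP's actual implementation; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of the fallback order; not ABP code.
public static class TemplateLookupOrder
{
    // Returns the culture names tried in order for one render call.
    // A null entry stands for the culture-neutral lookup.
    public static List<string> GetCulturesToTry(string cultureName)
    {
        var order = new List<string>();

        if (!string.IsNullOrEmpty(cultureName))
        {
            // 1. The exact culture, for example "en-En" or "en".
            order.Add(cultureName);

            // 2. The language part, if the culture has a region suffix.
            var dash = cultureName.IndexOf('-');
            if (dash > 0)
            {
                order.Add(cultureName.Substring(0, dash));
            }
        }

        // 3. Finally the culture-neutral template.
        order.Add(null);
        return order;
    }
}
```

Each entry in the returned list corresponds to one pass through the contributors, so a culture of "en" means two cache or database lookups per render and a region-specific culture means three. At hundreds of requests per second, that multiplier applies to every request that renders a template.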
In the end, I fixed my problem by not going to the database or Redis at all, by implementing ITemplateContentContributor. I am still not sure how many clients my app can serve when it does need to go to the database, for CRUD operations or just for queries. Should I create a Redis cluster? Should I shard PostgreSQL, or just do load balancing? I am quite confused.
Has any stress-test benchmark been done with the ABP template? I don't want to deploy the app and get surprises.
Since IoT devices will connect to the server, the load can grow a lot. I need to find the sweet spot for 1 instance so that I can make decisions about sharding and clusters. I am not expecting more than 10 000 devices in the system, so I was expecting to handle that load with 1 instance.
-
0
I only want to remove this contributor for specific templates; for the other templates, such as emailing, I think I need it. I don't think there is an option for that right now?
You can replace the DatabaseTemplateContentContributor with your own. For example:
[ExposeServices(typeof(MyDatabaseTemplateContentContributor), typeof(ITemplateContentContributor))]
public class MyDatabaseTemplateContentContributor : DatabaseTemplateContentContributor
{
    // The constructor is omitted here, as in the original snippet.

    public override async Task<string> GetOrNullAsync(TemplateContentContributorContext context)
    {
        if (context.TemplateDefinition.Name == "MyTemplate.....")
        {
            return null; // skip
        }

        return await Cache.GetOrAddAsync(
            new TemplateContentCacheKey(context.TemplateDefinition.Name, context.Culture),
            async () => await GetTemplateContentFromDbOrNullAsync(context),
            () => new DistributedCacheEntryOptions
            {
                SlidingExpiration = Options.MinimumCacheDuration
            }
        );
    }
}

public override void ConfigureServices(ServiceConfigurationContext context)
{
    Configure<AbpTextTemplatingOptions>(options =>
    {
        options.ContentContributors.RemoveAll(x => x == typeof(DatabaseTemplateContentContributor));
        // The original snippet only removed the built-in contributor;
        // adding the replacement here is assumed to be needed for it to be used.
        options.ContentContributors.Add<MyDatabaseTemplateContentContributor>();
    });
}
To work around the problem for now, I add a prefix to the names of the templates I want to exclude, and I added another template content contributor
That's fine.
Here you can see that the code checks the database and, since it is a static store, returns null. This is first tried with the culture "en", which finds nothing. The next try is with null, and the same thing happens in the database contributor; with an en-En culture it happens 3 times.
The template system supports multiple languages, so if no template exists for the current language, it falls back to the default language. This is by design.
You can override the default implementation if you want.
Has any stress-test benchmark been done with the ABP template? I don't want to deploy the app and get surprises. Since IoT devices will connect to the server, the load can grow a lot. I need to find the sweet spot for 1 instance so that I can make decisions about sharding and clusters. I am not expecting more than 10 000 devices in the system, so I was expecting to handle that load with 1 instance.
No, we haven't done such a test. You probably know the 80-20 rule: you can optimize for specific templates, for example by using a memory cache, which will greatly improve performance.
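A minimal sketch of the memory-cache idea for specific templates, assuming the template content never changes while the process is running (which holds for templates embedded in the virtual file system, but not for templates edited in the database). The class name is hypothetical and nothing here is an ABP API; it only uses ConcurrentDictionary and Lazy from the base class library.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical in-process cache for template content; not part of ABP.
public sealed class InMemoryTemplateContentCache
{
    private readonly ConcurrentDictionary<string, Lazy<Task<string>>> _entries =
        new ConcurrentDictionary<string, Lazy<Task<string>>>();

    // Returns the cached content for templateName, running loader at most once
    // per key even when many requests ask for the same template concurrently.
    public async Task<string> GetOrAddAsync(string templateName, Func<Task<string>> loader)
    {
        var lazy = _entries.GetOrAdd(templateName, _ => new Lazy<Task<string>>(loader));

        try
        {
            return await lazy.Value;
        }
        catch
        {
            // Do not keep a failed load in the cache; the next caller retries.
            _entries.TryRemove(templateName, out _);
            throw;
        }
    }
}
```

Registered as a singleton and called from a custom contributor, this keeps the hot path for the chosen templates free of both Redis and database round trips after the first request. There is no expiration in this sketch, so it only fits content that is safe to hold for the lifetime of the process.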
-
0
Hello, thanks for the answers. I don't know the 80-20 rule; do you mean the Pareto principle? I already use caching. My problem is finding the limit of how many devices I can serve. If I can find it, I can plan accordingly, using clusters or other solutions. In the ABP system I have trouble understanding the cause, because a bunch of things happen in the background for an HTTP request: middleware, logging, opening a transactional unit of work, writing audit logs to the database, and so on. That's fine and I am not complaining about it, but it makes the errors difficult to read, as I showed in my previous answer.
For example, I cannot understand why ABP is trying to get the Volo.Abp.LanguageManagement.Texts,k:Adzup:AbpExceptionHandling_en text from the Redis cache: redis-command = HMGET c:Volo.Abp.LanguageManagement.Texts,k:Adzup:AbpExceptionHandling_en
Does this come after another exception, or is it because the Redis server cannot be reached since it is overloaded? Since this is a rather abstract discussion, I think I will move on and see how many problems I get with my new code.
I really appreciate the answers, though. Thank you.
-
0
Hi,
For example, I cannot understand why ABP is trying to get the Volo.Abp.LanguageManagement.Texts,k:Adzup:AbpExceptionHandling_en text from the Redis cache?
The LanguageManagement module lets you change language texts dynamically, but we don't want to query the database every time, so we use a cache to improve performance.
The ABP framework provides a lot of infrastructure: multilingual support, multi-tenancy, auditing, and so on. Of course these affect performance, but it is sufficient for regular web applications.
My problem is finding the limit of how many devices I can serve. If I can find it, I can plan accordingly, using clusters or other solutions
Sorry, I can't give an answer to that. You can test it and find the point that affects performance, and I will try to help you improve it.
-
0
Thanks a lot. I am closing the ticket for now; if I find something, I can create a new ticket. Thanks for the answers.