Status code 504 in workflow

nutella · March 22, 2022, 9:11am

Hi!

I am having trouble with my workflow. It calls an API, iterates through the returned list and uses each item to call a second API, then iterates through the second list and saves the data from each of its items into a new record. This is what the workflow looks like:

When I run the workflow, the records are created and saved, but at some point the workflow fails with the error “Request failed with status code 504”.

The error also does not occur at the same point: sometimes after 170 records, sometimes after 180.

The first list has around 30 items and the second list has around 20 items on average.
9 fields are obtained and saved into each new record.
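A quick back-of-the-envelope check of these numbers (a hedged sketch, not from the original post): 30 × 20 gives roughly 600 records per run, so the failure at 170 to 180 records comes less than a third of the way through. If the 504 is produced by a fixed wall-clock limit on a proxy in front of the server (an assumption; nginx's default `proxy_read_timeout`, which applies while the upstream sends nothing, is 60 seconds), the number of records saved before the cutoff is roughly that limit divided by the time spent per record. The per-record latencies below are hypothetical.

```python
import math

def expected_records(first_list_len: int, avg_second_list_len: int) -> int:
    """Records one run should create: one per item of each inner (second) list."""
    return first_list_len * avg_second_list_len

def records_before_timeout(timeout_s: float, seconds_per_record: float) -> int:
    """Records completed before a fixed wall-clock timeout cuts the request off."""
    return math.floor(timeout_s / seconds_per_record)

print(expected_records(30, 20))  # 600

# Hypothetical per-record latencies under an assumed 60 s proxy limit.
for latency in (0.33, 0.35):
    print(latency, records_before_timeout(60, latency))  # 0.33 181, then 0.35 171
```

With about 0.33 to 0.35 s per record, a 60 s limit lands right in the reported 170 to 180 range, and small swings in API latency would move the failure point between runs. That is consistent with a proxy timeout but does not prove it.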
I am not sure what the issue is here.

Thank you very much for the help!

munawir · March 23, 2022, 2:59pm

Check your infrastructure; this is a timeout from the server, so you could configure the server with a higher timeout value.

nutella · March 24, 2022, 2:27am

Do you know which configuration exactly should be changed? I tried editing some of the timeout fields in the system configuration described in Server configuration :: Corteza Docs, but I still have the same problem.

From the documentation, I understood that workflows can run indefinitely. Is there a timeout when calling an external API?

Thank you for the help!

munawir · March 24, 2022, 12:25pm

@nutella 504 is an HTTP error not related to Corteza.
Check your server’s configuration (or your load balancer’s) to increase the timeout.

nutella · March 28, 2022, 2:06am

The timeout configurations on the server seem to be fine, but the error still occurs. Is it possible that the error is caused by the API calls?

tjerman · March 28, 2022, 6:48am

If you call an external service, then yes: if that request errors out, the workflow does too.
If that is the case, you can use error handlers to catch errors.
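Outside of Corteza, the same idea looks roughly like the minimal Python sketch below: treat 502/503/504 as transient, retry a few times with exponential backoff, and return anything else immediately. `call` is a hypothetical stand-in for the HTTP request that returns `(status_code, body)`; this is not Corteza's workflow error-handler API.

```python
import time
from typing import Callable, Tuple

TRANSIENT = {502, 503, 504}  # gateway/proxy errors that are often worth retrying

def call_with_retry(call: Callable[[], Tuple[int, str]],
                    attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call `call()` until it returns a non-transient status or attempts run out.

    Delays between attempts double: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        status, body = call()
        if status not in TRANSIENT:
            return status, body          # success or a non-retryable error
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body                  # last transient response, left to the caller

responses = iter([(504, ""), (504, ""), (200, "ok")])
print(call_with_retry(lambda: next(responses), sleep=lambda s: None))  # (200, 'ok')
```

Retrying is only safe when the request is idempotent: a 504 means the gateway stopped waiting, not necessarily that the upstream stopped working, so a retried write can create duplicate records.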

nutella · March 28, 2022, 6:59am

I did some testing with the workflow and the external APIs are working perfectly. When I decrease the size of the payload/arrays, the data is saved in the records.
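Since smaller arrays go through, one workaround worth trying (an assumption, not something confirmed in this thread) is to split the work into smaller batches, for example one workflow run per batch of items from the first list, so that each run finishes well within the timeout. A minimal chunking helper:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

first_list = list(range(30))          # stand-in for the ~30 items from the first API
print(len(list(batched(first_list, 5))))  # 6
```

With batches of 5 outer items and about 20 inner items each, a run would create roughly 100 records, below the 170 or so seen before the failure. Python 3.12 also ships `itertools.batched`, which yields tuples instead of lists.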

If status code 504 is not related to Corteza, does it mean there is a timeout configuration that I must have missed that is stopping the workflow?

munawir · March 28, 2022, 10:06am

@nutella
How many minutes does it take for your server to time out?

I encountered something like this before: my load balancer timeout was 3 minutes, and I increased it to 15 minutes.

To easily find out your infrastructure’s timeout, find or create a lengthy workflow (one that iterates over a lot of records), run it manually while you stay on the page, and see how many minutes it takes to get the 504 status code.
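The measurement described above can also be scripted. A minimal sketch using only the Python standard library: it sends one request and reports the status code and how long it took, returning HTTP errors such as 504 instead of raising them. `url` is a hypothetical endpoint that starts or waits on your long-running workflow; the method and any authentication headers depend on your setup and are assumptions here.

```python
import time
import urllib.error
import urllib.request
from typing import Dict, Optional, Tuple

def time_until_response(url: str,
                        method: str = "GET",
                        headers: Optional[Dict[str, str]] = None,
                        timeout_s: float = 1800.0) -> Tuple[int, float]:
    """Send one request and return (status_code, elapsed_seconds).

    HTTP error statuses (e.g. 504 from a proxy) are returned rather than raised.
    `timeout_s` is the client-side socket timeout; keep it above the proxy limit
    you are trying to measure, or the client gives up first.
    """
    req = urllib.request.Request(url, headers=headers or {}, method=method)
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code
    return status, time.monotonic() - start
```

If the elapsed time is nearly the same on every run (for example always about 3 minutes), that points to a fixed proxy or load balancer limit rather than to the workflow itself.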

I think you’d better contact someone who knows how your servers are hosted.

Mike · April 6, 2022, 9:16pm

Hi @munawir ,

I get a lot of 504 errors on my server with a workflow. It crashes corteza-server, and all associated services (corredor, DB, etc.) become unstable. It’s as if once one service starts misbehaving, all the others follow.
The server load goes up, and the only way out is to stop/restart Docker.

It looks like corteza-server handles DB queries and workflows badly when there is a lot of data, and it seems totally unstable… I’m on nginx/Percona, running version 2021.9.8.

How did you solve that? Just by increasing it to 15 minutes? Did you get the same as me: a 504 is generated and then you can do nothing?

If you have 1000 records to add in a loop, having to wait that long is unacceptable for me.
In terms of bad user experience, this is about as bad as it gets…

Thanks!

munawir · April 8, 2022, 4:42am

I think I have the same issue.

Sometimes, when I hit the server with too many heavy automation/REST API requests, it suddenly lags and doesn’t perform any write operations until I restart Docker/the server.
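Until the root cause is found, one way to reduce the pressure from the client side (a general mitigation, not a fix for the server-side issue described here) is to cap how many requests run at once. A minimal sketch using a thread pool with a fixed number of workers; `send` is a hypothetical function that performs one API request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def run_throttled(send: Callable[[T], R],
                  items: Iterable[T],
                  max_in_flight: int = 4) -> List[R]:
    """Apply `send` to every item with at most `max_in_flight` calls running at once.

    Results come back in the same order as `items`; an exception raised by
    `send` is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(send, items))
```

A small `max_in_flight` trades total run time for a steadier load on the server; the right value depends on your setup.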

@tjerman, what do you think the reason is?

Mike · April 10, 2022, 9:30pm

Yes, for me this is something very critical in Corteza… it crashes the server completely (I mean corteza-server), and the only way out is to restart the Docker container (if you can; otherwise you sometimes have to restart the whole server due to too much load/memory problems)…

That’s a very critical point. On my side, I consider that Corteza cannot be used in production with this problem… it works for small things, but not in a real production environment…

peter · April 11, 2022, 6:42am

Hi @Mike @munawir @nutella

Thanks for the report. We are aware of a potential data race in the workflow/API gateway subsystem and are trying our best to weed the bugs out.

Your reports definitely help, so I’d like to ask whether you have any more info on the circumstances under which these issues happen.

@nutella, thanks for the workflow example; that definitely helps, and once we have more info on it, we’ll post an update here.
Do you have issues with the payload size of the HTTP request?
Do the issues start with the second iterator?
Is the size of the response an issue?
Is the amount of items on one of the iterators an issue?
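For anyone collecting that info, a small harness like the one below (a hypothetical Python sketch, not part of Corteza) can log, for every request inside the two iterators, the outer and inner index, request and response size, status, and duration, so the first 5xx can be matched against payload size and iteration count. `send` is a hypothetical function that performs the request and returns `(status, body)`.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class CallStats:
    """Measurements for one request made inside the nested loops."""
    outer_index: int      # position in the first list
    inner_index: int      # position in the second list
    request_bytes: int
    response_bytes: int
    status: int
    seconds: float

@dataclass
class Recorder:
    """Collects CallStats so failures can be matched to sizes and positions."""
    calls: List[CallStats] = field(default_factory=list)

    def measure(self, outer: int, inner: int, payload: bytes,
                send: Callable[[bytes], Tuple[int, bytes]]) -> Tuple[int, bytes]:
        """Run `send(payload)`, record size, status and duration, pass the result through."""
        start = time.monotonic()
        status, body = send(payload)
        self.calls.append(CallStats(outer, inner, len(payload), len(body),
                                    status, time.monotonic() - start))
        return status, body

    def first_failure(self) -> Optional[CallStats]:
        """Return the first recorded call with a 5xx status, if any."""
        return next((c for c in self.calls if 500 <= c.status < 600), None)
```

Comparing the `first_failure()` entry across several runs shows whether failures track payload size, response size, a particular iteration count, or simply elapsed time.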

I will check but the more info we get, the easier (not easy :)) it gets to fix these kinds of issues, thanks.

Cheers, Peter

munawir · April 11, 2022, 9:54pm

Hey @peter
I can help reproduce the issue, and I’ll write up more scenarios when I get back to the office.