
failed to detector req: failed to get socket: failed to connect to t1k server

Published 4 months ago

# SafeLine WAF
# 🐞 bug



Phuong

Updated 4 months ago


Hi Safeline Support Team,

We are currently using Safeline Version 9.3.1 integrated with ingress-nginx via the Lua plugin. The integration is working well; however, we are experiencing intermittent connection errors between Ingress-Nginx and the Safeline Detector.

  1. Environment Details:

Safeline Version: 9.3.1

Integration: Ingress-Nginx (Lua T1K plugin)

Deployment: Kubernetes (K8s)

Detector Service Address: safeline-safeline-lts-detector.ingress-nginx.svc.cluster.local.:8000

  2. Error Logs in Ingress-Nginx: We frequently see the following error in our Ingress-Nginx logs:

2026/01/28 04:55:06 [error] 402#402: *17983 [lua] main.lua:47: failed to detector req: failed to get socket: failed to connect to t1k server safeline-safeline-lts-detector.ingress-nginx.svc.cluster.local.:8000: timeout

  3. Error Logs in Safeline-lts-detector: At the same time, the detector logs show:

WARN snserver_engine::detector_serve::t1k: read T1K packet error: io error

  4. Observations:

This issue occurs intermittently, not for all requests.

It seems like a connection timeout or packet handling issue during peak or specific traffic patterns.
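One way to check whether the timeouts cluster around particular traffic windows is to bucket them by minute. The following is a minimal sketch, not part of SafeLine or the T1K plugin: the regular expression is modeled only on the single Ingress-Nginx log line quoted above, and the function name `timeouts_per_minute` is hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this thread (assumption);
# other nginx/Lua error formats may need a different expression.
T1K_TIMEOUT = re.compile(
    r"(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}):\d{2} \[error\] .*"
    r"failed to connect to t1k server (?P<target>\S+): timeout"
)


def timeouts_per_minute(lines):
    """Count T1K connect timeouts per 'YYYY/MM/DD HH:MM' bucket."""
    counts = Counter()
    for line in lines:
        match = T1K_TIMEOUT.search(line)
        if match:
            counts[f"{match['date']} {match['time']}"] += 1
    return counts


# Example usage against a saved log file (path is a placeholder):
# with open("ingress-nginx-error.log") as fh:
#     for minute, n in sorted(timeouts_per_minute(fh).items()):
#         print(minute, n)
```

Comparing the busiest minutes with request-rate graphs for the same period would show whether the errors really follow peak traffic or appear independently of load.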

Could you please help us check:

Are there any recommended timeout settings for the Lua plugin when connecting to the T1K server?

Does the read T1K packet error: io error indicate a resource limit on the detector side or a protocol mismatch?

Should we adjust the TCP keep-alive or worker settings for the safeline-lts-detector?

Looking forward to your guidance to resolve this.


Carrie

Updated 4 months ago


Please also provide the following necessary info:

  1. The specifications of the machine where SafeLine is installed, such as CPU, memory, etc.

  2. The specifications of the T1K machine.

  3. Whether CPU or memory usage increased when this issue occurred, along with the traffic volume during that period.


Phuong

Updated 4 months ago

Hi Carrie

  1. Machine & T1K Specifications: Since we are running Safeline in a Kubernetes environment, both the main components and the T1K detector (safeline-lts-detector) are deployed with the following allocated resources:

CPU: 4 Cores

Memory: 4GB

Deployment Environment: Kubernetes Cluster (Ingress-Nginx integration via Lua)

  2. Resource Usage during the issue: Based on our monitoring system (Perfman/Grafana), the resource consumption remains stable even when the "T1K connection timeout" occurs:

CPU Usage: Average 8% (No spikes observed).

Memory Usage: Average 75% (Stable, no abnormal jumps).

Network Throughput: Average 1 Mbps.

  3. Observations: The performance metrics do not show any saturation or exhaustion of resources. The io error in the detector logs and the timeout in Ingress-Nginx happen while the system is running well within its capacity.

This suggests that the issue might be related to the internal handling of T1K packets or a specific timeout configuration within the Safeline engine/Lua plugin rather than infrastructure limitations.


Carrie

Updated a month ago


Sorry, we missed this issue and only just noticed that we had not replied.

The timeout setting can be adjusted based on your specific needs; for time-sensitive operations, a lower value is recommended.

If network bandwidth is not a bottleneck, then the issue is likely due to resource limits.

Increasing the number of workers should be accompanied by allocating additional resources.

By default, if timeout is not configured, it is set to 1 second.
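To see how often a plain TCP connection to the detector fits inside that 1 second budget, a small probe can time name resolution and the connect step separately. This is only a diagnostic sketch under stated assumptions: it opens and closes a socket without speaking the T1K protocol, the function name `probe` is hypothetical, and the 1.0 second default simply mirrors the value mentioned above.

```python
import socket
import time


def probe(host, port, timeout=1.0):
    """Resolve host, then attempt one TCP connect; return timings in seconds."""
    result = {"dns_s": None, "connect_s": None, "error": None}

    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["error"] = f"dns: {exc}"
        return result
    result["dns_s"] = time.monotonic() - start

    # Only the first resolved address is tried, to keep the sketch short.
    family, socktype, proto, _, addr = infos[0]
    start = time.monotonic()
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
    except OSError as exc:  # covers timeouts and refused connections
        result["error"] = f"connect: {exc}"
        return result
    result["connect_s"] = time.monotonic() - start
    return result


# Example usage from a pod in the same cluster (address taken from the thread):
# host = "safeline-safeline-lts-detector.ingress-nginx.svc.cluster.local."
# for _ in range(100):
#     print(probe(host, 8000))
#     time.sleep(0.5)
```

If the slow or failed samples are dominated by `dns_s` rather than `connect_s`, the delay is more likely in name resolution than in the detector itself. Note that the probe's connections close without sending data, so they may add their own warnings to the detector log while it runs.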


Petrovich

Updated 22 days ago


Hi!
Is it OK to run several detector replicas? What about scaling other resources?


Meowth

Updated 21 days ago

Only one detector is allowed.