JEP |
222 |
---|---|
Title |
WebSocket Support for Jenkins Remoting and CLI |
Sponsor |
|
Status |
Final 🔒 |
Type |
Standards |
Created |
2019-11-21 |
BDFL-Delegate |
TBD |
Jenkins offers an API allowing WebSocket services to be implemented.
This is available using the Stapler HttpResponse
idiom,
served as the final action after the regular Stapler routing mechanism.
Unlike regular HTTP endpoints, bidirectional streaming communication is possible.
The principal service exposed in this way is a new Remoting transport for inbound build agents at an HTTP(S) URL.
The agent itself (typically named agent.jar
) includes a WebSocket client capable of connecting to this endpoint.
This provides an alternative to using the JNLP4-connect
protocol on the Jenkins TCP port.
Otherwise the regular behavior of inbound agents (JNLPLauncher
) applies.
The Jenkins CLI could also be served via WebSocket in place of the SSHD port or the problematic “full-duplex HTTP” transport.
The implementation relies on APIs avalable in the built-in Jetty (“Winstone”) servlet container, but Jetty APIs are not re-exported to plugins. JSR 356 is not used on the server side.
A new jenkins.websocket
package contains an entry point WebSockets.upgrade
which issues an HTTP upgrade response and initiates a WebSocket connection.
The service implementor provides a WebSocketSession
implementation,
which both receives callbacks (for example when a text or binary frame is received)
and can be acted upon (for example to send a frame).
WebSocket ping/pong frames are not explicitly modelled. Rather, a service can elect to send pings automatically at a fixed interval. This suffices to keep a connection alive when a proxy server might otherwise have closed it.
An HTTP endpoint /wsagents/
, implemented by WebSocketAgents
,
forms the server side of an agent Remoting transport
suitable for use with a computer configured via JNLPLauncher
.
hudson.remoting.Engine
correspondingly can make a WebSocket connection to this address
and open the client side of a Remoting transport.
The client is implemented using the JSR 356 API and Project Tyrus library running in standalone mode.
The connection in this mode supports client authentication as well as Remoting protocol capability negotiation. There is no need to implement any encryption, as this is covered by HTTPS in Jenkins generally. Some of the complexity of the TCP transport regarding packet framing is also unnecessary, as WebSocket provides this.
Some affiliated APIs such as JnlpConnectionState
and JnlpAgentReceiver
may be implemented as time permits.
The Jenkins CLI after the removal of Remoting supported two transports:
SSH and a custom protocol over “full duplex” HTTP.
These are augmented with a simpler and less problematic WebSocket transport.
The existing /cli
endpoint is retained,
but a new /cli/ws
endpoint is added.
jenkins-cli.jar
can be asked to use the WebSocket transport with the -webSocket
option.
The client is implemented using the JSR 356 API and Project Tyrus library running in standalone mode.
The existing PlainCLIProtocol
is retained over the WebSocket transport,
though since framing is handled by the transport,
frames are not encoded in the protocol.
Thus every (binary) frame is a one-byte opcode followed by a payload.
Traditionally, inbound agents (using the somewhat misleadingly-named JNLPLauncher
)
have connected to a special TCP port in Jenkins handled by TcpSlaveAgentListener
.
This socket listener handles pluggable AgentProtocol
s,
such as JnlpSlaveAgentProtocol4
with the identifier JNLP4-connect
which implements a Remoting transport for agents with TLS encryption support.
If the administrator allows this port to be open (it is a security configuration option),
then agents first make an HTTP metadata request to Jenkins to find the (fixed or random) port number,
and then connect to the TCP port with the desired protocol.
The key problem with this system is not just the use of a secondary port to connect to Jenkins (besides 80/443),
but the fact that the port serves non-HTTP traffic.
If Jenkins is serving traffic directly to all clients (HTTP or not), this is not an issue,
but increasingly these connections are intermediated so as to allow TLS termination or network ingress.
Reverse proxies and L7 load balancers are normally handling HTTP traffic,
and multiplexing by Host
header or URI path demands it.
Ensuring that connections can be made to an opaque TCP port is often much trickier
and may constrain the choice of proxy software.
(The same issues apply to the SSH port used by some Jenkins features, notably the CLI. In principle the SSH protocol could support L7 proxies, but this is unlikely.)
When agents are to be connected to a master from outside the LAN, the Jenkins HTTP(S) URL is already specified; it is much less hassle if that is the only connection that needs to be routed.
The Jenkins CLI has long been able to run over the HTTP port. (When Remoting was used, this was offered as an alternative to the TCP port; after the removal of CLI-over-Remoting, the newer “plain text” protocol runs over HTTP.)
This transport is implemented via the FullDuplexHttpStream
and FullDuplexHttpService
APIs in Jenkins,
which effectively simulate a TCP socket using HTTP connections and “chunked” encoding.
Unfortunately this trick stretches the boundaries of acceptable HTTP handler behavior,
and is known to break in certain reverse proxies or otherwise cause complications.
For example, with nginx ingress for Kubernetes, the CLI will not work unless you set the annotation
nginx.ingress.kubernetes.io/proxy-request-buffering: off
.
Several alternate approaches to the fundamental problems listed above were explored.
Ideally the programmer interface to exposing a WebSocket service would follow JSR 356,
the javax.websocket
API (particularly Endpoint
, Session
, RemoteEndpoint
, and MessageHandler
).
After some exploration, however, this appeared difficult to implement in the context of Jenkins.
While Jetty includes an implementation of the JSR,
it is not aligned in any obvious way with the WebSocketServletFactory
interface
which allows a WebSocket upgrade from an existing servlet HTTP handler,
as would be present at the terminal stage of Stapler routing.
The Jakarta EE-style annotation-based registration (@ServerEndpoint
) would be acceptable
(at the expense of any integration with Stapler routing),
but merely adding the relevant Jetty modules to the runtime and using such annotations did not work.
Reusing Jetty’s JSR implementation classes (such as JsrSession
) did not seem feasible,
due to the number of @ManagedObject
s involved which would need to be “wired” into place.
Reimplementing JSR interfaces from scratch looked complicated,
and there would be many methods which are not needed for basic use cases
and would have no reasonable implementation based on delegating to what WebSocketServletFactory
offers.
Project Tyrus offers a “standalone” mode for serving WebSocket connections in an arbitrary Java program. This is intended to control the entire HTTP port service, however, and would likely clash with Jetty’s socket management if it worked at all. Listening on another HTTP port would add too much complexity to the Jenkins installation.
Therefore for now it was decided to keep the implementation simple and use what is known to work:
Jetty’s WebSocketServletFactory
.
Subsequent research may reveal a straightforward way to use the server mode of JSR 356 from Winstone/Stapler/Jenkins,
in which case the existing Jenkins APIs could be deprecated or amended to link to javax.websocket
.
org.eclipse.jetty.websocket.api
could have been exposed directly to Jenkins code,
assuming Jetty permits this class loader linkage.
However this would tie too much code to Jetty specifics,
and pose problems for users of non-Winstone containers.
By analogy with the JSR’s @ServerEndpoint
,
a Jenkins ExtensionPoint
could have been defined for each WebSocket-based service.
This would however clash with URIs used by the existing UnprotectedRootAction
interface
and not allow interoperation with other Stapler features such as hierarchical navigation
or with the standard Jenkins authentication filters.
gRPC was also considered as a mechanism for bidirectional streaming. It works at a higher layer than WebSocket, however; for purposes of a Remoting transport, for example, simple framing suffices, and there is no need for additional machinery (Remoting is after all another remote procedure call framework).
The use of HTTP/2 could also be problematic. It is several years newer than WebSocket, and likely has poorer compatibility with reverse proxies.
“Outbound” agents, those using any common launcher other than JNLPLauncher
(such as SSH),
do not suffer from the problem of exposing ports on the Jenkins master.
However, some users have difficulty setting up such agents:
-
Installing an SSH server on Windows has traditionally been cumbersome.
-
Many administrators have little familiarity with SSH and run into problems with obscure misconfigurations.
-
The network hosting the agent computer may not allow inbound connections (whereas we presume the network hosting the Jenkins master does, since it must serve a web UI).
Note that outbound agents remain a reasonable option for the Jenkinsfile Runner (JFR) scenario, where you would prefer for the Jenkins “master” to expose no ports. JENKINS-53461 allows only a TCP port to be exposed (no HTTP), though it would be better to expose neither.
Support for inbound WebSocket connections could be developed as a fresh ComputerLauncher
implementation.
However, this would fail to reuse a fair amount of subtle code
which is already available in JNLPLauncher
and the matching client code in agent.jar
,
such as the slave-agent.jnlp
endpoint and the secret handling system.
It seems simpler to behave as a mode of JNLPLauncher
selecting an alternate transport.
Rather than introducing a new agent option -webSocket
and making slave-agent.jnlp
and other launching code (such as in the Docker image and the kubernetes
plugin) aware of the choice,
the agent could try one transport, then fall back to the other.
This would minimize the number of components that need to be modified.
Besides making behavior more opaque and thus hard to diagnose, this has some problems. If WebSocket mode is preferred, agents which were working fine in TCP mode might suddenly switch behavior. Since the WebSocket code is new, this could be alarming. Also if the agent is inside the same local network as the master and TLS encryption is applied externally, this would mean loss of encryption of the Remoting channel.
If TCP mode is preferred, the behavior is more compatible,
but then when WebSocket connections are wanted,
there are extra network round trips in the best case
(to get the X-Jenkins-JNLP-Port
header from Jenkins over HTTP, then to make a TCP connection);
and in the worst case the TCP connection might hang rather than failing cleanly.
Rather than reusing PlainCLIProtocol
from the full-duplex HTTP transport,
the WebSocket-based CLI endpoint could be designed to be friendlier to generic clients such as websocat
.
For example, CLICommand.stdin
could be streamed from incoming frames,
and CLICommand.stdout
could be chunked into outgoing frames.
Some features of the Jenkins CLI fit naturally into this model,
such as the use of Accept-Charset
and Accept-Language
headers.
However, several obstacles seemed to make this approach more trouble than it would be worth:
-
Unlike the SSH protocol, there is no simple way to enumerate a command name and arguments in the request: you would need to use query parameters or HTTP headers in an awkward fashion. (And ensuring that arguments containing spaces or other special characters are supported would complicate the scheme.)
-
Again unlike SSH, there is no standard way to differentiate
stdout
fromstderr
; binary vs. text frames (respectively) could be used for this, but generic clients are unlikely to honor the distinction. -
Again unlike SSH, there is no standard way to represent an exit code: an HTTP header is not an option in interactive mode (the exit code is determined after the upgrade response is sent), so a special frame syntax would be needed.
If and when there is a need for a protocol which can be used easily from a generic client,
this could be implemented in a plugin.
In fact the CLI Commander plugin
already does something similar
for the case of noninteractive commands
(though in that case the primary use case is via a browser UI rather than something like curl
).
The new transport for the CLI could be made the default,
perhaps a fallback to -http
mode in case the server is too old or does not support WebSocket
(both of which can be detected via a 404 response code from /cli/ws
).
The considerations which led to a conservative choice of transport selection for agents are less relevant in this case:
-
The previous (but post-Remoting) transport is based on HTTP(S) and offers no encryption of its own.
-
Performance is typically not a key consideration.
-
The client is often used interactively, so a regression is less likely to mean an urgent outage after upgrade.
-
An explicit gesture must be made to download a new version of the client. (Sometimes also true for agents, depending on the details of the launch mode.)
Nonetheless, for now it is safest to not change the default behavior. A change of default can be considered after the new implementation has been field-tested.
A single agent.jar
can make either TCP or WebSocket inbound connections
via either hudson.remoting.Main
with -jnlpUrl
or hudson.remoting.jnlp.Main
(along with “outbound” modes typically selected via hudson.remoting.Main
without -jnlpUrl
).
Therefore it must be able to decide which to use in a given circumstance:
some servers will support only TCP, some only WebSocket, some both.
Since the WebSocket mode is activated only with a -webSocket
option to the launcher,
existing agent installations are unaffected.
The new jenkins-cli.jar
continues to run in -http
mode by default for now.
-webSocket
mode can still be selected if desired.
-ssh
mode is unaffected.
Jenkins is occasionally run in other servlet containers such as Tomcat (or even Jetty but not using the built-in Winstone launcher). WebSocket support will not be offered in these modes, and dependent features such as WebSocket-based agents will not be available. There should be no loss of functionality for these users.
(The Jenkins project rarely if ever tests these scenarios and occasionally breaks them inadvertently. Users are encouraged to run Winstone. A future JEP may explicitly drop support for custom containers.)
The WebSockets.upgrade
return value is used as the return value (or throwable) of a regular Stapler web method,
terminating the Stapler handling process.
Thus service implementors are free to use the usual Stapler/Jenkins URI routing techniques
such as TransientActionFactory
or Java getters.
Regular Jenkins servlet filters also handle request authentication,
and Stapler routing will then follow AccessControlled
permission checks.
If Jenkins authentication is unwanted (as it is for handling JNLPLauncher
),
the usual UnprotectedRootAction
API makes it textually clear that the implementation is opting out of access control.
Inbound agents traditionally have authenticated to a particular Jenkins SlaveComputer
by using a secret token (an HMAC of the agent name).
This is necessary since Jenkins lacks service accounts;
otherwise a build machine would need to store the personal API token of a Jenkins user,
which could be abused to perform unrelated actions.
The WebSocket-based agent service retains this system: the HTTP connection is made anonymously, and the secret is passed in a header.
Unlike the JNLP4-connect
protocol, which impls a custom TLS handshake,
any encryption of traffic between the agent and the master is done either by the servlet container
or by some reverse proxy in front of Jenkins.
WebSocketAgentsTest
provides a functional test demonstrating that the agent can connect to a WebSocket endpoint on localhost.
(The existing JNLPLauncherTest
continues to test TCP connections using JNLP4-connect
.)
Several sanity checks were performed of using the WebSocket protocol to set up a bidirectional connection with Jenkins, or run a (Pipeline) build on an inbound agent, under complex realistic conditions:
-
Against a CloudBees Core installation running on EKS using the nginx ingress controller terminating TLS.
-
Against CloudBees Core running on GKE using Google’s native ingress controller based on an external load balancer.
-
Against CloudBees Core running on OpenShift 4.2 using a
Route
and TLS termination.
Connecting directly to Jenkins also works. Other reverse proxies, such as Apache, have not been specifically tested.
Basic connectivity and “keep-alive” behavior can be established using a script such as:
(while :; do date; sleep 5m; done) | websocat -vv wss://$jenkins/wsecho
The main finding was that GKE requires minor customization to service definitions to prevent the connection from closing too soon:
apiVersion: v1
kind: Service
metadata:
name: jenkins
annotations:
beta.cloud.google.com/backend-config: '{"ports": {"80":"jenkins"}}'
type: NodePort
# …
---
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
name: jenkins
spec:
timeoutSec: 999999
and nginx requires a WebSocket ping/pong at less than 60s intervals.