Introduction to spark-on-k8s-operator
spark-on-k8s-operator: Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
The Kubernetes Operator for Apache Spark aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources for specifying, running, and surfacing status of Spark applications.
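spark-on-k8s-operator workflow:
1. A request to create a SparkApplication is submitted to the api-server.
2. The SparkApplication object (an instance of the SparkApplication CRD) is persisted to etcd.
3. The operator watches for SparkApplication objects and picks up the new one.
4. The submission runner runs spark-submit for it; once that request reaches the api-server, the corresponding driver/executor pods are created.
5. The Spark pod monitor tracks the execution state of the application, which is why sparkctl list and sparkctl status can report it.
6. The mutating admission webhook customizes the driver/executor pods, and a Service (svc) is created so that the Spark web UI can be viewed.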
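As a rough illustration of step 1, the sketch below builds a SparkApplication manifest as a Python dict and prints it as JSON. The field names follow the operator's v1beta2 SparkPi example; the image tag, jar path, resource sizes, and the "spark" service account are assumptions for illustration and must match your own cluster. The helper name build_spark_application is hypothetical and not part of the operator.

```python
import json


def build_spark_application(name, namespace="default", executors=2):
    """Return a SparkApplication manifest as a plain dict (illustrative helper)."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            # Assumed image and jar path; replace with ones available to you.
            "image": "gcr.io/spark-operator/spark:v3.1.1",
            "mainClass": "org.apache.spark.examples.SparkPi",
            "mainApplicationFile": (
                "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
            ),
            "sparkVersion": "3.1.1",
            "restartPolicy": {"type": "Never"},
            # Assumed resource sizes and service account name.
            "driver": {"cores": 1, "memory": "512m", "serviceAccount": "spark"},
            "executor": {"cores": 1, "instances": executors, "memory": "512m"},
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON so it can be saved to a file.
    print(json.dumps(build_spark_application("spark-pi"), indent=2))
```

Saving the output to a file such as spark-pi.json and running kubectl apply -f spark-pi.json would submit it to the api-server (kubectl accepts JSON as well as YAML). After that, sparkctl list and sparkctl status spark-pi should show the state described in step 5.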
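In short, spark-on-k8s-operator changes the traditional way of running Spark jobs and improves resource utilization. Kubernetes creates the pods a Spark job needs only after the user submits a SparkApplication object, so the job can draw on the resources of the whole k8s cluster. Once the job finishes, the pods terminate and their resources are released, so nothing stays reserved for idle jobs.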
References: