Motivation
I suddenly remembered that k8s exists and took a look. The more I looked, the more it resembled an abstraction of a Linux host; in the end it is almost a framework shaped like a Linux host.
Architecture: tear the host apart, then multiply it
In the beginning, a host is a single machine that has:
- applications
- files (configuration files)
- an interface that handles connections (a load balancer or a firewall)
Applications & the virtual host
Both share the resources of the same physical host.
But as applications multiply they start to interfere with each other, so within one physical host we separate them using chroot and network namespaces. Once separated they can no longer communicate through memory or the filesystem, so they are connected to each other over the network instead. That is a container, or a pod.
If every application is connected over the network anyway, does it still need to live on the same physical host? No, so we can add more physical hosts and run the applications on different machines. That is a cluster.
To the user, all of these physical hosts provide the service, so the cluster can also be seen as a single virtual host.
If the same application runs on different hosts and they all provide the same service, that is scaling. If subtasks run on different hosts to produce one computed result, that is parallelism.
With this many physical hosts, how do pods get distributed? In k8s you declare that with a Deployment (and the scheduler decides which node each pod actually lands on).
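A rough sketch of the "separate the apps on one host" step using plain Linux tools; app-ns, veth0/veth1, /srv/app and /bin/myapp are made-up names for illustration:
ip netns add app-ns                                        # an isolated network namespace for one application
ip link add veth0 type veth peer name veth1                # a virtual cable between the host and the namespace
ip link set veth1 netns app-ns
ip netns exec app-ns ip addr add 10.0.0.2/24 dev veth1     # the application gets its own IP
ip netns exec app-ns chroot /srv/app /bin/myapp            # its own root filesystem and its own network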
The interface that handles connections
Back when everything was on one host, how did we decide which application a connection should go to? By port: HTTP is 80, HTTPS is 443, and an application can bind() to the port it wants.
But now every application is isolated, so each one needs:
- an IP (they live in different network namespaces)
- a port (at layer 4)
Besides bind(), this also takes iptables, i.e. the firewall, to control where packets for a given IP should go.
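A minimal sketch of such a rule; the addresses and ports are made up. It forwards connections arriving at the host's port 80 to an application that has its own IP inside a namespace:
# firewall/NAT rule: TCP packets to host port 80 are sent on to the app at 10.0.0.2:8080
iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination 10.0.0.2:8080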
But all of this is still happening on a single physical host.
What about a cluster-style virtual host?
Basically it is a master-worker setup: one place accepts the connection, dispatches it to some host in the cluster, and waits for the data to return there.
That is where the master node comes from.
The next question is: how do we actually consume a service?
The master node now knows each service's IP and port,
so to make a service reachable there are two options:
- port forwarding
- reverse proxy
Port forwarding is a Service of type NodePort; a reverse proxy is a ClusterIP (and Ingress can be thought of as a load balancer with extras).
What is the difference between NodePort and LoadBalancer? Where the port is opened: NodePort opens it on the node itself, while LoadBalancer first provisions a separate machine and opens the port there.
A LoadBalancer can be seen as taking the firewall part of the master node and running it on its own.
Why does a Service also get a ClusterIP? So that Pods can be reached by the Service's name: the name is registered in kube-dns, so an nslookup on the Service name returns an IP, just as if you had deployed a standalone server.
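A minimal NodePort Service sketch for the hello-deployment pods that appear later in this note; the label and port numbers are chosen only for illustration:
apiVersion: v1
kind: Service
metadata:
  name: my-nodeport-service
spec:
  type: NodePort          # port forwarding: opens a port on every node
  selector:
    app: my-deployment    # traffic goes to pods carrying this label
  ports:
  - protocol: TCP
    port: 80              # the Service (ClusterIP) port
    targetPort: 3000      # the container port
    nodePort: 30080       # the port opened on each node (30000-32767)
Inside the cluster, nslookup my-nodeport-service (or the full name my-nodeport-service.default.svc.cluster.local) resolves to the Service's ClusterIP.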
Both approaches expose a service, so compared with a plain physical host, why does k8s add Ingress as well? First, every application now has its own IP; second, common services still bind their network protocol to a fixed port, like 80 and 443.
So Ingress can be imagined as DNS, except that it maps a path to an IP & port, whereas DNS maps a domain name to an IP.
(Notice that on a website a path is mapped to a controller, so the role of Ingress is really that of a web framework's router.)
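A minimal Ingress sketch in that spirit, using the networking.k8s.io/v1 schema of newer clusters; the host and path are made up, and the backend is the Service sketched above:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /api                     # path -> Service, like a router
        pathType: Prefix
        backend:
          service:
            name: my-nodeport-service  # the Service defined earlier
            port:
              number: 80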
What if a Service has no selector (i.e. traffic is not directed into the cluster)? Then the Service is just bind() on the physical host, so it can also send traffic somewhere else entirely, for example:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9376
---
apiVersion: v1
kind: Endpoints
metadata:
  name: my-service
subsets:
- addresses:
  - ip: 192.0.2.42
  ports:
  - port: 9376
Namespaces behave like separate clusters inside one cluster, in effect separate virtual hosts.
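A quick sketch; the namespace name is made up:
kubectl create namespace my-team      # a separate "virtual host"
kubectl get pods -n my-team           # resources are scoped to that namespace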
Deploying a service
The flow is:
- a Deployment creates the Pods
- a Service binds a port to the selected Pods
- an Ingress binds a path to the port (i.e. the Service)
Selecting things
Selection is done with labels, for example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-deployment
spec:
  replicas: 3
  selector:
    matchLabels:       # here
      app: my-deployment
    matchExpressions:
    - {key: tier, operator: In, values: [cache]}
    - {key: environment, operator: NotIn, values: [dev]}
  template:
    metadata:
      labels:          # here: these labels must satisfy the selector above
        app: my-deployment
        tier: cache
        environment: production
    spec:
      containers:
      - name: my-pod
        image: zxcvbnius/docker-demo:latest
        ports:
        - containerPort: 3000
To select on the command line:
kubectl get pods -l environment=production,tier=frontend
kubectl get pods -l 'environment in (production),tier in (frontend)'
kubectl get pods -l 'environment in (production, qa)'
kubectl get pods -l 'environment,environment notin (frontend)'
You can also use a go-template to customize what gets displayed:
[root@node root]# kubectl get pods --all-namespaces -o go-template --template='{{range .items}}{{.metadata.uid}}
{{end}}'
0313ffff-f1f4-11e7-9cda-40f2e9b98448
ee49bdcd-f1f2-11e7-9cda-40f2e9b98448
f1e0eb80-f1f2-11e7-9cda-40f2e9b98448
[root@node root]# kubectl get pods --all-namespaces -o go-template --template='{{range .items}}{{printf "|%-20s|%-50s|%-30s|\n" .metadata.namespace .metadata.name .metadata.uid}}{{end}}'
|console |console-4d2d7eab-1377218307-7lg5v |0313ffff-f1f4-11e7-9cda-40f2e9b98448|
|console |console-4d2d7eab-1377218307-q3bjd |ee49bdcd-f1f2-11e7-9cda-40f2e9b98448|
|cxftest |ipreserve-f15257ec-3788284754-nmp3x |f1e0eb80-f1f2-11e7-9cda-40f2e9b98448|
Horizontal Pod Autoscaling
A Deployment controls the number of Pods; a HorizontalPodAutoscaler controls the Deployment's replica count, scaling it up or down automatically.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: helloworld-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: helloworld-deployment
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
DaemonSet
If a pod has to run on every Node, e.g. monitoring or log collection, use a DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
      # this toleration is to have the daemonset runnable on master nodes
      # remove it if your masters can't run pods
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
Managing nodes
Take a node out of service (evict its pods and mark it unschedulable): kubectl drain {node_name}
Put a node back into service: kubectl uncordon {node_name}
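A small sketch of the usual sequence; node-1 is a made-up node name:
kubectl cordon node-1                       # only mark it unschedulable, existing pods keep running
kubectl drain node-1 --ignore-daemonsets    # evict its pods (DaemonSet pods are skipped) and cordon it
kubectl uncordon node-1                     # allow scheduling on it again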
Limiting resources
A Pod itself can declare the maximum it may use and the minimum it requires.
Of course you can also impose limits from the outside:
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-high
  spec:
    hard:
      cpu: "1000"
      memory: 200Gi
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["high"]
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-medium
  spec:
    hard:
      cpu: "10"
      memory: 20Gi
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["medium"]
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-low
  spec:
    hard:
      cpu: "5"
      memory: 10Gi
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["low"]
apiVersion: v1
kind: Pod
metadata:
  name: high-priority
spec:
  containers:
  - name: high-priority
    image: ubuntu
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello; sleep 10; done"]
    resources:
      requests:
        memory: "10Gi"
        cpu: "500m"
      limits:
        memory: "10Gi"
        cpu: "500m"
  priorityClassName: high
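The quota and the Pod above both refer to a PriorityClass named high; a minimal sketch of such an object (the value is arbitrary):
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high
value: 1000000                 # higher value = higher scheduling priority
globalDefault: false
description: "Priority class for important workloads"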
Besides per-pod limits, you can also limit a namespace:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
    requests.nvidia.com/gpu: 4
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-counts
spec:
  hard:
    configmaps: "10"
    persistentvolumeclaims: "4"
    pods: "4"
    replicationcontrollers: "20"
    secrets: "10"
    services: "10"
    services.loadbalancers: "2"
Role-based access control
A Role specifies which resources may be operated on within a single namespace; a ClusterRole specifies resources that may be operated on across every namespace.
Resources live under different API groups.
A RoleBinding ties a Role to a user, a group, or a service account.
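A minimal sketch that lets a made-up user jane read pods in the default namespace:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]          # "" means the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io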
Storage
Before we can use files we need a filesystem; before we can use a filesystem we need to make a partition; before we can make a partition we need a disk.
- the filesystem is the Volume
- making a partition is the PersistentVolumeClaim
- the virtual disk (with special properties) is the StorageClass
- the physical disk is the PersistentVolume (Docker's volume)
Here are some examples borrowed from elsewhere:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-west-2
reclaimPolicy: Delete # [Delete | Retain]
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
  - ReadWriteOnce # [ReadWriteOnce | ReadOnlyMany | ReadWriteMany]
  resources:
    requests:
      storage: 8Gi
  storageClassName: standard # Ref
apiVersion: v1
kind: Pod
metadata:
  name: apiserver
  labels:
    app: apiserver
    tier: backend
spec:
  containers:
  - name: my-pod
    image: zxcvbnius/docker-demo
    ports:
    - containerPort: 3000
    volumeMounts:                # see the volumes section below
    - name: my-pvc
      mountPath: "/tmp"
  volumes:
  - name: my-pvc
    persistentVolumeClaim:
      claimName: myclaim # Ref
An example of a PersistentVolume (note that here the PVC does not specify a storageClassName; the PVC takes storage directly from the PV):
apiVersion: v1
kind: PersistentVolume            # the object kind is PV
metadata:
  name: pv001                     # the PV's name
spec:
  capacity:
    storage: 2Gi                  # its size
  accessModes:
  - ReadWriteOnce                 # its access mode
  hostPath:                       # backed by the host's /tmp directory
    path: /tmp
---
apiVersion: v1
kind: Pod                         # a Pod that tries to mount the Volume
metadata:
  name: pvc-nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:                 # mount the Volume named volume-pv at /usr/share/nginx/html
    - name: volume-pv
      mountPath: /usr/share/nginx/html
  volumes:
  - name: volume-pv               # declare a Volume object named volume-pv
    persistentVolumeClaim:        # bound to the PVC named pv-claim
      claimName: pv-claim
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pv-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi                # request 1Gi of capacity
Does a filesystem have to start from a disk? Isn't there something ready-made?
There is:
- files on the physical host (emptyDir, hostPath)
- NFS
apiVersion: v1
kind: Pod
metadata:
  name: apiserver
spec:
  containers:
  - name: apiserver
    image: zxcvbnius/docker-demo
    volumeMounts:
    - mountPath: /tmp
      name: tmp-volume
    imagePullPolicy: Always
  volumes:
  - name: tmp-volume
    hostPath:
      path: /tmp
      type: Directory
  - name: nfs-volumes
    nfs:
      server: {YOUR_NFS_SERVER_URL}
      path: /
  - name: cache-volume
    emptyDir: {}
Looking at where the data comes from, the sources split into:
- network: NFS, StorageClass & PersistentVolumeClaim
- physical host: emptyDir, hostPath
Does the virtual host have files (data) of its own? Yes: ConfigMap and Secret.
But because a Secret holds confidential data, it is also commonly passed in through environment variables.
Note that both ConfigMap and Secret are hash tables, i.e. values are looked up by key:
apiVersion: v1
kind: Pod
metadata:
  name: apiserver
  labels:
    app: webserver
    tier: backend
spec:
  containers:
  - name: nodejs-app
    image: zxcvbnius/docker-demo
    ports:
    - containerPort: 3000
  - name: nginx
    image: nginx:1.13
    ports:
    - containerPort: 80
    volumeMounts:
    - name: nginx-conf-volume
      mountPath: /etc/nginx/conf.d
    env:
    - name: SECRET_USERNAME
      valueFrom:
        secretKeyRef:
          name: demo-secret-from-yaml
          key: username
    - name: SECRET_PASSWORD
      valueFrom:
        secretKeyRef:
          name: demo-secret-from-yaml
          key: password
  volumes:
  - name: nginx-conf-volume
    configMap:
      name: nginx-conf
      items:
      - key: my-nginx.conf
        path: my-nginx.conf
  - name: secret-volume
    secret:
      secretName: demo-secret-from-yaml
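The Pod above refers to a ConfigMap named nginx-conf and a Secret named demo-secret-from-yaml; a minimal sketch of what they might look like (the contents are placeholders, and Secret values are base64-encoded):
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-conf
data:
  my-nginx.conf: |          # key -> file content
    server {
      listen 80;
    }
---
apiVersion: v1
kind: Secret
metadata:
  name: demo-secret-from-yaml
type: Opaque
data:
  username: YWRtaW4=        # base64("admin")
  password: cGFzc3dvcmQ=    # base64("password")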
Just running a command
Run once: Job
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
Jobs can also run in parallel; see the official docs.
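A minimal sketch of a parallel Job, reusing the pi example; the completions/parallelism values are arbitrary:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-parallel
spec:
  completions: 8     # 8 successful pods in total
  parallelism: 2     # at most 2 pods running at once
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never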
Run on a schedule: CronJob
apiVersion: batch/v1beta1   # batch/v1 on Kubernetes >= 1.21
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: alpine
            args:
            - /bin/sh
            - -c
            - echo "Hi, current time is $(date)"
          restartPolicy: OnFailure