debian-mirror-gitlab/doc/architecture/blueprints/gitlab_agent_deployments/index.md
2023-04-23 21:23:45 +05:30

status: proposed
creation-date: 2022-11-23
authors: @shinya.maeda
coach: @DylanGriffith
approvers: @nagyv-gitlab, @cbalane, @hustewart, @hfyngvason
owning-stage: ~devops::release
participating-stages: Configure, Release

View and manage resources deployed by GitLab Agent For Kubernetes

Summary

As part of the GitLab Kubernetes Dashboard epic, users want to view and manage the resources deployed by GitLab Agent For Kubernetes. Users should be able to interact with these resources through the GitLab UI, such as the Environment Index and Details pages.

This blueprint describes how the association between GitLab Environments and Agent-managed resources is established and how these domain models interact with each other.

Motivation

Goals

Non-Goals

Proposal

Overview

  • GitLab Environment and Agent-managed Resource Group have a 1-to-1 relationship.
  • Agent-managed Resource Group tracks all resources produced by the connected agent. This includes not only resources written in manifest files but also resources generated from them (for example, Pods created by a Deployment manifest).
  • Agent-managed Resource Group renders a dependency graph, such as Deployment => ReplicaSet => Pod, to provide an ArgoCD-style resource view.
  • Agent-managed Resource Group has a Resource Health status that summarizes the resource statuses, such as Healthy, Progressing, or Degraded.

flowchart LR
  subgraph Kubernetes["Kubernetes"]
    subgraph ResourceGroupProduction["ResourceGroup"]
      direction LR
      ResourceGroupProductionService(["Service"])
      ResourceGroupProductionDeployment(["Deployment"])
      ResourceGroupProductionPod1(["Pod1"])
      ResourceGroupProductionPod2(["Pod2"])
    end
    subgraph ResourceGroupStaging["ResourceGroup"]
      direction LR
      ResourceGroupStagingService(["Service"])
      ResourceGroupStagingDeployment(["Deployment"])
      ResourceGroupStagingPod1(["Pod1"])
      ResourceGroupStagingPod2(["Pod2"])
    end
  end

  subgraph GitLab
    subgraph Organization
      subgraph Project
        environment1["production environment"]
        environment2["staging environment"]
      end
    end
  end

  environment1 --- ResourceGroupProduction
  environment2 --- ResourceGroupStaging
  ResourceGroupProductionService -.- ResourceGroupProductionDeployment
  ResourceGroupProductionDeployment -.- ResourceGroupProductionPod1
  ResourceGroupProductionDeployment -.- ResourceGroupProductionPod2
  ResourceGroupStagingService -.- ResourceGroupStagingDeployment
  ResourceGroupStagingDeployment -.- ResourceGroupStagingPod1
  ResourceGroupStagingDeployment -.- ResourceGroupStagingPod2

Existing components and relationships

  • GitLab Project and GitLab Environment have a 1-to-many relationship.
  • GitLab Project and Agent have a 1-to-many direct relationship. Only one project can own a specific agent.
  • GitOps mode
  • CI Access mode
    • GitLab Project and Agent have a many-to-many indirect relationship. The project that owns the agent can share access with other projects. (NOTE: Technically, only jobs running inside the project are allowed to access the cluster, due to job-token authentication.)
    • Agent and Agent-managed Resource Group do NOT have a relationship yet.

Issues

  • Agent-managed Resource Group should have the environment ID as a foreign key, which must be unique across resource groups.
  • Agent-managed Resource Group should have parameters that define how to group resources in the associated cluster. For example, namespace, label, and inventory-id (GitOps mode only) can be passed as parameters.
  • Agent-managed Resource Group should be able to fetch all relevant resources, including both default resource kinds and Custom Resources.
  • Agent-managed Resource Group should be aware of the dependency graph.
  • Agent-managed Resource Group should be able to compute the Resource Health status from the associated resources.
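
The first two requirements above can be illustrated with a minimal in-memory sketch. This is not the proposed implementation: the class names (`GroupingParams`, `ResourceGroup`, `ResourceGroupRegistry`) and field names are hypothetical, and the registry only stands in for a database table with a unique index on the environment ID.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class GroupingParams:
    """How resources are grouped in the cluster (hypothetical field names)."""
    namespace: Optional[str] = None
    labels: Tuple[Tuple[str, str], ...] = ()
    inventory_id: Optional[str] = None  # GitOps mode only


@dataclass
class ResourceGroup:
    environment_id: int
    params: GroupingParams


class ResourceGroupRegistry:
    """Stand-in for a table with a unique index on environment_id."""

    def __init__(self) -> None:
        self._by_environment: Dict[int, ResourceGroup] = {}

    def add(self, group: ResourceGroup) -> None:
        # Enforce the 1-to-1 relationship between environment and resource group.
        if group.environment_id in self._by_environment:
            raise ValueError(
                f"environment {group.environment_id} already has a resource group"
            )
        self._by_environment[group.environment_id] = group

    def for_environment(self, environment_id: int) -> Optional[ResourceGroup]:
        return self._by_environment.get(environment_id)
```

Adding a second group for the same environment ID raises an error, which mirrors the uniqueness constraint described above.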

Example: Pull-based deployment (GitOps mode)

NOTE: At the moment, we focus on the solution for CI Access mode. GitOps mode will have significant architectural changes outside of this blueprint, such as Flux switching and Manifest projects outside of the Agent configuration project. In order to derisk potential rework, we'll revisit the GitOps mode after these upstream changes have been settled.

Example: Push-based deployment (CI access mode)

This is an example of how the architecture works in push-based deployment. The feature is documented here as CI access mode.

flowchart LR
  subgraph ProductionKubernetes["Production Kubernetes"]
    subgraph ResourceGroupProductionFrontend["ResourceGroup"]
      direction LR
      ResourceGroupProductionFrontendService(["Service"])
      ResourceGroupProductionFrontendDeployment(["Deployment"])
      ResourceGroupProductionFrontendPod1(["Pod1"])
      ResourceGroupProductionFrontendPod2(["Pod2"])
    end
    subgraph ResourceGroupProductionBackend["ResourceGroup"]
      direction LR
      ResourceGroupProductionBackendService(["Service"])
      ResourceGroupProductionBackendDeployment(["Deployment"])
      ResourceGroupProductionBackendPod1(["Pod1"])
      ResourceGroupProductionBackendPod2(["Pod2"])
    end
    subgraph ResourceGroupProductionPrometheus["ResourceGroup"]
      direction LR
      ResourceGroupProductionPrometheusService(["Service"])
      ResourceGroupProductionPrometheusDeployment(["Deployment"])
      ResourceGroupProductionPrometheusPod1(["Pod1"])
      ResourceGroupProductionPrometheusPod2(["Pod2"])
    end
  end

  subgraph GitLab
    subgraph Organization
      subgraph OperationGroup
        subgraph AgentManagementProject
          AgentManagementAgentProduction["Production agent"]
          AgentManagementManifestFiles["Kubernetes Manifest Files"]
          AgentManagementEnvironmentProductionPrometheus["production prometheus environment"]
          AgentManagementPipelines["CI/CD pipelines"]
        end
      end
      subgraph DevelopmentGroup
        subgraph FrontendAppProject
          FrontendAppCode["VueJS"]
          FrontendDockerfile["Dockerfile"]
        end
        subgraph BackendAppProject
          BackendAppCode["Golang"]
          BackendDockerfile["Dockerfile"]
        end
        subgraph DeploymentProject
          DeploymentManifestFiles["Kubernetes Manifest Files"]
          DeploymentPipelines["CI/CD pipelines"]
          DeploymentEnvironmentProductionFrontend["production frontend environment"]
          DeploymentEnvironmentProductionBackend["production backend environment"]
        end
      end
    end
  end

  DeploymentEnvironmentProductionFrontend --- ResourceGroupProductionFrontend
  DeploymentEnvironmentProductionBackend --- ResourceGroupProductionBackend
  AgentManagementEnvironmentProductionPrometheus --- ResourceGroupProductionPrometheus
  ResourceGroupProductionFrontendService -.- ResourceGroupProductionFrontendDeployment
  ResourceGroupProductionFrontendDeployment -.- ResourceGroupProductionFrontendPod1
  ResourceGroupProductionFrontendDeployment -.- ResourceGroupProductionFrontendPod2
  ResourceGroupProductionBackendService -.- ResourceGroupProductionBackendDeployment
  ResourceGroupProductionBackendDeployment -.- ResourceGroupProductionBackendPod1
  ResourceGroupProductionBackendDeployment -.- ResourceGroupProductionBackendPod2
  ResourceGroupProductionPrometheusService -.- ResourceGroupProductionPrometheusDeployment
  ResourceGroupProductionPrometheusDeployment -.- ResourceGroupProductionPrometheusPod1
  ResourceGroupProductionPrometheusDeployment -.- ResourceGroupProductionPrometheusPod2
  AgentManagementAgentProduction -- Shared with --- DeploymentProject
  DeploymentPipelines -- "Deploy" --> ResourceGroupProductionFrontend
  DeploymentPipelines -- "Deploy" --> ResourceGroupProductionBackend
  AgentManagementPipelines -- "Deploy" --> ResourceGroupProductionPrometheus

Further details

Multi-Project Deployment Pipelines

The microservice project setup can be improved by Multi-Project Deployment Pipelines:

  • Deployment Project can behave as the shared deployment engine for any upstream application projects and environments.
  • Environments can be created within the application projects, which gives developers more visibility into environments.
  • Deployment Project can be managed under the Operator group, which improves segregation of duties.
  • Users don't need to set up RBAC to restrict CI/CD jobs.
  • This is especially helpful for dynamic environments, such as Review Apps.

flowchart LR
  subgraph ProductionKubernetes["Production Kubernetes"]
    subgraph ResourceGroupProductionFrontend["ResourceGroup"]
      direction LR
      ResourceGroupProductionFrontendService(["Service"])
      ResourceGroupProductionFrontendDeployment(["Deployment"])
      ResourceGroupProductionFrontendPod1(["Pod1"])
      ResourceGroupProductionFrontendPod2(["Pod2"])
    end
    subgraph ResourceGroupProductionBackend["ResourceGroup"]
      direction LR
      ResourceGroupProductionBackendService(["Service"])
      ResourceGroupProductionBackendDeployment(["Deployment"])
      ResourceGroupProductionBackendPod1(["Pod1"])
      ResourceGroupProductionBackendPod2(["Pod2"])
    end
    subgraph ResourceGroupProductionPrometheus["ResourceGroup"]
      direction LR
      ResourceGroupProductionPrometheusService(["Service"])
      ResourceGroupProductionPrometheusDeployment(["Deployment"])
      ResourceGroupProductionPrometheusPod1(["Pod1"])
      ResourceGroupProductionPrometheusPod2(["Pod2"])
    end
  end

  subgraph GitLab
    subgraph Organization
      subgraph OperationGroup
        subgraph DeploymentProject
          DeploymentAgentProduction["Production agent"]
          DeploymentManifestFiles["Kubernetes Manifest Files"]
          DeploymentEnvironmentProductionPrometheus["production prometheus environment"]
          DeploymentPipelines["CI/CD pipelines"]
        end
      end
      subgraph DevelopmentGroup
        subgraph FrontendAppProject
          FrontendDeploymentPipelines["CI/CD pipelines"]
          FrontendEnvironmentProduction["production environment"]
        end
        subgraph BackendAppProject
          BackendDeploymentPipelines["CI/CD pipelines"]
          BackendEnvironmentProduction["production environment"]
        end
      end
    end
  end

  FrontendEnvironmentProduction --- ResourceGroupProductionFrontend
  BackendEnvironmentProduction --- ResourceGroupProductionBackend
  DeploymentEnvironmentProductionPrometheus --- ResourceGroupProductionPrometheus
  ResourceGroupProductionFrontendService -.- ResourceGroupProductionFrontendDeployment
  ResourceGroupProductionFrontendDeployment -.- ResourceGroupProductionFrontendPod1
  ResourceGroupProductionFrontendDeployment -.- ResourceGroupProductionFrontendPod2
  ResourceGroupProductionBackendService -.- ResourceGroupProductionBackendDeployment
  ResourceGroupProductionBackendDeployment -.- ResourceGroupProductionBackendPod1
  ResourceGroupProductionBackendDeployment -.- ResourceGroupProductionBackendPod2
  ResourceGroupProductionPrometheusService -.- ResourceGroupProductionPrometheusDeployment
  ResourceGroupProductionPrometheusDeployment -.- ResourceGroupProductionPrometheusPod1
  ResourceGroupProductionPrometheusDeployment -.- ResourceGroupProductionPrometheusPod2
  FrontendDeploymentPipelines -- "Trigger downstream pipeline" --> DeploymentProject
  BackendDeploymentPipelines -- "Trigger downstream pipeline" --> DeploymentProject
  DeploymentPipelines -- "Deploy" --> ResourceGroupProductionFrontend
  DeploymentPipelines -- "Deploy" --> ResourceGroupProductionBackend

View all Agent-managed Resource Groups on production environment

At the group level, we can accumulate all environments that match a specific tier, for example, by listing all environments in the production tier across the group's projects. This is useful to see all Agent-managed Resource Groups on production environments. The following diagram exemplifies the relationship between a GitLab group and Kubernetes resources:

flowchart LR
  subgraph Kubernetes["Kubernetes"]
    subgraph ResourceGroupProduction["ResourceGroup"]
      direction LR
      ResourceGroupProductionService(["Service"])
      ResourceGroupProductionDeployment(["Deployment"])
      ResourceGroupProductionPod1(["Pod1"])
      ResourceGroupProductionPod2(["Pod2"])
    end
    subgraph ResourceGroupStaging["ResourceGroup"]
      direction LR
      ResourceGroupStagingService(["Service"])
      ResourceGroupStagingDeployment(["Deployment"])
      ResourceGroupStagingPod1(["Pod1"])
      ResourceGroupStagingPod2(["Pod2"])
    end
  end

  subgraph GitLab
    subgraph Organization
      OrganizationProduction["All resources on production"]
      subgraph Frontend project
        FrontendEnvironmentProduction["production environment"]
      end
      subgraph Backend project
        BackendEnvironmentProduction["production environment"]
      end
    end
  end

  FrontendEnvironmentProduction --- ResourceGroupProduction
  BackendEnvironmentProduction --- ResourceGroupStaging
  ResourceGroupProductionService -.- ResourceGroupProductionDeployment
  ResourceGroupProductionDeployment -.- ResourceGroupProductionPod1
  ResourceGroupProductionDeployment -.- ResourceGroupProductionPod2
  ResourceGroupStagingService -.- ResourceGroupStagingDeployment
  ResourceGroupStagingDeployment -.- ResourceGroupStagingPod1
  ResourceGroupStagingDeployment -.- ResourceGroupStagingPod2
  OrganizationProduction --- FrontendEnvironmentProduction
  OrganizationProduction --- BackendEnvironmentProduction

A few notes:

  • In the future, we'd have more granular filters for resource search. For example, if each project has two environments, production/us-region and production/eu-region, the group-level view could show only the resources in the US region. This could be achievable by query filtering in PostgreSQL or label/namespace filtering in Kubernetes.
  • Please see Add dynamically populated organization-level environments page for more information.
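
The group-level accumulation described above can be sketched as a simple filter. This is only an in-memory stand-in for what would likely be a database query; the `Environment` record and the `name_prefix` parameter are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Environment:
    project: str
    name: str
    tier: str


def environments_in_tier(
    environments: Iterable[Environment],
    tier: str,
    name_prefix: Optional[str] = None,
) -> List[Environment]:
    """Return environments of the given tier, optionally narrowed by name prefix."""
    matched = [e for e in environments if e.tier == tier]
    if name_prefix is not None:
        # Assumed finer-grained filter, e.g. "production/us-region".
        matched = [e for e in matched if e.name.startswith(name_prefix)]
    return matched


all_environments = [
    Environment("frontend", "production/us-region", "production"),
    Environment("frontend", "production/eu-region", "production"),
    Environment("backend", "production/us-region", "production"),
    Environment("backend", "staging", "staging"),
]
us_only = environments_in_tier(all_environments, "production", "production/us-region")
```

Here `us_only` contains the US production environments of both projects, whose resource groups would then be shown at the group level.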

Design and implementation details

NOTE: The following solution might only be applicable to CI Access mode. GitOps mode will have significant architectural changes outside of this blueprint, such as Flux switching and Manifest projects outside of the Agent configuration project. In order to derisk potential rework, we'll revisit the GitOps mode after these upstream changes have been settled.

Associate Environment with Agent

As a preliminary step, we allow users to explicitly define "which deployment job" uses "which agent" and deploys to "which namespace". The following keywords are supported in .gitlab-ci.yml.

  • environment:kubernetes:agent ... Define which agent the deployment job uses. It can select the appropriate context from the KUBE_CONFIG.
  • environment:kubernetes:namespace ... Define which namespace the deployment job deploys to. It injects KUBE_NAMESPACE predefined variable into the job. This keyword already exists.

Here is an example of .gitlab-ci.yml.

deploy-production:
  environment:
    name: production
    kubernetes:
      agent: path/to/agent/repository:agent-name
      namespace: default
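  script:
    # Helm selects the kubeconfig context with --kube-context.
    # The release name and chart arguments are omitted here for brevity.
    - helm --kube-context="$KUBE_CONTEXT" --namespace="$KUBE_NAMESPACE" upgrade --install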

When a deployment job is created, GitLab persists the relationship between the specified agent, namespace, and deployment job. If the CI job is NOT authorized to access the agent (please refer to Clusters::Agents::FilterAuthorizationsService for more details), this relationship is not recorded. This process happens in Deployments::CreateForBuildService. The database table schema is:

agent_deployments:
  - deployment_id (bigint/FK/NOT NULL/Unique)
  - agent_id (bigint/FK/NOT NULL)
  - kubernetes_namespace (character varying(255)/NOT NULL)

To identify the associated agent for a specific environment, environment.last_deployment.agent can be used in Rails.
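
The table and the lookup above can be tried out with a small SQLite sketch. It is only an approximation of the design: the `deployments` table is reduced to two columns, "last deployment" is assumed to mean the row with the highest ID, and the `authorized` flag stands in for the real authorization check. The function names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE deployments (
    id INTEGER PRIMARY KEY,
    environment_id INTEGER NOT NULL
);
CREATE TABLE agent_deployments (
    deployment_id INTEGER NOT NULL UNIQUE REFERENCES deployments(id),
    agent_id INTEGER NOT NULL,
    kubernetes_namespace VARCHAR(255) NOT NULL
);
"""


def record_agent_deployment(conn, deployment_id, agent_id, namespace, authorized):
    """Persist the association only when the job is authorized for the agent."""
    if not authorized:
        return False
    conn.execute(
        "INSERT INTO agent_deployments"
        " (deployment_id, agent_id, kubernetes_namespace) VALUES (?, ?, ?)",
        (deployment_id, agent_id, namespace),
    )
    return True


def agent_for_environment(conn, environment_id):
    """Return (agent_id, namespace) of the latest deployment, or None."""
    row = conn.execute(
        """
        SELECT ad.agent_id, ad.kubernetes_namespace
        FROM deployments d
        LEFT JOIN agent_deployments ad ON ad.deployment_id = d.id
        WHERE d.environment_id = ?
        ORDER BY d.id DESC
        LIMIT 1
        """,
        (environment_id,),
    ).fetchone()
    # No deployment at all, or the latest deployment has no agent recorded.
    if row is None or row[0] is None:
        return None
    return row
```

Because the lookup only considers the latest deployment, an environment whose most recent deployment did not record an agent yields no agent, even if an earlier deployment did.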

Fetch resources through user_access

When a user visits an environment page, GitLab frontend fetches the environment via GraphQL. Frontend additionally fetches the associated agent ID and namespace through the deployment relationship, which is tracked by the agent_deployments table.

Here is an example of GraphQL query:

{
  project(fullPath: "group/project") {
    id
    environment(name: "<environment-name>") {
      slug
      lastDeployment(status: SUCCESS) {
        agent {
          id
          kubernetesNamespace
        }
      }
    }
  }
}
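
As a rough illustration of how a client might consume this query, the sketch below pulls the values needed for the later resource lookup out of a response. It assumes the standard GraphQL response envelope (`data` mirroring the query shape); the function name `extract_agent_context`, the returned keys, and the sample IDs are hypothetical.

```python
def extract_agent_context(response):
    """Return project ID, environment slug, agent ID and namespace, or None.

    None is returned when any link in the chain is missing, for example when
    there is no successful deployment or no agent was recorded for it.
    """
    project = (response.get("data") or {}).get("project") or {}
    environment = project.get("environment") or {}
    deployment = environment.get("lastDeployment") or {}
    agent = deployment.get("agent")
    if not agent:
        return None
    return {
        "project_id": project["id"],
        "environment_slug": environment["slug"],
        "agent_id": agent["id"],
        "namespace": agent["kubernetesNamespace"],
    }


# Sample response with made-up identifiers, shaped like the query above.
sample_response = {
    "data": {
        "project": {
            "id": "gid://gitlab/Project/1",
            "environment": {
                "slug": "production",
                "lastDeployment": {
                    "agent": {
                        "id": "gid://gitlab/Clusters::Agent/2",
                        "kubernetesNamespace": "default",
                    }
                },
            },
        }
    }
}
context = extract_agent_context(sample_response)
```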

GitLab frontend authenticates and authorizes the user's access with the browser cookie. If access is forbidden, frontend shows the following error message: You don't have access to an agent that deployed to this environment. Please contact agent administrator if you are allowed in "user_access" in agent config file. See <troubleshooting-doc-link>.

After the user gains access to the agent, GitLab frontend fetches the list of available API resources in the Kubernetes cluster, and then fetches the resources with the following parameters:

  • namespace ... #{environment.lastDeployment.agent.kubernetesNamespace}
  • labels
    • app.gitlab.com/project_id=#{project.id} AND
    • app.gitlab.com/environment_slug=#{environment.slug}

If no resources are found, it is likely that the users have not embedded these labels into their resources. In this case, frontend shows a warning message: There are no resources found for the environment. Do resources have GitLab preserved labels? See <troubleshooting-doc-link>.
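
The namespace and label parameters above map naturally onto a Kubernetes list request with a `labelSelector` query parameter, where comma-separated equality requirements are combined with AND. The sketch below builds such a request path; the function names are hypothetical, and the path is relative to the Kubernetes API root (how the request is proxied through the agent is out of scope here).

```python
from urllib.parse import urlencode


def build_label_selector(project_id, environment_slug):
    """Combine both label requirements; a comma means AND in a label selector."""
    return ",".join(
        [
            f"app.gitlab.com/project_id={project_id}",
            f"app.gitlab.com/environment_slug={environment_slug}",
        ]
    )


def list_url(api_prefix, plural, namespace, project_id, environment_slug):
    """Build the path of a namespaced list call for one resource kind.

    api_prefix is "/api/v1" for core resources or "/apis/<group>/<version>"
    for grouped resources; plural is the resource name, such as "pods".
    """
    selector = build_label_selector(project_id, environment_slug)
    path = f"{api_prefix}/namespaces/{namespace}/{plural}"
    return f"{path}?{urlencode({'labelSelector': selector})}"


deployments_url = list_url("/apis/apps/v1", "deployments", "default", 42, "production")
```

One such request would be issued per resource kind discovered from the API resource list.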

Dependency graph

  • GitLab frontend uses Owner References to identify the dependencies between resources. These are embedded in resources as the metadata.ownerReferences field.
  • For resources that don't have owner references, we can use Well-Known Labels, Annotations and Taints as a complement. For example, EndpointSlice doesn't have metadata.ownerReferences, but has the kubernetes.io/service-name label as a reference to the parent Service resource.
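
The two rules above can be sketched as follows. This is a simplified illustration rather than the frontend implementation (which would not be written in Python): resources are assumed to be plain dictionaries shaped like Kubernetes objects, and `build_dependency_graph` is a hypothetical name.

```python
from collections import defaultdict

SERVICE_NAME_LABEL = "kubernetes.io/service-name"


def build_dependency_graph(resources):
    """Map each parent UID to the list of UIDs of its direct children."""
    known_uids = {r["metadata"]["uid"] for r in resources}
    # Index Services by (namespace, name) for the label-based fallback.
    services = {
        (r["metadata"].get("namespace"), r["metadata"]["name"]): r["metadata"]["uid"]
        for r in resources
        if r["kind"] == "Service"
    }
    children = defaultdict(list)
    for resource in resources:
        meta = resource["metadata"]
        # Primary rule: follow metadata.ownerReferences to fetched parents.
        parents = [
            ref["uid"]
            for ref in meta.get("ownerReferences", [])
            if ref["uid"] in known_uids
        ]
        if not parents:
            # Fallback: the well-known label that names the parent Service.
            service_name = meta.get("labels", {}).get(SERVICE_NAME_LABEL)
            parent = services.get((meta.get("namespace"), service_name))
            if parent is not None and parent != meta["uid"]:
                parents = [parent]
        for parent_uid in parents:
            children[parent_uid].append(meta["uid"])
    return dict(children)
```

Resources that never appear as a child are the roots of the graph, for example a Deployment or a Service, and a view such as Deployment => ReplicaSet => Pod can be rendered by walking the map from those roots.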

Health status of resources

  • GitLab frontend computes the status summary from the fetched resources, similar to ArgoCD's Resource Health (for example, Healthy, Progressing, Degraded, and Suspended). The formula is TBD.
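
Since the formula is still TBD, the following is only one possible rule, shown to make the idea concrete: the summary is the most severe status among the resources ("worst status wins"). The severity ordering and the name `summarize_health` are assumptions made for this sketch, not part of the proposal.

```python
# Assumed ordering from least to most severe; the real formula is TBD.
SEVERITY = ["Healthy", "Suspended", "Progressing", "Degraded"]


def summarize_health(statuses):
    """Return the most severe status, or None when there are no resources.

    Raises ValueError if a status is not one of the values in SEVERITY.
    """
    statuses = list(statuses)
    if not statuses:
        return None
    return max(statuses, key=SEVERITY.index)


summary = summarize_health(["Healthy", "Progressing", "Healthy"])
```

With this rule, a single Degraded resource marks the whole resource group as Degraded, which may or may not be the desired behavior once the formula is defined.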