Connect EKS clusters
To analyze the usage and cost of resources in an EKS cluster, you need to connect the cluster to the DoiT platform. You can choose either Terraform or CloudFormation (with kubectl or Helm) for the deployment. Be aware that:
- Clusters in the same account/region must be deployed using the same deployment method.
- It may take up to 24 hours before your data appears in DoiT Cloud Analytics.
- The DEPLOYMENT-ID variable in Terraform and Helm deployments is provided by DoiT. It's not possible to automate the connection in the current implementation.
Required permissions
To connect an EKS cluster to the DoiT platform, you need the following permissions:
- Kubernetes admin permissions to deploy the OpenTelemetry Collector agent on the cluster, and
- AWS permissions to set up the necessary resources, including:
  - Creating a role and IAM policy to export the EKS metrics collected by the OpenTelemetry Collector, creating an S3 bucket to store the metrics, and setting up an OpenID Connect (OIDC) identity provider that authenticates the role to the cluster.
  - Creating a role and IAM policy to give DoiT read-only access to get the metrics from the S3 bucket.
See the full list of required AWS permissions:

| Permission | Description |
| --- | --- |
| iam:CreatePolicy | Creates a new managed policy for your AWS account. |
| iam:DeletePolicy | Deletes the specified managed policy. |
| iam:AttachRolePolicy | Attaches the specified managed policy to an IAM role. |
| iam:DetachRolePolicy | Removes the specified managed policy from a role. |
| iam:CreateRole | Creates a new role for your AWS account. |
| iam:DeleteRole | Deletes the specified role. |
| iam:PutRolePolicy | Adds or updates a policy document in the specified IAM role. |
| iam:DeleteRolePolicy | Deletes the specified policy in the specified IAM role. |
| iam:ListRoles | Lists the IAM roles with the specified path prefix. |
| iam:GetRole | Retrieves information about an IAM role, including the role's path, GUID, ARN, and the trust policy that grants permission to assume the role. |
| iam:PassRole | Passes an IAM role to an AWS service. |
| iam:GetPolicy | Retrieves information about the specified managed policy. |
| s3:CreateBucket | Creates a new Amazon S3 bucket. |
| s3:DeleteBucket | Deletes a specific S3 bucket. |
| s3:PutLifecycleConfiguration | Creates a new lifecycle configuration for the S3 bucket or replaces an existing one. |
| s3:GetLifecycleConfiguration | Returns the lifecycle configuration information set on the S3 bucket. |
| lambda:CreateFunction | Creates a Lambda function. |
| lambda:UpdateFunctionCode | Updates a Lambda function's code. |
| lambda:UpdateFunctionConfiguration | Modifies the version-specific settings of a Lambda function. |
| lambda:DeleteFunction | Deletes a Lambda function. |
| lambda:AddPermission | Grants an AWS service, AWS account, or AWS organization permission to use a function. |
| lambda:GetFunction | Returns information about the function or function version, with a link to download the deployment package. |
| lambda:InvokeFunction | Invokes a Lambda function. |
| cloudformation:CreateStack | Creates a stack as specified in the template. |
| cloudformation:DescribeStacks | Returns the description for the specified stack or all the stacks created. |
| cloudformation:DescribeStackEvents | Returns all stack-related events for a specified stack in reverse chronological order. |
| cloudformation:DeleteStack | Deletes a specified stack. |
| cloudformation:DescribeStackResource | Returns a description of the specified resource in the specified stack. |
| cloudformation:DescribeStackResources | Returns AWS resource descriptions for running and deleted stacks. |
| cloudformation:GetTemplate | Returns the template body for a specified stack. |
| cloudformation:GetTemplateSummary | Returns information about a new or existing template. |
| cloudformation:ListStacks | Returns the summary information for stacks with matching status. |
| cloudformation:UpdateStack | Updates a stack as specified in the template. |
| SNS:Publish | Gives users permissions to publish to the topic. |
See also Amazon EKS User Guide: Allowing users to access your cluster.
Terraform deployment
- In the DoiT console, select Terraform as the deployment method.
- Clone the DoiT terraform-eks-lens repository for the account/region:
  git clone https://github.com/doitintl/terraform-eks-lens.git eks-lens-ACCOUNT-REGION
  cd eks-lens-ACCOUNT-REGION
- Sign in to the Amazon EKS console and select your cluster on the Clusters page.
- In the Details section on the Overview tab, copy the value of the OpenID Connect provider URL, then paste it in the DoiT console to download the Terraform configuration file, CLUSTERNAME.tf, for your cluster. Save the downloaded file in the current Terraform directory.
- Create a new file named CLUSTERNAME_provider.tf, copy the code snippet from the DoiT console, and modify it to set up your Terraform Kubernetes provider.
- Copy the code snippet from the DoiT console and modify it to set up your AWS provider in the aws_provider.tf file.
- Run the following Terraform commands in sequence:
  - terraform init: Initializes a working directory containing Terraform configuration files.
  - terraform plan: Creates an execution plan that lets you preview the changes Terraform will make to your infrastructure.
  - terraform apply: Executes the actions proposed in the Terraform plan.
- In the DoiT console, select Finish to complete the deployment.
If successful, the cluster status shows Active on the EKS clusters page.
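The two provider files created in the steps above can be sketched as follows. This is an illustrative sketch only: the actual snippets come from the DoiT console, and the cluster name, data source labels, and region below are placeholders.

```hcl
# aws_provider.tf -- region is a placeholder; use your cluster's region.
provider "aws" {
  region = "us-east-1"
}

# CLUSTERNAME_provider.tf -- authenticates the Kubernetes provider
# against the EKS cluster using its endpoint and a short-lived token.
data "aws_eks_cluster" "cluster" {
  name = "CLUSTERNAME"
}

data "aws_eks_cluster_auth" "cluster" {
  name = "CLUSTERNAME"
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}
```

This pattern keeps the Kubernetes provider credentials tied to your AWS credentials, so no kubeconfig file is needed in the Terraform directory.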
Troubleshooting
If you've successfully executed the Terraform commands but the state of your cluster still shows Not started, try the following:
- Open your Terraform configuration file CLUSTERNAME.tf and find the curl command in the null_resource "deploy_cluster" section.
- Run the curl command to send a request to https://console.doit.com/webhooks/v1/eks-metrics/terraform-validate with the correct parameters.
CloudFormation deployment
The CloudFormation deployment process consists of two steps.
Step 1: Add permission
In this step, you create a CloudFormation stack using the DoiT EKS onboarding template.
- In the DoiT console, select CloudFormation as the deployment method, click Next, and then select Open CloudFormation Stack.
- In the AWS CloudFormation console, review the pre-populated fields, and then create a stack using the DoiT template.
- Select the checkbox at the bottom of the page to acknowledge that AWS CloudFormation might create IAM resources with custom names.
- Create the stack.
- Once the stack is created, navigate back to the DoiT console. You should see a confirmation message that says Permission successfully added. Select Next to proceed.
Step 2: Connect and validate
In this step, you install the required components on your Kubernetes clusters, using an auto-generated Kubernetes Deployment file or an EKS Lens Helm chart.
Using kubectl
- Download the deployment YAML file if you haven't done so in the previous step.
- Open AWS CloudShell in the AWS Management Console and upload the deployment YAML file.
- In the DoiT console, copy the command kubectl apply -f DEPLOYMENT_YAML_FILE.
- Paste the command into AWS CloudShell and run it to update the cluster configuration. The deployment file creates two service accounts in the doit-eks-metrics namespace:
  - Service account doit-kube-state-metrics: To deploy the kube-state-metrics (KSM) service on the cluster.
  - Service account doit-collector: To deploy the OpenTelemetry Collector on the cluster.
- In the DoiT console, select Check to validate the connection. If successful, the cluster's status shows Active on the EKS clusters page.
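For reference, service accounts that assume an IAM role on EKS typically carry an IAM role annotation (the IRSA pattern). The manifest below is an illustrative sketch of that shape, not the generated file itself; the role ARN is a placeholder:

```yaml
# Sketch only -- the actual manifests are in the generated deployment file.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: doit-collector
  namespace: doit-eks-metrics
  annotations:
    # Placeholder ARN: the role created by the CloudFormation stack.
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME
```

The annotation is what lets the collector pod exchange its projected service account token for AWS credentials when writing metrics to the S3 bucket.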
Using Helm
- Install Helm on your local system.
- Copy the Helm commands provided in the DoiT console and run them in sequence:
  - The helm repo add command adds the chart repository to your local Helm installation.
  - The helm template command renders the chart template locally.
  - The helm upgrade --install command installs the doit-eks-lens chart with its specific kube-state-metrics deployment.
  Refer to EKS Lens Helm chart for detailed instructions.
- In the DoiT console, select Check to validate the connection. If successful, the cluster's status shows Active on the EKS clusters page.
Multiple EKS clusters
If you have multiple EKS clusters, you must create a new CloudFormation stack for each cluster because some AWS resources are deployed at the cluster level.
The S3 bucket is created per account/region. It should be created only once when onboarding the first cluster in each account/region; all other clusters in the same account/region will use the same S3 bucket.
If the stack creation for a later cluster fails because of the existing S3 bucket, set the CreateBucket parameter to false. Do not change the bucket name.
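If you create the later stacks with the AWS CLI rather than the console, the CreateBucket parameter can be supplied in a parameter file. A minimal sketch (other stack parameters omitted; pass the file with aws cloudformation create-stack --parameters file://params.json):

```json
[
  {
    "ParameterKey": "CreateBucket",
    "ParameterValue": "false"
  }
]
```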
Resource management for OpenTelemetry Collector Pods
The DoiT EKS cost monitoring solution supports the OpenTelemetry Collector's Memory Limiter processor to help you mitigate memory usage issues with the OpenTelemetry Collector.
Terraform EKS Lens
With terraform-eks-lens, you can set environment variables for the OpenTelemetry Collector using the otel_env variable.
# If you need to set environment variables for the OpenTelemetry Collector, you can do so by setting the `otel_env` variable:
# otel_env = {
# "GOMEMLIMIT" = "2750MiB" # set the memory limit for the OpenTelemetry Collector
# }
# We recommend to read the OpenTelemetry Collector documentation to understand the memory limiter processor configuration: https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md#best-practices
# If you want to customize the memory limiter processor for the OpenTelemetry Collector, you can do so by setting the `otel_memory_limiter` variable:
# otel_memory_limiter = {
# check_interval = "1s"
# limit_percentage = 70
# spike_limit_percentage = 30
# }
# If you want to customize the resources for the OpenTelemetry Collector container, you can do so by setting the `otel_resources` variable:
# otel_resources = {
# requests = {
# cpu = "100m"
# memory = "256Mi"
# }
# limits = {
# cpu = "100m"
# memory = "256Mi"
# }
# }
EKS Lens Helm chart
With the EKS Lens Helm chart, you can adjust memory limiter settings using variables in the values.yaml file.
collector:
  otelcol:
    replicas: 1
    image:
      repository: otel/opentelemetry-collector-contrib
      tag: 0.83.0
    kubeStateMetrics:
      endpoint: "kube-state-metrics:8080"
    env:
      # - name: "GOMEMLIMIT"
      #   value: "2750MiB"
    ## Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
    tolerations: []
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
    resources: {}
    ## Ref: https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md
    memory_limiter:
      check_interval: 1s
      limit_percentage: 70
      spike_limit_percentage: 30
Where:
- check_interval: Time between measurements of memory usage.
- limit_percentage: Maximum amount of total memory that can be allocated by the process heap. This option is supported on Linux systems with cgroups and is intended for dynamic platforms like Docker.
- spike_limit_percentage: Maximum spike expected between memory usage measurements. The value must be less than limit_percentage. This option is intended to be used only with limit_percentage.
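To make the percentages concrete, here is a quick back-of-the-envelope calculation of the thresholds the memory limiter derives from the settings above. The 2048 MiB total is an assumed container memory limit, not a value from this document; per the memory limiter documentation, the soft limit is the hard limit minus the spike allowance.

```shell
# Assumed container memory limit (set via resources.limits.memory).
total_mib=2048

# Hard limit: limit_percentage (70%) of total memory.
hard_mib=$(( total_mib * 70 / 100 ))
# Spike allowance: spike_limit_percentage (30%) of total memory.
spike_mib=$(( total_mib * 30 / 100 ))
# Soft limit: hard limit minus the spike allowance; once usage crosses
# it, the collector starts refusing data and forcing garbage collection.
soft_mib=$(( hard_mib - spike_mib ))

echo "hard=${hard_mib}MiB spike=${spike_mib}MiB soft=${soft_mib}MiB"
```

Re-run the arithmetic with your own container limit when tuning the percentages, so the soft limit leaves enough headroom for normal ingestion spikes.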
See also Resource Management for Pods and Containers.
EKS clusters offboarding
To offboard an EKS cluster from the DoiT platform:
- Cluster connected via Terraform: Run the terraform destroy command to destroy the full stack based on your CLUSTERNAME.tf file, or use the -target option to destroy individual resources, for example, terraform destroy -target RESOURCE_TYPE.NAME.
- Cluster connected via CloudFormation with Helm:
  - Delete the CloudFormation stack of the cluster from your AWS account. See Deleting a stack.
  - Run the helm uninstall doit-eks-lens command to delete the agent (OpenTelemetry Collector) from Kubernetes.
- Cluster connected via CloudFormation with kubectl:
  - Delete the CloudFormation stack of the cluster from your AWS account. See Deleting a stack.
  - Run the kubectl delete -f DEPLOYMENT_YAML_FILE command from AWS CloudShell to delete the agent (OpenTelemetry Collector) configuration.

To remove multiple clusters, repeat the steps above for each one.
Interactive demo
Try out our interactive demo for a hands-on walk-through experience.
If the demo doesn't display properly, try expanding your browser window or opening the demo in a new tab.