
Deploying to Google Cloud Run with Pulumi

Published: 7. April 2026  •  go, iac

In this post, we deploy a small service to Google Cloud Run with Pulumi.

Google Cloud Run is a fully managed serverless platform for running containerized workloads on Google Cloud. It works well for HTTP services, APIs, background processing, and other workloads that can run in a container.

Cloud Run is a paid service, but Google Cloud offers a free tier that is large enough to follow along with this example without incurring costs.

Prerequisites

If you want to follow along, you need the following:

- A Google Cloud project with billing enabled
- The gcloud CLI, authenticated against that project
- Pulumi
- Docker, because the container image is built locally
- Go, for the example application and the Pulumi program

This post uses a Go application, but the same deployment pattern works for applications written in any language that can be packaged as a container. The infrastructure code is written in Go with Pulumi, but you can use any language supported by Pulumi.

The Pulumi program uses the gcp provider to create the Google Cloud resources and the Docker provider to build and push the container image.

For authentication, Google Cloud's current guidance generally favors Application Default Credentials and, where appropriate, Workload Identity Federation over long-lived service account keys. For a local tutorial like this, a service account key still works, and so do user credentials from the gcloud CLI.

If you use a service account key, you can set either of these environment variables:

- GOOGLE_APPLICATION_CREDENTIALS, pointing to the key file on disk
- GOOGLE_CREDENTIALS, containing the contents of the key file

If you prefer to use your own user account locally, you can also run gcloud auth login and gcloud auth application-default login.

Service

The service in this example is a small Go HTTP server that returns JSON. There is nothing Cloud Run-specific in the application itself. It just needs to listen on a port and respond to HTTP requests.

You can find the server code in server/main.go in the repository. The Dockerfile is also a standard container build and does not require any special Cloud Run features.

Pulumi configuration

Enable APIs

This stack deploys more than just a Cloud Run service. It also creates an Artifact Registry repository and an API Gateway. That means the corresponding APIs must be enabled in the Google Cloud project.

You can enable them manually in the Console, but Pulumi can do it for you:

    services := []string{
      "apigateway.googleapis.com",
      "artifactregistry.googleapis.com",
      "run.googleapis.com",
      "servicecontrol.googleapis.com",
      "servicemanagement.googleapis.com",
    }
    enabledAPIs := make([]pulumi.Resource, 0, len(services))
    for _, api := range services {
      name := strings.ReplaceAll(strings.TrimSuffix(api, ".googleapis.com"), ".", "-")
      service, err := projects.NewService(ctx, name, &projects.ServiceArgs{
        Project:                         pulumi.String(project),
        Service:                         pulumi.String(api),
        DisableOnDestroy:                pulumi.Bool(false),
        CheckIfServiceHasUsageOnDestroy: pulumi.Bool(false),
      })
      if err != nil {
        return err
      }
      enabledAPIs = append(enabledAPIs, service)
    }

main.go


Artifact Registry

Before Cloud Run can serve the application, we need a container image. Google currently recommends Artifact Registry for Cloud Run deployments, and this example pushes the image there.

Strictly speaking, Cloud Run can also deploy directly from Docker Hub, and other registries can be used through Artifact Registry remote repositories. This example stays with Artifact Registry because it is the most natural fit for a Google Cloud project.

First, create a Docker repository:

    repo, err := artifactregistry.NewRepository(ctx, "images", &artifactregistry.RepositoryArgs{
      Project:      pulumi.String(project),
      Location:     pulumi.String(region),
      RepositoryId: pulumi.String(repositoryID),
      Description:  pulumi.String("Docker images for the temperature Cloud Run example"),
      Format:       pulumi.String("DOCKER"),
    }, pulumi.DependsOn(enabledAPIs))
    if err != nil {
      return err
    }

main.go

Next, build the image locally and push it with the Pulumi Docker provider. You provide the Dockerfile path, the build context, the target platform, and registry credentials.

    rootDir := ctx.RootDirectory()
    dockerfile := pulumi.String(filepath.Join(rootDir, "Dockerfile"))
    contextDir := pulumi.String(rootDir)

    image, err := docker.NewImage(ctx, "server-image", &docker.ImageArgs{
      ImageName: serverImageRef,
      Build: &docker.DockerBuildArgs{
        Context:    contextDir,
        Dockerfile: dockerfile,
        Platform:   pulumi.String("linux/amd64"),
      },
      Registry: &docker.RegistryArgs{
        Server:   repo.RegistryUri,
        Username: pulumi.String("oauth2accesstoken"),
        Password: clientConfig.AccessToken(),
      },
    }, pulumi.DependsOn([]pulumi.Resource{repo}))
    if err != nil {
      return err
    }

main.go


Cloud Run

A Cloud Run service is defined with a Service resource and belongs to a specific project and region:

    service, err := cloudrunv2.NewService(ctx, "service", &cloudrunv2.ServiceArgs{
      Project:            pulumi.String(project),
      Name:               pulumi.String(serviceName),
      Location:           pulumi.String(region),
      DeletionProtection: pulumi.Bool(false),

main.go


In this example, ingress is set to INGRESS_TRAFFIC_ALL, which allows requests from the public internet as well as from Google Cloud services. This is necessary because the stack puts API Gateway in front of the Cloud Run service as the public entry point, and API Gateway v1 routes over the public endpoint of Cloud Run, so the service must accept public ingress. The service is still protected by IAM, so only authorized requests from API Gateway will succeed.

For a Cloud Run service that is only invoked internally by other Google Cloud services (API Gateway does not count, because it calls the public endpoint), you could set ingress to INGRESS_TRAFFIC_INTERNAL_ONLY to block all public traffic.

      Ingress:            pulumi.String("INGRESS_TRAFFIC_ALL"),

main.go


The configuration also keeps authenticated invocation enabled by setting InvokerIamDisabled to false: a caller must be authenticated and hold the roles/run.invoker role to invoke the service. If you set InvokerIamDisabled to true, the service can be invoked anonymously without any authentication.

      InvokerIamDisabled: pulumi.Bool(false),

main.go


The execution environment is set explicitly to second generation. Cloud Run can choose the execution environment automatically if you leave it unspecified. The current documentation describes second generation as a better fit for workloads that benefit from fuller Linux compatibility, stronger CPU performance, or better network performance.

        ExecutionEnvironment:          pulumi.String("EXECUTION_ENVIRONMENT_GEN2"),

main.go


MaxInstanceRequestConcurrency controls how many requests a single instance may process concurrently. Higher concurrency can reduce the number of instances you need, but it only helps if the application can actually handle that level of parallelism without hurting latency.

        MaxInstanceRequestConcurrency: pulumi.Int(80),

main.go


The request timeout is set to 15s. According to the Cloud Run documentation, if a request exceeds the timeout, Cloud Run closes the network connection and returns 504, but the container instance is not necessarily terminated immediately. That is worth keeping in mind if a handler might continue doing work after the client has already timed out.

        Timeout:                       pulumi.String("15s"),

main.go


The next block configures the scaling. With MinInstanceCount set to 0, the service can scale to zero when idle; with MaxInstanceCount set to 3, at most three instances are created to handle incoming traffic. With 3 instances and concurrency set to 80, the theoretical upper bound is 240 concurrent requests. Scaling to zero reduces idle cost but introduces cold starts; whether that tradeoff is acceptable depends on your latency requirements.

        Scaling: &cloudrunv2.ServiceTemplateScalingArgs{
          MinInstanceCount: pulumi.Int(0),
          MaxInstanceCount: pulumi.Int(3),
        },

main.go


The container section references the built image and exposes port 8080:

            Image: image.RepoDigest,
            Ports: &cloudrunv2.ServiceTemplateContainerPortsArgs{
              ContainerPort: pulumi.Int(8080),
            },

main.go


The next block sets the vCPU and memory limits for the container. These two settings also determine how much you are billed: billing is based on the vCPU and memory limits you set, not on actual usage. Set these limits based on the expected load of your application to avoid over-provisioning and unnecessary cost.

You can set the vCPU to a value lower than 1 (the minimum is 0.08), but then concurrency can no longer be greater than 1. In that case, 10 concurrent requests would cause Cloud Run to create 10 instances to handle them. Because concurrency is set to 80 in this example, the vCPU must be at least 1.

StartupCpuBoost can help reduce cold start latency. When you provision 1 vCPU, the boost temporarily raises this to 2 vCPUs during startup. You are charged for the additional vCPU during startup, but if startup becomes significantly faster, the boost can even lower the overall cost of a request.

            Resources: &cloudrunv2.ServiceTemplateContainerResourcesArgs{
              CpuIdle: pulumi.Bool(true),
              Limits: pulumi.StringMap{
                "cpu":    pulumi.String("1"),
                "memory": pulumi.String("512Mi"),
              },
              StartupCpuBoost: pulumi.Bool(true),
            },

main.go


Finally, the service routes all traffic to the latest revision. The same Traffics setting can also split traffic across multiple revisions, which enables gradual rollouts or A/B testing.

      Traffics: cloudrunv2.ServiceTrafficArray{
        &cloudrunv2.ServiceTrafficArgs{
          Percent: pulumi.Int(100),
          Type:    pulumi.String("TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"),
        },
      },

main.go
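As a hedged sketch, a gradual rollout could pin most traffic to a known-good revision; the revision name below is hypothetical:

```go
// Hypothetical split: 90% to a pinned revision, 10% to the latest revision.
Traffics: cloudrunv2.ServiceTrafficArray{
    &cloudrunv2.ServiceTrafficArgs{
        Percent:  pulumi.Int(90),
        Type:     pulumi.String("TRAFFIC_TARGET_ALLOCATION_TYPE_REVISION"),
        Revision: pulumi.String("temperature-service-00001"),
    },
    &cloudrunv2.ServiceTrafficArgs{
        Percent: pulumi.Int(10),
        Type:    pulumi.String("TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"),
    },
},
```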


API Gateway

API Gateway is not required for Cloud Run, but it is useful when you want a single public entry point that can enforce API keys and rate limits before requests reach the backend. This saves costs because requests that are blocked by API Gateway do not reach Cloud Run and therefore do not incur Cloud Run costs.

The first step is a dedicated service account for the gateway. API Gateway uses this identity when it calls the Cloud Run service.

    gatewayServiceAccount, err := serviceaccount.NewAccount(ctx, "gateway-service-account", &serviceaccount.AccountArgs{
      Project:     pulumi.String(project),
      AccountId:   pulumi.String(gatewayServiceAccountID),
      DisplayName: pulumi.String("Temperature API Gateway backend"),
      Description: pulumi.String("Service account used by API Gateway to invoke the Cloud Run backend"),
    }, pulumi.DependsOn(enabledAPIs))

main.go


Next, grant that service account permission to invoke the Cloud Run service:

    _, err = cloudrunv2.NewServiceIamMember(ctx, "gateway-invoker", &cloudrunv2.ServiceIamMemberArgs{
      Project:  pulumi.String(project),
      Location: pulumi.String(region),
      Name:     service.Name,
      Role:     pulumi.String("roles/run.invoker"),
      Member:   gatewayServiceAccount.Member,
    }, pulumi.DependsOn([]pulumi.Resource{service, gatewayServiceAccount}))

main.go


Then create the logical API object:

    api, err := apigateway.NewApi(ctx, "service-api", &apigateway.ApiArgs{
      Project:     pulumi.String(project),
      ApiId:       pulumi.String(apiID),
      DisplayName: pulumi.String(serviceName + " API"),
    }, pulumi.DependsOn(enabledAPIs))

main.go


The API config is driven by an OpenAPI document. This example generates that document dynamically because the backend address depends on the Cloud Run URL that Pulumi creates.

    apiConfig, err := apigateway.NewApiConfig(ctx, "service-api-config", &apigateway.ApiConfigArgs{
      Project:           pulumi.String(project),
      Api:               api.ApiId,
      ApiConfigIdPrefix: pulumi.String(normalizeIdentifier(serviceName+"-cfg", 24) + "-"),
      DisplayName:       pulumi.String(serviceName + " config"),
      GatewayConfig: &apigateway.ApiConfigGatewayConfigArgs{
        BackendConfig: &apigateway.ApiConfigGatewayConfigBackendConfigArgs{
          GoogleServiceAccount: gatewayServiceAccount.Email,
        },
      },
      OpenapiDocuments: apigateway.ApiConfigOpenapiDocumentArray{
        &apigateway.ApiConfigOpenapiDocumentArgs{
          Document: &apigateway.ApiConfigOpenapiDocumentDocumentArgs{
            Path:     pulumi.String("openapi.yaml"),
            Contents: openAPIDocument,
          },
        },
      },
    }, pulumi.DependsOn([]pulumi.Resource{service, api, gatewayServiceAccount}), pulumi.ReplaceOnChanges([]string{"*"}))

main.go


Finally, deploy the regional gateway endpoint:

    gateway, err := apigateway.NewGateway(ctx, "service-gateway", &apigateway.GatewayArgs{
      Project:     pulumi.String(project),
      Region:      pulumi.String(region),
      GatewayId:   pulumi.String(gatewayID),
      DisplayName: pulumi.String(serviceName + " gateway"),
      ApiConfig:   apiConfig.Name,
    }, pulumi.DependsOn([]pulumi.Resource{apiConfig}))

main.go


Rate limit

The rate limit is defined in the generated OpenAPI document with x-google-management. In this example, API Gateway tracks a requests metric and applies a quota of 10 requests per minute.

    "x-google-management:",
    "  metrics:",
    "    - name: requests",
    "      displayName: Requests",
    "      valueType: INT64",
    "      metricKind: DELTA",
    "  quota:",
    "    limits:",
    "      - name: requests-per-minute",
    "        metric: requests",
    "        unit: 1/min/{project}",
    "        values:",
    "          STANDARD: 10",
    "paths:",
    "  /:",
    "    get:",
    "      operationId: getIndex",
    "      x-google-backend:",
    fmt.Sprintf("        address: %s", serviceURL),
    "        path_translation: APPEND_PATH_TO_ADDRESS",
    fmt.Sprintf("        jwt_audience: %s", serviceURL),
    "      x-google-quota:",
    "        metricCosts:",
    "          requests: 1",
    "      security:",
    "        - api_key: []",
    "      responses:",
    "        '200':",
    "          description: OK",
    "  /api/temperature:",
    "    get:",
    "      operationId: getTemperature",
    "      x-google-backend:",
    fmt.Sprintf("        address: %s", serviceURL),
    "        path_translation: APPEND_PATH_TO_ADDRESS",
    fmt.Sprintf("        jwt_audience: %s", serviceURL),
    "      x-google-quota:",
    "        metricCosts:",
    "          requests: 1",

main.go

Each request to the protected route consumes 1 unit from that quota.


API key

To require an API key, define a security scheme and apply it to the route:

    "      security:",
    "        - api_key: []",
    "      parameters:",
    "        - in: query",
    "          name: lat",
    "          required: true",
    "          type: number",
    "        - in: query",
    "          name: lng",
    "          required: true",
    "          type: number",
    "      responses:",
    "        '200':",
    "          description: OK",
    "securityDefinitions:",
    "  api_key:",
    "    type: apiKey",
    "    name: X-API-Key",
    "    in: header",

main.go

This Pulumi program does not create API keys. You need to create them manually in the Console or with the gcloud CLI.

Provision

Before running pulumi up, set the required configuration values. At minimum, you need the project ID and region. The region must support both Cloud Run and API Gateway.

pulumi config set gcp:project <your-project-id>
pulumi config set gcp:region <your-region>

Also make sure that your Google Cloud credentials are available via environment variables or through the gcloud CLI, as described in the prerequisites section above.

Then run:

pulumi up

This will create all the resources, build and push the container image, and deploy the Cloud Run service and API Gateway.

Client

To test the deployed service, you can use any HTTP client, such as curl or Postman. For this example, there is also a small Go client that demonstrates how to call the deployed service. You can find it in cli/main.go on GitHub.

After deployment, read the public gateway URL like this:

pulumi stack output serviceUrl

To create an API key with gcloud, use the following commands:

managedService=$(pulumi stack output apiGatewayManagedService)
gcloud services enable $managedService
gcloud services api-keys create --display-name="temperature-client"

API Gateway API keys are tied to the managed service created for the gateway, and that service must be enabled before it can be selected for API key restrictions.

You can also create and manage API keys in the Google Cloud Console. Go to APIs & Services > Enabled APIs & services > Enable APIs and Services, search for the managed service name returned by pulumi stack output apiGatewayManagedService, enable it, and then create or edit the API key and restrict it to that service.

The demo client reads the service URL and API key from environment variables or from a .env file:

TEMPERATURE_SERVICE_URL=https://temperature-service-gateway....
TEMPERATURE_SERVICE_API_KEY=AI...
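As a sketch of what such a client does (this is not the repository's cli/main.go; the base URL, key, and coordinates below are placeholders), a request against the gateway can be built like this. The X-API-Key header name and the lat/lng query parameters come from the OpenAPI document shown earlier:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"
)

// newTemperatureRequest builds an authenticated GET request against the
// gateway. In the demo client, baseURL and apiKey come from the
// TEMPERATURE_SERVICE_URL and TEMPERATURE_SERVICE_API_KEY variables.
func newTemperatureRequest(baseURL, apiKey string, lat, lng float64) (*http.Request, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return nil, err
	}
	u.Path = "/api/temperature"
	q := u.Query()
	q.Set("lat", fmt.Sprintf("%g", lat))
	q.Set("lng", fmt.Sprintf("%g", lng))
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// The security definition in the OpenAPI document expects the API key
	// in the X-API-Key header.
	req.Header.Set("X-API-Key", apiKey)
	return req, nil
}

func main() {
	// Placeholder values for illustration.
	req, err := newTemperatureRequest("https://temperature-gateway.example.com", "my-api-key", 46.95, 7.45)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(req.URL.String())
	// prints https://temperature-gateway.example.com/api/temperature?lat=46.95&lng=7.45
}
```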

When you are done, destroy the stack so you do not keep paying for resources you no longer need:

pulumi destroy

Cost

Before you use a service like Google Cloud Run, it is important to understand the pricing model and how to control costs. All the services we use in this example cost money, but they also have free tiers that allow you to experiment without incurring costs. And if you deploy a service that is only called a few times a day, you might stay within the free tier limits and pay nothing or just a few dollars per month. However, if your service becomes popular or if there is a bug that causes it to scale uncontrollably, you can end up with a much higher bill than expected.

All the following information is based on the time of writing this blog post, April 2026. Check the pricing pages for the most up-to-date numbers, because prices and free tier limits can and will change over time. Also make sure to look up the correct region, because some services have different prices in different regions.


Artifact Registry

Pricing page

The cost of hosting Docker images in Artifact Registry is mainly based on the amount of storage used. The free tier includes 0.5 GB of storage per month. Data transfer into Google Cloud is free, and pulls from Google Cloud services in the same region are also free. Costs occur when you pull images to a local machine or into another region.


API Gateway

Pricing page

For API Gateway, you are billed based on the number of API calls and the amount of data transferred to clients. The free tier includes 2 million API calls per month. Data transfer into Google Cloud is free, but data transfer out is billed per GB, and this can add up quickly if your API returns large responses or handles a lot of traffic. To reduce egress, return only the data the client actually needs, and consider compressing large responses. Be aware that compression increases vCPU usage, and therefore cost, on Cloud Run, so it is a trade-off to evaluate for your use case. A compact data format such as Protocol Buffers instead of JSON can also reduce response size without the overhead of compression.

There is no charge for the traffic from API Gateway to Cloud Run if they are in the same region, but if they are in different regions, you will be billed for cross-region data transfer.


Cloud Run

Pricing page

Cloud Run offers two pricing models:

Request-Based Pricing

This is the default and also the model used in this example. You are billed for the time your container is actively processing a request. With minimum instances set to 0, no cost occurs while no request is being processed. If you set minimum instances to a value greater than 0, you are billed for the time those instances are running even when there is no traffic; Google calls this "idle time" and bills it at a discounted rate.

You are billed based on the vCPU and memory limits you set for your service, and the time it takes to process each request, from start to finish (including cold startup time).

In addition, there is also a per-request fee. In the us-central1 region, for example, the fee is $0.40 per million requests after the free tier of 2 million requests per month.

This model is cost-effective for applications with low or unpredictable traffic, because you only pay when your service is actually handling requests. However, if your application has a steady stream of traffic, the costs can add up quickly, and in that case the instance-based pricing might be more cost-effective.


Instance-Based Pricing

In this model, you are billed for the entire lifetime of the instance, from the moment it starts up until it shuts down. The vCPU is always available, even between requests, and you are billed for that time. However, the per-second rate is lower than request-based pricing, and there is no per-request fee. Like the request-based model, you are billed based on the vCPU and memory limits you set for your service. Make sure to set the limits based on the expected usage of your application to avoid over-provisioning.

This model can be more cost-effective for applications with steady, high-volume traffic, or for applications that need to do "work" in the background after a response has been sent to the user. For example, if your application needs to send an email or process logs after responding to a request, instance-based pricing allows you to do that.

Also note that if you want to use a GPU with Cloud Run, you need to use instance-based pricing, because request-based pricing does not support GPUs.


With instance-based pricing you can also set minimum instances to 0, which allows the service to scale down to zero when there is no traffic. So why not always choose instance-based pricing with minimum instances set to 0? The per-second rate is lower and there is no per-request fee, so it looks like the best of both worlds. The catch is that with instance-based pricing you are billed for the entire lifetime of the instance, including the warm period after a request has been processed. Cloud Run usually keeps instances alive for about 15 minutes after they have handled a request (about 10 minutes for GPU instances), and with instance-based pricing you pay for that time too. With request-based pricing, you are only billed for the time it takes to process the request, not for the warm period afterwards. For a low-traffic application, instance-based pricing with minimum instances set to 0 can therefore end up more expensive than request-based pricing, because you are paying for the warm period.


Like all the other services, you will only be charged after the free tier limits are exceeded. Note that the free tier limits are different for the two pricing models, so make sure to check the pricing page for the most up-to-date information.

Both models also offer 1 and 3 year committed use discounts, which can further reduce the cost if you have predictable traffic and can commit to a certain level of usage for a longer period of time.


It is difficult to say which model is better for a specific use case. A common rule of thumb: if your service is actively processing requests more than 75% of the time, instance-based billing is usually cheaper. But it depends on the traffic patterns and resource usage of your application, so analyze your expected traffic and costs under both models before deciding. You can switch between the models at any time, so you can start with request-based pricing and move to instance-based pricing if that turns out to be more cost-effective. Monitor your costs and usage regularly to make sure you are on the right model for your application.


Other costs

Be aware that sending data from your Cloud Run service to non-Google services or to Google services in another region can incur egress costs. The cost is the same as the egress cost from API Gateway.


Jobs and Worker Pools

In this blog post we only use Cloud Run for serving HTTP requests, but Cloud Run also supports two additional workload types with a different pricing model: jobs and worker pools.

Neither of these gets a public URL, and they cannot be invoked by a client directly.


Protecting against excessive pricing

Here is an overview of the most important settings to keep costs under control. Some of these were already mentioned in the Cloud Run configuration section above.

Cloud Run

- MaxInstanceCount caps how many instances can run in parallel, and therefore the worst-case compute cost.
- The request timeout limits how long a single request can keep an instance busy.
- The vCPU and memory limits cap the cost of each instance.
- MinInstanceCount set to 0 avoids paying for idle instances.


Global billing protections

- Cloud Billing budgets with alert thresholds notify you when spending approaches a limit. Note that budget alerts only notify you; they do not shut down resources automatically.


DDoS protection

In this example we use API Gateway to protect the Cloud Run service from accidental or malicious overuse. API Gateway can validate API keys, apply quota rules, and then forward authorized requests to Cloud Run. This helps to prevent excessive costs of Cloud Run by blocking unauthorized requests and by rate-limiting authorized requests before they even reach Cloud Run.

For more advanced protection against DDoS attacks, you can put Google Cloud Armor in front of your API Gateway, or directly in front of the Cloud Run service if you do not use API Gateway. Cloud Armor is a DDoS protection and web application firewall (WAF) service that helps protect your applications from various types of attacks. With Cloud Armor you can create security policies that block or rate-limit traffic based on IP addresses, geographic location, or other attributes before it even reaches your gateway. This further reduces the risk of excessive costs from DDoS attacks or other malicious traffic.

Wrapping Up

In this blog post you have seen how to deploy a simple HTTP service to Google Cloud Run using Pulumi, and how to set up API Gateway in front of it for authenticated access and rate limiting.

Google Cloud Run is a convenient way to run containerized applications in a serverless environment, but it is important to understand the pricing model and how to control costs. Cloud Run can be very cost-effective for low-traffic applications, yet it can also lead to unexpectedly high bills due to traffic spikes, attacks, or misconfigurations. Make sure to understand the pricing of all the services you use, and set up billing alerts and cost controls to avoid surprises.