
Code Mode for Tool Calling in Go

Published: 11. March 2026  •  llm, go

Tool calling is a workflow for giving an LLM access to external information and capabilities. For example, a weather tool can give the model access to current weather conditions.

The tool-calling workflow is simple. In the initial prompt, the client lists all available tools together with descriptions of their capabilities, the arguments they take, and what they return.

The model picks one or more tools to call and sends back a tool-calling response. The client then executes the tool and returns the result to the model as context for the next turn. This loop continues until the model has enough information to answer the user's question.
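The loop above can be sketched in a few lines of Go. The types here are hypothetical stand-ins, not any particular SDK's API; real LLM SDKs use richer message and tool-call shapes.

```go
package main

// message and modelReply are simplified, made-up types for illustration.
type message struct {
	role, content string
}

type modelReply struct {
	toolName string // empty when the model answers directly
	toolArgs string
	answer   string
}

// runToolLoop keeps calling the model, executing any requested tool and
// feeding the result back as context, until the model produces a final answer.
func runToolLoop(callModel func([]message) modelReply, runTool func(name, args string) string, prompt string) string {
	msgs := []message{{role: "user", content: prompt}}
	for {
		reply := callModel(msgs)
		if reply.toolName == "" {
			return reply.answer // the model had enough context to answer
		}
		result := runTool(reply.toolName, reply.toolArgs)
		msgs = append(msgs, message{role: "tool", content: result})
	}
}
```

With a stubbed `callModel` that first requests a weather tool and then answers, this loop runs exactly two model turns: one tool call, one final answer.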

Another technology related to tool calling is the Model Context Protocol (MCP). MCP defines a standard way for a server to expose tools and for a client to discover and call them. An LLM client is often also an MCP client. So when an LLM client sends a prompt to an LLM, it fetches the available tools from its MCP servers and includes those tool descriptions in the prompt. Then, when the model picks a tool to call, the client executes that tool by sending a request to the MCP server, the MCP server executes the tool and returns the result to the client, and the client then sends that result to the LLM.

Problems

One problem with the tool-calling pattern is that these tool descriptions can consume a large part of the context. This is not an issue when you only have a few tools, but when you have hundreds of them, the descriptions can take up a lot of tokens in the prompt.

Another problem appears in multi-step workflows, where one tool call depends on the result of another. In the traditional tool-calling pattern, the model calls the first tool, gets the result back, then calls the second tool with that result, and so on. This can lead to many turns and a lot of intermediate data being sent back and forth between the model and the client.

Solution

In a blog post, Cloudflare proposes a new way to use tools called "code mode." The pattern tackles both of the problems mentioned above. Cloudflare has also implemented it and offers support for it through its SDK.

The first problem is solved by exposing only a search tool to the model instead of the full tool list. The model can use this search tool to find the relevant tools for the task at hand. This way, the initial context only contains the search tool and only receives the relevant tools after the model asks for them. This reduces the upfront context cost and makes it easier for the model to find the right tools.

The second problem can be addressed by letting the model write code that calls the tools instead of calling them one after another. The model can use the search tool to find the relevant tools and then write a program that calls those tools in a sandboxed environment. This way, the model can do the orchestration in code and only get the final result back instead of all the intermediate results. This reduces the number of turns and the amount of data sent back and forth.


Anthropic also took ideas from the Cloudflare post and implemented them in their hosted product. The "Tool Search Tool" lets the model search for relevant tools instead of loading them all into the context at once. The other feature is called "Programmatic Tool Calling," and it lets the model write Python code that orchestrates calls to the tools. You can find the full details in this blog post.

The issue is that this is only available in Anthropic's hosted product. This post shows how to implement code mode yourself in Go. The following example will use Anthropic's Claude as the LLM, without using their hosted tool-calling features, to show that you can implement this pattern yourself without needing any specific features from the LLM provider.

MCP server

Code mode does not replace tool calling or MCP. It is simply a different way to use them. For this demo, the application uses a simple MCP server whose tools fall into three unrelated groups: an arithmetic tool, time tools, and shipping tools.

The value of the search tool becomes clear in a moment: it helps the model find the tools relevant to a given task without having to read through the full list.

In this demo application, the MCP server is built into the same Go process as the MCP client and LLM client for simplicity. In a real application, the MCP server would likely be a separate process that the client connects to over the network or over stdio.

MCP client

Because the MCP server and client run in the same Go process, the application can use in-memory transports to connect them without setting up an actual network connection.

  server := mcp.NewServer(&mcp.Implementation{Name: "demo-mcp", Version: "0.1.0"}, nil)

runtime.go

  client := mcp.NewClient(&mcp.Implementation{Name: "demo-client", Version: "0.1.0"}, nil)
  serverTransport, clientTransport := mcp.NewInMemoryTransports()

  serverSession, err := server.Connect(ctx, serverTransport, nil)
  if err != nil {
    return nil, fmt.Errorf("connect server: %w", err)
  }
  clientSession, err := client.Connect(ctx, clientTransport, nil)
  if err != nil {
    if closeErr := serverSession.Close(); closeErr != nil {
      return nil, fmt.Errorf("connect client: %w (close server session: %v)", err, closeErr)
    }
    return nil, fmt.Errorf("connect client: %w", err)
  }

runtime.go

At startup, the application connects to the server and lists all available tools.

  result, err := r.clientSession.ListTools(ctx, nil)

runtime.go

It then iterates over all the tools and builds an internal catalog of all the available tools, including their name, description, argument schema, and result schema. This catalog is used later by the search tool to find the relevant tools for a given query.
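A minimal sketch of that catalog, using simplified stand-in structs rather than the SDK's real types; the `demo_` naming scheme for `Callable` is an assumption based on the helper names that appear later in this post.

```go
package main

import "encoding/json"

// toolInfo is a pared-down stand-in for the SDK's tool type.
type toolInfo struct {
	Name         string
	Description  string
	InputSchema  json.RawMessage
	OutputSchema json.RawMessage
}

// catalogEntry is one searchable record. Callable is the JavaScript
// function name the sandbox will later expose for this tool
// (hypothetical naming: server prefix + tool name).
type catalogEntry struct {
	Name         string
	Callable     string
	Description  string
	InputSchema  json.RawMessage
	OutputSchema json.RawMessage
}

// buildCatalog turns a tools/list result into the in-memory catalog.
func buildCatalog(tools []toolInfo) []catalogEntry {
	catalog := make([]catalogEntry, 0, len(tools))
	for _, t := range tools {
		catalog = append(catalog, catalogEntry{
			Name:         t.Name,
			Callable:     "demo_" + t.Name,
			Description:  t.Description,
			InputSchema:  t.InputSchema,
			OutputSchema: t.OutputSchema,
		})
	}
	return catalog
}
```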

To explain the following steps, I will use the following prompt from a fictional shipping use case as an example.

Compare all shipping options for a 2.4kg parcel from Germany to Spain. 
Exclude any option slower than 4 business days and return the cheapest 
acceptable option only. Nothing else.

In the initial request to the LLM, the application includes only the search tool, not the full list of tools loaded from the MCP server. The tools section of the initial request looks like this:

Initial Request:

    "tools": [
      {
        "description": "Call this to discover relevant helpers for the user's task. The result includes the JavaScript helper definitions you should use in execute(code).",
        "input_schema": {
          "properties": {
            "limit": {
              "description": "Maximum results to return.",
              "type": "integer"
            },
            "query": {
              "description": "A concise natural-language summary of the user's task or the helpers you need to find.",
              "type": "string"
            }
          },
          "required": [
            "query"
          ],
          "type": "object"
        },
        "name": "search"
      }
    ]

The response from the LLM is either a normal answer, when it can answer without tools, or a tool-calling response to invoke the search tool. For this prompt, the model has no way to answer correctly without tools, so it calls the search tool.

Tool Call Response:

      {
        "caller": {
          "type": "direct"
        },
        "id": "toolu_01Sn9Rw5SkaJ26VjKJnAXK3o",
        "input": {
          "query": "shipping options rates parcel delivery"
        },
        "name": "search",
        "type": "tool_use"
      }

The tool call response shows the keywords, "shipping options rates parcel delivery," that the LLM uses to search for relevant tools. The search tool then uses those keywords to find the relevant tools from the catalog built from the MCP server.

The search tool looks through the tool catalog for entries that match the keywords in the query. This demo application uses a very simple search algorithm that compares the keywords with the tool names and descriptions. A real-world application could use a more sophisticated search algorithm, synonym expansion, or even a vector database for this step. The search tool is a crucial part that needs a good implementation to make sure the model finds the right tools for the task. You can find the search implementation for this demo on GitHub here.
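As an illustration of such a simple keyword match (a sketch, not the exact code from the repository), the search can score each catalog entry by how many lowercase query keywords appear in its name or description:

```go
package main

import (
	"sort"
	"strings"
)

// searchEntry is a minimal catalog record for this illustration.
type searchEntry struct {
	Name        string
	Description string
}

// searchCatalog returns up to limit entries, best keyword match first.
func searchCatalog(catalog []searchEntry, query string, limit int) []searchEntry {
	keywords := strings.Fields(strings.ToLower(query))
	type scored struct {
		entry searchEntry
		score int
	}
	var hits []scored
	for _, e := range catalog {
		haystack := strings.ToLower(e.Name + " " + e.Description)
		score := 0
		for _, kw := range keywords {
			if strings.Contains(haystack, kw) {
				score++
			}
		}
		if score > 0 {
			hits = append(hits, scored{e, score})
		}
	}
	// Stable sort keeps catalog order for entries with equal scores.
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].score > hits[j].score })
	if limit > 0 && len(hits) > limit {
		hits = hits[:limit]
	}
	out := make([]searchEntry, len(hits))
	for i, h := range hits {
		out[i] = h.entry
	}
	return out
}
```

For the shipping query from the example, this kind of matcher surfaces the shipping tools while leaving unrelated tools, such as the arithmetic one, out of the result.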

The application then creates a description of the tools that the search tool found. This description is a JavaScript header that includes JSDoc comments with the argument and result schema for each tool. This way, when the model gets this description back, it can write code that calls these tools with the correct arguments and know what results to expect.

Here is an example of how this looks for the list_carriers tool.

/**
 * List the supported carriers for a shipping route.
 * @param {{ destination_country: string, origin_country: string }} args
 * Input fields:
 *   - args.destination_country: ISO 3166-1 alpha-2 country code
 *   - args.origin_country: ISO 3166-1 alpha-2 country code
 * @returns {{ carriers: null | Array<any>, destination_country: string, origin_country: string }}
 * Output fields:
 *   - result.carriers: available carrier identifiers for this route
 *   - result.destination_country: ISO 3166-1 alpha-2 country code
 *   - result.origin_country: ISO 3166-1 alpha-2 country code
 */
function demo_list_carriers(args) {}

The second request the application sends to the LLM includes these helper definitions. The api_definition field in the tool result is the crucial piece: the search response does not just say, "I found some tools." It says, "Here is the JavaScript surface you should use in the next step."

      {
        "content": [
          {
            "content": [
              {
                "text": "{\"api_definition\":\"/**\\n * Get a deterministic base shipping quote for a carrier and package weight.\\n * @param {{ carrier: string, destination_country: string, origin_country: string, weight_kg: number }} args\\n * Input fields:\\n *   - args.carrier: carrier identifier, for example correos_priority\\n *  .... ",
                "type": "text"
              }
            ],
            "is_error": false,
            "tool_use_id": "toolu_01Sn9Rw5SkaJ26VjKJnAXK3o",
            "type": "tool_result"
          }
        ],
        "role": "user"
      }

This request also adds another tool definition for the execute tool, which is the tool the model can use to run JavaScript code that calls the helpers.

      {
        "description": "Execute synchronous JavaScript only. The sandbox supports ECMAScript 14 (ES2023) only, and the flow is sync so it does not support async or await. The sandbox does not support any Web APIs. Use the helper definitions returned by search, for example demo_add_numbers({...}). Keep the code minimal and do not write comments.",
        "input_schema": {
          "properties": {
            "code": {
              "description": "JavaScript body to execute inside a synchronous IIFE. End by returning a value. Do not write comments.",
              "type": "string"
            }
          },
          "required": [
            "code"
          ],
          "type": "object"
        },
        "name": "execute"
      }

This concludes the search phase and solves the first problem: the model now has a compact, relevant tool surface to work with instead of the full tool list in the initial context. The search response also sets up the next step, which is to write code that calls these tools.

Orchestration in Code

With the API definitions for the helper tools in place, the model can now write JavaScript code that performs the orchestration needed to answer the user's question. The model uses the execute tool to run that JavaScript code in a sandboxed environment and get the final result back. All helpers listed in api_definition are available as normal JavaScript functions in the sandbox, so the model can write code to call them directly without worrying about the underlying MCP transport or tool-calling mechanics.

The code can use loops, conditionals, and local variables to manage the flow of data between helper calls. This way, the model can do all the orchestration in code. It can pass the result of one helper call into another, loop through a list of items and call helpers on each item, and filter and sort results.

The system prompt also guides the model to prefer writing one JavaScript program that does all the work instead of calling tools one by one and getting intermediate results back into the model context. The idea is to push as much of the work as possible into code and only get the final result back into the model context. This reduces the number of turns and the amount of data sent back and forth between the model and the client.

For this demo application, the prompt below works quite well, but it will likely need adjustment for other applications or models. The exact wording has a big impact on how consistently the model uses the code mode pattern, so finding the right phrasing for a given use case may take some experimentation.

func fullSystemPrompt() string {
  return strings.TrimSpace(`You are a helpful assistant. You have access to helpers that can help you answer the user's question. 
Use them if it helps you answer better.
Use search to discover the relevant helpers.

After search returns helper definitions, prefer a single execute(code) call that completes the full computation.
If one helper's output can be passed directly into another helper inside the same JavaScript snippet, 
do that instead of making multiple execute calls with intermediate results.

Also when you need certain parts of the response of one helper to decide how to call another helper, 
it's better to do that orchestration in a single execute call with JavaScript, rather than making multiple 
tool calls. For example if the helper returns a list of items and you want to call another helper on each item, it's better to do that 
iteration within the same JavaScript snippet. 

You have an ECMAScript 2023 environment at your disposal in the execute(code) helper, and you can use it to orchestrate 
calls to other helpers as needed.
`)
}

loop.go

The response from the LLM to the second request contains another tool-calling response. This time, it calls the execute tool with JavaScript code.

      {
        "caller": {
          "type": "direct"
        },
        "id": "toolu_01EDUDPPJxweBsnt7r3htp1D",
        "input": {
          "code": "
             const carriers = demo_list_carriers({ origin_country: \"DE\", destination_country: \"ES\" }).carriers;
             
             const results = carriers.map(carrier => {
                const delivery = demo_estimate_delivery({ carrier, origin_country: \"DE\", destination_country: \"ES\" });
                const quote = demo_quote_rate({ carrier, origin_country: \"DE\", destination_country: \"ES\", weight_kg: 2.4 });
                const surcharge = demo_apply_surcharge({ carrier, weight_kg: 2.4, is_fragile: false, is_remote_area: false });
                const summary = demo_quote_summary({
                    carrier,
                    base_price_eur: quote.base_price_eur,
                    surcharge_eur: surcharge.total_surcharge_eur,
                    min_days: delivery.min_days,
                    max_days: delivery.max_days
                });
                return summary;
            });
            
            const acceptable = results.filter(r => r.max_days <= 4);
            acceptable.sort((a, b) => a.total_price_eur - b.total_price_eur);
            
            return acceptable[0] || null;
        "
        },
        "name": "execute",
        "type": "tool_use"
      }

The code is formatted here for readability. In the actual response, it is just a single line without any line breaks.

All the functions that start with demo_ are the tools returned in the tool-search response. In the code, the result from demo_list_carriers drives calls to demo_estimate_delivery, demo_quote_rate, demo_apply_surcharge, and demo_quote_summary. The code then filters the results and sorts them, all within the same JavaScript snippet. This is the key part of code mode: the model does the orchestration in code and only gets the final result back into the model context.

In a normal tool-calling process, the model would have to call demo_list_carriers, get the result back into the model context, then call demo_estimate_delivery for each carrier, get those results back into the model context, and so on. Most models can do parallel tool calling, so they could call demo_estimate_delivery, demo_quote_rate, demo_apply_surcharge, and demo_quote_summary in parallel for each carrier, but they would still need to get all those intermediate results back into the model context. With code mode, the model can do all of that in one go and only get the final selected option back into the model context.

Execute the Generated Code

The LLM sends back JavaScript code, and the application needs a way to run it safely. In Go, the quickjs library implements an ECMAScript 14 (ES2023) engine in pure Go. It is a port of the C library QuickJS.

This engine only supports JavaScript and does not implement any Web APIs. So it has no way to write to disk, make network requests, or do anything else that could be dangerous. That is ideal for this use case because the sandbox should only allow the model to call the helper functions exposed by the application and not do anything else. Through those helper functions, the application can still implement things like network requests or file access if needed, while retaining full control over what the model can do.

The following code shows how to initialize a new QuickJS VM in Go code. You can specify the memory limit and the execution timeout to prevent infinite loops or excessive memory usage in the generated code.

  vm, err := quickjs.NewVM()
  if err != nil {
    return Result{}, fmt.Errorf("create quickjs vm: %w", err)
  }
  defer func() {
    if closeErr := vm.Close(); closeErr != nil && err == nil {
      err = fmt.Errorf("close quickjs vm: %w", closeErr)
    }
  }()

  vm.SetMemoryLimit(s.memoryLimitByte)
  if err := vm.SetEvalTimeout(s.evalTimeout); err != nil {
    return Result{}, fmt.Errorf("set eval timeout: %w", err)
  }

quickjs.go

The important part is bridging the tools retrieved from the MCP server into the QuickJS environment. The goal is to expose each tool as a JavaScript function that the model can call. The code running inside QuickJS should not have to worry about the MCP transport or how to call the tools; it should simply call a normal JavaScript function with the correct arguments and get the result back.

To do this, the code iterates over the tools in the catalog and registers a new function in the QuickJS VM for each tool. The function takes a JSON string as input, which contains the arguments for the tool. Whenever QuickJS encounters the function call in the JavaScript code, it invokes this callback, which parses the arguments, calls the tool through the MCP client, and returns the result to QuickJS as a JSON string.

This gives the application a clean bridge between the JavaScript code and the MCP tools, and the model can call the tools as if they were normal JavaScript functions.

  for _, item := range s.catalog {
    toolName := item.Name
    bridgeName := "__bridge_" + item.Callable
    if err := vm.RegisterFunc(bridgeName, func(payload string) string {
      var args map[string]any
      if err := json.Unmarshal([]byte(payload), &args); err != nil {
        return marshalBridgeResponse(nil, fmt.Sprintf("parse tool args: %v", err))
      }
      toolCtx, cancel := context.WithTimeout(ctx, s.evalTimeout)
      defer cancel()
      result, err := s.runtime.CallTool(toolCtx, toolName, args)
      if err != nil {
        return marshalBridgeResponse(nil, err.Error())
      }
      if result.IsError {
        return marshalBridgeResponse(nil, flattenContent(result))
      }
      if result.StructuredContent != nil {
        return marshalBridgeResponse(result.StructuredContent, "")
      }
      return marshalBridgeResponse(map[string]any{"content": flattenContent(result)}, "")
    }, false); err != nil {
      return Result{}, fmt.Errorf("register %s: %w", bridgeName, err)
    }
  }

quickjs.go

The application then adds a prelude that defines console.log and the tool functions, wraps the generated code in a synchronous Immediately Invoked Function Expression (IIFE), and evaluates it in the QuickJS VM (vm.Eval).

Because the __bridge_ functions expect a JSON string as input and return a JSON string as output, the prelude also defines normal JavaScript functions for each tool that the model can call. These functions take a normal JavaScript object as input, convert it to a JSON string, call the corresponding __bridge_ function, and then parse the JSON string result back into a JavaScript object. This way, the model can call these functions with normal JavaScript objects and get normal JavaScript objects back without needing to worry about the JSON serialization.

  prelude := s.prelude()
  wrapped := prelude + "\n(() => {\n" + code + "\n})()"
  value, err := vm.Eval(wrapped, quickjs.EvalGlobal)
  if err != nil {
    return Result{}, fmt.Errorf("execute javascript: %w", err)
  }

  return Result{
    Logs:  logs,
    Value: value,
  }, nil
}

func (s *Sandbox) prelude() string {
  var builder strings.Builder
  builder.WriteString("const console = {\n")
  builder.WriteString("  log: (...args) => __host_log(JSON.stringify(args)),\n")
  builder.WriteString("};\n")
  for _, item := range s.catalog {
    fmt.Fprintf(&builder, "function %s(args) { const response = JSON.parse(__bridge_%s(JSON.stringify(args || {}))); if (response.error) { throw new Error(response.error); } return response.value; }\n", item.Callable, item.Callable)
  }
  return builder.String()
}

quickjs.go

After QuickJS evaluates the code, it returns the final result, which the application sends to the LLM in the third request.

      {
        "content": [
          {
            "content": [
              {
                "text": "{\"logs\":[],\"value\":{\"base_price_eur\":12.16,\"carrier\":\"correos_priority\",\"currency\":\"EUR\",\"delivery_window\":\"3-4 business days\",\"max_days\":4,\"min_days\":3,\"surcharge_eur\":0.75,\"total_price_eur\":12.91}}",
                "type": "text"
              }
            ],
            "is_error": false,
            "tool_use_id": "toolu_01EDUDPPJxweBsnt7r3htp1D",
            "type": "tool_result"
          }
        ],
        "role": "user"
      }

The logs field would contain the output of any console.log statements. The value field contains the final result of the code execution, which is the value returned by the last statement in the JavaScript code (return acceptable[0] || null;).
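That logs/value pair is simply serialized with encoding/json before being placed into the tool_result text. A small sketch of the shape; the executeResult type here is illustrative, with field names taken from the transcript above.

```go
package main

import "encoding/json"

// executeResult mirrors the sandbox result that is serialized into the
// tool_result text: console.log output plus the returned value.
type executeResult struct {
	Logs  []string `json:"logs"`
	Value any      `json:"value"`
}

// marshalExecuteResult renders the payload sent back to the model.
func marshalExecuteResult(res executeResult) (string, error) {
	payload, err := json.Marshal(res)
	if err != nil {
		return "", err
	}
	return string(payload), nil
}
```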

The LLM can now either answer the user's question directly based on this result or do another tool call if it needs additional processing. Ideally, though, the model should be able to answer directly based on this final result without needing any more tool calls.

In this case, the response from the LLM to the third request is a normal response with the final answer:

    "content": [
      {
        "text": "**Correos Priority** — **€12.91**\n- Delivery: 3–4 business days\n- Base price: €12.16 + €0.75 surcharge",
        "type": "text"
      }
    ],

This concludes the code mode workflow. Compared to a normal tool-calling workflow, the application does one additional request/response cycle for the search step, but after that the model can do all the tool calls and orchestration in one go and only get the final result back into the model context. This is a much more efficient way to use tools, especially when a workflow requires multiple dependent tool calls. The model can do all the heavy lifting in code and only get the final answer back into the model context.

Request/response flow for the shipping example

As an illustration of the full flow, here is a sequence diagram that shows the interactions between the user, the application, the LLM, and the MCP runtime.

1. User → App: "Compare all shipping options..."
2. App → LLM: initial request with the search tool only
3. LLM → App: search({query: "shipping options rates parcel delivery"})
4. App → LLM: tool_result(api_definition) plus the execute tool definition
5. LLM → App: execute({code: generated shipping program})
6. App → MCP server (QuickJS evaluates the generated JavaScript): demo_list_carriers({DE, ES}) → [correos_priority, dhl_economy, gls_euro_business, ups_standard]
7. For each carrier, inside the app: demo_estimate_delivery({carrier, DE, ES}) → delivery window; demo_quote_rate({carrier, DE, ES, 2.4kg}) → base quote; demo_apply_surcharge({carrier, 2.4kg, false, false}) → surcharge; demo_quote_summary({...}) → normalized summary
8. App → LLM: tool_result(execute = {logs: [], value: {carrier: correos_priority, total_price_eur: 12.91, ...}})
9. LLM → App → User: final answer "Correos Priority"

Wrapping up

Code mode can save a lot of context tokens by exposing only a search tool and then letting the model discover the relevant tools for the task. It can also save a lot of request/response cycles by letting the model write code that orchestrates multiple tool calls instead of calling tools one by one and getting intermediate results back into the model context.

Code mode does not make tool calling or MCP obsolete. It is simply a different way to use them, and it can be more efficient in certain scenarios, especially when you have a large number of tools and complex workflows. You can also use code mode and normal tool calling together in the same application and let the model decide which one to use based on the task at hand.

The whole pattern can be implemented in a relatively small amount of code, and it does not require any specific features from the LLM provider. The main requirement is that the platform where you implement this can execute code, ideally in a sandboxed environment for security reasons, and that you can map tool calls from that code into your MCP runtime. QuickJS solves this quite nicely in Go.