In a previous blog post I showed you how to implement the code mode pattern in Go. In this post, I will show you how to implement the same pattern in Java.
Check out that post for the full explanation of the code mode pattern, the problems it solves, and how it works. This post focuses on the Java implementation of the same pattern.
In short, code mode is a way to use tools that can save context tokens and reduce the number of request/response cycles by letting the model write code that orchestrates tool calls. Our client sends the LLM an API definition of all the available tools, the LLM writes code to orchestrate those tools, and the client executes that code in a sandboxed environment and returns the final result to the LLM. This way, the model can do complex workflows with multiple dependent tool calls in one go without needing to get intermediate results back into the model context.
The example in this blog post is the same demo application that I use in the Go blog post, just translated to Java. It is a shipping quote application where the user can ask for the best shipping option for a given parcel and route, and the model uses various tools to get the available carriers, get quotes and delivery estimates for each carrier, apply surcharges, and return a final recommendation.
MCP server ¶
An MCP server hosts all the tools that we will use in this demo application. The MCP server is a simple Spring Boot application that uses the official MCP Java SDK to define and expose the tools. Note that Spring AI would simplify the implementation of the MCP server, but its latest version depends on an older version of the MCP Java SDK that lacks one specific feature (output schemas) that code mode needs to work well, so for this demo the MCP server is implemented without Spring AI.
You can find the implementation of the MCP server on GitHub. The MCP server can be started like any other Spring Boot application: ./mvnw spring-boot:run. This server will listen on port 8081 for incoming MCP client connections.
Setup ¶
The client application is also a Spring Boot application that uses Spring AI for the interaction with the LLM. The client also uses the official MCP Java SDK to connect to the MCP server and call the tools. As on the MCP server side, we use the latest version of the MCP Java SDK without Spring AI because we need the output schema feature for code mode to work well. As soon as Spring AI updates to a version of the MCP Java SDK that has the output schema feature, we can simplify both the MCP server and client implementation by using Spring AI's abstractions for tools and MCP.
The demo application uses GPT-5.4 as the LLM. With Spring AI you can configure these settings in the application properties:
spring:
  application:
    name: codemode-demo
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-5.4
          max-completion-tokens: 10000

codemode:
  mcp:
    base-url: http://localhost:8081
    endpoint: /mcp
    api-key: ${DEMO_MCP_SERVER_API_KEY:demo-mcp-api-key}
  max-turns: 10
The application properties also include the configuration for connecting to the MCP server, including the base URL, the endpoint, and the API key. The max-turns property is used to limit the number of turns in the conversation with the LLM to prevent infinite loops.
The MCP client setup in the demo is straightforward. It creates a synchronous MCP client over streamable HTTP, adds the API key header, and initializes the connection once during startup.
HttpClientStreamableHttpTransport transport = HttpClientStreamableHttpTransport.builder(baseUrl)
    .endpoint(endpoint)
    .httpRequestCustomizer((requestBuilder, _, _, _, _) -> {
        if (apiKey != null && !apiKey.isBlank()) {
            requestBuilder.header("X-api-key", apiKey);
        }
    })
    .build();

this.client = McpClient.sync(transport)
    .clientInfo(new McpSchema.Implementation("codemode-demo", "0.0.1"))
    .requestTimeout(Duration.ofSeconds(30))
    .build();

this.client.initialize();
At the start of the program, the client fetches the list of all tools from the MCP server and builds an index for the tool search tool.
List<ToolEntry> entries = new ArrayList<>();
for (McpSchema.Tool tool : this.mcpClient.listTools()) {
    String serverPrefix = "demo";
    String callable = serverPrefix + "_" + tool.name();
    Map<String, Object> inputSchema = toSchemaMap(tool.inputSchema());
    Map<String, Object> outputSchema = tool.outputSchema() == null ? Map.of() : tool.outputSchema();
    entries.add(new ToolEntry(tool.name(), callable, tool.description(), inputSchema, outputSchema));
}
entries.sort(Comparator.comparing(ToolEntry::name));
for (ToolEntry entry : entries) {
    this.catalog.put(entry.callable(), entry);
    this.toolSearcher.indexTool(SESSION,
        ToolReference.builder()
            .toolName(entry.callable())
            .summary(entry.name() + ": " + entry.description())
            .build());
}
Turn 1: Initial request ¶
This step is not part of the code mode pattern itself, but it fits very well with code mode. Using code mode is especially beneficial when you have a large number of tools, and including the description of all these tools in the initial context would waste a lot of tokens, especially if many of those tools are not relevant for the task at hand. The search tool solves this problem by letting the model discover the relevant tools for the task.
To implement the tool search tool we use a library from the Spring AI community repository: spring-ai-tool-search-tool. This library uses Lucene underneath to build an index of the tools and perform efficient searches based on keywords.
In the first request to the LLM the application sends the user query and one tool, the tool-search tool. In the system prompt we instruct the model to use the tool-search tool to find the relevant tools that it needs to answer the user query.
List<ToolCallback> toolCallbacks = firstTurn ? List.of(definitionOnly(searchToolDefinition()))
        : List.of(definitionOnly(searchToolDefinition()), definitionOnly(executeToolDefinition()));

OpenAiChatOptions options = OpenAiChatOptions.builder()
    .toolCallbacks(toolCallbacks)
    .toolChoice(firstTurn ? OpenAiApi.ChatCompletionRequest.ToolChoiceBuilder.function("search")
            : OpenAiApi.ChatCompletionRequest.ToolChoiceBuilder.AUTO)
    .internalToolExecutionEnabled(false)
    .build();
Turn 1: Tool call response ¶
If the LLM is not able to answer the user query with its own knowledge, it will send a tool call response with the tool-search tool and a query for the relevant tools. The Java application uses the spring-ai-tool-search-tool library to perform the search.
private String handleSearch(String argsJson) {
    try {
        Map<String, Object> args = this.objectMapper.readValue(argsJson, Map.class);
        String query = (String) args.getOrDefault("query", "");
        int limit = args.containsKey("limit") ? ((Number) args.get("limit")).intValue() : 8;
        String apiDefs = this.catalog.search(query, limit);
        return this.objectMapper.writeValueAsString(Map.of("api_definition", apiDefs));
    }
    catch (Exception e) {
        return errorJson("search failed: " + e.getMessage());
    }
}
The method this.catalog.search performs the search and returns a string that contains the API definition of the found tools.
public String search(String query, int limit) {
    var response = this.toolSearcher.search(new ToolSearchRequest(SESSION, query, limit <= 0 ? 8 : limit, null));
    List<ToolEntry> matched = new ArrayList<>();
    for (var ref : response.toolReferences()) {
        ToolEntry entry = this.catalog.get(ref.toolName());
        if (entry != null) {
            matched.add(entry);
        }
    }
    return helperDefinitions(matched);
}
The method that generates the API definition from the tool entries uses the input and output schemas of the tools to create a JSDoc-like comment that describes the function, the parameters, and the return value. This way, the model has all the information it needs to use the tools correctly in its code.
private String helperDefinitions(List<ToolEntry> entries) {
    if (entries.isEmpty()) {
        return "// No helper functions matched this search.\n";
    }
    var sb = new StringBuilder();
    for (ToolEntry entry : entries) {
        sb.append("/**\n");
        String description = sanitizeJSDocText(entry.description());
        if (!description.isEmpty()) {
            sb.append(" * ").append(description).append("\n");
        }
        sb.append(" * @param ").append(jsDocTypeTag(entry.inputSchema())).append(" args\n");
        writeSchemaFieldDescriptions(sb, "Input fields", "args", entry.inputSchema());
        sb.append(" * @returns ").append(jsDocTypeTag(entry.outputSchema())).append(" result\n");
        writeSchemaFieldDescriptions(sb, "Output fields", "result", entry.outputSchema());
        sb.append(" */\n");
        sb.append("function ").append(entry.callable()).append("(args) {}\n\n");
    }
    return sb.toString();
}
An example of the generated API definition for a tool looks like this:
/**
 * Estimate a deterministic business-day delivery window for a carrier and route.
 * @param {{ carrier: string, destinationCountry: string, originCountry: string }} args
 * Input fields:
 * - args.carrier: Carrier identifier
 * - args.destinationCountry: Destination ISO country code
 * - args.originCountry: Origin ISO country code
 * @returns {{ carrier: string, destinationCountry: string, maxDays: number, minDays: number, originCountry: string }} result
 * Output fields:
 * - result.carrier: The carrier identifier used for the estimate.
 * - result.destinationCountry: The destination ISO 3166-1 alpha-2 country code.
 * - result.maxDays: The maximum estimated delivery time in business days.
 * - result.minDays: The minimum estimated delivery time in business days.
 * - result.originCountry: The origin ISO 3166-1 alpha-2 country code.
 */
function demo_estimate_delivery(args) {}

// more tool definitions...
Turn 2: Request with API definition and execution tool ¶
The application sends the whole API definition as one string in the tool call response to the LLM. In the system prompt we instruct the model to write JavaScript code that uses the defined API to answer the user query. The model can use any of the tools in the API definition, and it can make multiple calls to the same tool or different tools as needed. The model can also use control flow statements like loops and conditionals to orchestrate the tool calls and handle the logic of the workflow.
In this request we send one additional tool, the code-execution tool. The model should call this tool with the JavaScript code that it wrote to orchestrate the tool calls.
List<ToolCallback> toolCallbacks = firstTurn ? List.of(definitionOnly(searchToolDefinition()))
        : List.of(definitionOnly(searchToolDefinition()), definitionOnly(executeToolDefinition()));

OpenAiChatOptions options = OpenAiChatOptions.builder()
    .toolCallbacks(toolCallbacks)
    .toolChoice(firstTurn ? OpenAiApi.ChatCompletionRequest.ToolChoiceBuilder.function("search")
            : OpenAiApi.ChatCompletionRequest.ToolChoiceBuilder.AUTO)
    .internalToolExecutionEnabled(false)
    .build();
Turn 2: Response with orchestration JavaScript code ¶
The LLM sends another tool call response with the code-execution tool and the JavaScript code that it wrote. The Java application executes this code in a sandboxed environment using the GraalVM JavaScript engine. All the tools from the MCP server are available in the sandboxed environment as JavaScript functions, so the model can call them directly in its code.
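For the shipping demo, the generated orchestration code might look roughly like the following sketch. The `demo_*` helpers are stubbed here with fixed data so the snippet is self-contained and runnable outside the sandbox; in the real flow they are the bridged MCP tools, and the code runs inside the wrapped function shown later, so it would end with a `return` statement.

```javascript
// Hypothetical example of model-generated orchestration code.
// The demo_* functions are illustrative stand-ins for the bridged MCP tools,
// stubbed with fixed data so this sketch runs on its own.
function demo_list_carriers(args) {
  return { carriers: ["fastship", "econoship"] };
}
function demo_quote_shipping(args) {
  const prices = { fastship: 42.5, econoship: 18.9 };
  return { carrier: args.carrier, price: prices[args.carrier] };
}
function demo_estimate_delivery(args) {
  const days = { fastship: [1, 2], econoship: [4, 7] };
  const [minDays, maxDays] = days[args.carrier];
  return { carrier: args.carrier, minDays, maxDays };
}

// One pass over all carriers: quote plus delivery estimate per carrier,
// then pick the cheapest option -- no intermediate round trips to the LLM.
const carriers = demo_list_carriers({}).carriers;
const options = carriers.map((carrier) => {
  const quote = demo_quote_shipping({ carrier });
  const eta = demo_estimate_delivery({ carrier });
  return { carrier, price: quote.price, minDays: eta.minDays, maxDays: eta.maxDays };
});
options.sort((a, b) => a.price - b.price);
const best = options[0];
```

The point of the pattern is visible here: two dependent tool calls per carrier, a loop, and a sort all happen in a single turn instead of five separate request/response cycles.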
The handleExecute method validates the payload, runs the code in the sandbox, and returns both the final value and any captured logs.
private String handleExecute(String argsJson) {
    try {
        Map<String, Object> args = this.objectMapper.readValue(argsJson, Map.class);
        String code = (String) args.get("code");
        if (code == null || code.isBlank()) {
            return errorJson("'code' is required");
        }
        log.debug("code: {}", code);
        JsSandbox.ExecutionResult result = this.sandbox.execute(code);
        log.debug("result: {}", result);
        return this.objectMapper.writeValueAsString(Map.of("logs", result.logs(), "value", result.value()));
    }
    catch (Exception e) {
        return errorJson("execute failed: " + e.getMessage());
    }
}
Turn 3: Request with code execution result ¶
The application then sends another request to the LLM with the result of the code execution and the captured logs.
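Based on the shape that handleExecute returns, the tool-result payload sent back in this turn might look like this (the values are illustrative):

```json
{
  "logs": ["quoted 2 carriers"],
  "value": {
    "carrier": "econoship",
    "price": 18.9,
    "minDays": 4,
    "maxDays": 7
  }
}
```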
Turn 3: Final response with answer ¶
In the ideal case, the code execution returns all the information the LLM needs, and the final response is a normal answer to the user query. It can, however, be another tool call if the model wants to run more code or search for more tools, leading to further turns of code generation and execution. In practice the model often needs more than one turn before it can answer.
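The surrounding turn loop with the max-turns guard can be sketched as follows. This is a simplified illustration, not the demo's actual code: ChatTurn and the Llm interface are hypothetical stand-ins for the Spring AI plumbing.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the conversation loop with the max-turns guard.
// ChatTurn and Llm are hypothetical stand-ins for Spring AI's chat types.
public class TurnLoop {

    record ChatTurn(boolean hasToolCalls, String text) {}

    interface Llm {
        ChatTurn call(List<String> messages);
    }

    static String run(Llm llm, String userQuery, int maxTurns) {
        List<String> messages = new ArrayList<>();
        messages.add(userQuery);
        for (int turn = 1; turn <= maxTurns; turn++) {
            ChatTurn response = llm.call(messages);
            if (!response.hasToolCalls()) {
                return response.text(); // final answer, stop looping
            }
            // Run the requested tool (search or code execution) and append
            // the tool result so the next turn can build on it.
            messages.add("tool-result: " + response.text());
        }
        return "Gave up after " + maxTurns + " turns";
    }

    public static void main(String[] args) {
        // Fake LLM: requests a tool call on turn 1, answers on turn 2.
        Llm fake = messages -> messages.size() == 1
                ? new ChatTurn(true, "execute code")
                : new ChatTurn(false, "Cheapest carrier found");
        System.out.println(run(fake, "best shipping option?", 10));
    }
}
```

The guard matters because a misbehaving model could keep requesting tool calls forever; the max-turns property turns that failure mode into a bounded loop.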
Executing JavaScript code in Java ¶
The core of the code mode pattern is the ability to execute JavaScript code in a sandboxed environment. This allows the model to write code that orchestrates tool calls and executes that code in one go, without needing to get intermediate results back into the model context. In this section we take a closer look at how to execute JavaScript code in Java using the GraalVM JavaScript engine.
The GraalVM JavaScript engine can be embedded in any Java application and provides a sandboxed environment for executing JavaScript code, which is essential for security when running code generated by an LLM.
<dependency>
    <groupId>org.graalvm.polyglot</groupId>
    <artifactId>polyglot</artifactId>
    <version>${graal.version}</version>
</dependency>
<dependency>
    <groupId>org.graalvm.polyglot</groupId>
    <artifactId>js</artifactId>
    <version>${graal.version}</version>
    <type>pom</type>
</dependency>
The sandbox starts with a restricted JavaScript context and then injects a small bridge layer. Each MCP tool is registered as a synchronous JavaScript function. Whenever one of these functions is called, the bridge translates the JavaScript arguments into JSON, calls the corresponding MCP tool with the JSON payload, and translates the JSON response back into a JavaScript value.
allowAllAccess is set to false to prevent the executed code from accessing the Java internals of the application. The only way for the executed code to interact with the outside world is through the defined bridges. This is important for security, as it prevents the executed code from doing anything malicious or unintended in the Java application.
public ExecutionResult execute(String code) {
    List<String> logs = new ArrayList<>();
    try (Context ctx = Context.newBuilder("js").allowAllAccess(false).option("js.strict", "false").build()) {
        Value bindings = ctx.getBindings("js");
        // console.log bridge
        bindings.putMember("__host_log", (org.graalvm.polyglot.proxy.ProxyExecutable) args -> {
            StringBuilder sb = new StringBuilder();
            for (Value arg : args) {
                sb.append(arg.toString());
            }
            logs.add(sb.toString());
            return null;
        });
        // Register each MCP tool as a synchronous JS callable
        for (ToolCatalog.ToolEntry entry : this.catalog.all()) {
            String callable = entry.callable();
            bindings.putMember("__bridge_" + callable, (org.graalvm.polyglot.proxy.ProxyExecutable) args -> {
                String argsJson = args.length > 0 ? args[0].asString() : "{}";
                return invokeTool(entry.name(), argsJson);
            });
        }
The prelude that is prepended to the model-generated code exposes these bridges as normal JavaScript functions.
private String buildPrelude() {
    var sb = new StringBuilder();
    sb.append("const console = {\n");
    sb.append(" log: (...args) => __host_log(args.map(a => JSON.stringify(a)).join(' ')),\n");
    sb.append("};\n");
    for (ToolCatalog.ToolEntry entry : this.catalog.all()) {
        String callable = entry.callable();
        sb.append("function ")
            .append(callable)
            .append("(args) { const r = JSON.parse(__bridge_")
            .append(callable)
            .append("(JSON.stringify(args || {}))); if (r.error) { throw new Error(r.error); } return r.value; }\n");
    }
    return sb.toString();
}
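For the demo_estimate_delivery tool from earlier, the wrapper that buildPrelude generates looks like the function below (reformatted for readability). The `__bridge_` function is stubbed here so the snippet runs on its own; in the sandbox it is the Java-side ProxyExecutable that calls the real MCP tool, and the stubbed response values are illustrative.

```javascript
// Stand-in for the Java-side bridge: parses the JSON arguments and returns
// a JSON string in the { error, value } envelope the prelude expects.
function __bridge_demo_estimate_delivery(argsJson) {
  const args = JSON.parse(argsJson);
  return JSON.stringify({
    value: { carrier: args.carrier, minDays: 2, maxDays: 4 }
  });
}

// The wrapper buildPrelude emits for this tool: serialize the arguments,
// call the bridge, parse the envelope, and either throw or return the value.
function demo_estimate_delivery(args) {
  const r = JSON.parse(__bridge_demo_estimate_delivery(JSON.stringify(args || {})));
  if (r.error) { throw new Error(r.error); }
  return r.value;
}

const estimate = demo_estimate_delivery({
  carrier: "fastship",
  originCountry: "US",
  destinationCountry: "DE"
});
```

From the model's point of view the bridge is invisible: it just calls demo_estimate_delivery(args) and gets a plain JavaScript object back, exactly as the JSDoc-style API definition promised.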
Finally, the sandbox evaluates the JavaScript program and converts the result back into plain Java values.
String prelude = buildPrelude();
String wrapped = prelude + "\n(() => {\n" + code + "\n})()";
Value result = ctx.eval("js", wrapped);
Object value = toJavaValue(result);
return new ExecutionResult(logs, value);
Wrapping up ¶
Implementing the code mode pattern in Java is not too complicated, thanks to the GraalVM JavaScript engine that allows us to execute JavaScript code in a sandboxed environment. The main challenge is to define a clear API for the tools and to instruct the model to use that API correctly in its code. I had to tweak the system prompt a few times to get the desired behavior from the model.