mirror of
https://github.com/ggerganov/llama.cpp
synced 2026-03-01 21:00:04 +01:00
* common : fix Step-3.5-Flash format detection and thinking support

Step-3.5-Flash uses the same XML-style tool call format as Qwen3-Coder (<tool_call><function=...><parameter=...>) but its Jinja template lacks the bare <function> and plural <parameters> markers that the detection logic previously required. This caused it to fall through to Hermes 2 Pro, which doesn't call func_args_not_string(), so arguments stayed as JSON strings and templates using arguments|items crashed.

Additionally, the Qwen3-Coder-XML format handler had no thinking support. Models like Step-3.5-Flash that unconditionally emit <think> in their generation prompt need the same thinking_forced_open handling that Nemotron v3 and Hermes 2 Pro already have, otherwise reasoning_content is never separated from content in API responses.

Changes:
- Relax Qwen3-Coder XML detection to only require the 3 shared markers
- Tighten Nemotron v3 branch to also require bare <function> and plural <parameters>, preventing Step-3.5-Flash from being misrouted via <think>
- Add thinking_forced_open support to Qwen3-Coder-XML init function
- Add <think>/</think> to preserved tokens
- Fix build_grammar_xml_tool_call to handle thinking_forced_open in the grammar root rule, allowing </think> before tool calls
- Add Step-3.5-Flash chat template and format detection test

Builds on: https://github.com/ggml-org/llama.cpp/pull/19283

* chat : route Step-3.5-Flash to Nemotron v3 PEG parser, add tests

Step-3.5-Flash uses the same XML tool call format as Qwen3-Coder and Nemotron 3 Nano (<tool_call>/<function=...>/<parameter=...>) but with unconditional <think> output. Route it to the Nemotron v3 PEG parser for streaming and schema-aware parameter parsing.

Detection: templates with <think> + XML tool tags use Nemotron v3 PEG parser; templates without <think> (Qwen3-Coder) use GBNF grammar.

Tests cover: basic messages, tool calls with/without thinking content, parallel tool calls, code string parameters, optional </parameter> closing tags, and JSON schema response format.

* chat : remove dead thinking code from qwen3_coder_xml

Remove thinking handling code that became unreachable after routing Step-3.5-Flash to the Nemotron v3 PEG parser. Qwen3-Coder has no <think> in its template, so the thinking_forced_open logic, preserved tokens, and grammar prefix were dead paths.
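
The detection rule described in the second commit can be sketched roughly as follows. This is only an illustration of the stated rule (XML tool tags plus <think> selects the Nemotron v3 PEG parser; XML tool tags without <think> selects the Qwen3-Coder GBNF path). The function name `detect_format` and the returned labels are hypothetical and are not llama.cpp identifiers; the real logic lives in the C++ chat code and may check more conditions.

```python
# Hypothetical sketch of marker-based template routing.
# Names and return labels are illustrative, not llama.cpp's API.

# The three markers shared by Qwen3-Coder, Nemotron v3 and Step-3.5-Flash.
XML_TOOL_MARKERS = ("<tool_call>", "<function=", "<parameter=")


def detect_format(template_src: str) -> str:
    """Pick a parser label from the raw chat template source."""
    has_xml_tools = all(marker in template_src for marker in XML_TOOL_MARKERS)
    if not has_xml_tools:
        # Some other format; out of scope for this sketch.
        return "other"
    if "<think>" in template_src:
        # XML tool tags plus <think>: PEG parser path.
        return "nemotron_v3_peg"
    # XML tool tags without <think>: GBNF grammar path.
    return "qwen3_coder_xml_gbnf"
```

Under this rule the template in this file, which contains all three tool markers and emits <think> in its generation prompt, would take the PEG parser branch.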
81 lines
4.9 KiB
Django/Jinja
{% macro render_content(content) %}{% if content is none %}{{- '' }}{% elif content is string %}{{- content }}{% elif content is mapping %}{{- content['value'] if 'value' in content else content['text'] }}{% elif content is iterable %}{% for item in content %}{% if item.type == 'text' %}{{- item['value'] if 'value' in item else item['text'] }}{% elif item.type == 'image' %}<im_patch>{% endif %}{% endfor %}{% endif %}{% endmacro %}
{{bos_token}}{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- render_content(messages[0].content) + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou have access to the following functions in JSONSchema format:\n\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson(ensure_ascii=False) }}
{%- endfor %}
{{- "\n</tools>\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...>\n...\n</function> block must be nested within <tool_call>\n...\n</tool_call> XML tags\n- Required parameters MUST be specified\n</IMPORTANT><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + render_content(messages[0].content) + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and render_content(message.content) is string and not(render_content(message.content).startswith('<tool_response>') and render_content(message.content).endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- set content = render_content(message.content) %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{%- set role_name = 'observation' if (message.role == "system" and not loop.first and message.name == 'observation') else message.role %}
{{- '<|im_start|>' + role_name + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = render_content(message.reasoning_content) %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- else %}
{%- set reasoning_content = '' %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n' + content }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if tool_call.function is defined %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
{%- if tool_call.arguments is defined %}
{%- set arguments = tool_call.arguments %}
{%- for args_name, args_value in arguments|items %}
{{- '<parameter=' + args_name + '>\n' }}
{%- set args_value = args_value | tojson(ensure_ascii=False) | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
{{- args_value }}
{{- '\n</parameter>\n' }}
{%- endfor %}
{%- endif %}
{{- '</function>\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>tool_response\n' }}
{%- endif %}
{{- '<tool_response>' }}
{{- content }}
{{- '</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}
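
For readers who want to see what a completion shaped by this template looks like on the consuming side, here is a minimal Python sketch. `split_reasoning` mirrors the template's own fallback for assistant messages (text before `</think>` is reasoning, text after it is content), and `parse_tool_calls` extracts `<tool_call>` blocks in the format the system prompt asks for. Both function names are hypothetical. This regex approach is an assumption for illustration only, not the Nemotron v3 PEG parser used by llama.cpp: it is not streaming, not schema-aware, keeps every parameter value as a string, and requires the closing `</parameter>` tag.

```python
import re

# One <tool_call> block wrapping a single <function=NAME>...</function>.
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>\n]+)>(.*?)</function>\s*</tool_call>",
    re.DOTALL,
)
# One <parameter=NAME> value </parameter>; values may span multiple lines.
PARAM_RE = re.compile(
    r"<parameter=([^>\n]+)>\n?(.*?)\n?</parameter>",
    re.DOTALL,
)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw assistant text into (reasoning, content).

    Follows the template's split on '</think>'; without that tag the
    reasoning is empty and the text is returned unchanged.
    """
    if "</think>" not in text:
        return "", text
    reasoning = text.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
    content = text.split("</think>")[-1].lstrip("\n")
    return reasoning, content


def parse_tool_calls(content: str) -> list[dict]:
    """Return [{'name': ..., 'arguments': {...}}, ...] with string values."""
    calls = []
    for name, body in TOOL_CALL_RE.findall(content):
        arguments = {key: value for key, value in PARAM_RE.findall(body)}
        calls.append({"name": name, "arguments": arguments})
    return calls
```

Because the generation prompt already ends with `<think>\n`, a raw completion normally starts inside the reasoning block, which is why `split_reasoning` does not require an opening `<think>` tag.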