Chat Completionを作成

承認

Authorization

string

header

必須

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

ボディ

application/json

messages

必須

ユーザーが送信したメッセージにかかわらず、モデルが従うべき開発者指定の指示です。o1 モデル以降では、developer メッセージが従来の system メッセージに置き換わります。

Show child attributes

model

string | null

frequency_penalty

number | null

デフォルト:0

logit_bias

Logit Bias · object

Show child attributes

logprobs

boolean | null

デフォルト:false

top_logprobs

integer | null

デフォルト:0

max_tokens

integer | null

非推奨

max_completion_tokens

integer | null

デフォルト:1

presence_penalty

number | null

デフォルト:0

response_format

ResponseFormat · object

ResponseFormat
StructuralTagResponseFormat
LegacyStructuralTagResponseFormat

Show child attributes

seed

integer | null

必須範囲: -9223372036854776000 <= x <= 9223372036854776000

stop

デフォルト:[]

stream

boolean | null

デフォルト:false

stream_options

StreamOptions · object

Show child attributes

temperature

number | null

top_p

number | null

tools

ChatCompletionToolsParam · object[] | null

Show child attributes

tool_choice

デフォルト:none

Allowed value: "none"

reasoning_effort

enum<string> | null

利用可能なオプション:

low,

medium,

high

include_reasoning

boolean

デフォルト:true

parallel_tool_calls

boolean | null

デフォルト:true

user

string | null

use_beam_search

boolean

デフォルト:false

top_k

integer | null

min_p

number | null

repetition_penalty

number | null

length_penalty

number

デフォルト:1

stop_token_ids

integer[] | null

include_stop_str_in_output

boolean

デフォルト:false

ignore_eos

boolean

デフォルト:false

min_tokens

integer

デフォルト:0

skip_special_tokens

boolean

デフォルト:true

spaces_between_special_tokens

boolean

デフォルト:true

truncate_prompt_tokens

integer | null

必須範囲: -1 <= x <= 9223372036854776000

prompt_logprobs

integer | null

allowed_token_ids

integer[] | null

bad_words

string[]

echo

boolean

デフォルト:false

true の場合、同じロールに属していれば、新しいメッセージは直前のメッセージの前に追加されます。

add_generation_prompt

boolean

デフォルト:true

true の場合、生成プロンプトが chat template に追加されます。これは、モデルの tokenizer 設定内の chat template で使用されるパラメーターです。

continue_final_message

boolean

デフォルト:false

これが設定されている場合、チャットは末尾のメッセージが EOS トークンなしの未完了の状態になるように整形されます。モデルは新しいメッセージを開始するのではなく、そのメッセージの続きを生成します。これにより、モデルの応答の一部を事前に埋めておくことができます。add_generation_prompt とは同時に使用できません。

add_special_tokens

boolean

デフォルト:false

true の場合、chat template によって追加されるものに加えて、特殊トークン（例: BOS）もプロンプトに追加されます。ほとんどのモデルでは、特殊トークンの追加は chat template が処理するため、これは false に設定する必要があります（デフォルト値も false です）。

documents

Documents · object[] | null

モデルが RAG（検索拡張生成）を実行する場合にアクセスできるドキュメントを表す dict のリストです。テンプレートが RAG をサポートしていない場合、この引数は効果を持ちません。各ドキュメントは、"title" キーと "text" キーを含む dict にすることを推奨します。

Show child attributes

chat_template

string | null

この変換に使用する Jinja テンプレートです。transformers v4.44 以降ではデフォルトの chat template は使用できないため、tokenizer で chat template が定義されていない場合は、chat template を指定する必要があります。

chat_template_kwargs

Chat Template Kwargs · object

テンプレート renderer に渡す追加のキーワード引数です。chat template からアクセスできます。

mm_processor_kwargs

Mm Processor Kwargs · object

HF processor に渡す追加の kwargs です。

structured_outputs

StructuredOutputsParams · object

structured outputs 用の追加の kwargs です。

Show child attributes

priority

integer

デフォルト:0

リクエストの優先度です（値が小さいほど先に処理されます。デフォルト: 0）。Serve されたモデルが優先度スケジューリングを使用していない場合、0 以外の優先度を指定するとエラーになります。

request_id

string

このリクエストに関連する request_id です。呼び出し元が設定しない場合は、random_uuid が生成されます。この ID は Inference プロセス全体を通じて使用され、Response で返されます。

logits_processors

(string | LogitsProcessorConstructor · object)[] | null

サンプリング時に適用する logits processor の完全修飾名、またはコンストラクター object の list です。コンストラクターは JSON object で、プロセッサークラスまたはファクトリーの完全修飾名を指定する必須の 'qualname' フィールドと、位置引数およびキーワード引数を含む省略可能な 'args' フィールドと 'kwargs' フィールドを持ちます。例: {'qualname': 'my_module.MyLogitsProcessor', 'args': [1, 2], 'kwargs': {'param': 'value'}}。

return_tokens_as_token_ids

boolean | null

'logprobs' を指定した場合、JSON にエンコードできない token を識別できるよう、token は 'token_id:{token_id}' 形式の文字列として表されます。

return_token_ids

boolean | null

指定した場合、結果には生成されたテキストに加えて token ID も含まれます。ストリーミングモードでは、prompt_token_ids は最初の chunk にのみ含まれ、token_ids には各 chunk の差分 token が含まれます。これはデバッグ時や、生成テキストを入力 token に対応付ける必要がある場合に役立ちます。

cache_salt

string | null

指定した場合、複数ユーザー環境で攻撃者がプロンプトを推測することを防ぐため、prefix cache に指定した文字列でソルトを追加します。ソルトはランダムで、第三者が access できないよう保護され、かつ予測不能であるのに十分な長さである必要があります（例: 256 bit に相当する、base64 エンコードで 43 文字）。

kv_transfer_params

Kv Transfer Params · object

分離サービングに使用される KVTransfer パラメーター。

vllm_xargs

Vllm Xargs · object

custom 拡張機能で使用される、文字列または数値の値（またはその list）からなる追加の request パラメーター。

Show child attributes

レスポンス

正常なレスポンス

model

string

必須

choices

ChatCompletionResponseChoice · object[]

必須

Show child attributes

usage

UsageInfo · object

必須

Show child attributes

string

object

string

デフォルト:chat.completion

Allowed value: "chat.completion"

created

integer

service_tier

enum<string> | null

利用可能なオプション:

auto,

default,

flex,

scale,

priority

system_fingerprint

string | null

prompt_logprobs

(object | null)[] | null

Show child attributes

prompt_token_ids

integer[] | null

kv_transfer_params

Kv Transfer Params · object

KVTransfer パラメーター。

Serverless RL

Serverless SFT

API Reference

承認

ボディ

レスポンス