Playground Guide · Playground & Options

This guide helps you understand the Playground screen and covers the full workflow from choosing a model → adjusting options → using Web Search/Reasoning → working with variables → using results & history.

Last updated: 2026. 03. 05

Section 1

Playground layout

The Playground is your workbench for prompt experiments.
It’s designed around a simple loop: change inputs (prompt) and settings (model/options), compare results quickly, and restore past experiments using history.

1-1. Key areas (overview)

① Model & Options · Choose a model and set parameters such as temperature and max output tokens.
② Prompt editor · Write your question/instructions.
③ Prompt variables · Define variables used in your prompt and fill in values.
④ Results panel · Review output (copy/save/re-run).
⑤ Run / Reset · Run instantly or reset settings.
⑥ Analysis / Code tools · Analyze results (Eval/Improve) or view the equivalent API request code.
⑦ Projects/Templates/Panel actions · Manage prompts as assets, duplicate/create panels (multi-panel).
⑧ Run history · Browse previous runs and restore them with one click.

GPT Prompt Tester Playground — Example Playground layout (button positions/names may vary by version)

💡Tip: The Playground is built for repetition—adjust → run → review → tweak.
Start with defaults, then open up options little by little as you get comfortable.

Section 2

Run flow · Prompt → Run → Review

The most common workflow is these three steps. Once you’re comfortable with this loop, variables/templates/history will feel much more natural.

Write your prompt · Include goal, output format, and constraints (length, tone, required fields, etc.).
Choose model & options · Adjust temperature, max output tokens, Web Search/Reasoning if needed.
Run and review · Copy the result, or tweak one option and re-run.

2-1. A simple “3-line rule” for better prompts

Goal: One line describing what you want to produce
Format: List/table/JSON/Markdown, etc.
Constraints: Length, tone, exclusions, must-include items

💡Example: “Summarize this product manual (goal). Put it in a table (format). Don’t omit warnings and cautions (constraints).”

Section 3

Model selection guide

Models differ in performance, speed, cost, and strengths. Even with the same prompt, switching models can change tone, quality, and cost.

3-1. Quick criteria for choosing a model

When accuracy & reasoning matter: consider enabling Reasoning (when appropriate)
When speed matters: use a lighter model, reduce tokens, keep search off
When cost matters: reduce output length (and be careful with Web Search/Reasoning)
When writing long-form content: check max output tokens first

💡Build the habit of checking model cost and call volume in the Usage Dashboard. It quickly answers questions like “Why did my cost spike today?”

Section 4

Model options guide

Options shape the “personality” of your output—creativity, consistency, format stability, and length.
Below is a practical explanation based on the in-app tooltips.

Model options panel — Example model options screen

4-1. Temperature

Summary: Higher values produce more creative and less predictable outputs.
Details: Temperature controls randomness during sampling. Closer to 0 yields consistent answers; higher values yield more variety.
Recommended: For most use cases, 0.6 ~ 1.0 works well.

4-2. Top P

Summary: Limits token choices to the smallest set whose cumulative probability reaches P.
Details: Also called nucleus sampling. 1.0 considers all candidates; 0.9 focuses on the top ~90% probability mass.
Recommended: 0.8 ~ 1.0 is a safe range.

4-3. Max tokens

Summary: The maximum number of output tokens the model can generate in one response.
Details: Larger values allow longer responses but increase cost and latency. You cannot exceed the model’s limit (MODEL_MAX_OUTPUT_TOKENS).

4-4. Text format

Summary: Controls output format (text / json_object / json_schema).
Details: text is free-form text, json_object outputs a JSON object, and json_schema enforces schema-driven JSON output.

4-5. Reasoning (GPT-5)

Effort: Controls how much compute the model uses for reasoning. low is faster/lighter, high aims for deeper reasoning but can be slower. medium is recommended for typical use.
Summary: Whether to include a reasoning summary. off shows only the final result, on always includes a summary, auto includes it only when helpful.
Verbosity: How detailed the reasoning explanation should be. low is brief, high is detailed, medium is the default.

4-6. Web Search

Context size: Roughly how much content to pull from the web (small/medium/large). Larger sizes may use more sources and context.
Country: Country code for search localization (e.g., KR, US).
Domains: Prefer specific domains, comma-separated (e.g., openai.com, example.com).

💡Note
Options and tools may appear or behave differently depending on the selected model.
For example, Reasoning options may only be available on GPT-5 family models, and tools like Web Search may also vary by model/environment.
The options/tools shown in your app are the current “supported set” for the model you selected.
Hover over ⓘ to see the in-app tooltip explanations.

Section 5

Web Search · When should I turn it on?

Web Search helps the model pull in up-to-date information and cite sources when needed.
However, it can increase latency and cost.

5-1. Good times to turn it on

Information that changes over time: policies, pricing, versions, regulations, releases
When you need source-backed writing: news, official docs, references
Fact-checking in unfamiliar domains

5-2. When you can keep it off

Summarizing, restructuring, or rewriting text you already have
Internal work like refactoring code, polishing copy, designing templates
Principles, concepts, and methods that don’t change frequently

💡Tip: If Web Search feels slow, make your prompt more specific (narrow the search), and reduce unnecessarily long outputs (max output tokens).

Section 6

Reasoning · Quality vs. cost

Reasoning helps the model think more carefully through complex tasks. It can improve quality for logic-heavy work, but often increases response time and cost.

6-1. When it helps most

Requirements analysis, design comparisons, pros/cons evaluation, debugging direction
Structuring complex documents (policies/terms/specs) and checking for omissions
When you need an answer that satisfies many constraints

6-2. When you probably don’t need it

Short edits, simple summaries, marketing copy ideas

💡Tip: If Reasoning outputs feel too long/heavy, enforce a stricter output format and length in your prompt.

Section 7

Variables · Sample preview

Variables turn prompts into reusable templates. You keep the structure fixed and swap values (product name, audience, tone, etc.) to run repeatable tests.

7-1. Basic syntax

Declare variables in your prompt using {{variableName}}.
Fill values in the variables panel before running, or choose a sample preset.

7-2. Use Preview to prevent mistakes

Previewing the final prompt before running helps prevent errors.
It’s especially useful when you use multiple variables (e.g., {{product}}, {{audience}}, {{tone}}).

Variables and sample preview — Example variables panel and sample preview

💡Example: Generating the same “manual summary format” while swapping only the product name becomes dramatically faster with variables.

Section 8

Results panel · Copy/Save/Re-run tips

The results panel isn’t just for reading output. It’s where you iterate quickly and improve your prompt with small, controlled changes.

8-1. What to check when reviewing results

Did it follow the format? (list/table/JSON, etc.)
Anything missing? (required items, warnings, steps)
Is the length right? (if too long, tighten instructions or adjust max output tokens)
Is the tone right? (tune temperature and reinforce tone instructions)

8-2. How to re-run faster

Change only one line in the prompt or one option at a time and compare.
If formatting keeps drifting, lower temperature and enforce formatting rules more strongly.
If output cuts off, increase max output tokens or restructure (summary/steps/table).

Section 9

Run history · Restore/Manage

Run history is your time machine for experiments.
Prompt tuning is inherently iterative—using history well makes you dramatically faster.

9-1. Basic usage

Click [Restore to active panel] to restore the prompt/options from a past run into your current panel.
With Pro, [Restore to new panel] creates a new panel and restores the run there—perfect for side-by-side comparison.
When you run the same prompt with only option tweaks, history makes comparisons effortless.

9-2. Free vs Pro (at a glance)

Free: stores the most recent 20 runs (focused on basic restore)
Pro: more storage + search/filter + backup/restore and advanced management

Run history restore/manage — Example run history restore/manage screen

💡Asset features like history and backup become truly powerful in Pro—especially for long-term or operational usage.

Section 10

Multi-panel Playground · Side-by-side testing PRO

Multi-panel is one of Pro’s core features.
Starting from one baseline prompt, you can run variations across models, options, and prompt edits and compare results on a single screen.

10-1. Add & duplicate panels

[Add panel]: open an empty panel to start a new experiment
[Duplicate panel]: copy the current panel’s prompt/options as-is
With multiple panels, you can repeat “change one condition” experiments extremely quickly.

10-2. Independent settings per panel

Each panel can use a different model and options.
You can enable Web Search or Reasoning only on selected panels.
Outputs remain separated per panel for clean comparison.

Multi-panel Playground — Example multi-panel side-by-side testing screen

💡Tip: Assign roles like “baseline panel” vs “experiment panel” to make comparisons effortless.

Section 11

Template management · Project/Version structure PRO

Templates are a Pro-only feature designed to help you reuse, refine, and evolve your best prompts over time.

11-1. Organize by project

A project is the top-level container for a set of prompts.
Examples: “Blog automation”, “Manual summaries”, “Customer support replies”

11-2. Templates and versions

Each project can contain multiple templates.
Templates can be saved as versions to track changes over time.
You can roll back to older versions, or compare versions as you improve them.

Template management project/version — Example template management project/version screen

💡Tip: Don’t overwrite “the version that works.” Save it as a version—future you will thank you.

Section 12

Eval / Improve · Build an improvement loop PRO

With Pro, you don’t stop at “running a prompt.”
You can evaluate (Eval) outputs and continue directly into Improve, building a repeatable loop for better prompts.

Eval

Reviews outputs from multiple angles and summarizes strengths & weaknesses.
Provides actionable Next Actions for what to fix next.
Explains why the output happened (prompt structure, instruction hierarchy, output contracts, etc.).

Improve

Generates an improved prompt based on Eval findings.
Helps you compare before/after versions to find what works better.
Lets you turn improvements into reusable assets (templates/versions).

12-1. Where do I run it?

Run Eval directly from the Playground result view.
From the Eval view, continue into Improve with one click.
Combined with multi-panel, it becomes easy to compare variants using consistent evaluation criteria.

12-2. Recommended real-world flow

Run 2–3 variants side-by-side in multi-panel
Apply Eval to the most promising output and confirm what to improve
Generate an improved prompt with Improve and re-run immediately
Save the winning prompt as a template and keep versions
Save Eval results as a PDF to report, share, and archive experiments

Save Eval as PDF — Example: Save Eval results as a PDF

💡Eval/Improve is less about “finding the one correct answer” and more about building a reliable improvement loop. The more prompt assets you accumulate, the more valuable it becomes.

PRO

How Pro changes your workflow

Pro doesn’t just add features—it changes how you work.
It helps you experiment, evaluate, improve, and accumulate prompts as assets.

Multi-panel Playground · Compare prompts, models, and options side-by-side to discover what works best—faster.
Eval / Improve workflow · Use built-in evaluation tools to analyze outputs and iteratively refine prompts.
Prompt asset management · Organize prompts with projects, templates, and versions for long-term reuse.
Usage Dashboard analytics · Track calls, tokens, and cost—and analyze usage by project or template.
Extended history & backup · Store up to 200 runs, search experiments, and restore data after reinstall or PC changes.

💡With Pro, GPT Prompt Tester becomes more than a “run button.”
It starts to feel like a true prompt experimentation environment.

👉 For the full Pro workflow and examples, see the Pro guide.

Section 14

FAQ (options / speed / cost)

Q1. My results change every time. Is that normal?

Yes. Model outputs can vary because generation is probabilistic.
If you want more consistent results, lower temperature and make your format/constraints more explicit.

Q2. Responses feel slower. What should I check first?

The biggest drivers are usually Web Search/Reasoning and output length (max output tokens).
Try turning search/reasoning off first, and enable them only when needed.

Q3. My cost suddenly increased. Where do I check?

Use the Usage Dashboard to see cost by time range, model, and tools.
In particular, heavy use of Web Search/Reasoning can make cost increases noticeable.

For the most natural next steps, continue with: