Based on the insights from our study, we outline best practices for optimizing LMM performance on the MCQA task. These strategies are designed to enhance both accuracy and consistency. While our insights are drawn from MCQA evaluations, we believe these principles apply more broadly, to other tasks and to both LLMs and LMMs.

An important observation underlying these principles is the clear difference in behavior between open-source and proprietary models. Open-source models are often not extensively instruction-tuned, which makes them less responsive to prompt variations. Proprietary models, in contrast, typically undergo rigorous instruction tuning on large-scale, high-quality data, along with reinforcement learning and other post-training techniques. This makes them considerably more sensitive to user instructions: even subtle changes in prompt phrasing can lead to notable differences in performance. Given these differences in instruction-following capability, we present the prompting principles separately for open-source and proprietary models, which lets us account for their varying adherence to instructions and highlight the strategies most effective for each category.
| # | Open-Source Models | Proprietary Models |
|---|---|---|
| 1 | **Concise prompts yield better performance:** Keeping prompts short and direct improves accuracy. "Answer with the option letter from the given choices directly." (1.1)<br>**Overly short or vague prompts reduce accuracy:** When the prompt is too brief and lacks clarity, the model may not understand the expected format or task. "Best Choice: $LETTER" (12.3)<br>**Detailed prompts are ineffective:** Long or highly descriptive prompts do not improve accuracy. (notably Category 5 and other long prompts) | **Prompt length and detail have minimal impact:** Unlike open-source models, proprietary models perform consistently across prompts of varying lengths and complexity.<br>**Restricting responses to the letter choice is detrimental:** Limiting the model to respond with just a letter (e.g., A, B, C, D) can suppress reasoning and reduce accuracy. (12.2) |
| 2 | **Complex or structured formatting decreases accuracy:** Using formats such as JSON, YAML, or Markdown negatively impacts model performance. (2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9)<br>**Clear separation of option letters enhances clarity:** Using parentheses for option labels improves model understanding. "(A) choice 1 (B) choice 2 (C) choice 3 (D) choice 4" (1.2)<br>**Explicit labeling of question and options is beneficial:** Using clear section headers improves comprehension. "Question: `<QUESTION>` Options: `<OPTIONS>` Answer with the option letter from the given choices directly." (2.2)<br>**Placing question and options at the end helps:** Structuring prompts so that the question and answer choices appear at the end leads to better results. "Answer with the option letter from the given choices directly. `<QUESTION>` `<OPTIONS>`" (3.1) A prompt-builder sketch illustrating these formatting principles follows the table. | **Complex formatting does not impair accuracy:** Unlike open-source models, proprietary models can handle structured formats such as JSON, Markdown, or YAML without a drop in performance. (Category 2) |
| 3 | **Poor linguistic formatting hinders performance:** Use of all upper case, poor grammar, or misspellings negatively impacts accuracy. (Category 4) | **Poor linguistic formatting does not affect performance:** These models are robust to grammatical errors, casing, and minor typos, likely due to stronger pretraining and instruction tuning. (Category 4) |
| 4 | **Chain-of-Thought reasoning is ineffective:** Step-by-step reasoning does not improve accuracy in this context. (Category 6) | **Allowing room for reasoning significantly improves accuracy:** Giving the model room to think before answering leads to higher accuracy. (Categories 6 & 12.5) See the second sketch below the table. |
| 5 | **Penalties, incentives, or competitive framing are ineffective:** Using competitive language, penalizing mistakes, or offering rewards often introduces ambiguity. (Categories 13, 14 & 15)<br>**Competitive framing degrades performance:** Prompts that use game-like or adversarial language introduce unnecessary pressure or distraction, reducing answer accuracy. (Category 15) | **Penalties or incentives improve performance:** Framing prompts with rewards or penalties can enhance performance, possibly due to better contextual understanding. (Categories 13 & 14) |
| 6 | **Specifying personas or target audiences is ineffective:** Tailoring prompts by specifying a persona or intended audience does not improve model performance. (Categories 8 & 9) | **Persona-based prompting has mixed effects:** Positive persona prompts do not enhance accuracy, while negative persona prompts can significantly degrade performance. (Category 9) |
| 7 | **Overemphasis on answer format is unhelpful:** Excessive instruction about answer formatting can degrade performance. (Categories 12 & 11.3) | **Answer format plays an important role in accuracy:** Proprietary models are sensitive to how the answer is requested. (Categories 12 & 11.3) |
| 8 | **Temporal reasoning enhances video comprehension:** Prompts that emphasize temporal order improve accuracy on video-based tasks. (11.4 & 11.5) | **Temporal reasoning enhances video comprehension:** Prompts that emphasize temporal aspects of events in videos result in more accurate responses. (11.4 & 11.5) |
| 9 | **Image-focused prompting helps:** Directing the model to rely solely on the image content improves answer accuracy. (11.1) | **Asking to focus on the image or question hinders performance:** In contrast to open-source models, proprietary models do worse when explicitly told to focus only on the image or only on the question. (11.1 & 11.2) |
| 10 | **Answer leakage degrades performance:** Including unintended hints or answer cues leads to lower accuracy. (Category 7) | **Asking to avoid bias or stereotypes helps:** Prompts that explicitly instruct the model to avoid bias or stereotypes lead to more accurate responses. (Category 10) |
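To make the open-source principles concrete (a concise instruction, parenthesized option labels, explicit "Question:"/"Options:" headers, and the question and options placed at the end), here is a minimal prompt-builder sketch. Only the quoted instruction and labeling format come from the table above; the helper name and surrounding code are illustrative and not part of the paper's evaluation harness.

```python
# Minimal sketch of an MCQA prompt builder following the open-source
# principles above. The helper name `build_open_source_prompt` is
# illustrative; only the instruction wording and the "(A) ... (B) ..."
# labeling scheme are taken from the table.
def build_open_source_prompt(question: str, options: list[str]) -> str:
    letters = "ABCDEFGHIJ"
    labeled = " ".join(f"({letters[i]}) {opt}" for i, opt in enumerate(options))
    return (
        "Answer with the option letter from the given choices directly.\n"
        f"Question: {question}\n"
        f"Options: {labeled}"
    )

print(build_open_source_prompt(
    "What is the person in the image doing?",
    ["Cooking", "Running", "Reading", "Painting"],
))
```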
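For proprietary models, the table reports that leaving room to reason (Categories 6 & 12.5) and reward or penalty framing (Categories 13 & 14) help, while forcing a bare letter answer (12.2) hurts. The sketch below shows one such variant; the exact wording is illustrative rather than a prompt string from the paper.

```python
# Sketch of a reasoning-friendly prompt variant for proprietary models,
# assuming a simple reward framing. The wording is illustrative and not
# taken from the paper's prompt set.
def build_proprietary_prompt(question: str, options: list[str]) -> str:
    letters = "ABCDEFGHIJ"
    labeled = " ".join(f"({letters[i]}) {opt}" for i, opt in enumerate(options))
    return (
        "You will be rewarded for a correct answer. Think through the "
        "question step by step, then give the letter of the best option "
        "on the final line as 'Answer: <letter>'.\n"
        f"Question: {question}\n"
        f"Options: {labeled}"
    )
```

With this style the model's reply is free-form text rather than a single letter, so the final answer has to be parsed from the last line instead of being read directly.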
@misc{ismithdeen2025promptceptionsensitivelargemultimodal,
title={Promptception: How Sensitive Are Large Multimodal Models to Prompts?},
author={Mohamed Insaf Ismithdeen and Muhammad Uzair Khattak and Salman Khan},
year={2025},
eprint={2509.03986},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.03986},
}