How do you test AI prompt changes in production?
prompts · 1 min read · 8.9.2025

We're building an AI feature and running into testing problems. When we update prompts or change models, we mostly do manual spot checks, which feels risky. I'm wondering how others handle this: Do you have systematic regression tests for prompt changes? How do you catch performance drops when updating models? Any tools or workflows you'd recommend? Right now we just cross our fingers and monitor user feedback, but it feels like there should be a better way. What's your setup?
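For concreteness, here is a minimal sketch of the kind of regression harness the question is after: a file of golden test cases with cheap deterministic checks, re-run on every prompt or model change. All names here (call_model, cases.json, the expectation keys) are hypothetical placeholders, not a real tool's API.

```python
import json
from pathlib import Path


def call_model(prompt: str, model: str) -> str:
    # Hypothetical stand-in for your actual LLM client call;
    # replace the body with a call to your provider's SDK.
    return f"stub response from {model}"


def check_output(output: str, expect: dict) -> list[str]:
    # Cheap deterministic checks; returns descriptions of the failed ones.
    failures = []
    for phrase in expect.get("must_contain", []):
        if phrase.lower() not in output.lower():
            failures.append(f"missing: {phrase!r}")
    for phrase in expect.get("must_not_contain", []):
        if phrase.lower() in output.lower():
            failures.append(f"forbidden: {phrase!r}")
    if "max_chars" in expect and len(output) > expect["max_chars"]:
        failures.append(f"too long: {len(output)} chars")
    return failures


def run_suite(cases_path: str, prompt_template: str, model: str) -> bool:
    # Re-run every golden case against the current prompt/model pair.
    all_ok = True
    for case in json.loads(Path(cases_path).read_text()):
        output = call_model(prompt_template.format(**case["inputs"]), model)
        failures = check_output(output, case["expect"])
        if failures:
            all_ok = False
            print(f"[FAIL] {case['name']}: {'; '.join(failures)}")
        else:
            print(f"[PASS] {case['name']}")
    return all_ok
```

Each case in cases.json would hold a name, the inputs for the prompt template, and an expect block with keys like must_contain. String checks like these only catch hard regressions; gradual quality drops still need human review or model-graded evals on top.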
Source: Original