Apple’s instructions to its GenAI-powered Siri illustrate the GenAI challenge
Deep within Apple’s systems sits a set of instructions the company has given to its Apple Intelligence GenAI system. Screen captures of those instructions, which surfaced in a public forum, provide a peek into Apple’s efforts to steer its GenAI deployment, and they also illustrate the steep challenge of controlling an algorithm that is simply trying to guess answers.
The more explicit and contained an instruction, the easier it is for GenAI to understand and obey it. Therefore, some of Apple’s instructions, such as “You prefer to use clauses instead of complete sentences” and “Please keep your summary of the input within a 10-word limit,” should work well, AI specialists said.
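For illustration, here is a minimal sketch of how constraints like these are typically delivered ahead of the user’s request, assuming the widely used chat-message convention; the structure and field names are generic, not Apple’s actual implementation.

```python
# A generic chat-style prompt: directives go in a "system" message that the
# model sees before the user's request. The wording echoes the leaked prompts;
# the message format is the common industry convention, not Apple's internal one.
messages = [
    {
        "role": "system",
        "content": (
            "You prefer to use clauses instead of complete sentences. "
            "Please keep your summary of the input within a 10-word limit."
        ),
    },
    {"role": "user", "content": "Summarize this email thread: ..."},
]
```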
But other commands from the Apple screen captures that leave more room for interpretation, such as “Do not hallucinate. Do not make up factual information,” may not be nearly as effective.
“I have not had good luck telling it not to hallucinate. It’s not clear to me that it knows when it is hallucinating and when it is not. This thing isn’t sentient,” said Michael Finley, CTO at AnswerRocket. “What does work is to ask it to reflect on its work, or to use a second prompt in a chain to check the results of the first one. Asking it to double check results is common. This has a verifiably good impact on results.”
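As a rough sketch of the two-step chain Finley describes, the pattern looks something like the following; call_llm is a placeholder for whatever completion API is in use, and both prompts are illustrative, not Apple’s.

```python
def call_llm(prompt: str) -> str:
    """Placeholder: send a prompt to the LLM of your choice, return its reply."""
    raise NotImplementedError("wire this to your provider's completion API")

def answer_with_double_check(question: str) -> str:
    # Step 1: produce a draft answer.
    draft = call_llm(f"Answer concisely: {question}")

    # Step 2: a second prompt in the chain asks the model to reflect on
    # its own work, the technique Finley says verifiably improves results.
    return call_llm(
        "Double-check the following answer for factual errors. "
        "If it is correct, repeat it verbatim; otherwise return a corrected answer.\n\n"
        f"Question: {question}\nAnswer: {draft}"
    )
```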
Finley was also baffled by an instruction that told the system to “only output valid JSON and nothing else.”
“I am surprised that they told it to only use valid JSON. The model is either going to use it or not,” Finley said, adding that the model has no practical or meaningful way to assess the validity of its own output. “The whole thing is really unsophisticated. I was surprised that this is what is at the heart.” He concluded that “it was kind of cobbled together. That is not necessarily a bad thing,” by which he meant that Apple developers were under pressure to ship the software quickly.
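Since the model cannot vouch for its own output, developers in practice validate the JSON at the application layer and recover or retry when parsing fails. Here is a minimal sketch; the salvage heuristic is illustrative, not anything Apple is known to ship.

```python
import json
from typing import Any, Optional

def parse_model_json(raw: str) -> Optional[Any]:
    """Parse a model reply that was instructed to be JSON-only.

    Returns the parsed object, or None when the model ignored the instruction.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Models often wrap JSON in prose or markdown fences; try to salvage the
    # outermost {...} span before giving up (a caller could also re-prompt).
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(raw[start : end + 1])
        except json.JSONDecodeError:
            pass
    return None
```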
The instructions under scrutiny were for new GenAI capabilities being built into Apple’s Siri. The dataset Apple will be using is far larger than in earlier efforts, which is why the capabilities will be available only on the latest devices with the most processing horsepower and RAM.
“Apple’s models for Siri have been small until now. Using GPT — arguably some of the largest models — means new capabilities,” Finley said. “As parameter counts get bigger, models learn to do things that are more indirect. Small models can’t role-play, larger models can. Small models don’t know about deception, larger models do.”
Clyde Williamson, product security architect at Protegrity, was amused that the instructions, presumably never intended to be seen by Apple customers, turned up in a public forum, neatly illustrating GenAI’s broader privacy and data-security challenges.
“This does highlight, though, the idea of how security in AI becomes a bit fuzzy. Anything we tell an AI, it might tell someone else,” Williamson said. “I don’t see any evidence that Apple tried to secure this prompt template, but it’s reasonable to expect that they didn’t intend for end-users to see the prompts. Unfortunately, LLMs are not good at keeping secrets.”
Another AI specialist, Rasa CTO Alan Nichol, called the prompt set “very pragmatic and simple,” but cautioned that “a model can’t know when it’s wrong.”
“These models produce plausible texts that sometimes overlap with the truth. And sometimes, by sheer accident and coincidence, it is correct,” Nichol said. “If you think about how these models are trained, they are trying to please the end-user, they are trying to think of what the user wants.”
Still, Nichol applauded many of the instructions, noting, “The instructions to keep everything short, I always use comments like that,” because otherwise LLMs tend to be “incredibly verbose and fluffy.”