In the rapidly evolving world of artificial intelligence, developers are constantly seeking ways to enhance their AI applications. Spring AI, a Java framework for building AI-powered applications, has introduced a powerful feature: Spring AI Advisors.
Advisors can supercharge your AI applications, making them more modular, portable, and easier to maintain.
What are Spring AI Advisors?
At their core, Spring AI Advisors are components that intercept and potentially modify the flow of chat-completion requests and responses in your AI applications. The key player in this system is the AroundAdvisor, which allows developers to dynamically transform or utilize information within these interactions.
The main benefits of using Advisors include:
- Encapsulation of Recurring Tasks: Package common GenAI patterns into reusable units.
- Transformation: Augment data sent to Large Language Models (LLMs) and format responses sent back to clients.
- Portability: Create reusable transformation components that work across various models and use cases.
How Advisors Work
The Advisor system operates as a chain, with each Advisor in the sequence having the opportunity to process both the incoming request and the outgoing response. Here's a simplified flow:
- An AdvisedRequest is created from the user's prompt, along with an empty advise-context.
- Each Advisor in the chain processes the request, potentially modifying it, and then forwards execution to the next advisor in the chain. Alternatively, it can block the request by not invoking the next advisor.
- The final Advisor sends the request to the Chat Model.
- The Chat Model's response is passed back through the advisor chain as an AdvisedResponse: a combination of the original ChatResponse and the advise-context from the input path of the chain. Each Advisor can process or modify the response.
- The augmented ChatResponse from the final AdvisedResponse is returned to the client.
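The request/response flow above can be sketched as a small, self-contained program. This is a toy model of the around pattern, not the actual Spring AI API (all class and method names here are illustrative):

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model of the around-advisor flow (NOT the Spring AI API): each advisor
// may transform the request before delegating and the response on the way back.
public class AdvisorFlowDemo {

    interface Advisor {
        String around(String request, Chain chain);
    }

    interface Chain {
        String next(String request);
    }

    static String execute(List<Advisor> advisors, UnaryOperator<String> chatModel) {
        return callFrom(advisors, 0, "user prompt", chatModel);
    }

    private static String callFrom(List<Advisor> advisors, int i, String request,
            UnaryOperator<String> chatModel) {
        if (i == advisors.size()) {
            return chatModel.apply(request); // final hop: the chat model itself
        }
        return advisors.get(i).around(request, req -> callFrom(advisors, i + 1, req, chatModel));
    }

    public static String demo() {
        Advisor tagger = (req, chain) -> chain.next("[tagged] " + req); // augments the request
        Advisor upper = (req, chain) -> chain.next(req).toUpperCase(); // transforms the response
        return execute(List.of(tagger, upper), req -> "echo: " + req);
        // -> "ECHO: [TAGGED] USER PROMPT"
    }
}
```

Note how `tagger` acts on the way in while `upper` acts on the way out; an advisor could also skip `chain.next(...)` entirely to block the request.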
Using Advisors
Spring AI comes with several pre-built Advisors to handle common scenarios and Gen AI patterns:
- MessageChatMemoryAdvisor, PromptChatMemoryAdvisor, and VectorStoreChatMemoryAdvisor: These manage conversation history in various ways.
- QuestionAnswerAdvisor: Implements the RAG (Retrieval-Augmented Generation) pattern for improved question-answering capabilities.
- SafeGuardAdvisor: A basic, sensitive-word-based advisor that helps prevent the model from generating harmful or inappropriate content. It demonstrates how to block a request by not invoking the next advisor in the chain; in that case, the blocking advisor is responsible for filling in the response or throwing an error.
With the ChatClient API you can register the advisors needed for your pipeline:
var chatClient = ChatClient.builder(chatModel)
.defaultAdvisors(
new MessageChatMemoryAdvisor(chatMemory), // chat-memory advisor
new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults()) // RAG advisor
)
.build();
String response = chatClient.prompt()
// Set chat memory parameters at runtime
.advisors(advisor -> advisor.param("chat_memory_conversation_id", "678")
.param("chat_memory_response_size", 100))
.user(userText)
.call()
.content();
Implementing Your Own Advisor
The Advisor API consists of CallAroundAdvisor and CallAroundAdvisorChain for non-streaming, and StreamAroundAdvisor and StreamAroundAdvisorChain for streaming scenarios.
It also includes AdvisedRequest to represent the unsealed Prompt request data, and AdvisedResponse for the chat completion data.
The AdvisedRequest and the AdvisedResponse have an advise-context
field, used to share state across the advisor chain.
Simple Logging Advisor
Creating a custom Advisor is straightforward. Let's implement a simple logging Advisor to demonstrate the process:
public class SimpleLoggerAdvisor implements CallAroundAdvisor, StreamAroundAdvisor {
private static final Logger logger = LoggerFactory.getLogger(SimpleLoggerAdvisor.class);
@Override
public String getName() {
return this.getClass().getSimpleName();
}
@Override
public int getOrder() {
return 0;
}
@Override
public AdvisedResponse aroundCall(AdvisedRequest advisedRequest, CallAroundAdvisorChain chain) {
logger.debug("BEFORE: {}", advisedRequest);
AdvisedResponse advisedResponse = chain.nextAroundCall(advisedRequest);
logger.debug("AFTER: {}", advisedResponse);
return advisedResponse;
}
@Override
public Flux<AdvisedResponse> aroundStream(AdvisedRequest advisedRequest, StreamAroundAdvisorChain chain) {
logger.debug("BEFORE: {}", advisedRequest);
Flux<AdvisedResponse> advisedResponses = chain.nextAroundStream(advisedRequest);
return new MessageAggregator().aggregateAdvisedResponse(advisedResponses,
advisedResponse -> logger.debug("AFTER: {}", advisedResponse));
}
}
This Advisor logs the request before it's processed and the response after it's received, providing valuable insights into the AI interaction process.
The aggregateAdvisedResponse(...) utility aggregates a stream of AdvisedResponse chunks into a single AdvisedResponse. It returns the original stream and accepts a Consumer<AdvisedResponse> that is called once the aggregated AdvisedResponse is complete. This utility doesn't alter the original content or context.
Re-Reading (Re2) Advisor
Let's implement a more advanced Advisor based on the Re-Reading (Re2) technique, inspired by this paper, which can improve the reasoning capabilities of large language models:
public class ReReadingAdvisor implements CallAroundAdvisor, StreamAroundAdvisor {
private static final String DEFAULT_USER_TEXT_ADVISE = """
{re2_input_query}
Read the question again: {re2_input_query}
""";
@Override
public String getName() {
return this.getClass().getSimpleName();
}
@Override
public int getOrder() {
return 0;
}
private AdvisedRequest before(AdvisedRequest advisedRequest) {
String inputQuery = advisedRequest.userText(); //original user query
Map<String, Object> params = new HashMap<>(advisedRequest.userParams());
params.put("re2_input_query", inputQuery);
return AdvisedRequest.from(advisedRequest)
.withUserText(DEFAULT_USER_TEXT_ADVISE)
.withUserParams(params)
.build();
}
@Override
public AdvisedResponse aroundCall(AdvisedRequest advisedRequest, CallAroundAdvisorChain chain) {
return chain.nextAroundCall(before(advisedRequest));
}
@Override
public Flux<AdvisedResponse> aroundStream(AdvisedRequest advisedRequest, StreamAroundAdvisorChain chain) {
return chain.nextAroundStream(before(advisedRequest));
}
}
This Advisor modifies the input query to include a "re-reading" step, potentially improving the AI model's understanding and reasoning about the question.
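Once implemented, custom advisors plug into the same ChatClient pipeline shown earlier. A minimal sketch, assuming the same chatModel as in the registration example above:

```java
// Register the two custom advisors from this post alongside the ChatClient.
var chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(
        new SimpleLoggerAdvisor(),  // logs requests and responses
        new ReReadingAdvisor())     // applies the Re2 prompt transformation
    .build();

String answer = chatClient.prompt()
    .user("Why is the sky blue?")
    .call()
    .content();
```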
Advanced Topics
Beyond the basics, advisor management involves three topics: controlling execution order via getOrder(), sharing state across the chain through the advise-context, and supporting both streaming and non-streaming processing using reactive types. The sections below cover each in turn.
Controlling Advisor Order
The order of Advisors in the chain is crucial and is determined by the getOrder() method. Advisors with lower order values are executed first.
Because the advisor chain behaves like a stack, the first advisor in the chain is the first to process the request and the last to process the response.
To ensure an advisor runs last, set its order close to the Ordered.LOWEST_PRECEDENCE value; conversely, to run it first, set its order close to the Ordered.HIGHEST_PRECEDENCE value.
If multiple advisors have the same order value, their relative execution order is not guaranteed.
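The interplay between order values and the stack-like chain can be seen in a small, self-contained sketch (a toy model with made-up names, not the actual Spring AI API):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model (NOT the Spring AI API): advisors are sorted by order value;
// a lower value runs first, so it sees the request first and, because the
// chain unwinds like a stack, the response last.
public class AdvisorOrderDemo {

    record OrderedAdvisor(String name, int order) {}

    public static List<String> demo() {
        List<OrderedAdvisor> advisors = new ArrayList<>(List.of(
                new OrderedAdvisor("memory", 100),
                new OrderedAdvisor("logger", -100)));
        advisors.sort(Comparator.comparingInt(OrderedAdvisor::order)); // lower order first

        List<String> trace = new ArrayList<>();
        callFrom(advisors, 0, trace);
        return trace;
        // -> [logger:request, memory:request, model, memory:response, logger:response]
    }

    private static void callFrom(List<OrderedAdvisor> advisors, int i, List<String> trace) {
        if (i == advisors.size()) {
            trace.add("model"); // innermost hop: the chat model
            return;
        }
        trace.add(advisors.get(i).name() + ":request");   // on the way in
        callFrom(advisors, i + 1, trace);
        trace.add(advisors.get(i).name() + ":response");  // on the way out
    }
}
```

The trace shows the logger (lowest order) wrapping everything else: first on the request path, last on the response path.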
Using AdvisorContext for State Sharing
Both the AdvisedRequest and the AdvisedResponse carry an advise-context object.
You can use the advise-context to share state between the advisors in the chain and build more complex processing scenarios that involve multiple advisors.
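The idea can be illustrated with a self-contained toy chain (again, illustrative names only, not the Spring AI API) in which one advisor writes to a shared context map on the request path and a later advisor reads it on the response path:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (NOT the Spring AI API): advisors share one mutable context map
// that travels down the request path and back up the response path.
public class AdviseContextDemo {

    interface Advisor {
        String around(String request, Map<String, Object> ctx, Chain chain);
    }

    interface Chain {
        String next(String request, Map<String, Object> ctx);
    }

    static String run(List<Advisor> advisors, String request) {
        return callFrom(advisors, 0, request, new HashMap<>());
    }

    private static String callFrom(List<Advisor> advisors, int i, String request,
            Map<String, Object> ctx) {
        if (i == advisors.size()) {
            return "model answer"; // stand-in for the chat model call
        }
        return advisors.get(i).around(request, ctx, (req, c) -> callFrom(advisors, i + 1, req, c));
    }

    public static String demo() {
        Advisor writer = (req, ctx, chain) -> {
            ctx.put("original_query", req);      // stash state on the way in
            return chain.next(req, ctx);
        };
        Advisor reader = (req, ctx, chain) -> {
            String response = chain.next(req, ctx);
            // annotate the response on the way out, using the shared state
            return response + " (for: " + ctx.get("original_query") + ")";
        };
        return run(List.of(writer, reader), "why is the sky blue?");
        // -> "model answer (for: why is the sky blue?)"
    }
}
```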
Streaming vs. Non-Streaming
Spring AI supports both streaming and non-streaming Advisors. Non-streaming Advisors work with complete requests and responses, while streaming Advisors handle continuous streams using reactive programming concepts (e.g., Flux for responses).
For streaming advisors, it's crucial to note that a single AdvisedResponse instance represents only a chunk (i.e., part) of the entire Flux<AdvisedResponse> response. In contrast, for non-streaming advisors, the AdvisedResponse encompasses the complete response.
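The aggregation contract used by utilities like MessageAggregator can be illustrated with a plain-Java toy (not the actual Spring AI or Reactor API): chunks pass through unchanged, and a callback receives the complete, concatenated response once the stream is done.

```java
import java.util.List;
import java.util.function.Consumer;

// Toy illustration (NOT the Spring AI API): a streaming response arrives as
// chunks; the aggregator forwards each chunk unchanged and, once the stream
// completes, hands the full concatenated response to a callback.
public class StreamAggregationDemo {

    static List<String> aggregate(List<String> chunks, Consumer<String> onComplete) {
        onComplete.accept(String.join("", chunks)); // fire once the "stream" is complete
        return chunks;                              // pass the chunks through unchanged
    }

    public static String demo() {
        StringBuilder complete = new StringBuilder();
        List<String> passedThrough = aggregate(List.of("Hel", "lo, ", "world"), complete::append);
        return passedThrough.size() + " chunks -> " + complete;
        // -> "3 chunks -> Hello, world"
    }
}
```

A reactive implementation would buffer as elements flow through the Flux rather than eagerly, but the contract is the same: downstream consumers still see every chunk, while the callback sees the whole response.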
Best Practices
- Keep Advisors focused on specific tasks for better modularity.
- Use the advise-context to share state between Advisors when necessary.
- Implement both streaming and non-streaming versions of your Advisor for maximum flexibility.
- Carefully consider the order of Advisors in your chain to ensure proper data flow.
Conclusion
Spring AI Advisors provide a powerful and flexible way to enhance your AI applications. By leveraging this API, you can create more sophisticated, reusable, and maintainable AI components. Whether you're implementing custom logic, managing conversation history, or improving model reasoning, Advisors offer a clean and efficient solution.
We encourage you to experiment with Spring AI Advisors in your projects and share your custom implementations with the community. The possibilities are endless, and your innovations could help shape the future of AI application development!
Happy coding, and may your AI applications be ever more intelligent and responsive!