MAGIC - A Framework for LLM-Powered Automation
Creating autonomous systems powered by large language models
Published on April 15, 2024

> Make it more agentic

[![Github](https://img.shields.io/badge/GITHUB-000000.svg?style=flat-square&labelColor=000000)](https://github.com/hack-dance/magic)

MAGIC (Machine-Assisted Generative Intelligence and Coordination) is a framework for creating autonomous systems powered by Large Language Models (LLMs). It provides a structured environment where LLM-powered agents perform tasks through defined actions, which can be executed directly or delegated to specialized agents, enhancing the system's flexibility and capability.

The MAGIC framework aims to simplify the process of building and managing complex, LLM-powered automation systems by providing a clear, well-defined structure for agent creation, task delegation, and action execution -- and a long, overly-verbose name to avoid any miscommunication.

[_(Code at the bottom)_](#a-basic-implementation)

## Core Components

- **Agent Identity and Parameters**: Each agent within the MAGIC system has a defined identity and a set of parameters guiding its operations, interactions, and decision-making processes.
- **Agent Schema**: Defines the expected structured output from an agent, ensuring consistency and reliability in the agent's responses and actions.
- **Action Definitions**: Actions represent tasks or operations an agent can perform. Each action has defined input parameters, a handler function, and context describing its purpose and usage conditions.

## Workflow

1. **Context and resource gathering and resolution**: The system gathers all relevant context and external resources it needs, given the input parameters, then resolves which subset of that material to provide to the orchestration agent based on the configuration of the system (max token count, etc.).
2. **Task Reception and Analysis**: The orchestration agent analyzes the incoming context and input parameters to understand their requirements.
3. **Delegate Selection and Task Execution**: The orchestration agent selects appropriate delegate agents or action handlers, which then process and perform the tasks, leveraging their specialized capabilities.
4. **Aggregation and Response**: Post-execution, the orchestration agent synthesizes the results, preparing responses or actions in alignment with the overarching objectives.
5. **Feedback Loop**: Outcomes are continuously reviewed to refine task distribution and execution, enhancing the system's efficiency and adaptability.

This structured workflow, built around the core components of MAGIC, enables the creation of flexible, adaptable, and efficient LLM-powered automation systems that can handle a wide range of tasks and scenarios.

## Action Execution and Delegation

Actions in MAGIC are defined by their ability to cause change or fetch information as needed. Actions can be classified based on whether they have side effects or contribute to ongoing context awareness and decision-making processes.

- **Direct Action Execution**: Agents execute actions based on the LLM's insights and predefined parameters, directly affecting the system or environment.
- **Delegated Action Execution**: Complex tasks are delegated to specialized agents, known as delegates, which are designed to handle specific operations efficiently. These delegates can be other LLM-powered agents or specialized automation systems, allowing for polymorphic task execution.
- **Side Effect and Context-Aware Actions**: Actions are categorized based on their nature—producing direct side effects or aiding in ongoing decision-making.

This distinction aids in the management of action flows and their impact on the system (see the sketch below).
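To make the action model concrete before diving into the rest, here is roughly the shape an action definition takes in the implementation at the bottom of this post. A sketch only: the `ActionDefinition` type name is mine, but the fields mirror the `actionDefinitions` object defined later.

```typescript
import { z } from "zod"

// A minimal sketch of an action definition's shape. The type name is
// illustrative - the implementation below just uses a plain object literal
// with these fields.
type ActionDefinition<P extends z.AnyZodObject> = {
  // Executes the action; the return value is handed back to the agent
  // for a follow-up completion unless the action is a side effect.
  handler: (params: z.infer<P>) => Promise<unknown>
  // Natural-language context the agent uses to decide when to call the action.
  description: string
  // true: fire-and-forget (e.g. a DB write) - nothing is returned to the agent.
  // false: the result is fed back into a follow-up completion.
  sideEffect: boolean
  // A few-shot example embedded in the agent's identity prompt.
  example: string
}
```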
## Using orchestrated delegation

In orchestrated delegation, a primary agent acts as a coordinator, directing tasks to delegate agents based on their specialized capabilities and the task's requirements.

- **Orchestration Agent**: Manages the distribution of tasks, ensuring they are executed efficiently and effectively by the most suitable delegate.
- **Delegate Agents**: Perform the actions they are assigned, utilizing their specialized skills and knowledge to achieve the best outcome.

## But is it Agentic?

In the ever-evolving quest to anthropomorphize our algorithms, we've recently landed on "agentic" to describe all things AI + automation (except for sometimes) - but... What is an agent? What does it mean to be agentic? Is it just a bit of MAGIC?

Every person I speak to has a different definition, and it's hard to take anyone seriously when they attempt to describe something simple with something vague. In this example and anywhere else in this code, an agent is defined as a preconfigured instance of an LLM call, with an "identity" prompt and an optional set of static or dynamic messages pre- or post-pended to every request.

Implementing a system that uses the ideas described in MAGIC is relatively simple - depending on the use case, it can scale up to handle an enormous amount of complexity - but the concepts remain the same at every new layer of added functionality.

You will notice in the examples that I do not rely on LLM function calling for the execution of actions. The reason is twofold:

1. I usually want to keep the option to swap out a provider or model that may not support the same function calling ability.
2. Anecdotally, I have used this approach alongside function calling and multi-function calling to achieve the same results, and have observed that in most cases I am better able to prompt my way into having the agent adhere to my rules around when and what to call when I define the actions this way.

(Behind the scenes, to get the structured response, we do use function calling when available - the action and actionParams are just params for the singular function that requests the structured response.)

It works for me, but do what makes you happy.

### A basic implementation

**Agent creation utility**

Using Instructor (for structured output and partial JSON streaming) and OpenAI, we define the scope of what an agent is. The oai import is a pre-configured OpenAI client, and the client variable is an instance of the Instructor client, which will be used to create agent instances.

```typescript
import { oai } from "@/lib/oai"
import Instructor from "@instructor-ai/instructor"
import OpenAI from "openai"
import { z } from "zod"

export type CreateAgentParams = {
  config: Partial<OpenAI.ChatCompletionCreateParams> & {
    model: OpenAI.ChatCompletionCreateParams["model"]
    messages: OpenAI.ChatCompletionMessageParam[]
  }
  response_model: {
    schema: z.AnyZodObject
    name: string
  }
}

export type AgentInstance = ReturnType<typeof createAgent>
export type ConfigOverride = Partial<OpenAI.ChatCompletionCreateParams>

const client = Instructor({
  client: oai,
  mode: "TOOLS"
})

/**
 * Create a pre-configured "agent" that can be used to generate completions.
 * Messages passed at initialization are pre-pended to all completions;
 * all other configuration can be overridden in the completion call.
 *
 * @param {CreateAgentParams} params
 *
 * @returns {AgentInstance}
 */
export function createAgent<S extends z.AnyZodObject>({
  config,
  response_model
}: {
  config: Partial<OpenAI.ChatCompletionCreateParams> & {
    model: OpenAI.ChatCompletionCreateParams["model"]
    messages: OpenAI.ChatCompletionMessageParam[]
  }
  response_model: {
    schema: S
    name: string
  }
}) {
  const defaultAgentParams = {
    temperature: 0.7,
    top_p: 1,
    frequency_penalty: 0,
    presence_penalty: 0,
    n: 1,
    ...config
  }

  return {
    /**
     * Generate a single streaming completion
     * @param {ConfigOverride} configOverride
     *
     * @returns {Promise<AsyncGenerator<z.infer<typeof response_model.schema>>>}
     */
    completionStream: async (configOverride: ConfigOverride) => {
      const messages = [
        ...(defaultAgentParams.messages ?? []),
        ...(configOverride?.messages ?? [])
      ] as OpenAI.ChatCompletionMessageParam[]

      const extractionStream = await client.chat.completions.create({
        ...defaultAgentParams,
        ...configOverride,
        response_model,
        stream: true,
        messages
      })

      return extractionStream
    },
    /**
     * Generate a single completion
     * @param {ConfigOverride} configOverride
     */
    completion: async (configOverride: ConfigOverride) => {
      const messages = [
        ...(defaultAgentParams.messages ?? []),
        ...(configOverride?.messages ?? [])
      ] as OpenAI.ChatCompletionMessageParam[]

      const extraction = await client.chat.completions.create({
        ...defaultAgentParams,
        ...configOverride,
        response_model,
        stream: false,
        messages
      })

      return extraction
    }
  }
}
```

**Creating an agent**

Here we define the core orchestration agent for the system, as well as the actions available to that agent. The z import from the zod library is used to define the schema for the agent's actions and responses, ensuring type safety and validation.

```typescript
import { createAgent } from "../"
import z from "zod"

const coreAgentActions = {
  UPDATE_USER_DATA: "UPDATE_USER_DATA",
  GET_THINGS_FROM_API: "GET_THINGS_FROM_API"
} as const

export const updateUserParams = z.object({
  action: z.literal(coreAgentActions.UPDATE_USER_DATA),
  data: z.record(z.string(), z.any()).describe("user properties to update or add")
})

export const getThingsParams = z.object({
  action: z.literal(coreAgentActions.GET_THINGS_FROM_API),
  query: z.string().describe("the query to use when fetching things from the api")
})

export const actionParams = z.discriminatedUnion("action", [
  updateUserParams,
  getThingsParams
])

const coreAgentSchema = z.object({
  content: z.string().describe("the response to the user"),
  action: z.nativeEnum(coreAgentActions).optional(),
  actionParams: actionParams.optional()
})

export const actionDefinitions = {
  [coreAgentActions.UPDATE_USER_DATA]: {
    handler: async function ({ data }: z.infer<typeof updateUserParams>) {
      // db is a stand-in for your persistence layer
      await db.user.upsert(data)
    },
    description: "Persist any new information about the user to the database.",
    sideEffect: true,
    example: `
      [user]: oh my email is dimitri@sick.email
      // assistant response:
      { content: "great, thank you", action: "UPDATE_USER_DATA", actionParams: { action: "UPDATE_USER_DATA", data: { email: "dimitri@sick.email" } } }
    `
  },
  [coreAgentActions.GET_THINGS_FROM_API]: {
    handler: async function ({ query }: z.infer<typeof getThingsParams>) {
      const response = await fetch(`https://things.com/api?q=${query}`)
      return await response.json()
    },
    sideEffect: false,
    description: "fetch things from the api given an explicit user request or when it is relevant to do so",
    example: `
      [user]: what kind of shiny things do you sell?
      // assistant response:
      { content: "One sec, let me find you some good ones.", action: "GET_THINGS_FROM_API", actionParams: { action: "GET_THINGS_FROM_API", query: "shiny" } }
      // ...system calls action handler...
      // action handler output: [{ url: "s.co/123", title: "so shiny thing" }]
      // assistant called again with the action handler output
      // assistant response:
      { content: "Found a really great shiny thing for you! It's the 'so shiny thing'." }
    `
  }
}

export const primaryIdentity = `
  You are a world-class AI assistant agent, tasked with responding to user queries
  and delegating complex tasks to other agents. You will not only be the direct
  point of contact with the end user but will also be responsible for deciding
  when to call the provided actions - these actions can be other agents and/or
  pure functions to execute. In some cases the actions will be defined in a way
  that requires they return their output back to you; in those cases you will use
  that provided output to best respond to the user. In other cases the actions
  will be marked as side effects and you will not receive a response - only
  provide that action with the context it requires.

  Those actions are:
  ${Object.entries(actionDefinitions)
    .map(([name, def]) => `${name}: ${def.description}\n example: ${def.example}`)
    .join("\n")}
`

export const coreAgent = createAgent({
  config: {
    model: "gpt-4-turbo",
    max_tokens: 650,
    temperature: 0.1,
    messages: [
      {
        role: "system",
        content: primaryIdentity
      }
    ]
  },
  response_model: {
    schema: coreAgentSchema,
    name: "core agent response"
  }
})
```

**Simplified example of the agents in action**

```typescript
type MagicFlowInputParams = {
  prompt: string
  conversationId: string
}

async function getContextMessages({ prompt, conversationId }: MagicFlowInputParams) {
  const conversationMessages = await db.messages.get({ where: { conversationId } })
  const ragResults = await vectordb.query(prompt)

  // The resolveContextToUse function (not shown) determines which context messages
  // to use based on the conversation history and the results of the vector database query.
  return resolveContextToUse({ conversationMessages, ragResults })
}

async function coreAgentCall({ messages, isFollowUp = false }) {
  // It isn't necessary to use a stream if you don't need it, but you can
  // optimistically react to the response from the agent while it is generating
  // content and reduce the final execution time by actively reacting to the
  // state of the stream.
  const completionStream = await coreAgent.completionStream({ messages })

  let final = {}

  // The for await...of loop processes the completion stream asynchronously,
  // enabling optimistic updates to the client while the agent is generating content.
  for await (const partial of completionStream) {
    // send to a websocket or pubsub channel or something.
    publishToClientStream(partial.content)
    final = partial
  }

  return final
}

async function handleActions({ action, actionParams }) {
  if (!action) return

  const { handler, sideEffect } = actionDefinitions[action]
  const result = await handler(actionParams)

  // The sideEffect property of an action definition determines whether the action
  // has a direct impact on the system or environment (e.g., updating a database),
  // or whether it returns a result to be used by the agent.
  if (sideEffect) return

  return `The result of the ${action} call is: ${JSON.stringify(result)}`
}

const inputParams: MagicFlowInputParams = {
  prompt: "what kind of shiny things do you sell?",
  conversationId: "conversation_123"
}

const messages = await getContextMessages(inputParams)
const agentResponse = await coreAgentCall({ messages })
const actionResult = await handleActions(agentResponse)

if (actionResult) {
  await coreAgentCall({
    isFollowUp: true,
    messages: [
      ...messages,
      { role: "assistant", content: agentResponse.content },
      { role: "system", content: actionResult }
    ]
  })
}
```

The MAGIC framework provides a powerful and flexible foundation for building LLM-powered automation systems. By defining clear roles and responsibilities for agents, actions, and delegation, MAGIC enables you to create complex, adaptable systems that can handle a wide range of tasks and scenarios.

However, it's important to note that MAGIC is not a silver-bullet solution. The effectiveness of a MAGIC-based system will depend heavily on the quality of the LLMs used, the design of the agent identities and action definitions, and the overall architecture of the system.

#### Example Use Case: Automated Task Management

In a scenario like automated task management, the MAGIC system could use a core agent to assess the tasks at hand and delegate specific actions to other agents designed to handle those tasks. For example, the core agent might delegate a task like "schedule a meeting with John" to a calendar management agent, which would then handle the specifics of finding an available time slot and sending out the meeting invitation (a rough sketch follows after these use cases).

#### Example Use Case: Conversational Agent

In a conversational agent scenario, MAGIC would manage dialogue flow, content generation, and context retention, dynamically adjusting responses and actions based on the conversation's evolution and external data inputs. For instance, if a user asks about a specific product, the core agent could delegate the task of retrieving product information to a specialized product catalog agent, which would then return the relevant details to be incorporated into the core agent's response.
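As a rough illustration of the task-management case: a delegated action can simply wrap another agent. This is a sketch under stated assumptions - `calendarAgent` is assumed to be another agent built with the same `createAgent` utility shown above, and the action name and params are hypothetical:

```typescript
import { z } from "zod"

import type { AgentInstance } from "../"

// Assume a calendar delegate built with the createAgent utility above -
// its identity prompt and response schema are omitted for brevity.
declare const calendarAgent: AgentInstance

// Hypothetical params, following the same discriminated-union pattern as
// updateUserParams / getThingsParams above.
export const scheduleMeetingParams = z.object({
  action: z.literal("SCHEDULE_MEETING"),
  attendee: z.string().describe("who the meeting is with"),
  topic: z.string().describe("what the meeting is about")
})

export const scheduleMeetingDefinition = {
  handler: async function ({ attendee, topic }: z.infer<typeof scheduleMeetingParams>) {
    // Delegation: the handler itself is just a call to another agent, which
    // resolves an available slot and drafts the invite.
    return await calendarAgent.completion({
      messages: [
        {
          role: "user",
          content: `Schedule a meeting with ${attendee} about: ${topic}`
        }
      ]
    })
  },
  sideEffect: false,
  description: "delegate scheduling requests to the calendar management agent",
  example: `
    [user]: can you set up a meeting with John about the Q3 roadmap?
    // assistant response:
    { content: "Sure - setting that up now.", action: "SCHEDULE_MEETING", actionParams: { action: "SCHEDULE_MEETING", attendee: "John", topic: "Q3 roadmap" } }
  `
}
```

Because `sideEffect` is false, the orchestration agent receives the delegate's result and folds it into its reply via the same follow-up loop shown in the simplified example.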
## Agentically Automating the Future

With a clear, well-defined, and structured approach to agent creation, task delegation, and action execution, MAGIC demonstrates the power of simple, thoughtful design in systems that leverage the capabilities of LLMs to solve real-world problems.

So, as we continue to build agentic systems and ponder the meaning of agency in the context of AI, let's remember that the success of our "agentic" systems will be measured not by the cleverness of their names, but by their ability to make a difference in the lives of the people they serve. So let's build with that in mind, and let the results speak for themselves.

If you're interested in experimenting with the MAGIC framework or have ideas for how it could be improved, I encourage you to check out the code and share your thoughts.

---

At [Novy](https://novy.ai), we are building LLM-powered solutions for clients across a wide range of industries. Our team is constantly learning, experimenting with new techniques, and exploring innovative ideas. MAGIC is based on a high-level architecture we have employed successfully in many projects. In the future, we may open-source more of the components we use to build our systems, allowing the community to benefit from our experience and contribute to the development of advanced LLM-powered automation solutions.

If you're interested in discussing how these systems can evolve, tools you wish existed, or exploring what's possible with LLM-powered automation, please reach out [@dimitrikennedy](https://twitter.com/dimitrikennedy) - I'm always excited to connect and share ideas.

----
The Semantics of Intuition and Communication
Designing systems that naturally understand and meet our expectations.
Published on April 13, 2024

> "The last ever dolphin message was misinterpreted as a surprisingly sophisticated attempt to do a double-backwards-somersault through a hoop whilst whistling the 'Star Spangled Banner,' but in fact the message was this: 'So long and thanks for all the fish.'"
> ― Douglas Adams, The Hitchhiker's Guide to the Galaxy

This captures one of the biggest challenges in both human-to-human and human-to-machine interactions: the potential for misinterpretation. In the context of building AI interfaces, it is a reminder of the complexities involved not just in interpreting data, but in grasping the nuanced layers of language, semantics, and human intent.

We should aim to bridge the semantic gap between human intuition and machine processing, designing our systems not only to analyze and generate insights but also to contextualize and understand them in ways that resonate with human semantics and our subjective experience. We should acknowledge the inherent challenges in communication and interpretation, striving to create systems that can more accurately 'understand' and respond to the nuanced and often ambiguous nature of human thought and language. By deepening our focus on semantic understanding, we aim to facilitate more effective communication, both between humans and machines, and potentially extending to improve human-to-human interactions through machine mediation.

To emulate human intuition more closely, our systems must navigate the intricate landscape of semantics, where meaning is not only often implicit or open to interpretation, but is also universally shaped by individual experience and perspective. To overcome these challenges we have to think beyond our current methods of data processing. We aspire to a level of semantic analysis and interaction that brings us closer to bridging the communication gap, enhancing the connection between human cognitive processes and the systems we design for ourselves.
Embracing the Ingenuity of Fools
Designing intuitive systems by embracing human complexity and 'foolishness'
Published on April 12, 2024

> "A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools."
> - Douglas Adams, Mostly Harmless

In the quest to create systems that understand or simulate human intuition, we have to acknowledge the complexity and unpredictability of human nature. We are all fools—limited by our perspectives and subject to misinterpretations and oversights. This realization is not a pessimistic resignation but a guiding principle in my work and relationships.

Acknowledging our 'foolishness' is the first step in designing systems that are not just technically sophisticated but also fundamentally aligned with the intricate tapestry of human cognition and interaction. It implies that while striving to build systems that enhance decision-making and intuition, we must design with humility, accepting the limitations and embracing the unpredictability inherent in human-like intuition.

Incorporating this understanding means designing systems that are adaptable and resilient, capable of evolving with the fluid dynamics of human thought processes. The goal is to build reflective, learning entities that recognize and adapt to the 'foolishness' they seek to understand and replicate - systems that come close to mirroring the nuanced and often paradoxical nature of human intuition and decision-making. It's an acknowledgment that in the realm of creating intuitive systems, we are navigating an uncertain and complex world, where the goal is not to eliminate foolishness but to understand and engage with it constructively.
Building Instructor js
Building the new js port of the popular python lib
Published on January 4, 2024

Recently, I stumbled upon a tweet from the [creator of Instructor](https://twitter.com/jxnlco), a Python library that has a great community. They were on the hunt for someone to craft the JavaScript version. The mission and vision align with what I have been working towards on my own, so I reached out and started building.

I built most of Instructor on top of some of the tools I have been working on this year - enabling partial JSON streaming and managing structured output with Zod. Instructor has a nice clean API and an existing community that I am excited to start working with. Instructor is similar to what I did with "schema agents" in my agents package - but focused on structured extraction.

"Structured extraction in Typescript, powered by llms, designed for simplicity, transparency, and control."

The Instructor instance is a proxy directly to the OpenAI SDK - it only patches the chat completion call with a few new options: a response_model (a Zod schema), and, on instance initialization, a "mode" - which determines whether to coerce the response to JSON via a prompt, a function call, or a function call via tools. The simplicity and the focus on staying close to the SDK make it approachable and clear. The project fits well within the other work I have been doing, so I am excited to keep contributing and work it into my stack.

I was able to reuse a lot of the pre-existing tools I had written in the base Instructor instance, and to enable a powerful streaming mode using [schema-stream](https://www.npmjs.com/package/schema-stream).

The GitHub repo is here: [instructor-js](https://github.com/jxnl/instructor-js)

### Basic streaming example with instructor-js

<Video playbackId="s1lYluB22pk6yp9OJ00SVDdCW8Vrf1bxrm7v00HLg2aAM" />

---

Define a zod schema

```tsx
import { z } from "zod"

export const ExtractionValuesSchema = z.object({
  users: z
    .array(
      z.object({
        name: z.string(),
        handle: z.string(),
        twitter: z.string()
      })
    )
    .min(5),
  date: z.string(),
  location: z.string(),
  budget: z.number(),
  deadline: z.string().min(1)
})

export type Extraction = Partial<z.infer<typeof ExtractionValuesSchema>>
```

make a completion call

```tsx
import Instructor from "@instructor-ai/instructor"
import OpenAI from "openai"

import { ExtractionValuesSchema, Extraction } from "./schema"

const textBlock = `
  In our recent online meeting, participants from various backgrounds joined to discuss the upcoming tech conference. The names and contact details of the participants were as follows:

  - Name: John Doe, Email: johndoe@email.com, Twitter: @TechGuru44
  - Name: Jane Smith, Email: janesmith@email.com, Twitter: @DigitalDiva88
  - Name: Alex Johnson, Email: alexj@email.com, Twitter: @CodeMaster2023
  - Name: Emily Clark, Email: emilyc@email.com, Twitter: @InnovateQueen
  - Name: Ron Stewart, Email: ronstewart@email.com, Twitter: @RoboticsRon5
  - Name: Sarah Lee, Email: sarahlee@email.com, Twitter: @AI_Aficionado
  - Name: Mike Brown, Email: mikeb@email.com, Twitter: @FutureTechLeader
  - Name: Lisa Green, Email: lisag@email.com, Twitter: @CyberSavvy101
  - Name: David Wilson, Email: davidw@email.com, Twitter: @GadgetGeek77
  - Name: Daniel Kim, Email: danielk@email.com, Twitter: @DataDrivenDude

  During the meeting, we agreed on several key points. The conference will be held on March 15th, 2024, at the Grand Tech Arena located at 4521 Innovation Drive. Dr. Emily Johnson, a renowned AI researcher, will be our keynote speaker.
  The budget for the event is set at $50,000, covering venue costs, speaker fees, and promotional activities. Each participant is expected to contribute an article to the conference blog by February 20th. A follow-up meeting is scheduled for January 25th at 3 PM GMT to finalize the agenda and confirm the list of speakers.
`

const oai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY ?? undefined,
  organization: process.env.OPENAI_ORG_ID ?? undefined
})

const client = Instructor({
  client: oai,
  mode: "TOOLS"
})

const extractionStream = await client.chat.completions.create({
  messages: [{ role: "user", content: textBlock }],
  model: "gpt-4",
  response_model: {
    schema: ExtractionValuesSchema,
    name: "Extraction"
  },
  max_retries: 3,
  stream: true
})

let extraction: Extraction = {}

for await (const result of extractionStream) {
  try {
    extraction = result
    console.clear()
    console.table(extraction)
  } catch (e) {
    console.log(e)
    break
  }
}

console.clear()
console.log("completed extraction:")
console.table(extraction)
```

return a completion stream from an api route

```tsx
import Instructor from "@instructor-ai/instructor"
import OpenAI from "openai"

import { ExtractionValuesSchema } from "./schema"

// Convert the async generator returned by Instructor into a web ReadableStream
// (available as a global in the edge runtime - no import needed).
function asyncGeneratorToReadableStream(generator) {
  const encoder = new TextEncoder()

  return new ReadableStream({
    async start(controller) {
      for await (const parsedData of generator) {
        controller.enqueue(encoder.encode(JSON.stringify(parsedData)))
      }

      controller.close()
    },
    cancel() {
      // Close the underlying generator if the client disconnects.
      generator.return?.()
    }
  })
}

export const runtime = "edge"

export async function POST(request: Request): Promise<Response> {
  const { messages, prompt } = await request.json()

  const oai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY ?? undefined,
    organization: process.env.OPENAI_ORG_ID ?? undefined
  })

  const client = Instructor({
    client: oai,
    mode: "TOOLS"
  })

  const extractionStream = await client.chat.completions.create({
    messages: [...messages, { role: "user", content: prompt }],
    model: "gpt-4",
    response_model: {
      schema: ExtractionValuesSchema,
      name: "Extraction"
    },
    max_retries: 3,
    stream: true
  })

  const stream = asyncGeneratorToReadableStream(extractionStream)

  return new Response(stream)
}
```
Breaking Free from the Chat Box
Challenges of chat interfaces and AI tooling.
Published on January 1, 2024

Chat interfaces are now a staple in the software world— I helped pave that road during my 6 years at Drift. We campaigned hard to prove chat wasn't just for customer service or casual conversations, championing chat over forms with our "no forms" campaign. Fast forward to now, and chat's more versatile than ever, breathing life into various software and user experiences.

Enter ChatGPT, Midjourney, and a slew of other chat-first AI tools. They've elevated chat from a simple communication tool to the control center for powerful AI capabilities. But it's come at a cost: while chat is a solid starting point, it's also a creative bottleneck. It's not that it's the wrong tool for the job—it's more that our reliance on chat stifles our imagination. We're so accustomed to interacting with these advanced AI systems through chat that it's hard to envision other, potentially richer, ways to engage.

### Chat's Limitations in UX and Technical Scope

While chat's ubiquitous presence makes it an easy go-to, its simplicity is also its downfall. We're so used to chat that it's like we're wearing blinders, focusing only on a narrow pathway of interaction. This leads to a one-size-fits-all approach, where unique opportunities for richer, more engaging user experiences are overlooked. Chat has its own language, a set of rules we've internalized so deeply that we forget other languages exist—languages that could offer users far more nuanced interactions.

### The Time-Complexity Dilemma

Chat's format struggles with complex, structured data—think of it as the linear time complexity of UX, good for quick operations but lacking when you need to scale the conversation. While dropping in graphs or videos is possible, a series of message boxes doesn't always cut it. Waiting for a response from GPT-4 can take 30 seconds or more, and in UX, as in algorithms, efficiency matters.

When you're working with tools like OpenAI's Chat API, time can stretch out. A complex prompt may require tens of seconds for a response. Streaming text or markdown alleviates the wait time, a crucial UX improvement. However, the limitation lies in the challenge of merging this real-time benefit with structured JSON or richer data types. The streaming approach, while efficient, doesn't easily support the simultaneous delivery of such data. And there's the rub: we've solved one UX problem but inadvertently narrowed our options for richer, multi-layered interactions. Time isn't just money; it's also user engagement.

### The Cycle of Constraints and Creative Exploration

The limitations of existing AI tooling are not mere inconveniences; they set boundaries on what's possible and, more importantly, what can be easily imagined. While workarounds are possible, they can deter creators from truly exploring the full potential of these technologies. In constraining our tools, we may unintentionally be constraining our creative capabilities as well.

I started to encounter firsthand the challenges in crafting what I envisioned using just the OAI SDK or LangChain. The need for a tool that could efficiently handle complex, structured data and offer more than just string responses was apparent. It wasn't just about managing prompts and state; it was about envisioning an agent instance with a solid identity and a dependable response model, accessible on every request.
Driven by these needs, I embarked on creating a suite of utilities. This began with a lightweight wrapper around the OAI SDK, allowing the definition of "agents" and a "schema agent." The latter simplified interactions by integrating a Zod schema, managing jsonSchema creation, and handling function calls. This development was not just about streaming text; it was about streaming objects and arrays, and accessing these data structures as soon as the stream began.

Furthermore, to handle real-time data interaction, I built a library capable of parsing streaming JSON, populating a pre-stubbed data structure as the data arrives. This approach allows for immediate use of the streamed data, enabling more dynamic and responsive user experiences.

Finally, I tied these components together with a set of React hooks designed for managing the streaming connection, making requests to endpoints, and utilizing schema-stream for instant data availability.

---

## Usage with my current toolkit

Below is an example usage in a Next.js application - from defining an agent, to setting up a route, to consuming it all on the client.

A working demo you can play with here: [Demo - JSON Stream dashboard](https://dashboard-demo.novy.work/)

Packages on npm:
[zod-stream](https://www.npmjs.com/package/zod-stream)
[stream-hooks](https://www.npmjs.com/package/stream-hooks)

built using: [schema-stream](https://www.npmjs.com/package/schema-stream)

[docs for all](https://island.novy.work)

<br />

### Basic streaming example with zod-stream and stream-hooks

Defining a schema (response model)

```tsx
// ./schema.ts (defined in a separate file since we import it
// into both a client component and a server context)
import z from "zod"

export const coreAgentSchema = z.object({
  listOfReasonsWhyJavascriptIsBetterThenPython: z
    .array(
      z.object({
        name: z.string(),
        description: z.string()
      })
    )
    .min(10),
  listOfReasonsWhyPythonIsBetterThenJavascript: z
    .array(
      z.object({
        name: z.string(),
        description: z.string()
      })
    )
    .min(1),
  finalSummary: z.string(),
  pointsForPython: z.number().min(0).max(100),
  pointsForJavascript: z.number().min(0).max(100)
})
```
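To make the streaming behavior concrete, here is roughly what a partial result for this schema can look like while the stream is still in flight. This is an illustrative sketch, not captured output, and the defaults used for not-yet-streamed primitives depend on schema-stream's configuration:

```tsx
// Illustrative partial snapshot mid-stream (values invented for this sketch).
// schema-stream stubs the full structure up front and fills it in as tokens
// arrive, so every key already exists and the object is always safe to render.
const partialResult = {
  listOfReasonsWhyJavascriptIsBetterThenPython: [
    { name: "Ubiquity", description: "One language across the stack, from brow" } // mid-token
  ],
  listOfReasonsWhyPythonIsBetterThenJavascript: [],
  finalSummary: null,
  pointsForPython: null,
  pointsForJavascript: null
}
```

This is why the client can start rendering lists and counters immediately instead of waiting for the final token.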
Defining an agent

```tsx
import { createAgent } from "@hackdance/agents"
import { oai } from "@/lib/oai" // pre-configured OpenAI client

import { coreAgentSchema } from "./schema"

export const primaryIdentity = `
  You are an ai agent tasked with debating the merits of Python and Javascript.
`

export const coreAgent = createAgent({
  client: oai,
  config: {
    messages: [
      {
        role: "system",
        content: primaryIdentity
      }
    ],
    model: "gpt-4-1106-preview",
    temperature: 0.5,
    max_tokens: 500
  },
  response_model: {
    schema: coreAgentSchema,
    name: "structured agent response"
  }
})
```

Setting up a route to create the completion and stream

```tsx
import { coreAgent } from "@/agents/example"

export async function POST(request: Request): Promise<Response> {
  const { messages } = await request.json()

  try {
    const stream = await coreAgent.completionStream({ messages })

    return new Response(stream)
  } catch (e) {
    return new Response("Something went wrong", { status: 500 })
  }
}
```

Using the hooks to consume the stream and start rendering content ASAP

```tsx
import { useState } from "react"
import { useJsonStream } from "stream-hooks"

import { coreAgentSchema } from "@/ai/agents/example/schema"

export function StreamTest() {
  const [result, setResult] = useState({})
  const prompt = "Python or Javascript - which is better?"

  const { startStream, stopStream, loading } = useJsonStream({
    schema: coreAgentSchema,
    onReceive: data => {
      setResult(data)
    }
  })

  const go = async () => {
    try {
      await startStream({
        url: "/api/ai/chat",
        body: {
          messages: [{ role: "user", content: prompt }]
        }
      })
    } catch (e) {
      stopStream()
    }
  }

  return (
    <div>
      <div>{JSON.stringify(result)}</div>
      <button onClick={go} disabled={loading}>
        Go
      </button>
    </div>
  )
}
```

<br />

---

<br />