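Photo by Arseny Togulev on Unsplash
Over the past few months, we have all heard about agents and multi-agent frameworks. These AI agents have become the unsung heroes of automation and decision-making.
While off-the-shelf frameworks like AutoGen and CrewAI offer tempting shortcuts (and rightly so!), building your own agent from scratch brings an unmatched thrill and a much deeper understanding.
It’s like choosing between instant ramen and cooking a gourmet meal: the former is quick, but the latter is where the real magic happens.
Today, we’re going to roll up our sleeves and dive into the nitty-gritty of creating AgentPro, our very own AI assistant. By the end of this article, you’ll have a foundational understanding of how AI agents work, and you’ll be able to create a digital companion that can generate and execute code on demand.
It’s like teaching a robot to fish, except instead of fish, it’s pulling Python scripts out of thin air!
Caution: this code might not work in all cases, but it should help you get started. Indentation errors may also have crept into the code.
The code is also provided as a Colab Notebook.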
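Building Blocks: A Roadmap to AgentPro
Before we dive into the code, let’s outline the key components we’ll be constructing:
The 5 stages of developing an agent from scratch (image by author)
- Initialization: Setting up our agent’s “brain”
- Code Generation: Teaching our agent to write Python scripts
- Library Management: Enabling our agent to install the tools it needs
- Code Execution: Giving our agent the ability to run the code it generates
- Command Center: Creating a central hub to manage all of these functions
Now, let’s break down each of these steps and see how they come together to form our AI assistant.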
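Step 1: Initialization - Giving Our Agent Its First Spark of Life
Every great journey begins with a first step, and in the world of AI agents, that step is initialization. This is where we set up the basic structure of our agent and connect it to its primary source of intelligence, in this case the OpenAI API.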
from openai import OpenAI
import os
from google.colab import userdata  # Colab-only: reads secrets stored with the notebook
import base64
import requests
from PIL import Image
from io import BytesIO
import subprocess
import tempfile
import re
import importlib
import sys

# Expose the key stored in Colab's secrets to the OpenAI client
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

class AgentPro:
    def __init__(self):
        # Future initialization code can go here
        pass
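This snippet is like giving life to our AI assistant. We import the necessary libraries, set up our OpenAI API key, and create the skeleton of our AgentPro class. Think of it as giving our AI a body: not very useful on its own, but essential for everything that follows.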
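If you’re running outside Colab, `google.colab.userdata` won’t be available. Here is a minimal sketch of one way to read the key from an ordinary environment variable instead. The helper name `load_api_key` and its `environ` parameter are our own inventions for illustration; they are not part of the OpenAI SDK.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY", environ=None):
    """Return the API key from the environment, or raise a clear error if it is missing.

    `environ` is a hypothetical hook that lets us pass a plain dict while experimenting.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before creating the agent.")
    return key

# Example with a made-up key; in practice you would call load_api_key() with no arguments.
print(load_api_key(environ={"OPENAI_API_KEY": "sk-example"}))
```

Failing early with a readable message tends to be friendlier than letting the first API call fail with an authentication error.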
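Step 2: Code Generation - Teaching Our Agent to Write Python
Now that our agent has a “body,” let’s give it the ability to think, or in this case, to generate code. This is where things start to get exciting!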
    def generate_code(self, prompt):
        client = OpenAI()
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a Python code generator. Respond only with executable Python code, no explanations or comments except for required pip installations at the top."},
                {"role": "user", "content": f"Generate Python code to {prompt}. If you need to use any external libraries, include a comment at the top of the code listing the required pip installations."}
            ],
            max_tokens=4000,
            temperature=0.7,
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0
        )
        # Strip Markdown code fences from the response
        code = re.sub(r'^```python\n|^```\n|```$', '', response.choices[0].message.content, flags=re.MULTILINE)
        # Drop any leading chatter before the first import or comment line
        code_lines = code.split('\n')
        while code_lines and not (code_lines[0].startswith('import') or code_lines[0].startswith('from') or code_lines[0].startswith('#')):
            code_lines.pop(0)
        return '\n'.join(code_lines)
This method is the crown jewel of our agent’s capabilities. It’s using the OpenAI API to generate Python code based on a given prompt.
Think of it as giving our agent the ability to brainstorm and write code on the fly. We’re also doing some cleanup to ensure we get clean, executable Python code without any markdown formatting or unnecessary comments.
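To see the cleanup step in isolation, here is a small sketch that applies the same regex and leading-line trimming to a made-up model response. The function name `strip_markdown_fences` and the sample text are ours, and the fence marker is built with `"`" * 3` only so that this example does not contain a literal fence.

```python
import re

FENCE = "`" * 3  # the three-backtick Markdown fence marker

def strip_markdown_fences(text):
    """Remove fence lines, then drop anything before the first import or comment line."""
    pattern = rf'^{FENCE}python\n|^{FENCE}\n|{FENCE}$'
    cleaned = re.sub(pattern, '', text, flags=re.MULTILINE)
    lines = cleaned.split('\n')
    while lines and not lines[0].startswith(('import', 'from', '#')):
        lines.pop(0)
    return '\n'.join(lines)

# A made-up response with chatter before the code and fences around it
sample = (
    f"Here is the code:\n{FENCE}python\n"
    "# pip install requests\nimport requests\nprint('hi')\n"
    f"{FENCE}"
)
print(strip_markdown_fences(sample))
```

Note that the trimming loop assumes the real code starts with an import or a comment; a response that opens with, say, a bare `print(...)` would be trimmed away entirely, so treat this as a heuristic.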
The parameters we’re using (like temperature and top_p) allow us to control the creativity and randomness of the generated code. It’s like adjusting the “inspiration” knob on our AI’s imagination!
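For some intuition about what the temperature knob does, here is a textbook-style sketch of temperature-scaled softmax over a few made-up token scores. This is the standard way temperature is usually explained, not a description of how the OpenAI service samples internally; the function name and the scores are purely illustrative.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature nearly all the probability lands on the top-scoring token, which is why lower values tend to give more predictable code, while higher values spread the probability out.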
Step 3: Library Management — Equipping Our Agent with the Right Tools
Every good coder knows the importance of having the right libraries at their disposal. Our AI assistant is no different. This next method allows AgentPro to identify and install any necessary Python libraries:
    def install_libraries(self, code):
        # Look for "# pip install <package>" comments in the generated code
        libraries = re.findall(r'#\s*pip install\s+([\w-]+)', code)
        if libraries:
            print("Installing required libraries...")
            for lib in libraries:
                try:
                    # Heuristic: assume the import name is the package name with '-' replaced by '_'
                    importlib.import_module(lib.replace('-', '_'))
                    print(f"{lib} is already installed.")
                except ImportError:
                    print(f"Installing {lib}...")
                    subprocess.check_call([sys.executable, "-m", "pip", "install", lib])
            print("Libraries installed successfully.")
This method is like sending our agent on a shopping spree in the Python Package Index. It scans the generated code for any pip install comments, checks if the libraries are already installed, and if not, installs them. It’s ensuring our agent always has the right tools for the job, no matter what task we throw at it.
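Two rough edges are worth knowing about: the regex above captures only the first package on each comment line, and some packages import under a different name than the one you install (python-pptx imports as `pptx`). Here is a sketch of one possible refinement. The helper names and the `IMPORT_NAME_OVERRIDES` table are hypothetical, and the table is nowhere near complete.

```python
import importlib.util
import re

# Hypothetical lookup for packages whose import name differs from the pip name.
IMPORT_NAME_OVERRIDES = {"python-pptx": "pptx", "pillow": "PIL", "beautifulsoup4": "bs4"}

def required_packages(code):
    """Collect every package named in '# pip install ...' comments, in order, without duplicates."""
    packages = []
    for args in re.findall(r'^\s*#\s*pip install\s+(.+)$', code, flags=re.MULTILINE):
        for name in args.split():
            if not name.startswith('-') and name not in packages:  # skip flags like -U
                packages.append(name)
    return packages

def is_importable(package):
    """Best-effort check that a package can be imported, without actually importing it."""
    base = re.split(r'[<>=!~\[]', package, maxsplit=1)[0]  # drop version pins and extras
    import_name = IMPORT_NAME_OVERRIDES.get(base.lower(), base.replace('-', '_'))
    return importlib.util.find_spec(import_name) is not None

generated = "# pip install python-pptx requests\n# pip install requests\nimport pptx\n"
print(required_packages(generated))
print(is_importable("json"))
```

Using `importlib.util.find_spec` checks availability without running the module's import-time code, which can matter when the package is large.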
Step 4: Code Execution — Bringing the Code to Life
Generating code is great, but executing it is where the rubber meets the road. This next method allows our agent to run the code it has generated:
    def execute_code(self, code):
        # Write the generated code to a temporary .py file
        with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as temp_file:
            temp_file.write(code)
            temp_file_path = temp_file.name
        try:
            # Run with the same interpreter that pip installed the libraries into
            result = subprocess.run([sys.executable, temp_file_path], capture_output=True, text=True, timeout=30)
            output = result.stdout
            error = result.stderr
        except subprocess.TimeoutExpired:
            output = ""
            error = "Execution timed out after 30 seconds."
        finally:
            # Always remove the temporary file, even on timeout
            os.unlink(temp_file_path)
        return output, error
This method is where the magic really happens. It takes the generated code, writes it to a temporary file, executes it, captures the output (or any errors), and then cleans up after itself. It’s like giving our agent hands to type out the code and run it, all in the blink of an eye.
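You can try the same write, run, capture, and clean-up cycle on its own with a standalone function. The sketch below is a variation rather than the method above: the name `run_snippet` is ours, and it also returns the process exit code, since some programs write warnings to stderr even when they succeed.

```python
import os
import subprocess
import sys
import tempfile

def run_snippet(code, timeout=30):
    """Write code to a temp file, run it, and return (stdout, stderr, returncode)."""
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False, encoding='utf-8') as temp_file:
        temp_file.write(code)
        path = temp_file.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=timeout)
        return result.stdout, result.stderr, result.returncode
    except subprocess.TimeoutExpired:
        # No exit code is available when the process had to be killed
        return "", f"Execution timed out after {timeout} seconds.", None
    finally:
        os.unlink(path)  # clean up whether the run succeeded, failed, or timed out

out, err, status = run_snippet("print(6 * 7)")
print(repr(out), status)

bad_out, bad_err, bad_status = run_snippet("raise ValueError('boom')")
print(bad_status, bad_err.strip().splitlines()[-1])
```

A non-zero exit code is a more reliable failure signal than checking whether stderr is empty.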
Step 5: Command Center — Putting It All Together
Finally, we need a way to orchestrate all these amazing capabilities. Enter the run method:
    def run(self, prompt):
        print(f"Generating code for: {prompt}")
        code = self.generate_code(prompt)
        print("Generated code:")
        print(code)
        # Install anything listed in the "# pip install" comments before running
        self.install_libraries(code)
        print("\nExecuting code...")
        output, error = self.execute_code(code)
        if output:
            print("Output:")
            print(output)
        if error:
            print("Error:")
            print(error)
This is the command center of our AI assistant. It takes a prompt, generates the code, executes it, and reports back with the results or any errors. It’s like having a personal assistant who not only understands your requests but carries them out and gives you a full report.
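If you want to try the orchestration flow without spending API credits, one option is to pass the generate and execute steps in as plain functions and swap in a stub generator. Everything in this sketch (`run_pipeline`, `fake_generate`, `execute_inline`) is a hypothetical stand-in for illustration, not part of AgentPro.

```python
import subprocess
import sys

def run_pipeline(prompt, generate, execute):
    """Generate code for a prompt, execute it, and return a small report dict."""
    code = generate(prompt)
    output, error = execute(code)
    return {"prompt": prompt, "code": code, "output": output, "error": error}

def fake_generate(prompt):
    # Stub standing in for the OpenAI call; it ignores the prompt on purpose.
    return "print('hello from generated code')"

def execute_inline(code):
    # Run the code in a fresh interpreter and capture both output streams.
    result = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True, timeout=30)
    return result.stdout, result.stderr

report = run_pipeline("say hello", fake_generate, execute_inline)
print(report["output"])
```

Returning a report instead of printing makes it easier to add features later, such as retrying with the error message fed back into the prompt.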
Putting It All Together:
Now that we have all our components, let’s see how we can use our newly minted AI assistant:
if __name__ == "__main__":
    agent = AgentPro()
    agent.run("""make a detailed deck on the best forms of leadership with at
least 10 slides and save it to a pptx called leadership.pptx""")
With this simple command, we’re asking our agent to create a full presentation on leadership styles, complete with at least 10 slides, and save it as a PowerPoint file.
Our agent will generate the necessary Python code (likely using a library like python-pptx), install any required libraries, execute the code to create the presentation, and then report back with the results or any errors encountered.
We’ve just built the foundation of a powerful AI agent capable of generating and executing Python code on demand. From setting up its “brain” with the OpenAI API, to giving it the power to write and run code, to equipping it with the ability to install necessary tools, we’ve created a versatile digital assistant.
This is just the beginning of what’s possible with custom AI agents. In future installments, we’ll explore how to enhance AgentPro with web searching capabilities, image generation, and even more complex decision-making processes.
Remember, with great power comes great responsibility. Your new AI assistant is a powerful tool, but it’s up to you to guide it wisely. Use it to automate tedious tasks, explore new ideas, and push the boundaries of what’s possible with AI.
Just maybe don’t ask it to write your wedding vows or decide on your next career move — some things are still best left to human intuition!
Stay tuned for Part B, where we’ll teach our agent some new tricks and start to unlock its true potential. Until then, happy coding, and may your AI adventures be bug-free and endlessly exciting!
Follow for Part B!
If you are interested in learning more about this content, please subscribe. You can also connect with me on [LinkedIn](https://www.linkedin.com/in/hamzafarooq/).
About me
Hi! I am Hamza, and I’m thrilled to be your guide on this exciting journey into the world of AI agents. With a background as a Senior Research Scientist at Google and teaching experience at prestigious institutions like Stanford and UCLA, I’ve been at the forefront of AI development and education for years. My passion lies in demystifying complex AI concepts and empowering the next generation of AI practitioners.
Speaking of which, if you’ve enjoyed this deep dive into building AI agents from scratch, you might be interested in taking your LLM knowledge to the next level. I’ve recently developed a comprehensive course titled [Enterprise RAG and Multi-Agent Applications](https://maven.com/boring-bot/advanced-llm) on the MAVEN platform. This course is tailored for practitioners who want to push the boundaries of what’s possible with Large Language Models, especially in enterprise settings.
In Enterprise RAG and Multi-Agent Applications, we explore cutting-edge techniques that go beyond the basics. From advanced Retrieval-Augmented Generation (RAG) solutions to the latest methods in model optimization and responsible AI practices, this course is designed to equip you with the skills needed to tackle real-world AI challenges.
Whether you’re looking to implement state-of-the-art LLM applications or dive deep into the intricacies of model fine-tuning and ethical AI deployment, this course has got you covered.