Introduction
The capabilities of large language models (LLMs) are advancing rapidly, enabling us to build a variety of LLM applications, from task automation to workflow optimization. One exciting application is using LLMs to create an intelligent news digest or newsletter agent. Such an agent can pull in relevant content, summarize it, and deliver it in a customized format, interacting dynamically with external tools and data sources to fetch relevant information. In this article, let us learn how to build a news digest agent for a personalized daily news digest with LangGraph and external tools like the News API.
Overview
- Understand the architecture of LangGraph and its key components (State, Nodes, and Edges) to build customizable workflow agents.
- Learn how to integrate external APIs like NewsAPI to fetch real-time news for dynamic content generation in newsletters.
- Develop the skills to use LLMs for content evaluation by implementing a scoring system that ranks news articles based on quality criteria.
- Gain practical knowledge of automating email delivery with curated content using Python's email-sending libraries.
A Brief Overview of LangGraph
LangGraph is built on top of LangChain. It is a framework designed for building dynamic workflows that integrate LLMs with custom logic and tools. This allows for highly customized and complex workflows that combine multiple tools and APIs.
LangGraph consists of three core components:
- State: The State contains the data that is shared throughout the application. It can be any Python data structure capable of holding the data. We can define it using a State object with different parameters. Alternatively, we can also use the pre-built MessagesState, which holds only a list of messages.
- Nodes: Nodes are functions that can read and modify the State. These functions take the State as their first argument to read from or write to it. There is also a START node, which denotes the node that takes the user input and is called first, and an END node, which denotes the end of the graph.
- Edges: Edges define the flow of data through the different nodes. There are also conditional edges, which use a function to determine which node to go to next. The advantage of LangGraph is that we can customize the agent in many ways, so there can be more than one way to build this agent.
As shown in the image, edges connect nodes, and nodes read or write the data in the State.
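Before wiring in any LLMs, the State/Node/Edge idea can be sketched in plain Python. This is a conceptual, dependency-free sketch of the pattern, not the LangGraph API itself; the node names and keys are made up for illustration.

```python
# Dependency-free sketch: the State is a dict, Nodes are functions that read
# and update it, and Edges are simply the order in which nodes run.
State = dict

def fetch(state: State) -> State:
    # Node: reads 'topic' from the State and writes 'articles'
    state["articles"] = [f"article about {state['topic']}"]
    return state

def score(state: State) -> State:
    # Node: reads 'articles' and writes 'best'
    state["best"] = state["articles"][0]
    return state

# Edges: START -> fetch -> score -> END, expressed as an ordered pipeline
def run(state: State, nodes=(fetch, score)) -> State:
    for node in nodes:
        state = node(state)
    return state

print(run({"topic": "cricket"})["best"])  # article about cricket
```

LangGraph formalizes exactly this pattern, adding conditional routing, prebuilt message states, and tool integration on top.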
Prerequisites
Before we start building the LLM agent, let's make sure we have the required keys and passwords.
Accessing an LLM via API
Begin by generating an API key for the LLM you are using. Create a text file named '.env'. Store the key in this .env file to keep it private and easily accessible within your project.
Here's an example of what a .env file looks like:
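For instance, the file might contain entries like the following. The key names match the environment variables read by the code later in this article; the values are placeholders you replace with your own keys.

```
OPENAI_API_KEY=sk-************************
NEWS_API_KEY=********************
GMAIL_PASSWORD=****************
```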
Fetching News Data
To gather news content, we will use https://newsapi.org/. Sign up for an API key and store it in the same .env file for secure access.
Sending the Email
To send email using Python, we can enable 'less secure apps' and store the Gmail password in the .env file. If that option is not available, we can gain access to Gmail by following the steps mentioned here.
Libraries Required
We have used the following versions of the major libraries:
- langchain – 0.2.14
- langgraph – 0.2.14
- langchain-openai – 0.1.14
- newsapi-python – 0.2.7
Define the Application Flow
The goal is to query the agent using natural language to gather news on a given topic and receive the newsletter via email. To implement this flow, we will first define three tools to handle each key task and then build the agent that calls the LLM and the tools.
The three tools are as follows:
- Fetching the News: The News API retrieves relevant news articles based on the parsed query.
- Scoring the News: The fetched articles are passed to another LLM, which evaluates and scores them for quality. The output is a list of articles sorted by their quality score.
- Delivering the News: The top-scoring articles are formatted into a readable email and sent to the user.
Now we can start defining the functions.
Get News
Import the required libraries and load the .env file:
import os
import json
import pandas as pd
from datetime import datetime, timedelta
from IPython.display import Image, display
from typing import List, Literal, Optional, TypedDict, Annotated
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv('/.env')
# as an alternative to the .env file, we can also use a .txt file as follows
with open('mykey.txt', 'r') as file:
    openai_key = file.read()
os.environ['OPENAI_API_KEY'] = openai_key
Initialize news_api from NewsApiClient with the API key:
from newsapi import NewsApiClient
NEWS_API_KEY = os.environ['NEWS_API_KEY']
news_api = NewsApiClient(api_key=NEWS_API_KEY)
Now let's define the LangChain tool using the 'tool' decorator from LangChain:
@tool
def get_news(query: str, past_days: int, domains: str):
    """
    Get news based on the given parameters like query, past_days, etc.
    Args:
        query: search news about this topic
        past_days: for how many days in the past should we search?
        domains: search news in these sources
    """
    today = datetime.today()
    from_date = today - timedelta(days=past_days)
    news_details = news_api.get_everything(q=query, from_param=from_date, domains=domains,
                                           sort_by='relevancy')
    return news_details
The agent can also sort the articles by relevancy. Here's an example of what the output of this function looks like:
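The exact articles depend on the query, but the dictionary returned by NewsAPI's `get_everything` endpoint has roughly the following shape. All field values below are illustrative placeholders, not real results.

```python
# Illustrative shape of a NewsAPI `get_everything` response; values are placeholders.
sample_response = {
    "status": "ok",
    "totalResults": 1,
    "articles": [
        {
            "source": {"id": None, "name": "Example Source"},
            "author": "Jane Doe",
            "title": "Example headline",
            "description": "One-line summary of the article.",
            "url": "https://example.com/article",
            "urlToImage": "https://example.com/image.jpg",
            "publishedAt": "2024-01-01T00:00:00Z",
            "content": "Truncated article content...",
        }
    ],
}
# the scoring step later reads the 'articles' list from this structure
print(len(sample_response["articles"]))  # 1
```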
The '@tool' decorator is used to define a LangChain tool. We can then bind this tool to the LLM. In the above function, the docstring is also important: it is passed to the LLM as part of the prompt so that the tool-calling LLM produces these arguments in its output.
# initialize the LLM
gpt = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# we can bind the tool to the LLM so that the LLM can return the tool call based on the query.
gpt_with_tools = gpt.bind_tools([get_news])
Score News
The score_news function processes news articles by scoring them based on predefined criteria. The function then returns a sorted list of the highest-quality articles.
Import the required methods:
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.messages import HumanMessage
Let us define the function:
def score_news(news_details: dict):
    """
    Calculate a score for the news articles and sort them by the score.
    news_details: all the news articles
    """
    # access the last message of the state for the articles.
    # passing all the articles to the LLM will increase the cost.
    # we can choose to score only some articles.
    json_articles = json.loads(news_details['messages'][-1].content)['articles']
    if len(json_articles) > 15:
        articles = json_articles[:15]
    else:
        articles = json_articles
    # system prompt to guide the LLM in scoring the articles.
    system_prompt = """
    You are a news quality evaluator.
    I will provide you with a news article, with a title, description, truncated content, and other details.
    Analyze and score the news article based on the following criteria:
    Clarity: How well the article conveys the message in a concise and understandable manner.
        Scale: 1 (unclear) to 25 (very clear)
    Credibility: Based on the description and other details provided, how likely is the article to be credible and factually accurate?
        Scale: 1 (not credible) to 25 (highly credible)
    Engagement potential: How likely the article is to capture the reader's attention or provoke further thought.
        Scale: 1 (not engaging) to 25 (very engaging)
    Impact: How significant or influential the article is in terms of its potential societal, technological, or political consequences.
        Scale: 1 (minimal impact) to 25 (high impact)
    Provide the total score out of 100 for the news article, adding the scores for each of the above criteria.
    You will be evaluating a lot of news articles. So, score them such that we can sort all of them later.
    """
    prompt_template = ChatPromptTemplate.from_messages([("system", system_prompt), ("human", "{news}")])
    # define a Pydantic class to get the output in a structured format.
    class News(BaseModel):
        """News scoring system"""
        total_score: int = Field(description='total score for the news article')
        source: str = Field(description="The source of the news")
        author: Optional[str] = Field(default=None, description="The author of the news")
        title: str = Field(description="The title of the news")
        description: str = Field(description="The description of the news")
        url: str = Field(description="The url of the news")
        urlToImage: Optional[str] = Field(default=None, description="The image url of the news")
    # GPT-4o performs better at scoring but is more costly.
    gpt_4o = ChatOpenAI(model="gpt-4o", temperature=0)
    structured_gpt = gpt_4o.with_structured_output(News)
    chain = prompt_template | structured_gpt
    # send each article to the LLM to get the score along with the other details.
    results = [chain.invoke({'news': article}).dict() for article in articles]
    # sort the articles by total score.
    df = pd.DataFrame(results).sort_values(by='total_score', ascending=False)
    return {"messages": [HumanMessage(content=df.to_dict(orient="records"))]}
The function takes the state as its input, with the name news_details. Since the state holds all the messages, we can access the last message for the articles. To save costs, we can choose to score only a few articles from the top. We can also try different system prompts to get the best scoring system.
It is easier to process the news if the output is in a defined format, so we use an LLM with structured output, where the structure is defined using a Pydantic class.
We then score each article, store the results in a dataframe, sort the articles by total score, and add them as a message to the state.
Explanation
1. Input
The function receives the state object as input, which contains all messages. The latest message in this state holds the news articles. To minimize costs, instead of scoring all articles, we can limit the number of articles.
2. Scoring Process
We provide a detailed system prompt to the LLM, instructing it to score each article based on the criteria given in the system prompt.
The LLM evaluates each article against those criteria and assigns a total score out of 100, adding up the scores for each criterion.
3. Structured Output
To ensure the output is structured and easy to process, we define a Pydantic model (News). This model includes fields like `total_score`, `title`, `description`, and `url`. By using this structured format, the LLM can return consistent, well-organized results.
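As a standalone illustration of what the structured output buys us, the Pydantic model can be exercised without any LLM call. This is a simplified sketch of the `News` model, using plain pydantic so the snippet is self-contained; the payload values are made up.

```python
# Simplified sketch of the News model, validated by hand instead of via an LLM.
from typing import Optional
from pydantic import BaseModel, Field

class News(BaseModel):
    """News scoring system (abridged)"""
    total_score: int = Field(description="total score for the news article")
    title: str
    url: str
    author: Optional[str] = None

# with_structured_output constrains the LLM to produce data that validates
# against this model; here we validate a hand-written payload to show the effect.
article = News(total_score=82, title="Example headline", url="https://example.com")
print(article.total_score)  # 82
```

Any response that is missing a required field or has the wrong type raises a validation error instead of silently producing malformed rows.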
4. LLM Integration
We use GPT-4o, known for its accuracy in structured tasks, to score the articles. We found that GPT-4o is better than GPT-4o-mini at rating the articles. Each article is passed through the LLM, and the results are converted into a dictionary format using Pydantic.
5. Sorting and Output
After scoring all the articles, we store them in a Pandas DataFrame and sort them by `total_score` in descending order. We then return the sorted list as a message to the State, ready to be used in the next part of the workflow.
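The sorting step itself is plain pandas and can be tried in isolation. A minimal self-contained illustration with placeholder scores and titles:

```python
# Minimal illustration of the sorting step, using placeholder scores.
import pandas as pd

results = [
    {"total_score": 72, "title": "Article B"},
    {"total_score": 91, "title": "Article A"},
    {"total_score": 55, "title": "Article C"},
]
# sort descending so the best-scoring article comes first
df = pd.DataFrame(results).sort_values(by="total_score", ascending=False)
sorted_records = df.to_dict(orient="records")
print([r["title"] for r in sorted_records])  # ['Article A', 'Article B', 'Article C']
```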
Send Email
The send_email function takes the list of sorted news articles, generates an HTML email, and sends it to the recipient.
Import the libraries:
import smtplib, ssl
import base64
import email
Define the send_email function:
def send_email(sorted_news):
    # get the sorted news from the last message of the state.
    articles = sorted_news['messages'][-1].content
    # if a news article has an image, we can display it in the email.
    news_items_html = ""
    for article in articles[:10]:
        if article['urlToImage'] is not None:
            news_items_html += f"""
            <div class="news-item">
                <img src="{article['urlToImage']}" alt="{article['title']}">
                <div>
                    <h3><a href="{article['url']}">{article['title']}</a></h3>
                    <p>{article['description']}</p>
                </div>
            </div>
            """
        else:
            news_items_html += f"""
            <div class="news-item">
                <div>
                    <h3><a href="{article['url']}">{article['title']}</a></h3>
                    <p>{article['description']}</p>
                </div>
            </div>
            """
    # CSS for styling the HTML message. We add the above 'news_items_html' here.
    html = f"""
    <html>
    <head>
        <style>
            body {{
                font-family: Arial, sans-serif;
                background-color: #c4c4c4;
                margin: 0;
                padding: 0;
            }}
            .container {{
                width: 80%;
                max-width: 600px;
                margin: 0 auto;
                background-color: #ffffff;
                padding: 20px;
                box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
            }}
            h1 {{
                text-align: center;
                color: #333;
            }}
            .news-item {{
                display: flex;
                align-items: center;
                justify-content: space-between;
                border-bottom: 1px solid #eeeeee;
                padding: 15px 0;
            }}
            .news-item h3 {{
                margin: 0;
                font-size: 16px;
                color: #007BFF;
                margin-left: 5px;
            }}
            .news-item p {{
                font-size: 14px;
                color: #666666;
                margin: 5px 0;
                margin-left: 5px;
            }}
            .news-item a {{
                color: #007BFF;
                text-decoration: none;
            }}
            .news-item img {{
                width: 100px;
                height: 100px;
                object-fit: cover;
                border-radius: 8px;
            }}
            .footer {{
                margin-top: 20px;
                text-align: center;
                font-size: 12px;
                color: #999999;
            }}
        </style>
    </head>
    <body>
        <div class="container">
            <h1>Curated News</h1>
            {news_items_html}
            <div class="footer">
                <p>This is your personalized newsletter.</p>
            </div>
        </div>
    </body>
    </html>
    """
    port = 465  # For SSL
    sender_email = "[email protected]"
    password = os.environ['GMAIL_PASSWORD']
    context = ssl.create_default_context()
    # add the content for the email
    mail = email.message.EmailMessage()
    mail['To'] = "[email protected]"
    mail['From'] = "[email protected]"
    mail['Subject'] = "News Digest"
    mail.set_content(html, subtype="html")
    with smtplib.SMTP_SSL("smtp.gmail.com", port, context=context) as server:
        server.login(sender_email, password)
        server.send_message(mail)
Explanation
1. Extracting Sorted News
The function begins by accessing the sorted news articles from the last message in the State. We limit the number of articles displayed in the email to the top 10.
2. Generating HTML Content
The function dynamically constructs the HTML for each news article. If an article includes an image (`urlToImage`), the image is embedded in the email next to the article's title, link, and description. Otherwise, only the title and description are displayed. This HTML block (`news_items_html`) is built in a loop that processes each article.
3. HTML and CSS Styling
The HTML email is styled using embedded CSS to ensure a visually appealing layout. The styles cover:
- Container: The main email content is wrapped in a centered container with a white background and subtle shadow.
- News Items: Each news article is displayed with its title (as a clickable link), description, and optionally an image. The layout uses flexbox to align the image and text side by side, with a border separating each news item.
4. Composing the Email
The email is set up using Python's `email.message.EmailMessage` class. The HTML content, subject line ("News Digest"), sender, and recipient are specified. The HTML is included as the main content using `mail.set_content(html, subtype='html')`.
5. Sending the Email
The function uses Gmail's SMTP server to send the email securely via SSL (port 465). The sender's Gmail credentials are fetched from the environment variable `GMAIL_PASSWORD` to avoid hardcoding sensitive information. After logging in to the SMTP server, the email is sent to the recipient.
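The composition step can be tried on its own without an SMTP connection. This sketch builds, but does not send, a message using only the standard library; the addresses are placeholders.

```python
# Compose an HTML message with the stdlib email package; addresses are placeholders.
import email.message

mail = email.message.EmailMessage()
mail["To"] = "recipient@example.com"
mail["From"] = "sender@example.com"
mail["Subject"] = "News Digest"
# set_content with subtype="html" marks the body as text/html
mail.set_content("<h1>Curated News</h1>", subtype="html")

print(mail["Subject"])          # News Digest
print(mail.get_content_type())  # text/html
```

Inspecting the message like this before wiring up SMTP makes it easy to verify the headers and content type without sending test emails.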
Building the Agent
Let us build the agent based on the tools and functions defined above.
Step 1. Define functions to call the models and tools.
from langgraph.prebuilt import ToolNode
from langgraph.graph import StateGraph, MessagesState, START, END
# function to call the model, which returns the tool call based on the query.
def call_model(state: MessagesState):
    messages = state["messages"]
    response = gpt_with_tools.invoke(messages)
    return {"messages": [response]}
# if the last message from the above LLM contains tool_calls, we return "tools"
def call_tools(state: MessagesState) -> Literal["tools", END]:
    messages = state["messages"]
    last_message = messages[-1]
    if last_message.tool_calls:
        return "tools"
    return END
Step 2. Build the workflow graph. Now we can use all the defined functions to build the agent.
# create a tool node from the function so that we can use it in the graph.
get_news_tool = ToolNode([get_news])
workflow = StateGraph(MessagesState)
# we start the agent from the call_model function.
workflow.add_node("LLM", call_model)
workflow.add_edge(START, "LLM")
# add the get_news tool, which is called from the above LLM based on the query.
workflow.add_node("tools", get_news_tool)
workflow.add_conditional_edges("LLM", call_tools)
# then we connect the score_news function to the get_news tool
workflow.add_node("score", score_news)
workflow.add_edge("tools", "score")
# then we connect the send_email function to the score_news function
workflow.add_node("mail", send_email)
workflow.add_edge("score", "mail")
# we can end the agent after sending the mail
workflow.add_edge("mail", END)
Step 3. Compile the graph.
agent = workflow.compile()
display(Image(agent.get_graph().draw_mermaid_png()))
Now we can call the agent with a query.
Let's use a query that returns fewer news articles, so we can print the outputs at each step of the agent.
query = "what is the news on Indian cricket team in the past month from cricinfo?"
# this query will go to the START node.
inputs = {"messages": [("user", query)]}
async for chunk in agent.astream(inputs, stream_mode="values"):
chunk["messages"][-1].pretty_print()
The output will be in the format shown below. If no articles are returned, we can change the query.
As we can see, we start with the query. The LLM then calls the tool 'get_news', which returns all the articles. The 'score_news' function then processes them and outputs a list of articles with scores. Finally, the 'send_email' function sends the email, though no output is added to the state.
In this way, we can query the agent about any topic and get an email with curated news.
Conclusion
Building a newsletter agent using LangGraph and LLMs offers a powerful way to automate news curation and delivery. By combining real-time data, intelligent scoring, and personalized email delivery, this approach streamlines the creation of customized newsletters, enhancing reader engagement and content relevance.
Frequently Asked Questions
A. LangGraph is a framework for building dynamic workflows that integrate large language models (LLMs) with custom logic. It allows developers to define workflows as graphs using States, Nodes, and Edges, where each Node represents a function or task, and Edges define the flow of data between these tasks.
A. LangGraph consists of three core components: the State, which holds data shared across the application; Nodes, which represent individual functions that read or modify the State; and Edges, which define the flow of data between Nodes. Conditional Edges allow for flexible, decision-based workflows.
A. Yes, LangGraph can integrate external APIs and tools. You can define Nodes to handle specific tasks, such as making API calls or interacting with third-party services, and then use those Nodes within the workflow to create dynamic, real-time applications.
A. LangGraph lets you define conditional Edges, which use a function to determine the next step in the workflow. This feature makes it easy to handle complex, decision-based scenarios where the flow depends on specific conditions or user input.