
GenV: An Agentic Workflow for Actionable Insights from Google Meet Recordings


Transforming Google Meet Recordings into Actionable Insights with Agentic AI Workflows

May 12, 2025
Médéric Hurier

Video meetings on platforms like Google Meet are essential for collaboration, but how often do crucial details get lost moments after the call ends? Decisions are made, action items are assigned, and valuable context is shared, only to become buried in recordings that are rarely revisited. Manually scrubbing through hours of video to find specific information is tedious and inefficient.

As someone deeply involved in AI, Generative AI, and MLOps, I saw another opportunity to apply these technologies to a common challenge. How can we efficiently extract the key takeaways, action items, and decisions from our meeting recordings without the manual effort?

This led to the creation of GenV (Generative AI for Video Analytics), a Python notebook designed as an agent to distill Google Meet recordings into structured, actionable insights. Similar to my previous BKFC project for Google Chat, the goal was to build a practical, focused agentic workflow using powerful, accessible tools like Google Colab, Google Cloud Storage, the Vertex AI API, and, specifically, Vertex AI’s Gemini models.


The Motivation: From Video Overload to Clear Takeaways

Google Meet recordings often contain a wealth of information:

  1. Detailed discussions and brainstorming sessions.
  2. Project updates and status reports.
  3. Decisions made and rationale explained.
  4. Action items assigned, sometimes with owners and deadlines.
  5. Technical details, configurations, or troubleshooting steps discussed.
  6. Feedback and suggestions shared.

Relying on memory or manual note-taking during meetings is often insufficient. Rewatching recordings is time-consuming, and generic note-taking apps rarely capture the specific content and insights that matter. GenV aims to automate the extraction of structured knowledge from these video assets.


How GenV Works: An Agentic Workflow for Video

GenV follows a clear agentic process: Locate -> Prepare -> Analyze -> Report.
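
To make the flow concrete, here is a hypothetical sketch of the top-level loop. The helper names are illustrative labels for the stages described below, not the notebook’s actual functions:

# Hypothetical driver loop for the four stages (helper names are
# illustrative; the notebook inlines this logic rather than defining them).
for video in locate_recordings(folder="MyDrive/Meet Recordings", since_days=SINCE_DAYS):
    gcs_uri = prepare_video(video)    # upload to GCS if not already there
    insight = analyze_video(gcs_uri)  # Gemini call with the MeetingInsight schema
    report_insight(insight)           # Markdown summary + JSON lines record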

Setup & Authentication: Standard Google Cloud setup is required. This involves enabling the Vertex AI API in your GCP project and potentially creating a GCS bucket. The script uses google.colab.auth library for secure access. Google Drive access is also needed to locate the Meet recordings.

Locate & Prepare Data (Perception & Preparation): The agent first needs access to the video files.

  1. It mounts your Google Drive to access the specified path for Meet recordings (e.g., MyDrive/Meet Recordings).
  2. It identifies video files modified within a recent period (e.g., last 30 days, configurable via SINCE_DAYS).
  3. Crucially, because the Vertex AI API often works best with cloud storage, the script uploads these video files from Drive to a designated Google Cloud Storage (GCS) bucket if they don’t already exist there. This prepares the video for analysis (a sketch follows this list).
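
A sketch of how this locate & prepare stage could look, reusing the bucket handle from the setup sketch above; the exact paths and filtering logic in the notebook may differ:

# Sketch of the locate & prepare stage (paths and filtering are illustrative).
import datetime as dt
import pathlib
from google.colab import drive

drive.mount("/content/drive")
recordings = pathlib.Path("/content/drive/MyDrive/Meet Recordings")
cutoff = dt.datetime.now().timestamp() - SINCE_DAYS * 24 * 3600

for video in recordings.glob("*.mp4"):
    if video.stat().st_mtime < cutoff:
        continue  # skip recordings older than SINCE_DAYS
    blob = bucket.blob(video.name)
    if not blob.exists():  # upload once, reuse on later runs
        blob.upload_from_filename(str(video))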

Analyze Data (Reasoning & Action): This is where Vertex AI’s Gemini model performs the heavy lifting.

  1. Structured Output Definition: Like with BKFC, defining the desired output structure is key. We use Pydantic models to create a detailed schema, MeetingInsight, specifying exactly what information to extract. This includes fields for:
     * title: Inferred meeting title.
     * summary: A concise meeting summary.
     * questions_answers: Pairs of questions and answers.
     * unanswered_questions: Questions lacking clear answers.
     * projects_discussed: Project updates and details.
     * action_items: Tasks with potential owners and deadlines.
     * decisions_suggestions: Key decisions or proposals.
     * technical_insights: Technical details mentioned.
     * key_topics_mentioned: Other important themes or keywords.
import typing as T
import pydantic as pdt

# QuestionAnswer, ProjectInfo, ActionItem, and DecisionSuggestion are
# companion Pydantic models defined earlier in the notebook.

class MeetingInsight(pdt.BaseModel):
    """Structured insights extracted from a Google Meet recording analysis."""

    title: T.Optional[str] = pdt.Field(
        default=None,
        description="The inferred title or primary subject of the meeting based on the discussion.",
    )
    summary: T.Optional[str] = pdt.Field(
        default=None,
        description="A concise summary (3-5 sentences) of the main topics discussed and key outcomes of the meeting.",
    )
    questions_answers: T.Optional[list[QuestionAnswer]] = pdt.Field(
        default=None,
        description="A list of significant questions asked and their corresponding answers from the meeting.",
    )
    unanswered_questions: T.Optional[list[str]] = pdt.Field(
        default=None,
        description="A list of significant questions that were asked but do not appear to have been clearly answered during the meeting.",
    )
    projects_discussed: T.Optional[list[ProjectInfo]] = pdt.Field(
        default=None,
        description="A list of specific projects discussed, including status or key points.",
    )
    action_items: T.Optional[list[ActionItem]] = pdt.Field(
        default=None,
        description="A list of specific tasks or action items assigned, including owner and deadline if specified.",
    )
    decisions_suggestions: T.Optional[list[DecisionSuggestion]] = pdt.Field(
        default=None,
        description="A list of key decisions made or significant suggestions proposed during the meeting.",
    )
    technical_insights: T.Optional[list[str]] = pdt.Field(
        default=None,
        description="Specific mentions of technical solutions, configurations, or code snippets discussed.",
    )
    key_topics_mentioned: T.Optional[list[str]] = pdt.Field(
        default=None,
        description="A list of other important topics, keywords, or themes discussed that are not captured elsewhere (e.g., specific technologies, methodologies, upcoming events, general announcements).",
    )


  2. API Call: For each video file (referenced by its GCS URI), the script sends a request to the Gemini model (gemini-2.0-flash or a similar multimodal model). The key here is providing the video URI directly to the model, along with the prompt, and, importantly, specifying the response_mime_type as application/json and providing the response_schema (our MeetingInsight Pydantic model). This multimodal capability combined with structured output is incredibly powerful.
  3. Parsing: The Gemini API directly returns JSON conforming to the schema, which the library automatically parses into our Pydantic MeetingInsight object.
# Simplified Gemini API call example.
# GT refers to google.genai.types; genai_client is the Vertex AI client from setup.
uri = f"gs://{blob.bucket.name}/{blob.name}"  # GCS URI of the video
contents = [
    GT.Part.from_uri(file_uri=uri, mime_type="video/mp4"),
    prompt,  # your prompt guiding the analysis
]
response = genai_client.models.generate_content(
    model=MODEL,  # e.g., "gemini-2.0-flash"
    contents=contents,
    config={
        "response_mime_type": "application/json",
        "response_schema": MeetingInsight,  # providing the schema
        "temperature": TEMPERATURE,
    },
)
insight = response.parsed  # the structured Pydantic object
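
One practical caveat: if the model returns JSON that fails schema validation, response.parsed can come back empty, so a robust version of the notebook would guard for that case before reporting.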

Report Results (Output): The final step presents the insights clearly.

  1. The script iterates through the MeetingInsight objects for each video.
  2. It generates readable Markdown summaries, organizing the extracted information under relevant headings (Summary, Q&A, Projects, Action Items, Decisions, etc.). See example below.
  3. It also saves the raw structured data to a jsonlines file for potential downstream use (a sketch follows this list).
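
As a sketch (assuming Pydantic v2’s model_dump_json, the video Path from the locate stage, and a hypothetical description field on ActionItem), the reporting step could look like this:

# Sketch of the reporting stage; file names follow the article's examples.
with open("video_insights.jsonl", "a") as jsonl_file, open("video_insights.txt", "a") as txt_file:
    jsonl_file.write(insight.model_dump_json() + "\n")  # raw structured record
    txt_file.write(f"# {video.stem}\n\n")                # one section per recording
    txt_file.write(f"## Meeting Title\n\n{insight.title}\n\n")
    txt_file.write(f"## Summary\n\n{insight.summary}\n\n")
    if insight.action_items:
        txt_file.write("## Action Items\n\n")
        for item in insight.action_items:
            txt_file.write(f"* {item.description}\n")  # 'description' is a hypothetical field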

Example Output Snippet (from video_insights.txt):

# MLOps Community - Kubeflow

## Meeting Title

Kubeflow: The ML Toolkit for Kubernetes

## Summary

The speaker introduces Kubeflow as a toolkit for doing machine learning on top of Kubernetes. He outlines the agenda for the presentation, which includes components, architecture, ML workflow, Kubeflow pipelines, and a Q&A session. He emphasizes that Kubeflow is an open-source ML Ops platform based on Kubernetes, primarily using Python and YAML.

## Projects Discussed

* **Kubeflow:** Kubeflow is an open-source ML Ops platform based on Kubernetes, primarily using Python and YAML. It is a tool to power the develop platform with Kubernetes. The project has been sponsored by Google.

## Technical Insights

* Kubeflow uses Python and YAML.
* Kubeflow is based on Kubernetes.
* Kubeflow uses Docker.

## Key Topics

* MLOps
* Kubernetes
* Python
* YAML
* Docker
* Pipelines
* Components
* Architecture


The Value Proposition: Why Use GenV?

This focused agent provides tangible benefits:

  1. Rapid Meeting Summarization: Get the gist and key outcomes of meetings quickly without rewatching.
  2. Action Item Extraction: Reliably capture tasks, owners, and deadlines mentioned during calls.
  3. Decision Tracking: Easily reference key decisions made and suggestions proposed.
  4. Knowledge Retrieval: Quickly find specific technical details, project updates, or answers discussed.
  5. Identify Follow-ups: Surface unanswered questions needing attention.
  6. Efficiency Boost: Saves significant time compared to manual review or note-taking.
  7. Practical AI Demonstration: Showcases how multimodal models like Gemini combined with structured output can automate information extraction from video content.


Conclusion: Tapping into Video Knowledge with Focused AI

The GenV notebook demonstrates how agentic workflows powered by multimodal AI can unlock the valuable information trapped within video recordings. By leveraging the analytical power of Vertex AI’s Gemini models, guided by precise structured output schemas, we can transform hours of meeting video into concise, actionable summaries.

This highlights that sophisticated AI benefits don’t always require complex conversational agents. Focused, task-specific agents like GenV can automate laborious processes and make information more accessible. For those of us working in AI and MLOps, building such targeted solutions using cloud platforms like Google Cloud is increasingly feasible, allowing us to enhance productivity and knowledge sharing.

What other types of content could benefit from automated, structured insight extraction using Generative AI? Let me know your thoughts!



