MLOps Community

Let's Talk About Raw Documents

Posted Mar 01, 2023 | Views 311
# Preprocessing API
# Unstructured.io
# NLP-focused
SPEAKERS
Crag Wolfe
Infrastructure Team Lead @ Unstructured.io

Back-end engineer by trade, including a decade at Red Hat. Spent the previous five years at an NLP startup as the technical lead for a key product.

Ben Epstein
Founding Software Engineer @ Galileo

Ben was the machine learning lead for Splice Machine, leading the development of their MLOps platform and Feature Store. He is now a founding software engineer at Galileo (rungalileo.io) focused on building data discovery and data quality tooling for machine learning teams. Ben also works as an adjunct professor at Washington University in St. Louis teaching concepts in cloud computing and big data analytics.

SUMMARY

Modern ML pipelines still often need pre-processed documents. This isn't changing anytime soon; in fact, the appetite is growing.

Unstructured.io is focused on extracting structured data from raw documents (PDF, PPTX, HTML, etc.). In the near term, we're more NLP-focused.

Check out Unstructured.io's open-source libraries!
