MLOps Community

Let's Talk About Raw Documents

Posted Mar 01, 2023 | Views 475
# Preprocessing API
# Unstructured.io
# NLP-focused
speakers
Crag Wolfe
Infrastructure Team Lead @ Unstructured.io

Back-end engineer by trade, including a decade at Red Hat. Spent the previous five years at an NLP startup, serving as the technical lead for a key product.

Ben Epstein
Founding Software Engineer @ Galileo

Ben was the machine learning lead for Splice Machine, leading the development of their MLOps platform and Feature Store. He is now a founding software engineer at Galileo (rungalileo.io) focused on building data discovery and data quality tooling for machine learning teams. Ben also works as an adjunct professor at Washington University in St. Louis teaching concepts in cloud computing and big data analytics.

SUMMARY

Modern ML pipelines still often need pre-processed documents. This isn't changing anytime soon; in fact, the appetite is growing.

Unstructured.io is focused on extracting structured data from raw documents (PDF, PPTX, HTML, etc.). In the near term, we're more NLP-focused.

Check out Unstructured.io's open-source libraries!

