Modern ML pipelines still often need preprocessed documents. This isn't changing anytime soon; in fact, the appetite is growing.
Unstructured.io is focused on extracting structured data from raw documents (PDF, PPTX, HTML, etc.). In the near term, our focus is primarily on NLP use cases.
Check out Unstructured.io's open-source libraries!