Author:
None
Summary:
A library that prepares raw documents for downstream ML tasks.
Latest version:
0.21.2
Required dependencies:
beautifulsoup4 | charset-normalizer | emoji | filelock | filetype | html5lib | installer | langdetect | lxml | numba | numpy | paddlepaddle | psutil | pypandoc-binary | python-iso639 | python-magic | python-oxmsg | rapidfuzz | regex | requests | spacy | torch | tqdm | typing-extensions | unstructured-client | unstructured-inference | unstructured-ingest | wrapt
Optional dependencies:
google-cloud-vision | markdown | msoffcrypto-tool | networkx | openpyxl | pandas | pdf2image | pdfminer-six | pi-heif | pikepdf | pypdf | python-docx | python-pptx | sentencepiece | tiktoken | transformers | unstructured-paddleocr | unstructured-pytesseract | xlrd
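Because the optional dependencies are not installed by default, it can be useful to check which of them are present in the current environment. The following is a minimal sketch using only the standard library's `importlib.metadata`. The helper name `installed_versions` and the constant `OPTIONAL_DEPENDENCIES` are hypothetical names introduced for illustration, and the distribution names are copied from the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the "Optional dependencies" list above.
OPTIONAL_DEPENDENCIES = [
    "google-cloud-vision", "markdown", "msoffcrypto-tool", "networkx",
    "openpyxl", "pandas", "pdf2image", "pdfminer-six", "pi-heif",
    "pikepdf", "pypdf", "python-docx", "python-pptx", "sentencepiece",
    "tiktoken", "transformers", "unstructured-paddleocr",
    "unstructured-pytesseract", "xlrd",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; not part of the library itself.
    """
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(OPTIONAL_DEPENDENCIES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running the script prints one line per optional dependency, with either the installed version or "not installed".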
Downloads last day:
74,783
Downloads last week:
997,575
Downloads last month:
4,363,999