You can load any PDF file from your local file system or from a URL.

Usage

Load from a local file

from embedchain import App
app = App()
app.add('/path/to/file.pdf', data_type='pdf_file')

Load from URL

from embedchain import App
app = App()
app.add('https://arxiv.org/pdf/1706.03762.pdf', data_type='pdf_file')
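# With citations=True, query returns the answer together with the retrieved contexts
answer, contexts = app.query("What is the paper 'attention is all you need' about?", citations=True)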
# Answer: The paper "Attention Is All You Need" proposes a new network architecture called the Transformer, which is based solely on attention mechanisms. It suggests that complex recurrent or convolutional neural networks can be replaced with a simpler architecture that connects the encoder and decoder through attention. The paper discusses how this approach can improve sequence transduction models, such as neural machine translation.
# Contexts:
# [
#     (
#         'Provided proper attribution is ...',
#         {
#             'page': 0,
#             'url': 'https://arxiv.org/pdf/1706.03762.pdf',
#             'score': 0.3676220203221626,
#             ...
#         }
#     ),
# ]

Each chunk also stores its page number in its metadata under the key page, which helps you see where in the document an answer comes from. You can read the page key from the contexts returned during retrieval (refer to the example given above).
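
As a minimal sketch of how the page key might be used, the snippet below groups retrieved chunks by page. It assumes the contexts are a list of (text, metadata) tuples shaped like the example output above. The helper name group_chunks_by_page and the sample data are hypothetical and not part of embedchain. Judging by the example output, page numbers appear to start at 0, but that is an assumption.

```python
from collections import defaultdict


def group_chunks_by_page(contexts):
    """Group chunk texts by the 'page' value found in each chunk's metadata.

    Hypothetical helper: expects a list of (text, metadata) tuples,
    the shape shown in the example output above.
    """
    by_page = defaultdict(list)
    for text, metadata in contexts:
        # Chunks without a 'page' key are collected under None.
        by_page[metadata.get("page")].append(text)
    return dict(by_page)


# Sample data copied from the shape of the example output (illustrative only).
sample_contexts = [
    (
        "Provided proper attribution is ...",
        {"page": 0, "url": "https://arxiv.org/pdf/1706.03762.pdf"},
    ),
]

pages = group_chunks_by_page(sample_contexts)
for page, texts in pages.items():
    print(f"page {page}: {len(texts)} chunk(s)")
```

In practice you would pass the contexts returned by a query made with citations=True instead of sample_contexts.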

Note that password-protected PDF files are not supported.