Data type handling
Automatic data type detection
The add method automatically tries to detect the data_type based on your input for the source argument. So app.add('https://www.youtube.com/watch?v=dQw4w9WgXcQ') is enough to embed a YouTube video.
This detection is implemented for all formats. It is based on factors such as whether it's a URL, a local file, the source data type, etc.
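To make the idea concrete, here is a minimal sketch of the kind of decision such detection involves: check for a URL first, then for an existing local file, and fall back to raw text. The function detect_data_type and its rules are hypothetical; the returned labels only imitate data_type names and this is not embedchain's actual implementation.

```python
import os
from urllib.parse import urlparse


def detect_data_type(source: str) -> str:
    """Hypothetical detector: classify a source string as URL, local file, or raw text.

    The returned labels only mimic data_type names; this is not embedchain's code.
    """
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = parsed.netloc.lower()
        path = parsed.path.lower()
        if host.endswith("youtube.com") or host == "youtu.be":
            return "youtube_video"
        if path.endswith(".pdf"):
            return "pdf_file"
        if path.endswith(".xml"):
            return "sitemap"
        return "web_page"
    if os.path.isfile(source):
        # An existing local file: pick a label from its extension.
        return "pdf_file" if source.lower().endswith(".pdf") else "text_file"
    # Neither a URL nor an existing file: fall back to raw text.
    return "text"


if __name__ == "__main__":
    for s in ["https://www.youtube.com/watch?v=dQw4w9WgXcQ",
              "https://example.com/sitemap.xml",
              "./missing/report.pdf",
              "Some plain sentence."]:
        print(s, "->", detect_data_type(s))
```

Note how a path to a file that does not exist falls through to "text": this is exactly the kind of silent misclassification that makes debugging the detection worthwhile.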
Debugging automatic detection
Set log_level: DEBUG in the config yaml to check whether the data type detection is done correctly. Otherwise, you will not know when, for instance, an invalid filepath is interpreted as raw text instead.
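The sketch below illustrates why the DEBUG level matters: a detector that records its decision with logger.debug stays silent at Python's default WARNING level and only shows the reason once DEBUG is enabled. The logger name, the function detect_with_logging and its message format are hypothetical; embedchain's own log messages will look different.

```python
import logging
import os

# Hypothetical logger name; embedchain's actual log messages will differ.
logger = logging.getLogger("detection_demo")


def detect_with_logging(source: str) -> str:
    """Classify a source and record the reason at DEBUG level."""
    if source.startswith(("http://", "https://")):
        data_type, reason = "web_page", "looks like a URL"
    elif os.path.isfile(source):
        data_type, reason = "text_file", "existing local file"
    else:
        data_type, reason = "text", "not a URL and no such file, so treated as raw text"
    logger.debug("data_type=%s for %r (%s)", data_type, source, reason)
    return data_type


if __name__ == "__main__":
    # DEBUG level makes the decision visible; at the default WARNING level it is silent.
    logging.basicConfig(level=logging.DEBUG)
    detect_with_logging("./reports/q3.pdf")  # a mistyped or missing path
```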
Forcing a data type
To avoid issues with the data type detection, you can force a data_type by passing it as an argument to the add method.
The examples below show you the keyword to force the respective data_type.
Forcing can also be used for edge cases, such as interpreting a sitemap as a web_page to read its raw text instead of following its links.
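The precedence rule behind forcing can be sketched in a few lines: an explicit data_type short-circuits detection, so a URL that would be detected as a sitemap is treated as a plain web page instead. The function resolve_data_type is hypothetical; with embedchain the equivalent call would presumably be app.add("https://example.com/sitemap.xml", data_type="web_page"), assuming the keyword is data_type as described above.

```python
from typing import Optional
from urllib.parse import urlparse


def resolve_data_type(source: str, data_type: Optional[str] = None) -> str:
    """Hypothetical resolver: an explicit data_type wins over detection."""
    if data_type is not None:
        # Forced: skip detection entirely.
        return data_type
    # Naive detection for illustration: .xml URLs are treated as sitemaps.
    if urlparse(source).path.lower().endswith(".xml"):
        return "sitemap"
    return "web_page"


if __name__ == "__main__":
    url = "https://example.com/sitemap.xml"
    print(resolve_data_type(url))                        # detected
    print(resolve_data_type(url, data_type="web_page"))  # forced
```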
Remote data types
Use local files in remote data types
Some data_types are meant for remote content and only work with URLs.
You can pass local files by formatting the path using the file: URI scheme, e.g. file:///info.pdf.
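Building a file: URI by hand is easy to get wrong (relative paths, spaces), so one option is to let Python's pathlib do it. The helper to_file_uri is a small illustrative wrapper around Path.resolve() and Path.as_uri(); the resulting string is what you would then pass as the source, optionally together with a forced data_type.

```python
from pathlib import Path


def to_file_uri(path: str) -> str:
    """Convert a local path into a file:// URI (the path is made absolute first)."""
    # as_uri() requires an absolute path; resolve() also percent-encodes nothing,
    # the encoding (e.g. spaces -> %20) is done by as_uri().
    return Path(path).resolve().as_uri()


if __name__ == "__main__":
    print(to_file_uri("info.pdf"))
```

On a POSIX system, to_file_uri("/info.pdf") yields file:///info.pdf, matching the example above.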
Reusing a vector database
The default behavior is to create a persistent vector db in the directory ./db. You can split your application into two Python scripts: one to create a local vector db and the other to reuse this local persistent vector db. This is useful when you want to index hundreds of documents and separately implement a chat interface.
Create a local index:
You can reuse the local index with the same code, but without adding new documents:
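The two-script pattern only works because the store lives on disk under a fixed directory. The sketch below shows that pattern with a toy SQLite file standing in for the vector db: one function writes once, another reopens the same directory and reads without adding anything. The names open_store, build_index and reuse_index and the store.sqlite3 layout are hypothetical; this does not show embedchain's own storage format.

```python
import os
import sqlite3
import tempfile


def open_store(db_dir: str) -> sqlite3.Connection:
    """Open (or create) a toy on-disk store inside db_dir."""
    os.makedirs(db_dir, exist_ok=True)
    conn = sqlite3.connect(os.path.join(db_dir, "store.sqlite3"))
    conn.execute("CREATE TABLE IF NOT EXISTS docs (source TEXT PRIMARY KEY, body TEXT)")
    return conn


def build_index(docs: dict, db_dir: str = "./db") -> None:
    """Script 1: add documents once; the rows stay on disk after the process exits."""
    conn = open_store(db_dir)
    try:
        with conn:  # commits the transaction if no exception is raised
            conn.executemany("INSERT OR REPLACE INTO docs VALUES (?, ?)", docs.items())
    finally:
        conn.close()


def reuse_index(db_dir: str = "./db") -> list:
    """Script 2: reopen the same directory and read, without adding anything."""
    conn = open_store(db_dir)
    try:
        return [row[0] for row in conn.execute("SELECT source FROM docs ORDER BY source")]
    finally:
        conn.close()


if __name__ == "__main__":
    db_dir = tempfile.mkdtemp()  # stands in for ./db
    build_index({"guide.pdf": "...", "faq.html": "..."}, db_dir)
    print(reuse_index(db_dir))
```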
Resetting an app and vector database
You can reset the app by calling the reset method. This deletes the vector database and all other app-related files.
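In practice you would simply call app.reset(). The sketch below only mimics the on-disk effect for a hypothetical layout where everything lives under one directory such as ./db; the function reset_local_store is illustrative and not part of embedchain.

```python
import os
import shutil
import tempfile


def reset_local_store(db_dir: str = "./db") -> bool:
    """Hypothetical stand-in for a reset: remove the persistent directory.

    Returns True if a directory was deleted, False if there was nothing to delete.
    """
    if os.path.isdir(db_dir):
        shutil.rmtree(db_dir)
        return True
    return False


if __name__ == "__main__":
    db_dir = tempfile.mkdtemp()  # stands in for ./db
    open(os.path.join(db_dir, "store.sqlite3"), "w").close()
    print(reset_local_store(db_dir), os.path.exists(db_dir))
```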