  1. Set up the GitHub loader by configuring the GitHub account with a username and personal access token (PAT). See GitHub's documentation to learn how to create a PAT.
from embedchain.loaders.github import GithubLoader

loader = GithubLoader(
    config={
        "token": "ghp_xxxx",
    }
)
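To keep the token out of source code, the config dictionary can be assembled from an environment variable instead of a literal string. The following is a minimal sketch, assuming the token is stored in a variable named `GITHUB_TOKEN`; both that variable name and the helper `github_loader_config` are hypothetical and not part of Embedchain.

```python
import os


def github_loader_config(env_var: str = "GITHUB_TOKEN") -> dict:
    """Build the loader config dict from an environment variable.

    `GITHUB_TOKEN` is an assumed variable name, not one Embedchain requires.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} to a GitHub personal access token")
    return {"token": token}


# Usage (requires embedchain to be installed):
# from embedchain.loaders.github import GithubLoader
# loader = GithubLoader(config=github_loader_config())
```

Failing early with a clear error when the variable is missing tends to be easier to debug than an authentication failure later during loading.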
  2. Once you have set up the loader, you can create an app and load data using the GitHub loader defined above:
import os
from embedchain.pipeline import Pipeline as App

os.environ["OPENAI_API_KEY"] = "sk-xxxx"

app = App()

app.add("repo:embedchain/embedchain type:repo", data_type="github", loader=loader)

response = app.query("What is Embedchain?")
# Answer: Embedchain is a Data Platform for Large Language Models (LLMs). It allows users to seamlessly load, index, retrieve, and sync unstructured data in order to build dynamic, LLM-powered applications. There is also a JavaScript implementation called embedchain-js available on GitHub.
The add function of the app accepts any valid GitHub query with qualifiers. It supports loading only GitHub code, repositories, issues, and pull requests.
You must provide the type: and repo: qualifiers in the query. The type: qualifier can be any comma-separated combination of code, repo, pr, and issue. The repo: qualifier must be a valid GitHub repository name.

Valid queries

  • repo:embedchain/embedchain type:repo - to load the repository
  • repo:embedchain/embedchain type:issue,pr - to load the issues and pull requests of the repository
  • repo:embedchain/embedchain type:issue state:closed - to load the closed issues of the repository
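
The qualifier rules above can be checked before a query is passed to add. The following is a minimal sketch of a query builder; `build_github_query` and `ALLOWED_TYPES` are hypothetical helpers written for illustration, not part of Embedchain, and the repository check only tests for the `owner/name` shape rather than confirming the repository exists.

```python
ALLOWED_TYPES = {"code", "repo", "pr", "issue"}


def build_github_query(repo: str, types: list[str], **qualifiers: str) -> str:
    """Compose a query string such as 'repo:owner/name type:issue,pr'."""
    # The repo: qualifier must have exactly one slash with text on both sides.
    if repo.count("/") != 1 or not all(repo.split("/")):
        raise ValueError("repo must look like 'owner/name'")
    # The type: qualifier must be a non-empty combination of the allowed values.
    unknown = set(types) - ALLOWED_TYPES
    if not types or unknown:
        raise ValueError(f"types must be drawn from {sorted(ALLOWED_TYPES)}")
    parts = [f"repo:{repo}", f"type:{','.join(types)}"]
    # Extra qualifiers (e.g. state="closed") are appended as key:value pairs.
    parts += [f"{key}:{value}" for key, value in qualifiers.items()]
    return " ".join(parts)


print(build_github_query("embedchain/embedchain", ["repo"]))
print(build_github_query("embedchain/embedchain", ["issue", "pr"]))
print(build_github_query("embedchain/embedchain", ["issue"], state="closed"))
```

The three printed strings reproduce the valid queries listed above, so the result can be passed directly as the first argument to add.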
  3. A chunker is created automatically to chunk your GitHub data. If you would rather provide your own chunker class, you can do so as follows:
from embedchain.chunkers.common_chunker import CommonChunker
from embedchain.config.add_config import ChunkerConfig

github_chunker_config = ChunkerConfig(chunk_size=2000, chunk_overlap=0, length_function=len)
github_chunker = CommonChunker(config=github_chunker_config)

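# Any valid query works here; this one loads the repository itself
load_query = "repo:embedchain/embedchain type:repo"
app.add(load_query, data_type="github", loader=loader, chunker=github_chunker)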
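
To get a feel for what chunk_size and chunk_overlap control, the sketch below splits text into fixed-size character windows. It is a simplified illustration only: it does not reproduce how CommonChunker splits text internally, and `fixed_size_chunks` is a hypothetical helper, not an Embedchain function.

```python
def fixed_size_chunks(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


# With chunk_size=2000 and chunk_overlap=0, as in the config above,
# a 5000-character input yields windows of 2000, 2000, and 1000 characters.
print([len(chunk) for chunk in fixed_size_chunks("x" * 5000, 2000, 0)])
```

A larger chunk_overlap repeats more context across neighbouring chunks at the cost of producing more chunks for the same input.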