Step 1. Create and Upload Content

The knowledge base feature offers a simple and effective way to store and manage external data, allowing agents to interact with specific datasets, thereby improving the accuracy and utility of models' responses.

When data is uploaded, iSiri automatically segments the documents into content fragments and later retrieves the most relevant fragments through one of several recall methods. The LLM then uses the recalled content to generate the final response. iSiri's knowledge base helps mitigate issues such as model hallucination and insufficient domain-specific knowledge, which improves response accuracy.
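The segment, recall, and generate flow described above can be sketched in a few lines. This is only an illustration, not iSiri's implementation: the function names are hypothetical, and a simple word-overlap score stands in for the embedding-based or full-text recall the platform actually performs.

```python
import re
from collections import Counter

def segment(text, max_words=40):
    """Split text into fragments of at most max_words words (toy segmentation)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def tokenize(s):
    """Lowercase a string and return its alphanumeric words."""
    return re.findall(r"[a-z0-9]+", s.lower())

def score(query, fragment):
    """Toy relevance: number of word occurrences shared by query and fragment."""
    q = Counter(tokenize(query))
    f = Counter(tokenize(fragment))
    return sum(min(q[w], f[w]) for w in q)

def retrieve(query, fragments, top_k=2):
    """Return the top_k fragments ranked by the toy relevance score."""
    ranked = sorted(fragments, key=lambda frag: score(query, frag), reverse=True)
    return ranked[:top_k]

def build_prompt(query, recalled):
    """Assemble the text an LLM would receive: recalled context plus the question."""
    context = "\n".join(f"- {frag}" for frag in recalled)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"
```

Because the model sees only the recalled fragments, the quality of segmentation and recall directly limits the quality of the answer.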

The first step is to upload the content to the knowledge base. iSiri supports importing text and table data, and offers multiple ways to do so. Proper segmentation of uploaded content can improve the relevance of the recalled information, thereby increasing the response accuracy of large models.

Before uploading content, familiarize yourself with the use cases and import methods for each knowledge type so that you can manage the knowledge base more effectively.

Text vs Table Comparison

| Aspect | Text Type | Table Type |
| --- | --- | --- |
| Use Case | Text-based knowledge bases allow retrieval and recall of content fragments for applications like Q&A. | Table-based knowledge bases support indexed column matching (row-wise), and can handle NL2SQL queries and calculations. |
| Import Methods | Local files (.txt, .pdf, .doc, .docx), or manual input. | Local tables (.csv, .xlsx), API integration, third-party (Feishu tables), or manual input. |
| Segmentation | Automatic, GraphRAG, or Customize segmentation. | Default to row-based segmentation; no further setup is needed. |
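To make row-based segmentation and indexed column matching concrete, here is a minimal sketch. It assumes each table row becomes one segment keyed by a chosen index column. The function names and the segment layout are hypothetical and do not reflect iSiri's internal storage format.

```python
import csv
import io

def rows_to_segments(csv_text, index_column):
    """Turn each CSV row into one segment, keyed by the value in index_column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    segments = []
    for row in reader:
        segments.append({"index": row[index_column], "fields": dict(row)})
    return segments

def match_rows(segments, value):
    """Return the rows whose index value equals `value`, ignoring case and padding."""
    wanted = value.strip().lower()
    return [s["fields"] for s in segments if s["index"].strip().lower() == wanted]
```

For example, with the CSV text "product,price\nWidget,10\nGadget,25\n" indexed on product, looking up "gadget" returns the single row for Gadget.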

Upload Text Content

Follow these steps to upload text content:

  1. Log in to the iSiri platform.

  2. In the left navigation panel, choose "Prompt Studio" and click "Knowledge Base" at the bottom.

  3. On the "Knowledge Base" page, click "+ Create" at the top right.

  4. On the "Create Knowledge" page, fill in the following fields: Name; Tag (makes files easier to sort); Describe (the purpose or functionality of the knowledge base); Permissions (whether the content is public or for private use only); and Embeddings (the model used to vectorize the content). All files that a single agent searches should use the same embeddings model.

  5. Select "File Type" accordingly and click "Next step."

  6. Choose the "Segment mode." The token count, estimated fee, and Segmented Preview are then displayed.

  7. If the segmented preview looks as expected, click "Confirm" to finish creating the knowledge base.
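Step 6 displays a token count and an estimated fee before you confirm. The sketch below shows how such an estimate could be computed from the segments. Both constants are assumptions for illustration only: the four-characters-per-token ratio is a rough heuristic for English text, and the price is a placeholder, not an iSiri rate. The platform's own figures depend on the selected embeddings model.

```python
import math

CHARS_PER_TOKEN = 4            # assumption: rough heuristic for English text
PRICE_PER_1K_TOKENS = 0.0001   # assumption: placeholder value, not an iSiri price

def estimate_tokens(text):
    """Approximate the token count of one segment from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def estimate_fee(segments, price_per_1k=PRICE_PER_1K_TOKENS):
    """Return (total_tokens, estimated_fee) for a list of segment strings."""
    total = sum(estimate_tokens(s) for s in segments)
    return total, total / 1000 * price_per_1k
```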

Segmentation Comparison

| Segment mode | Automatic | GraphRAG | Customize |
| --- | --- | --- | --- |
| Description | iSiri automatically sets the segmentation and preprocessing rules. | The latest RAG segmentation method; it maximizes the preservation of relationships within the text content, at a high cost. | You customize parameters such as the segment identifier, segmentation length, and preprocessing rules. |
| Use cases | No particular segmentation requirements; the easiest choice. | Content with complex relationships that must be preserved; the most efficient mode for information search, but also the most expensive. | The structure of the content file is unique and well understood. |
| Recall method (view details in Retrieval Test) | Mixed, Vector, and Full-text recall methods are supported. | Local and Global recall methods are supported. | Mixed, Vector, and Full-text recall methods are supported. |
| Rerank | Supported. | Not supported; the default ranking is used. | Supported. |
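As an illustration of what the Customize parameters control, the sketch below splits text on a segment identifier, applies one preprocessing rule (collapsing whitespace), and caps each segment at a maximum length. The function names, the defaults, and the character-based length cap are assumptions made for this example; iSiri's own limits and rules may be defined differently.

```python
import re

def preprocess(text):
    """Example preprocessing rule: collapse whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()

def custom_segment(text, identifier="\n\n", max_length=200):
    """Split on the segment identifier, then cap each piece at max_length characters."""
    segments = []
    for part in text.split(identifier):
        part = preprocess(part)
        if not part:
            continue  # skip empty pieces left by repeated identifiers
        for i in range(0, len(part), max_length):
            segments.append(part[i:i + max_length])
    return segments
```

A shorter maximum length yields more, smaller fragments, which can sharpen recall but may split related sentences apart. Checking the Segmented Preview is the practical way to tune these values.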
