Usage

Suphalak

Read content from files

Suphalak.reading(file, file_name, loader)

Extract and return the content of a file as a string.

Parameters:
  • file (BinaryIO) – The file object to read.

  • file_name (str) – The name of the file.

  • loader (str) – The loader type used for parsing.

Returns:

Extracted content from the file.

Return type:

str
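
A minimal sketch of the call shape described above. The import path for Suphalak and the accepted loader values are not given in this section, so `read_text` is a hypothetical stand-in that only mirrors the documented signature, and the loader name "plain" is an assumption.

```python
import io
from typing import BinaryIO


def read_text(file: BinaryIO, file_name: str, loader: str = "plain") -> str:
    """Hypothetical stand-in mirroring Suphalak.reading(file, file_name, loader)."""
    if loader != "plain":  # "plain" is an assumed loader name, not a documented one
        raise ValueError(f"unsupported loader for {file_name!r}: {loader}")
    # Read the raw bytes and decode them into a single string.
    return file.read().decode("utf-8")


# An in-memory binary stream behaves like a file opened with mode "rb".
content = read_text(io.BytesIO(b"Cats purr.\nCats nap."), "cats.txt", "plain")
print(content)
```

The same call should work with a file opened via `open(path, "rb")`, since both are binary file objects.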

Malet

Split content into chunks

Malet.chunking(text, chunk_method='token', **kwargs)

Split text into chunks using the specified method.

Parameters:
  • text (str) – The text to split.

  • chunk_method (Optional[Literal["token", "separate"]]) – The chunking method, either “token” or “separate”. Defaults to “token”.

  • **kwargs (Any) – Additional parameters for chunking.

Returns:

List of text chunks.

Return type:

list[str]
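
The two chunking methods could look roughly like the sketch below. `chunk_text` is a hypothetical stand-in, not the library's implementation. The keyword arguments `chunk_size` and `separator`, and the use of whitespace-separated words as "tokens", are assumptions made for illustration, because the accepted `**kwargs` are not listed here.

```python
from typing import Any, Literal


def chunk_text(
    text: str,
    chunk_method: Literal["token", "separate"] = "token",
    **kwargs: Any,
) -> list[str]:
    """Hypothetical stand-in mirroring Malet.chunking(text, chunk_method, **kwargs)."""
    if chunk_method == "token":
        # Assumed kwarg: chunk_size = number of whitespace tokens per chunk.
        size = kwargs.get("chunk_size", 4)
        tokens = text.split()
        return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]
    if chunk_method == "separate":
        # Assumed kwarg: separator = string on which to split the text.
        separator = kwargs.get("separator", "\n")
        return [part.strip() for part in text.split(separator) if part.strip()]
    raise ValueError(f"unknown chunk_method: {chunk_method}")


text = "Siamese cats are vocal. Persian cats are calm."
token_chunks = chunk_text(text, chunk_size=3)
sentence_chunks = chunk_text(text, chunk_method="separate", separator=".")
print(token_chunks)
print(sentence_chunks)
```

Token-based chunks have a predictable size but may cut sentences apart, while separator-based chunks follow the text's own boundaries at the cost of uneven lengths.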

WichienMaat

Embed chunks into vectors

WichienMaat.embedding(sentence, model_name=None)

Convert sentences into vector embeddings.

Parameters:
  • sentence (str | list[str]) – A single sentence or a list of sentences.

  • model_name (Optional[str]) – Name of the embedding model to use. Optional; defaults to None.

Returns:

Embedding vectors as a NumPy array.

Return type:

numpy.ndarray
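
The sketch below illustrates the input and output shapes only. `embed` is a hypothetical stand-in that uses toy feature hashing instead of a real embedding model. The `dim` parameter and the choice to return a 2-D array with one row per sentence, even for a single string, are assumptions.

```python
import zlib
from typing import List, Union

import numpy as np


def embed(sentence: Union[str, List[str]], dim: int = 8) -> np.ndarray:
    """Hypothetical stand-in mirroring WichienMaat.embedding(sentence, model_name)."""
    # Accept a single sentence or a list, as the documented signature does.
    sentences = [sentence] if isinstance(sentence, str) else list(sentence)
    vectors = np.zeros((len(sentences), dim), dtype=np.float32)
    for row, text in enumerate(sentences):
        for word in text.lower().split():
            # Toy feature hashing: each word increments one bucket.
            vectors[row, zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vectors


single = embed("cats purr")
batch = embed(["cats purr", "dogs bark", "cats nap"])
print(single.shape, batch.shape)
```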

KhaoManee

Search vectors with queries

KhaoManee.searching(query_embed, sentence_embed, document, top_k)

Search for the most relevant chunks based on query embeddings.

Parameters:
  • query_embed (numpy.ndarray) – Query embedding vector.

  • sentence_embed (numpy.ndarray) – Embeddings of sentences to search.

  • document (Document) – The original document object.

  • top_k (int) – Number of top results to return.

Returns:

List of search results with relevance scores.

Return type:

list[dict]
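
A minimal sketch of top-k retrieval over embeddings, assuming cosine similarity as the relevance score (the metric is not stated in this section). `search` is a hypothetical stand-in. It takes a plain list of chunk strings in place of the `Document` object, whose type is not defined here, and the result keys `chunk` and `score` are illustrative.

```python
import numpy as np


def search(query_embed: np.ndarray, sentence_embed: np.ndarray,
           chunks: list[str], top_k: int) -> list[dict]:
    """Hypothetical stand-in for KhaoManee.searching, using cosine similarity."""
    query = np.asarray(query_embed, dtype=float).reshape(-1)
    matrix = np.asarray(sentence_embed, dtype=float)
    # Cosine similarity = dot product divided by the product of the norms.
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    scores = matrix @ query / np.where(norms == 0, 1.0, norms)
    # Indices of the top_k highest scores, best first.
    best = np.argsort(-scores)[:top_k]
    return [{"chunk": chunks[i], "score": float(scores[i])} for i in best]


chunks = ["cats purr", "dogs bark", "cats nap"]
sentence_embed = np.array([[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]])
query_embed = np.array([1.0, 0.0])
results = search(query_embed, sentence_embed, chunks, top_k=2)
for hit in results:
    print(hit["chunk"], round(hit["score"], 3))
```

Each row of `sentence_embed` must correspond to the chunk at the same index, so the embeddings should be computed from the chunk list in order.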

Kornja

Generate answers from retrieved contexts

Kornja.generating()

Note

This function is still under development. In a future release, it will generate answers from retrieved contexts.