Usage
Suphalak
Read content from files
- Suphalak.reading(file, file_name, loader)
Extract and return the content of a file as a string.
- Parameters:
file (BinaryIO) – The file object to read.
file_name (str) – The name of the file.
loader (str) – The loader type used for parsing.
- Returns:
Extracted content from the file.
- Return type:
str
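The signature above takes a binary file object, so an in-memory buffer works just as well as a file opened with "rb". The following is a minimal sketch of that contract, not the library's implementation: `reading_sketch` is a hypothetical stand-in, and the loader name "plain" is an assumption made only for this illustration, since the supported loader values are not listed here.

```python
import io
from typing import BinaryIO


def reading_sketch(file: BinaryIO, file_name: str, loader: str) -> str:
    """Hypothetical stand-in for Suphalak.reading; not the library's code."""
    # "plain" is an assumed loader name used only for this illustration.
    if loader != "plain":
        raise ValueError(f"unsupported loader for {file_name!r}: {loader}")
    # Read the raw bytes and decode them into a string.
    return file.read().decode("utf-8")


# An in-memory buffer satisfies BinaryIO just like a file opened with "rb".
buffer = io.BytesIO("Siamese cats are vocal.".encode("utf-8"))
content = reading_sketch(buffer, "cats.txt", "plain")
```

With the real class, the same buffer and file name would presumably be passed to Suphalak.reading together with one of the loaders the library actually supports.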
Malet
Split content into chunks
- Malet.chunking(text, chunk_method='token', **kwargs)
Split text into chunks using the specified method.
- Parameters:
text (str) – The text to split.
chunk_method (Optional[Literal["token", "separate"]]) – The method for chunking (“token” or “separate”).
**kwargs (Any) – Additional parameters for chunking.
- Returns:
List of text chunks.
- Return type:
list[str]
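To illustrate how the two chunk methods might differ, here is a small dependency-free sketch. It is not the library's implementation: `chunking_sketch` is a hypothetical stand-in, whitespace splitting stands in for real tokenization, and the keyword arguments `chunk_size` and `separator` are assumed names, because the accepted **kwargs are not documented here.

```python
from typing import Any, Literal, Optional


def chunking_sketch(
    text: str,
    chunk_method: Optional[Literal["token", "separate"]] = "token",
    **kwargs: Any,
) -> list[str]:
    """Hypothetical stand-in for Malet.chunking; not the library's code."""
    if chunk_method == "token":
        # Assumed kwarg: chunk_size = number of whitespace tokens per chunk.
        size = kwargs.get("chunk_size", 4)
        tokens = text.split()
        return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]
    if chunk_method == "separate":
        # Assumed kwarg: separator = string that marks chunk boundaries.
        separator = kwargs.get("separator", "\n")
        return [part.strip() for part in text.split(separator) if part.strip()]
    raise ValueError(f"unknown chunk_method: {chunk_method!r}")


text = "one two three four five six"
token_chunks = chunking_sketch(text, chunk_method="token", chunk_size=4)
line_chunks = chunking_sketch("first. second. third", chunk_method="separate", separator=".")
```

In this sketch, "token" produces chunks of a fixed token count, while "separate" cuts wherever the separator occurs, so chunk lengths follow the structure of the text.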
WichienMaat
Embed chunks into vectors
- WichienMaat.embedding(sentence, model_name=None)
Convert sentences into vector embeddings.
- Parameters:
sentence (str | list[str]) – A single sentence or a list of sentences.
model_name (Optional[str]) – Name of the embedding model to use.
- Returns:
Embedding vectors as a NumPy array.
- Return type:
numpy.ndarray
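The input handling described above (one sentence or a list of sentences, one vector per sentence) can be sketched without any model. This toy example is not the library's implementation: `embedding_sketch` and its `dim` parameter are hypothetical, a hashed bag-of-words replaces the real embedding model, and nested Python lists stand in for numpy.ndarray to keep the example dependency-free. It always returns one row per sentence; whether the library does the same for a single string is an assumption.

```python
import hashlib
import math
from typing import Union


def embedding_sketch(sentence: Union[str, list[str]], dim: int = 8) -> list[list[float]]:
    """Toy stand-in for WichienMaat.embedding; not a real language model."""
    # Accept a single sentence or a list, as the documented signature does.
    sentences = [sentence] if isinstance(sentence, str) else list(sentence)
    vectors = []
    for text in sentences:
        vector = [0.0] * dim
        for word in text.lower().split():
            # A stable hash picks the bucket, so results repeat across runs.
            bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
            vector[bucket] += 1.0
        # Scale to unit length so vectors can be compared by dot product.
        norm = math.sqrt(sum(v * v for v in vector)) or 1.0
        vectors.append([v / norm for v in vector])
    return vectors


single = embedding_sketch("cats purr")
batch = embedding_sketch(["cats purr", "dogs bark", "cats purr"])
```

Identical sentences map to identical vectors here, which is the property the search step relies on.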
KhaoManee
Search vectors with queries
- KhaoManee.searching(query_embed, sentence_embed, document, top_k)
Search for the most relevant chunks based on query embeddings.
- Parameters:
query_embed (numpy.ndarray) – Query embedding vector.
sentence_embed (numpy.ndarray) – Embeddings of sentences to search.
document (Document) – The original document object.
top_k (int) – Number of top results to return.
- Returns:
List of search results with relevance scores.
- Return type:
list[dict]
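A common way to rank chunks against a query is cosine similarity followed by a top-k cut. The sketch below shows that idea under stated assumptions and is not the library's implementation: the similarity measure actually used is not stated here, `searching_sketch` is a hypothetical stand-in, a plain list of chunk strings replaces the Document object, the result keys "chunk" and "score" are invented for illustration, and Python lists stand in for numpy arrays.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def searching_sketch(query_embed, sentence_embed, chunks, top_k):
    """Hypothetical stand-in for KhaoManee.searching; not the library's code."""
    scored = [
        {"chunk": chunk, "score": cosine(query_embed, vector)}
        for chunk, vector in zip(chunks, sentence_embed)
    ]
    # Highest similarity first, then keep only the top_k entries.
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:top_k]


chunks = ["cats purr", "dogs bark", "cats nap"]
sentence_embed = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
results = searching_sketch([1.0, 0.0], sentence_embed, chunks, top_k=2)
```

Here the query points in the same direction as the first chunk's vector, so that chunk ranks first, followed by the chunk whose vector is closest in angle.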
Kornja
Generate answers from vectors
- Kornja.generating()
Note
This function is still under development. In a future release it will generate answers from the retrieved contexts.