libthai  0.1.25
thbrk.h File Reference

Thai word segmentation.

Functions

ThBrk * th_brk_new (const char *dictpath)
 Create a dictionary-based word breaker.
 
void th_brk_delete (ThBrk *brk)
 Delete a word breaker.
 
int th_brk_find_breaks (ThBrk *brk, const thchar_t *s, int pos[], size_t pos_sz)
 Find word break positions in Thai string.
 
int th_brk_insert_breaks (ThBrk *brk, const thchar_t *in, thchar_t *out, size_t out_sz, const char *delim)
 Insert word delimiters in given string.
 
int th_brk (const thchar_t *s, int pos[], size_t pos_sz)
 Find word break positions in Thai string.
 
int th_brk_line (const thchar_t *in, thchar_t *out, size_t out_sz, const char *delim)
 Insert word delimiters in given string.
 

Detailed Description

Thai word segmentation.

Function Documentation

§ th_brk()

int th_brk ( const thchar_t *  s,
int  pos[],
size_t  pos_sz 
)

Find word break positions in Thai string.

Parameters
s: the input string to be processed
pos: array to keep breaking positions
pos_sz: size of pos[]
Returns
the number of break positions found

Finds word break positions in Thai string s and stores at most pos_sz break positions in pos[], from left to right. Uses the shared word breaker.

(This function is deprecated since version 0.1.25, in favor of th_brk_find_breaks(), which is more thread-safe.)

§ th_brk_delete()

void th_brk_delete ( ThBrk *  brk)

Delete a word breaker.

Parameters
brk: the word breaker

Frees memory associated with the word breaker.

(Available since version 0.1.25, libthai.so.0.3.0)

§ th_brk_find_breaks()

int th_brk_find_breaks ( ThBrk *  brk,
const thchar_t *  s,
int  pos[],
size_t  pos_sz 
)

Find word break positions in Thai string.

Parameters
brk: the word breaker
s: the input string to be processed
pos: array to keep breaking positions
pos_sz: size of pos[]
Returns
the number of break positions found

Finds word break positions in Thai string s and stores at most pos_sz break positions in pos[], from left to right.

(Available since version 0.1.25, libthai.so.0.3.0)
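
As an illustration of how the result might be consumed, the sketch below turns break positions into word ranges. It assumes, which this reference does not state explicitly, that each entry of pos[] is an offset into s at which a new word begins. word_range() is a hypothetical helper and not part of libthai; it needs only the standard library.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libthai.
 * Given n break positions in pos[] (ascending) and the string length len,
 * computes the half-open range [*start, *end) of the k-th word.
 * Assumption: each pos[i] is an offset at which a new word begins.
 * Returns 0 on success, -1 if k is out of range. */
int word_range(const int pos[], int n, size_t len, int k,
               size_t *start, size_t *end)
{
    if (k < 0 || k > n)
        return -1;                      /* n breaks delimit n + 1 words */
    *start = (k == 0) ? 0 : (size_t)pos[k - 1];
    *end   = (k == n) ? len : (size_t)pos[k];
    return 0;
}
```

In a real program, pos[] and n would come from a call such as n = th_brk_find_breaks(brk, s, pos, pos_sz). Under the same assumption, a pos[] with as many entries as s has characters should be large enough to hold every break.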

§ th_brk_insert_breaks()

int th_brk_insert_breaks ( ThBrk *  brk,
const thchar_t *  in,
thchar_t *  out,
size_t  out_sz,
const char *  delim 
)

Insert word delimiters in given string.

Parameters
brk: the word breaker
in: the input string to be processed
out: the output buffer
out_sz: the size of out
delim: the word delimiter to insert
Returns
the actual size of the processed string

Analyzes the input string and stores it in the output buffer, with the given word delimiter inserted at every word boundary.

(Available since version 0.1.25, libthai.so.0.3.0)
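
To make the described behavior concrete, here is a minimal sketch of delimiter insertion driven by an explicit list of break offsets. It is not libthai's implementation: insert_delims() is a hypothetical helper, unsigned char stands in for thchar_t, and the truncation policy (always NUL-terminate, never write more than out_sz bytes) is an assumption made for the example.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libthai.
 * Copies in to out, inserting delim before each offset listed in
 * pos[0..n-1] (ascending). Writes at most out_sz bytes, including the
 * terminating NUL, and returns the length of the string written. */
int insert_delims(const unsigned char *in, const int pos[], int n,
                  unsigned char *out, size_t out_sz, const char *delim)
{
    size_t o = 0;   /* next write offset in out */
    int i = 0;      /* next read offset in in */
    int k = 0;      /* next break position to honor */

    if (out_sz == 0)
        return 0;

    while (in[i] != '\0' && o + 1 < out_sz) {
        if (k < n && pos[k] == i) {
            const char *d;
            /* Copy the delimiter, leaving room for the final NUL. */
            for (d = delim; *d != '\0' && o + 1 < out_sz; d++)
                out[o++] = (unsigned char)*d;
            k++;
            if (o + 1 >= out_sz)
                break;                  /* buffer full after delimiter */
        }
        out[o++] = in[i++];
    }
    out[o] = '\0';
    return (int)o;
}
```

With this layout, an output buffer of strlen(in) + n * strlen(delim) + 1 bytes holds the full result for n breaks. The same sizing is a reasonable starting point when calling th_brk_insert_breaks(), though the number of breaks is not known before the call.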

§ th_brk_line()

int th_brk_line ( const thchar_t *  in,
thchar_t *  out,
size_t  out_sz,
const char *  delim 
)

Insert word delimiters in given string.

Parameters
in: the input string to be processed
out: the output buffer
out_sz: the size of out
delim: the word delimiter to insert
Returns
the actual size of the processed string

Analyzes the input string and stores it in the output buffer, with the given word delimiter inserted at every word boundary. Uses the shared word breaker.

(This function is deprecated since version 0.1.25, in favor of th_brk_insert_breaks(), which is more thread-safe.)

§ th_brk_new()

ThBrk* th_brk_new ( const char *  dictpath)

Create a dictionary-based word breaker.

Parameters
dictpath: the dictionary path, or NULL for default
Returns
the created instance, or NULL on failure

Loads the dictionary from the given file and returns the created word breaker. If dictpath is NULL, first searches in the directory given by the LIBTHAI_DICTDIR environment variable, then in the library installation directory. Returns NULL if the dictionary file is not found or cannot be loaded.

The returned ThBrk object should be destroyed after use with th_brk_delete().

In multi-threaded environments, th_brk_new() and th_brk_delete() should be called inside critical sections (i.e. protected by a mutex). The word breaker methods can then safely be called in parallel during the instance's lifetime.

(Available since version 0.1.25, libthai.so.0.3.0)
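
As a small illustration of the default lookup, a program could point LIBTHAI_DICTDIR at its own dictionary directory before calling th_brk_new(NULL). The sketch below assumes a POSIX system, since setenv() is not part of ISO C. use_dict_dir() is a hypothetical helper and the directory passed to it is the caller's choice.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose setenv() under strict -std modes */
#endif
#include <stdlib.h>

/* Hypothetical helper, not part of libthai.
 * Sets LIBTHAI_DICTDIR to dir, replacing any existing value, so that a
 * later th_brk_new(NULL) searches that directory first.
 * Returns 0 on success, -1 on failure (the setenv() convention). */
int use_dict_dir(const char *dir)
{
    return setenv("LIBTHAI_DICTDIR", dir, 1);
}
```

Because setenv() modifies process-wide state, a call like this is best made once during start-up, before any threads that create word breakers are running.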


Generated for libthai by doxygen 1.8.12