pdftron::PDF::TextExtractor::Word Class Reference
A TextExtractor::Word object represents a word on a PDF page.
#include <TextExtractor.h>
Detailed Description
Each word contains a sequence of characters in one or more styles (see TextExtractor::Style).
Constructor & Destructor Documentation
pdftron::PDF::TextExtractor::Word::Word()
Member Function Documentation
int pdftron::PDF::TextExtractor::Word::GetNumGlyphs()
- Returns:
- The number of glyphs in this word.
void pdftron::PDF::TextExtractor::Word::GetBBox(double out_bbox[4])
- Parameters:
- out_bbox: The bounding box for this word (in unrotated page coordinates).
- Note:
- To account for the page's '/Rotate' attribute, transform all points using the matrix returned by page.GetDefaultMatrix().
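Because a rotation can move any corner of a box into an extreme position, transforming only two opposite corners is not enough: all four corners must be transformed and the result re-bounded. The sketch below assumes the box is stored as {x1, y1, x2, y2} (two opposite corners) and that the matrix follows the standard PDF affine form [a b c d h v], which maps (x, y) to (a*x + c*y + h, b*x + d*y + v). `Affine` and `TransformBBox` are hypothetical names, not PDFNet types; in real code the matrix would come from page.GetDefaultMatrix().

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical stand-in for a PDF affine matrix [a b c d h v]:
//   x' = a*x + c*y + h,   y' = b*x + d*y + v
struct Affine {
    double a, b, c, d, h, v;
    void Apply(double& x, double& y) const {
        const double nx = a * x + c * y + h;
        const double ny = b * x + d * y + v;
        x = nx;
        y = ny;
    }
};

// Assumes bbox = {x1, y1, x2, y2} holding two opposite corners.
// Transforms all four corners, then returns the axis-aligned box
// {min_x, min_y, max_x, max_y} that encloses the transformed points.
std::array<double, 4> TransformBBox(const double bbox[4], const Affine& m) {
    assert(bbox != nullptr);
    double xs[4] = {bbox[0], bbox[2], bbox[2], bbox[0]};
    double ys[4] = {bbox[1], bbox[1], bbox[3], bbox[3]};
    for (int i = 0; i < 4; ++i) m.Apply(xs[i], ys[i]);
    return {*std::min_element(xs, xs + 4), *std::min_element(ys, ys + 4),
            *std::max_element(xs, xs + 4), *std::max_element(ys, ys + 4)};
}
```

With a 90-degree rotation, a box that was wider than tall comes back taller than wide, which is exactly the case where transforming only two corners would give a wrong result.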
void pdftron::PDF::TextExtractor::Word::GetQuad(double out_quad[8])
- Parameters:
- out_quad: The quadrilateral representing a tight bounding box for this word (in unrotated page coordinates).
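For rotated or skewed text the quadrilateral can be tilted, so it need not be an axis-aligned rectangle. A minimal sketch of recovering the axis-aligned extent of such a quad, assuming the eight values are four vertices stored as x1, y1, x2, y2, x3, y3, x4, y4 (the vertex layout is an assumption, and `QuadBounds` is a hypothetical helper):

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Assumption: quad holds four vertices as x1,y1,x2,y2,x3,y3,x4,y4.
// Returns {min_x, min_y, max_x, max_y}, the axis-aligned box around the quad.
std::array<double, 4> QuadBounds(const double quad[8]) {
    assert(quad != nullptr);
    std::array<double, 4> r = {quad[0], quad[1], quad[0], quad[1]};
    for (int i = 1; i < 4; ++i) {
        const double x = quad[2 * i];
        const double y = quad[2 * i + 1];
        r[0] = std::min(r[0], x);
        r[1] = std::min(r[1], y);
        r[2] = std::max(r[2], x);
        r[3] = std::max(r[3], y);
    }
    return r;
}
```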
void pdftron::PDF::TextExtractor::Word::GetGlyphQuad(int glyph_idx, double out_quad[8])
- Parameters:
- glyph_idx: The index of a glyph in this word.
- out_quad: The quadrilateral representing a tight bounding box for the glyph at glyph_idx (in unrotated page coordinates).
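GetGlyphQuad() is typically called once per glyph, with glyph_idx running from 0 to GetNumGlyphs()-1 (this valid range is assumed here, not stated above). The sketch hit-tests a point against the bounds of each glyph's quad. `MockWord` is a hypothetical stand-in so the code compiles without PDFNet, `GlyphAt` is a hypothetical helper, and the x1, y1, ..., x4, y4 vertex layout is assumed. Because `GlyphAt` is a template, it should also accept any type exposing the same two members, provided their signatures match those documented above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// Hypothetical stand-in exposing the two Word members used below.
struct MockWord {
    std::vector<std::array<double, 8>> glyphs;  // one quad per glyph
    int GetNumGlyphs() { return static_cast<int>(glyphs.size()); }
    void GetGlyphQuad(int glyph_idx, double out_quad[8]) {
        assert(glyph_idx >= 0 && glyph_idx < GetNumGlyphs());
        std::copy(glyphs[glyph_idx].begin(), glyphs[glyph_idx].end(), out_quad);
    }
};

// Visits glyph indices 0 .. GetNumGlyphs()-1 and returns the index of the
// first glyph whose quad bounds contain (x, y), or -1 if none does.
template <typename WordT>
int GlyphAt(WordT& word, double x, double y) {
    for (int i = 0; i < word.GetNumGlyphs(); ++i) {
        double q[8];
        word.GetGlyphQuad(i, q);
        double min_x = q[0], max_x = q[0], min_y = q[1], max_y = q[1];
        for (int k = 1; k < 4; ++k) {
            min_x = std::min(min_x, q[2 * k]);
            max_x = std::max(max_x, q[2 * k]);
            min_y = std::min(min_y, q[2 * k + 1]);
            max_y = std::max(max_y, q[2 * k + 1]);
        }
        if (x >= min_x && x <= max_x && y >= min_y && y <= max_y) return i;
    }
    return -1;
}
```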
Style pdftron::PDF::TextExtractor::Word::GetCharStyle(int char_idx)
- Parameters:
- char_idx: The index of a character in this word.
- Returns:
- The style associated with a given character.
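Since a word may mix several styles, GetCharStyle() can be used to split it into runs of characters that share a style. A minimal sketch with hypothetical stand-ins `MockStyle` and `MockWord`; it assumes the real Style type supports operator!= (not confirmed by this page) and that valid character indices are 0 to GetStringLen()-1. `StyleRuns` is a hypothetical helper.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins so the sketch compiles without PDFNet.
struct MockStyle {
    int id;
    bool operator!=(const MockStyle& other) const { return id != other.id; }
};

struct MockWord {
    std::vector<MockStyle> char_styles;  // one style per character
    int GetStringLen() { return static_cast<int>(char_styles.size()); }
    MockStyle GetCharStyle(int char_idx) {
        assert(char_idx >= 0 && char_idx < GetStringLen());
        return char_styles[char_idx];
    }
};

// Splits a word into style runs. Each pair is (first char index, run length);
// all characters in a run share the style of the run's first character.
template <typename WordT>
std::vector<std::pair<int, int>> StyleRuns(WordT& word) {
    std::vector<std::pair<int, int>> runs;
    const int n = word.GetStringLen();
    int start = 0;
    for (int i = 1; i <= n; ++i) {
        if (i == n || word.GetCharStyle(i) != word.GetCharStyle(start)) {
            runs.emplace_back(start, i - start);
            start = i;
        }
    }
    return runs;
}
```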
Style pdftron::PDF::TextExtractor::Word::GetStyle()
- Returns:
- The predominant style for this word.
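This page does not say how the predominant style is chosen. Purely as an illustration of one plausible rule, the sketch picks the style used by the most characters, with styles reduced to hypothetical integer ids; PDFNet may use a different criterion. `MostFrequentStyle` is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Illustration only: picks the style id used by the most characters,
// breaking ties in favour of the style that appears first in the word.
int MostFrequentStyle(const std::vector<int>& char_style_ids) {
    assert(!char_style_ids.empty());
    std::map<int, int> counts;
    for (int id : char_style_ids) ++counts[id];
    int best = char_style_ids.front();
    for (int id : char_style_ids) {
        // Strictly greater keeps the earliest style among equal counts.
        if (counts[id] > counts[best]) best = id;
    }
    return best;
}
```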
int pdftron::PDF::TextExtractor::Word::GetStringLen()
- Returns:
- The number of characters in this word.
const Unicode* pdftron::PDF::TextExtractor::Word::GetString()
- Returns:
- The content of this word as a Unicode string.
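GetString() returns a raw pointer, so pairing it with GetStringLen() avoids relying on a terminator that the documentation does not promise. A sketch of converting that buffer to UTF-8, assuming `Unicode` is a 16-bit UTF-16 code unit (modeled here by a stand-in alias) and that GetStringLen() counts the units in that buffer; `ToUtf8` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumption: Unicode is a 16-bit UTF-16 code unit (stand-in alias).
using Unicode = std::uint16_t;

// Converts a (pointer, length) UTF-16 buffer to UTF-8, decoding surrogate
// pairs. Unpaired surrogates are replaced with U+FFFD.
std::string ToUtf8(const Unicode* s, int len) {
    assert(len == 0 || s != nullptr);
    std::string out;
    for (int i = 0; i < len; ++i) {
        std::uint32_t cp = s[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;  // consumed the low surrogate
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Assuming the types line up, a call against a real word would look like ToUtf8(word.GetString(), word.GetStringLen()).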
Word pdftron::PDF::TextExtractor::Word::GetNextWord()
- Returns:
- The next word on the current line.
int pdftron::PDF::TextExtractor::Word::GetCurrentNum()
- Returns:
- The index of this word within the current line. The first word on the line returns 0, and the last word returns line.GetNumWords()-1.
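One practical use of GetCurrentNum() is detecting the first word of a line, which returns 0. The sketch joins a line's words with single spaces. `MockLine` and `MockWord` are hypothetical stand-ins, `Text()` is a hypothetical helper standing in for decoding GetString()/GetStringLen(), and GetFirstWord() is an assumed Line member used to start the walk; `JoinLine` is likewise hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins (not PDFNet types). A word is invalid once its
// index has run past the end of its line.
struct MockWord {
    const std::vector<std::string>* words = nullptr;
    int idx = 0;
    bool IsValid() const {
        return words != nullptr && idx < static_cast<int>(words->size());
    }
    MockWord GetNextWord() const { return MockWord{words, idx + 1}; }
    int GetCurrentNum() const { assert(IsValid()); return idx; }
    std::string Text() const { return (*words)[idx]; }  // hypothetical helper
};

struct MockLine {
    std::vector<std::string> words;
    int GetNumWords() const { return static_cast<int>(words.size()); }
    MockWord GetFirstWord() const { return MockWord{&words, 0}; }
};

// Joins the words of a line with single spaces. GetCurrentNum() == 0 marks
// the first word, so no separator is written before it.
template <typename LineT>
std::string JoinLine(LineT& line) {
    std::string out;
    for (auto w = line.GetFirstWord(); w.IsValid(); w = w.GetNextWord()) {
        if (w.GetCurrentNum() != 0) out += ' ';
        out += w.Text();
    }
    return out;
}
```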
bool pdftron::PDF::TextExtractor::Word::IsValid()
- Returns:
- true if this is a valid word, false otherwise.
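Combining GetNextWord() with IsValid() gives the usual way to walk a line: keep stepping while the current word is valid. The sketch assumes that stepping past the last word yields an invalid Word; `MockWord` and `CountWords` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for Word: it becomes invalid once GetNextWord()
// steps past the last word of its line (assumed behaviour).
struct MockWord {
    int idx;
    int count;  // number of words on the line
    bool IsValid() const { return idx >= 0 && idx < count; }
    MockWord GetNextWord() const {
        assert(IsValid());
        return MockWord{idx + 1, count};
    }
};

// Counts the words reachable from `first` by following GetNextWord()
// until IsValid() returns false. An invalid starting word yields 0.
template <typename WordT>
int CountWords(WordT first) {
    int n = 0;
    for (WordT w = first; w.IsValid(); w = w.GetNextWord()) ++n;
    return n;
}
```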
bool pdftron::PDF::TextExtractor::Word::operator==(const Word&)
bool pdftron::PDF::TextExtractor::Word::operator!=(const Word&)