
gString Text Conversion Guide

Copyright © 2005 - 2025
 Mahlon R. Smith, The Software Samurai

This manual describes version 0.0.37 of the gString class.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled  "GNU Free Documentation License".

 This document is an extract from the NcDialog API library documentation.
 Please see the larger document for more details and examples.







gString Text Tool

’gString’ is a small, fast and flexible way to seamlessly convert, format and analyze both UTF-8 and wchar_t (’wide’) text.


Introduction to gString

Introduction to a Wider World

Modern applications must be designed for a worldwide audience, and for this reason, the application designer must plan for multi-language support.

Fortunately, the Universal Character Set standard ISO/IEC 10646 and UTF-8, the most popular and flexible method of character encoding, smoothly provide all the character sets, alphabets and special characters which are currently in general use.

Unfortunately, the C and C++ languages offer only minimal support for internationalization. std::string and std::wstring are nothing more than a cruel joke to a serious application designer. The GTK+ toolkit’s Glib::ustring class is an excellent internationalization tool, but it requires installation of the GTK+ toolkit’s ’glib’ and ’glibmm’ libraries.
For more information on Glib::ustring, see: https://developer.gnome.org/

’gString’ falls halfway between the full feature set of Glib::ustring and the meaningless garbage that is std::string. ’gString’ consists of one C++ header file and one C++ source code module. ’gString’ is integrated into the NcDialog API library, but may also be compiled independently as a small (16Kb) link library or the source may be embedded directly into your application.

Preparing to Support Multiple Languages

Here are the basic ideas you will need to understand in order for your application to smoothly support multiple languages.

  1. ASCII (American Standard Code for Information Interchange) is only the very first step in character encoding. It is an ancient and venerable encoding, but supports only the 95 printable characters of the basic Latin alphabet.
    If you think you can say "你是在欺骗自己!" in ASCII, you’re just deluding yourself!
  2. NEVER assume that one character can be represented with one byte.
  3. NEVER assume that one character is one display column in width.
  4. The idea that text flows from left-to-right, may only be PROVISIONALLY assumed, because again: "!איר זענט נאָר דילודינג זיך" you’re just deluding yourself.
  5. NEVER assume that "everyone reads English, so why bother?". Native speakers of Spanish, French, Chinese, the various flavors of Arabic and others (i.e. your potential customers) all have a significant impact on the daily events of our planet, so include them when planning for your next killer app.
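
Point 2 above can be checked with a few lines of standard C++. The following is a minimal sketch that is independent of gString: the helper name 'countCodePoints' is hypothetical, and it assumes the input is well-formed UTF-8. It counts characters by skipping UTF-8 continuation bytes, which always have the bit pattern 10xxxxxx.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (NOT part of gString): count the Unicode code points
// in a well-formed, null-terminated UTF-8 string by counting every byte
// that is NOT a continuation byte (continuation bytes look like 10xxxxxx).
std::size_t countCodePoints ( const char* utf8 )
{
   assert ( utf8 != nullptr ) ;
   std::size_t count = 0 ;
   for ( const char* p = utf8 ; *p != '\0' ; ++p )
   {
      unsigned char byte = static_cast<unsigned char>( *p ) ;
      if ( (byte & 0xC0) != 0x80 )
         ++count ;
   }
   return count ;
}
```

For a pure ASCII string the result equals the byte count reported by strlen(). For a string made of two three-byte Chinese characters followed by '!', strlen() reports seven bytes while this helper reports three characters.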

See also a discussion of multiple-language support in the NcDialog API.




gString Public Methods

What follows is a list of all public methods of the gString class.
Methods are arranged in functional groups.

        gString Method Name            Chapter Reference
 gString [constructor]          see gString Instantiation
 ~gString [destructor]
 operator=                      see Assignment Operators

 compose                        see Formatted Assignments

 formatInt                      see Integer Formatting

 gstr                           see Data Access
 ustr

 copy                           see Copying Data
 operator<<
 substr

 append                         see Modifying Existing Data
 insert
 limitChars
 limitCols
 shiftChars
 shiftCols
 padCols
 strip
 erase
 replace
 loadChars
 textReverse
 formatParagraph

 compare                        see Comparisons
 compcoll
 operator==
 operator!=
 find
 findlast
 after
 findr
 findx
 scan

 gscanf                         see Extract Formatted Data

 gschars                        see Statistical Info
 gscols
 utfbytes
 isASCII

 clear                          see gString Miscellaneous
 wAlloc
 uAlloc
 freeSpace
 reAlloc
 Get_gString_Version
 dumpGstring
 dbMsg



gString Instantiation

The following are the ’constructors’ for the gString class.

For those new to C++, a constructor creates an ’instance’ of the class. An instance is a particular, named object, and can be thought of as a complex variable.

  • gString ( void ) ;
      Input  :
         none
      Returns:
         nothing
    

    Constructor: Initialize members to default values (NULL string).


  • gString ( int32_t charAlloc );
      Input  :
         charAlloc : storage capacity
            interpreted as the maximum number of wchar_t (32-bit) characters 
            for the storage buffer.
            Range: gsALLOCMIN <= charAlloc <= gsALLOCMAX
            -- The minimum allocation of gsALLOCMIN will result in the
               same storage capacity as the default constructor.
            -- The maximum allocation of gsALLOCMAX is 1000 times the
               minimum allocation.
            The specified value is normalized by rounding upward to the
            nearest multiple of gsALLOCMIN.
    
      Returns: nothing
    

    Constructor: Initialize members to default values with empty string.

    This constructor is used to specify the initial storage capacity for the gString object. This may be useful if it is known that the data the object will hold will be larger than the default storage capacity.
    For more information, see Dynamic Memory Allocation.
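
The normalization rule for 'charAlloc' can be sketched in standard C++. This is an illustration only, not the gString source: the constants 'kAllocMin' and 'kAllocMax' and the helper 'normalizeAlloc' are hypothetical names, the value 1,024 is inferred from the default capacity described in this manual, and clamping of out-of-range requests is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values, inferred from this manual: a minimum allocation of 1,024
// characters and a maximum of 1,000 times the minimum. The real gsALLOCMIN
// and gsALLOCMAX constants are defined in the gString header.
constexpr int32_t kAllocMin = 1024 ;
constexpr int32_t kAllocMax = kAllocMin * 1000 ;

// Hypothetical helper: clamp a request to the legal range (an assumption),
// then round it upward to the nearest multiple of the minimum allocation.
int32_t normalizeAlloc ( int32_t charAlloc )
{
   if ( charAlloc < kAllocMin )  charAlloc = kAllocMin ;
   if ( charAlloc > kAllocMax )  charAlloc = kAllocMax ;
   int32_t blocks = (charAlloc + kAllocMin - 1) / kAllocMin ;
   int32_t result = blocks * kAllocMin ;
   assert ( result % kAllocMin == 0 ) ;
   return result ;
}
```

Under these assumptions a request for 2,500 characters is rounded up to 3,072, and a request for exactly 1,024 is left unchanged.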


  • gString ( const char* usrc, int32_t charLimit = -1 ) ;
      Input  :
         usrc     : pointer to a UTF-8-encoded, null-terminated string
         charLimit: (optional, -1 by default) maximum number of source
                    characters to be stored (not including nullchar).
                    This is the number of characters (NOT the number of
                    UTF-8 bytes).
                    If not specified, then all source data will be stored
                    up to the maximum storage limit.
    
      Returns: nothing
    

    Constructor: Convert specified UTF-8-encoded source to gString.


  • gString ( const wchar_t* wsrc, int32_t charLimit = -1 ) ;
      Input  :
         wsrc      : pointer to a wchar_t-encoded, null-terminated string
         charLimit : (optional, -1 by default)
                     maximum number of characters from source array to 
                     convert
    
      Returns: nothing
    

    Constructor: Convert specified wchar_t (’wide’) source to gString.


  • gString ( short iVal, short fWidth, bool lJust = false,
      bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

  • gString ( unsigned short iVal, short fWidth, bool lJust = false,
      bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

  • gString ( int iVal, short fWidth, bool lJust = false,
      bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

  • gString ( unsigned int iVal, short fWidth, bool lJust = false,
      bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

  • gString ( long int iVal, short fWidth, bool lJust = false,
      bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

  • gString ( unsigned long int iVal, short fWidth, bool lJust = false,
      bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

  • gString ( long long int iVal, short fWidth, bool lJust = false,
      bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

  • gString ( unsigned long long int iVal, short fWidth, bool lJust = false,
      bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

      Input  :
         iVal  : value to be converted
                 Supported value range: plus/minus 9.999999999999 terabytes
         fWidth: field width (number of display columns)
                 range: 1 to FI_MAX_FIELDWIDTH
         lJust : (optional, false by default)
                 if true, strip leading spaces to left-justify the value
                 in the field. (resulting string may be less than fWidth)
         sign  : (optional, false by default)
                 'false' : only negative values are signed
                 'true'  : always prepend a '+' or '-' sign.
         kibi  : (optional, false by default)
                 'false' : calculate as a decimal value (powers of 10)
                           kilobyte, megabyte, gigabyte, terabyte
                 'true'  : calculate as a binary value (powers of 2)
                           kibibyte, mebibyte, gibibyte, tebibyte
         units  : (optional) member of enum fiUnits (fiK by default)
                  specifies the format for the units suffix.
                  Note that if the uncompressed value fits in the field,
                  then this parameter is ignored.
    
      Returns: nothing
         Note: if field overflow, field will be filled with '#' characters.
    

    Constructor: Convert specified integer value to gString.
    Please see Integer Formatting 'formatInt' method group for formatting details.


  • gString ( const char* fmt, const void* arg1, ... )
      __attribute__ ((format (gnu_printf, 2, 0))) ;
      Input  :
         fmt       : a format specification string in the style of 
                     sprintf(), swprintf() and related formatting 
                     C/C++ functions.
         arg1      : pointer to first value to be converted by 'fmt'
         ...       : optional arguments (between ZERO and gsfMAXARGS - 1)
                     Each optional argument is a POINTER (address of) the 
                     value to be formatted.
      Returns:
         nothing
    

    Constructor: Convert formatting specification and its arguments to gString. Please refer to the compose() method (see Formatted Assignments) for more information.

    Technical Note: There is no constructor using a “const wchar_t* fmt” format specification because it would conflict with the constructor which limits the number of characters used to initialize the instance.


  • ~gString ( void ) ;
      Input  :
         none
      Returns:
         nothing
    

    Destructor: Release all resources associated with the gString object.

    Object is destroyed either when it goes out of scope, or by explicitly deleting the object.

    For those new to C++, please note that if you use the ’new’ keyword to create objects, then those objects persist (take up space) until you explicitly delete them or until the application is closed, even if the pointer to the object has gone out-of-scope. See examples below.

    Examples

    void calling_method ( void )
    {
       gString *gsPtr = playful_kitten ( "Hello World!" ) ;
    
       // report contents of object created by called method
       wcout << gsPtr->gstr() << endl ;
       delete gsPtr ; // delete object created by called method
    }
    
    gString* playful_kitten ( const char* msg )
    {
       gString gs_local( "I love tuna!" ) ;  // local object
       gString *gsPtr1 = new gString,        // heap object
               *gsPtr2 = new gString(msg) ;  // heap object (initialized)
       gString *gsArray = new gString[4] ;   // heap array
    
       *gsPtr1 = gs_local ;   // be a kitten: play with the strings...
       gsArray[2] = *gsPtr2 ;
       gsArray[3] = "Scratch my belly!" ;
       gsArray[1] = gsArray[3] ;
    
       delete gsPtr1 ;      // delete object referenced by gsPtr1
       delete [] gsArray ;  // delete object array referenced by gsArray
       return gsPtr2 ;      // return pointer to object referenced by gsPtr2
                            // (caller is responsible for deleting object)
    }           // 'gs_local' goes out of scope and is destroyed here
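
    The lifetime rules shown above apply to any C++ class, not only to gString. As one possible alternative to a manual 'delete', the following sketch uses the standard std::unique_ptr so that a heap object is released when its owner goes out of scope. The 'Tracked' class and both function names are hypothetical stand-ins, used only to count live objects.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for any class with a destructor (NOT gString itself).
// It counts how many instances are currently alive.
struct Tracked
{
   static int alive ;
   Tracked ()  { ++alive ; }
   ~Tracked () { --alive ; }
} ;
int Tracked::alive = 0 ;

// Creates one stack object and one heap object, and hands ownership of the
// heap object to the caller.
std::unique_ptr<Tracked> makeTracked ( void )
{
   Tracked local ;                                 // stack object
   std::unique_ptr<Tracked> kept( new Tracked ) ;  // heap object
   assert ( Tracked::alive == 2 ) ;
   return kept ;     // 'local' is destroyed here, 'kept' moves to the caller
}

// Returns the number of live objects after the owner has gone out of scope.
int liveAfterScope ( void )
{
   {
      std::unique_ptr<Tracked> owner = makeTracked () ;
      assert ( Tracked::alive == 1 ) ;
   }     // 'owner' goes out of scope: the heap object is deleted here
   return Tracked::alive ;
}
```

    With a raw pointer in place of 'owner', the heap object would remain alive after the closing brace unless it were explicitly deleted.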
    


Dynamic Memory Allocation

  • During instantiation of a gString object, a dynamic memory allocation occurs; that is, memory is allocated from the “heap” which is RAM memory not directly controlled by the application.

    This is in contrast to the “stack” memory which is assigned to the application on startup.

  • The dynamic allocation is initially the default value: gsALLOCDFLT. This is the number of UTF-32 (wchar_t) characters the object can hold.

    Storage capacity is automatically expanded as necessary to accommodate the size of the data assigned to the object.

  • The exception to the default allocation is the gString constructor which takes as its argument a specific storage capacity, expressed as the number of UTF-32 (wchar_t) characters. Examples:
       gString gs( gsALLOCMED );   gString gs( 2500 );
    This constructor is described above.
  • The DEFAULT storage capacity is the minimum capacity:
          (gsALLOCDFLT == gsALLOCMIN)
    Total dynamic allocation for the default storage capacity will be:
       1K UTF-32 characters + 4K UTF-8 bytes + 1K 16-bit integers,
       or 4Kb + 4Kb + 2Kb == 10Kb (approximately).
  • The “medium” capacity, gsALLOCMED is four(4) times the minimum capacity, i.e. 4,096 wchar_t characters.
    This should handle all but the most extreme circumstances. Total dynamic allocation will be:
       4K UTF-32 characters + 16K UTF-8 bytes + 4K 16-bit integers,
       or 16Kb + 16Kb + 8Kb == 40Kb (approximately).
  • The MAXIMUM storage capacity is one thousand (1,000) times the minimum allocation.
    This is 1,024,000 wchar_t characters. (Note the mixture of Kilo-byte and Kibi-byte math.) Total dynamic allocation will be:
       1000K UTF-32 characters + 4000K UTF-8 bytes + 1000K 16-bit integers,
       or 4Mb + 4Mb + 2Mb == 10Mb (approximately).
    This is enough memory to hold a short novel, and so the maximum allocation will be needed in only the most extreme circumstances.
  • The “reAlloc” method described in the chapter, gString Miscellaneous, may be used to manually increase or decrease the size of the storage buffers in increments of gsALLOCMIN.
  • All methods which set or modify the stored text data implement automatic re-sizing of the data buffers.

    The ‘gsmeter’ test application reports all these methods to be fully functional; however, be aware that the allocation/reallocation algorithm is still young, relatively speaking, (about four months as of Jan. 2025); so if a problem arises, please drop the author a note.

    Please see gsmeter Test App for additional information; specifically the R[a|b] (reallocate) tests which exercise all methods that perform modifications to the size of the storage buffers.
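
    The allocation totals quoted in this chapter follow from a fixed cost per character of storage capacity. The sketch below reproduces that arithmetic; the helper name 'estimateFootprintBytes' is hypothetical, and the per-character costs (a 4-byte wchar_t, four UTF-8 bytes and one 16-bit integer) are taken from the figures above rather than from the gString source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimate the dynamic allocation, in bytes, for a
// given storage capacity using the per-character costs quoted in this
// chapter: one 4-byte UTF-32 character, four UTF-8 bytes, one 16-bit integer.
int64_t estimateFootprintBytes ( int64_t charCapacity )
{
   assert ( charCapacity > 0 ) ;
   const int64_t wideBytes  = charCapacity * 4 ;
   const int64_t utf8Bytes  = charCapacity * 4 ;
   const int64_t int16Bytes = charCapacity * 2 ;
   return wideBytes + utf8Bytes + int16Bytes ;
}
```

    For the default capacity of 1,024 characters this gives 10,240 bytes (10Kb), and for the medium capacity of 4,096 characters it gives 40,960 bytes (40Kb).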




Assignment Operators

For those new to C++, an assignment operator assigns (initializes) the object to the left of the ’=’ using the data on the right of the ’=’. You may also hear the term ’overloaded operator’. This just means that the ’=’ assignment operator may be defined in more than one way, so it will perform different tasks according to the context or circumstance.

  • void operator = ( const char* usrc ) ;
  • void operator = ( const uint8_t* usrc ) ;
      Input  :
         usrc  : pointer to an array of UTF-8-encoded characters
      Returns:
         nothing
    

    Assignment operator: converts UTF-8-encoded source to gString.

  • void operator = ( const wchar_t* wsrc ) ;
      Input  :
         wsrc  : pointer to an array of wchar_t 'wide' characters
      Returns:
         nothing
    

    Assignment operator: converts wchar_t (’wide’) source to gString.

  • void operator = ( const gString& gssrc ) ;
      Input  :
         gssrc : gString object to be copied (by reference)
      Returns:
         nothing
    

    Assignment operator. Copies one gString object to another.

Examples

char utf8Data[] = { "Youth is wasted on the young." } ;
gString gs1, gs2 ;

gs1 = utf8Data ;
gs2 = gs1 ;
wcout << gs2 << endl ;
 - - -> Youth is wasted on the young.



Formatted Assignments

  • const wchar_t* compose ( const wchar_t* fmt, ... )
      __attribute__ ((format (gnu_wprintf, 2, 0))) ;

  • const wchar_t* compose ( const char* fmt, ... )
      __attribute__ ((format (gnu_printf, 2, 0))) ;
      Input  :
         fmt  : a format specification string in the style of sprintf(),
                swprintf() and related formatting C/C++ functions.
         ...  : optional arguments (between ZERO and gsfMAXARGS)
                Each optional argument is a POINTER (address of) the value
                to be formatted.
                - Important Note: There must be AT LEAST as many optional
                  arguments as the number of format specifiers defined in
                  the formatting string. Excess arguments will be ignored;
                  HOWEVER, too few arguments will result in an application
                  crash. You have been warned.
      Returns:
         const wchar_t* to formatted data
    

    Create formatted text data from a format specification string including between ZERO and gsfMAXARGS format specifications and their corresponding argument pointers.

    Supported data types:
     %d, %i  integer (decimal)
     %o      integer (octal)
     %u      integer (unsigned)
     %x, %X  integer (hex lower or upper case)
     %f      floating point (fixed point)
     %e, %E  floating point (scientific notation, lower/uppercase)
     %g, %G  floating point (normal/exponential, lower/uppercase)
     %a, %A  floating point (hex fraction)
     %c      character
     %C      character (alias for %lc)
     %s      string
     %S      string (alias for %ls)
     %p      pointer
     %b, %B  (extension to swprintf - see description below)
     %m      capture 'errno' description string (see /usr/include/errno.h)
     %n      number of characters printed so far
             (value written to corresponding argument's location)
     %%      literal '%'
    
    See man pages for the C/C++ function 'swprintf' or
    'Table of Output Conversions' for additional details.
    

Examples

char      Greeting[] = { "Hello!" } ;
int       iValue = 27064 ;
long long int qValue = 7842561 ;
long int  lValue = 485772 ;
short int sValue1 = 28875, sValue2 = -261, sValue3 = 529 ;
bool      flagValue = true ;
float     fltValue = 278.5610234 ;
double    dblValue = 9982.5610234 ;
gString gs ;
gs.compose( "%s - %d %12hd, %-hi, %#hx %08lXh %lld %hhd",
            Greeting, &iValue, &sValue1, &sValue2, &sValue3,
            &lValue, &qValue, &flagValue ) ;
wcout << gs << endl ;
 - - -> Hello! - 27064        28875, -261, 0x211 0007698Ch 7842561 1
gs.compose( "floating downstream:%10.2f and doubling our pace:%.4lf",
            &fltValue, &dblValue ) ;
wcout << gs << endl ;
 - - -> floating downstream:    278.56 and doubling our pace:9982.5610

See also formatted instantiation: gString Instantiation.


Important Note on Formatting

Because THE PARAMETERS ARE POINTERS TO THEIR DATA, similar to the C/C++ library function ’sscanf’ and friends, the compiler cannot perform automatic promotions from short int* to int* or from float* to double*, and so on, as it would for swprintf.

This implementation was selected because a) it eliminates data-width conflicts when moving among hardware platforms, and b) it reduces code size while increasing performance.

This implementation relies on you, the designer, to use care that the data type you specify in the formatting string matches the data type of the variable referenced by its parameter pointer AND that you use the ’address-of’ (’&’) operator to reference non-pointer variables. Note also that ’literal’ values may not be used as parameters because literals have no address.

The following constructs will produce errors:

gString gs ;
char   grade = 'A' ;
short  age   = 21 ;
int    sat   = 1550 ;
double gpa   = 3.75 ;

   // These examples fail to use the 'address-of' operator for the 
   // referenced variables, and will cause a 'segmentation fault' 
   // i.e. an application crash.
   gs.compose( "My grade is an %c", grade ) ;
   gs.compose( "I got a %d on my SAT.", sat ) ;
   // The above should be:
   gs.compose( "My grade is an %c", &grade ) ;
   gs.compose( "I got a %d on my SAT.", &sat ) ;

   // These examples use mismatched format-specification/variable 
   // reference. This will result in either bad data out OR will 
   // cause a memory-access violation.
   gs.compose( "I can't wait to be %d.", &age ) ;
   gs.compose( "My GPA is %1.3f", &gpa ) ;
   gs.compose( "The hex value of %c is: %#x", &grade, &grade ) ;
   gs.compose( "My GPA is %1.3lf", 3.88 ) ; // (literal value)
   // The above should be:
   gs.compose( "I can't wait to be %hd.", &age ) ;
   gs.compose( "My GPA is %1.3lf", &gpa ) ;
   gs.compose( "The hex value of %c is: %#hhx", &grade, &grade ) ;
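
The same pointer-to-data convention can be seen in the standard library function ’sscanf’, which this section uses as its point of comparison. The following sketch uses only standard C++ and does not call gString; the helper name 'parseStudent' is hypothetical. Each argument is the address of a variable, and each conversion specifier is matched to the pointed-to type.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: parse "<age> <gpa>" from text. As with compose(),
// each argument is the ADDRESS of a variable, and each conversion specifier
// must match the pointed-to type exactly:
//    %hd <-> short*      %lf <-> double*
bool parseStudent ( const char* text, short* age, double* gpa )
{
   assert ( text != nullptr && age != nullptr && gpa != nullptr ) ;
   int converted = std::sscanf ( text, "%hd %lf", age, gpa ) ;
   return ( converted == 2 ) ;
}
```

Writing "%d" for the 'short' target or "%f" for the 'double' target here would be the same class of mismatch as in the failing compose() calls above.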

Parameter Type Checking:
Unfortunately, type-checking of wchar_t formatting strings is not yet supported by the gnu (v:4.8.0) compiler, (but see wchar.h which is preparing for the future). Thus, use care when constructing your ’wchar_t fmt’ formatting string. The ’char fmt’ string IS type-checked.

IMPORTANT NOTE:
Depending on your compiler version, you may get a warning when using the '%b' binary format specification (described below):
   "warning: unknown conversion type character ‘b’ in format [-Wformat=]"
This is because the compiler's format-string checker does not recognize our custom format specifier. If this happens, use a ’wchar_t’ (wide) formatting template, which is not type checked.

Instead of:
      gs.compose( "bitmask: %b", &wk.mevent.eventType );
Use this (not type checked by the compiler):
      gs.compose( L"bitmask: %b", &wk.mevent.eventType );

Formatted binary output (extension to swprintf)

We implement an extension to the swprintf output-conversion-specifiers for binary formatted output. We have found this formatting option useful when working with bit masks, for verifying bit-shifting operations during encryption/decryption and other uses.

  • Base formatting specifier:
       %b , %B
    Note that the lower-case / upper-case variants have identical function, and indicate only the case of the identifier character. See format modifiers for data size.
  • Format modifiers for data size are the same as for swprintf:
       hh , h , l , ll , L , q
    Examples: %hhb  %hB  %llB  %qb
  • Format modifier for prepending of a data-type indicator.
       '#' (hash character)
    This is the same principle as for prepending a '0x' indicator to hex output, and will place either a 'b' or 'B' character at the beginning of the output. Examples:
       %#hhb -> b0101.1010
       %#hhB -> B0101.1010
  • Format modifier for appending of a data-type indicator.
       '-#' (minus sign and hash character)
    Rather than prepending the indicator, the indicator will be appended to the end of the output. Examples:
       %-#hhb -> 0101.1010b
       %-#hhB -> 0101.1010B
  • Format modifier for specifying the group-separator character.
    By default, the bit groups are separated by a '.' (fullstop) character. To specify an alternate separator character:
       % hB   -> 0111 0101 1010 0001    (' ' (space) as separator)
       %_hB   -> 0111_0101_1010_0001    ('_' (underscore) as separator)
       %#/hB  -> B0111/0101/1010/0001   ('/' (slash) as separator)
       %-#-hB -> 0111-0101-1010-0001B   ('-' (dash) as separator)
    Valid separator characters are any printable ASCII character that IS NOT alphabetical, IS NOT a number, and IS NOT a '.' (fullstop).
  • Format modifier for specifying bit grouping.
    By default, bits are formatted in groups of four (4, a nybble); however, if desired, bits can be formatted in groups of eight (8, a byte):
       %.8hB    -> 01110101.10100001
       %-.8hB   -> 01110101-10100001
       %# .8hB  -> B01110101 10100001
       %-#`.8hb -> 01110101`10100001b
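
The grouped output described above can be reproduced with a short routine in standard C++. This is a sketch of the general technique, not the gString implementation: the helper name 'toBinary' and its parameters are hypothetical, and the data-type indicator ('b' or 'B') is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (NOT the gString implementation): render the low
// 'bits' bits of 'value' as text, most significant bit first, inserting
// 'sep' between groups of 'group' bits.
std::string toBinary ( uint64_t value, int bits, int group = 4, char sep = '.' )
{
   assert ( bits > 0 && bits <= 64 ) ;
   assert ( group == 4 || group == 8 ) ;
   std::string out ;
   for ( int bit = bits - 1 ; bit >= 0 ; --bit )
   {
      out += ( (value >> bit) & 1u ) ? '1' : '0' ;
      if ( bit > 0 && (bit % group) == 0 )    // boundary between two groups
         out += sep ;
   }
   return out ;
}
```

Called with the 16-bit value 0x75A1, the default arguments give the nybble-grouped text shown for the '%hB' examples, and a 'group' of 8 gives the byte-grouped form.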

Field-width specification (swprintf bug fix)

The standard library ’swprintf’ function has a design flaw for format specifications that include a field-width specifier.

’swprintf’ pads the string to the specified number of CHARACTERS, not the number of COLUMNS as it should do. For ASCII numeric source values this is not a problem because one character equals one display column. For string source data, however, if the source string contains characters that require more than one display column each, then the output may be too wide.

Therefore, for string-source-formatting specifications ONLY:
 (examples: "%12s" "%-6s" "%16ls" "%5S" "%-24S")
we compensate for this ethnocentric behavior by interpreting the field-width specifier as number-of-columns, NOT number-of-characters. For non-ASCII string data, this will result in output that appears different (and better) than output created directly by the ’swprintf’ function.
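
The difference between padding by characters and padding by columns can be sketched with the POSIX function wcwidth(). This is an illustration of the idea, not the gString implementation: the helper name 'padToColumns' is hypothetical, wcwidth() is a POSIX extension rather than ISO C++, and its results for non-ASCII characters depend on the locale that is in effect.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <wchar.h>      // wcwidth() (POSIX, not ISO C++)

// Hypothetical helper (NOT the gString implementation): left-justify 'text'
// in a field of 'cols' display COLUMNS by measuring each character with
// wcwidth(). Characters that wcwidth() cannot classify (return value -1)
// are counted as one column, which is an assumption of this sketch.
std::wstring padToColumns ( const std::wstring& text, int cols )
{
   assert ( cols >= 0 ) ;
   int used = 0 ;
   for ( wchar_t ch : text )
   {
      int w = wcwidth ( ch ) ;
      used += ( w < 0 ) ? 1 : w ;
   }
   std::wstring out = text ;
   if ( used < cols )
      out.append ( static_cast<std::size_t>( cols - used ), L' ' ) ;
   return out ;
}
```

For ASCII text the result is the same as character-based padding. Under a UTF-8 locale, wcwidth() typically reports two columns for a CJK ideograph, so fewer padding spaces are added and the field keeps its intended width on screen.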


Unsupported format specifications

Conversion modifiers that are not fully supported at this time:
 ’j’, ’z’, ’t’, ’%[’
Also, the ’*’ field-width specification or precision specification which uses the following argument as the width/precision value IS NOT supported.




Integer Formatting

  • bool formatInt ( short iVal, short fWidth,
      bool lJust = false, bool sign = false,
      bool kibi = false, fiUnits units = fiK ) ;

  • bool formatInt ( unsigned short iVal, short fWidth,
      bool lJust = false, bool sign = false,
      bool kibi = false, fiUnits units = fiK ) ;

  • bool formatInt ( int iVal, short fWidth,
      bool lJust = false, bool sign = false,
      bool kibi = false, fiUnits units = fiK ) ;

  • bool formatInt ( unsigned int iVal, short fWidth,
      bool lJust = false, bool sign = false,
      bool kibi = false, fiUnits units = fiK ) ;

  • bool formatInt ( long iVal, short fWidth,
      bool lJust = false, bool sign = false,
      bool kibi = false, fiUnits units = fiK ) ;

  • bool formatInt ( unsigned long iVal, short fWidth,
      bool lJust = false, bool sign = false,
      bool kibi = false, fiUnits units = fiK ) ;

  • bool formatInt ( long long iVal, short fWidth,
      bool lJust = false, bool sign = false,
      bool kibi = false, fiUnits units = fiK ) ;

  • bool formatInt ( unsigned long long iVal, short fWidth,
      bool lJust = false, bool sign = false,
      bool kibi = false, fiUnits units = fiK ) ;

  Input  :
     iVal  : value to be converted
             Supported value range: plus/minus 9.999999999999 terabytes
     fWidth: field width (number of display columns)
             range: 1 to FI_MAX_FIELDWIDTH
     lJust : (optional, false by default)
             if true, strip leading spaces to left-justify the value
             in the field. (resulting string may be less than fWidth)
     sign  : (optional, false by default)
             'false' : only negative values are signed
             'true'  : always prepend a '+' or '-' sign.
     kibi  : (optional, false by default)
             'false' : calculate as a decimal value (powers of 10)
                       kilobyte, megabyte, gigabyte, terabyte
             'true'  : calculate as a binary value (powers of 2)
                       kibibyte, mebibyte, gibibyte, tebibyte
     units  : (optional) member of enum fiUnits (fiK by default)
              specifies the format for the units suffix.
              Note that if the uncompressed value fits in the field,
              then this parameter is ignored.

  Returns:
     'true' if successful
     'false' if field overflow (field will be filled with '#' chars)
             See notes below on field overflow.

Convert an integer value into a formatted display string of the specified width. Value is right-justified in the field, with leading spaces added if necessary (but see ’lJust’ parameter).

Maximum field width is FI_MAX_FIELDWIDTH. This is wide enough to display an 18-character, signed and comma-formatted value: '+9,876,543,210,777'

Actual formatting of the value depends on the combination of:
  a) magnitude of the value
  b) whether it is a signed value
  c) the specified field-width
  d) the specified suffix format
  e) locale-specific grouping of digits according to the LC_NUMERIC locale environment variable
     Important Note: the 'C' (default) locale defines an empty string as the grouping separator character. Therefore, the locale should be explicitly set before calling this method. (This is done automatically when the NcDialog API is initialized.)
  f) the possible reasons for field overflow: see the notes below on field overflow

The following examples are based on the U.S. English locale: ‘en_US.utf8’.
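
To make the digit-grouping step concrete, here is a minimal sketch in standard C++. It is not the gString implementation: the helper name 'groupDigits' is hypothetical, and it hard-codes a ',' every three digits as the U.S. English locale would, instead of consulting LC_NUMERIC.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (NOT the gString implementation): insert a ','
// between each group of three digits, as the en_US locale would.
std::string groupDigits ( long long value )
{
   std::string digits = std::to_string ( value ) ;
   const std::size_t first = ( value < 0 ) ? 1 : 0 ;  // skip a leading '-'
   assert ( digits.size() > first ) ;
   // Walk leftward from the end of the text, three digits at a time.
   for ( std::size_t pos = digits.size() ; pos > first + 3 ; )
   {
      pos -= 3 ;
      digits.insert ( pos, 1, ',' ) ;
   }
   return digits ;
}
```

A production routine would take the separator and group size from the active locale, which is why the note above recommends setting the locale before formatting.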

Examples

1) Simple comma formatted output if specified field-width is sufficient.
   345    654,345    782,654,345    4,294,967,295

2) Output with values compressed to fit a specified field width.
   12.3K    999K    12.345M    1.234G    4.3G

   
gString gs ;      // gString object

3) Convert a signed integer value:
   int iValue = 28954 ;

   // field width == 8, right justified (note: compression unnecessary)
   gs.formatInt( iValue, 8 ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :  28,954:

   // field width == 8, left justified (note: compression unnecessary)
   gs.formatInt( iValue, 8, true ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :28,954:

   // field width == 6
   gs.formatInt( iValue, 6 ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :28.95K:

   // field width == 6 with forced sign
   gs.formatInt( iValue, 6, false, true ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :+29.0K:

   // field width == 5
   gs.formatInt( iValue, 5 ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :29.0K:

   iValue = -28954 ;    // convert negative source value

   // field width == 8, right justified (note: compression unnecessary)
   gs.formatInt( iValue, 8 ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  : -28,954:

   // field width == 8, left justified (note: compression unnecessary)
   gs.formatInt( iValue, 8, true ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :-28,954:

   // field width == 6
   gs.formatInt( iValue, 6 ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :-29.0K:

   // field width == 5
   gs.formatInt( iValue, 5 ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  : -29K:

4) Convert an unsigned long long integer value (field width == 11):
   unsigned long long int qValue = 39000009995 ;

   // decimal compression (gigabytes) with "official" IEC suffix
   gs.formatInt( qValue, 11, false, false, false, fikB ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :39.000010gB:

   // binary compression (gibibytes) with "official" IEC suffix
   gs.formatInt( qValue, 11, false, false, true, fiKiB ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :38.08595GiB:

Please see the NcDialog test application, ’Dialogw’, for more examples.

Notes on the formatInt group

Optional specification of the units suffix for the ’formatInt’ methods.

Units are specified using the optional 'units' parameter, which is a member of the 'fiUnits' enumerated type.

   enum fiUnits : short {
      fiK,     // 'K'   'M'   'G'   'T'    (default)
      fik,     // 'k'   'm'   'g'   't'
      fiKb,    // 'Kb'  'Mb'  'Gb'  'Tb'
      fikB,    // 'kB'  'mB'  'gB'  'tB'   ("official" metric 'kilo' designation)
      fiKiB,   // 'KiB' 'MiB' 'GiB' 'TiB'  ("official" binary 'kibi' designation)
   } ;

The ’formatInt’ methods use decimal (powers of 10) compression calculations by default. To use binary (powers of 2) compression, use the optional 'kibi' parameter.

   DECIMAL                          BINARY
   kilobytes  (x/1000)              kibibytes  (x/1024)
   megabytes  (x/1000000)           mebibytes  (x/1024000)
   gigabytes  (x/1000000000)        gibibytes  (x/1024000000)
   terabytes  (x/1000000000000)     tebibytes  (x/1024000000000)
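
The compression arithmetic can be checked against the examples in this chapter with a few lines of standard C++. The sketch below is not the gString implementation: the helper name 'compressValue' and its parameters are hypothetical, and the caller chooses the divisor, the number of decimal places and the suffix, whereas formatInt selects these itself from the field width.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (NOT the gString implementation): divide 'value' by
// the divisor for the chosen unit and format the quotient with a fixed
// number of decimal places, followed by the units suffix.
std::string compressValue ( uint64_t value, double divisor,
                            int decimals, const char* suffix )
{
   assert ( divisor > 0.0 && decimals >= 0 && suffix != nullptr ) ;
   char buff[64] ;
   std::snprintf ( buff, sizeof(buff), "%.*f%s",
                   decimals, static_cast<double>(value) / divisor, suffix ) ;
   return std::string ( buff ) ;
}
```

Using the divisors documented in these notes, the value 39000009995 divided by 1000000000 and shown with six decimal places yields "39.000010gB", and divided by 1024000000 with five decimal places it yields "38.08595GiB", matching the example output earlier in this chapter.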

The kilo/kibi controversy

The IEC (International Electrotechnical Commission) recommends lower-case for metric (powers of 10) and upper-case for binary (powers of 2). However, unless you must be accurate in presenting the data according to the IEC standard, it is recommended that you choose the format according to: your space requirements, visual appeal, and clear communication with your users.
If you blindly follow style standards against your own better judgement, then be forever labelled as a weenie.

formatInt field overflow

As described above, the actual formatting of a fixed-width integer field depends on a number of factors. Every effort is made to compress the data to fit within the field while retaining an accurate representation of the numeric value.

There are cases, however, where it is not possible to represent the data within the specified field width. When this occurs, the entire field will be filled with HASH characters '#'.

The specified field must be wide enough to accommodate either the entire, uncompressed value, or the combination of compressed value, units designator and sign (if any). The following situations may cause field overflow.

   a) Values <= -10.0 Tbytes or >= 10.0 Tbytes cannot be represented by 'formatInt' methods.
   b) One-column fields can display values between 0 and 9. Values outside this range will cause overflow.
   c) Two-column fields can display values between -9 and 99. Values outside this range will cause overflow.
   d) Three-column fields can display compressed data only if the combined width of value, sign and units require no more than three(3) columns.
   e) Four-column fields can display compressed data only if the combined width of value, sign and units require no more than four(4) columns.
   f) Five-column fields can accurately display any value IF the units designator requires only one(1) column.
   g) Six-column fields can accurately display any value IF the units designator requires no more than two(2) columns.

Fields of seven(7) or more columns can display any formatted value without danger of overflow.
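The overflow rule can be pictured with a small standalone sketch. It is not taken from the gString source: the helper name fitField is hypothetical, it assumes the text has already been formatted (value, sign and units), that each character occupies one column, and that the field is right-justified.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of gString): place already-formatted text
// in a fixed-width field, right-justified (an assumption). If the text is
// wider than the field, the whole field is filled with '#' characters.
std::wstring fitField ( const std::wstring& text, std::size_t fieldWidth )
{
   if ( text.size() > fieldWidth )
      return std::wstring( fieldWidth, L'#' ) ;
   return std::wstring( fieldWidth - text.size(), L' ' ) + text ;
}
```

For instance, fitField( L"100", 2 ) produces "##" because the value 100 cannot be shown in two columns, while fitField( L"99", 4 ) produces "  99".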



How To Set the Application Locale In C++

The NcDialog API library automatically sets the application locale according to the console window environment. Please see the NcDialog documentation, chapter: Multi-language Support for details.

In brief, the “locale” specifies the upper-/lower-case text rules, numeric formatting, language-specific punctuation, currency symbols and so on.

The application locale should be taken from the terminal environment if possible. This is done by creating an instance of the "std::locale" class referencing the empty string ("").
   locale* locptr = new locale("");
The captured locale is then made “global”, that is it replaces the so-called "classic" (C/C++ language) locale with the specified locale definition.
   locptr->global( *locptr );
Please see the C++ documentation for "std::locale" for details.
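As an alternative sketch, the same two steps can be written without a heap allocation by calling the static member std::locale::global directly. The function name setEnvironmentLocale is hypothetical. The try/catch is there because constructing std::locale("") throws std::runtime_error when the environment names a locale that the system does not provide.

```cpp
#include <locale>
#include <stdexcept>

// Capture the locale named by the environment and make it global.
// Returns false if the environment names a locale the system cannot
// provide; in that case the previous global locale remains in effect.
bool setEnvironmentLocale ( void )
{
   try
   {
      std::locale::global( std::locale( "" ) ) ;
      return true ;
   }
   catch ( const std::runtime_error& )
   {
      return false ;
   }
}
```

A typical application would call this once, early in main(), before any wide-stream output is produced.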




Data Access

  • const wchar_t* gstr ( void ) const ;
      Input  :
         none
      Returns:
         const pointer to array of wchar_t characters
    

    Return a const pointer to the wchar_t (wide) character array.


  • const wchar_t* gstr ( int32_t& charCount ) const ;
      Input  :
         charCount : (by reference, initial value ignored)
                     receives number of characters in array, 
                     including null terminator
      Returns:
         const pointer to array of wchar_t characters
    

    Return a const pointer to the wchar_t (wide) character array, along with the number of characters in the array (including the null terminator).


  • const char* ustr ( void ) const ;
      Input  :
         none
      Returns:
         const pointer to array of UTF-8 characters
    

    Return a const pointer to the char (UTF-8) character array.


  • const char* ustr ( int32_t& charCount, int32_t& byteCount ) const;
      Input  :
         charCount : (by reference, initial value ignored)
                     receives number of characters in array, 
                     including null terminator
         byteCount : (by reference, initial value ignored)
                     receives number of bytes in array, 
                     including null terminator
      Returns:
         const pointer to array of UTF-8 characters
    

    Return a const pointer to the char (UTF-8) character array, along with the number of characters and the number of bytes in the array (including the null terminator).


Examples

int32_t charCount, byteCount ;
gString gs( "Wherever you go, there you are!" ) ;

const wchar_t* wPtr = gs.gstr() ;
wPtr = gs.gstr( charCount ) ;                // also reports character count
const char* utf8Ptr = gs.ustr() ;
utf8Ptr = gs.ustr( charCount, byteCount ) ;  // also reports character and byte counts
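To see why 'charCount' and 'byteCount' are reported separately, the following standalone sketch counts the characters in UTF-8 data using only the standard library. It is an illustration, not the gString algorithm; the name utf8CharCount is hypothetical, the input is assumed to be well-formed UTF-8, and unlike the gString methods it does not count the null terminator.

```cpp
#include <cstddef>
#include <string>

// Count the characters (code points) in a UTF-8 string by counting every
// byte that is NOT a continuation byte (continuation bytes are 10xxxxxx).
// Assumes well-formed UTF-8; the terminator is not counted.
std::size_t utf8CharCount ( const std::string& utf8 )
{
   std::size_t count = 0 ;
   for ( unsigned char byte : utf8 )
   {
      if ( (byte & 0xC0) != 0x80 )
         ++count ;
   }
   return count ;
}
```

For plain ASCII text the two counts agree, but a single CJK character such as U+5143 occupies three bytes in UTF-8 while counting as one character.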



Copying Data

  • int32_t copy ( char* uTarget, int32_t maxBytes, int32_t maxCols = -1 ) const ;
      Input  :
         uTarget  : pointer to target array to receive UTF-8-encoded text
         maxBytes : maximum number of bytes to copy (incl. NULL terminator)
         maxCols  : (optional, default: -1, count bytes only)
                    maximum number of display-columns to copy
      Returns:
         number of bytes copied (incl. NULL terminator)
    

    Copy gString text to specified target buffer.

    Important Note: It is the caller’s responsibility to specify a target buffer large enough to hold the data. Be safe: char ubuff[gs.utfbytes()];

    Technical Note: If the source data contain more bytes than the specified 'maxBytes',
    then data are copied only through the last complete UTF-8 character that fits within the byte limit.
    This avoids writing an invalid character into the target array.
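The idea behind the technical note can be sketched in standard C++ as follows. This is an assumption about the general technique, not the gString source; the name truncateUtf8 is hypothetical, the input is assumed to be well-formed UTF-8, and here 'maxBytes' does not include a terminator.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of gString): return the largest prefix of
// 'utf8' that fits in 'maxBytes' bytes without splitting a multi-byte
// character. Assumes well-formed UTF-8.
std::string truncateUtf8 ( const std::string& utf8, std::size_t maxBytes )
{
   if ( utf8.size() <= maxBytes )
      return utf8 ;
   std::size_t cut = maxBytes ;
   // Back up while the first excluded byte is a continuation byte (10xxxxxx),
   // which would mean the cut falls inside a multi-byte character.
   while ( cut > 0 && (static_cast<unsigned char>(utf8[cut]) & 0xC0) == 0x80 )
      --cut ;
   return utf8.substr( 0, cut ) ;
}
```

With the four-byte input "a" followed by U+5143 (three bytes), a limit of three bytes yields only "a", because the three-byte character does not fit completely.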


  • int32_t copy ( wchar_t* wTarget, int32_t maxChars, int32_t maxCols = -1 ) const ;
      Input  :
         wTarget  : pointer to target array to receive wchar_t 'wide' text
         maxChars : maximum number of characters to copy (incl. NULL)
         maxCols  : (optional, default: -1, count characters only)
                    maximum number of display-columns to copy
      Returns:
         number of characters copied (incl. NULL terminator)
    

    Copy gString text to specified target buffer.

    Important Note: It is the caller’s responsibility to specify a target buffer large enough to hold the data. Be safe: wchar_t wbuff[gs.gschars()];


  • std::wostream& operator<< ( std::wostream& os, const gString& gs2 );
      Input  :
         IMPLIED reference to the output stream
         IMPLIED reference to the gString object
      Returns: reference to the specified output stream
    

    !! NON-MEMBER METHOD !!
    Insertion operator: Copies the contents of the gString object into the ’wcout’ (wide) standard output stream.

    Note that due to the way the output stream is defined, you cannot mix ’cout’ (narrow) and ’wcout’ (wide) output streams indiscriminately.
     -- If ’wcout’ is called first, then ’cout’ is disabled.
     -- If ’cout’ is called first, then both narrow and wide channels are active.
     -- ’wcout’ handles both narrow and wide data, however, ’cout’ handles ONLY narrow data.
    This is not related to gString, but is a characteristic of the default C++ output stream itself. We recommend that you always use the ’wcout’ stream in console applications for both narrow and wide text data.

  • std::ostream& operator<< ( std::ostream& os, const gString& gs2 );
      Input  :
         IMPLIED reference to the output stream
         IMPLIED reference to the gString object
      Returns: reference to the specified output stream
    

    !! NON-MEMBER METHOD !!
    Insertion operator: Copies the contents of the gString object into the ’cout’ (narrow) standard output stream.

    IMPORTANT NOTE: Access to the narrow output stream is provided for convenience only. It is recommended that the wide stream version (above), if available on your system, be used exclusively.

  • int32_t substr ( char* uTarget, int32_t offset, int32_t charCnt ) const;
  • int32_t substr ( wchar_t* wTarget, int32_t offset, int32_t charCnt ) const;
  • int32_t substr ( gString& wTarget, int32_t offset, int32_t charCnt ) const;
      Input  :
         targ    : (by reference, initial contents ignored)
                   receives null-terminated contents of specified
                   character range
                   -- If target buffer is a char*, then data returned is 
                      a UTF-8 text string.
                   -- If target buffer is a wchar_t*, then data returned is 
                      a wchar_t (wide) text string.
                   -- If target buffer is a gString object, then both 
                      UTF-8 and wchar_t data are returned 
         offset  : character index at which substring begins
                   (this IS NOT a byte index)
         charCnt : number of characters to copy (not incl. NULL terminator)
      Returns:
         if target is a wchar_t* or gString object, then returns number of
           characters written to target (not including the NULL terminator)
         if target is a char*, then returns number of bytes written to
           target (not including the NULL terminator)
    
         Note: returns ZERO if either 'offset' or 'charCnt' out of range
         Note: If 'charCnt' extends beyond the end of the source data, 
               then returns the available data.
    

    Copy the specified character range to target buffer.
    These methods copy the indicated substring (null terminated) to the target buffer, leaving the original data unchanged.

    If you have a fixed-format field, then the offset and character count will be known in advance. Otherwise you can use the ’find()’ method to locate the substring to be copied.

    Important Note: It is the caller’s responsibility to specify a target buffer large enough to hold the data. If size of substring is unknown, be safe: wchar_t wbuff[gs.gschars()]; or char ubuff[gs.utfbytes()];

    Please Note: The number of bytes can NEVER be assumed to be the same as the number of characters.
    Please refer to the ’Multi-language Support’ chapter of the 'NcDialog API' documentation.
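The difference between a character index and a byte index can be made concrete with a standalone sketch. It is an illustration only (the name utf8ByteOffset is hypothetical and not a gString method), and it assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of gString): return the byte offset at which
// character number 'charIndex' begins in well-formed UTF-8 data, or the
// total byte count if 'charIndex' is at or beyond the end of the data.
std::size_t utf8ByteOffset ( const std::string& utf8, std::size_t charIndex )
{
   std::size_t chars = 0 ;
   for ( std::size_t i = 0 ; i < utf8.size() ; ++i )
   {
      // Every byte that is not a continuation byte begins a new character
      if ( (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80 )
      {
         if ( chars == charIndex )
            return i ;
         ++chars ;
      }
   }
   return utf8.size() ;
}
```

In a string consisting of U+5143 followed by "ab", character index 1 (the 'a') lies at byte offset 3, which is why passing a byte offset where a character offset is expected selects the wrong data.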

Examples

gString gs( "That's not flying, that's falling -- with style!\n"
            "Buzz Lightyear" ) ;
char    utf8Data[gs.utfbytes()] ;
wchar_t wideData[gs.gschars()] ;

gs.copy( utf8Data, gs.utfbytes() ) ;
gs.copy( wideData, gs.gschars() ) ;

gString gstream( "You're a child's TOY! -- Woody" ) ;
wcout << gstream << endl ;

// get a copy of the first word starting with 'c'
gString AusAnimal( "Aardvark Kangaroo Cockatoo Dingo Wombat " ) ;
gString gsc ;
int b = AusAnimal.find( " c" ) ;
if ( b >= 0 )
{
   int e = AusAnimal.find( L' ', b + 1 ) ;
   if ( e > b )
   {
      AusAnimal.substr( gsc, (b + 1), (e - b - 1) ) ;
      wcout << gsc << endl ;
   }
}
 - - -> Cockatoo



Modifying Existing Data

  • int32_t append ( const wchar_t* wPtr ) ;
  • int32_t append ( const char* uPtr ) ;
  • int32_t append ( const wchar_t wChar ) ;
      Input  :
         wPtr  : pointer to array of wchar_t 'wide' text to be appended
                 OR
         uPtr  : pointer to array of char UTF-8 text to be appended
                 OR
         wChar : a single, 'wide' character
      Returns:
         number of characters in resulting string (incl. NULL terminator)
         Note: if value returned equals gsALLOCMAX, then 
               some data MAY HAVE BEEN discarded.
    

    Append text to existing gString text data up to a combined length of gsALLOCMAX. Characters in excess of the maximum will not be appended.

    Example

    gString gs( L"Be kind to your manager." ) ;
    gs.limitChars( gs.gschars() - 2 ) ;
    gs.append( L", and other lower forms of life." ) ;
    wcout << gs << endl ;
     - - -> Be kind to your manager, and other lower forms of life.
    

  • int32_t append ( const wchar_t* fmt, const void* arg1, ... ) __attribute__ ((format (gnu_wprintf, 2, 0)));
  • int32_t append ( const char* fmt, const void* arg1, ... ) __attribute__ ((format (gnu_printf, 2, 0)));
      Input  :
         fmt  : a format specification string in the style of sprintf(),
                swprintf() and related formatting C/C++ functions.
         arg1 : pointer to first value to be converted by 'fmt'
         ...  : optional arguments (between ZERO and gsfMAXARGS - 1)
                Each optional argument is a POINTER (address of) the value
                to be formatted.
    
      Returns:
         number of characters in resulting string (incl. NULL terminator)
         Note: if return equals gsALLOCMAX, then
               some data MAY HAVE BEEN discarded.
    

    Append formatted text data to existing gString text data up to a combined length of gsALLOCMAX. Characters in excess of the maximum will not be appended.

    Please refer to the ’compose’ method (see Formatted Assignments) for more information on converting data using a format specification string.

    Example

    short gaddress = 2840 ;
    wchar_t gdirection = L'E' ;
    const wchar_t* gstreet = L"Colorado Blvd." ;
    double gcost = 29.95 ;
    gString gs( "Gorilla Men's Clothing" ) ; // existing text
    
    gs.append( ", %hd %C %S\n  Dress shirts on sale, $%.2lf.", 
               &gaddress, &gdirection, gstreet, &gcost ) ;
    
    wcout << gs << endl ;
     - - -> Gorilla Men's Clothing, 2840 E Colorado Blvd.
              Dress shirts on sale, $29.95.
    

  • int32_t insert ( const wchar_t* wPtr, int32_t offset = 0 ) ;
  • int32_t insert ( const char* uPtr, int32_t offset = 0 ) ;
  • int32_t insert ( wchar_t wChar, int32_t offset = 0 ) ;
      Input  : 
         wPtr  : pointer to array of wchar_t 'wide' text to be inserted
                 OR
         uPtr  : pointer to array of char UTF-8 text to be inserted
                 OR
         wChar : a single wchar_t 'wide' character
         offset: (optional, ZERO by default)
                 character offset at which to insert specified text into
                 existing text.
                 Note: if specified 'offset' > number of characters in
                       existing text, then acts like 'append' method.
      Returns:
         number of characters in resulting string (incl. NULL terminator)
         Note: if value returned equals gsALLOCMAX, then 
               some data MAY HAVE BEEN discarded.
    

    Insert text into existing gString text data up to a combined length of gsALLOCMAX. Characters in excess of the maximum will be truncated.

    Example

    gString gs( L"Remember to hurt people!" ) ;
    gs.insert( L"NOT ", 9 ) ;
    wcout << gs << endl ;
     - - -> Remember NOT to hurt people!
    

  • int32_t insert( int32_t offset, const char* fmt, const void* arg1, ... );
  • int32_t insert( int32_t offset, const wchar_t* fmt, const void* arg1, ... );
      Input  :
         offset: character offset at which to insert specified text into
                 existing text.
                 Note: if specified 'offset' > number of characters in
                       existing text, then data will be appended to existing text.
         fmt   : a format specification string in the style of sprintf(),
                 swprintf() and related formatting C/C++ functions.
                 Format spec may be either char data or wchar_t data.
         arg1  : pointer to first value to be converted by 'fmt'
         ...   : optional arguments (between ZERO and gsfMAXARGS - 1)
                 Each optional argument is a POINTER (address of) the value
                 to be formatted.
    
      Returns:
         number of characters in resulting string (incl. NULL terminator)
         Note: if value returned equals gsALLOCMAX, then 
               some data MAY HAVE BEEN discarded.
    

    Insert formatted text data into existing text data at the specified offset.

    Please refer to compose() method for more information on converting data using a format-specification string.

    Example

    const char* Month = "April" ;
    short       Date  = 28 ;
    gString gs( L"We have a date on to see the Yankees." ) ;
    int off = gs.find( "to see" ) ; // index insertion point
    gs.insert( off, "%s %hd ", Month, &Date ) ;
    wcout << gs << endl ;
     - - -> We have a date on April 28 to see the Yankees.
    
    gString gs2( "drive above 90 kph!" ) ;
    gs2.insert( ZERO, "%S", L"Don't " ) ; // insert at head of text
    int off2  = gs2.gschars() ; // index end-of-text
    int Speed = 55 ;
    gs2.insert( off2, L" Stay under %02d kph in the city.", &Speed ) ;
    wcout << gs2 << endl ;
     - - -> Don't drive above 90 kph! Stay under 55 kph in the city.
    

  • int32_t limitChars ( int32_t charCount ) ;
      Input  :
         charCount : maximum number of characters allowed in formatted data 
                     (not including NULL) Range: 1 to storage limit.
      Returns:
         number of characters in the adjusted data (including NULL)
    

    Truncate the data to no more than charCount display characters.
    Insert a null terminator after the specified number of characters.

    Example

    gString gs( "This shirt is available in yellow or red." ) ;
    gs.limitChars( 34 ) ;
    gs.append( "only." ) ;
    wcout << gs << endl ;
     - - -> This shirt is available in yellow only.
    

  • int32_t limitCols ( int32_t colCount ) ;
      Input  :
         colCount : maximum number of display columns allowed in formatted data
                    Range: 1 to storage capacity.
      Returns:
         number of columns needed to display the adjusted data
         Note: If specified column count occurs in mid-character, then the 
               partial character will be removed from the string.
    

    Truncate the data to no more than colCount display columns. Insert a null terminator after the number of characters required to fill the specified number of display columns.

    Example

    gString gs( "The manual is located at:\n"
                "http://cdn.funcom.com/aoc/pdf/aoc_manual.pdf" ) ;
    gs.limitCols( 55 ) ;
    wcout << gs << endl ;
     - - -> The manual is located at:
            http://cdn.funcom.com/aoc/pdf/
    
    Note that there are 25 display columns for the first line, (the newline 
    character requires no column), and 30 columns remain on the second line.
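Column-based truncation can be sketched in standard C++ as shown below. This is not how gString determines column widths: the width function here is a deliberately simplified assumption (newline is zero columns, two common CJK ranges are two columns, everything else is one column), and both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Simplified column width for illustration only: newline takes no column,
// U+3000..U+303F and U+4E00..U+9FFF take two columns, all else one.
// A real implementation would consult full Unicode width data.
int columnWidth ( wchar_t ch )
{
   if ( ch == L'\n' )
      return 0 ;
   if ( (ch >= 0x3000 && ch <= 0x303F) || (ch >= 0x4E00 && ch <= 0x9FFF) )
      return 2 ;
   return 1 ;
}

// Keep leading characters while their combined width fits in 'maxCols'.
// A character that would straddle the limit is dropped entirely.
std::wstring limitColumns ( const std::wstring& text, int maxCols )
{
   int cols = 0 ;
   std::size_t keep = 0 ;
   for ( ; keep < text.size() ; ++keep )
   {
      const int w = columnWidth( text[keep] ) ;
      if ( cols + w > maxCols )
         break ;
      cols += w ;
   }
   return text.substr( 0, keep ) ;
}
```

Applied to the two-line text of the example above with a limit of 55, this sketch keeps the 25 columns of the first line, the newline, and the first 30 columns of the URL.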
    

  • int32_t shiftChars ( int32_t shiftCount, wchar_t padChar = L' ' ) ;
      Input  :
         shiftCount: < ZERO: shift data to the left, discarding the
                             specified number of characters from the
                             beginning of the array
                     > ZERO: shift data to the right, padding the vacated
                             positions on the left with 'padChar'
                     ==ZERO: do nothing
         padChar   : (optional, SPACE character, 0x20 by default)
                     when shifting data to the right, use this character
                     to fill the vacated character positions
                     NOTE: Specify a one-column character ONLY as the
                     padding character. (multi-column characters ignored)
      Returns:
         number of characters in adjusted array
    

    Shift text data by the specified number of characters.

    Note for writers of RTL (right-to-left) languages:
     In the above descriptions, the terms 'left' and 'right' are used for 
     convenience, but actually 'left' refers to the head of the data 
     and 'right' refers to the tail.
    

    Example

    gString gs( "Your balance is: 2,521,697.56 USD" ) ;
    gs.shiftChars( -17 ) ;
    wcout << gs << endl ;
     - - -> 2,521,697.56 USD
    gs.shiftChars( 5, L'#' ) ;
    wcout << gs << endl ;
     - - -> #####2,521,697.56 USD
    
    Note: For this example, the optional fill character used for 
    right-shift is L'#'. The default fill character is space (L' ').
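For readers who want to see the shift logic spelled out, here is a standalone sketch on std::wstring. It is not the gString implementation; the name shiftText is hypothetical, and returning an empty string when more characters are discarded than exist is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of gString):
//   shiftCount < 0 : discard that many characters from the head of the text
//   shiftCount > 0 : pad the head of the text with 'padChar'
//   shiftCount == 0: return the text unchanged
std::wstring shiftText ( const std::wstring& text, int shiftCount,
                         wchar_t padChar = L' ' )
{
   if ( shiftCount < 0 )
   {
      const std::size_t discard = static_cast<std::size_t>( -shiftCount ) ;
      return ( discard >= text.size() ) ? std::wstring() : text.substr( discard ) ;
   }
   return std::wstring( static_cast<std::size_t>( shiftCount ), padChar ) + text ;
}
```

The sketch returns a new string rather than modifying the data in place, which keeps it easy to test in isolation.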
    

  • int32_t shiftCols ( int32_t shiftCount, wchar_t padChar = L' ' ) ;
      Input  :
         shiftCount: < ZERO: shift data to the left, discarding the
                             number of characters equivalent to the
                             specified number of display columns
                             NOTE: May discard one extra column if count
                             falls within a multi-column character.
                     > ZERO: shift data to the right, padding the vacated
                             positions on the left with 'padChar'
                     ==ZERO: do nothing
         padChar   : (optional, SPACE character, U+0020 by default)
                     when shifting data to the right, use this character
                     to fill the vacated column positions
                     NOTE: Specify a one-column character ONLY as the
                     padding character. (multi-column characters ignored)
      Returns:
         number of display columns in adjusted array
    

    Shift text data by the specified number of display columns.

    Note for writers of RTL (right-to-left) languages:
     In the above descriptions, the terms 'left' and 'right' are used for 
     convenience, but actually 'left' refers to the head of the data 
     and 'right' refers to the tail.
    

    Example

    gString gs( "您的帐户余额是500元" ) ; // "Your account balance is 500 yuan"
    gs.shiftCols( -14 ) ;
    wcout << gs << endl ;
     - - -> 500元
    gs.shiftCols( 5, L'.' ) ;
    wcout << gs << endl ;
     - - -> .....500元
    
    Note: Most Chinese characters are two display columns wide, 
    therefore we shift 14 columns (7 characters) out on the left.
    For this example, the optional fill character used for 
    right-shift is L'.' (U+002E, ASCII full stop). 
    The Chinese full-stop (U+3002) MAY NOT be used as a fill 
    character because it is a two-column character.
    

  • int32_t padCols ( int32_t fieldWidth, wchar_t padChar = L' ', bool centered = false );
      Input  :
         fieldWidth: number of columns for the adjusted data array
         padChar   : (optional, ASCII space 0x20 by default)
                     character with which to pad the data
                     Note: Specify a one-column character ONLY.
                           Multi-column characters will be ignored.
         centered  : (optional, 'false' by default)
                     if 'false', all padding will be appended at the end 
                                 of the existing data
                     if 'true',  padding will be equally divided between 
                                 the beginning and end of the existing data
    
      Returns:
         number of display columns in the adjusted array
    
    

    Append padding to the existing string data to achieve the specified number of display columns.

    If the 'centered' flag is set, the padding will be divided equally between the beginning and end of the data to achieve the specified field width. Note that if an odd number of columns is to be added, then the odd column will be placed at the end of the data.

    The ‘padCols’ method calculates the number of padding characters needed to fill out the specified field width.

    The default padding character is the ASCII space character (20 hex).
    Any single-column character may be specified as an alternate padding character.

    Padding will be added until either the specified number of columns is reached, OR until the array contains the maximum number of characters (storage limit).

    If the fieldWidth specified is <= the current width of the data, then the data will not be modified.

    Example

    gString gs( "This is a test." );
    
    gs.padCols( 30 );              ==> "This is a test.               "
    gs.padCols( 30, L'#' );        ==> "This is a test.###############"
    gs.padCols( 30, L'#', true );  ==> "#######This is a test.########"
    
    // To create a right-justified field, use 'shiftCols' (see above):
    gs.shiftCols( (30 - gs.gscols()) );
                                   ==> "               This is a test."
    gs.shiftCols( (30 - gs.gscols()), L'#' );
                                   ==> "###############This is a test."
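The padding arithmetic, including the rule that an odd column goes to the end when centering, can be sketched as follows. This is a standalone illustration, not the gString source; the name padField is hypothetical and every character is assumed to occupy one display column.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of gString): pad 'text' out to 'fieldWidth'
// columns, assuming one column per character. When centered, the padding is
// split between head and tail, and an odd column is placed at the tail.
std::wstring padField ( const std::wstring& text, std::size_t fieldWidth,
                        wchar_t padChar = L' ', bool centered = false )
{
   if ( text.size() >= fieldWidth )
      return text ;                      // already wide enough: unchanged
   const std::size_t total = fieldWidth - text.size() ;
   const std::size_t front = centered ? total / 2 : 0 ;
   return std::wstring( front, padChar ) + text
        + std::wstring( total - front, padChar ) ;
}
```

For the 15-character text used in the example above and a field width of 30, centering gives 7 padding characters at the head and 8 at the tail.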
    

  • int32_t strip ( bool leading = true, bool trailing = true );
      Input  :
         leading  : (optional, 'true' by default)
                    if 'true', strip leading whitespace
                    if 'false', leading whitespace unchanged
         trailing : (optional, 'true' by default)
                    if 'true', strip trailing whitespace
                    if 'false', trailing whitespace unchanged
    
      Returns:
         number of characters in modified string (incl. NULL terminator)
    

    Strip (remove) leading and/or trailing whitespace from the string data.

    Whitespace characters are defined as:
    0x0020 single-column ASCII space
    0x3000 two-column CJK space
    0x0A linefeed character
    0x0D carriage-return character
    0x09 horizontal-tab character
    0x0B vertical-tab character
    0x0C formfeed character

    “Leading” whitespace is space characters from the beginning of the data to the first non-space character. “Trailing” whitespace is from after the last non-space character through the end of the data.

    By default, both leading and trailing whitespace will be removed; however, this action may be modified by resetting the appropriate parameter to ‘false’.
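The whitespace set listed above can be applied with the standard library as in the following sketch. It is an illustration rather than the gString implementation, and the name stripWhitespace is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of gString): remove leading and/or trailing
// whitespace, using the set listed above: ASCII space, CJK space (U+3000),
// linefeed, carriage-return, horizontal-tab, vertical-tab and formfeed.
std::wstring stripWhitespace ( const std::wstring& text,
                               bool leading = true, bool trailing = true )
{
   static const std::wstring ws = L" \u3000\n\r\t\v\f" ;
   std::size_t first = 0 ;
   std::size_t last  = text.size() ;
   if ( leading )
   {
      first = text.find_first_not_of( ws ) ;
      if ( first == std::wstring::npos )
         return std::wstring() ;         // nothing but whitespace
   }
   if ( trailing )
   {
      const std::size_t pos = text.find_last_not_of( ws ) ;
      last = ( pos == std::wstring::npos ) ? 0 : pos + 1 ;
   }
   return text.substr( first, last - first ) ;
}
```

Passing 'false' for either flag leaves that end of the text untouched, mirroring the parameter descriptions above.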


  • int32_t erase ( const gString& src, int32_t offset = 0, bool casesen = false, bool all = false );
  • int32_t erase ( const wchar_t* src, int32_t offset = 0, bool casesen = false, bool all = false );
  • int32_t erase ( const char* src, int32_t offset = 0, bool casesen = false, bool all = false );
  • int32_t erase ( wchar_t src, int32_t offset = 0, bool casesen = false, bool all = false );
      Input  :
         src    : source data to be matched, one of the following:
                  -- pointer to a UTF-8 string
                  -- pointer to a wchar_t string
                  -- a gString object containing the source (by reference)
                  -- a single, wchar_t character
         offset : (optional, ZERO by default)
                  character index at which to begin search
                  NOTE: This is a character index, NOT a byte offset.
         casesen: (optional, 'false' by default)
                  if 'false', then scan IS NOT case sensitive
                  if 'true', then scan IS case sensitive
         all    : (optional, 'false' by default)
                  if 'false', then only the first instance of the substring
                              will be deleted
                  if 'true',  then all instances of the specified substring
                              from 'offset' forward will be deleted
    
      Returns:
         index of first character following the deleted sequence
         Note: This is the wchar_t character index, NOT a byte index
         Returns (-1) if:
           a) no matching substring found
           b) 'offset' out-of-range
           c) 'src' is an empty string or a NULL character
    
    

    Scan the data for a matching substring, and if found, erase (delete) the first occurrence of the substring, or optionally all instances of the substring from 'offset' onward.

    Actual comparison is always performed against the ’wide’ character array using wcsncasecmp (or wcsncmp). Null terminator character IS NOT included in the comparison.
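The 'all' behaviour can be pictured with a standard-library analogue on std::wstring. This sketch is not the gString code: the name eraseAll is hypothetical, and the match here is always case sensitive, whereas the 'erase' methods above are case insensitive unless 'casesen' is set.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of gString): delete every occurrence of
// 'pattern' found at or after character index 'offset'. Case sensitive.
std::wstring eraseAll ( std::wstring text, const std::wstring& pattern,
                        std::size_t offset = 0 )
{
   if ( pattern.empty() )
      return text ;                      // nothing to match
   std::size_t pos = offset ;
   while ( (pos = text.find( pattern, pos )) != std::wstring::npos )
      text.erase( pos, pattern.size() ) ;   // rescan from the same index
   return text ;
}
```

After each deletion the scan resumes at the index where the deleted text began, since that index now holds the first character that followed the deleted sequence.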


  • int32_t erase ( int32_t offset = 0, int32_t length = gsALLOCMAX );
      Input  :
         offset : (optional, ZERO by default)
                  index of first character of sequence to be erased
                  NOTE: This is a character index, NOT a byte offset.
         length : (optional, gsALLOCMAX by default)
                  if not specified, then erase all characters from
                     'offset' to end of data
                  if specified, then erase the specified number of
                     characters beginning at 'offset'
    
      Returns:
         index of first character following the deleted sequence
         Note: This is the wchar_t character index, NOT a byte index
         Returns (-1) if:
           a) offset < ZERO
           b) offset >= number of characters in data
           c) length <= ZERO
         (data will not be modified)
    
    

    Erase (delete) the data sequence specified by ’offset’ and ’length’.

    ’offset’ is the index of the first character to be deleted, and ’length’ specifies the number of characters to delete.

    Notes

    • If the ’length’ parameter specifies deletion of more characters than remain, then ’erase’ has the same effect as calling the 'limitChars' method (i.e. truncates the string at 'offset').
    • If the defaults for both 'offset' and 'length' are used, then the 'erase' method has the same effect as calling the 'clear' method (gString data are reset to an empty string).
    • Note that the NULL terminator will never be deleted.

    Examples for 'erase'

    For the gString object containing the following verbal 
    exchange, erase all occurrences of the word 'crapweasel'.
    
    gString gs( "Ross : Are you familiar with the word crapweasel?\n"
                "Paolo: No, I don't know crapweasel.\n"
                "Ross : You're a huge crapweasel!" );
    int index = ZERO ;
    while ( (index = gs.erase( " crapweasel", index )) >= ZERO ) ;
    
    Yields:
      "Ross : Are you familiar with the word?\n"
      "Paolo: No, I don't know.\n"
      "Ross : You're a huge!"
    
    Find the substring, "the the" and erase the extra "the".
    gString gs1( L"There are seven candidates in the the Primary Election." );
    gString gs2( L"the " );
    int index = gs1.find( "the the" );
    gs1.erase( index, (gs2.gschars() - 1) );
    
    Yields:
      "There are seven candidates in the Primary Election."
    

  • int32_t replace ( const wchar_t* src, const wchar_t* newtxt, int32_t offset = 0, bool casesen = false, bool all = false );
  • int32_t replace ( const wchar_t* src, const char* newtxt, int32_t offset = 0, bool casesen = false, bool all = false );
  • int32_t replace ( const wchar_t* src, const wchar_t newtxt, int32_t offset = 0, bool casesen = false, bool all = false );
  • int32_t replace ( const char* src, const wchar_t* newtxt, int32_t offset = 0, bool casesen = false, bool all = false );
  • int32_t replace ( const char* src, const char* newtxt, int32_t offset = 0, bool casesen = false, bool all = false );
  • int32_t replace ( const char* src, const wchar_t newtxt, int32_t offset = 0, bool casesen = false, bool all = false );
  • int32_t replace ( const wchar_t src, const wchar_t* newtxt, int32_t offset = 0, bool casesen = false, bool all = false );
  • int32_t replace ( const wchar_t src, const char* newtxt, int32_t offset = 0, bool casesen = false, bool all = false );
  • int32_t replace ( const wchar_t src, const wchar_t newtxt, int32_t offset = 0, bool casesen = false, bool all = false );
      Input  :
         src    : source data to be matched
                  -- pointer to a UTF-8 string
                  -- pointer to a wchar_t string
                  -- a single, wchar_t character
         newtxt : data to overwrite existing text
                  -- pointer to a UTF-8 string
                  -- pointer to a wchar_t string
                  -- a single, wchar_t character
         offset : (optional, ZERO by default)
                  character index at which to begin search
                  NOTE: This is a character index, NOT a byte offset.
         casesen: (optional, 'false' by default)
                  if 'false', then scan IS NOT case sensitive
                  if 'true', then scan IS case sensitive
         all    : (optional, 'false' by default)
                  if 'false', then replace only the first occurrence found
                  if 'true',  then replace all occurrences of the specified
                              substring
    
      Returns:
         'true' if successful
         returns 'false' if error (existing data not modified):
           a) no matching source substring found
           b) 'src' is an empty string or a null character
           c) offset < ZERO or offset is beyond existing data
           d) modifying the data would cause buffer overflow
    

    Replace the specified source substring, or optionally all matching substrings with the provided substring.

    Actual comparison is always performed against the ’wide’ character array using wcsncasecmp (or wcsncmp). Null terminator character IS NOT included in the comparison.
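A standard-library analogue of replacing all matches is sketched below. It is an illustration only, not the gString implementation: the name replaceAll is hypothetical, matching is case sensitive, and no storage limit is enforced.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of gString): replace every occurrence of
// 'src' found at or after character index 'offset' with 'newtxt'.
std::wstring replaceAll ( std::wstring text, const std::wstring& src,
                          const std::wstring& newtxt, std::size_t offset = 0 )
{
   if ( src.empty() )
      return text ;                      // nothing to match
   std::size_t pos = offset ;
   while ( (pos = text.find( src, pos )) != std::wstring::npos )
   {
      text.replace( pos, src.size(), newtxt ) ;
      pos += newtxt.size() ;             // skip the new text so it is not re-matched
   }
   return text ;
}
```

Advancing past the inserted text matters when the replacement contains the source, for example when replacing "p" with "pp"; without that step the loop would never terminate.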

    Examples for 'replace'

    Correct the spelling errors in the following data:
    gString gs( "The land near hare is full of Heres, hoping her and ther." ) ;
    
    bool okiday = gs.replace( L"hare", L"here" ) ;
    Yields:
      "The land near here is full of Heres, hoping her and ther."
    
    okiday = gs.replace( "Here", "Hare", ZERO, true ) ;
    Yields:
      "The land near here is full of Hares, hoping her and ther."
    
    okiday = gs.replace( L'p', L"pp" ) ;
    Yields:
      "The land near here is full of Hares, hopping her and ther."
    
    int index = gs.find( "her " ) ;
    okiday = gs.replace( "her", L"here", index, true, true ) ;
    Yields:
      "The land near here is full of Hares, hopping here and there."
    
    Then, replace all spaces ' ' with underscores '_'.
    okiday = gs.replace( L' ', L'_', ZERO, false, true ) ;
    Yields:
      "The_land_near_here_is_full_of_Hares,_hopping_here_and_there."
    

  • int32_t loadChars ( const wchar_t* wsrc, int32_t charLimit, bool append = false );
  • int32_t loadChars ( const char* usrc, int32_t charLimit, bool append = false );

    Load the specified number of characters from the source data. This is useful for extracting data from fixed-width fields when the contents of the field are unknown.
       gs.loadChars( StreetAddress1, 48 );

    By default, the new text REPLACES the existing text; however, the ’append’ parameter allows the new text to be appended to the existing text.

    The functionality is equivalent to the gString constructor which loads only the specified number of characters.
    See gString Instantiation.


      Input  :
         usrc     : pointer to a UTF-8 encoded string
         wsrc     : pointer to a wchar_t encoded string
         charLimit: maximum number of characters (not bytes) from source
                    array to load. Range: 1 through target storage capacity.
                    The count should not include the NULL terminator
         append   : (optional, 'false' by default)
                    if 'false', replace existing text with specified text
                    if 'true', append new text to existing text
    
      Returns:
         number of characters in modified string (incl. NULL terminator)
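
    Because 'charLimit' counts characters rather than bytes, a UTF-8 source cannot simply be cut at a byte offset. The sketch below shows one way to count code points in a UTF-8 string. It is an illustration only: 'utf8Prefix' is a hypothetical helper, not a gString method, and it assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, NOT part of gString: return a copy of at most
// 'charLimit' characters (code points, not bytes) from a UTF-8 string.
std::string utf8Prefix(const char* usrc, int charLimit)
{
    std::size_t bytes = 0;
    int chars = 0;
    while (usrc[bytes] != '\0' && chars < charLimit) {
        ++bytes;                          // lead byte of the character
        // continuation bytes have the bit pattern 10xxxxxx
        while ((static_cast<unsigned char>(usrc[bytes]) & 0xC0) == 0x80)
            ++bytes;
        ++chars;
    }
    return std::string(usrc, bytes);
}
```

    Appending, as with the 'append' parameter, would then amount to concatenating the returned prefix to the existing text instead of replacing it.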
    

    Examples for 'loadChars'

    Replace the existing text with the specified number of characters 
    from the source text.
    
    const char* Piggy =  // (56 characters)
               "Spider-pig, Spider-pig does whatever a Spider-pig does." ;
    gString gs( "Existing text." ) ;
    gs.loadChars( Piggy, 36 ) ;
    Yields:
      "Spider-pig, Spider-pig does whatever"
    
    Append specified number of characters from the source text to the 
    existing text.
    
    gs.loadChars( " it's told, because it's stupid.", 11, true ) ;
    Yields:
      "Spider-pig, Spider-pig does whatever it's told,"
    

  • int32_t textReverse ( bool punct = false, bool para = false, bool rjust = false );

    Reverse the order of characters in the text string.

    If RTL text data displayed by your application is not formatted as desired, then this method may be used to reverse the character order before writing to the display. This is useful for manipulating both RTL (Right-To-Left) language text and mixed RTL/LTR text.

    The ’para’ parameter is useful for outputting columns of numeric data or when the column labels are in an RTL language (see example below).

    Although modern web browsers (Firefox, Opera, Chromium, etc.) usually handle RTL text correctly, other applications often do not. This is especially true of terminal emulator software.

    See also a discussion of multiple-language support in the NcDialog API.

      Input  :
         punct : (optional, 'false' by default)
                 if 'false', invert entire string
                 if 'true' AND if a punctuation mark is seen at either end
                    of the string, invert everything except the punctuation
                    mark(s), typically one of the following:
                    '.' ',' '?' '!' ';' ':' but see note below.
         para  : (optional, 'false' by default)
                 if 'false', invert data as a single character stream
                 if 'true',  invert data separately for each logical line
                             (line data are separated by newlines ('\n'))
         rjust : (optional, 'false' by default)
                 if 'false', do not insert right-justification padding
                 if 'true',  insert padding for each logical line to
                             right-justify the data
                             (used to right-justify LTR output)
                             Note that right-justification is performed
                             ONLY if the 'para' parameter is true.
                             Otherwise, 'rjust' will be ignored.
    
      Returns:
         number of wchar_t characters in gString object
         (if return value >= storage limit, data may have been truncated)
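
    The interaction of the three flags can be sketched as follows. This is a simplified model, not the gString implementation: 'reverseText' and its helpers are hypothetical names, the punctuation list is a short assumed sample, and padding is computed by character count (one column per character), which does not hold for multi-column text.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Assumed sample of punctuation marks, for illustration only.
static bool isPunctMark(wchar_t c)
{
    static const std::wstring marks = L".,?!;:\u00BF\u00A1";
    return marks.find(c) != std::wstring::npos;
}

// Reverse one line, optionally leaving punctuation at either end in place.
static void reverseLine(std::wstring& line, bool punct)
{
    std::size_t first = 0, last = line.size();
    if (punct) {
        while (first < last && isPunctMark(line[first])) ++first;
        while (last > first && isPunctMark(line[last - 1])) --last;
    }
    std::reverse(line.begin() + first, line.begin() + last);
}

// Hypothetical helper, NOT part of gString.
std::wstring reverseText(const std::wstring& text, bool punct = false,
                         bool para = false, bool rjust = false)
{
    if (!para) {                          // single character stream
        std::wstring out = text;
        reverseLine(out, punct);
        return out;
    }
    std::vector<std::wstring> lines;      // split on newlines
    std::size_t start = 0, widest = 0;
    for (;;) {
        std::size_t nl = text.find(L'\n', start);
        lines.push_back(text.substr(start, nl == std::wstring::npos
                                               ? std::wstring::npos : nl - start));
        widest = std::max(widest, lines.back().size());
        if (nl == std::wstring::npos) break;
        start = nl + 1;
    }
    std::wstring out;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        reverseLine(lines[i], punct);
        if (rjust)                        // left-pad to the widest line
            lines[i].insert(0, widest - lines[i].size(), L' ');
        if (i > 0) out += L'\n';
        out += lines[i];
    }
    return out;
}
```

    As in the description above, 'rjust' has an effect only on the 'para' path; the single-stream path ignores it.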
    

    Examples for 'textReverse'

    Note that the const Hebrew strings are written canonically in the source, but are displayed incorrectly in the terminal window by the info reader and by the HTML browser (sorry about that).

    // Hebrew: "Are you ready for lunch?"
    // Sometimes the terminal will inappropriately move the punctuation 
    // to the opposite end of the string.
    // To prevent this, use the 'punct' option.
    const wchar_t* const Lunch = L"?םיירהצ תחוראל ןכומ התא םאה" ;
    
    gString gs( Lunch ) ;
    gs.textReverse() ;
    wcout << gs.gstr() << endl ;
    
    OUTPUT (correct except punctuation) : האם אתה מוכן לארוחת צהריים?
    
    gs = Lunch ;
    gs.textReverse( true ) ;
    wcout << gs.gstr() << endl ;
    
    OUTPUT (correct): ?האם אתה מוכן לארוחת צהריים
    
    // When the 'punct' flag is set, both leading and trailing punctuation 
    // are identified.
    // Questions in Spanish use a leading inverted question mark (U+00BF), 
    // and a trailing ASCII question mark (U+003F).
    // Reverse the internal text but do not reverse the terminal punctuation.
    gs = "¿ozreumla le arap atsil sátsE?" ;
    gs.textReverse( true ) ;
    wcout << gs.gstr() << endl ;
    
    OUTPUT : ¿Estás lista para el almuerzo?
    
     = = = = =
     
    // Reverse multi-line text (paragraph formatting).
    // Ordinary ASCII text is used for this example 
    // to demonstrate the reversal.
    // The example outputs to an NcDialog window referenced by NcDialog *dp
    const char* Vaccine = "Have you received your covid-19 vaccination yet?\n"
                          "Protect yourself and others,\n"
                          "get your vaccination today!" ;
    gs = Vaccine ;
    // Write unmodified text as LTR data:
    dp->WriteParagraph ( 1, 1, gs, nc.grR, true, false ) ;
    
    OUTPUT: Have you received your covid-19 vaccination yet?
            Protect yourself and others,
            get your vaccination today!
    
    // Reverse the data, without punctuation processing or 
    // right-justification. (Note: all parameters are optional, 
    // but are shown here for clarity.)
    gs.textReverse( false, true, false ) ;
    
    // Write reversed text as LTR data:
    dp->WriteParagraph ( 1, 1, gs, nc.grR, true, false ) ;
    
    OUTPUT: ?tey noitaniccav 91-divoc ruoy deviecer uoy evaH
            ,srehto dna flesruoy tcetorP
            !yadot noitaniccav ruoy teg
    
    // Write the same data as RTL (note the X origin of 48):
    dp->WriteParagraph ( 1, 48, gs, nc.grR, true, true ) ;
    OUTPUT: ?tey noitaniccav 91-divoc ruoy deviecer uoy evaH
                                ,srehto dna flesruoy tcetorP
                                 !yadot noitaniccav ruoy teg
    
    // Reload the source data and reverse it without punctuation processing,
    // but WITH right-justification padding.
    gs = Vaccine ;
    gs.textReverse( false, true, true ) ;
    
    // Write reversed text as LTR data.
    // Note that the padding character is ASCII space ' ',
    // however the '-' character is used here to show the padding position.
    dp->WriteParagraph ( 1, 1, gs, nc.grR, true, false ) ;
    
    OUTPUT: ?tey noitaniccav 91-divoc ruoy deviecer uoy evaH
            --------------------,srehto dna flesruoy tcetorP
            ---------------------!yadot noitaniccav ruoy teg
    
    // Write the same data as RTL (note the X origin of 48):
    dp->WriteParagraph ( 1, 48, gs, nc.grR, true, true ) ;
    
    OUTPUT: ?tey noitaniccav 91-divoc ruoy deviecer uoy evaH
            --------------------,srehto dna flesruoy tcetorP
            ---------------------!yadot noitaniccav ruoy teg
    
    // Combine RTL text with numeric (LTR) data.
    // Create an ASCII numeric time string.
    // Reverse the numeric string.
    // Insert a Hebrew label: "The current time is: "
    const wchar_t *timeLabel = L"השעה הנוכחית היא: " ; //(displayed incorrectly)
    short hours = 12, minutes = 32, seconds = 45 ;
    gs.compose( "%02hd:%02hd:%02hd", &hours, &minutes, &seconds ) ;
    gs.textReverse( false, true, false ) ;
    gs.insert( timeLabel ) ;
    
    // Write the data as RTL (note the X origin of 26):
    dp->WriteParagraph ( 1, 26, gs, nc.grR, true, true ) ;
    
    OUTPUT:
       12:32:45 :השעה הנוכחית היא
    
    

    Technical Note On Punctuation

    Determining which characters are and are not punctuation is locale specific (see the 'ispunct()' C-language function).
    Rather than rely on the locale used by the application calling this method, we test against a list of the most common punctuation characters used in modern languages. If a punctuation character used by your application is not recognized as punctuation, please send us a note including the Unicode codepoint and we will add it to the list.


  • int32_t formatParagraph ( int32_t maxRows, int32_t maxCols, bool trunc = true, bool hyphenbrk = false,
                           int32_t *truncIndex = NULL );
      Input  : maxRows  : maximum rows for message (>= 1)
               maxCols  : maximum columns on any message row (>= 4)
               trunc    : (optional, 'true' by default)
                          if 'true',  truncate the data if necessary to ensure
                                      that the data do not extend beyond the
                                      specified target area
                          if 'false', format the entire source text array, even
                                      if doing so requires violating the specified
                                      height of the target area (maxRows).
                                      (see also 'truncIndex' parameter)
               hyphenbrk: (optional, 'false' by default)
                          if 'false', automatic line breaks occur at space ' '
                                      characters only (20h)
                                      Special Case: the following CJK ideographs
                                      are also recognized as whitespace for
                                      purposes of line formatting (see below)
                                       '、' comma U+3001 and
                                       '。' full stop U+3002
                          if 'true',  in addition to the space ' ' characters,
                                      enable line break at:
                                       ASCII hyphen    '-' (2dh)
                                       Unicode &mdash; '—' (2014h)
                                       Unicode &ndash; '–' (2013h)
                                       Unicode &minus; '−' (2212h)
                                       Unicode &shy;       (00ADh) (soft hyphen)
               truncIndex: (optional, null pointer by default)
                           (referenced _only_ when the 'trunc' flag is reset)
                           If specified, points to a variable to receive the
                           index at which the data _would_have_been_ truncated to
                           fit the specified height of the target area if the
                           'trunc' flag had been set.
                           a) A positive value indicates that the data have not
                              been truncated, AND that the data extend beyond
                              the specified target area.
                           b) A negative value indicates that it was not necessary
                              to truncate the data i.e. the data fit entirely
                              within the target area.
    
      Returns: number of text rows in the formatted output
    

    Reformat the text so that when written to the display, it will fit within the specified rectangular area.

    The dimensions of the formatted text are specified as the number of text rows and columns, with a minimum of one (1) row by four (4) columns.



    Technical Description of the Line Break Algorithm

    Formatting is done in three steps:

    1. Remove all newlines from the text.
    2. Perform word wrap to limit the width of each row of the paragraph.
    3. If necessary, truncate the text to limit the height of the paragraph. (but see the 'trunc' and 'truncIndex' parameters)

    Token: A ‘token’ as used here is a series of printing characters delimited by whitespace characters. If line breaks on hyphen-like characters are enabled, those characters are also interpreted as token delimiters.

    1. Line breaks occur after the space character at the end of a token, i.e. a word. This means that the line is broken by placing the newline after the identified space character.
      Note: Currently the ASCII space character (20h) and the two-column CJK space (U+3000) are the only whitespace characters recognized.

      Special Case: CJK text seldom contains any spaces at all within a sentence, or even between sentences. Two-column punctuation is designed to provide a visual spacing effect. For this reason, we have made the design decision to process the following CJK ideographs as whitespace:
                 '、' comma U+3001  and '。' full stop U+3002

      Caution: Tab characters (’\t’) _are not_ recognized as whitespace characters because they are effectively variable width characters. Don’t use them!
      For safety, if Tab characters are present in the text, they are silently removed.

      Optionally, the ASCII hyphen and the Unicode hyphen-like characters can be treated as if they were whitespace characters for purposes of the algorithm. (see the 'hyphenbrk' parameter.) The characters included within this group are:

      NAME          HTML TAG   GLYPH   UNICODE CODEPOINT
      ASCII hyphen             '-'     U+002D
      mDASH         &mdash;    '—'     U+2014
      nDASH         &ndash;    '–'     U+2013
      minus         &minus;    '−'     U+2212
      soft hyphen   &shy;              U+00AD

    2. When reformatting the data, it is possible that the text will be pushed beyond the specified number of rows. In this case, we have two options:
      a) Truncate the text after the last specified row is filled.
      b) Alert the caller about the number of rows actually required.
      Optionally, we can indicate the index of where we would have truncated the text so that caller can manually truncate the text if desired. (see the 'trunc' and 'truncIndex' parameters)
    3. It is possible that a single token (word) will be longer than the width of the target area. Handling this (unlikely) scenario complicates the line-break algorithm, but could come into play; for instance: filespecs, URLs, or some German words. :-)

      Filespecs and URLs should be parsed using specialized formatting methods.
      Long words can be a challenge during parsing of the data. Our solution is to define “long” tokens as those which are more than half the specified area width ('maxCols').

      This method can optionally break after hyphens, but this may sometimes cause unintended breaks or confusing output. Use the 'hyphenbrk' option wisely.

    4. Notes on automatic hyphenation:
      Technically, hyphens should be placed between syllables, but that would require a full dictionary of the target language.
      “Can open... worms everywhere.” (Thank you, Chandler Bing.)
      • The hyphen used is the Unicode &ndash; U+2013. This facilitates stripping the hyphens from the text if the text is copied and pasted elsewhere.
        Note that ideally we would use the "soft hyphen," Unicode &shy; U+00AD, but unfortunately most applications other than word processors interpret this as a zero-width character, making it invisible under most circumstances.
      • Programmer’s Note: If the current character is the same width as the hyphen, then the hyphen will be at the right edge of the target area. Otherwise there will be a one-column gap at the end of the line.
      • Special case: For multi-column characters, it is assumed that the characters belong to the CJK character groups. (This may not be true, but multi-column characters seldom appear in Romance languages.) Therefore, for multi-column characters only, hyphens are not inserted after mid-token line breaks because there is no way of knowing if we are breaking in mid-word or between words unless we have access to dictionaries for those languages. Again, “...worms everywhere.”
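
    The three formatting steps (remove newlines, word-wrap, truncate) can be sketched in a few lines. This is a deliberately reduced model and not the gString algorithm: 'wrapParagraph' is a hypothetical name, each newline is assumed to become a space, only the ASCII space is treated as a delimiter, every character is assumed to occupy one column, and over-long tokens are hard-split without inserting a hyphen.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of gString: returns the wrapped rows.
std::vector<std::wstring> wrapParagraph(std::wstring text, std::size_t maxRows,
                                        std::size_t maxCols, bool trunc = true)
{
    if (maxRows < 1 || maxCols < 4)       // documented minimum dimensions
        return {};

    for (wchar_t& c : text)               // Step 1: remove newlines
        if (c == L'\n') c = L' ';         // (assumption: replaced by a space)

    std::vector<std::wstring> rows(1);    // Step 2: word wrap
    std::size_t i = 0;
    while (i < text.size()) {
        // A token is a run of characters plus the space that ends it,
        // so each break falls AFTER the delimiting space.
        std::size_t end = text.find(L' ', i);
        end = (end == std::wstring::npos) ? text.size() : end + 1;
        std::wstring token = text.substr(i, end - i);
        i = end;

        while (token.size() > maxCols) {  // token wider than a whole row
            if (!rows.back().empty()) rows.emplace_back();
            rows.back() = token.substr(0, maxCols);
            token.erase(0, maxCols);
            rows.emplace_back();
        }
        if (!rows.back().empty() && rows.back().size() + token.size() > maxCols)
            rows.emplace_back();
        rows.back() += token;
    }

    if (trunc && rows.size() > maxRows)   // Step 3: truncate to the height
        rows.resize(maxRows);
    return rows;
}
```

    With 'trunc' reset, the number of rows returned can exceed 'maxRows'; comparing the two tells the caller how much taller the target area would need to be.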



Comparisons

  • int32_t compare ( const char* uStr, bool casesen = true, int32_t length = -1, int32_t offset = 0 ) const;
  • int32_t compare ( const uint8_t* uStr, bool casesen = true, int32_t length = -1, int32_t offset = 0 ) const;
  • int32_t compare ( const wchar_t* wStr, bool casesen = true, int32_t length = -1, int32_t offset = 0 ) const;
  • int32_t compare ( const gString& gs, bool casesen = true ) const;
      Input  :
         uStr     : (UTF-8 string) to be compared
           OR
         wStr     : (wchar_t string) to be compared
           OR
         gs       : (by reference) gString object containing
                    data to be compared
         casesen  : (optional, 'true' by default)
                    if 'true' perform case-sensitive comparison
                    if 'false' perform case-insensitive comparison
         length   : (optional, -1 by default. i.e. compare to end)
                    maximum number of characters to compare
         offset   : (optional, ZERO by default)
                    If specified, equals the character offset into the
                    gString character array at which to begin comparison.
    
      Returns: 
         return value uses the rules of the 'wcsncmp' (or 'wcsncasecmp') 
         library function (see wchar.h):
           ZERO, text data are identical
         > ZERO, first differing char of gString object is numerically larger.
         < ZERO, first differing char of gString object is numerically smaller.
    

    Compares the text content of the gString object with the specified text.

    The comparison is performed against the gString objects’ wchar_t character arrays. The character comparisons are numerical, not locale-based. This includes the relationship between upper-case and lower-case characters.
    For locale-specific comparisons, please see the compcoll method below.
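
    The numeric comparison rules can be illustrated directly with the C library functions named above. The sketch below is not the gString implementation: 'compareText' is a hypothetical helper, a std::wstring stands in for the stored text, and resetting an out-of-range offset to zero is an assumption. It requires a POSIX system for wcscasecmp and wcsncasecmp.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <wchar.h>   // wcscasecmp, wcsncasecmp (POSIX, not ISO C++)

// Hypothetical helper, NOT part of gString: compare the stored text,
// starting at character index 'offset', with 'other'.
int compareText(const std::wstring& stored, const wchar_t* other,
                bool casesen = true, int length = -1, int offset = 0)
{
    if (offset < 0 || static_cast<std::size_t>(offset) > stored.size())
        offset = 0;                       // assumption: ignore a bad offset
    const wchar_t* p = stored.c_str() + offset;

    if (length < 0)                       // compare to the end of the text
        return casesen ? std::wcscmp(p, other) : wcscasecmp(p, other);

    const std::size_t n = static_cast<std::size_t>(length);
    return casesen ? std::wcsncmp(p, other, n) : wcsncasecmp(p, other, n);
}
```

    Only the sign of the result is meaningful; the magnitude of a non-zero return differs between C library implementations.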


  • int32_t compare ( const gString& gs, bool casesen = true ) const ;
      Input  :
         gs       : (by reference) object whose text is to be compared
         casesen  : (optional, 'true' by default)
                    if 'true' perform case-sensitive comparison
                    if 'false' perform case-insensitive comparison
    
      Returns:
         return value uses the rules of the 'wcsncmp' (or 'wcsncasecmp') 
         library function (see wchar.h):
           ZERO, text data are identical
         > ZERO, first differing char of gString object is numerically larger.
         < ZERO, first differing char of gString object is numerically smaller.
    

    Compares the text content of two gString objects.

    The comparison is performed against the gString objects’ wchar_t character arrays. The character comparisons are numerical, not locale-based. This includes the relationship between upper-case and lower-case characters.
    For locale-specific comparisons, please see the compcoll method below.


  • int32_t compcoll ( const wchar_t* wStr ) const;
  • int32_t compcoll ( const char* uStr ) const;
  • int32_t compcoll ( const gString& gs ) const;
      Input  : wStr     : (UTF-32 string) to be compared
                  OR
               uStr     : (UTF-8 string) to be compared
                  OR
               gs       : (gString object by reference) data to be compared
    
      Returns: using the rules of the 'wcscoll' library function (see wchar.h):
          ZERO, text data are identical
        > ZERO, first differing char of stored data is larger.
        < ZERO, first differing char of stored data is smaller.
    

    Compares the text content of the gString object with the specified text.

    The comparison is performed according to the active "locale" within the application. This comparison is known as the “collation order” or “dictionary order” comparison where lowercase letters are considered less than their uppercase equivalents. Characters with diacritical markers and unitary character sequences (e.g. Spanish 'll') will also be sorted according to the rules of the active locale.

    1. If the application has set the locale, the rules of that locale will be used for the comparison.
      Note that the comparison is always case sensitive.
    2. If the application has not specified the locale, then the so-called POSIX C/C++ pseudo-locale will be used.
    3. Note that for some languages, Chinese and Japanese for instance, there is no "dictionary" order for the characters. Instead, dictionaries rely on sound markers, e.g. pinyin transliteration, stroke count, or other criteria. The 'compcoll' method is not effective for these languages.
    4. Note that if the source data contain mixed languages, the ’wcscoll’ function will be ineffective and misleading.
    5. Technical Note: The ’wcscoll’ library function is also considerably slower than the 'wcscmp' (numeric comparison) function because it scans the data multiple times.

    Please see how to set locale for additional information. Note: Regardless of source data format, this method uses the C-library 'wcscoll' function exclusively to perform the comparison.

    Technical Note:

    In the ASCII world, the letter 'A' is numerically less than the letter 'a'; however, in the world of locale-specific comparison, English lowercase is “less than” uppercase, so 'a' is less than 'A'.

    Examples of string comparison using the collation algorithm:

    'abc'  == 'abc'
    'abc'  <  'ABC'
    'ABC'  >  'abc'
    'abc'  <  'aBc'
    'abc'  <  'abC'
    'abc'  <  'abcd'
    'abcd' >  'abC'
    'abcd' <  'abCd'
    'abC'  <  'abcd'

    This would suggest that the collation algorithm is essentially useless unless your project is to create a dictionary.
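
    The dependence on the active locale can be seen by calling 'wcscoll' directly. The sketch below uses only the standard library; 'useCollation' and 'sortByCollation' are hypothetical helper names, and any locale name passed in (for example "en_US.UTF-8") is an assumption about what is installed on the host system.

```cpp
#include <algorithm>
#include <clocale>
#include <cwchar>
#include <string>
#include <vector>

// Select the collation rules used by wcscoll().
// Returns false if the named locale is not available on this system.
bool useCollation(const char* localeName)
{
    return std::setlocale(LC_COLLATE, localeName) != nullptr;
}

// Sort words in the collation order of the active locale.
void sortByCollation(std::vector<std::wstring>& words)
{
    std::sort(words.begin(), words.end(),
              [](const std::wstring& a, const std::wstring& b) {
                  return std::wcscoll(a.c_str(), b.c_str()) < 0;
              });
}
```

    Under the "C" pseudo-locale, 'wcscoll' compares code points numerically, so uppercase ASCII letters sort ahead of lowercase ones. A language-specific locale applies its own rules instead, which is why the same list can sort differently after the locale is changed.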

  • bool operator == ( const gString& gs2 ) const ;
      Input  :
         gs2   : (by reference)
                 gString object containing string to be compared
      Returns:
         'true' if the strings are identical, else 'false'
    

    Comparison operator: Compares the text content of two gString objects.

    The comparison is performed against the wchar_t character arrays of the two objects. The comparison is case-sensitive.



  • bool operator != ( const gString& gs2 ) const ;
      Input  :
         gs2   : (by reference)
                 gString object containing string to be compared
      Returns:
         'true' if the strings are different, else 'false'
    

    Comparison operator: Compares the text content of two gString objects.

    The comparison is performed against the wchar_t character arrays of the two objects. The comparison is case-sensitive.


  • int32_t find ( const char* src, int32_t offset=0, bool casesen=false, int32_t maxcmp= -1 ) const;
  • int32_t find ( const wchar_t* src, int32_t offset=0, bool casesen=false, int32_t maxcmp= -1 ) const ;
  • int32_t find ( const gString& src, int32_t offset=0, bool casesen=false, int32_t maxcmp= -1 ) const;
  • int32_t find ( const wchar_t src, int32_t offset=0, bool casesen=false ) const;
      Input  :
         src    : source data to be matched, one of the following:
                  -- pointer to a UTF-8 string
                  -- pointer to a wchar_t string
                  -- a gString object containing the source (by reference)
                  -- a single, wchar_t character
         offset : (optional, ZERO by default)
                  character index at which to begin search
                  -- if out-of-range, then same as if not specified
         casesen: (optional, 'false' by default)
                  if 'false', then scan IS NOT case sensitive
                  if 'true', then scan IS case sensitive
         maxcmp : (optional, (-1) by default)
                  -- if not specified, then scan for a match of all
                     characters in 'src' (not including null terminator)
                  -- if specified, then scan for a match of only the first
                     'maxcmp' characters of 'src'
                  -- if out-of-range, then same as if not specified
      Returns:
         index of matching substring or (-1) if no match found
         Note: This is the wchar_t character index, NOT a byte index
    

    Scan the data for a matching substring and if found, return the index at which the first substring match begins.

    Actual comparison is always performed against the ’wide’ character array using wcsncasecmp (or wcsncmp). Null terminator character is not included in the comparison.
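
    A forward scan of this kind can be sketched with 'wcsncmp' and 'wcsncasecmp'. The helper below is an illustration, not the gString code: 'findText' is a hypothetical name, a std::wstring stands in for the internal array, and out-of-range values of 'offset' and 'maxcmp' are treated as "not specified", following the parameter notes above. It requires a POSIX system for wcsncasecmp.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <wchar.h>   // wcsncasecmp (POSIX, not ISO C++)

// Hypothetical helper, NOT part of gString: return the character index of
// the first match at or after 'offset', or -1 if there is none.
int findText(const std::wstring& text, const std::wstring& src,
             int offset = 0, bool casesen = false, int maxcmp = -1)
{
    std::size_t n = src.size();           // characters that must match
    if (maxcmp > 0 && static_cast<std::size_t>(maxcmp) < n)
        n = static_cast<std::size_t>(maxcmp);
    if (n == 0)
        return -1;
    if (offset < 0 || static_cast<std::size_t>(offset) >= text.size())
        offset = 0;                       // out of range: same as default

    for (std::size_t i = static_cast<std::size_t>(offset);
         i + n <= text.size(); ++i) {
        const wchar_t* p = text.c_str() + i;
        int diff = casesen ? std::wcsncmp(p, src.c_str(), n)
                           : wcsncasecmp(p, src.c_str(), n);
        if (diff == 0)
            return static_cast<int>(i);
    }
    return -1;
}
```

    To find the next occurrence, call again with 'offset' set to one more than the index previously returned.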


  • int32_t findlast ( const char* src, bool casesen=false ) const;
  • int32_t findlast ( const wchar_t* src, bool casesen=false ) const;
  • int32_t findlast ( const gString& src, bool casesen=false ) const;
  • int32_t findlast ( const wchar_t src, bool casesen=false ) const;
      Input  :
         src    : source data to be matched, one of the following:
                  -- pointer to a UTF-8 string
                  -- pointer to a wchar_t string
                  -- a gString object containing the source (by reference)
                  -- a single, wchar_t character
         casesen: (optional, 'false' by default)
                  if 'false', then scan IS NOT case sensitive
                  if 'true', then scan IS case sensitive
    
      Returns:
         index of last matching substring or (-1) if no match found
         Note: This is the wchar_t character index, NOT a byte index
    

    Scan the data for the last occurrence of the matching substring and, if found, return the index at which the substring match occurs.

    Actual comparison is always performed against the ’wide’ character array using wcsncasecmp (or wcsncmp). Null terminator character is not included in the comparison.


  • int32_t after ( const char* src, int32_t offset=0, bool casesen=false ) const;
  • int32_t after ( const wchar_t* src, int32_t offset=0, bool casesen=false ) const;
  • int32_t after ( const gString& src, int32_t offset=0, bool casesen=false ) const;
      Input  :
         src    : source data to be matched, one of the following:
                  -- pointer to a UTF-8 string
                  -- pointer to a wchar_t string
                  -- a gString object containing the source (by reference)
         offset : (optional, ZERO by default)
                  character index at which to begin search
                  -- if out-of-range, then same as if not specified
         casesen: (optional, 'false' by default)
                  if 'false', then scan IS NOT case sensitive
                  if 'true', then scan IS case sensitive
      Returns:
         index of the character which follows the matching substring
           or (-1) if no match found
         Note: This is the wchar_t character index, NOT a byte index
    

    This method is very similar to the ’find()’ method above, but instead of returning the index to the beginning of the substring, returns the index of the character which FOLLOWS the substring.


  • int32_t findr ( const gString& src, int32_t offset = -1, bool casesen = false ) const;
  • int32_t findr ( const wchar_t *src, int32_t offset = -1, bool casesen = false ) const;
  • int32_t findr ( const char *src, int32_t offset = -1, bool casesen = false ) const;
  • int32_t findr ( const wchar_t& src, int32_t offset = -1, bool casesen = false ) const;
      Input  : src    : source data to be matched, one of the following:
                        -- pointer to a UTF-8 string
                        -- pointer to a wchar_t string
                        -- a gString object containing the source (by reference)
                        -- a single, wchar_t character (by reference)
               offset : (optional, by default: index of null terminator minus one)
                        character index at which to begin search
                        -- if out-of-range, then same as if not specified
               casesen: (optional, 'false' by default)
                        if 'false', then scan IS NOT case sensitive
                        if 'true', then scan IS case sensitive
    
      Returns: index of the first character (closest to head of data) of the 
               matching substring or (-1) if no match found
               Note: This is the wchar_t character index, NOT a byte index
    

    Scan the data beginning at the specified offset and moving toward the head of the string (toward offset zero). Locate the matching substring and if found, return the index of the first match.

    Actual comparison is always performed against the ’wide’ character array using wcsncasecmp (or wcsncmp). Null terminator character IS NOT included in the comparison.
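
    A backward scan can be sketched in the same style. The helper below is illustrative only: 'findReverse' is a hypothetical name, and it assumes that 'offset' is the highest index at which a match may BEGIN, which is one plausible reading of the description above rather than a confirmed detail of the gString implementation. It requires a POSIX system for wcsncasecmp.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <wchar.h>   // wcsncasecmp (POSIX, not ISO C++)

// Hypothetical helper, NOT part of gString: scan from 'offset' toward
// index zero and return the index of the first match found, or -1.
int findReverse(const std::wstring& text, const std::wstring& src,
                int offset = -1, bool casesen = false)
{
    const std::size_t n = src.size();
    if (n == 0 || n > text.size())
        return -1;

    std::size_t start = text.size() - n;  // last index where 'src' can fit
    if (offset >= 0 && static_cast<std::size_t>(offset) < start)
        start = static_cast<std::size_t>(offset);

    for (std::size_t i = start + 1; i-- > 0; ) {   // i = start ... 0
        const wchar_t* p = text.c_str() + i;
        int diff = casesen ? std::wcsncmp(p, src.c_str(), n)
                           : wcsncasecmp(p, src.c_str(), n);
        if (diff == 0)
            return static_cast<int>(i);
    }
    return -1;
}
```

    The loop counts down with an unsigned index, so the decrement is folded into the loop condition to avoid wrapping below zero.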


  • int32_t findx ( wchar_t srcChar = L' ', int32_t offset=0 ) const;
      Input  :
         srcChar: (optional, L' ' by default) character to be skipped over
         offset : (optional, ZERO by default)
                  if specified, equals the character offset into the 
                  character array at which to begin the scan.
      Returns:
         a) If successful, returns the index of first character which 
            DOES NOT match specified character.
         b) If the scan finds no character which is different from the 
            specified character, OR if 'offset' is out of range, OR if the 
            specified character is the null character, the return value 
            indexes the null terminator of the array.
         Note: This is the wchar_t character index, NOT a byte index
    

    Scan the text and locate the first character which DOES NOT match the specified character.

    This method is used primarily to scan past the end of a sequence of space (' ') characters (0x0020), but may be used to skip over any sequence of a single, repeated character. Note that the character specified must be a ‘wide’ (wchar_t) character.

    See also the ’scan’ method below which scans to the first non-whitespace character.


  • int32_t scan ( int32_t offset=0 ) const ;
      Input  :
         offset : (optional, ZERO by default)
                  if specified, equals the character offset into the 
                  character array at which to begin the scan.
      Returns:
         a) If successful, returns the index of first non-whitespace 
            character
         b) If the scan finds no non-whitespace character OR if 'offset' 
            is out of range, the return value indexes the null terminator 
            of the array.
         Note: This is the wchar_t character index, NOT a byte index.
    

    Scans the text array and returns the index of the first non-whitespace character found.

    Whitespace characters are defined as:
    0x0020 single-column ASCII space
    0x3000 two-column CJK space
    0x0A linefeed character
    0x0D carriage-return character
    0x09 horizontal-tab character
    0x0B vertical-tab character
    0x0C formfeed character
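
    Both skip-style scans ('findx' and 'scan') reduce to a short loop. The sketch below is not the gString code: 'skipChar' and 'skipWhitespace' are hypothetical names, a std::wstring stands in for the internal array, and the index of the null terminator is modeled as the string length.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, NOT part of gString: index of the first character
// at or after 'offset' that differs from 'srcChar'.
int skipChar(const std::wstring& text, wchar_t srcChar = L' ', int offset = 0)
{
    if (srcChar == L'\0' || offset < 0 ||
        static_cast<std::size_t>(offset) > text.size())
        return static_cast<int>(text.size());   // index of null terminator
    std::size_t i = static_cast<std::size_t>(offset);
    while (i < text.size() && text[i] == srcChar)
        ++i;
    return static_cast<int>(i);
}

// The whitespace set listed above.
static bool isScanSpace(wchar_t c)
{
    return c == 0x0020 || c == 0x3000 || c == 0x0A || c == 0x0D ||
           c == 0x09   || c == 0x0B   || c == 0x0C;
}

// Hypothetical helper, NOT part of gString: index of the first
// non-whitespace character at or after 'offset'.
int skipWhitespace(const std::wstring& text, int offset = 0)
{
    if (offset < 0 || static_cast<std::size_t>(offset) > text.size())
        return static_cast<int>(text.size());
    std::size_t i = static_cast<std::size_t>(offset);
    while (i < text.size() && isScanSpace(text[i]))
        ++i;
    return static_cast<int>(i);
}
```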


Examples

gString str1( "Toys for Tots" ) ;
gString str2( "The Tin Drum" ) ;

// compare with UTF-8 string
int result = str1.compare( "Toys for Tots" ) ;

// compare with wchar_t string
result = str2.compare( L"The Tin Drum" ) ;

// compare gString objects
if ( str1 == str2 ) { /* do stuff */ }
if ( str1 != str2 ) { /* do stuff */ }


gString gs( L"Tiny Tim had a titillating tour of Times Square." ) ;

// find first instance of substring "tim" (not case sensitive)
int tIndex = gs.find( "tim" ) ;
wcout << &gs.gstr()[tIndex] << endl ;
 - - -> Tim had a titillating tour of Times Square.

// find the next instance of "tim"
gString gsSub( "tim" ) ;      // search string in a gString object
tIndex = gs.find( gsSub, tIndex + 1 ) ;
wcout << &gs.gstr()[tIndex] << endl ;
 - - -> Times Square.

// find first instance of substring "ti" (case sensitive)
tIndex = gs.find( L"ti", 0, true ) ;
wcout << &gs.gstr()[tIndex] << endl ;
 - - -> titillating tour of Times Square.

// match the first three characters of the search string
tIndex = gs.find( L"squirrel", 0, false, 3 ) ;
wcout << &gs.gstr()[tIndex] << endl ;
 - - -> Square.

// find first instance of L'R' (not case sensitive)
tIndex = gs.find( L'R' ) ;
wcout << &gs.gstr()[tIndex] << endl ;
 - - -> r of Times Square.

// extract the filename from path/filename string
gs = "/home/sam/SoftwareDesign/NcDialog/Dialog1/gString.hpp" ;
if ( (tIndex = gs.findlast( L'/' )) >= 0 )
   wcout << &gs.gstr()[tIndex + 1] << endl ;
 - - -> gString.hpp

// insert text after first instance of substring
gs = "I think that a parrot would be an ideal pet." ;
int pIndex = gs.after( L"would" ) ;
gs.insert( " NOT", pIndex ) ;
 - - -> I think that a parrot would NOT be an ideal pet.

For more examples of using gString-class methods, please refer to Test #6 of the ’Dialogw’ test application.




Extract Formatted Data

  • int32_t gscanf ( const wchar_t* fmt, ... ) const ;
  • int32_t gscanf ( const char* fmt, ... ) const ;
      Input  :
         fmt  : a format specification template in the style of swscanf()
                or sscanf() and related C/C++ functions
                Template may be either a const char* OR a const wchar_t*
    
         ...  : optional arguments
                Each optional argument is a POINTER to (address of) the
                variable to receive the formatted data.
                - Important Note: There must be AT LEAST as many optional
                  arguments as the number of format specifiers defined in
                  the formatting template. Excess arguments will be
                  ignored; however, too few arguments will return an
                  error condition. (see below)
    
      Returns: 
         number of items captured and converted
         returns 0 if:
           a) number of format specifications in 'fmt' > gsFormMAXARGS
           b) number of format specifications in 'fmt' > number of
              optional arguments (pointers to target variables) provided
    

    Scan the text data contained in the gString object and extract data according to the specified formatting template.
    (This is an implementation of the standard C-library ’swscanf’ function.)

    The formatting template may be either a 'const wchar_t*' as in a call to 'swscanf', or a 'const char*' as in a call to 'sscanf'.

    The formatting template may contain between zero(0) and gsFormMAXARGS format specifiers.

    The optional arguments are pointers to the target variables which will receive the formatted data.

    As with 'swscanf', the number of optional arguments must be equal to or greater than the number of format specifiers:
          (%d, %63s, %X, %lld, %24lls, %[, etc.).
    Excess arguments will be ignored; however, unlike the 'swscanf' function, 'gscanf' counts the number of format specifiers and optional arguments, and if there are too few arguments, the scan will be aborted to avoid an application crash due to memory-access violation. (You’re welcome :-)
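
    The idea of counting format specifiers before scanning can be sketched as follows. This is a minimal illustration, not the gString source: 'countScanSpecs' is a hypothetical name, and the treatment of "%%" and of assignment-suppressed conversions ("%*d") follows the standard scanf rules, which gscanf is only assumed to share.

```cpp
#include <cassert>
#include <cstdint>
#include <cwchar>

// Hypothetical helper (NOT part of gString): count the conversions in a
// scanf-style template which require a target pointer.
int32_t countScanSpecs ( const wchar_t* fmt )
{
   assert( fmt != nullptr ) ;
   int32_t count = 0 ;

   for ( const wchar_t* p = fmt ; *p != L'\0' ; ++p )
   {
      if ( *p != L'%' )
         continue ;
      ++p ;                                    // character following '%'
      if ( *p == L'\0' )  break ;              // dangling '%' at end of template
      if ( *p == L'%' )   continue ;           // "%%" matches a literal '%'

      const bool suppressed = (*p == L'*') ;   // "%*d" stores nothing
      if ( suppressed )  ++p ;
      while ( *p >= L'0' && *p <= L'9' )       // optional maximum field width
         ++p ;
      while ( *p != L'\0' && std::wcschr( L"hlLjzt", *p ) != nullptr )
         ++p ;                                 // length modifiers
      if ( *p == L'\0' )  break ;              // incomplete specifier

      if ( *p == L'[' )                        // scanset: skip to closing ']'
      {
         ++p ;
         if ( *p == L'^' )  ++p ;
         if ( *p == L']' )  ++p ;              // a leading ']' belongs to the set
         while ( *p != L'\0' && *p != L']' )
            ++p ;
         if ( *p == L'\0' )  break ;           // unterminated scanset
      }
      if ( !suppressed )
         ++count ;
   }
   return count ;
}
```

    A caller could compare this count with the number of pointers it intends to pass and refuse to scan when the count is larger. Note that a '%' inside a scanset, as in "%[^%]", is not a new specifier.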


  • int32_t gscanf ( int32_t offset, const wchar_t* fmt, ... ) const ;
  • int32_t gscanf ( int32_t offset, const char* fmt, ... ) const ;
      Input  :
         offset: wide-character offset at which to begin the scan
                 Important Note: This IS NOT a byte offset.
                 If offset < 0 || offset > length of data, offset will
                 be silently set to 0.
    
         fmt   : a format specification template in the style of swscanf()
                 or sscanf() and related C/C++ functions
                 Template may be either a const char* OR a const wchar_t*
    
         ...   : optional arguments
                 Each optional argument is a POINTER to (address of) the
                 variable to receive the formatted data.
                 - Important Note: There must be AT LEAST as many optional
                   arguments as the number of format specifiers defined in
                   the formatting template. Excess arguments will be
                   ignored; however, too few arguments will return an
                   error condition. (see below)
    
      Returns: 
         number of items captured and converted
         returns 0 if:
           a) number of format specifications in 'fmt' > gsFormMAXARGS
           b) number of format specifications in 'fmt' > number of
              optional arguments (pointers to target variables) provided
    

    This is the same method as described above except that the scan of the data begins at the specified character offset.
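
    In terms of the standard library, scanning from a character offset amounts to passing a pointer into the wide text. The sketch below is an illustration only: 'scanThreeInts' is a hypothetical helper built on 'swscanf', its fixed three-integer template is chosen for brevity, and mapping an EOF result to zero is an assumption rather than documented gscanf behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <cwchar>

// Hypothetical helper (NOT part of gString): scan three integers beginning
// at a wide-character offset. An out-of-range offset is silently reset to
// zero, mirroring the rule described for the 'offset' parameter.
int scanThreeInts ( const wchar_t* text, int32_t offset,
                    int& a, int& b, int& c )
{
   assert( text != nullptr ) ;
   const int32_t length = static_cast<int32_t>(std::wcslen( text )) ;

   if ( offset < 0 || offset > length )
      offset = 0 ;

   const int captured = std::swscanf( text + offset, L"%d %d %d", &a, &b, &c ) ;
   return ( captured < 0 ) ? 0 : captured ;   // report EOF as "nothing captured"
}
```

    Because the text is an array of wchar_t, 'text + offset' advances by whole characters, which is why the offset is a character offset and not a byte offset.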


    Examples

    int  sval1, sval2, sval3, sval4 ;
    int  ival ;
    long long int llval ;
    double dval ;
    char   str1[64], str2[16], str3[16], ch1 ;
    gString gs( "17 % 18 21 22 48A2B 720451 24.325 A country song "
                "is three chords and the truth. - Willie Nelson" ) ;
    int32_t cnt = 
       gs.gscanf( L"%d %% %d %d %d %X %lld %lf %63[^-] %c %15s %15s", 
                  &sval1, &sval2, &sval3, &sval4, &ival, &llval, 
                  &dval, str1, &ch1, str2, str3 ) ;
    
    gString gsOut( "items    : %d\n"
                   "numeric  : %d  %d  %d  %d  0x%04X  %lld  %4.6lf\n"
                   "text data: \"%s\" %c %s %s\n",
                   &cnt, &sval1, &sval2, &sval3, &sval4, &ival, &llval, &dval, 
                   str1, &ch1, str2, str3 ) ;
    dp->WriteParagraph ( 1, 1, gsOut, dColor ) ;
    
    This yields:
    items    : 11
    numeric  : 17  18  21  22  0x48A2B  720451  24.325000
    text data: "A country song is three chords and the truth. " - Willie Nelson
    
    A working example may be found in the NcDialog API package, 
    (Dialogw test application, Test #6).
    
     - - - - -
    
    To begin the scan from a particular offset into the text, the unneeded 
    initial data may either be scanned into a throw-away buffer or the 
    scan offset may be specified directly. For example, to discard the 
    first sixteen(16) characters:
    
    wchar_t junk[gsALLOCMIN] ;
    gs = "Useless garbage: 17 21 34" ;
    
    gs.gscanf( L"%16C %d %d %d", junk, &sval1, &sval2, &sval3 ) ;
      _or_
    gs.gscanf ( 16, L"%d %d %d", &sval1, &sval2, &sval3 ) ;
    
    
    Programmer’s Note: It is good practice to specify a maximum length for all scanned strings in order to avoid overrun of the target buffer.
          THIS:      %64s  %128lls  %256S  %32[a-z]
          NOT THIS:  %s  %lls  %S  %[a-z]
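
    The effect of a maximum field width can be seen with the standard 'swscanf' function alone. In the sketch below, 'firstWord' is a hypothetical helper, not a gString method. In the scanf family the width counts the characters stored and does not include the terminator, so the target array needs at least one more element than the width.

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical helper (NOT part of gString): copy the first
// whitespace-delimited word of 'text' into a 16-element buffer.
// "%15ls" stores at most 15 characters plus the null terminator.
bool firstWord ( const wchar_t* text, wchar_t (&word)[16] )
{
   assert( text != nullptr ) ;
   return std::swscanf( text, L"%15ls", word ) == 1 ;
}
```

    Given the word "Supercalifragilisticexpialidocious", the sketch stores only the first fifteen characters, "Supercalifragil", and the buffer is not overrun.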



Statistical Info

  • int32_t gString::gschars ( void ) const ;
      Input  :
         none
      Returns:
         number of characters in the string
    

    Returns the number of characters in the string including the null terminator.


  • int32_t utfbytes ( void ) const ;
      Input  :
         none
      Returns:
         number of bytes in UTF-8 string
    

    Returns the number of bytes in the UTF-8-encoded string including the null terminator.
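
    The difference between the character count and the UTF-8 byte count can be sketched without gString. The code below is an illustration under stated assumptions: 'countText' and 'TextCounts' are hypothetical names, wchar_t is assumed to hold one complete UTF-32 code point (as on Linux), and both counts include the null terminator as described above.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes needed to encode one code point in UTF-8.
static int32_t utf8Length ( uint32_t codePoint )
{
   if ( codePoint < 0x80 )    return 1 ;
   if ( codePoint < 0x800 )   return 2 ;
   if ( codePoint < 0x10000 ) return 3 ;
   return 4 ;
}

struct TextCounts
{
   int32_t chars ;   // wide characters, including the null terminator
   int32_t bytes ;   // UTF-8 bytes, including the null terminator
} ;

// Hypothetical helper (NOT part of gString).
// Assumes each wchar_t holds one complete UTF-32 code point.
TextCounts countText ( const wchar_t* text )
{
   assert( text != nullptr ) ;
   TextCounts tc { 1, 1 } ;     // the terminator: one character, one byte
   for ( ; *text != L'\0' ; ++text )
   {
      ++tc.chars ;
      tc.bytes += utf8Length( static_cast<uint32_t>(*text) ) ;
   }
   return tc ;
}
```

    For pure ASCII text the two counts are equal. For L"a\u00F1\u4E2D" the sketch reports four characters but seven bytes (1 + 2 + 3, plus the terminator).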


  • int32_t gscols ( void ) const ;
     Input  :
        none
     Returns:
        number of columns needed to display the data
    

    Returns the number of columns required to display the string.


  • const short* gscols ( short& charCount ) const ;
      Input  :
         charCount : (by reference, initial value ignored)
                     on return, contains number of characters in 
                     the string including the null terminator.
      Returns:
         pointer to number of columns needed for display of each character
    

    Returns a pointer to an array of column counts, one for each character of the text data. Note that the number of array elements equals the number of characters (plus meta-characters, if any).

    Example

    This example is from the FileMangler utility. It trims the head of 
    the provided path/filename string to fit within a dialog window.
    
    void FmConfig::ecoPathFit ( gString& gsPath, short colsAvail )
    {
       if ( (gsPath.gscols()) > colsAvail )
       {
          gString gst = gsPath ;
          int32_t width = gst.gscols(),
                  offset = ZERO,
                  charCount ;
          const short* colArray = gst.gscols( charCount ) ;
    
          while ( width > (colsAvail - 3) )
             width -= colArray[offset++] ;
          gsPath.compose( L"...%S", &gst.gstr()[offset] ) ;
       }
    }  //* End ecoPathFit() *
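
    The same head-trimming idea can be tried without the gString class. The sketch below is an illustration only: 'columnsFor', 'columnArray' and 'fitHead' are hypothetical names, and the width rule is deliberately simplified (two columns for U+3000 and for the CJK Unified Ideographs block U+4E00 through U+9FFF, one column for everything else). Unlike the gscols array, the array here has no entry for the null terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical width rule, for illustration only.
static short columnsFor ( wchar_t ch )
{
   if ( ch == 0x3000 || (ch >= 0x4E00 && ch <= 0x9FFF) )
      return 2 ;
   return 1 ;
}

// One column count per character of the text.
std::vector<short> columnArray ( const std::wstring& text )
{
   std::vector<short> cols ;
   cols.reserve( text.size() ) ;
   for ( wchar_t ch : text )
      cols.push_back( columnsFor( ch ) ) ;
   return cols ;
}

// Trim characters from the head of the text until the remainder,
// prefixed with "...", fits within 'colsAvail' display columns.
std::wstring fitHead ( const std::wstring& text, int colsAvail )
{
   assert( colsAvail >= 3 ) ;               // room for the ellipsis
   const std::vector<short> cols = columnArray( text ) ;
   int width = 0 ;
   for ( short c : cols )
      width += c ;
   if ( width <= colsAvail )
      return text ;                         // already fits

   std::size_t offset = 0 ;
   while ( offset < cols.size() && width > (colsAvail - 3) )
      width -= cols[offset++] ;
   return L"..." + text.substr( offset ) ;
}
```

    With eleven columns available, L"/home/sam/file.txt" becomes L"...file.txt". The point of the column array is that the loop subtracts display columns, not characters, so two-column characters are accounted for correctly.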
    

  • bool isASCII ( void ) ;
      Input  :
         none
      Returns:
         'true' if data are pure, 7-bit ASCII, else 'false'
    

    Scan the data to determine whether it is pure, 7-bit ASCII.




gString Miscellaneous

  • void gString::clear ( void );
      Input  : none
      Returns: nothing
    

    Reset contents to an empty string i.e. "". The data will consist of a single, NULLCHAR character (L'\0').
    The character and byte counts are set to 1 (one), and the column count is zero.


  • int32_t wAlloc ( void ) const;
  • int32_t uAlloc ( void ) const;
      Input  : none
    
      Returns: size of dynamic storage allocation in wchar_t words
               OR
               size of dynamic storage allocation in uint8_t (char) bytes
    

    Report the data storage allocation for the gString object.
    Note that this is the maximum UTF-32 (wchar_t) character
    (or UTF-8 byte) capacity, NOT the current number of
    characters or bytes stored.
      wAlloc : returns the storage allocation for
               UTF-32 (wchar_t) characters
      uAlloc : returns the storage allocation for UTF-8 bytes


  • int32_t freeSpace ( void ) const;
      Input  : none
    
      Returns: available buffer space
               Note: If zero(0) is returned, data may have been
                     truncated to fit.
    

    Reports the amount of free buffer space available, expressed as a count of UTF-32 (wchar_t) characters.


  • int32_t reAlloc ( int32_t steps, bool clear = false );
      Input  :
         steps : number of steps to increase or decrease in buffer size.
                 Range: +/-999
                 If value out of range, will be set to upper/lower limit.
         clear : (optional, 'false' by default)
                 if 'false', if possible, retain the existing contents
                             of the buffer. (Note that if buffer size is
                             reduced, some data may be truncated.)
                 if 'true',  erase any existing text in the buffer
                             (same as 'clear' method, above)
    
      Returns: new buffer capacity expressed as the number of wchar_t characters
    

    Increase or decrease the size of storage buffer by the specified number of gsALLOCMIN increments.

    The buffer is sized in multiples of gsALLOCMIN in the range:
             gsALLOCMIN <= size <= gsALLOCMAX
             1024 chars            1,024,000 chars
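
    The sizing arithmetic described above can be sketched as follows. This is an illustration of the documented limits, not the gString implementation: 'resizedCapacity' and the 'k...' constants are hypothetical names, and clamping the result to the documented size range is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr int32_t kAllocMin  = 1024 ;      // gsALLOCMIN, per the text above
constexpr int32_t kAllocMax  = 1024000 ;   // gsALLOCMAX, per the text above
constexpr int32_t kStepLimit = 999 ;       // documented range of 'steps'

// Hypothetical helper (NOT part of gString): capacity, in wchar_t
// characters, after moving 'steps' increments up or down from 'current'.
int32_t resizedCapacity ( int32_t current, int32_t steps )
{
   assert( current >= kAllocMin && current <= kAllocMax ) ;
   assert( current % kAllocMin == 0 ) ;

   // out-of-range step counts are set to the upper/lower limit
   steps = std::max( -kStepLimit, std::min( steps, kStepLimit ) ) ;

   const int32_t wanted = current + steps * kAllocMin ;
   return std::max( kAllocMin, std::min( wanted, kAllocMax ) ) ;
}
```

    Note that the step limit and the size limits agree: starting from the minimum of 1024 characters, 999 upward steps give 1024 + (999 x 1024) = 1,024,000 characters, which is the stated maximum.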


  • const char* Get_gString_Version ( void ) const ;
      Input  :
         none
      Returns:
         pointer to version string
    

    Return a pointer to gString class version number string.


  • void gString::dbMsg ( gString& gsmsg ) const;
      Input  :
         gsmsg : (caller's object, by reference)
                 receives most recent debug message
      Returns:
         nothing
    

    FOR DEBUGGING ONLY! Application can retrieve most recent debug message.
    Note: This method is visible only if the DEBUG_GSTRING flag is set in gString.hpp.


  • void dumpGstring ( const wchar_t*& gstr, const char*& ustr,
                       const short*& cwid, int32_t& gsw,
                       int32_t& gsu, int32_t& bsch, int32_t& bsco,
                       int32_t& utfb, bool& isas ) const;
      Input  :
         gstr   : array of wchar_t characters
         ustr   : array of UTF-8 bytes
         cwid   : array of column widths
         gsw    : size of gstr array
         gsu    : size of ustr array
         bsco   : number of display columns required for stored data
         bsch   : number of wchar_t characters stored (incl. nullchar)
         utfb   : number of UTF-8 bytes stored (incl. nullchar)
         isas   : set if stored data are pure ASCII data
    
      Returns: nothing
    

    FOR DEBUGGING ONLY! Get a snapshot of the private gString data members.
    Note: This method is visible only if the DEBUG_GSTRING flag is set in gString.hpp.




gString Examples

The NcDialog API test application ’Dialogw’ contains extensive examples of gString usage, including working copies of the examples used in this chapter.
The other NcDialog API test applications also use gString in various ways.

Here, we show just a sample of some basic uses for the gString class.

  1. Convert a UTF-8 (8-bit) character string to a wchar_t (32-bit) character string.
    const char* some_UTF8_data = "I want to buy an hamburger." ;
    wchar_t some_wide_data[gsALLOCDFLT] ;
    
    gString gs( some_UTF8_data ) ;
    gs.copy( some_wide_data, gsALLOCDFLT ) ;
    
  2. Convert a wchar_t (32-bit) character string to a UTF-8 (8-bit) character string.
    const wchar_t* some_wide_data = L"I want to buy an hamburger." ;
    char some_UTF8_data[gsALLOCDFLT] ;
    
    gString gs( some_wide_data ) ;
    gs.copy( some_UTF8_data, gsALLOCDFLT ) ;
    
  3. Concatenate strings.
    const char* Head = "Where" ;
    const wchar_t* Tail = L"is Carmen Sandiego?" ;
    gString gs( L" in the world " ) ;
    gs.insert( Head, ZERO ) ;
    gs.append( Tail ) ;
    wcout << gs << endl ;
     - - ->  Where in the world is Carmen Sandiego?
    
  4. Create formatted string data.
    const char* utf8String = "We present" ;
    const wchar_t* wideString = L"for your enjoyment:" ;
    const char utf8Char = 'W' ;
    const wchar_t wideChar = L'C' ;
    short int ways = 100 ;
    double dish = 17.57 ;
    gString gs ;
    
    gs.compose( "%s %S %hd %cays to %Cook %Chicken,\n"
                "and %.2lf side dishes and beverages!",
                utf8String,
                wideString,
                &ways,
                &utf8Char,
                &wideChar, &wideChar,
                &dish ) ;
    
    wcout << gs << endl ;
     - - ->  We present for your enjoyment: 100 Ways to Cook Chicken,
             and 17.57 side dishes and beverages!
    

    Important Note: All parameters are pointers to the data: For strings (and pointers), the address is the name of the variable. For all other data types, including single characters, use the address-of ('&') operator.

  5. Count display columns to make data fit the window.

    This is a formatting method taken from the ’Dialogx’ test application. It breaks a text stream into lines which fit within the diagnostic window. It’s not speedy (or particularly smart), but it demonstrates the use of ’gString’ to calculate the space needed to display text data and then to format the data to fit the space.

    //*  FormatOutput   *
    //* Input  : gsOut     : semi-formatted source data
    //*          wpos      : start position for display
    //*          lineCount : maximum display lines before truncation
    //* Returns: nothing
    
    void dXClip::FormatOutput ( const gString& gsOut, 
                                winPos& wpos, int32_t lineCount )
    {
       int tWidth = (ddCOLS - 2),        // inside width of target window
           maxWidth = tWidth * lineCount,// max columns for message
           tLines = ZERO ;               // loop counter
    
       gString gsw( gsOut.gstr() ), gso ;
       if ( gsw.gscols() > maxWidth )
       { // truncate the string if necessary
          gsw.limitCols ( maxWidth - 3 ) ;
          gsw.append( L"..." ) ;
       }
       do
       {  // break the source into convenient widths
          gso = gsw ;
          gso.limitCols( tWidth ) ;
          gso.append( L'\n' ) ;
          gsw.shiftCols( -tWidth ) ;
          this->dpd->ClearLine ( wpos.ypos ) ; // clear target display line
          wpos = this->dpd->WriteParagraph ( wpos, gso, 
                                             dxPtr->dColor, true ) ;
       }
       while ( ++tLines <= lineCount && gsw.gschars() > 1 ) ;
    
    }  //* End FormatOutput() *
    

    The example above is just a simple demonstration of using column widths. To see it in action, please refer to the ’Dialogw’ NcDialog API Test Application, test seven (7).

    For a more sophisticated text-parsing algorithm, see formatParagraph which parses text data by token rather than simply counting columns.




gsmeter Test App

            GSMETER             
     gsmeter Introduction      
     gsmeter Options      
     Building gsmeter      

'gsmeter' Introduction

The original design of the gString class was based upon a fixed buffer size of PATH_MAX, which is the maximum length of a file specification under the POSIX standard. This is defined as a GNU compiler constant (4096 bytes), which is greater than or equal to the number of bytes (or characters) in a file specification, depending on the character set used for filespecs.
This buffer size was chosen specifically to balance storage capacity with performance. The original gString design has performed far beyond expectations for over fifteen years, and it is expected to meet or exceed most daily needs of applications into the future.

In October of 2024, a beta-test branch of the gString class, bString, was created. The design criteria for this class were to retain the original functionality of the gString class with as few modifications as possible, while converting the static character buffers to dynamically-resized character and byte buffers.

This compares favorably with std::string and std::wstring which are cute in their own way and semi-functional; but seriously overweight. While it is not our intention to be fat-shaming the C++ language, std::string support for text data is functionally inconvenient, entirely too English-centric and practically-speaking, is often mis-used (or ignored completely) by developers due to its opaque implementation, which is based on a solid idea, but clearly, was designed by committee.

Because this new design touches every aspect of the original gString functionality, the 'gsmeter' test utility was created to perform exhaustive regression testing, with a special focus on the algorithms for allocation and re-allocation of dynamic memory.

The beta branch has now been re-integrated into the mainline gString class, so the test application is now being included as part of the gString distribution. Have fun!




'gsmeter' Command-line Options

While this application lacks professional polish, it does provide a simple menu structure and the usual --help and --version options.

gsmeter: version:0.0.05  gString v:0.0.37 - (c) 2024-2025 The Software Samurai
------------------------------------------------------------------------------
Test Suite for gString Class  Usage: gsmeter TEST [SUBTEST]
   C[a|b] --------- Constructors  :  S ------- Statistics   
   I[a|b] --------- Initializers  :  O ------- Output group 
   M[a|b|c|d]------ Modifiers     :  B[a|b] -- Buffer size 
   A[a|b|c|d|e|f]-- Analyzers     :  R[a|b] -- Reallocation
   --help  --version              :  P ------- Playground

      Example Invocations:
         gsmeter              # No arguments, invokes the options menu
         gsmeter Ca           # Constructor Group, section 'a'
         gsmeter Ac           # Analyzer Group, section 'c'
         gsmeter --version    # Report version numbers, current locale
                              # and copyright info

The tests are grouped according to functionality, and are sub-divided into smaller groups so that each test run will fit within one terminal window of 39 rows x 150 columns.

There are eight(8) test groups with eighteen(18) tests, plus a “Sandbox” for creating ad-hoc tests.

Test Groups

  1. Constructors
    A: Constructors A: default, specify allocation, char*, wchar_t*, format specification
    B: Constructors B: integer-formatting constructors
  2. Initializers
    A: Initializers A, and basic reporting methods: wAlloc(), uAlloc(), gstr(), ustr(), gschars(), utfbytes(), gscols(), isASCII(), operator=, loadChars(), formatInt()
    B: Initializers B: compose(format, ...), compose(binary extension)
  3. Modifiers
    A: append(), limitChars(), limitCols(), insert(), erase()
    B: replace(), shiftChars(), shiftCols()
    C: rightJustify(), padCols(), strip(), textReverse()
    D: formatParagraph(), clear()
  4. Analyzers
    A: operator==, operator!=, compare(gString&), compare()
    B: compcoll(), find()
    C: findlast(), findx(), scan()
    D: after()
    E: findr()
    F: gscanf()
  5. Statistics
    Info Dump, Public Stats, Character Widths, Class Version Number
  6. Output group
    "operator<<" for both UTF-32 (wide) and UTF-8 (narrow) streams
  7. Reallocation tests
    A: Reallocation tests, Group A
    B: Reallocation tests, Group B
  8. Buffer Size/Resize (attempted overflow) tests
    A: Constructors and Assignments Overflow tests
    B: Insert and Append Overflow tests

Reporting of test results is split between the terminal window and a file (gsmdbg.txt) created in the current working directory.


Note on Application Locale

gsmeter automatically enables the “locale” specified in the terminal environment. While the gString class is written to be locale-independent, some of the gsmeter tests rely upon the locale to function properly.
Specifically, the gString compcoll (compare-collated) method invokes the locale-aware C-language function, 'wcscoll'. This function uses a dictionary-style comparison which is dependent on the language and locale involved. Please see compcoll method for details.
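
The locale dependence comes from the standard library itself, and can be seen without gString. The sketch below is an illustration only: 'collatedCompare' and 'useEnvironmentLocale' are hypothetical helpers built on the standard 'wcscoll' and 'setlocale' functions, and they are not taken from the gString or gsmeter source.

```cpp
#include <cassert>
#include <clocale>
#include <cwchar>

// Hypothetical helper: returns a value less than, equal to, or greater
// than zero, like wcscmp(), but ordered by the LC_COLLATE rules of the
// locale which is currently active.
int collatedCompare ( const wchar_t* a, const wchar_t* b )
{
   assert( a != nullptr && b != nullptr ) ;
   return std::wcscoll( a, b ) ;
}

// Hypothetical helper: adopt the locale named in the terminal
// environment. Returns 'false' if that locale is not available.
bool useEnvironmentLocale ( void )
{
   return std::setlocale( LC_ALL, "" ) != nullptr ;
}
```

Until 'setlocale' is called, a program runs in the default "C" locale, where the comparison follows plain character-code order. After the environment locale has been enabled, the relative order of accented or mixed-case strings may differ from one language to another.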




Building 'gsmeter'

gsmeter is a very simple application, so building it is also simple.

There are four(4) ways to invoke the Makefile:

     gmake            # compile the application module and link
                      # to the existing gString library
     gmake buildlib   # delete the existing gString.lib link library
                      # and rebuild it
     gmake clean      # Remove the object files, the gString library
                      # and the executable
     gmake all        # Execute the 'clean', 'buildlib' and
                      # 'gmake' options

Important Note: Some tests include a snapshot of the gString object’s internal configuration. This report requires that a conditional-compile declaration be enabled:
     #define DEBUG_GSTRING (1)
This declaration is located near the top of gString.hpp. This declaration should be enabled only for use by gsmeter, and should be disabled for all other applications.
     #define DEBUG_GSTRING (0)





Technical Support

Please Note: All trademarks and service marks mentioned in this
document are the entirely-too-proprietary property of their
respective owners, and this author makes no representation of
affiliation with or ownership of any of the damned things.

Contact

The gString class, demonstration apps and all associated Texinfo documentation were written and are maintained by:

     Mahlon R. Smith, The Software Samurai
     Beijing University of Technology
     on the web at: www.SoftwareSam.us

For bugs, suggestions, periodic updates, or possible praise, please post a message to the author via website.
The author wishes to thank everyone for their intelligent, kind and thoughtful responses. (ranters I can live without)


By the same author

The NcDialog-class link library, the FileMangler file management utility, the AnsiCmd library and other utilities by the same author are also available through the website.

These are primarily C++ applications and function libraries for use by students and by any code warriors who believe that the source of all magical possibilities is at the Linux command line.





Index

Index Entry                             Section

0
07 gString Text Tool                    gString Text Tool
07.01 Introduction to gString           Introduction to gString
07.02 gString Public Methods            gString Public Methods
07.03 gString Instantiation             gString Instantiation
07.04 Assignment Operators              Assignment Operators
07.05 Formatted Assignments             Formatted Assignments
07.06 Integer Formatting                Integer Formatting
07.07 Data Access                       Data Access
07.08 Copying Data                      Copying Data
07.09 Modifying Existing Data           Modifying Existing Data
07.10 Comparisons                       Comparisons
07.11 Extract Formatted Data            Extract Formatted Data
07.12 Statistical Info                  Statistical Info
07.13 gString Miscellaneous             gString Miscellaneous
07.14 gsmeter Test App                  gsmeter Test App
07.14 gString Examples                  gString Examples
09 Technical Support                    Technical Support

1
10 Copyright Notice                     Copyright Notice
10.01 GNU General Public License        GNU General Public License
10.02 GNU Free Documentation License    GNU Free Documentation License

B
BiDi text                               Modifying Existing Data

C
contact info                            Technical Support
contact information                     Technical Support

D
dynamic alloc. gString                  gString Instantiation

F
fiUnits enumerated type                 Integer Formatting
formatInt field overflow                Integer Formatting

G
gString Docs                            Top
gString memory alloc.                   gString Instantiation
gString methods                         gString Public Methods
gString text conversion                 gString Text Tool

L
locale, set                             Integer Formatting

M
memory, alloc. gString                  gString Instantiation
methods, gString                        gString Public Methods

R
ram memory, gString                     gString Instantiation
RTL text                                Modifying Existing Data

S
set locale                              Integer Formatting
support                                 Technical Support
swscanf emulation                       Extract Formatted Data

T
text conversion, gString                gString Text Tool