#include <ccfile.h>
Inheritance diagram for CCLexFile:
Public Member Functions | |
CCLexFile (BOOL ErrorReporting=TRUE, BOOL ExceptionThrowing=FALSE) | |
Initialise a CCLexFile object. | |
~CCLexFile () | |
Deinitialises a CCLexFile object (calls DeinitLexer() for you). | |
BOOL | InitLexer (BOOL IsSeekingRequired=FALSE, BOOL DoIgnoreStringEscapeCodes=FALSE) |
Prepare the file object for performing lexical analysis on the input stream. This allocates the buffers and sets the various character sets to the default values: Whitespace: spaces and tabs Delimiters: none Comment: # String: Delimited by " and " (i.e. like C) Also resets the newline indent output used by PutNewLine() to 0;. | |
void | DeinitLexer () |
Frees up the dynamic objects allocated by CCLexFile::InitLexer. Can be called at any point, and should be called before the next call to InitLexer(), even if the last InitLexer() call has failed. | |
void | SetWhitespace (char *) |
This function changes the set of characters that the lexical analyser treats as white space. Note that it defaults to tabs and spaces, but you can specify any set of characters you want - for example you could even pass in " tf" and it would treat spaces, lower case t's and lower case f's as whitespace characters. Tokens extracted by CCLexFile::GetToken never contain any whitespace characters. | |
void | SetDelimiters (char *) |
Change the set of characters that are treated as 'delimiters'. A delimiter character indicates that a token has finished. A delimiter character *will* be extracted by CCLexFile::GetToken, and returned on its own. For example, for the line "hello; there ();", processed with the set of delimiters ";()", you would get the following tokens out of CCLexFile::GetToken: MonoOn "hello" ";" "there" "(" ")" ";" MonoOff. | |
void | SetCommentMarker (char) |
Change the character used to denote a comment token. Comments start from the comment character until the end of the line. So, for example, to set the lexer up for PostScript style comments, pass in '' as the comment marker; for assembler style comments, pass in ';'. Multi-line comments are not supported. | |
void | SetStringDelimiters (char *) |
Change the characters used to delimit a string. The lexer will handle strings as a special token type (see CCLexFile::GetToken). You can specify which characters delimit the strings - you pass in a two-character string; the first character is used to start the string,and the second is used to terminate it. The two characters can be the same. So, for example, for PostScript strings, use "()", or for C-style strings, use "\"\"". | |
BOOL | GetToken () |
Extract the next lexical token from the input stream of this file. Tokens are separated by whitespace, delimiters, line-breaks, or any combination of the three. Delimiters and line-breaks are returned as tokens (and may be ignored if desired); whitespace is never returned. Call CCLexFile::GetTokenType() to find out what kind of token was extracted. You can examine the token directly by looking at the buffer returned by CCLexFile::GetTokenBuf() (this is a const buffer so you may examine it, but not alter it). This buffer address will not change unless CCLexFile::InitLexer is called again, so it is safe to cache the return value of CCLexFile::GetTokenBuf. | |
BOOL | GetHexToken () |
Extract the next lexical token from the input stream of this file. The token is expected to be a string of hexadecimal digits, and its length should be an even number of characters. If whitespace or a delimiter is encountered then the token is returned ok. If any other character is found which is not a legal hex digit thenFALSE is returned. | |
BOOL | GetSimpleToken () |
A simplified interface onto GetToken(). The caller is not informed of token types TOKEN_COMMENT and TOKEN_EOL. The function returns TRUE if a TOKEN_NORMAL or TOKEN_STRING is read. FALSE is returned if TOKEN_EOF is read, or if FALSE is returned by GetToken(). | |
BOOL | GetLineToken () |
Read in a token in a line-based manner. If the current input position is in the middle of a line, the data up until the end of the line is read. If at the start of a line, the whole line is returned. The token is still examined to work out what it is, but obviously it may not match any proper construct, in which case the type of the token is set to TOKEN_LINE. (NB this is the only time that TOKEN_LINE is used). | |
void | UngetToken () |
Put the current token back onto the input stream - i.e. the next time GetToken() is called it will return the token it returned last time. This is not nestable - you can only put back one token at once. | |
UINT32 | GetLineNumber () |
UINT32 | GetCharOffset () |
virtual INT32 | GetCharsRead () |
LexTokenType | GetTokenType () |
const TCHAR * | GetTokenBuf () |
BOOL | IsLexerInitialised () |
BOOL | PutString (const StringBase &str, UINT32 length=0, char *Sep=" ") |
Outputs the string to the file enclosed in the string delimiting chars. (see CCLexFile::SetStringDelimiters for more details) Writes out 'length' chars of str, then finishes off by writing 'Sep' out as a bunch of Sep characters. | |
BOOL | PutToken (const StringBase &str, UINT32 length=0, char *Sep=" ") |
Outputs the string to the file. Writes out 'length' chars of str, then finishes off by writing 'Sep' out as a bunch of Sep characters. | |
BOOL | PutToken (const TCHAR *buf, char *Sep=" ") |
Outputs the string to the file. Writes out 'length' chars of str, then finishes off by writing 'Sep' out as a bunch of Sep characters. | |
BOOL | PutToken (INT32 n, char *Sep=" ") |
Outputs the number to the file. Writes out 'n', then finishes off by writing 'Sep' out as a bunch of Sep characters. | |
BOOL | PutNewLine () |
Outputs a new line, followed by a number of spaces. The number of spaces output can be changed using IncIndent() and DecIndent(); InitLexer resets the number of indent spaces to 0. | |
void | IncIndent () |
Increases the number of spaces written by PutNewLine() at the start of the next line. It is increased by IndentDelta, which can be altered using SetIndentDelta(). | |
void | DecIndent () |
Decreases the number of spaces written by PutNewLine() at the start of the next line. It is decreased by IndentDelta, which can be altered using SetIndentDelta(). | |
void | SetIndentDelta (UINT32 d) |
void | GetLine () |
Read in the next line from the file - used when extracted tokens with CCLexFile::GetToken(). Handles the current line count, number of chars read count, etc. | |
void | GetCh () |
Gets the next character from the input stream, reading in a new line if necessary. | |
const TCHAR * | GetLineBuf () |
virtual void | SetDontFail (BOOL state) |
BOOL | GetHTMLToken (BOOL fIgnoreEOL=TRUE, BOOL fCorrectCase=TRUE) |
Get the next HTML token from the file. | |
char * | GetHTMLTokenBuffer () |
BOOL | IsHTMLTag () |
BOOL | IsEndOfHTMLFile () |
String_256 | GetHTMLTagName () |
This function reads the name of the tag in the buffer. | |
String_256 | GetHTMLParameterValue (const String_256 &strParameterName, BOOL fCorrectCase=TRUE) |
Finds the value of the specified parameter. | |
Protected Member Functions | |
void | InitHTMLLexer () |
Initialises all the variables used in the HTML parsing code. | |
void | DeleteHTMLBuffer () |
Deletes the HTML buffer and sets all the member variables that refer to it to appropriate defaults. | |
void | AddToHTMLBuffer (char cToAdd) |
Adds cToAdd to the buffer, extending the buffer if necessary. | |
char | PeekNextHTMLChar () |
In theory, this function finds out what the next character that will be read from the file will be. | |
char | ReadNextHTMLChar () |
This function reads the next character from the file. Well, actually, the way it works is slightly more complicated. | |
BOOL | IsWhitespace (char cToTest) |
Tests the character passed to see if it is a white-space character. | |
PCSTR | FindStringWithoutCase (PCSTR strToSearch, PCSTR strToFind) |
Finds strToFind within strToSearch. | |
BOOL | IsDelim () |
Test the current character (as read in by CCLexFile::GetCh) to see if it is a delimiter. | |
BOOL | IsWhitespace () |
Test the current character (as read in by CCLexFile::GetCh) to see if it is a white-space character. | |
BOOL | GetStringToken () |
Extracts a string token from the input stream, as defined by the currently installed string delimiters. | |
Protected Attributes | |
char * | m_pcHTMLBuffer |
INT32 | m_iCharsInHTMLBuffer |
INT32 | m_iLengthOfHTMLBuffer |
BOOL | m_fIsTag |
BOOL | m_fIsCharacterWaiting |
char | m_cWaitingCharacter |
BOOL | m_fEndOfHTMLFile |
BOOL | EOFFound |
BOOL | DontFail |
BOOL | DelimiterFound |
BOOL | TokenIsCached |
String_256 * | LineBuf |
TCHAR * | Buf |
TCHAR * | TokenBuf |
LexTokenType | TokenType |
UINT32 | Line |
UINT32 | Ofs |
char | Ch |
INT32 | CharsRead |
FilePos | LastLinePos |
BOOL | SeekingRequired |
BOOL | IgnoreStringEscapeCodes |
char * | WhitespaceChars |
char * | DelimiterChars |
char | CommentMarker |
char * | StringDelimiters |
INT32 | IndentSpaces |
UINT32 | IndentDelta |
Private Member Functions | |
CC_DECLARE_DYNAMIC (CCLexFile) | |
Private Attributes | |
BOOL | LexerInitialised |
Definition at line 316 of file ccfile.h.
|
Initialise a CCLexFile object.
Definition at line 802 of file ccfile.cpp. 00803 : CCFile(ErrorReporting, ExceptionThrowing), 00804 CharsRead(0) 00805 { 00806 // Ensure that the buffers are NULL so that DeinitLexer() can be called 00807 // safely even if called after a failed InitLexer() call. 00808 LineBuf = NULL; 00809 Buf = NULL; 00810 TokenBuf = NULL; 00811 00812 LexerInitialised = FALSE; 00813 TokenIsCached = FALSE; 00814 00815 SeekingRequired = FALSE; 00816 IgnoreStringEscapeCodes = FALSE; 00817 00818 DontFail = FALSE; 00819 00820 //Graham's HTML parsing variables 00821 m_pcHTMLBuffer=NULL; 00822 }
|
|
Deinitialises a CCLexFile object (calls DeinitLexer() for you).
Definition at line 834 of file ccfile.cpp. 00835 { 00836 DeinitLexer(); 00837 }
|
|
Adds cToAdd to the buffer, extending the buffer if necessary.
Definition at line 1345 of file ccfile.cpp. 01346 { 01347 //First, do we have an HTML buffer? 01348 if( NULL == m_pcHTMLBuffer ) 01349 { 01350 //No. So create one now 01351 m_iLengthOfHTMLBuffer = TokenBufSize; 01352 01353 m_pcHTMLBuffer = (char *)malloc( m_iLengthOfHTMLBuffer ); 01354 01355 m_iCharsInHTMLBuffer = 0; 01356 } 01357 01358 //Now, is there enough room in our HTML buffer to store 01359 //our new character? 01360 if( m_iCharsInHTMLBuffer >= m_iLengthOfHTMLBuffer ) 01361 { 01362 //No. So double the size of the buffer 01363 m_iLengthOfHTMLBuffer *= 2; 01364 01365 m_pcHTMLBuffer = (char *)realloc( m_pcHTMLBuffer, m_iLengthOfHTMLBuffer ); 01366 } 01367 01368 ERROR3IF(m_pcHTMLBuffer==NULL, "AddToHTMLBUffer - realloc error"); 01369 01370 //And finally add our character 01371 m_pcHTMLBuffer[m_iCharsInHTMLBuffer] = cToAdd; 01372 01373 //And increase the number of characters in the buffer 01374 m_iCharsInHTMLBuffer++; 01375 }
|
|
|
|
Decreases the number of spaces written by PutNewLine() at the start of the next line. It is decreased by IndentDelta, which can be altered using SetIndentDelta().
Definition at line 2517 of file ccfile.cpp. 02518 { 02519 IndentSpaces -= IndentDelta; 02520 if (IndentSpaces < 0) IndentSpaces = 0; 02521 }
|
|
Frees up the dynamic objects allocated by CCLexFile::InitLexer. Can be called at any point, and should be called before the next call to InitLexer(), even if the last InitLexer() call has failed.
Definition at line 942 of file ccfile.cpp. 00943 { 00944 // Deallocate our buffers 00945 if (LineBuf != NULL) delete LineBuf; 00946 if (TokenBuf != NULL) delete [] TokenBuf; 00947 00948 LineBuf = NULL; 00949 TokenBuf = NULL; 00950 Buf = NULL; 00951 00952 DeleteHTMLBuffer(); 00953 00954 LexerInitialised = FALSE; 00955 }
|
|
Deletes the HTML buffer and sets all the member variables that refer to it to appropriate defaults.
Definition at line 1317 of file ccfile.cpp. 01318 { 01319 if (m_pcHTMLBuffer) 01320 free( m_pcHTMLBuffer ); 01321 01322 m_pcHTMLBuffer=NULL; 01323 01324 m_iCharsInHTMLBuffer=0; 01325 m_iLengthOfHTMLBuffer=0; 01326 01327 }
|
|
Finds strToFind within strToSearch.
Definition at line 1815 of file ccfile.cpp. 01816 { 01817 //We assume strToSearch and strToFind are valid 01818 01819 //First set up a pointer to search strToSearch with 01820 PCSTR pcThisChar = strToSearch; 01821 01822 //Now, while pcThisChar is valid 01823 while( pcThisChar != NULL && *pcThisChar != '\0' ) 01824 { 01825 //Is strToSearch from pcThisChar onwards the same 01826 //as strToFind, doing a case insensitive comparision? 01827 if( 0 == _strnicmp( pcThisChar, strToFind, strlen( strToFind ) ) ) 01828 { 01829 //Yes. So return TRUE 01830 return pcThisChar; 01831 } 01832 01833 // pcThisChar = camStrinc(pcThisChar); 01834 pcThisChar++; 01835 } 01836 01837 //Otherwise return NULL 01838 return NULL; 01839 }
|
|
Gets the next character from the input stream, reading in a new line if necessary.
Definition at line 2233 of file ccfile.cpp. 02234 { 02235 // Get a character from the stream. 02236 02237 // If end of file, there is no effect 02238 if (EOFFound) 02239 return; 02240 02241 // If at the end of a line, get a new line first 02242 if ((Ch == 0) && (Ofs != 0)) 02243 GetLine(); 02244 02245 // Test again as GetLine() might encounter the end of the file... 02246 // NB the above comment inserted for the hard of thinking (e.g. Justin and Simon) 02247 if (EOFFound) 02248 { 02249 Ch = 0; 02250 return; 02251 } 02252 02253 Ch = Buf[Ofs++]; 02254 }
|
|
Definition at line 358 of file ccfile.h. 00358 { return Ofs; }
|
|
Reimplemented in CCStreamFile. Definition at line 359 of file ccfile.h. 00359 { return CharsRead; }
|
|
Extract the next lexical token from the input stream of this file. The token is expected to be a string of hexadecimal digits, and its length should be an even number of characters. If whitespace or a delimiter is encountered then the token is returned ok. If any other character is found which is not a legal hex digit thenFALSE is returned.
Definition at line 1863 of file ccfile.cpp. 01864 { 01865 // Check to see if the client has put a token back onto the input stream 01866 if (TokenIsCached) 01867 { 01868 // Yes - just return success to the caller and they will use the current token 01869 // again. 01870 // But for hex token we must check that it is a legal hex token... 01871 INT32 i = 0; 01872 while (TokenBuf[i] != 0) 01873 { 01874 if (!camIsxdigit(TokenBuf[i])) 01875 // Not valid hex data 01876 return FALSE; 01877 01878 // Try next character 01879 i++; 01880 } 01881 01882 // Valid hex data - return success. 01883 TokenIsCached = FALSE; 01884 return TRUE; 01885 } 01886 01887 DelimiterFound = FALSE; 01888 01889 // Keep ignoring whitespace/comments etc 01890 for(;;) 01891 { 01892 // No effect if at EOF 01893 if (EOFFound) 01894 { 01895 // Did not find a hex token 01896 TokenType = TOKEN_EOF; 01897 TokenBuf[0] = 0; 01898 return FALSE; 01899 } 01900 01901 // Ignore white space until we get something useful 01902 while (!EOFFound && IsWhitespace()) 01903 GetCh(); 01904 01905 if (EOFFound) 01906 { 01907 // Did not find a hex token 01908 TokenType = TOKEN_EOF; 01909 TokenBuf[0] = 0; 01910 return FALSE; 01911 } 01912 01913 if (Ch == 0) 01914 { 01915 // Skip line breaks 01916 GetCh(); 01917 } 01918 else if (Ch == CommentMarker) 01919 { 01920 // It's a comment - ignore the rest of this line 01921 GetLine(); 01922 GetCh(); 01923 } 01924 else 01925 { 01926 // Ok - is it a hex digit? 01927 if (camIsxdigit(Ch)) 01928 // Yes - fall through to next bit of code to decode it 01929 break; 01930 else 01931 // No - so we can't find a hex string so return an error 01932 return FALSE; 01933 } 01934 } 01935 01936 // Read until not a hexadecimal digit, and analyse result. 01937 // NB. 0-terminator ends a token => tokens cannot be split over a line. 01938 INT32 i = 0; 01939 01940 // This loop is hugely critical (e.g. importing large bitmaps), so we resort to direct 01941 // buffer access and register variables. (sorry!) 01942 register TCHAR *pHexCh = Buf + Ofs - 1; 01943 01944 while (camIsxdigit(*pHexCh)) 01945 { 01946 TokenBuf[i++] = *pHexCh++; 01947 } 01948 01949 // Update the offset variable and Ch variable so we maintain lookahead. 01950 Ofs += i; 01951 Ch = Buf[Ofs-1]; 01952 01953 // Set up token correctly. 01954 TokenBuf[i] = 0; 01955 TokenType = TOKEN_NORMAL; 01956 01957 // End of hex data - is it a legal or illegal end? 01958 if ((i % 2) != 0) 01959 { 01960 // Not an even number of hex digits => illegal 01961 return FALSE; 01962 } 01963 01964 // Inform caller if the hex data was terminated legally 01965 // (i.e. delimiter, whitespace or line-end) 01966 01967 return (IsDelim() || IsWhitespace() || (Ch == 0)); 01968 }
|
|
Finds the value of the specified parameter.
For example, if we search for the value of "ALT" and the buffer contains the text
Then the string returned will be Alt This is done as follows: first we make up a composite string consisting of the strParameterName with an equals sign on the end (e.g. "ALT="). Then we search for that string. If we find it: Then if the character after that string is a ", we return everything up until the next ". Otherwise, if the character after that string is not a ", we return everything up until the next > or whitespace character.
Definition at line 1696 of file ccfile.cpp. 01697 { 01698 //Create an empty string to return 01699 String_256 strToReturn; 01700 01701 //And, if there's less than two characters in the buffer, 01702 //return now 01703 if (m_iCharsInHTMLBuffer<2) 01704 return strToReturn; 01705 01706 //So search for the string we have been told to search for 01707 PCSTR pcFound; 01708 #if 0 != wxUSE_UNICODE 01709 { 01710 size_t cchParam = camWcstombs( NULL, (const TCHAR *)strParameterName, 0 ) + 1; 01711 PSTR pszParam = PSTR( alloca( cchParam ) ); 01712 camWcstombs( pszParam, (const TCHAR *)strParameterName, cchParam ); 01713 pcFound = FindStringWithoutCase( m_pcHTMLBuffer, pszParam ); 01714 } 01715 #else 01716 pcFound = FindStringWithoutCase( m_pcHTMLBuffer, (const TCHAR *)strParameterName ); 01717 #endif 01718 01719 //Have we found something? 01720 if (pcFound) 01721 { 01722 //Yes. So move our found pointer past the string we've found 01723 // pcFound = camStrninc( pcFound, strParameterName.Length() ); 01724 pcFound = pcFound + strParameterName.Length(); 01725 01726 //We're now pointing to the character after the parameter name 01727 01728 //If we're pointing to a whitespace character, move forward 01729 //to the first character that's not whitespace 01730 while (*pcFound!='\0' && IsWhitespace(*pcFound)) 01731 { 01732 // pcFound = camStrinc( pcFound ); 01733 pcFound++; 01734 } 01735 01736 //If that character is not an equals, return an empty string 01737 if (*pcFound!='=') 01738 return strToReturn; 01739 01740 //And, again, move our found pointer to the next character that's not a space 01741 do 01742 { 01743 // pcFound = camStrinc( pcFound ); 01744 pcFound++; 01745 } 01746 while (*pcFound!='\0' && IsWhitespace(*pcFound)); 01747 01748 //Note that there is no way that pcFound can 01749 //have been advanced past the end of the buffer. 01750 //It may, possibly, have been advanced to the NULL 01751 //character at the end of the buffer, and I'll handle that... 01752 01753 //Now, this variable will tell us whether the parameter value 01754 //is in quotes. Set to FALSE by default. 01755 BOOL fInQuotes=FALSE; 01756 01757 //And is the parameter value in quotes? 01758 if (*pcFound=='\"') 01759 { 01760 //Yes. So make a note of that 01761 fInQuotes=TRUE; 01762 01763 //And move on to the character after the quotes 01764 // pcFound=camStrinc(pcFound); 01765 pcFound++; 01766 } 01767 01768 //Now, copy everything from the character we are pointing at 01769 //until... 01770 //IF the parameter value is in quotes, until the next " 01771 //OTHERWISE, until the next > or whitespace character 01772 while (*pcFound!='\0' && 01773 ((fInQuotes && *pcFound!='\"') 01774 || (!fInQuotes && (!IsWhitespace(*pcFound) && *pcFound!='>')))) 01775 { 01776 strToReturn+=*pcFound; 01777 // pcFound=camStrinc(pcFound); 01778 pcFound++; 01779 } 01780 } 01781 01782 //And correct the case if necessary 01783 if (fCorrectCase) 01784 strToReturn.toUpper(); 01785 01786 //And return the string 01787 //Note that if nothing was found, this string will be empty 01788 return strToReturn; 01789 }
|
|
This function reads the name of the tag in the buffer.
Also, if the token in the buffer is not a tag, this function returns FALSE.
Definition at line 1604 of file ccfile.cpp. 01605 { 01606 //Create an empty string to return 01607 String_256 strToReturn = ""; 01608 01609 //If this isn't a tag, return that empty string 01610 if (!IsHTMLTag()) 01611 return strToReturn; 01612 01613 //And, if there's less than two characters in the buffer, 01614 //return now 01615 if (m_iCharsInHTMLBuffer<2) 01616 return strToReturn; 01617 01618 //Otherwise, start copying from the buffer to the string 01619 //Start with the second character (remember that the second 01620 //character is m_pcHTMLBuffer[1]) and finish 01621 //when either we find a whitespace character or we go 01622 //past the end of the string 01623 INT32 i; 01624 for( i = 1; 01625 ( i < 255 && 01626 i < ( m_iCharsInHTMLBuffer - 1 ) && 01627 !IsWhitespace( m_pcHTMLBuffer[i] ) && 01628 m_pcHTMLBuffer[i] != _T('>') ); 01629 i++ ) 01630 { 01631 strToReturn+=m_pcHTMLBuffer[i]; 01632 } 01633 01634 // test for overflow 01635 if (i == 255) 01636 return ""; 01637 01638 //Correct the string to upper case 01639 strToReturn.toUpper(); 01640 01641 //And return our string 01642 return strToReturn; 01643 01644 }
|
|
Get the next HTML token from the file.
If the next character read is not a <, then the token will contain all the characters until the next <. So the token you get back from this function will either be an HTML tag:
Or some text: This is some text Note that we read into a variable length buffer. This is because HTML tokens could be any length at all. So, as soon as you have called GetHTMLToken, you should call GetHTMLTokenBuffer to get a pointer to the string that has just been read.
Definition at line 1427 of file ccfile.cpp. 01428 { 01429 //First delete our buffer 01430 DeleteHTMLBuffer(); 01431 01432 //Now, are we reading a tag or some text? 01433 //It depends on whether the first character we read is a < 01434 m_fIsTag=(PeekNextHTMLChar()=='<'); 01435 01436 //Create a two-character buffer to read characters into 01437 //(Why is it a buffer, not a single character? Because the 01438 //function _tcsupr needs to be passed a null-terminated 01439 //string rather than a single character) 01440 TCHAR pcReadCharacter[2]; 01441 01442 01443 //But when does the HTML token end? If the token is a tag, it ends after the 01444 //next > is read. If the token is text, it ends just *before* the next < 01445 //is read. This condition is contained in the following while statement... 01446 while (!eof() 01447 && !(m_fIsTag && pcReadCharacter[0]=='>') 01448 && !(!m_fIsTag && PeekNextHTMLChar()=='<')) 01449 { 01450 //Read in a character 01451 pcReadCharacter[0]=ReadNextHTMLChar(); 01452 01453 //And null terminate the string 01454 pcReadCharacter[1]='\0'; 01455 01456 //Correct the string to uppercase if necessary 01457 if( fCorrectCase ) 01458 CharUpper( pcReadCharacter ); 01459 01460 //If we just reached the end of the file, break 01461 if (eof()) 01462 break; 01463 01464 //Now, are we ignoring end of line characters? 01465 //And is the character we've read an end of line character? 01466 if (!(fIgnoreEOL && (pcReadCharacter[0]=='\r' || pcReadCharacter[0]=='\n'))) 01467 { 01468 //No. So add the character to the buffer 01469 AddToHTMLBuffer(pcReadCharacter[0]); 01470 } 01471 } 01472 01473 //If we reached the end of the file 01474 if (eof()) 01475 { 01476 //Then set our member flag 01477 m_fEndOfHTMLFile=TRUE; 01478 } 01479 01480 //And NULL terminate the buffer 01481 AddToHTMLBuffer('\0'); 01482 01483 //And return TRUE 01484 return TRUE; 01485 }
|
|
Definition at line 400 of file ccfile.h. 00401 { 00402 return m_pcHTMLBuffer; 00403 }
|
|
Read in the next line from the file - used when extracted tokens with CCLexFile::GetToken(). Handles the current line count, number of chars read count, etc.
Definition at line 2168 of file ccfile.cpp. 02169 { 02170 // End of file? 02171 if (eof()) 02172 { 02173 EOFFound = TRUE; 02174 return; 02175 } 02176 02177 if(SeekingRequired) 02178 { 02179 // It would be much better to do it via variables rather than doing a tellin() 02180 //LastLinePos += CharsRead; //tellIn(); 02181 //FilePos StartLastLinePos = LastLinePos; 02182 02183 /* static FilePos LastLinePosTmp = 0; 02184 static BOOL LastLinePosBool = FALSE; 02185 02186 if(LastLinePosTmp == 0) 02187 { 02188 LastLinePosTmp = LastLinePos + Ofs - 1 + 2; 02189 LastLinePosBool = FALSE; 02190 } 02191 else 02192 { 02193 LastLinePosTmp += Ofs + 1 + 2; 02194 if(LastLinePosBool == FALSE) 02195 { 02196 LastLinePosTmp -= 2; 02197 LastLinePosBool = TRUE; 02198 } 02199 } 02200 02201 LastLinePos = LastLinePosTmp;*/ 02202 02203 //LastLinePos += Ofs + 1; 02204 02205 // Store position of start of line 02206 LexerInitialised = FALSE; 02207 LastLinePos = tellIn(); 02208 LexerInitialised = TRUE; 02209 } 02210 else 02211 { 02212 LastLinePos = 0; 02213 } 02214 02215 // Get a new line from the file. 02216 read( LineBuf ); 02217 Line++; 02218 Ofs = 0; 02219 }
|
|
Definition at line 388 of file ccfile.h. 00388 { return Buf; }
|
|
Definition at line 357 of file ccfile.h. 00357 { return Line; }
|
|
Read in a token in a line-based manner. If the current input position is in the middle of a line, the data up until the end of the line is read. If at the start of a line, the whole line is returned. The token is still examined to work out what it is, but obviously it may not match any proper construct, in which case the type of the token is set to TOKEN_LINE. (NB this is the only time that TOKEN_LINE is used).
Definition at line 2008 of file ccfile.cpp. 02009 { 02010 // Check to see if the client has put a token back onto the input stream 02011 if (TokenIsCached) 02012 { 02013 TokenIsCached = FALSE; 02014 02015 if (Ofs == 1) 02016 { 02017 // We're at the start of the line, so the last token must have been the 02018 // wholeof the rest of the line, so just return it. 02019 return TRUE; 02020 } 02021 02022 // Otherwise, we should append the rest of the current line to the token, 02023 // and work out the token type again. 02024 camStrcpy( TokenBuf + camStrlen( TokenBuf ), Buf + Ofs - 1 ); 02025 02026 if (TokenBuf[0] == CommentMarker) 02027 { 02028 TokenType = TOKEN_COMMENT; 02029 } 02030 else 02031 { 02032 TokenType = TOKEN_LINE; 02033 } 02034 02035 GetLine(); 02036 GetCh(); 02037 02038 // We have a token 02039 return TRUE; 02040 } 02041 02042 // No effect if at EOF 02043 if (EOFFound) 02044 { 02045 TokenType = TOKEN_EOF; 02046 TokenBuf[0] = 0; 02047 return TRUE; 02048 } 02049 02050 // Ignore white space until we get something useful 02051 while (!EOFFound && IsWhitespace()) 02052 GetCh(); 02053 02054 if (EOFFound) 02055 { 02056 TokenType = TOKEN_EOF; 02057 TokenBuf[0] = 0; 02058 return TRUE; 02059 } 02060 02061 // Check for line breaks 02062 if (Ch == 0) 02063 { 02064 GetCh(); 02065 TokenType = TOKEN_EOL; 02066 return TRUE; 02067 } 02068 02069 // Otherwise, just get the rest of the line... 02070 camStrcpy( TokenBuf, Buf + Ofs - 1 ); 02071 02072 GetLine(); 02073 GetCh(); 02074 02075 if (TokenBuf[0] == CommentMarker) 02076 { 02077 TokenType = TOKEN_COMMENT; 02078 } 02079 else 02080 { 02081 TokenType = TOKEN_LINE; 02082 } 02083 02084 // Token extracted ok 02085 return TRUE; 02086 }
|
|
A simplified interface onto GetToken(). The caller is not informed of token types TOKEN_COMMENT and TOKEN_EOL. The function returns TRUE if a TOKEN_NORMAL or TOKEN_STRING is read. FALSE is returned if TOKEN_EOF is read, or if FALSE is returned by GetToken().
Definition at line 1073 of file ccfile.cpp. 01074 { 01075 BOOL ok; 01076 LexTokenType TokenType; 01077 01078 do 01079 { 01080 do 01081 { 01082 ok = GetToken(); 01083 } while (DelimiterFound); 01084 01085 TokenType = GetTokenType(); 01086 } while (ok && (TokenType != TOKEN_NORMAL) && 01087 (TokenType != TOKEN_STRING) && 01088 (TokenType != TOKEN_EOF)); 01089 01090 return (ok && (TokenType != TOKEN_EOF)); 01091 }
|
|
Extracts a string token from the input stream, as defined by the currently installed string delimiters.
Definition at line 2102 of file ccfile.cpp. 02103 { 02104 INT32 i = 0; 02105 02106 TokenType = TOKEN_STRING; 02107 02108 // Extract the string. 02109 // The first time through the loop causes the 'open string' character character to be 02110 // discarded. 02111 do 02112 { 02113 GetCh(); 02114 TokenBuf[i++] = Ch; 02115 if (!IgnoreStringEscapeCodes) 02116 { 02117 // We are NOT ignoring '\' characters, so have we found one? 02118 if (Ch == '\\') 02119 { 02120 // Backslash character - ignore it and insert the next character in its place. 02121 GetCh(); 02122 if(Ch == 'r') 02123 { 02124 TokenBuf[i-1] = '\r'; 02125 Ch = 32; 02126 } 02127 else 02128 { 02129 TokenBuf[i-1] = Ch; 02130 02131 // Set Ch to 32, in case this is a string delimiter - it's been escaped (by 02132 // a backslash) so we don't want it to terminate the loop. 02133 // (Space (ASCII 32) is a 'safe' value - 0 is out as it causes GetCh() to 02134 // read in the next line. 02135 Ch = 32; 02136 } 02137 } 02138 } 02139 02140 // If string is too long to fit into our buffer, this is classed as an error 02141 ERRORIF(i >= TokenBufSize, _R(IDT_LEX_STRINGTOOLONG), FALSE); 02142 02143 } while (!EOFFound && (Ch != StringDelimiters[1])); 02144 02145 // Null terminate the token string 02146 TokenBuf[i-1] = 0; 02147 02148 // Discard the 'close string' character 02149 GetCh(); 02150 02151 // String extracted ok 02152 return TRUE; 02153 }
|
|
Extract the next lexical token from the input stream of this file. Tokens are separated by whitespace, delimiters, line-breaks, or any combination of the three. Delimiters and line-breaks are returned as tokens (and may be ignored if desired); whitespace is never returned. Call CCLexFile::GetTokenType() to find out what kind of token was extracted. You can examine the token directly by looking at the buffer returned by CCLexFile::GetTokenBuf() (this is a const buffer so you may examine it, but not alter it). This buffer address will not change unless CCLexFile::InitLexer is called again, so it is safe to cache the return value of CCLexFile::GetTokenBuf.
Definition at line 1124 of file ccfile.cpp. 01125 { 01126 // Check to see if the client has put a token back onto the input stream 01127 if (TokenIsCached) 01128 { 01129 // Yes - just return success to the caller and they will use the current token 01130 // again. 01131 TokenIsCached = FALSE; 01132 return TRUE; 01133 } 01134 01135 DelimiterFound = FALSE; 01136 01137 // No effect if at EOF 01138 if (EOFFound) 01139 { 01140 TokenType = TOKEN_EOF; 01141 TokenBuf[0] = 0; 01142 return TRUE; 01143 } 01144 01145 // Ignore white space until we get something useful 01146 while (!EOFFound && IsWhitespace()) 01147 GetCh(); 01148 01149 if (EOFFound) 01150 { 01151 TokenType = TOKEN_EOF; 01152 TokenBuf[0] = 0; 01153 return TRUE; 01154 } 01155 01156 // Check for line breaks 01157 if (Ch == 0) 01158 { 01159 GetCh(); 01160 TokenType = TOKEN_EOL; 01161 return TRUE; 01162 } 01163 else if (Ch == StringDelimiters[0]) 01164 { 01165 // It's a string 01166 return GetStringToken(); 01167 } 01168 else if (Ch == CommentMarker) 01169 { 01170 // It's a comment - ignore the rest of this line 01171 camStrcpy( TokenBuf, Buf + Ofs - 1 ); 01172 TokenType = TOKEN_COMMENT; 01173 01174 GetLine(); 01175 GetCh(); 01176 } 01177 // Extract the next token 01178 else 01179 { 01180 // Read until delimiter or white-space, and analyse result. 01181 // NB. 0-terminator ends a token => tokens cannot be split over a line. 01182 INT32 i = 0; 01183 01184 if (IsDelim()) 01185 { 01186 // Found a delimiter token - just return it 01187 TokenBuf[i++] = Ch; 01188 GetCh(); 01189 DelimiterFound = TRUE; 01190 } 01191 else 01192 { 01193 do 01194 { 01195 TokenBuf[i++] = Ch; 01196 GetCh(); 01197 } while (!EOFFound && !IsDelim() && !IsWhitespace() && (Ch != 0)); 01198 } 01199 01200 // Terminate the token 01201 TokenBuf[i] = 0; 01202 TokenType = TOKEN_NORMAL; 01203 } 01204 01205 // Token extracted ok 01206 return TRUE; 01207 }
|
|
Definition at line 361 of file ccfile.h. 00361 { return TokenBuf; }
|
|
Definition at line 360 of file ccfile.h. 00360 { return TokenType; }
|
|
Increases the number of spaces written by PutNewLine() at the start of the next line. It is increased by IndentDelta, which can be altered using SetIndentDelta().
Definition at line 2495 of file ccfile.cpp. 02496 { 02497 IndentSpaces += IndentDelta; 02498 }
|
|
Initialises all the variables used in the HTML parsing code.
Definition at line 1284 of file ccfile.cpp. 01285 { 01286 //Initialise the "character waiting" variables 01287 m_cWaitingCharacter=0; 01288 m_fIsCharacterWaiting=FALSE; 01289 01290 //And delete the buffer 01291 DeleteHTMLBuffer(); 01292 01293 //And say that we haven't reached the end of the file 01294 m_fEndOfHTMLFile=FALSE; 01295 01296 //And say that the current token is not a tag 01297 m_fIsTag=FALSE; 01298 01299 }
|
|
Prepare the file object for performing lexical analysis on the input stream. This allocates the buffers and sets the various character sets to the default values: Whitespace: spaces and tabs Delimiters: none Comment: # String: Delimited by " and " (i.e. like C) Also resets the newline indent output used by PutNewLine() to 0;.
Definition at line 873 of file ccfile.cpp. 00874 { 00875 // If we've already been here before, get out of town. 00876 if (LexerInitialised) return TRUE; 00877 00878 // Allocate buffer for reading in each line 00879 LineBuf = new String_256; 00880 if (LineBuf == NULL) 00881 return FALSE; 00882 00883 // Allocate space for storing tokens while decoding a line 00884 TokenBuf = new TCHAR[TokenBufSize]; 00885 if (TokenBuf == NULL) 00886 { 00887 delete LineBuf; 00888 LineBuf = NULL; 00889 return FALSE; 00890 } 00891 00892 // Get ready to parse tokens. 00893 TokenBuf[0] = 0; 00894 Ofs = 0; 00895 Ch = 0; 00896 Line = 0; 00897 00898 // Make buffer pointer point to line buffer. 00899 Buf = (TCHAR *) (*LineBuf); 00900 00901 CharsRead = 0; 00902 LastLinePos = (FilePos) 0; 00903 EOFFound = FALSE; 00904 00905 // Set character classes to reasonable defaults. 00906 WhitespaceChars = " \t"; 00907 DelimiterChars = NULL; 00908 CommentMarker = '#'; 00909 StringDelimiters = "\"\""; 00910 00911 // reset the current indent count and delta value 00912 IndentSpaces = 0; 00913 IndentDelta = 4; 00914 00915 // If true then seeking and tellin lex files will work - slows things down a bit 00916 SeekingRequired = IsSeekingRequired; 00917 00918 // Ignore '\' characters in string tokens? 00919 IgnoreStringEscapeCodes = DoIgnoreStringEscapeCodes; 00920 00921 InitHTMLLexer(); 00922 00923 LexerInitialised = TRUE; 00924 00925 return TRUE; 00926 }
|
|
Test the current character (as read in by CCLexFile::GetCh) to see if it is a delimiter.
Definition at line 2270 of file ccfile.cpp. 02271 { 02272 if (DelimiterChars == NULL) 02273 return FALSE; 02274 02275 return( strchr( DelimiterChars, Ch ) != NULL); 02276 }
|
|
Definition at line 410 of file ccfile.h. 00411 { 00412 return m_fEndOfHTMLFile; 00413 }
|
|
Definition at line 405 of file ccfile.h. 00406 { 00407 return m_fIsTag; 00408 }
|
|
Definition at line 362 of file ccfile.h. 00362 { return LexerInitialised; }
|
|
Test the current character (as read in by CCLexFile::GetCh) to see if it is a white-space character.
Definition at line 2292 of file ccfile.cpp. 02293 { 02294 // 0-terminators are not whitespace 02295 if (Ch == 0) 02296 return FALSE; 02297 02298 return( strchr( WhitespaceChars, Ch ) != NULL ); 02299 }
|
|
Tests the character passed to see if it is a white-space character.
Definition at line 2316 of file ccfile.cpp. 02317 { 02318 // 0-terminators are not whitespace 02319 if (cToTest == 0) 02320 return FALSE; 02321 02322 return( strchr( WhitespaceChars, cToTest ) != NULL ); 02323 }
|
|
In theory, this function finds out what the next character that will be read from the file will be.
So this is what we do. We read the character, but we put it into our member variable m_cWaitingCharacter. We also set our member flag m_fIsCharacterWaiting to TRUE. Then the function ReadNextHTMLCharacter will know there is a character waiting to be 'read' in m_cWaitingCharacter, and will return that character instead of actually reading one from the file. Not perfect but it works.
Definition at line 1516 of file ccfile.cpp. 01517 { 01518 //Is there a character waiting in m_cWaitingCharacter? 01519 if (!m_fIsCharacterWaiting) 01520 { 01521 //No. So read the next character into our "Waiting Character" buffer 01522 //We use "get" because "read" will throw an exception 01523 //if you try to read past the end of the file. 01524 get(m_cWaitingCharacter); 01525 01526 //And remember there's a character waiting 01527 m_fIsCharacterWaiting=TRUE; 01528 } 01529 01530 //Now return the waiting character 01531 return m_cWaitingCharacter; 01532 01533 }
|
|
Outputs a new line, followed by a number of spaces. The number of spaces output can be changed using IncIndent() and DecIndent(); InitLexer resets the number of indent spaces to 0.
Definition at line 2469 of file ccfile.cpp. 02470 { 02471 write("\n"); 02472 for (INT32 n=0; n < IndentSpaces;n++) 02473 write(" "); 02474 02475 return (good()); 02476 }
|
|
Outputs the string to the file enclosed in the string delimiting chars. (see CCLexFile::SetStringDelimiters for more details) Writes out 'length' chars of str, then finishes off by writing 'Sep' out as a bunch of Sep characters.
Definition at line 2347 of file ccfile.cpp. 02348 { 02349 write( StringDelimiters[0] ); 02350 write( str, length ); 02351 write( StringDelimiters[1] ); 02352 if( strlen( pszSep ) > 0 ) 02353 write( pszSep, (UINT32)strlen( pszSep ) ); 02354 02355 return (good()); 02356 }
|
|
Outputs the number to the file. Writes out 'n', then finishes off by writing 'Sep' out as a bunch of Sep characters.
Definition at line 2439 of file ccfile.cpp. 02440 { 02441 char buf[256]; 02442 _snprintf( buf, 256, "%d", n ); 02443 02444 UINT32 length = strlen( buf ); 02445 write( buf, length ); 02446 if( strlen( Sep ) > 0 ) 02447 write( Sep, (UINT32)strlen(Sep) ); 02448 02449 return( good() ); 02450 }
|
|
Outputs the string to the file. Writes out 'length' chars of str, then finishes off by writing 'Sep' out as a bunch of Sep characters.
Definition at line 2404 of file ccfile.cpp. 02405 { 02406 #if 0 != wxUSE_UNICODE 02407 size_t cch = camWcstombs( NULL, (const TCHAR *)buf, 0 ) + 1; 02408 PSTR psz = PSTR( alloca( cch ) ); 02409 camWcstombs( psz, (const TCHAR *)buf, cch ); 02410 write( psz, cch ); 02411 #else 02412 UINT32 length = camStrlen( buf ); 02413 write( buf, length ); 02414 #endif 02415 if( strlen( Sep ) > 0 ) 02416 write( Sep, (UINT32)strlen( Sep ) ); 02417 02418 return (good()); 02419 }
|
|
Outputs the string to the file. Writes out 'length' chars of str, then finishes off by writing 'Sep' out as a bunch of Sep characters.
Definition at line 2377 of file ccfile.cpp. 02378 { 02379 write( str, length ); 02380 if( strlen( pszSep ) > 0 ) 02381 write( pszSep, (UINT32)strlen( pszSep ) ); 02382 02383 return( good() ); 02384 }
|
|
This function reads the next character from the file. Well, actually, the way it works is slightly more complicated.
See PeekNextHTMLChar for a better explanation.
Definition at line 1557 of file ccfile.cpp. 01558 { 01559 //Do we have a character waiting? 01560 if (m_fIsCharacterWaiting) 01561 { 01562 //Yes. So remember we no longer have a character waiting 01563 m_fIsCharacterWaiting=FALSE; 01564 01565 //And return our waiting character 01566 return m_cWaitingCharacter; 01567 01568 } 01569 else 01570 { 01571 //No. So simply read a new character from the file 01572 //We use "get" because "read" will throw an exception 01573 //if you try to read past the end of the file. 01574 char cToRead; 01575 get( cToRead ); 01576 01577 return cToRead; 01578 } 01579 }
|
|
Change the character used to denote a comment token. Comments start from the comment character until the end of the line. So, for example, to set the lexer up for PostScript style comments, pass in '' as the comment marker; for assembler style comments, pass in ';'. Multi-line comments are not supported.
Definition at line 1027 of file ccfile.cpp. 01028 { 01029 CommentMarker = ch; 01030 }
|
|
Change the set of characters that are treated as 'delimiters'. A delimiter character indicates that a token has finished. A delimiter character *will* be extracted by CCLexFile::GetToken, and returned on its own. For example, for the line "hello; there ();", processed with the set of delimiters ";()", you would get the following tokens out of CCLexFile::GetToken: MonoOn "hello" ";" "there" "(" ")" ";" MonoOff.
Definition at line 1006 of file ccfile.cpp. 01007 { 01008 DelimiterChars = Str; 01009 }
|
|
Reimplemented in CCAsynchDiskFile. Definition at line 393 of file ccfile.h. 00393 { DontFail=state;};
|
|
Definition at line 375 of file ccfile.h. 00375 { IndentDelta = d; }
|
|
Change the characters used to delimit a string. The lexer will handle strings as a special token type (see CCLexFile::GetToken). You can specify which characters delimit the strings - you pass in a two-character string; the first character is used to start the string,and the second is used to terminate it. The two characters can be the same. So, for example, for PostScript strings, use "()", or for C-style strings, use "\"\"".
Definition at line 1051 of file ccfile.cpp. 01052 { 01053 StringDelimiters = Str; 01054 }
|
|
This function changes the set of characters that the lexical analyser treats as white space. Note that it defaults to tabs and spaces, but you can specify any set of characters you want - for example you could even pass in " tf" and it would treat spaces, lower case t's and lower case f's as whitespace characters. Tokens extracted by CCLexFile::GetToken never contain any whitespace characters.
Definition at line 974 of file ccfile.cpp. 00975 { 00976 WhitespaceChars = Str; 00977 }
|
|
Put the current token back onto the input stream - i.e. the next time GetToken() is called it will return the token it returned last time. This is not nestable - you can only put back one token at once.
Definition at line 1984 of file ccfile.cpp. 01985 { 01986 TokenIsCached = TRUE; 01987 }
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|