Hi,
I had the same issue while parsing something else with StreamTokenizer, and I found your issue when I was searching for solutions. I couldn't find one, so I cooked up my own and thought you might be interested in applying it to your problem as well.

Basically, I needed to parse strings enclosed in double quotes. StreamTokenizer does this for you, but it ends the string at the first newline, so a string that spans lines is broken up. So instead of letting StreamTokenizer do the string parsing, I tell it that the double quote is not special. When I reach one, I reconfigure the tokenizer into my own "string mode", where the only special chars are the double quote and the backslash. When I reach the end of the string, I switch back to my normal tokenizer config (for a format called fvar). Here are the methods:
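import java.io.IOException;
import java.io.StreamTokenizer;

// UNDER_SCORE, SPACE, DOUBLE_QUOTE and ESCAPE are int constants defined elsewhere
// (assumed here to be '_', ' ', '"' and '\\')
private void setUpTokenizerForFvar(StreamTokenizer tokenizer)
{
    // Set up the tokenizer like a new one (see the StreamTokenizer constructor docs),
    // except that '"' and '\'' are not registered as quote chars
    tokenizer.resetSyntax();
    tokenizer.wordChars((int)'a', (int)'z');
    tokenizer.wordChars((int)'A', (int)'Z');
    tokenizer.wordChars(128 + 32, 255);
    tokenizer.whitespaceChars(0, (int)' ');
    tokenizer.commentChar((int)'/');
    tokenizer.parseNumbers();
    // Attribute names in fvar can include underscores, and spaces!
    tokenizer.wordChars(UNDER_SCORE, UNDER_SCORE);
    tokenizer.wordChars(SPACE, SPACE);
    // Double quotes are ordinary chars here; parseQuotedString handles them
    tokenizer.ordinaryChar(DOUBLE_QUOTE);
}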
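private void setUpTokenizerForQuotedValue(StreamTokenizer tokenizer)
{
    // Reset the tokenizer to treat everything as a word char except the double quote
    // and the escape char. Newlines are word chars here, so they stay inside the string.
    // The range goes up to 255 so that chars 128-255 are not returned as ordinary
    // tokens (which parseQuotedString would silently drop).
    tokenizer.resetSyntax();
    tokenizer.wordChars(0, 255);
    tokenizer.ordinaryChar(ESCAPE);
    tokenizer.ordinaryChar(DOUBLE_QUOTE);
}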
// Because StreamTokenizer ends a quoted string at the first newline, we have to
// parse quoted strings ourselves. Reads everything up to the matching closing quote,
// ignoring any quote that is preceded by the escape char '\'.
// The escape char itself is kept in the returned value.
private String parseQuotedString(int openQuote, StreamTokenizer tokenizer) throws IOException {
    StringBuilder value = new StringBuilder();
    setUpTokenizerForQuotedValue(tokenizer);
    int nextToken = tokenizer.nextToken();
    boolean escaped = false; // true right after an unescaped '\'
    while (escaped || nextToken != openQuote) {
        if (nextToken == StreamTokenizer.TT_EOF)
        {
            // Without this check an unterminated string would loop forever
            throw new IOException("Unterminated quoted string");
        }
        boolean wasEscaped = escaped;
        escaped = false;
        if (nextToken == StreamTokenizer.TT_WORD)
        {
            value.append(tokenizer.sval);
        }
        else if (nextToken == ESCAPE)
        {
            value.append((char)nextToken);
            // "\\" is an escaped backslash, so it must not escape the following char
            escaped = !wasEscaped;
        }
        else if (nextToken == openQuote)
        {
            // Only reached for an escaped quote
            value.append((char)nextToken);
        }
        nextToken = tokenizer.nextToken();
    }
    setUpTokenizerForFvar(tokenizer);
    return value.toString();
}
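The limitation these methods work around can be reproduced with a tokenizer built by the default constructor, which registers '"' as a quote char. In the JDK implementation a quoted string ends at the closing quote or at a newline, whichever comes first, so the two-line string below comes back as three tokens instead of one. This is only an illustrative sketch; the class name QuoteNewlineDemo and the token labels are made up for the example.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class QuoteNewlineDemo {

    // Tokenizes the input with a default-configured StreamTokenizer
    // and returns a readable label for every token.
    public static List<String> describeTokens(String input) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(input));
        List<String> tokens = new ArrayList<>();
        int type;
        while ((type = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
            if (type == '"' || type == '\'') {
                // Built-in quote handling: sval holds the string body
                tokens.add("QUOTED[" + tokenizer.sval + "]");
            } else if (type == StreamTokenizer.TT_WORD) {
                tokens.add("WORD[" + tokenizer.sval + "]");
            } else {
                tokens.add("OTHER[" + type + "]");
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // One quoted value that spans two lines
        System.out.println(describeTokens("\"ab\ncd\""));
        // prints [QUOTED[ab], WORD[cd], QUOTED[]]: the newline ended the first string
    }
}
```

On a single line the same tokenizer returns the whole quoted value as one QUOTED token, which is why the problem only shows up with multi-line values.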
parseQuotedString is called from the surrounding parsing code like this:
[...]
nextToken = tokenizer.nextToken();
if (nextToken == DOUBLE_QUOTE) {
    String value = parseQuotedString(nextToken, tokenizer);
    [...]
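For anyone who wants to try the idea in isolation, here is a compact, self-contained sketch of the same "string mode" switch. It is not the original code: the class and method names (QuotedModeSketch, normalMode, stringMode, readQuoted, tokenize) are hypothetical, normalMode is a simplified stand-in for the fvar configuration, and DOUBLE_QUOTE and ESCAPE are assumed to be '"' and '\\'. As far as I can tell from the OpenJDK source, swapping syntax tables mid-stream is safe because the tokenizer holds at most one raw look-ahead char and classifies it with whichever table is active on the next nextToken() call.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class QuotedModeSketch {
    // Assumed constant values (hypothetical, mirroring the names used above)
    private static final int DOUBLE_QUOTE = '"';
    private static final int ESCAPE = '\\';

    // Simplified "normal" mode: words, numbers, whitespace; '"' is ordinary
    static void normalMode(StreamTokenizer t) {
        t.resetSyntax();
        t.wordChars('a', 'z');
        t.wordChars('A', 'Z');
        t.whitespaceChars(0, ' ');
        t.parseNumbers();
        t.ordinaryChar(DOUBLE_QUOTE); // quotes are handled by readQuoted
    }

    // "String mode": everything is a word char (newlines included)
    // except the double quote and the escape char
    static void stringMode(StreamTokenizer t) {
        t.resetSyntax();
        t.wordChars(0, 255);
        t.ordinaryChar(DOUBLE_QUOTE);
        t.ordinaryChar(ESCAPE);
    }

    // Reads up to the closing quote; escape chars and escaped quotes are kept verbatim
    static String readQuoted(StreamTokenizer t) throws IOException {
        StringBuilder value = new StringBuilder();
        stringMode(t);
        boolean escaped = false;
        int tok = t.nextToken();
        while (escaped || tok != DOUBLE_QUOTE) {
            if (tok == StreamTokenizer.TT_EOF) {
                throw new IOException("Unterminated quoted string");
            }
            boolean wasEscaped = escaped;
            escaped = false;
            if (tok == StreamTokenizer.TT_WORD) {
                value.append(t.sval);
            } else {
                // Only '"' (escaped) or '\' can reach this branch in string mode
                value.append((char) tok);
                escaped = tok == ESCAPE && !wasEscaped;
            }
            tok = t.nextToken();
        }
        normalMode(t); // switch back before the caller reads the next token
        return value.toString();
    }

    // Labels every token so the mode switches are easy to check
    public static List<String> tokenize(String input) throws IOException {
        StreamTokenizer t = new StreamTokenizer(new StringReader(input));
        normalMode(t);
        List<String> out = new ArrayList<>();
        int tok;
        while ((tok = t.nextToken()) != StreamTokenizer.TT_EOF) {
            if (tok == DOUBLE_QUOTE) {
                out.add("STRING[" + readQuoted(t) + "]");
            } else if (tok == StreamTokenizer.TT_WORD) {
                out.add("WORD[" + t.sval + "]");
            } else if (tok == StreamTokenizer.TT_NUMBER) {
                out.add("NUMBER[" + t.nval + "]");
            } else {
                out.add("CHAR[" + (char) tok + "]");
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // The quoted value keeps its embedded newline as a single STRING token
        System.out.println(tokenize("name \"first\nsecond\" 42"));
    }
}
```

With this input the tokenizer yields WORD[name], then one STRING token containing the newline, then NUMBER[42.0]. Note that, like the original, the sketch keeps escape sequences raw rather than translating them (a '\' followed by 'n' stays as two chars), unlike StreamTokenizer's built-in quote handling.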
Hope that is of some value to you.
Assigned to the 1.6 release, as this is a critical regression that was introduced recently. Whoever fixes this, please add a unit test to ensure the problem doesn't return unnoticed.