[QDOX-82] multiline'd tag attribute values not working anymore - jira.codehaus.org

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.10
Component/s: Parser
Labels:
None

Number of attachments :
2

Description

Some undefined time ago, it was possible to parse the following source
with qdox and retrieve a sensible value for the "foo" attribute of the
"bar.baz" tag -> "this is multilined"

/**

@bar.baz foo="this is
multilined"
*/

with the latest snapshot, this unfortunately doesn't work anymore.

I haven't found an open related jira issue, but before creating one, I
wanted to make sure this wasn't on purpose ?

I think allowing this makes sense, for instance for xdoclet because
some attributes might have longish values, like the "description"
elements for servlets and such.

Options
- Sort By Name
- Sort By Date
- Ascending
- Descending
- Download All

Attachments

MultineLineAttributeValuesWithQDoxTestCase.java

07/Feb/05 5:08 PM

1 kB

Grégory Joseph (old account)
qdox82-test.patch

07/Feb/05 5:31 PM

2 kB

Grégory Joseph (old account)

Activity

Hide

Permalink

Joe Walnes added a comment - 04/Feb/05 8:55 AM

Assigned to 1.6 release as this is a critical regression that has recently been introduced. Whoever fixes this, please add a unit-test to ensure the problem doesn't return unnoticed.

Show

Joe Walnes added a comment - 04/Feb/05 8:55 AM Assigned to 1.6 release as this is a critical regression that has recently been introduced. Whoever fixes this, please add a unit-test to ensure the problem doesn't return unnoticed.

Joe Walnes made changes - 04/Feb/05 8:55 AM

Field	Original Value	New Value
Fix Version/s		1.6 [ 10814 ]
Affects Version/s	1.6 [ 10814 ]

Hide

Permalink

Mike Williams added a comment - 06/Feb/05 5:19 AM

I don't think this is a regression. Up until fairly recently, QDox collapsed all whitespace inside comments, effectively discarding newlines. It has always been thus, until about a month ago, when Joe implemented feature-request (~~QDOX-78~~) ... now, newlines are retained inside comments, and tag-values. I just checked, and this still appears to be working in CVS HEAD.

Gregory, perhaps you switched from a qdox-1.6-SHAPSHOT.jar to a previous release? Or to a previous snapshot?

Show

Mike Williams added a comment - 06/Feb/05 5:19 AM I don't think this is a regression. Up until fairly recently, QDox collapsed all whitespace inside comments, effectively discarding newlines. It has always been thus, until about a month ago, when Joe implemented feature-request ( QDOX-78 ) ... now, newlines are retained inside comments, and tag-values. I just checked, and this still appears to be working in CVS HEAD. Gregory, perhaps you switched from a qdox-1.6-SHAPSHOT.jar to a previous release? Or to a previous snapshot?

Mike Williams made changes - 06/Feb/05 5:45 AM

Assignee

Mike Williams [ mdub ]

Hide

Permalink

Mike Williams added a comment - 06/Feb/05 5:47 AM

I couldn't reproduce the problem in CVS HEAD. I've added a unit-test (testJiraQdox82), to demonstrate newlines in tag values.

Show

Mike Williams added a comment - 06/Feb/05 5:47 AM I couldn't reproduce the problem in CVS HEAD. I've added a unit-test (testJiraQdox82), to demonstrate newlines in tag values.

Mike Williams made changes - 06/Feb/05 5:47 AM

Resolution		Cannot Reproduce [ 5 ]
Status	Open [ 1 ]	Resolved [ 5 ]

Hide

Permalink

Brian Topping added a comment - 06/Feb/05 6:53 AM

This was happening in CVS HEAD. More data to come.

Show

Brian Topping added a comment - 06/Feb/05 6:53 AM This was happening in CVS HEAD. More data to come.

Brian Topping made changes - 06/Feb/05 6:53 AM

Assignee	Mike Williams [ mdub ]
Status	Resolved [ 5 ]	Reopened [ 4 ]
Resolution	Cannot Reproduce [ 5 ]

Hide

Permalink

Brian Topping added a comment - 06/Feb/05 4:40 PM

Just spent most of the day tracking this down and getting things organized. Here's the regression:

Check out xdoclet-plugins project
Revert xdoclet-plugins/testapp-web/src/main/java/org/xdoclet/testapp/web/TimerFilter.java to version 1.1
Build xdoclet-plugin-web project

You'll get an error in the org.xdoclet.plugin.web.WebPluginTestCase test there. That's what we are looking to fix.

Thanks for looking at this.

-b

Show

Brian Topping added a comment - 06/Feb/05 4:40 PM Just spent most of the day tracking this down and getting things organized. Here's the regression: Check out xdoclet-plugins project Revert xdoclet-plugins/testapp-web/src/main/java/org/xdoclet/testapp/web/TimerFilter.java to version 1.1 Build xdoclet-plugin-web project You'll get an error in the org.xdoclet.plugin.web.WebPluginTestCase test there. That's what we are looking to fix. Thanks for looking at this. -b

Hide

Permalink

Grégory Joseph (old account) added a comment - 06/Feb/05 7:29 PM

Mike, the output i'm expecting from this is indeed with white lines collapsed, which is fine. But the testcase I had on xdoclet-plugins that Brian mentionned now gets only the first line. (Well, until I aligned my test source to use non-multilined). I'll try from cvs head again.

Show

Grégory Joseph (old account) added a comment - 06/Feb/05 7:29 PM Mike, the output i'm expecting from this is indeed with white lines collapsed, which is fine. But the testcase I had on xdoclet-plugins that Brian mentionned now gets only the first line. (Well, until I aligned my test source to use non-multilined). I'll try from cvs head again.

Hide

Permalink

Joe Walnes added a comment - 07/Feb/05 2:19 AM

Brian, to make our lives easier, could you attach a standalone testcase to demonstrate this please.

Thought to all... given the example above...

/**

@bar.baz foo="this is
multilined"
*/

... what would you expect the value that qdox parses?

"this is multilined"
or
"this is\n multilined"
or
"this is\n multilined"

The first one is more convenient, but does it lose potentially useful data (as was the case with ~~QDOX-78~~)?

Show

Joe Walnes added a comment - 07/Feb/05 2:19 AM Brian, to make our lives easier, could you attach a standalone testcase to demonstrate this please. Thought to all... given the example above... /** @bar.baz foo="this is multilined" */ ... what would you expect the value that qdox parses? "this is multilined" or "this is\n multilined" or "this is\n multilined" The first one is more convenient, but does it lose potentially useful data (as was the case with QDOX-78 )?

Hide

Permalink

Mike Williams added a comment - 07/Feb/05 5:18 AM

In answer to Joe's question, I'd expect:

"this is\nmultilined"

with the leading space trimmed.

More generally, I think QDox should retain ALL whitespace within comments, as in some cases it may be significant ... e.g. <pre> sections. Trouble is, this would be an incompatible change.

Show

Mike Williams added a comment - 07/Feb/05 5:18 AM In answer to Joe's question, I'd expect: "this is\nmultilined" with the leading space trimmed. More generally, I think QDox should retain ALL whitespace within comments, as in some cases it may be significant ... e.g. <pre> sections. Trouble is, this would be an incompatible change.

Hide

Permalink

Grégory Joseph (old account) added a comment - 07/Feb/05 10:49 AM

"this is multilined" would be good enough as far as i'm concerning, but got "this is" when i opened this issue. As I said earlier, I'll try again

Show

Grégory Joseph (old account) added a comment - 07/Feb/05 10:49 AM "this is multilined" would be good enough as far as i'm concerning, but got "this is" when i opened this issue. As I said earlier, I'll try again

Hide

Permalink

Grégory Joseph (old account) added a comment - 07/Feb/05 5:08 PM

Here's a sample testcase that doesn't pass with qdox-1.6-snapshot built out of cvs on my mac.
I'll let you guys make the path to the test source portable

Show

Grégory Joseph (old account) added a comment - 07/Feb/05 5:08 PM Here's a sample testcase that doesn't pass with qdox-1.6-snapshot built out of cvs on my mac. I'll let you guys make the path to the test source portable

Grégory Joseph (old account) made changes - 07/Feb/05 5:08 PM

Attachment

MultineLineAttributeValuesWithQDoxTestCase.java [ 13932 ]

Hide

Permalink

Grégory Joseph (old account) added a comment - 07/Feb/05 5:31 PM

or maybe as a patch if you prefer.
(same remark wrt the path to the source)

Show

Grégory Joseph (old account) added a comment - 07/Feb/05 5:31 PM or maybe as a patch if you prefer. (same remark wrt the path to the source)

Grégory Joseph (old account) made changes - 07/Feb/05 5:31 PM

Attachment

qdox82-test.patch [ 13933 ]

Hide

Permalink

Mike Williams added a comment - 12/Feb/05 4:52 AM

Ah, okay, looks like the bug is in named-parameter parsing (in TagParser).

Show

Mike Williams added a comment - 12/Feb/05 4:52 AM Ah, okay, looks like the bug is in named-parameter parsing (in TagParser).

Hide

Permalink

Mike Williams added a comment - 12/Feb/05 6:17 AM

This appears to be due to the use of java.io.StreamTokenizer for parsing of named parameters. StreamTokenizer does not appear to support newlines within quoted strings unless they are backslash-escaped. Which is to say, multi-line named parameter values are currently supported if you escape the newline:

/**

@bar.baz foo="this is\
multilined"
*/

I made the change to StreamTokenizer was made about 9 months ago, to improve named-parameter parsing (~~QDOX-45~~, ~~QDOX-50~~), and it was released in QDox-1.5.

I'm not sure if the correct resolution for this issue. Having to escape the newline could be an irritation. On the other hand, there's a workaround, and the previous behaviour (for a multi-lined named parameter) was undefined.

Compatibility with XJavadoc is also an issue, especially since named-parameter support was added by and primarily for the XDoclet crowd. At this point, I'd prefer to hand this over to someone with an interest in XDoclet. Aslak, perhaps the simplest solution would be to contribute the relevant code from XJavadoc?

Since the behaviour has not changed since QDox-1.5, perhaps we shouldn't allow this to hold up the QDox-1.6 release.

Show

Mike Williams added a comment - 12/Feb/05 6:17 AM This appears to be due to the use of java.io.StreamTokenizer for parsing of named parameters. StreamTokenizer does not appear to support newlines within quoted strings unless they are backslash-escaped. Which is to say, multi-line named parameter values are currently supported if you escape the newline: /** @bar.baz foo="this is\ multilined" */ I made the change to StreamTokenizer was made about 9 months ago, to improve named-parameter parsing ( QDOX-45 , QDOX-50 ), and it was released in QDox-1.5. I'm not sure if the correct resolution for this issue. Having to escape the newline could be an irritation. On the other hand, there's a workaround, and the previous behaviour (for a multi-lined named parameter) was undefined. Compatibility with XJavadoc is also an issue, especially since named-parameter support was added by and primarily for the XDoclet crowd. At this point, I'd prefer to hand this over to someone with an interest in XDoclet. Aslak, perhaps the simplest solution would be to contribute the relevant code from XJavadoc? Since the behaviour has not changed since QDox-1.5, perhaps we shouldn't allow this to hold up the QDox-1.6 release.

Hide

Permalink

Joe Walnes added a comment - 23/Sep/06 3:39 AM

Trying to resolve old issues.... Gregory, was this problem ever solved for you?

thanks
-Joe

Show

Joe Walnes added a comment - 23/Sep/06 3:39 AM Trying to resolve old issues.... Gregory, was this problem ever solved for you? thanks -Joe

Hide

Permalink

Grégory Joseph added a comment - 24/Sep/06 11:46 AM

I was going to run the tests (see attachments) against the latest sources to check this but err,
http://qdox.codehaus.org/cvs-usage.html

(I'm guessing i'll find qdox on codehaus's svn though )

Show

Grégory Joseph added a comment - 24/Sep/06 11:46 AM I was going to run the tests (see attachments) against the latest sources to check this but err, http://qdox.codehaus.org/cvs-usage.html (I'm guessing i'll find qdox on codehaus's svn though )

Hide

Permalink

Grégory Joseph added a comment - 24/Sep/06 11:56 AM

I was going to run the tests (see attachments) against the latest sources to check this but err, just for the record, the site is outdate : http://qdox.codehaus.org/cvs-usage.html

So I got the trunk from svn and my test sill doesn't pass.

Show

Grégory Joseph added a comment - 24/Sep/06 11:56 AM I was going to run the tests (see attachments) against the latest sources to check this but err, just for the record, the site is outdate : http://qdox.codehaus.org/cvs-usage.html So I got the trunk from svn and my test sill doesn't pass.

Hide

Permalink

Christian Hargraves added a comment - 21/Jan/07 11:40 AM

<pre/> tags are mentioned above. Does this bug also include <pre/> tag support which should include all whitespace (not just EOL characters) or should I create a new bug?

It looks like Bug #/~~QDOX-78~~ covers EOL characters, but not normal whitespaces used for indention?

Show

Christian Hargraves added a comment - 21/Jan/07 11:40 AM <pre/> tags are mentioned above. Does this bug also include <pre/> tag support which should include all whitespace (not just EOL characters) or should I create a new bug? It looks like Bug #/ QDOX-78 covers EOL characters, but not normal whitespaces used for indention?

Hide

Permalink

Mauro Talevi added a comment - 27/May/07 10:48 AM

Changed fix version to 1.7

Show

Mauro Talevi added a comment - 27/May/07 10:48 AM Changed fix version to 1.7

Mauro Talevi made changes - 27/May/07 10:48 AM

Fix Version/s		1.7 [ 11160 ]
Fix Version/s	1.6 [ 10814 ]

Hide

Permalink

Dimitri BAELI added a comment - 11/Jan/08 8:31 AM

A short study on that point shown that the StreamTokenizer used in TagParser does not work as expected.
It handles newline as token separator, according to Java <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4501629">Bug 4501629</a> it won't be fixed.
Suggestion, the TagParser should be reimplemented.

Show

Dimitri BAELI added a comment - 11/Jan/08 8:31 AM A short study on that point shown that the StreamTokenizer used in TagParser does not work as expected. It handles newline as token separator, according to Java <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4501629">Bug 4501629</a> it won't be fixed. Suggestion, the TagParser should be reimplemented.

Hide

Permalink

Paul Hammant added a comment - 21/Feb/08 1:46 AM

Dimitri, are you interested in try some regex or extended/alternate StreamTokenizer hacking to overcome this ?

it might be OK to substitute newlines for some other char, might it not ?

Paul

Show

Paul Hammant added a comment - 21/Feb/08 1:46 AM Dimitri, are you interested in try some regex or extended/alternate StreamTokenizer hacking to overcome this ? it might be OK to substitute newlines for some other char, might it not ? Paul

Paul Hammant made changes - 22/Dec/08 9:56 PM

Fix Version/s		1.8 [ 14826 ]
Fix Version/s	1.7 [ 11160 ]

Mauro Talevi made changes - 02/Jan/09 4:28 AM

Component/s

QDox-Attributes [ 10531 ]

Paul Hammant made changes - 31/Jan/09 12:26 PM

Fix Version/s	1.8 [ 14826 ]
Fix Version/s		1.9 [ 14956 ]

Paul Hammant made changes - 21/Feb/09 10:06 AM

Fix Version/s		1.10 [ 15020 ]
Fix Version/s	1.9 [ 14956 ]

Robert Scholte made changes - 25/Apr/09 11:59 AM

Fix Version/s	1.10 [ 15020 ]
Fix Version/s		1.9.1 [ 15252 ]

Robert Scholte made changes - 04/May/09 11:48 AM

Fix Version/s	1.9.1 [ 15252 ]
Fix Version/s		1.10 [ 15020 ]

Hide

Permalink

Mark Jenner added a comment - 01/Jun/09 8:58 AM

Hi,

I had the same issue parsing something else with StreamTokenizer and I found your issue when I was searching for solutions. I could not find one so I cooked up my own and thought you might be interested in applying it to your problem as well. Basically I needed to parse strings contained in double quotes, StreamTokenizer does this for you but fails if there is a newline in the string. So instead of letting StreamTokenizer do the string parsing, I tell it that double quote is not special and when I get to a one, I reconfigure the tokenizer into my own "string mode" where the only special chars are double quote and backslash. When I get to the end of the string I switch back to my normal tokenizer config (for a format call fvar). Here are the methods:

private void setUpTokenizerForFvar(StreamTokenizer tokenizer)

{ // Setup the tokenizer just like a new one as per the StreamTokenizer constructor comment tokenizer.resetSyntax(); tokenizer.wordChars((int)'a', (int)'z'); tokenizer.wordChars((int)'A', (int)'Z'); tokenizer.wordChars(128 + 32, 255); tokenizer.whitespaceChars(0, (int)' '); tokenizer.commentChar((int)'/'); tokenizer.parseNumbers(); // Attribute names in fvar can include underscores, and spaces! tokenizer.wordChars(UNDER_SCORE, UNDER_SCORE); tokenizer.wordChars(SPACE, SPACE); tokenizer.ordinaryChar(DOUBLE_QUOTE); }

private void setUpTokenizerForQuotedValue(StreamTokenizer tokenizer)

{ // Reset the tokenizer to treat everything as a word except the double quote char and the escape char tokenizer.resetSyntax(); tokenizer.wordChars(0, 127); tokenizer.ordinaryChar(ESCAPE); tokenizer.ordinaryChar(DOUBLE_QUOTE); }

// Because StreamTokenizer does not parse quoted strings that contain newlines properly
// we have to do it ourselves. Reads everything up until a matching closing quote
// ignoring any that are preceded by an escape char '\'
private String parseQuotedString(int openQuote, StreamTokenizer tokenizer) {
StringBuilder value = new StringBuilder();
setUpTokenizerForQuotedValue(tokenizer);
def nextToken = tokenizer.nextToken();
boolean escapedQuote = false;
while (escapedQuote || nextToken != openQuote) {
escapedQuote = false;
if (nextToken == StreamTokenizer.TT_WORD)

{ value.append(tokenizer.sval); }

else if (nextToken == ESCAPE)

{ escapedQuote = true; value.append((char)nextToken); }

else if (nextToken == openQuote)

{ value.append((char)nextToken); }

nextToken = tokenizer.nextToken();
}
setUpTokenizerForFvar(tokenizer);
return value.toString();
}

used in some code like this:
[...]
nextToken = tokenizer.nextToken();
if (nextToken == DOUBLE_QUOTE) {
String value = parseQuotedString(nextToken, tokenizer);
[...]

Hope that is some value to you.

Show

Mark Jenner added a comment - 01/Jun/09 8:58 AM Hi, I had the same issue parsing something else with StreamTokenizer and I found your issue when I was searching for solutions. I could not find one so I cooked up my own and thought you might be interested in applying it to your problem as well. Basically I needed to parse strings contained in double quotes, StreamTokenizer does this for you but fails if there is a newline in the string. So instead of letting StreamTokenizer do the string parsing, I tell it that double quote is not special and when I get to a one, I reconfigure the tokenizer into my own "string mode" where the only special chars are double quote and backslash. When I get to the end of the string I switch back to my normal tokenizer config (for a format call fvar). Here are the methods: private void setUpTokenizerForFvar(StreamTokenizer tokenizer) { // Setup the tokenizer just like a new one as per the StreamTokenizer constructor comment tokenizer.resetSyntax(); tokenizer.wordChars((int)'a', (int)'z'); tokenizer.wordChars((int)'A', (int)'Z'); tokenizer.wordChars(128 + 32, 255); tokenizer.whitespaceChars(0, (int)' '); tokenizer.commentChar((int)'/'); tokenizer.parseNumbers(); // Attribute names in fvar can include underscores, and spaces! tokenizer.wordChars(UNDER_SCORE, UNDER_SCORE); tokenizer.wordChars(SPACE, SPACE); tokenizer.ordinaryChar(DOUBLE_QUOTE); } private void setUpTokenizerForQuotedValue(StreamTokenizer tokenizer) { // Reset the tokenizer to treat everything as a word except the double quote char and the escape char tokenizer.resetSyntax(); tokenizer.wordChars(0, 127); tokenizer.ordinaryChar(ESCAPE); tokenizer.ordinaryChar(DOUBLE_QUOTE); } // Because StreamTokenizer does not parse quoted strings that contain newlines properly // we have to do it ourselves. Reads everything up until a matching closing quote // ignoring any that are preceded by an escape char '\' private String parseQuotedString(int openQuote, StreamTokenizer tokenizer) { StringBuilder value = new StringBuilder(); setUpTokenizerForQuotedValue(tokenizer); def nextToken = tokenizer.nextToken(); boolean escapedQuote = false; while (escapedQuote || nextToken != openQuote) { escapedQuote = false; if (nextToken == StreamTokenizer.TT_WORD) { value.append(tokenizer.sval); } else if (nextToken == ESCAPE) { escapedQuote = true; value.append((char)nextToken); } else if (nextToken == openQuote) { value.append((char)nextToken); } nextToken = tokenizer.nextToken(); } setUpTokenizerForFvar(tokenizer); return value.toString(); } used in some code like this: [...] nextToken = tokenizer.nextToken(); if (nextToken == DOUBLE_QUOTE) { String value = parseQuotedString(nextToken, tokenizer); [...] Hope that is some value to you.

Hide

Permalink

Robert Scholte added a comment - 03/Sep/09 3:46 PM

This one should be resolved by implementing ~~QDOX-168~~

Show

Robert Scholte added a comment - 03/Sep/09 3:46 PM This one should be resolved by implementing QDOX-168

Robert Scholte made changes - 03/Sep/09 3:46 PM

Assignee		Robert Scholte [ rfscholte ]
Resolution		Fixed [ 1 ]
Status	Reopened [ 4 ]	Resolved [ 5 ]

People

Assignee:

Robert Scholte

Reporter:

Grégory Joseph (old account)

Votes:

1 Vote for this issue

Watchers:

1 Start watching this issue

Dates

Created:

03/Feb/05 12:53 PM

Updated:

03/Sep/09 3:46 PM

Resolved:

03/Sep/09 3:46 PM