QDox
  1. QDox
  2. QDOX-82

multiline'd tag attribute values not working anymore

    Details

    • Type: Bug Bug
    • Status: Resolved Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.10
    • Component/s: Parser
    • Labels:
      None
    • Number of attachments :
      2

      Description

      Some undefined time ago, it was possible to parse the following source
      with qdox and retrieve a sensible value for the "foo" attribute of the
      "bar.baz" tag -> "this is multilined"

      /**

      • @bar.baz foo="this is
      •       multilined"
        */

      with the latest snapshot, this unfortunately doesn't work anymore.

      I haven't found an open related jira issue, but before creating one, I
      wanted to make sure this wasn't on purpose ?

      I think allowing this makes sense, for instance for xdoclet because
      some attributes might have longish values, like the "description"
      elements for servlets and such.

      1. MultineLineAttributeValuesWithQDoxTestCase.java
        1 kB
        Grégory Joseph (old account)
      2. qdox82-test.patch
        2 kB
        Grégory Joseph (old account)

        Activity

        Hide
        Joe Walnes added a comment -

        Assigned to 1.6 release as this is a critical regression that has recently been introduced. Whoever fixes this, please add a unit-test to ensure the problem doesn't return unnoticed.

        Show
        Joe Walnes added a comment - Assigned to 1.6 release as this is a critical regression that has recently been introduced. Whoever fixes this, please add a unit-test to ensure the problem doesn't return unnoticed.
        Joe Walnes made changes -
        Field Original Value New Value
        Fix Version/s 1.6 [ 10814 ]
        Affects Version/s 1.6 [ 10814 ]
        Hide
        Mike Williams added a comment -

        I don't think this is a regression. Up until fairly recently, QDox collapsed all whitespace inside comments, effectively discarding newlines. It has always been thus, until about a month ago, when Joe implemented feature-request (QDOX-78) ... now, newlines are retained inside comments, and tag-values. I just checked, and this still appears to be working in CVS HEAD.

        Gregory, perhaps you switched from a qdox-1.6-SHAPSHOT.jar to a previous release? Or to a previous snapshot?

        Show
        Mike Williams added a comment - I don't think this is a regression. Up until fairly recently, QDox collapsed all whitespace inside comments, effectively discarding newlines. It has always been thus, until about a month ago, when Joe implemented feature-request ( QDOX-78 ) ... now, newlines are retained inside comments, and tag-values. I just checked, and this still appears to be working in CVS HEAD. Gregory, perhaps you switched from a qdox-1.6-SHAPSHOT.jar to a previous release? Or to a previous snapshot?
        Mike Williams made changes -
        Assignee Mike Williams [ mdub ]
        Hide
        Mike Williams added a comment -

        I couldn't reproduce the problem in CVS HEAD. I've added a unit-test (testJiraQdox82), to demonstrate newlines in tag values.

        Show
        Mike Williams added a comment - I couldn't reproduce the problem in CVS HEAD. I've added a unit-test (testJiraQdox82), to demonstrate newlines in tag values.
        Mike Williams made changes -
        Resolution Cannot Reproduce [ 5 ]
        Status Open [ 1 ] Resolved [ 5 ]
        Hide
        Brian Topping added a comment -

        This was happening in CVS HEAD. More data to come.

        Show
        Brian Topping added a comment - This was happening in CVS HEAD. More data to come.
        Brian Topping made changes -
        Assignee Mike Williams [ mdub ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Resolution Cannot Reproduce [ 5 ]
        Hide
        Brian Topping added a comment -

        Just spent most of the day tracking this down and getting things organized. Here's the regression:

        • Check out xdoclet-plugins project
        • Revert xdoclet-plugins/testapp-web/src/main/java/org/xdoclet/testapp/web/TimerFilter.java to version 1.1
        • Build xdoclet-plugin-web project

        You'll get an error in the org.xdoclet.plugin.web.WebPluginTestCase test there. That's what we are looking to fix.

        Thanks for looking at this.

        -b

        Show
        Brian Topping added a comment - Just spent most of the day tracking this down and getting things organized. Here's the regression: Check out xdoclet-plugins project Revert xdoclet-plugins/testapp-web/src/main/java/org/xdoclet/testapp/web/TimerFilter.java to version 1.1 Build xdoclet-plugin-web project You'll get an error in the org.xdoclet.plugin.web.WebPluginTestCase test there. That's what we are looking to fix. Thanks for looking at this. -b
        Hide
        Grégory Joseph (old account) added a comment -

        Mike, the output i'm expecting from this is indeed with white lines collapsed, which is fine. But the testcase I had on xdoclet-plugins that Brian mentionned now gets only the first line. (Well, until I aligned my test source to use non-multilined). I'll try from cvs head again.

        Show
        Grégory Joseph (old account) added a comment - Mike, the output i'm expecting from this is indeed with white lines collapsed, which is fine. But the testcase I had on xdoclet-plugins that Brian mentionned now gets only the first line. (Well, until I aligned my test source to use non-multilined). I'll try from cvs head again.
        Hide
        Joe Walnes added a comment -

        Brian, to make our lives easier, could you attach a standalone testcase to demonstrate this please.

        Thought to all... given the example above...

        /**

        • @bar.baz foo="this is
        • multilined"
          */

        ... what would you expect the value that qdox parses?

        "this is multilined"
        or
        "this is\n multilined"
        or
        "this is\n multilined"

        The first one is more convenient, but does it lose potentially useful data (as was the case with QDOX-78)?

        Show
        Joe Walnes added a comment - Brian, to make our lives easier, could you attach a standalone testcase to demonstrate this please. Thought to all... given the example above... /** @bar.baz foo="this is multilined" */ ... what would you expect the value that qdox parses? "this is multilined" or "this is\n multilined" or "this is\n multilined" The first one is more convenient, but does it lose potentially useful data (as was the case with QDOX-78 )?
        Hide
        Mike Williams added a comment -

        In answer to Joe's question, I'd expect:

        "this is\nmultilined"

        with the leading space trimmed.

        More generally, I think QDox should retain ALL whitespace within comments, as in some cases it may be significant ... e.g. <pre> sections. Trouble is, this would be an incompatible change.

        Show
        Mike Williams added a comment - In answer to Joe's question, I'd expect: "this is\nmultilined" with the leading space trimmed. More generally, I think QDox should retain ALL whitespace within comments, as in some cases it may be significant ... e.g. <pre> sections. Trouble is, this would be an incompatible change.
        Hide
        Grégory Joseph (old account) added a comment -

        "this is multilined" would be good enough as far as i'm concerning, but got "this is" when i opened this issue. As I said earlier, I'll try again

        Show
        Grégory Joseph (old account) added a comment - "this is multilined" would be good enough as far as i'm concerning, but got "this is" when i opened this issue. As I said earlier, I'll try again
        Hide
        Grégory Joseph (old account) added a comment -

        Here's a sample testcase that doesn't pass with qdox-1.6-snapshot built out of cvs on my mac.
        I'll let you guys make the path to the test source portable

        Show
        Grégory Joseph (old account) added a comment - Here's a sample testcase that doesn't pass with qdox-1.6-snapshot built out of cvs on my mac. I'll let you guys make the path to the test source portable
        Grégory Joseph (old account) made changes -
        Hide
        Grégory Joseph (old account) added a comment -

        or maybe as a patch if you prefer.
        (same remark wrt the path to the source)

        Show
        Grégory Joseph (old account) added a comment - or maybe as a patch if you prefer. (same remark wrt the path to the source)
        Grégory Joseph (old account) made changes -
        Attachment qdox82-test.patch [ 13933 ]
        Hide
        Mike Williams added a comment -

        Ah, okay, looks like the bug is in named-parameter parsing (in TagParser).

        Show
        Mike Williams added a comment - Ah, okay, looks like the bug is in named-parameter parsing (in TagParser).
        Hide
        Mike Williams added a comment -

        This appears to be due to the use of java.io.StreamTokenizer for parsing of named parameters. StreamTokenizer does not appear to support newlines within quoted strings unless they are backslash-escaped. Which is to say, multi-line named parameter values are currently supported if you escape the newline:

        /**

        • @bar.baz foo="this is\
        • multilined"
          */

        I made the change to StreamTokenizer was made about 9 months ago, to improve named-parameter parsing (QDOX-45, QDOX-50), and it was released in QDox-1.5.

        I'm not sure if the correct resolution for this issue. Having to escape the newline could be an irritation. On the other hand, there's a workaround, and the previous behaviour (for a multi-lined named parameter) was undefined.

        Compatibility with XJavadoc is also an issue, especially since named-parameter support was added by and primarily for the XDoclet crowd. At this point, I'd prefer to hand this over to someone with an interest in XDoclet. Aslak, perhaps the simplest solution would be to contribute the relevant code from XJavadoc?

        Since the behaviour has not changed since QDox-1.5, perhaps we shouldn't allow this to hold up the QDox-1.6 release.

        Show
        Mike Williams added a comment - This appears to be due to the use of java.io.StreamTokenizer for parsing of named parameters. StreamTokenizer does not appear to support newlines within quoted strings unless they are backslash-escaped. Which is to say, multi-line named parameter values are currently supported if you escape the newline: /** @bar.baz foo="this is\ multilined" */ I made the change to StreamTokenizer was made about 9 months ago, to improve named-parameter parsing ( QDOX-45 , QDOX-50 ), and it was released in QDox-1.5. I'm not sure if the correct resolution for this issue. Having to escape the newline could be an irritation. On the other hand, there's a workaround, and the previous behaviour (for a multi-lined named parameter) was undefined. Compatibility with XJavadoc is also an issue, especially since named-parameter support was added by and primarily for the XDoclet crowd. At this point, I'd prefer to hand this over to someone with an interest in XDoclet. Aslak, perhaps the simplest solution would be to contribute the relevant code from XJavadoc? Since the behaviour has not changed since QDox-1.5, perhaps we shouldn't allow this to hold up the QDox-1.6 release.
        Hide
        Joe Walnes added a comment -

        Trying to resolve old issues.... Gregory, was this problem ever solved for you?

        thanks
        -Joe

        Show
        Joe Walnes added a comment - Trying to resolve old issues.... Gregory, was this problem ever solved for you? thanks -Joe
        Hide
        Grégory Joseph added a comment -

        I was going to run the tests (see attachments) against the latest sources to check this but err,
        http://qdox.codehaus.org/cvs-usage.html

        (I'm guessing i'll find qdox on codehaus's svn though )

        Show
        Grégory Joseph added a comment - I was going to run the tests (see attachments) against the latest sources to check this but err, http://qdox.codehaus.org/cvs-usage.html (I'm guessing i'll find qdox on codehaus's svn though )
        Hide
        Grégory Joseph added a comment -

        I was going to run the tests (see attachments) against the latest sources to check this but err, just for the record, the site is outdate : http://qdox.codehaus.org/cvs-usage.html

        So I got the trunk from svn and my test sill doesn't pass.

        Show
        Grégory Joseph added a comment - I was going to run the tests (see attachments) against the latest sources to check this but err, just for the record, the site is outdate : http://qdox.codehaus.org/cvs-usage.html So I got the trunk from svn and my test sill doesn't pass.
        Hide
        Christian Hargraves added a comment -

        <pre/> tags are mentioned above. Does this bug also include <pre/> tag support which should include all whitespace (not just EOL characters) or should I create a new bug?

        It looks like Bug #/QDOX-78 covers EOL characters, but not normal whitespaces used for indention?

        Show
        Christian Hargraves added a comment - <pre/> tags are mentioned above. Does this bug also include <pre/> tag support which should include all whitespace (not just EOL characters) or should I create a new bug? It looks like Bug #/ QDOX-78 covers EOL characters, but not normal whitespaces used for indention?
        Hide
        Mauro Talevi added a comment -

        Changed fix version to 1.7

        Show
        Mauro Talevi added a comment - Changed fix version to 1.7
        Mauro Talevi made changes -
        Fix Version/s 1.7 [ 11160 ]
        Fix Version/s 1.6 [ 10814 ]
        Hide
        Dimitri BAELI added a comment -

        A short study on that point shown that the StreamTokenizer used in TagParser does not work as expected.
        It handles newline as token separator, according to Java <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4501629">Bug 4501629</a> it won't be fixed.
        Suggestion, the TagParser should be reimplemented.

        Show
        Dimitri BAELI added a comment - A short study on that point shown that the StreamTokenizer used in TagParser does not work as expected. It handles newline as token separator, according to Java <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4501629">Bug 4501629</a> it won't be fixed. Suggestion, the TagParser should be reimplemented.
        Hide
        Paul Hammant added a comment -

        Dimitri, are you interested in try some regex or extended/alternate StreamTokenizer hacking to overcome this ?

        it might be OK to substitute newlines for some other char, might it not ?

        • Paul
        Show
        Paul Hammant added a comment - Dimitri, are you interested in try some regex or extended/alternate StreamTokenizer hacking to overcome this ? it might be OK to substitute newlines for some other char, might it not ? Paul
        Paul Hammant made changes -
        Fix Version/s 1.8 [ 14826 ]
        Fix Version/s 1.7 [ 11160 ]
        Mauro Talevi made changes -
        Component/s QDox-Attributes [ 10531 ]
        Paul Hammant made changes -
        Fix Version/s 1.8 [ 14826 ]
        Fix Version/s 1.9 [ 14956 ]
        Paul Hammant made changes -
        Fix Version/s 1.10 [ 15020 ]
        Fix Version/s 1.9 [ 14956 ]
        Robert Scholte made changes -
        Fix Version/s 1.10 [ 15020 ]
        Fix Version/s 1.9.1 [ 15252 ]
        Robert Scholte made changes -
        Fix Version/s 1.9.1 [ 15252 ]
        Fix Version/s 1.10 [ 15020 ]
        Hide
        Mark Jenner added a comment -

        Hi,

        I had the same issue parsing something else with StreamTokenizer and I found your issue when I was searching for solutions. I could not find one so I cooked up my own and thought you might be interested in applying it to your problem as well. Basically I needed to parse strings contained in double quotes, StreamTokenizer does this for you but fails if there is a newline in the string. So instead of letting StreamTokenizer do the string parsing, I tell it that double quote is not special and when I get to a one, I reconfigure the tokenizer into my own "string mode" where the only special chars are double quote and backslash. When I get to the end of the string I switch back to my normal tokenizer config (for a format call fvar). Here are the methods:

        private void setUpTokenizerForFvar(StreamTokenizer tokenizer)

        { // Setup the tokenizer just like a new one as per the StreamTokenizer constructor comment tokenizer.resetSyntax(); tokenizer.wordChars((int)'a', (int)'z'); tokenizer.wordChars((int)'A', (int)'Z'); tokenizer.wordChars(128 + 32, 255); tokenizer.whitespaceChars(0, (int)' '); tokenizer.commentChar((int)'/'); tokenizer.parseNumbers(); // Attribute names in fvar can include underscores, and spaces! tokenizer.wordChars(UNDER_SCORE, UNDER_SCORE); tokenizer.wordChars(SPACE, SPACE); tokenizer.ordinaryChar(DOUBLE_QUOTE); }

        private void setUpTokenizerForQuotedValue(StreamTokenizer tokenizer)

        { // Reset the tokenizer to treat everything as a word except the double quote char and the escape char tokenizer.resetSyntax(); tokenizer.wordChars(0, 127); tokenizer.ordinaryChar(ESCAPE); tokenizer.ordinaryChar(DOUBLE_QUOTE); }

        // Because StreamTokenizer does not parse quoted strings that contain newlines properly
        // we have to do it ourselves. Reads everything up until a matching closing quote
        // ignoring any that are preceded by an escape char '\'
        private String parseQuotedString(int openQuote, StreamTokenizer tokenizer) {
        StringBuilder value = new StringBuilder();
        setUpTokenizerForQuotedValue(tokenizer);
        def nextToken = tokenizer.nextToken();
        boolean escapedQuote = false;
        while (escapedQuote || nextToken != openQuote) {
        escapedQuote = false;
        if (nextToken == StreamTokenizer.TT_WORD)

        { value.append(tokenizer.sval); }

        else if (nextToken == ESCAPE)

        { escapedQuote = true; value.append((char)nextToken); }

        else if (nextToken == openQuote)

        { value.append((char)nextToken); }

        nextToken = tokenizer.nextToken();
        }
        setUpTokenizerForFvar(tokenizer);
        return value.toString();
        }

        used in some code like this:
        [...]
        nextToken = tokenizer.nextToken();
        if (nextToken == DOUBLE_QUOTE) {
        String value = parseQuotedString(nextToken, tokenizer);
        [...]

        Hope that is some value to you.

        Show
        Mark Jenner added a comment - Hi, I had the same issue parsing something else with StreamTokenizer and I found your issue when I was searching for solutions. I could not find one so I cooked up my own and thought you might be interested in applying it to your problem as well. Basically I needed to parse strings contained in double quotes, StreamTokenizer does this for you but fails if there is a newline in the string. So instead of letting StreamTokenizer do the string parsing, I tell it that double quote is not special and when I get to a one, I reconfigure the tokenizer into my own "string mode" where the only special chars are double quote and backslash. When I get to the end of the string I switch back to my normal tokenizer config (for a format call fvar). Here are the methods: private void setUpTokenizerForFvar(StreamTokenizer tokenizer) { // Setup the tokenizer just like a new one as per the StreamTokenizer constructor comment tokenizer.resetSyntax(); tokenizer.wordChars((int)'a', (int)'z'); tokenizer.wordChars((int)'A', (int)'Z'); tokenizer.wordChars(128 + 32, 255); tokenizer.whitespaceChars(0, (int)' '); tokenizer.commentChar((int)'/'); tokenizer.parseNumbers(); // Attribute names in fvar can include underscores, and spaces! tokenizer.wordChars(UNDER_SCORE, UNDER_SCORE); tokenizer.wordChars(SPACE, SPACE); tokenizer.ordinaryChar(DOUBLE_QUOTE); } private void setUpTokenizerForQuotedValue(StreamTokenizer tokenizer) { // Reset the tokenizer to treat everything as a word except the double quote char and the escape char tokenizer.resetSyntax(); tokenizer.wordChars(0, 127); tokenizer.ordinaryChar(ESCAPE); tokenizer.ordinaryChar(DOUBLE_QUOTE); } // Because StreamTokenizer does not parse quoted strings that contain newlines properly // we have to do it ourselves. Reads everything up until a matching closing quote // ignoring any that are preceded by an escape char '\' private String parseQuotedString(int openQuote, StreamTokenizer tokenizer) { StringBuilder value = new StringBuilder(); setUpTokenizerForQuotedValue(tokenizer); def nextToken = tokenizer.nextToken(); boolean escapedQuote = false; while (escapedQuote || nextToken != openQuote) { escapedQuote = false; if (nextToken == StreamTokenizer.TT_WORD) { value.append(tokenizer.sval); } else if (nextToken == ESCAPE) { escapedQuote = true; value.append((char)nextToken); } else if (nextToken == openQuote) { value.append((char)nextToken); } nextToken = tokenizer.nextToken(); } setUpTokenizerForFvar(tokenizer); return value.toString(); } used in some code like this: [...] nextToken = tokenizer.nextToken(); if (nextToken == DOUBLE_QUOTE) { String value = parseQuotedString(nextToken, tokenizer); [...] Hope that is some value to you.
        Hide
        Robert Scholte added a comment -

        This one should be resolved by implementing QDOX-168

        Show
        Robert Scholte added a comment - This one should be resolved by implementing QDOX-168
        Robert Scholte made changes -
        Assignee Robert Scholte [ rfscholte ]
        Resolution Fixed [ 1 ]
        Status Reopened [ 4 ] Resolved [ 5 ]

          People

          • Assignee:
            Robert Scholte
            Reporter:
            Grégory Joseph (old account)
          • Votes:
            1 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: