QDox / QDOX-39

java.lang.Character$UnicodeBlock fields are not correctly parsed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Component/s: Parser
    • Labels:
      None
    • Environment:
      Latest CVS version
      J2SDK 1.4.2_03
      WinXP Pro
    • Number of attachments:
      0

      Description

      JavaClassBuilder fails to parse UnicodeBlock, the inner class of java.lang.Character.

      In this class, the fields are not declared one per statement as usual, but in a single comma-separated declaration like this:

          public static final UnicodeBlock
              BASIC_LATIN
                  = new UnicodeBlock("BASIC_LATIN"),
              LATIN_1_SUPPLEMENT
                  = new UnicodeBlock("LATIN_1_SUPPLEMENT"),
              /* cut */
              HALFWIDTH_AND_FULLWIDTH_FORMS
                  = new UnicodeBlock("HALFWIDTH_AND_FULLWIDTH_FORMS"),
              SPECIALS
                  = new UnicodeBlock("SPECIALS");

      From this code block, QDox finds only the BASIC_LATIN field, and it finds only 4 fields in the entire class (BASIC_LATIN, SYRIAC, blockStarts and blocks).

        Activity

        Mike Williams added a comment -

        Tricky.

        This happens because the lexer discards everything between the first "=" and the ";".

        Discarding "un-interesting" tokens during lexical analysis is what makes QDox so speedy. In other words, it will be difficult to fix this bug without a big performance hit.
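The behaviour described in this comment can be illustrated with a toy scanner. This is only a sketch of the idea, not QDox's actual lexer: the class `NaiveFieldScanner` and its method `fieldNames` are hypothetical names invented for this example. It takes the identifier just before the first "=" as the field name and then throws away everything up to the ";", so later declarators are never seen.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the reported behaviour; not QDox code.
class NaiveFieldScanner {

    /** Returns the field names this naive scanner finds in one field declaration. */
    static List<String> fieldNames(String declaration) {
        List<String> names = new ArrayList<>();
        String lastIdentifier = null;
        int i = 0;
        while (i < declaration.length()) {
            char c = declaration.charAt(i);
            if (Character.isJavaIdentifierStart(c)) {
                // Read a whole identifier (or keyword) and remember it.
                int start = i;
                while (i < declaration.length()
                        && Character.isJavaIdentifierPart(declaration.charAt(i))) {
                    i++;
                }
                lastIdentifier = declaration.substring(start, i);
                continue;
            }
            if (c == '=') {
                // The identifier before "=" is the field name.
                if (lastIdentifier != null) {
                    names.add(lastIdentifier);
                }
                lastIdentifier = null;
                // Discard everything up to the terminating ";", commas included.
                while (i < declaration.length() && declaration.charAt(i) != ';') {
                    i++;
                }
                continue;
            }
            if (c == ';' && lastIdentifier != null) {
                // Declaration without an initializer, e.g. "int x;".
                names.add(lastIdentifier);
                lastIdentifier = null;
            }
            i++;
        }
        return names;
    }

    public static void main(String[] args) {
        String source = "public static final UnicodeBlock "
                + "BASIC_LATIN = new UnicodeBlock(\"BASIC_LATIN\"), "
                + "SPECIALS = new UnicodeBlock(\"SPECIALS\");";
        System.out.println(fieldNames(source)); // prints [BASIC_LATIN]
    }
}
```

Because the skip loop never looks at the discarded characters, the scanner stays cheap, but the comma that separates the second declarator is lost along with the initializer.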

        Aslak Hellesøy added a comment -

        This is a matter of whether we want to support ugly C-style declarations like:

        int i,j,k;

        It's in the Java spec, so I don't think we should "won't fix" this.
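As a quick check that a declaration such as `int i,j,k;` really does declare three separate fields, the following sketch uses reflection on a small sample class. The names `MultiDeclaratorDemo` and `Sample` are made up for this illustration and are not part of QDox.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; only standard reflection APIs are used.
class MultiDeclaratorDemo {

    // One declaration statement, three declarators.
    static class Sample {
        int i, j, k;
    }

    /** Returns the sorted names of the fields declared by Sample. */
    static List<String> declaredFieldNames() {
        List<String> names = new ArrayList<>();
        for (Field field : Sample.class.getDeclaredFields()) {
            // Ignore any compiler- or tool-generated fields.
            if (!field.isSynthetic()) {
                names.add(field.getName());
            }
        }
        // getDeclaredFields() does not guarantee an order, so sort for stability.
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(declaredFieldNames()); // prints [i, j, k]
    }
}
```

A source parser that reports only the first declarator would therefore disagree with what the compiler produces for the same class.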

        Mike Williams added a comment -

        After a bit more thought, I think there actually is a way to handle this without a performance hit.

        If we keep track of paren/brace nesting within the ASSIGNMENT, we can get the lexer to recognise and return the COMMA, while still discarding the uninteresting tokens.
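The approach proposed in this comment can be sketched as follows. This is an illustrative toy scanner, not the change that was actually committed to QDox: the class `NestingAwareFieldScanner` and its method `fieldNames` are hypothetical. While inside an initializer it still discards every character, but it counts opening and closing parentheses, braces and brackets, so a comma seen at nesting depth 0 is recognised as a declarator separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of nesting-aware skipping; not QDox code.
class NestingAwareFieldScanner {

    /** Returns all field names declared in one field declaration. */
    static List<String> fieldNames(String declaration) {
        List<String> names = new ArrayList<>();
        String lastIdentifier = null;
        boolean inAssignment = false;
        int depth = 0; // nesting depth of (), {} and [] inside the initializer
        int i = 0;
        while (i < declaration.length()) {
            char c = declaration.charAt(i);
            if (inAssignment) {
                // Initializer contents are still discarded; only nesting is tracked.
                if (c == '(' || c == '{' || c == '[') {
                    depth++;
                } else if (c == ')' || c == '}' || c == ']') {
                    depth--;
                } else if (depth == 0 && (c == ',' || c == ';')) {
                    // A top-level comma or semicolon ends this initializer.
                    inAssignment = false;
                }
                i++;
                continue;
            }
            if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < declaration.length()
                        && Character.isJavaIdentifierPart(declaration.charAt(i))) {
                    i++;
                }
                lastIdentifier = declaration.substring(start, i);
                continue;
            }
            if (c == '=' || c == ',' || c == ';') {
                // The identifier before "=", "," or ";" is a field name.
                if (lastIdentifier != null) {
                    names.add(lastIdentifier);
                }
                lastIdentifier = null;
                inAssignment = (c == '=');
            }
            i++;
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(fieldNames("int i, j = f(1, 2), k;")); // prints [i, j, k]
    }
}
```

The sketch deliberately ignores string and character literals, comments and generic type arguments, any of which could contain commas or brackets; a real lexer would need rules for those as well.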

        Mike Williams made changes -
        Assignee: set to Mike Williams [ mdub ]
        Mike Williams made changes -
        Fix Version/s: set to 1.4 [ 10304 ]
        Aslak Hellesøy added a comment -

        Agreed. That's the way to do it.

        Eric Dechaux added a comment -

        If implementing this causes too much of a slowdown, it may be possible to have two different lexers, one "fast" and one "slow", that could be chosen at runtime or at compile time.

        Mike Williams added a comment -

        I was wrong: we managed to fix this without a performance hit.

        Mike Williams made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Resolution: set to Fixed [ 1 ]
        Aslak Hellesøy added a comment -

        Awesome work Mike!

        Eric Dechaux added a comment -

        I agree, great work. It works great.

        Mike Williams made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]

          People

          • Assignee: Mike Williams
          • Reporter: Eric Dechaux
          • Votes: 0
          • Watchers: 1
