• Bug#1091179: libconfig-model-dpkg-perl: scan-copyrigths fails on texliv

    From Walter Lozano@21:1/5 to Dominique Dumont on Thu Dec 26 14:10:01 2024
    Hi Dominique,

    First, let me thank you for your prompt response and quick fix at this
    time of the year.

    On 12/23/24 12:32, Dominique Dumont wrote:
    Hi

    This failure is due to a combination of issues.

    Some authors of texlive-extra got creative when specifying their copyright years:

    - 2023 -20** by Romain NOEL <romainoel@free.fr>
    - 2011-.. Maïeul Rouquette

    This leads to an error when parsing copyright ranges, which triggers the failure you've seen.

    I'll change Software::Copyright to cope with these new ways of specifying a copyright.
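
To make the two problem notations concrete, here is a minimal Python sketch of a tolerant year-range parser. It is only an illustration under stated assumptions, not the actual change made to Software::Copyright (which is Perl): the name `parse_year_range` and the pattern `_RANGE` are hypothetical, and treating `20**` or `..` as "open-ended" is an assumed interpretation of what the authors meant.

```python
import re

# Assumed shapes only: a four-digit start year, a dash, then either a
# four-digit end year or an open-ended placeholder such as "20**" or "..".
_RANGE = re.compile(r"(\d{4})\s*-\s*(\d{4}|\d{0,2}\*+|\.{2,})?", re.ASCII)


def parse_year_range(text):
    """Return (start, end), with end=None for an open-ended range.

    Returns None when the text is not a recognisable year or year range.
    """
    text = text.strip()
    # A single year is treated as a range of length one.
    if re.fullmatch(r"\d{4}", text, re.ASCII):
        year = int(text)
        return (year, year)
    match = _RANGE.fullmatch(text)
    if match is None:
        return None
    start = int(match.group(1))
    end_text = match.group(2)
    # A missing end or a placeholder ("20**", "..") means "still running".
    if end_text is None or not end_text.isdigit():
        return (start, None)
    end = int(end_text)
    # Reject reversed ranges instead of guessing what was meant.
    return (start, end) if end >= start else None
```

With this sketch, both `2023 -20**` and `2011-..` come back as open-ended ranges starting in 2023 and 2011 respectively, while something like `20xx` yields `None`, so a caller can warn and skip the entry rather than abort.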

    Thank you! Yes, I noticed those range definitions, which looked very
    suspicious, but I didn't have the time (or the skills) to investigate
    further and propose a fix.


    Another problem is that some copyrights are not utf-8:

    texmf-dist/source/latex/beamertheme-gotham/gotham.dtx LPPL-1.3c 2023 -20** by Romain NOEL <romainoel@free.fr>
    texmf-dist/source/latex/beamertheme-gotham/gotham.ins LPPL-1.3c 2008 Romain NOÃL <romainoel@free.fr>
    texmf-dist/tex/latex/beamertheme-gotham/beamercolorthemegotham.sty UNKNOWN 2023 -20** by Romain NOÃL <romainoel@free.fr>
    texmf-dist/tex/latex/beamertheme-gotham/beamerfontthemegotham.sty UNKNOWN 2023 -20** by Romain NOÃL <romainoel@free.fr>
    texmf-dist/tex/latex/beamertheme-gotham/beamerinnerthemegotham.sty UNKNOWN 2023 -20** by Romain NOÃL <romainoel@free.fr>
    texmf-dist/tex/latex/beamertheme-gotham/beamerouterthemegotham.sty UNKNOWN 2023 -20** by Romain NOÃL <romainoel@free.fr>
    texmf-dist/tex/latex/beamertheme-gotham/beamerthemegotham.sty UNKNOWN 2023 -20** by Romain NOÃL <romainoel@free.fr>

    This also triggers the error you've seen.

    I'm going to modify lib/Dpkg/Copyright/Grant/ByDir.pm to tolerate these issues, but the resulting copyright may not be as accurate as possible.

    I see, thanks for clarifying. I wonder if there is a kind of
    specification which mentions that copyright notice should be utf-8 or if
    it is just the common case.


    I'd suggest you talk with upstream to fix the encoding of the source files.

    Thank you, yes, I will discuss this with upstream.

    I understand that some of these corner cases are triggered by the new
    features in scan-copyrights, which try to get better scanning results,
    something I really appreciate. In this context, I wonder whether, in
    general, when the tool finds some "strange data" while parsing a
    copyright notice, it should print a warning and report "UNKNOWN".

    Thanks,

    Walter


    All the best





    --
    Walter Lozano
    Collabora Ltd.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dominique Dumont@21:1/5 to All on Fri Dec 27 19:50:01 2024
    On Thursday, 26 December 2024 13:52:02 CET Walter Lozano wrote:
    I see, thanks for clarifying. I wonder if there is a kind of
    specification which mentions that copyright notice should be utf-8 or if
    it is just the common case.

    AFAIK, there's no specification, but UTF-8 input makes licensecheck and cme much more reliable. Currently, cme receives parsed copyrights and licences from the stdout of licensecheck, and I cannot tell which parts are UTF-8 and which are something else; there's no reliable way to detect this. So I prefer to push upstream to clean up.
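
As a rough illustration of why the encoding cannot be detected reliably, here is a minimal Python sketch. It is not how licensecheck or cme work; `decode_notice` is a hypothetical helper, and Latin-1 is only an assumed fallback. Strict UTF-8 decoding can prove that bytes are not UTF-8, but the fallback accepts any byte sequence, so it can never confirm what the author really used.

```python
def decode_notice(raw):
    """Decode raw bytes, returning (text, was_valid_utf8).

    Strict UTF-8 is tried first. The Latin-1 fallback is only a guess:
    Latin-1 maps every possible byte to a character, so it never fails
    and therefore proves nothing about the real encoding.
    """
    try:
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        return raw.decode("latin-1"), False


name = "Romain NO\u00cbL"            # "\u00cb" is E with diaeresis
latin1_bytes = name.encode("latin-1")
utf8_bytes = name.encode("utf-8")

# The Latin-1 bytes are rejected as UTF-8, so the fallback is used.
text_a, valid_a = decode_notice(latin1_bytes)

# The UTF-8 bytes decode cleanly.
text_b, valid_b = decode_notice(utf8_bytes)

# Reading UTF-8 bytes as Latin-1 raises no error at all; it silently
# turns the one accented letter into two characters (mojibake).
mojibake = utf8_bytes.decode("latin-1")
```

The last line shows one way text of the "NOÃL" kind can arise: the wrong decoder still succeeds, so a consumer reading a byte stream from stdout gets no signal that anything went wrong.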

    I'd suggest you talk with upstream to fix the encoding of the source
    files.

    Thank you, yes, I will discuss this with upstream.

    Thanks

    I understand that some of these corner cases are triggered by the new features in scan-copyrights, which try to get better scanning results,
    something I really appreciate.

    I'm happy to hear that :-)

    In this context, I wonder whether, in
    general, when the tool finds some "strange data" while parsing a
    copyright notice, it should print a warning and report "UNKNOWN".

    cme emits a warning with the unexpected copyright year range and discards it.

    Encoding issues should be detected by licensecheck. I don't think it can do that. Hence, garbage in, garbage out.

    All the best
