#136620 Swedish: Fails to recognize words with initial hyphen

Package:
ispell
Source:
ispell
Description:
International Ispell (an interactive spelling corrector)
Submitter:
peter karlsson
Date:
2025-01-29 17:33:02 UTC
Severity:
normal
#136620#5
Date:
2002-03-03 13:02:17 UTC
From:
To:
In Swedish, you can "continue" on the previous word stem by starting with a
hyphen, for example:

   bilförare, -ägare och -passagerare

("car drivers, owners and passengers"), where "-" indicates that the words
continue "bil" ("car"). "Ägare" and "passagerare" are valid words in
Swedish, which makes ispell suggest that I should replace "-ägare" with
"ägare". This is a bug in ispell, since removing the initial hyphen changes
the meaning of the word.

IIRC, previous versions of ispell did not have this problem (at least I have
not noticed it until more recent versions).

#136620#10
Date:
2002-03-21 13:44:03 UTC
From:
To:
Hi,

I think this problem arises because the wordlist changed, e.g. the word
constituent chars changed.  You could check by removing the hyphen from
the .aff file and rebuilding the dictionary.  I'll try this later
myself though.

Reassign it if this is the case.

/Micce (maintainer of iswedish)

PS How do I add myself to this bug?  I remember it is or will be
possible...

#136620#15
Date:
2002-03-21 15:51:16 UTC
From:
To:
Try http://people.debian.org/~micce/iswedish_1.4.3_i386.deb, with this
we cannot spell ABC-vapen correctly, but maybe we can live with that.
The compound word thing also needs work.

#136620#20
Date:
2002-03-22 05:59:41 UTC
From:
To:
With this wordlist I get

$ ispell test.txt
Word 'användar-id' contains illegal characters
Word 'ASCII-text' contains illegal characters
Word 'cd-avbildning' contains illegal characters
Word 'cd-avbildningar' contains illegal characters
Word 'cd-avbildningarna' contains illegal characters
Word 'cd-avbildningsfiler' contains illegal characters
Word 'cd-rom' contains illegal characters
Word 'Debian-cd' contains illegal characters
Word 'Debian-produkt' contains illegal characters
Word 'dpkg-sviten' contains illegal characters
Word 'dpkg-verktyg' contains illegal characters
Word 'e-post' contains illegal characters
Word 'e-postbrev' contains illegal characters
Word 'e-postprogramvara' contains illegal characters
[...]

when I start, i.e., error messages for all the compound words in my
personal wordlist.

#136620#25
Date:
2002-03-22 10:11:07 UTC
From:
To:
The hyphen should only be considered part of the word if it is not beginning
or ending the word, I think. This is not a perfect solution either, since
some partial constructs will not work anyway, but some of them I think are
beyond the "easy" algorithmics that can be employed by a spelling checker;
they would need some much more advanced linguistic analysis.
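
To illustrate the rule proposed above, here is a minimal sketch in Python. It
is not ispell code, and the names WORD_RE and split_words are made up for this
example. It assumes a hyphen counts as part of a word only when it has a
letter on both sides, so a leading or trailing hyphen is left alone in the
text and never becomes part of the word being checked.

```python
import re

# Assumption: a "letter" is any Unicode word character except digits and
# underscore. A hyphen is kept only when letters surround it on both sides.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def split_words(text):
    """Return the words in text, keeping interior hyphens only."""
    return WORD_RE.findall(text)


if __name__ == "__main__":
    # The leading hyphens are not part of the extracted words.
    print(split_words("bilförare, -ägare och -passagerare"))
    # Interior hyphens stay inside the word.
    print(split_words("e-post, cd-rom och ABC-vapen"))
```

With a rule like this, "-ägare" would be checked as "ägare" while the hyphen
stays untouched in the text, and "e-post" would still be looked up as one word.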

#136620#30
Date:
2002-03-22 10:22:03 UTC
From:
To:
peter karlsson writes:
 > > Yepp.  So how do we fix this?  '-' should be a word constituent only
 > > sometimes...
 >
 > The hyphen should only be considered part of the word if it is not beginning
 > or ending the word, I think. This is not a perfect solution either, since
 > some partial constructs will not work anyway, but some of them I think are
 > beyond the "easy" algorithmics that can be employed by a spelling checker;
 > they would need some much more advanced linguistic analysis.

But AFAIK, there is no way to do this in an ispell dictionary,
unfortunately.  If there is, please enlighten me, and I will fix it.  Maybe
it could be done by saying that the word '-' can be combined with any word,
both as a prefix and as a suffix?  (And making it a word constituent again
as well.)

/Micce
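
The idea of letting '-' combine with any word can be sketched roughly as
follows, in Python. This is only an illustration of the lookup logic, not
something an ispell dictionary is known to support, and is_accepted and the
tiny word set are hypothetical. It assumes the hyphen is a word constituent
again, so hyphenated entries are looked up whole, and a single hyphen at the
start or end is stripped before a second lookup.

```python
def is_accepted(token, dictionary):
    """Accept a token found in the dictionary, or a dictionary word
    carrying one extra hyphen at its start and/or end."""
    if token in dictionary:
        return True
    core = token
    if core.startswith("-"):
        core = core[1:]
    if core.endswith("-"):
        core = core[:-1]
    # Reject a bare hyphen and tokens that had no hyphen to strip.
    return core != "" and core != token and core in dictionary


if __name__ == "__main__":
    # Hypothetical mini wordlist, for illustration only.
    words = {"bil", "ägare", "passagerare", "ABC-vapen"}
    for token in ["-ägare", "bil-", "ABC-vapen", "-", "-xyz"]:
        print(token, is_accepted(token, words))
```

Note that this accepts any dictionary word after a hyphen, whether or not the
resulting compound makes sense, which is the kind of limitation mentioned
earlier in this report.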

#136620#35
Date:
2002-03-22 09:32:38 UTC
From:
To:
peter karlsson writes:
 > > Try http://people.debian.org/~micce/iswedish_1.4.3_i386.deb, with this
 > > we cannot spell ABC-vapen correctly, but maybe we can live with that.
 > > The compound word thing also needs work.
 >
 > With this wordlist I get
 >
 > $ ispell test.txt
 > Word 'användar-id' contains illegal characters
 > Word 'ASCII-text' contains illegal characters
 > Word 'cd-avbildning' contains illegal characters

 > when I start, i.e., error messages for all the compound words in my
 > personal wordlist.

Yepp.  So how do we fix this?  '-' should be a word constituent only
sometimes... Please have a look at the source if you like, it's called
swedish.  I have no suggestion for the moment.  Maybe there has to be
some change to ispell after all?

/Micce

#136620#40
Date:
2002-03-24 01:42:14 UTC
From:
To:
Mikael Hedin <mikael.hedin@irf.se> writes:

For what little it's worth, I've noticed this happens with
English wordlists as well -- all hyphenated words get rejected
with "contains illegal characters."  I've always ignored it,
since it's a feature of upstream ispell.

#136620#45
Date:
2003-06-18 06:46:30 UTC
From:
To:
Did you and/or Mikael come up with a successful workaround?

I have still not looked into this in ispell -- should I close
the bug report, or keep it on my list of things to do some day?

Thanks.

#136620#52
Date:
2003-06-20 06:59:52 UTC
From:
To:
No, I don't think we ever did.

Please.