#1063507 tt-rss: Upgrading to Bookworm results in old read articles to be refetched as unread

#1063507#5
Date: 2024-02-09 02:41:27 UTC
From:
To:
Dear Maintainer,

I had a happily working tt-rss installation on Bullseye. However, after
upgrading to Bookworm, the first time the update script ran it re-fetched
a lot of older articles and marked them as unread.

While debugging the issue, I found that the GUID seems to have changed,
probably because PHP 8.2 in Bookworm no longer returns integers fetched
from the database as quoted strings (a PDO MySQL change introduced in PHP 8.1):
https://www.php.net/manual/en/migration81.incompatible.php#migration81.incompatible.pdo.mysql

and the older tt-rss version packaged in Debian apparently does not account for it.

That resulted in duplicate rows for the same article, despite the "guid" field being UNIQUE, e.g.:

MariaDB [ttrss]> select id, title, guid, updated, date_entered, date_updated from ttrss_entries where title like 'Could the Sun%';
+--------+---------------------------------------+----------------------------------------------------------------------------+---------------------+---------------------+---------------------+
| id     | title                                 | guid                                                                       | updated             | date_entered        | date_updated        |
+--------+---------------------------------------+----------------------------------------------------------------------------+---------------------+---------------------+---------------------+
| 528463 | Could the Sun be hiding a black hole? | {"ver":2,"uid":"3","hash":"SHA1:dcf27dd8206c88fc25db2439fbfdbcc1113d826e"} | 2024-01-21 15:01:08 | 2024-01-22 00:05:00 | 2024-02-05 00:09:14 |
| 534207 | Could the Sun be hiding a black hole? | {"ver":2,"uid":3,"hash":"SHA1:dcf27dd8206c88fc25db2439fbfdbcc1113d826e"}   | 2024-01-21 15:01:08 | 2024-02-09 00:56:00 | 2024-02-09 00:56:08 |
+--------+---------------------------------------+----------------------------------------------------------------------------+---------------------+---------------------+---------------------+

That should not have happened: only the first row should exist, and the
second row should never have been created. The problem seems to be that the
"guid" is not EXACTLY the same. Before the upgrade it said:
"uid":"3"
and now it says
"uid":3

While both point to exactly the same data, the strings are not the same, so
tt-rss fails to detect the article as a duplicate.
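To make the mismatch concrete, the following Python sketch builds both GUID strings. It assumes, based only on the stored values above, that the GUID is compact JSON with the keys ver, uid and hash in that order; `make_guid` is a hypothetical helper for illustration, not the actual tt-rss (PHP) code.

```python
import json

def make_guid(uid, digest):
    # Hypothetical helper mimicking the stored format (assumption):
    # compact JSON, keys in the order ver, uid, hash.
    return json.dumps({"ver": 2, "uid": uid, "hash": "SHA1:" + digest},
                      separators=(",", ":"))

digest = "dcf27dd8206c88fc25db2439fbfdbcc1113d826e"
old_guid = make_guid("3", digest)  # uid fetched as a string (Bullseye)
new_guid = make_guid(3, digest)    # uid fetched as an integer (Bookworm)

print(old_guid)
print(new_guid)
print(old_guid == new_guid)  # False: a plain string comparison sees two GUIDs
```

Only the type of uid differs; the hash part, which identifies the article itself, is identical in both strings.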

I worked around the problem by stopping the update services, restoring the
last tt-rss database backup taken before the Bookworm upgrade, and running the
following MySQL command on the ttrss database:

UPDATE ttrss_entries SET guid = REGEXP_REPLACE(guid, '"uid":"([0-9]+)"', '"uid":\\1');

This converted all old entries (which used quoted integers) to the new format
(which does not quote integers), so subsequent tt-rss feed updates no longer
create duplicates, since the old entries now use the new format as well.
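The same rewrite can be checked outside the database with a short Python sketch using the identical pattern. In the MariaDB statement, `\\1` is just the string-literal escape for the `\1` back-reference (assuming the default sql_mode, without NO_BACKSLASH_ESCAPES); `normalize_guid` is a hypothetical name used only here.

```python
import re

# Same pattern as the REGEXP_REPLACE above: capture the digits of a
# quoted "uid" value and re-emit them without the surrounding quotes.
UID_QUOTED = re.compile(r'"uid":"([0-9]+)"')

def normalize_guid(guid):
    # Hypothetical helper: converts an old-style GUID to the new format.
    return UID_QUOTED.sub(r'"uid":\1', guid)

old = '{"ver":2,"uid":"3","hash":"SHA1:dcf27dd8206c88fc25db2439fbfdbcc1113d826e"}'
new = '{"ver":2,"uid":3,"hash":"SHA1:dcf27dd8206c88fc25db2439fbfdbcc1113d826e"}'

print(normalize_guid(old) == new)  # True
print(normalize_guid(new) == new)  # True: already-converted GUIDs are left alone
```

Because already-converted GUIDs do not match the pattern, the rewrite is idempotent. Running it on a database that already contains both variants of an article would presumably fail on the UNIQUE "guid" constraint, which is one reason to restore the pre-upgrade backup first.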

Perhaps a newer upstream version of tt-rss handles this, as it does
other similar string/integer problems (e.g. #1054608).