Jump to content
TNG Community

Fonts / character encoding


fluffy82

Recommended Posts

I am still struggling with some font issues. I hope someone can guide me towards the right direction.

In my tree, out of a purely historic interest, I have includes some branches with Ancient family trees, such as the Egyptian pharaohs or Persian kings. I try to be somewhat historically correct, and want to write not only their names as we know them today, but also as they appear on inscriptions - ie wherever possible I write their names in Aramaic, Demotic, Parthian, cuneiform, runes,...

For Greek, Arabic, Hebrew,... that's not an issue. Family Tree Maker exports it, TNG imports it, all is showing fine on my website.

But some alphabets are more problematic. For example cuneiform. I did some tests, and the conclusion is that the TNG database is the problem. It's a bit complicated, bare with me...

  • in Family Tree Maker, the display is rather inconsistent; on some tabs it will show correctly, on others it will show white squares
  • my FTM tree is synchronized with Ancestry; the characters show correctly on Ancestry

Conclusion 1: the FTM database contains the character codes, though it doesn't always display them

  • exporting a gedcom and opening that gedcom in a simple text editor, shows white squares with a "?" inside; it only does this for some of these special characters, others (eg Greek or Parthian) show correctly
  • when I import the gedcom into for example geneanet.org, the special characters all show correctly
  • doing a copy paste of the white squares in the gedcom into an app that displays the characters, they display correctly

Conclusion 2: even though some of them don't show in a text editor, the character code is still present in the gedcom file

  • when I import the exact same gedcom into TNG, all of the "special" characters except the modern ones (Greek, Russian, Hebrew, Arabic) turn into "?????????????"
  • this is also the case for the ones that did display in the text editor
  • doing a copy paste of the ????? in the gedcom into an app that displays the characters, does not display them anymore
  • going to my database through phpMyAdmin, I can see that the database as well only contains "???????????"
  • changing the font of that part of my website into something like Segoe UI does not change anything
  • doing a copy paste of the correct characters into my website through the inspect element screen, so without passing through the database, they are displayed correctly

Conclusion 3: somewhere during the import of the gedcom, the special characters get "lost" and are turned into "?????"

ย 

At first I thought it was becaus my database tables are in UTF-8. But the gedcom which is created and which does contain the right characters is also UTF-8, as well as the website that does display them (geneanet).

My database is in collation "utf8_general_ci". Would it make a difference to change that? Is it safe to change that? And what should I change it into? utf8_unicode_ci?

Link to comment
Share on other sites

UPDATE:

I did some research online. Apparently, utf8_general_ci is an old and obsolete collation which should not be used. It is missing characters, it's sorting alphabetically in a wrong manner etc. I was advised to use utf8mb4_unicode_ci in stead. So I manually changed all 31 tables (if I counted correctly).

I also know that the change is only for new entries, so I deleted all persons, events, titles and any other table that might contain such a character and re-imported my gedcom. But it didn't work. There's still "??????" in stead of words, not only on screen but also in the database tables.

When I copy paste the characters, they display correctly in for example a search field. It also displays correctly in the results page (obviously saying it didn't find anything). But the database itself is still not showing properly...

ย 

But....

  • when I search for these characters in the normal "name" field, it displays it correctly and just says it hasn't found any; this is normal --> they are not the preferred name, but an alternative name, which TNG changes into a custom event as it only recognises one preferred name
  • when searching these characters in the custom event field where they are, I get an error message:

Query: SELECT p.ID, p.personID, lastname, lnprefix, firstname, p.living, p.private, p.branch, nickname, prefix, suffix, nameorder, title, birthplace, birthdate, birthdatetr, deathplace, deathdate, altbirthdate, altbirthdatetr, altbirthplace, burialdate, burialplace, p.gedcom, treename FROM tng_people AS p LEFT JOIN tng_trees on p.gedcom = tng_trees.gedcom INNER JOIN tng_events as e1 ON p.gedcom = e1.gedcom AND p.personID = e1.persfamID WHERE ( e1.info LIKE "%๐Žญ๐Ž ๐Žผ๐Žน๐Žบ๐Žข๐%" AND e1.eventtypeID = "10" ) AND p.gedcom="huygensv" ORDER BY p.lastname, p.firstname, IF(p.birthdatetr, p.birthdatetr, p.altbirthdatetr) LIMIT 50

Illegal mix of collations (utf8mb4_unicode_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'like'

ย 

So the problem is that somewhere, TNG still wants to use the old and not-to-be-used utf8_general_ci. Any idea on how to change that? I looked through the genlib, tng_begin and tngdblib .php files, but didn't see anything. The admin pages only give you the choice to set "UTF8" and no other specifications.

ย 

When I go into the database and manually change the value of " ???????????????????????????? (d-a-r-y-v-u-ลก) " into "๐Žญ๐Ž ๐Žผ๐Žน๐Žบ๐Žข๐ (d-a-r-y-v-u-ลก)", I get the following error message:

Warning: #1366 Incorrect string value: '\xF0\x90\x8E\xAD\xF0\x90...' for column `huygens-vandemoortel-ancestry_be_wordpress`.`tng_events`.`info` at row 1

(What I did was : )

UPDATE `tng_events` SET `info` = '๐Žญ๐Ž ๐Žผ๐Žน๐Žบ๐Žข๐ (d-a-r-y-v-u-ลก)' WHERE `tng_events`.`eventID` = 14641;

ย 

Link to comment
Share on other sites

I think as you have discovered there are issues with using utf8mb4ย  I think the people recommending utf8mb4 are dealing with software that has not yet been released like PHP 8 which also results in an illegal mix of collation sequences, as you have seen

1 hour ago, fluffy82 said:

Illegal mix of col๏ปฟlations (utf8mb4_unicode_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'like'

I am not a database expert so I am not sure how you should fix this, but I would think the first thing would be to restore your database to what it was previously

Link to comment
Share on other sites

Tom,
ย  ย  ย You need to change line 11 of tngconnect.php to utf8mb4 from utf8 in order to use the additional characters.ย  I just changed my name toย ๐Žญ๐Ž ๐Žผ๐Žน๐Žบ๐Žข๐ (Darius in cuneiform) as a test and that change let it work.ย  Since utf8mb4 has been out for 10 years due to it being in MySql 5.5 on release, I knew it had to be a setting hidden somewhere.

Brent

Link to comment
Share on other sites

5 hours ago, bhemph said:

Tom,
ย  ย  ย You need to change line 11 of tngconnect.php to utf8mb4 from utf8 in order to use the additional characters.ย  I just changed my name toย ๐Žญ๐Ž ๐Žผ๐Žน๐Žบ๐Žข๐ (Darius in cuneiform) as a test and that change let it work.ย  Since utf8mb4 has been out for 10 years due to it being in MySql 5.5 on release, I knew it had to be a setting hidden somewhere.

Brent

That did the trick! Thanks, Brent. You're a star.

Link to comment
Share on other sites

9 hours ago, Ken Roy said:

I think as you have discovered there are issues with using utf8mb4ย  I think the people recommending utf8mb4 are dealing with software that has not yet been released๏ปฟ

As Brent said, utf8mb4 has been around for a decade, and actually is standard in WordPress since WP4.4 (2015). I never took any notice before as I didn't need to, but apparently all my website's tables are in utf8mb4, except the TNG ones. Maybe something Darrin would want to look into for a future release.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...