
IRG Working Set 2021v6.0

Source: Henry CHAN
Date: Generated on 2024-04-25


Unification

Sn | Image/Source | Comment Type | Description
03560
虫 142.8.3
GKJ-00436
TS 14 · IDS
Unification
U+27342
Unifiable to 𧍂 (U+27342)?

U+27342 comes from Kangxi, where the entry reads: 【備考】【申集】【虫字部】 【五音篇海】音肴。又音豪。 (Addendum, Shen volume, 虫 radical section; per Wuyin Pianhai, read like 肴, also read like 豪.) This indicates they are the same character.
Unification
To add to Kushim's comment in #15443, here are the variants from MOE Dictionary:




Suggest that China update the glyph of 𧍂 (U+27342) to ⿰虫肴 and that GKJ-00436 be unified to 𧍂 (U+27342).
Unification
U+27342
Even if the various forms of 𧍂 (U+27342) are stably used in dictionaries, the fact that there were historically two forms considered more or less canonical does not mean we need to disunify them.

First and foremost, these are different canonical shapes in different dictionaries, and there is no contrast in shape within the same dictionary.

Second, there is also no contrast in meaning within the same running text.

They are variants without a doubt, and the abstract shape is the same, which is more than sufficient for unification.

If some digitization projects wish to display one canonical form over the other, that is solely within the realm of Ideographic Variation Sequences. ISO 10646 is a character standard, not a glyph standard.
Unification
In previous IRG meetings, we have already set a precedent of unifying to an existing Extension B character and then updating the Extension B character to take the canonical shape. Some of these characters even involved GHZ single-sourced characters, as is the case here. IRG should be consistent in its handling of these cases.
Unification


We must also take into account that different dictionaries in different ages have different standards for what is considered "canonical". We cannot simply encode another variant as a different character because some dictionaries in some historical period considered another form canonical. Otherwise, all the characters containing "歩" instead of "步" would need to be disunified, as "歩" was considered the canonical form in the Tang dynasty while "步" is considered the canonical form in China today.
Unification
Another case in WS2021 where we have unified the canonical shape to the non-canonical shape found in some authoritative dictionaries:

Unification
Another case in WS2017 involves a taboo character; however, it was not found as a head character.

Unification
Unification
04730
鳥 196.9.1
GKJ-00739
TS 20 · IDS
Unification
U+2A13B
Unify to 𪄻

Level 2 UCV for 舂 and 春
Unification
Here are some examples from the MOE Dictionary:



The form using 春 instead of 舂 is particularly common in 《集韻》.
01343
心 61.8.5
GKJ-00748
TS 11 · IDS
Unification
U+60F1
Unify to 惱.

The included evidence shows without a doubt that this is a variant of 惱.

Potentially new UCV level 2 甾 & 𡿺.
Unification
U+60F1
Unify to 惱; and add level 2 UCV 甾 & 𡿺.

We have one existing disunification case in Extension B where U+254F2 is disunified from U+78AF. They are both variants of 瑙 U+7459.
U+78AF
U+254F2


We have another disunification case in Extension A where U+4409 is disunified from U+8166 because they are non-cognate.
U+8166
U+4409


Note that ROK has a normalization rule, #190-1 in IRGN2573, which covers this exact case:



Therefore the variation should be systematic and pretty common in handwriting.
01789
水 85.9.1
KC-05249
TS 12 · IDS
Unification
U+6DC5
Unifiable with 淅?
03165
羊 123.11.1
SAT-06015
TS 17 · IDS
Unification
U+263BC
U+263AF
𦎼 (U+263BC) / 𦎯 (U+263AF).

> The two comments together list 22 disunification examples, comment #14276 8 examples and #14290 14 examples. Also that GHZR42524.09 was withdrawn. Withdrawing a character can be because the submitter does not agree to unification. The quote that says unifiable was not made by the submitter. The case for removing 殸 is very strong.

The examples are from Extension B, which IRG does not consider valid precedents for disunification.

> The inclusion of ⿱殸⬚ in UCV 312d seems unreasonable as it is not an example of "differences in relative length of strokes" (j-2). UCV 312d should only cover ⿱𣪊⬚ and ⿹𣪊⬚, and ⿱殸⬚ should be removed from the rule.

UCV #312d should be moved out of section j-2 and into section j-3, "Unification of similar shapes".

The fact that 𣪊 is often miswritten as 殸 is not disputed. Sufficient evidence also exists for this particular character. China may prefer to withdraw a form which is malformed, but SAT does not make an "is it an error or not" determination for a character. Suggesting that a character be encoded via IVS implies that the characters are unifiable.

Suggest unifying and encoding as an IVS, keeping the UCV rule as-is.
02629
疒 104.10.5
UK-20388
TS 15 · IDS
Unification
U+24E3B
Unify to 𤸻 (U+24E3B), or potentially withdraw.

The given evidence from UK and the evidence from Tao Yang suggest that the phonetic component should be 拏, not 挐. Even though in some sources 挐 is considered a variant of 拏, they are considered separate characters by various versions of Shuowen.
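
Several comments in this section (for GKJ-00436 and SAT-06015) point to Ideographic Variation Sequences as the place to record a preferred glyph once two shapes are unified. As a minimal sketch of what such a sequence looks like at the code point level: an IVS is a base ideograph followed by one variation selector in the range U+E0100..U+E01EF (VS17..VS256). The helper name `make_ivs` and the choice of VS17 below are assumptions for illustration only; no sequence for U+27342 is claimed to be registered.

```python
# Variation selectors used in Ideographic Variation Sequences: VS17..VS256.
IVS_FIRST = 0xE0100  # VS17
IVS_LAST = 0xE01EF   # VS256


def make_ivs(base: str, selector_index: int) -> str:
    """Return base followed by variation selector VS<selector_index>.

    Hypothetical helper: it only builds the sequence and does not check
    whether the sequence is actually registered anywhere.
    """
    if len(base) != 1:
        raise ValueError("base must be a single code point")
    if not 17 <= selector_index <= 256:
        raise ValueError("IVS selectors are VS17..VS256")
    return base + chr(IVS_FIRST + selector_index - 17)


def code_points(text: str) -> list[str]:
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Illustration only: U+27342 with VS17 (not a claim about any registration).
sequence = make_ivs(chr(0x27342), 17)
print(code_points(sequence))
```

The encoded character stays U+27342 in both cases; only the trailing selector distinguishes which glyph a font is asked to show, which is why the choice of glyph can be left to variation sequences rather than to separate code points.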


Attributes

Sn | Image/Source | Comment Type | Description
02305
牛 93.1.5
UK-20572
TS 4 · IDS 𠃊
Radical
Change Radical to 5.0 (乙), SC=3, FS=5 after the glyph change.
00027
丨 2.14.4
VN-F1C8D
TS 15 · IDS 𱷥
Radical
Radical 212.2, SC=4, FS=2


Evidence

Sn | Image/Source | Comment Type | Description
01474
手 64.9.1
GDM-00269
TS 12 · IDS
Evidence
Based on the pronunciation, could this be a misprint?

I suggest postponing this character.
04695
鳥 196.5.1
GKJ-00329
TS 16 · IDS
Unclear evidence
Per #11949, this appears to be a misprint of 䳄. Is there more evidence for ⿰世鳥?
04775
鳥 196.13.1
GKJ-00335
TS 24 · IDS
Evidence
This looks like a valid phonetic component replacement, so I suggest keeping the character.
04712
鳥 196.7.2
GKJ-00347
TS 18 · IDS
Evidence
Consider changing the source reference to the GZ one because the GZ one can be looked up directly.
02664
鳥 196.5.5
皮 107.11.3
GKJ-00348
TS 16 · IDS
Evidence
Consider changing the source reference to the GZ one because the GZ one can be looked up directly.
02954
竹 118.11.1
UK-20060
TS 17 · IDS 𨊸
Evidence
This character was marked as "No Change" without any record of any reply from UK.
04215
長 168.9.3
VN-F1BD7
TS 16 · IDS
Evidence
The evidence from Andrew and Eiso seems to show a non-cognate but identically shaped character; perhaps a horizontal extension can be done.


Glyph Design & Normalization

Sn | Image/Source | Comment Type | Description
01942
毛 82.5.4
UK-20228
TS 9 · IDS
Glyph design
Consider changing the glyph to ⿺毛玍 based on comment #8367.
01713
日 72.13.4
VN-F191D
TS 17 · IDS
Normalization
Potentially normalize the shape on the right to 淫 because, as the dictionary entries show, 淫 is also a common form when used as a component.


Editorial

Sn | Image/Source | Comment Type | Description
01146
山 46.8.4
GDM-00251
TS 11 · IDS
Editorial issue
Fixed the status of this character on 2024-03-18.
03607
虫 142.10.1
SAT-07153
TS 16 · IDS 巿
Editorial issue
I checked and have updated the total stroke count to 16 (previously I marked it as 17). I had miscounted the number of strokes for 臣 as 7; the correct value for 臣 is 6.
02888
禾 115.12.5
TE-7465
TS 17 · IDS
Editorial issue
It is a known issue that when the source reference is changed, the ORT can no longer track the glyph of the previous source reference. Sorry for the inconvenience.
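
The stroke-count correction noted for SAT-07153 in this section can be checked with simple arithmetic: the total stroke count (TS) should equal the strokes of the radical plus the residual stroke count (SC). The sketch below assumes that a field such as "142.10.1" reads as radical number, SC, and first stroke (FS), which is inferred from the "SC=…, FS=…" wording in the Attributes comments. The function names are hypothetical, and the radical stroke count (6 for 虫) is supplied by hand rather than looked up.

```python
def parse_rs(field: str) -> tuple[int, int, int]:
    """Split 'radical.SC.FS' into (radical number, residual strokes, first stroke).

    Assumed layout; fields with a radical-variant suffix such as '5.0'
    are not handled here.
    """
    radical, sc, fs = (int(part) for part in field.split("."))
    return radical, sc, fs


def total_strokes(radical_strokes: int, residual_strokes: int) -> int:
    """Total stroke count = strokes in the radical + residual strokes."""
    return radical_strokes + residual_strokes


radical, sc, fs = parse_rs("142.10.1")  # SAT-07153: 虫 142.10.1
ts = total_strokes(6, sc)               # 虫 is written with 6 strokes
print(radical, sc, fs, ts)

# Counting 臣 as 7 strokes instead of 6 inflates the residual count by one,
# which reproduces the earlier, incorrect total.
ts_with_miscount = total_strokes(6, sc + 1)
print(ts_with_miscount)
```

With these inputs the corrected total is 16 and the miscounted total is 17, matching the values given in the comment.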