It's difficult to clarify the semantics of GZ-2352301 and U+20F4F. Given that most such pairs are already encoded separately, it seems better not to unify them.
This character is also used in Vietnamese, with the reading "mảnh", as shown on p. 786 of ĐTĐCN (see below). Vietnam will also propose a horizontal extension to 2D3EB using the form ⿰土莽, which is better attested.
The analysis in the evidence says that the character is composed of "hắc" (黑) and "ảo" (幻). As shown below, 幻 read as Nôm "ảo" has, among its meanings, "dark, obscure", etc. This is the sense carried by this character.
xxx is a different word, "u", with a similar meaning of "dark".
Oppose Unification
[ Unresolved from v3.0 ]
I didn't finish the above, but meant to say that 黝 (U+9EDD) is basically a different word, "u", used in the compound "u hắc", whereas VN-F0188 is "ảo", which has a wider range of meanings ("not current; obscure, ephemeral", etc.). They should be kept separate.
Unihan data, the ORT Attributes Predictor, and most other candidates in WS2024 give 8. It would be better to be consistent.
Total Stroke Count
[ Unresolved from v1.0 ]
Given the variations across geographies and font designs, and the fact that unification precludes most shape-based determination of attributes, CJKJRG / IRG originally chose to use the Kangxi values, the common denominator among the dictionaries used by the CJKV countries. This avoided a lot of fruitless debate. Kangxi gives 9 strokes, but, as you point out, that later changed. I'm fine with either 8 or 9, but we should be consistent going forward and change the ORT tools to support our decision. Otherwise, maybe we should just stop using TS.
Almost all V-source characters with 光 on the left side have 10 as the primary radical. I'm not sure it's a good idea to change that. Also, kRSUnicode for 光 is 10, not 42. We should change that.
The IRG Attributes Predictor counts 巨 as 5 strokes, Unihan has 4. We should discuss and document the stroke count we are going to use and fix the ORT if we decide it's 4. Otherwise keep TC=18.
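A small cross-check between sources could make this kind of inconsistency easy to spot before it reaches a working set. The sketch below is a minimal Python example. It assumes each source can be reduced to a component-to-stroke-count table. The table layout and the `find_disagreements` helper are hypothetical. Only the values for 巨 and 羊 come from the comments in this review.

```python
from typing import Dict

# Hypothetical layout: per-source tables mapping a component to its stroke count.
# The 巨 and 羊 values are the ones cited in these review comments.
SOURCES: Dict[str, Dict[str, int]] = {
    "Unihan": {"巨": 4, "羊": 6},
    "AttributesPredictor": {"巨": 5, "羊": 6},
}


def find_disagreements(sources: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Return {component: {source: count}} for every component whose counts differ."""
    components = set()
    for table in sources.values():
        components.update(table)
    disagreements = {}
    for comp in sorted(components):
        # Collect the count from each source that lists this component.
        counts = {name: table[comp] for name, table in sources.items() if comp in table}
        if len(set(counts.values())) > 1:
            disagreements[comp] = counts
    return disagreements


print(find_disagreements(SOURCES))
# {'巨': {'Unihan': 4, 'AttributesPredictor': 5}}
```

Once a convention is agreed, a single shared table like this could, in principle, feed both the Unihan data and the ORT tools so that the two stop drifting apart.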
The IDS proposed above seems confusing. 㓁 is a variant of Radical 122 and always appears at the top. If we merely want to reduce the number of strokes, U+5197 would be better, since it can have the shape ⿱冖儿.
The attributes predictor tool gives 8 for the stroke count. https://hc.jsecs.org/irg/ws2021/app/attributes-predictor.php?ids=%E2%BF%B1亡目务&radical=109.0
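The predictor's result matches simple arithmetic if it counts the strokes outside radical 109 (目): 亡 (3) + 务 (5) = 8, out of a total of 13. Below is a minimal Python sketch of that sum. The stroke table is hypothetical. The IDS is written with the three-part operator ⿳, which is an assumption about the intended structure, since ⿱ takes only two components.

```python
# Hypothetical stroke table for illustration; real tools use their own data.
COMPONENT_STROKES = {"亡": 3, "目": 5, "务": 5}


def is_idc(ch: str) -> bool:
    """True for Ideographic Description Characters (U+2FF0..U+2FFF, U+31EF)."""
    cp = ord(ch)
    return 0x2FF0 <= cp <= 0x2FFF or cp == 0x31EF


def total_strokes(ids: str) -> int:
    """Sum the strokes of the leaf components of an IDS, skipping operators.

    Raises KeyError if a component is missing from the table.
    """
    return sum(COMPONENT_STROKES[ch] for ch in ids if not is_idc(ch))


def residual_strokes(ids: str, radical_strokes: int) -> int:
    """Strokes outside the radical (assumed here to be the SC attribute)."""
    return total_strokes(ids) - radical_strokes


ids = "⿳亡目务"
print(total_strokes(ids))        # 13
print(residual_strokes(ids, 5))  # 8  (radical 109, 目, has 5 strokes)
```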
18 is correct. According to the Attributes Predictor and Unihan data, 羊 is 6, not 7. Both give 10 strokes for 羞.
Total Stroke Count
[ Unresolved from v3.0 ]
The current Unihan data shows varying counts for the sheep radical in this position, although 7 does seem more common. It seems somewhat more intuitive to use 6, since the base character has 6 strokes and many glyph designs use the base ⺶ rather than ⿱𦍌丿. Either way, we need to update the Unihan data and tools.
ĐTĐCN:0885, reading "nỉ". It is noteworthy that the structural analysis given is 口 + 己 (kỉ), which differs phonetically and semantically from 口 + 已 (dĩ) in Vietnamese.
This also appears in the Kho Chữ Hán Nôm Mã Hoá, the last published document with the status of a national standard.
As noted in the first evidence above, this is composed with a "nháy" reading mark. The problem is that Unicode does not allow variation of marks, so there is no way to use an IVS to encode this separately from another character, VN-F0074.
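For context, an Ideographic Variation Sequence pairs an already-encoded base ideograph with one selector from VS17..VS256 (U+E0100..U+E01EF). Registered sequences are listed in the Ideographic Variation Database. The sketch below only illustrates that two-code-point shape: the selector identifies a glyph for the base character as a whole and does not carry a separate mark. The helpers `make_ivs` and `split_ivs` are hypothetical. 化 is an arbitrary example base, not a registered sequence.

```python
# Ideographic variation selectors VS17..VS256 occupy U+E0100..U+E01EF.
VS17 = 0xE0100
VS256 = 0xE01EF


def make_ivs(base: str, vs_number: int) -> str:
    """Return base + variation selector VSn, for n in 17..256."""
    if len(base) != 1:
        raise ValueError("base must be a single code point")
    if not 17 <= vs_number <= 256:
        raise ValueError("ideographic variation selectors are VS17..VS256")
    return base + chr(VS17 + vs_number - 17)


def split_ivs(seq: str):
    """Split a two-code-point IVS into (base, VS number); raise if not an IVS."""
    if len(seq) != 2 or not VS17 <= ord(seq[1]) <= VS256:
        raise ValueError("not an ideographic variation sequence")
    return seq[0], ord(seq[1]) - VS17 + 17


seq = make_ivs("化", 17)  # illustrative base; 化 is U+5316
print([f"U+{ord(c):04X}" for c in seq])  # ['U+5316', 'U+E0100']
print(split_ivs(seq))                    # ('化', 17)
```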
The analysis in the evidence says that the left side is half of "huấn" (訓), hence the 川. We could consider revising the glyph so that it looks more like the right side of 訓.
This is also found in the Vietnamese standard Kho chữ Hán Nôm mã hoá, as V+6056F (see image below).
One thing to consider: Vietnamese uses both U+8862, in the original sense of road or intersection, with the Sino-Vietnamese reading "cù", and VN-F052A. Only VN-F052A is found with the reading "cò", meaning "stork" or "egret". The feather element, 羽, appears to distinguish the meaning.
This is currently the only example we have of this character. The component ⿻沈丶 is thought to derive from a simplified form of 㴷 (đắm: shipwrecked, see TĐCNTD p. 341), where 耽 has been reduced to the form V+60779 shown below:
The more common form is V+607C5 in the above, also shown here from the same source as VN-F2002:
The reading is shown in the transliteration on the following page (in green). The note explains that the original text is in Vietnamese, so no translation is given. "phượng" is the legendary bird typically translated as "phoenix". It is more commonly written 鳯.
The evidence is quite clear as to shape and meaning. As to the unusual shape, it's true that this is the only example we have found of 𫠓, used as a radical, so we could consider using the full form, 鳥.
This is the only example in Vietnamese of 類 as a component. The preferred form is 類 (V1-6C22), so it would be better to normalize the glyph to reflect that.
There are more than 40 glyphs using the same design in the NomNaTong font. It would be a significant effort to change them all. We would need to better understand the rationale for this design before making such a change.
There are 21 Vietnamese characters with 叕 as an immediate constituent. The distribution of the stroke shape in question is about half and half. We will investigate the issues with normalization.
The glyph already has that general shape, can you provide more detail?
Normalization
[ Unresolved from v3.0 ]
Note that the horizontal extension we are working on will include U+6447 摇, U+9065 遥, and U+7476 瑶, as well as U+7AB0 窰, the right side of VN-F04BE. U+7AB0 currently has the shape shown below.
VN-F04BE and U+7AB0 both already have the suggested general structure, ⿱爫缶. Is the desire here to move the 爫 one or two pixels up and to the left so it is the same as U+55C2, etc?
The phonetic "dan" argues for U+67EC. Here is another analysis (Vũ Văn Kính, "Tự điển chữ Nôm", p. 225) showing that the traditional and simplified forms both contain U+67EC, read "lan", as the phonetic.
The element on the right is a simplification of the characters 沒 / 没, read "một", through these steps: 没 > 𠬛 > 𠬠 or 𱥺 > 𠬠. There are two basic forms, 𠬠 and 𰰝. This is documented in the character definition shown in the image below, from TĐCNTD p. 802.
Below is an example of VN-F0CBC from "Lục Vân Tiên" showing a form somewhat between 𠬠 and 𰰝
Historically, there are many examples of 𰰝, but the current trend is to standardize on 𠬠, as shown in the "BẢNG CHỮ HÁN NÔM CHUẨN THƯỜNG DÙNG" http://www.hannom-rcv.org/NS/bchnctd%20300623.pdf
Regarding comment #8629: I assume the suggestion is to change the 亅 in the top 可 to 丨. That's a reasonable suggestion, but at least 6 other V-source characters already encoded (𣘁 U+24819, etc.) use the current design. Since that's a majority, the lower-impact solution would be to normalize the remaining characters 哥 U+54E5, 歌 U+6B4C, U+2BC04, and VN-F176E (in WS2021) to use 亅.
This is not a significant difference. Most of the glyphs in NomNaTong with the 鬼 component retain the 厶. Over time, we will normalize to that shape. We can update the normalization guidelines.
Both variants are found in Vietnamese; the TĐCNDG entry shown below has U+79C3 秃. In the NomNaTong font there are 5 glyphs composed with U+79C3 秃 and 5 composed with U+79BF 禿. Of the characters with V-source references, if we normalize to 禿, we would also want to change 𥟉 U+257C9 / V3-3531 and 𥟹 U+257F9 / V2-7F31. If we normalize to 秃, we would only change U+22B33 / VN-22B33.
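The choice comes down to counting which target design changes fewer existing V-source glyphs. Below is a minimal Python sketch of that count using the three V-source characters named in the preceding comment. The mapping and the helper names are hypothetical, and the NomNaTong font counts are not included.

```python
from collections import Counter

# Current design of the 禿/秃 component in each V-source character, as
# described above: U+257C9 and U+257F9 use 秃, U+22B33 uses 禿.
CURRENT_DESIGN = {
    "U+257C9": "秃",
    "U+257F9": "秃",
    "U+22B33": "禿",
}


def changes_needed(designs: dict, target: str) -> list:
    """Code points whose glyphs would change if we normalized to `target`."""
    return sorted(cp for cp, design in designs.items() if design != target)


def lowest_impact_target(designs: dict) -> str:
    """The design already used by the most characters (fewest glyph changes)."""
    return Counter(designs.values()).most_common(1)[0][0]


for target in ("禿", "秃"):
    print(target, changes_needed(CURRENT_DESIGN, target))
# 禿 ['U+257C9', 'U+257F9']
# 秃 ['U+22B33']
print(lowest_impact_target(CURRENT_DESIGN))  # 秃
```

The same count applies to comment #8629 above. With at least six characters using 亅 and four using 丨, normalizing to 亅 changes fewer glyphs. On a tie, `most_common` simply returns the first design it encountered, so a tied case would still need a decision.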
A reviewer, Luu Quang Truong, points out that closer inspection of the original font reveals the top element to be 主. See image below. This makes sense as a phonetic, and there are other examples: "gio": 𠰍 (giỏ), 𬚶 (giỏ), etc. We propose changing the glyph and attributes to reflect that.
SC=5, FS=4, TS=9
This was normalized incorrectly, as can be seen here:
The original structure was ⿰黑⿱⿰夕丰木. The word "kịt" means dark, dense. Based on the phonetic value, "kịt", this should have been normalized to ⿰黑桀, with 桀 as phonetic.
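When normalizing or reviewing IDSs like these, a quick structural check can catch sequences whose operators and components do not match up. Below is a minimal Python sketch. It handles only the twelve original Ideographic Description Characters (U+2FF0..U+2FFB) and does not handle the newer IDCs added in Unicode 15.1. `is_well_formed` is a hypothetical helper, not part of any IRG tool.

```python
# The twelve original Ideographic Description Characters, grouped by arity.
BINARY_IDCS = set("⿰⿱⿴⿵⿶⿷⿸⿹⿺⿻")
TERNARY_IDCS = set("⿲⿳")


def _parse(ids: str, i: int) -> int:
    """Parse one IDS node starting at index i and return the index after it."""
    if i >= len(ids):
        raise ValueError("IDS ends before all operands are supplied")
    ch = ids[i]
    arity = 2 if ch in BINARY_IDCS else 3 if ch in TERNARY_IDCS else 0
    i += 1
    for _ in range(arity):
        i = _parse(ids, i)
    return i


def is_well_formed(ids: str) -> bool:
    """True if the whole string is exactly one complete IDS."""
    try:
        return _parse(ids, 0) == len(ids)
    except ValueError:
        return False


for ids in ("⿰黑⿱⿰夕丰木", "⿰黑桀", "⿱亡目务"):
    print(ids, is_well_formed(ids))
# ⿰黑⿱⿰夕丰木 True
# ⿰黑桀 True
# ⿱亡目务 False
```

The check accepts both the original structure and the proposed ⿰黑桀. It rejects a sequence such as ⿱亡目务, where the two-operand ⿱ is followed by three components.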
What is the justification for labeling this as similar to U+31FC3?
Other
[ Unresolved from v2.0 ]
Evidence #3 for UTC-03292, which has 逃入清化 (he fled into Thanh Hoá), parallels the phrase 奔清⿱花一 above and suggests that this character is a variant of 化 (U+5316, read hoá). Thanh Hoá is more commonly written 清化.
#9327 shows that we are already living with script-hybrid characters without any problem. Since the nature and attributes of these characters are not fundamentally different from CJK Ideographs, I see no need to postpone.
According to the analysis in the dictionary, 冫 is a mark indicating that this character is to be read differently from its standard Sino-Vietnamese reading of "cặp".
Both characters, U+23813 and VN-F0423 mean "a type of bamboo". The major sources, BTCN, ĐTĐCN, GĐNHV, KCHN, and Takeuchi, all show the form VN-F0423, with 竹, appropriately, as the radical. Since VN-F0423 appears to be the correct form, if we were to unify these, Vietnam would request changing the representative glyph for U+23813 to be that of VN-F0423.
IRG Working Set 2024v4.0
Source: Lee COLLINS
Date: Generated on 2026-01-15
Unification
Showing 5 comments.
Attributes
Showing 143 comments.
Evidence
Showing 17 comments.
Ngũ Thiên Tự (五千字), p. 34, reading "dèm"
An appropriate normalization would be V+607C5.
V4-407A is currently encoded as 𫢠 U+2B8A0. One solution would be to move V4-407A to WS2024:00144 and change kIRG_VSource for U+2B8A0 to VN-2B8A0.
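If this is accepted, the kIRG_VSource part of the change is a one-line data edit. Unihan's text files (kIRG_VSource is in Unihan_IRGSources.txt) use one tab-separated code point / field / value entry per line. Below is a minimal Python sketch of that edit, assuming that format. `set_unihan_field` is a hypothetical helper, and the excerpt is illustrative rather than copied from a release. Moving V4-407A to WS2024:00144 is a separate step and is not shown.

```python
def set_unihan_field(lines, codepoint, field, new_value):
    """Return a copy of Unihan-format lines with one field's value replaced.

    Each data line is 'U+XXXX<TAB>kField<TAB>value'; comments and blank
    lines are passed through unchanged.
    """
    updated = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            updated.append(line)
            continue
        cp, key, _value = line.split("\t", 2)
        if cp == codepoint and key == field:
            line = f"{cp}\t{key}\t{new_value}"
        updated.append(line)
    return updated


excerpt = [
    "# Illustrative excerpt, not copied from a real Unihan release",
    "U+2B8A0\tkIRG_VSource\tV4-407A",
]
for line in set_unihan_field(excerpt, "U+2B8A0", "kIRG_VSource", "VN-2B8A0"):
    print(line)
```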
Glyph Design & Normalization
Showing 25 comments.
Other
Showing 16 comments.