Unihan data, the ORT Attributes predictor, and most other candidates in WS2024 give 8. It would be better to be consistent.
Total Stroke Count
[ Unresolved from v1.0 ]
Given the variations across geographies and font designs, and the fact that unification precludes most shape-based determination of attributes, CJKJRG / IRG originally chose to use the Kangxi values, the most common denominator in dictionaries used by the CJKV countries. This avoided a lot of fruitless debate. Kangxi is 9 strokes, but as you point out, that later changed. I'm fine with either 8 or 9, but we should be consistent moving forward and change the ORT tools to support our decision. Otherwise, maybe we should just stop using TS.
The IRG Attributes Predictor counts 巨 as 5 strokes; Unihan has 4. We should discuss and document the stroke count we are going to use, and fix the ORT if we decide it's 4. Otherwise keep TC=18.
The IDS proposed above seems confusing. 㓁 is a variant of rad. 122 and always appears on top. If we merely want to reduce the number of strokes, U+5197 would be better since it can have the shape ⿱冖儿.
The attributes predictor tool gives 8 for the stroke count. https://hc.jsecs.org/irg/ws2021/app/attributes-predictor.php?ids=%E2%BF%B1亡目务&radical=109.0
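For anyone reproducing the query above, here is a minimal sketch of how such a URL could be assembled. Only the base URL and the `ids` and `radical` parameter names come from the link itself; the helper name `predictor_query` is hypothetical, and the sketch only builds the URL without contacting the tool.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL copied from the link in the comment above.
PREDICTOR_URL = "https://hc.jsecs.org/irg/ws2021/app/attributes-predictor.php"


def predictor_query(ids: str, radical: str) -> str:
    """Hypothetical helper: build a predictor URL with UTF-8 percent-encoding."""
    return PREDICTOR_URL + "?" + urlencode({"ids": ids, "radical": radical})


url = predictor_query("⿱亡目务", "109.0")
print(url)

# Decode the query string again to confirm the IDS survives the round trip.
params = parse_qs(urlsplit(url).query)
print(params)
```

Encoding every character of the IDS, not just the ⿱ operator, keeps the link intact when it is pasted into mail or a PDF.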
18 is correct. According to the Attributes Predictor and Unihan data, 羊 is 6, not 7. Both give 10 strokes for 羞.
Total Stroke Count
[ Unresolved from v3.0 ]
The current Unihan data shows varying counts for the sheep radical in this position, although 7 does seem more common. It seems somewhat more intuitive to use 6, since the base character is 6 and the design of many glyphs uses the base ⺶, not ⿱𦍌丿. Either way, we need to update the Unihan data and tools.
We don't currently have an original source for this. It is possibly a misprint. "cứt" means "excrement", so an appropriate semantic would be 屎 (thỉ), as in U+21CDB 𡳛. The analysis shown in the entry above says VN-F0156 is composed of "thỉ cát" (cát is the phonetic). There are no known readings for ⿸尸示 in Vietnamese and it's unlikely that it ever appears in a Vietnamese context. The author of the "Giúp đọc Nôm và Hán Việt" surely meant 屎 in his analysis.
I agree that the reading suggests some other component, but the analysis in the entry clearly says "kí" (記). Unfortunately, we no longer have access to Father Anthony's notes.
We don't have another example of this particular character on hand, but this simplified form is found in a number of characters. For example: U+529E 办 < 辦; U+32957 < 閒; U+31E33 𱸳 < 𧗱. Typically, it indicates that a surrounding component has been simplified. VN-F2040 is a dialect form of "lẽ", meaning "reason", ultimately from Chinese 理. In this case it's a simplification of 𨤧 U+28927, and the two strokes on either side of 里 represent the two strokes, 丿 and 丶, on either side of the vertical in 尔. In retrospect, we could have normalized this to ⿻里八, as in other examples.
The analysis says it's composed of "thân" 抻 and "viên" 爰. "vươn" means "pull", so 抻 is an appropriate semantic. The problem is the phonetic. It's not a misprint of U+2B8F0, but the phonetic should be 爰 (U+7230), not 爱 (U+7231).
As you can see in this image from "Kho Chữ Hán Nôm Mã Hoá" p. 613, all other characters with the "vươn" phonetic show 爰:
Normalization
We propose to change the glyph to use 爰 (U+7230) as phonetic. New attributes will be: SC=14, TS=17. IDS will change to ⿰抻爰
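As a quick check on the proposed attributes, the sketch below recomputes SC and TS from the IDS ⿰抻爰. The per-component stroke counts are assumptions not stated in this response (扌 counted as 3 for radical 64, 申 as 5, 爰 as 9), so treat it as an illustration of the arithmetic and not as authoritative data.

```python
# Assumed stroke counts (not given in the response above).
RADICAL_STROKES = 3            # 扌, the left-side form of radical 64
residual = {"申": 5, "爰": 9}   # everything in ⿰抻爰 other than the radical

sc = sum(residual.values())    # residual stroke count
ts = RADICAL_STROKES + sc      # total stroke count

print(f"SC={sc}, TS={ts}")
```

Under those assumptions the result agrees with the SC=14, TS=17 proposed above.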
This is the only example in Vietnamese of 類 as a component. The preferred form is 類 (V1-6C22), so it would be better to normalize the glyph to reflect that.
There are more than 40 glyphs using the same design in the NomNaTong font. It would be a significant effort to change them all. We would need to better understand the rationale for this design before making such a change.
There are 21 Vietnamese characters with 叕 as an immediate constituent. The distribution of the stroke shape in question is about half and half. We will investigate the issues with normalization.
The glyph already has that general shape. Can you provide more detail?
Normalization
[ Unresolved from v3.0 ]
Note that the horizontal extension we are working on will include U+6447 摇, U+9065 遥, and U+7476 瑶, as well as U+7AB0 窰, the right side of VN-F04BE. U+7AB0 currently has the shape shown below.
VN-F04BE and U+7AB0 both already have the suggested general structure, ⿱爫缶. Is the desire here to move the 爫 one or two pixels up and to the left so it is the same as U+55C2, etc?
Rule 3-4 is meant to apply specifically to the entire component, 爭, since 𠂊 is not universally a simplification of 𫜵. The analysis in the evidence suggests that VN-F05B0 is composed of radical 162, "xích", and the character read "quýnh" in Sino-Vietnamese. Most Vietnamese sources show "quýnh" as 敻 (U+657B) or 夐 (U+5910), so we have decided to normalize to that shape. This is the same logic applied in the case of (U+32F78).
The phonetic, "dan", argues for U+67EC. Here is another analysis (Vũ Văn Kính, "Tự điển chữ Nôm" p. 225) showing that the traditional and simplified forms both contain U+67EC, read "lan", as phonetic.
The element on the right is a simplification of the characters 沒 / 没, read "một", through these steps: 没 > 𠬛 > 𠬠 or 𱥺 > 𠬠. There are 2 basic forms, 𠬠 and 𰰝. This is documented in the character definition shown in the image below from TĐCNTD p. 802.
Below is an example of VN-F0CBC from "Lục Vân Tiên" showing a form somewhat between 𠬠 and 𰰝.
Historically, there are many examples of 𰰝, but the current trend is to standardize on 𠬠, as shown in the "BẢNG CHỮ HÁN NÔM CHUẨN THƯỜNG DÙNG" http://www.hannom-rcv.org/NS/bchnctd%20300623.pdf
This is not a significant difference. Most of the glyphs in NomNaTong with the 鬼 component retain the 厶. Over time, we will normalize to that shape. We can update the normalization guidelines.
Both variants are found in Vietnamese; the TĐCNDG entry shown below has U+79C3 秃. In the NomNaTong font there are 5 glyphs composed with U+79C3 秃 and 5 composed with U+79BF 禿. Of the characters with V-Source references, if we normalize to 禿, we would also want to change 𥟉 U+257C9 / V3-3531 and 𥟹 U+257F9 / V2-7F31. If we normalize to 秃, we would only change U+22B33 / VN-22B33.
A reviewer, Luu Quang Truong, points out that closer inspection of the original font reveals the top element to be 主. See image below. This makes sense as a phonetic, and there are other examples: "gio": 𠰍 (giỏ), 𬚶 (giỏ), etc. We propose changing the glyph and attributes to reflect that.
SC=5, FS=4, TS=9
This was normalized incorrectly, as can be seen here:
The original structure was ⿰黑⿱⿰夕丰木. The word "kịt" means dark, dense. Based on the phonetic value, "kịt", this should have been normalized to ⿰黑桀, with 桀 as phonetic.
Normalization
We have the updated glyph and can provide it at the next opportunity.
What is the justification for labeling this as similar to U+31FC3?
Other
[ Unresolved from v2.0 ]
Evidence # 3 for UTC-03292, which has 逃入清化 (he fled into Thanh Hoá), parallels the phrase 奔清⿱花一 above and suggests that this character is a variant of 化 (U+5316, read hoá). Thanh Hoá is more commonly written 清化.
#9327 shows that we are already living with script-hybrid characters without any problem. Since the nature and attributes of these characters are not fundamentally different from CJK Ideographs, I see no need to postpone.
IRG Working Set 2024v4.0
Source: Lee COLLINS
Date: Generated on 2026-04-15
Unification
Showing 2 comments.
Attributes
Showing 140 comments.
雨 = 8
田 = 5
奚 = 10 (爫=4+幺=3+大=3)
SC=23
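The component breakdown above can be summed mechanically. This is a small sketch using only the per-component counts listed in the comment; the variable names are illustrative.

```python
# Stroke counts exactly as listed in the comment above.
components = {
    "雨": 8,
    "田": 5,
    "奚": 4 + 3 + 3,  # 爫 + 幺 + 大
}

sc = sum(components.values())
print(f"SC={sc}")
```

Summing the listed values reproduces the SC=23 given above.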
Evidence
Showing 11 comments.
Glyph Design & Normalization
Showing 26 comments.
Editorial
Showing 13 comments.
Other
Showing 14 comments.
Submitter Request
Showing 2 comments.