Chapter 745 I Can Use It Too
But Chinese character encoding in his previous life had its own problems: Unicode encoding arrived too late, so Microsoft had to adopt an extended encoding based on GB13000. As a result, the national standard had to be patched on top of the GB13000 encoding, expanded into GBK, and then expanded again into GB18030.
The final version, GB18030-2005, whose full title is the national standard "Information Technology: Chinese Coded Character Set", is fully compatible with GB2312-1980, basically compatible with GBK, and supports all the unified Chinese characters of GB13000 and Unicode. It includes a total of 70,244 Chinese characters.
At that time, Unicode did not include as many Chinese characters as GB18030-2005. Although it could theoretically accommodate all Chinese characters, countless code points sat empty.
The end result was that the old system was patch upon patch, while the new system had a large number of empty code points that no one had done the work of filling. As a result, decades later, Chinese characters still had a major problem in information systems: transcoding was not fully compatible.
Zhou Zhi, who had been a programmer at a state-owned enterprise in that later era, had suffered deeply from this. He therefore believed the key to solving the problem was for the country to abandon the cramped ISO/IEC 10646 from the outset and first seize enough space for Chinese characters in the Unicode standard, claiming at least 100,000 code points and filling them in, and then make this the sole mandatory standard, used all over the world.
So he said: "Isn't it perfect that nothing has been settled yet? Only when nothing has been settled can we get deeply involved. As long as three blocks of code space are left for us, we can accommodate 100,000 Chinese characters."
"Besides, Unicode deals only with encoding, and its very design purpose is to hold every script in the world."
"Chinese character encoding is undoubtedly the most complicated character encoding work in the world. If we complete it, we will have a full say in the organization. In the future we can also guide the work of other countries and organizations, and it will lay a foundation for encoding the scripts of our other ethnic groups."
Now it was the turn of the literary and historical experts on Mr. Gu's side to be lost as to what Zhou Zhi and Li Hongjiang were discussing.
Mr. Gu interrupted the lively discussion between the two: "Elbow, Xiao Li, which of you can explain it first in words that we old men can understand?"
Mai Mingchuan smiled and said: "I understand the general idea. Let me explain it first and see whether I have it right. If not, Xiao Li and Zhou Zhi can fill in the rest."
"There are two sets of standards now. One is ISO/IEC 10646. This system is already mature, and its first part has been promulgated. Our country has developed GB13000 based on it, which can be implemented quickly."
"But this system has a big problem: it has too few code points and can only accommodate 21,000. It seems there is still a long way to go before it fully covers the Chinese characters we need."
"The other set of standards is Unicode."
"As long as the coding range allocated to Chinese characters is sufficient, this standard can accommodate all our Chinese characters, and in the future we can go on claiming more coding ranges for further expansion, or use them to encode the scripts of ethnic minorities."
"In terms of design principles, the Unicode standard is actually better than ISO/IEC 10646. However, it is still only half-finished, and the first version has not yet been released. If we want to use the Unicode standard, we must first help complete it, and only then talk about range allocation and the next steps of the work."
"What Xiao Li means is that we use GB13000 first. We already have the foundation of GB2312, the approach is familiar, and results will come quickly."
"What Zhou Zhi means is that we should go with Unicode from the start and get it right in one step. Since the Unicode standard has not yet been finalized, we should actively take part and work on the standard together with them!"
"It would be the best result if we could really achieve what Zhuzi said. But do we have the strength?" Mr. Gu still thought of the national information industry as one scrambling to catch up from the very beginning, and he worried that the country's current technical strength would not be enough to complete the task.
"In fact, they have basically completed this work," Li Hongjiang said. "Most computers use the American Standard Code for Information Interchange, that is, ASCII. It is a 7-bit encoding scheme that represents all the uppercase and lowercase letters, digits, punctuation marks, and control characters. Unicode has already encoded ASCII: '\u0000' through '\u007F' correspond to all 128 ASCII characters."
"In other words, computer systems can actually use Unicode encoding already, but it has not yet become a major standard?"
"There are still many areas that need to be improved," Li Hongjiang said. "Of course, now that the ASCII part has been settled, at least the architecture is mature, and the rest are minor problems."
"If, and I mean if, we could offer 100,000 characters' worth of content for them to fill the code space with, I believe the consortium would be very interested."
There was a saying in later years: "First-rate companies make standards, second-rate companies make brands, and third-rate companies make products." What was now playing out between GBK and Unicode was, in fact, a battle over standards.
Zhou Zhi added: "This is a major event that affects the whole world. To put it bluntly, it is a battle for standards."
"China's voice in the world's information industry can be said to be insignificant, but the Chinese character library can be called a special resource."
"Add up all the symbols of every alphabetic-language country in the world, and there may still be fewer of them than there are Chinese characters."
"If we complete this character library first, Unicode can show the world its absolute advantage."
"It's like GBK is still using tank cannons, and Unicode has detonated a hydrogen bomb."
"We can definitely use our own results, pay membership fees, and become members of the organization."
Li Hongjiang had done some research on the organization and said: "The Unicode Consortium is based in California, USA. It actually allows any company or individual willing to pay the membership fee to join."
"In the late 1980s, two bodies took up this work. One was the commercially backed Unicode organization, and the other was the International Organization for Standardization, which exists for international cooperation. Driven by the spread of computers and the internationalization of information, they set up the Unicode organization and the ISO-10646 working group, respectively."
"They soon discovered each other's existence. Since both were working toward the same goal, the two organizations cooperated to develop a universal code suitable for the languages of every country, and published the Unicode and ISO-10646 character sets in step with each other. Although the character encodings of the two are in fact identical, they are still two separate standards."
"The Unicode Consortium first released The Unicode Standard the year before last. Unicode was developed in conjunction with ISO/IEC 10646, the Universal Character Set formulated by the International Organization for Standardization. In terms of how the encoding works, the two are effectively the same."
"But The Unicode Standard contains more detailed implementation information, covering topics such as bitwise encoding, collation, and rendering in depth. It even enumerates many character properties, including those needed to support two reading directions: for example, the usual left-to-right direction as well as the right-to-left direction of Arabic."
Whoa! Zhou Zhi's eyes instantly met Gu Kailai's and Dan Zeng's across the room. Ancient Chinese classics are also read from right to left!
If Arabic can use it, Chinese classics can use it too!