Chapter 748 Computer Language



“Manual intervention may still be needed, but the workload will be greatly reduced. We can get images of rare characters into the computer by photographing or scanning them, and let the program analyze them. In the end we obtain a standard dot matrix character, assign it a code, and add it to the font library, thereby expanding the library.”

“To realize this function, there are several prerequisites. The first is that the font library must contain enough samples. We have already met this one, because we now have 40,000 standard dot matrix character manuscripts.”

“Let’s take the Kangxi Dictionary as an example. We first feed the scanned image of each character in the Kangxi Dictionary into the text recognition system and extract its font parameter features. We then apply these parameters to the existing standard dot matrix characters and let the system compute the ‘system characters’ that the parameters generate.”

“Then we check these generated ‘system characters’ against the ‘scanned characters’, use the ‘system characters’ generated from the 40,000 standard characters to verify how representative the parameters are, and finally tune the parameters to their best values.”

“Once we have the parameter system, we can apply it to newly scanned characters and eventually obtain standard dot matrix characters for the nearly 60,000 individual characters in the Kangxi Dictionary. The font library will then have expanded from 40,000 to 60,000.”
“This idea is novel, but it is also feasible.” Mai Mingchuan nodded: “But there are still problems, namely storage and calculation.”

“Let me also use the Kangxi Dictionary as an example. Nearly 60,000 characters means nearly 60,000 pictures. If we count 5 megabytes per picture, that is 300G of space. That is frightening.”

“Then, Dean Wang, how much image storage do you think this system can accept?”

“One G,” Mai Mingchuan blurted out. On second thought he felt that was a bit unfair: “Two G at most.”


One G is 1,024M, enough for about two hundred 5M pictures; two G holds about four hundred.

“In other words, by this standard the system can scan and analyze four hundred characters, and convert them into the database, in a single batch?”

“Then we still have to work out serial versus parallel.” Li Hongjiang had already taken the bait and started thinking about how the program would work: “As for the time cost, the school can’t possibly devote all of its limited computing resources to this.”

“Can we ask our superiors for help?” Zhou Zhi asked. “We do the groundwork first, then apply for national-level computing resources to finish the job. Is there any chance of that?”

“National resources are even tighter, and countless units across the country are queuing up.” Mai Mingchuan shook his head with a wry smile.

Mr. Gu said: “Let’s do it this way, and do the work in finer detail. We’ll hold off for now on the program Zhu Zi mentioned for character recognition and reverse calculation of dot matrix characters. The first step is to focus on setting the standards.”
“At the same time, on our side, we will expand the current 40,000-character manuscripts to 70,000.”

“Xiao Li, on our side, hurry up and develop a program to read the manuscripts. Let’s finish digitizing the 70,000-character manuscripts first.”

“After that, we will take the digitized results for these 70,000 Chinese characters to negotiate with the Unicode Consortium. We must leave enough room for further expansion, and strive to make our large character library a unified global standard.”

“With this large font library, we will develop several subsets to meet the needs of different application scenarios at home and abroad. At that point our first step will be complete.”

“As for the text recognition that Jiuzi mentioned, it is also very important,” Gu Zhenduo added. “That is our next step in digitizing the classics!”

Gu Lao couldn’t help but sigh: “Isn’t it just that resources are limited, and we’re afraid we won’t be able to pull it off?”

“That’s not necessarily the case,” Zhou Zhi said. “We can submit all of these ideas together. We ask a sky-high price and wait for the ministries and commissions to bargain it down. What we end up doing depends on what we get!”

The big shots all burst into laughter; they understood this well. With ministries and commissions, it is almost impossible to have your requirements met 100% without being bargained down. If they satisfy even 50%, that already counts as mercy. So you might as well make the pancake bigger. Even cut in half, what is left will still be enough to eat.

Today was only a retreat meeting to unify ideas and study the possibility of interdisciplinary cooperation between the liberal arts and the sciences. Reaching a relatively unified opinion had already exceeded the expectations for the meeting.

The main credit for this went to the preliminary work that Zhou Zhi had completed.

Li Hongjiang held Zhou Zhi’s hand and said eagerly: “Why don’t you study for a second degree in information engineering? Zhu Ziqian may be a liberal arts student, but with such a good foundation in information technology, a second degree would be no problem at all.”

“My energy is really limited.” Zhou Zhi could only decline Li Hongjiang’s kindness: “But I am very interested in the text recognition system. If Professor Li is interested, I can also take part in the research.”

“Are you familiar with programming? BASIC or PASCAL?” By now, Li Hongjiang would never believe that Zhou Zhi was a layman.

“I’m more familiar with C.” Zhou Zhi recalled the fear of being dominated by code.

“C?” Li Hongjiang felt as if he had found a treasure: “What about UNIX?”

“UNIX is okay.” Zhou Zhi asked: “Is this a minicomputer the school has newly brought in? One that can run ANSI C?”

To most people, the conversation between the two sounded like an indecipherable scripture.

The most basic computer language is, of course, machine code: the instructions that operate the chip directly. The earliest punched paper tape, for example, carried nothing but combinations of zeros and ones.

This language is the most direct for the computer, but the least friendly to people.

So people devised a set of symbolic names for each chip’s basic instructions, which is assembly language.

Assembly language is still tied to the chip, but at least it lets professionals read what the program tells the machine to do.

But assembly language has a problem: it cannot be ported. It is written for one particular type of chip, and a different type of chip cannot understand it.

So people came up with another approach: the compiler and the high-level language. These are procedural programming languages, and C is the leader among them.

C is formidable because it keeps much of the conciseness and efficiency of assembly language, so programs run fast and the code is compact, yet it is more readable than assembly. Most important of all, it is easy to debug, modify and port.

Programmers are responsible only for writing the source code. Once written, it is compiled into binary code that the computer can execute, which is the “application”.

For each chip type and system there is a corresponding compiler, which can turn a program written in C into one that runs on that system.

This solves the porting problem: the same source code can be recompiled for DOS, Apple and UNIX systems instead of being written three times.

(End of this chapter)
