Age structure of coauthors in Chinese information science

Liang Liming, Yongzheng Guo
Institute for Science, Technology and Society,
Henan Normal University, China


Hildrun Kretschmer
Free University Berlin, Germany

1 Introduction

The structures of scientific collaboration interest scientometricians, sociologists and psychologists alike. Scientometricians try to build mathematical models that reveal and describe the common and special structures of scientific collaboration, while sociologists and psychologists look for historical, social and psychological explanations of those structures. A number of related papers have been published.

However, one such structure, the age structure of scientific collaboration (ASSC for short), has received less attention. Because of the lack of data, fewer case analyses have been carried out than we would expect, and in China such research has seldom been seen.

Both scientists and the public wonder what Chinese ASSC looks like. Do scientists of similar age work together, or do older scientists collaborate with younger ones? What roles do scientists of different age groups play in scientific collaboration? What are the characteristics of ASSC in different disciplines, organizations, regions and scientific communities? What is the relationship between ASSC and research performance? How does Chinese ASSC differ from the ASSC of other nations? How does the age fault in the ranks of Chinese scientists affect Chinese ASSC? These questions have to be answered step by step. This paper is only a case analysis of ASSC in Chinese information science.

2 Data and Methods
2.1 Data


There are many academic journals in the field of Chinese information science, but only a few record the ages of their authors. After repeated consultation and comparison, three journals were selected. The first is Computer Research and Development, sponsored by the Institute of Computing Technology, Chinese Academia Sinica, and the Chinese Association of Computer Science (CACS). The second is Journal of Software, sponsored by the Institute of Software of Chinese Academia Sinica and CACS. The third is Chinese Journal of Computers, sponsored by CACS. All three journals are published by Science Press, a well-known publishing house in China. In terms of impact and regularity of publication, these three journals are an objective and representative data source.

From Computer Research and Development we obtained 1764 age items (1992-1997 and 1999; the journal carried no age records in 1998). Journal of Software provided 2249 age items (1994-1999), and Chinese Journal of Computers 1011 age items (1998-1999). The total is 5024 age items: 311 for authors who wrote a paper alone and 4713 for authors who wrote a paper in collaboration with others. Most of the age records are the authors' dates of birth.

2.2 Methods

First, we built a database containing two types of original data: each author's date of birth and the year in which the author submitted the manuscript to the journal. From these, the age at which the author finished the paper can be calculated, on the assumption that authors submit a paper to the journal immediately after finishing it.
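The age calculation in this first step can be sketched as follows. This is only a minimal illustration, assuming that just the year of birth and the year of submission are used; the function name age_at_submission is hypothetical and does not come from the original study.

```python
def age_at_submission(birth_year: int, submission_year: int) -> int:
    """Approximate age of an author when the manuscript was submitted.

    Assumption (as in the text): the paper is finished in the year it is
    submitted. Months and days are ignored.
    """
    if submission_year < birth_year:
        raise ValueError("submission year precedes birth year")
    return submission_year - birth_year


# Invented example: an author born in 1970 who submitted in 1996.
print(age_at_submission(1970, 1996))  # 26
```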

Second, we chose the indicator used to express authors' age. Two options are open to us: the real age and the date of birth. They are related as follows: (a) authors who submitted their manuscripts at the same age may have been born in different years, because they submitted at different times; (b) authors who were born in the same year may have been of different ages when they submitted their manuscripts, if they submitted in different years. We should therefore distinguish two kinds of age structure, one for each indicator. In this paper only the structure expressed by authors' real age is shown. The structure expressed by authors' date of birth could be described and analyzed by the same method.
We divide scientists' age span into three periods: young (under thirty-six years of age), middle-aged (over thirty-six and under fifty) and old (over fifty years of age). For short, we call the scientists in the three periods younger, middle-aged and elder respectively.
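The three age periods can be expressed as a small classifier. This is a sketch under stated assumptions: the letters Y, M and E follow the pattern labels used in section 3 (for example Y-E), and because the text does not say where the boundary ages 36 and 50 belong, they are counted here as middle-aged. The names age_group and pattern are hypothetical.

```python
def age_group(age: int) -> str:
    """Return 'Y' (younger), 'M' (middle-aged) or 'E' (elder).

    Assumption: the boundary ages 36 and 50 are treated as middle-aged,
    since the text leaves them unassigned.
    """
    if age < 36:
        return "Y"
    if age <= 50:
        return "M"
    return "E"


def pattern(ages) -> str:
    """Join the group letters of the authors' ages, in rank order."""
    return "-".join(age_group(a) for a in ages)


# Invented ages for illustration only.
print(pattern((27, 58)))      # Y-E
print(pattern((26, 31, 60)))  # Y-Y-E
```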

Third, two-dimensional and multidimensional ASSCs are defined as follows. Suppose a paper is signed by at most n authors, and xi is the age of the author who ranks i in the sequence of authors, i = 1, 2, 3, ..., n. Then an m-dimensional (2 ≤ m ≤ n) ASSC is {(x1*, x2*, ..., xm*)}, where xj* ∈ {xi}, j ≤ m, i ≤ n. When m = 2 the ASSC is two-dimensional; when m > 2 it is multidimensional. For example, (x1, x2) is an age pair combining the first author's age and the second author's age, and {(x1, x2)} is the ASSC formed by such pairs. Likewise, (x1, x2, x3) is an age group combining the ages of the first, second and third authors, and {(x1, x2, x3)} is the three-dimensional ASSC formed by such groups.
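To make the definition concrete, the sketch below collects the age tuples of one structure, such as {(x1, x2)} or {(x1, x3)}, from a list of papers. It is an illustration only: the function name assc, the representation of a paper as a sequence of ages by author rank (with None for a missing age record) and the sample ages are all assumptions, not the authors' actual database.

```python
from typing import Optional, Sequence

Ages = Sequence[Optional[int]]  # ages by author rank; None = no age record


def assc(papers: Sequence[Ages], ranks: Sequence[int]) -> list:
    """Collect the age tuples of the structure given by 1-based ranks.

    For example, ranks=(1, 3) yields the structure {(x1, x3)}. Papers with
    too few authors, or with a missing age at a requested rank, are skipped.
    """
    result = []
    for ages in papers:
        if max(ranks) > len(ages):
            continue  # the paper has no author at the highest requested rank
        picked = tuple(ages[r - 1] for r in ranks)
        if any(a is None for a in picked):
            continue  # at least one requested age is not recorded
        result.append(picked)
    return result


# Invented sample: four papers with two to four authors.
papers = [(27, 58), (26, 31, 60), (55, None, 29), (30, 28, 57, 61)]
print(assc(papers, (1, 2)))     # [(27, 58), (26, 31), (30, 28)]
print(assc(papers, (1, 2, 3)))  # [(26, 31, 60), (30, 28, 57)]
```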

Fourth, the collaboration frequencies in the various ASSCs are counted, and the average age of the authors in each rank is calculated. Two-dimensional and multidimensional ASSCs are then revealed and described using two-dimensional and three-dimensional scatter plots, histograms of age difference, histograms of age sum, and column charts. Finally, to explain why China has such an ASSC, some special issues are discussed. What is the age composition of all authors? What is the age distribution of the authors in a given rank? How do authors order their names when professors collaborate with their students? The results give us some tentative answers.
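The counting in this fourth step can be sketched with standard-library counters. The age pairs below are invented for illustration; the variable names are hypothetical and the sketch does not reproduce the study's data or plots.

```python
from collections import Counter

# Invented (x1, x2) age pairs: first author's age, second author's age.
pairs = [(27, 58), (26, 31), (27, 58), (55, 29)]

# Collaboration frequency of each age pair (the Z axis of a frequency plot).
frequency = Counter(pairs)

# Histogram of the absolute age difference |x1 - x2|.
abs_difference = Counter(abs(a - b) for a, b in pairs)

# Histogram of the age sum x1 + x2.
age_sum = Counter(a + b for a, b in pairs)

print(frequency[(27, 58)])  # 2
print(abs_difference[31])   # 2
print(age_sum[85])          # 2
```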

3 Two-dimensional ASSC and multidimensional ASSC: results and analyses

3.1 ASSC {(x1, x2)} formed by the first author's age and the second author's age

3.2 Multidimensional ASSC

It is difficult to display an ASSC of more than three dimensions directly. We therefore decompose multidimensional ASSC into two-dimensional and three-dimensional ASSCs for discussion.

3.2.1 Two-dimensional ASSC in multidimensional ASSC

There are 1195 papers written by three or more authors. Only three-, four- and five-dimensional collaborations are taken into account, because only 9 authors rank after the fifth author, with only 8 age records among them.

Structure {(x1, x2)} was described in section 3.1. The other two-dimensional structures, {(x1, x3)}, {(x1, x4)}, {(x1, x5)}, {(x2, x3)}, {(x2, x4)}, {(x2, x5)}, {(x3, x4)}, {(x3, x5)} and {(x4, x5)}, are analyzed below.

Figures 4(a)~(i) are scatter plots of these structures, from which several results can be obtained. First, like figure 1(a), figure 4(a) shows two clusters of points and another patch of points in the lower right corner. Therefore, in structure {(x1, x3)}, Y-E and Y-Y are the two main patterns of collaboration, followed by E-Y. Although figure 4(b) and figure 4(c) contain far fewer points than figure 4(a), they show the same pattern, in which Y-E and Y-Y are still the main types of collaboration.

Second, ignoring collaboration frequency, the number of points in figure 4(d) is similar to that in figure 4(a), since both are determined by the number of third authors. However, the points in figure 4(d) are less concentrated than in figure 4(a) and form four patches, corresponding to Y-E, Y-Y, E-Y and E-E respectively. That is to say, structure {(x2, x3)} is a different kind of structure from {(x1, x3)}.

Third, the number of fourth authors determines the number of points in figure 4(e) and in figure 4(g). In general, figure 4(e) and figure 4(g) have a similar distribution pattern of fuzzy clusters. However, the Y-E patch is denser in figure 4(e) than in figure 4(g), while the E-E patch contains more points in figure 4(g) than in figure 4(e).

Fourth, figure 4(f), figure 4(h) and figure 4(i) should in theory have the same number of points, unless age records are missing for authors in some ranks. Figure 4(f) is similar to figure 4(h). Interestingly, figure 4(i) shows that the proportion of E-E collaboration in structure {(x4, x5)} is larger than in the other two-dimensional structures.

Finally, in every two-dimensional structure, the fewest points correspond to collaboration between middle-aged scientists and scientists of other ages. At the extreme, collaboration among middle-aged scientists is almost absent.

3.2.2 Three-dimensional ASSC in multidimensional ASSC

We consider only the structures {(x1, x2, x3)}, {(x2, x3, x4)} and {(x3, x4, x5)}. Figures 5(a), 5(b) and 5(c) are three-dimensional scatter plots showing the regularity of three-dimensional collaboration from three viewing angles. The figures corresponding to structures {(x2, x3, x4)} and {(x3, x4, x5)} are not given in this paper. All the three-dimensional figures were analyzed with reference to the two-dimensional scatter plots in figure 1 and figure 4, with the following results.

First, taking the author order as first author-second author-third author, the author groups in structure {(x1, x2, x3)} rank by density of points as Y-Y-E, Y-Y-Y, Y-E-Y, Y-E-E, E-Y-E and E-Y-Y. The points of groups E-E-Y and E-E-E are scattered. Although few author groups contain middle-aged scientists, we can still recognize M-M-Y and M-M-E as the two main patterns in which middle-aged scientists collaborate with scientists of other ages.

Second, Y-Y-E and Y-Y-Y are the densest regions of points in structure {(x2, x3, x4)}, although the points in this structure are fewer and more scattered than in structure {(x1, x2, x3)}.

Third, in structure {(x2, x3, x4)}, Y-Y-E, Y-Y-Y and Y-E-E are three important age groups.

Figures 6(a), 6(b) and 6(c) are histograms of the age sum of three authors, with data from {(x1, x2, x3)}, {(x2, x3, x4)} and {(x3, x4, x5)} respectively. Note that only age groups with age records for all three authors are selected as samples. The age sums in structure {(x1, x2, x3)} and in {(x2, x3, x4)} appear to follow the same distribution, similar to a binomial distribution, except that the span of the age sum in {(x2, x3, x4)} is wider and its peak is shifted to the right. The distribution of the age sum in {(x3, x4, x5)} is irregular because of the shortage of data, but even so the peak continues to shift to the right.

Figures 6(a), 6(b) and 6(c) indicate that, in general, the authors in structure {(x1, x2, x3)} are younger than those in structure {(x2, x3, x4)} and much younger than those in {(x3, x4, x5)}. This phenomenon can be seen more clearly in section 3.2.3.

3.2.3 Average age of authors in each rank

The average age of the authors in each rank describes ASSC directly. Table 1 shows the average ages of authors from the first rank to the eighth rank. Remarkably, the average age increases steadily from the first rank to the sixth rank. That is the overall characteristic of multidimensional ASSC in Chinese information science.

Table 1 Average age of authors in each rank in ASSC

Rank          1      2      3      4      5      6      7      8
Age items     1792   1580   916    332    81     8      3      1
Average age   33.85  44.07  45.03  46.06  46.54  48.50  36.33  53.00
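The figures in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below simply re-enters the table values; the pooled mean it computes is not reported in the paper and is only what the rounded table values imply. The variable names are hypothetical.

```python
# Values copied from Table 1, ranks 1 to 8.
age_items = [1792, 1580, 916, 332, 81, 8, 3, 1]
average_age = [33.85, 44.07, 45.03, 46.06, 46.54, 48.50, 36.33, 53.00]

# The age items should add up to the 4713 collaborating-author records.
total_items = sum(age_items)
print(total_items)  # 4713

# Does the average age rise strictly from rank 1 to rank 6?
first_six = average_age[:6]
rising = all(a < b for a, b in zip(first_six, first_six[1:]))
print(rising)  # True

# Pooled mean age implied by the rounded table values (not from the paper).
pooled_mean = sum(n * a for n, a in zip(age_items, average_age)) / total_items
print(round(pooled_mean, 2))
```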

3.3 Analysis of the causes of two-dimensional and multidimensional ASSC

A structure or a system is determined by two factors: its basic elements and the way they are combined. A reasonable structure or system is expected to amplify its function and yield more output, according to the principle that one plus one equals not two, but more than two.

Scientists are the elementary units of scientific collaboration, and scientists' ages are the basic elements of ASSC, so we should first study the distribution of scientists' ages. In particular, the age distribution of the authors who occupy each rank in co-authorship must be analyzed rank by rank, so that we can find the reasons why ASSC forms as it does.

Further, for various reasons authors of the same or different ages come together in groups of two, three or more, produce an academic paper, and sign it in an order that follows some agreed rule. In this way many papers are produced, and a large number of authors are distributed over the papers and over every rank of co-authorship, forming many age groups of authors. Therefore, why and how authors cooperate with each other and order their names should be discussed. As a case analysis, we focus here on the signing of papers completed by professors and their students.

3.3.1 Age distribution of all authors

We have 4713 age items for authors involved in ASSC. Their age distribution is shown in figure 7(a), where the abscissa is age and the ordinate is frequency. The distribution is a typical double-peak distribution. The higher peak spans the ages from 24 to 36, with its peak value at 26. The lower peak spans the ages from 52 to 64, with its peak value at 58. The valley runs from 38 to 50 years of age, the ages of middle-aged authors. This age distribution means that the primary elements of the ASSC of Chinese information science must be younger and elder scientists, with the younger scientists even more important than the elders.
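Peak values such as those quoted above can be read off an age histogram. A minimal sketch, assuming ages are whole numbers of years: the function name peak_age and the sample ages are invented for illustration and are not the study's data.

```python
from collections import Counter


def peak_age(ages, low, high):
    """Return the most frequent age within the closed span [low, high].

    Raises ValueError if no age falls inside the span. If several ages
    tie for the highest count, the one that appears first is returned.
    """
    counts = Counter(a for a in ages if low <= a <= high)
    if not counts:
        raise ValueError("no ages in the requested span")
    return max(counts, key=counts.get)


# Invented sample with a younger and an elder cluster.
ages = [26, 26, 27, 58, 58, 58, 59, 40, 26, 26]
print(peak_age(ages, 24, 36))  # 26
print(peak_age(ages, 52, 64))  # 58
```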

3.3.2 Age distribution of authors in every rank

Figures 7(b), (c), (d), (e), (f) and (g) show the age distributions of authors in the first, second, third, fourth, fifth and sixth ranks respectively.

The histogram in figure 7(b) resembles figure 7(a), with double peaks, but the peak of younger authors is much higher than the peak of elders.

The age distributions of second authors and of third authors are similar to each other. They still show double peaks, but the two peaks are almost the same height.

There are fewer fourth authors than authors in the first three ranks, and fewer fifth authors still. Nevertheless, the age distributions of the fourth and fifth authors show the same double-peak pattern. Even with only 8 age items, the age distribution of the sixth authors resembles double peaks too: three younger authors, four elder authors and one middle-aged author.

Figure 7 thus illustrates why the ASSC of Chinese information science has Y-E and Y-Y as the main types of two-dimensional collaboration, and Y-Y-E and Y-Y-Y as the main types of three-dimensional collaboration. Still, we wondered why Y-E (not E-Y) is the main type of two-dimensional collaboration. A discussion of how authors order their names when a professor collaborates with a student gives a tentative explanation.

3.3.3 Signing issue in collaboration of professor with student

Scientists have many ways of ordering co-authors' names. In China it is common to order the authors according to their contribution to the paper. In a collaboration between professor and student, whose contribution is greater, and how do they sign their paper? As a sample, we analyzed 609 papers, each written by exactly two authors.

First, we define the relationship between professor and student as one of three situations: (a) a tutor (teacher) with a doctoral or Master's student; (b) a professor with a doctoral student or with a Doctor who holds no position; (c) a professor or associate professor with a Master's student or with a Master who holds no position.

According to this definition, we selected 304 papers (out of the 609) written by a professor and a student. Of these 304 papers, 267 (87.8 percent) have the student as first author and only 37 (12.2 percent) have the professor as first author. The average age of the 304 students is 28.5 years. This is probably one of the reasons why Y-E is the main type of collaboration and E-Y the secondary type in collaboration between professor and student.
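The percentages quoted above follow directly from the paper counts, as this short check shows. The variable names are hypothetical; only the counts 304 and 267 are taken from the text.

```python
total = 304            # papers written by a professor and a student
student_first = 267    # papers with the student as first author
professor_first = total - student_first

print(professor_first)                          # 37
print(round(100 * student_first / total, 1))    # 87.8
print(round(100 * professor_first / total, 1))  # 12.2
```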

Of course, the contribution of each author to a paper written by professor and student is not a purely scientometric problem. It touches on the significance of the project, the value of the methods, the effect of experience and, especially, the difficulty of innovation. In any case, we want to stress that the background of Chinese traditional culture should not be ignored when discussing the signing issue. When a professor's name is placed behind the student's, is that a form of support or help that the professor gives the student?

4 Conclusion and discussion
A special type of ASSC exists in Chinese information science, composed mainly of scientists younger than 36 or older than 52 years of age. In this type of ASSC, the majority are younger scientists, who occupy the leading position in collaboration; the minority are elders, and middle-aged scientists are rare. In general, young scientists play an important role in scientific collaboration in Chinese information science. Because of the obvious age fault from 38 to 50 years of age, middle-aged scientists, who would otherwise be the mainstay of scientific collaboration, play the least important role in ASSC.

The formation of the ASSC of Chinese information science has historical causes and a cultural background. The main historical reason is that the abnormal development of Chinese higher education during the Cultural Revolution, from 1966 to 1976, led to a decade-long fault in the ranks of Chinese scientists. This fault gives ASSC in Chinese science a pattern different from that of other countries. The cultural background may be the style of scholarly research advocated by Chinese traditional culture, in which a teacher should be a ladder for the student, giving guidance and effective help and leading the student forward. We guess that this is why most professors' names come after their students' names, making Y-E the main type of ASSC.

Further discussion of the sociological and psychological causes of the formation of ASSC is needed in future.

This paper is sponsored by the National Natural Science Foundation of China and by DFG in Germany.


(a), (b), (c)

Fig.1 Scatter plot of first author's age-second author's age


(a), (b), (c)

Fig.2 Collaboration frequency of first author with second author
X:first author's age
Y:second author's age
Z:collaboration frequency


(a), (b), (c)

Fig.3 Distribution of the absolute value of difference between first author's age and second author's age
X:the absolute value of difference between first author's age and second author's age
Y:frequency

Fig.1(a), Fig.2(a), Fig.3(a)
Data: 1565 pairs of first author's age-second author's age from papers written by two or more authors

Fig.1(b), Fig.2(b), Fig.3(b)
Data: 609 pairs of first author's age-second author's age from papers written by two authors

Fig.1(c), Fig.2(c), Fig.3(c)
Data: 956 pairs of first author's age-second author's age from papers written by more than two authors

Fig.4 Age structure of two dimensional collaboration

Fig.5 Age structure of three dimensional collaboration
X:first author's age
Y:second author's age
Z:third author's age
