Age structure of coauthors in Chinese information science
Liang Liming, Yongzheng Guo
There are many academic journals in the field of Chinese information science, but only a few record the ages of their authors. After repeated consultation and comparison, three journals were selected. The first is Computer Research and Development, run by the Institute of Computing Technology of Chinese Academia Sinica and the Chinese Association of Computer Science (CACS). The second is Journal of Software, run by the Institute of Software of Chinese Academia Sinica and CACS. The third is Chinese Journal of Computers, run by CACS. All three journals are published by Science Press, a well-known publishing house in China. In terms of impact and regularity of publication, these three journals are an objective and representative data source. From Computer Research and Development we obtained 1764 age items (1992 to 1997 and 1999; the journal carried no age records in 1998). Journal of Software offered 2249 age items (1994 to 1999). Chinese Journal of Computers recorded only 1011 age items (1998 to 1999). The total is 5024 items, comprising 311 age items of authors who wrote a paper alone and 4713 age items of authors who wrote a paper in collaboration with others. Most of the age records are authors' dates of birth.
2.2 Methods
First, we built a database containing two original types of data: the author's date of birth and the year in which the author submitted the manuscript to the journal. The age at which the author finished the paper could then be calculated. Here we suppose that authors submit a paper to the journal immediately after finishing it. Second, we choose the indicator of authors' age. There are two options open to us: the real age and the date of birth.
The relation between them is as follows: (a) authors who submitted their manuscripts at the same age might have been born in different years, because they submitted at different times; (b) authors who were born in the same year might have been of different ages when submitting their manuscripts, if they submitted in different years. We should therefore distinguish two kinds of age structure, one expressed by each indicator. In this paper only the structure expressed by authors' real age is shown. The structure expressed by authors' dates of birth could be described and analyzed using the same method. We divide scientists' age span into three periods: young (under thirty-six years of age), middle-aged (over thirty-six and under fifty) and old (over fifty years of age). For short, we call the scientists in the three periods younger, middle-aged and elder, abbreviated Y, M and E in the collaboration patterns.
Third, two-dimensional and multidimensional ASSCs are defined as follows. Suppose at most n authors sign a paper, and xi is the age of the author who ranks i in the sequence of authors, i = 1, 2, 3, …, n. Then the m-dimensional (2 ≤ m ≤ n) ASSC is {(x1*, x2*, …, xm*)}, where xj* ∈ {xi}, j ≤ m, i ≤ n. When m = 2 it is a two-dimensional ASSC; when m > 2 it is a multidimensional ASSC. For example, (x1, x2) is an age pair combining the first author's age and the second author's age, and {(x1, x2)} is the ASSC formed by (x1, x2). Likewise, (x1, x2, x3) is an age group combining the ages of the first, second and third authors, and {(x1, x2, x3)} is the three-dimensional ASSC formed by (x1, x2, x3).
Fourth, the collaboration frequencies in the various ASSCs are counted, and the average ages of authors in each rank are calculated. The two-dimensional and multidimensional ASSCs are then described using two-dimensional and three-dimensional scatter plots, histograms of age difference, histograms of age sum and column charts.
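The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the database actually used in the study: the record layout (a submission year plus a list of birth years in signing order, with None for a missing age record), the function names and the sample values are all hypothetical. Age is taken as submission year minus birth year, following the supposition that a paper is submitted as soon as it is finished.

```python
from collections import Counter

def author_ages(submit_year, birth_years):
    """Age of each author at submission, in signing order (None if no record)."""
    return [None if b is None else submit_year - b for b in birth_years]

def assc(papers, ranks):
    """Project every paper onto the given 1-based author ranks.

    Returns the m-dimensional ASSC as a list of age tuples, keeping only
    papers that have an age record at every requested rank.
    """
    points = []
    for submit_year, birth_years in papers:
        ages = author_ages(submit_year, birth_years)
        if max(ranks) > len(ages):
            continue  # paper has too few authors for this structure
        point = tuple(ages[r - 1] for r in ranks)
        if None not in point:
            points.append(point)
    return points

def mean_age_by_rank(papers):
    """Average age of the authors found at each rank (1-based)."""
    totals, counts = Counter(), Counter()
    for submit_year, birth_years in papers:
        for rank, age in enumerate(author_ages(submit_year, birth_years), start=1):
            if age is not None:
                totals[rank] += age
                counts[rank] += 1
    return {rank: totals[rank] / counts[rank] for rank in sorted(counts)}

# Hypothetical records: (submission year, birth years in signing order).
papers = [
    (1996, [1970, 1940]),
    (1996, [1970, 1940, 1968]),
    (1998, [1972, None, 1945]),
    (1999, [1965]),
]

pairs = assc(papers, ranks=(1, 2))        # two-dimensional ASSC {(x1, x2)}
triples = assc(papers, ranks=(1, 2, 3))   # three-dimensional ASSC {(x1, x2, x3)}
frequency = Counter(pairs)                # collaboration frequency per age pair
```

Passing a different tuple of ranks, for example `assc(papers, ranks=(1, 3))`, yields any other structure such as {(x1, x3)} without changing the rest of the code.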
Finally, to explain why China has such an ASSC, some special issues are discussed. What is the age composition of all authors? What is the age distribution of the authors ranking in a certain position? How do authors order their names when professors collaborate with their students? The results give us some tentative answers.
3 Two-dimensional ASSC and multidimensional ASSC: results and analyses
3.1 ASSC {(x1, x2)} formed by the first author's age and the second author's age
3.2 Multidimensional ASSC
It is difficult to express an ASSC of more than three dimensions directly. We therefore resolve the multidimensional ASSC into two-dimensional and three-dimensional ASSCs for discussion.
3.2.1 Two-dimensional ASSC in multidimensional ASSC
There are 1195 papers written by three or more authors. Only three-, four- and five-dimensional collaborations are taken into account, because only 9 authors rank after the fifth author, with 8 age records among them. The structure {(x1, x2)} has been described in section 3.1. What follows is an analysis of the other two-dimensional structures {(x1, x3)}, {(x1, x4)}, {(x1, x5)}, {(x2, x3)}, {(x2, x4)}, {(x2, x5)}, {(x3, x4)}, {(x3, x5)} and {(x4, x5)}. Figures 4(a) to 4(i) are scatter plots of these structures, from which several results can be obtained.
First, like figure 1(a), figure 4(a) shows two clusters of points and a further patch of points in the lower right corner. Therefore, in structure {(x1, x3)}, Y-E and Y-Y are the two main patterns of collaboration, followed by E-Y. Although figure 4(b) and figure 4(c) contain far fewer points than figure 4(a), they show the same pattern, in which Y-E and Y-Y are still the main types of collaboration.
Second, ignoring collaboration frequency, the number of points in figure 4(d) is similar to that in figure 4(a), since both are determined by the number of third authors. However, the points in figure 4(d) are not as concentrated as in figure 4(a); they form four patches, corresponding to Y-E, Y-Y, E-Y and E-E respectively. That is to say, structure {(x2, x3)} is a different kind of structure from {(x1, x3)}.
Third, the number of fourth authors controls the number of points in figure 4(e) and figure 4(g). In general, figure 4(e) and figure 4(g) have a similar distribution pattern of fuzzy clusters. But the Y-E patch is denser in figure 4(e) than in figure 4(g), while the E-E patch contains more points in figure 4(g) than in figure 4(e).
Fourth, in theory figure 4(f), figure 4(h) and figure 4(i) have the same number of points unless some ranks lack age records. Figure 4(f) is similar to figure 4(h).
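The pattern counts read off the scatter plots can also be tallied directly. The sketch below is illustrative only: the input format (one list of ages per paper, in signing order, with None for a missing record) and the sample ages are hypothetical, and the handling of the boundary ages is an assumption, since the text leaves ages of exactly thirty-six and exactly fifty unassigned. Here an age below 36 is labelled Y, 36 to 49 is labelled M, and 50 or above is labelled E.

```python
from collections import Counter
from itertools import combinations

def age_class(age):
    """Y, M or E; the treatment of exactly 36 and exactly 50 is an assumption."""
    if age < 36:
        return "Y"
    if age < 50:
        return "M"
    return "E"

def pair_patterns(papers, max_rank=5):
    """Count patterns such as 'Y-E' for every rank pair (i, j), i < j <= max_rank."""
    counts = {}
    for ages in papers:
        top = min(len(ages), max_rank)
        for i, j in combinations(range(1, top + 1), 2):
            a, b = ages[i - 1], ages[j - 1]
            if a is None or b is None:
                continue  # a rank without an age record contributes no point
            label = f"{age_class(a)}-{age_class(b)}"
            counts.setdefault((i, j), Counter())[label] += 1
    return counts

# Hypothetical ages in signing order.
papers = [
    [27, 29, 58],
    [26, 55, 31],
    [30, 28, 52, 60],
    [54, 27, None],
]
patterns = pair_patterns(papers)
```

Each key of `patterns` corresponds to one two-dimensional structure, so `patterns[(1, 3)]` plays the role of the point clusters in the {(x1, x3)} plot.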
Interestingly, figure 4(i) shows that in structure {(x4, x5)} the proportion of E-E collaboration is larger than in the other two-dimensional structures. Finally, in every two-dimensional structure the fewest points correspond to collaboration between middle-aged scientists and scientists of other ages. In the extreme case, collaboration among middle-aged scientists is almost absent.
3.2.2 Three-dimensional ASSC in multidimensional ASSC
We take only the structures {(x1, x2, x3)}, {(x2, x3, x4)} and {(x3, x4, x5)} into account. Figures 5(a), 5(b) and 5(c) are three-dimensional scatter plots showing the regularity of three-dimensional collaboration from three viewing angles. The figures corresponding to structures {(x2, x3, x4)} and {(x3, x4, x5)} are not given in this paper. All the three-dimensional figures were analyzed on the basis of the two-dimensional scatter plots shown in figure 1 and figure 4, with the following results.
First, taking the order of authors as first author, second author, third author, the ranking of author groups by density of points in structure {(x1, x2, x3)} is Y-Y-E, Y-Y-Y, Y-E-Y, Y-E-E, E-Y-E and E-Y-Y. The points of the author groups E-E-Y and E-E-E are scattered. Although few author groups contain middle-aged scientists, we can still recognize M-M-Y and M-M-E as the two main patterns in which middle-aged scientists collaborate with scientists of other ages.
Second, Y-Y-E and Y-Y-Y are the densest regions of points in structure {(x2, x3, x4)}, though the points in this structure are fewer and more scattered than in structure {(x1, x2, x3)}.
Third, in structure {(x2, x3, x4)}, Y-Y-E, Y-Y-Y and Y-E-E are three important age groups.
Figures 6(a), 6(b) and 6(c) are histograms of the age sum of three authors, with data from {(x1, x2, x3)}, {(x2, x3, x4)} and {(x3, x4, x5)} respectively. Note that only age groups with age records for all three authors are selected as samples. The age sums of three authors in structures {(x1, x2, x3)} and {(x2, x3, x4)} appear to follow the same distribution, similar to a binomial distribution, except that the span of the age sum in {(x2, x3, x4)} is wider and its peak is shifted to the right.
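The sampling rule for the age-sum histograms can be made concrete with a short sketch. The data layout and sample ages below are hypothetical, and the five-year bin width is an assumption, since the text does not state the bins used in Figure 6. A triple contributes to the histogram only when all three ranks have an age record.

```python
from collections import Counter

def age_sums(papers, ranks):
    """Age sums for the triple of 1-based ranks, complete records only."""
    sums = []
    for ages in papers:
        if max(ranks) > len(ages):
            continue  # not enough authors for this structure
        triple = [ages[r - 1] for r in ranks]
        if None in triple:
            continue  # drop triples with a missing age record
        sums.append(sum(triple))
    return sums

def histogram(values, bin_width=5):
    """Map the lower edge of each bin to the number of values falling in it."""
    bins = Counter(v // bin_width * bin_width for v in values)
    return dict(sorted(bins.items()))

# Hypothetical ages in signing order (None marks a missing age record).
papers = [
    [27, 29, 58, 61],
    [26, 55, 31],
    [30, None, 52, 60],
    [25, 28, 30, 57, 62],
]

sums_123 = age_sums(papers, (1, 2, 3))   # structure {(x1, x2, x3)}
sums_234 = age_sums(papers, (2, 3, 4))   # structure {(x2, x3, x4)}
```

Comparing `histogram(sums_123)` with `histogram(sums_234)` is one way to check whether the peak of the age sum moves to the right for later ranks.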
The distribution of the age sum of three authors in {(x3, x4, x5)} is irregular because of the shortage of data, but even so the peak continues to shift to the right. Figures 6(a), 6(b) and 6(c) indicate that, in general, the authors in structure {(x1, x2, x3)} are younger than those in structure {(x2, x3, x4)} and much younger than those in {(x3, x4, x5)}. This phenomenon can be seen more clearly in section 3.2.3.
3.2.3 Average age of authors in each rank
The average age of authors in each rank is the most direct description of the ASSC. Table 1 shows the average ages of authors from the first rank to the eighth rank. Strikingly, the average age increases from the first rank to the sixth rank. This is the overall characteristic of the multidimensional ASSC in Chinese information science.
Table 1 Average age of authors in each rank in ASSC
Fig.1 Scatter plot of first author's age-second author's age
Fig.2 Collaboration frequency of first author with second author
Fig.3 Distribution of the absolute value of difference between first author's age and second author's age
Fig.4 Age structure of two dimensional collaboration
Fig.5 Age structure of three dimensional collaboration