This article throws light upon the four main steps involved in analysis of data. The steps are: 1. Establishment of Categories or Classification of Data 2. Coding 3. Tabulation 4. Statistical Analysis of Data.
Step # 1. Establishment of Categories or Classification of Data:
Social science research generally involves a large variety of responses to different kinds of questions asked or stimuli presented to the sample or ‘population’ of respondents. These responses may be verbal or non-verbal.
Clearly, if a large number of different kinds of responses are to be organized such that they can be used in answering the research questions, or in drawing generalizations, they must be grouped into a limited number of categories or classes. To take a simple example, suppose a question is put to respondents, “Are you in favour of the objective-type examination for college students?”
The responses of the respondents may possibly be grouped under four broad categories, as under:
(a) “Yes” responses.
(b) “No” responses.
(c) “Don’t know”, “Can’t say” etc., responses.
(d) “Did not reply.”
Suppose another question asked to the respondents is, “To which social class would you say you belong?”
The responses of the respondents may be grouped into the following categories:
(a) Upper class.
(b) Middle class.
(c) Lower class.
(d) “Can’t say.”
(e) Other responses (like, “I don’t believe in social classes.” “It hardly matters where I belong” etc.).
A prerequisite to taking a decision about the categories to be instituted for grouping the data is that the researcher must select some appropriate principle of classification. The research question or hypothesis, if one has been formulated, provides a good logical basis for selecting a classificatory principle.
Suppose the hypothesis in a study is:
“Students who have had an experience of studying in coeducational schools will have more favourable attitude toward the system of co-education.”
Here, obviously, one of the principles of classification of responses will be whether or not the respondent has had prior experience of co-educational system. Another basis of classifying responses would be the degree of favourableness or unfavourableness expressed toward the co-educational system. Other bases of classification may also be invoked, depending on what further associations are to be scrutinized.
The first basis of classification would yield two categories of responses:
(a) Said they had a prior experience of co-education;
(b) Said they did not have any prior experience of co-education.
These two categories contain within themselves the entire range of responses (assuming, of course, that no respondent refused to answer, failed to respond or gave some ‘other response’). On this assumption, no response falls beyond the compass of these two categories. These two categories together form what is known as a “category-set.”
A ‘category-set’ must meet the following three requirements:
(1) The set of categories should be derived from a single classificatory principle. This requirement is quite understandable because if more than one principle of classification is employed, a single response may be claimed by more than one category.
Thus, the categories will not be independent of each other. For example, if we have three categories constituting the category-set, e.g., male, female, child, derived, obviously, from two classificatory principles, namely, sex and age respectively, then any one case (respondent) may be covered by more than one category in the category-set.
For example, a child may also be a male, a female may also be a child and so on. The classificatory principle may, however, be a compound one, i.e., made up of two or more criteria, e.g., male child, female child, etc.
(2) The second requirement is that the category-set should be exhaustive, that is, it should be possible to place every response in one of the categories within the set. No response should be left out for want of an appropriate category in the set that will include it.
Whatever the response, it must be covered by some category within the set. For example, if the people of the world were to be classified on the basis of their racial stock, the category-set constituted of three categories, namely, (a) Caucasoid, (b) Negroid and (c) Mongoloid, would clearly not be an exhaustive category-set in accord with the requirement outlined above, since it does not contain a single category in which many of the Indian people (and some others) can find a place.
(3) The last requirement is a corollary of the first one, namely, that the categories within the set should be mutually exclusive; that is, the categories should not overlap. Thus, no response would be claimed by more than one category within the set.
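The three requirements can be checked mechanically once each category is written as an explicit test on a respondent. The Python sketch below is illustrative only: the respondent fields, the function name `check_category_set` and the age cut-off of 12 for ‘child’ are hypothetical assumptions, not part of the original text. It reports cases left without a category (the set is not exhaustive) and cases claimed by more than one category (the categories are not mutually exclusive).

```python
from typing import Callable, Dict, List, Tuple

# A respondent is a dict of attribute values; a category is a yes/no test on a respondent.
Respondent = Dict[str, object]
Category = Callable[[Respondent], bool]


def check_category_set(categories: Dict[str, Category],
                       respondents: List[Respondent]) -> Dict[str, list]:
    """List cases that fit no category (set not exhaustive)
    and cases that fit several (categories overlap)."""
    unplaced: List[int] = []
    overlapping: List[Tuple[int, List[str]]] = []
    for i, person in enumerate(respondents):
        matches = [name for name, test in categories.items() if test(person)]
        if not matches:
            unplaced.append(i)
        elif len(matches) > 1:
            overlapping.append((i, matches))
    return {"unplaced": unplaced, "overlapping": overlapping}


CHILD_AGE_LIMIT = 12  # hypothetical cut-off, chosen only for illustration

# Two principles (sex and age) mixed in one category-set.
mixed = {
    "male": lambda p: p["sex"] == "M",
    "female": lambda p: p["sex"] == "F",
    "child": lambda p: p["age"] < CHILD_AGE_LIMIT,
}

# A single compound principle: each sex/age combination gets exactly one category.
compound = {
    "male child": lambda p: p["sex"] == "M" and p["age"] < CHILD_AGE_LIMIT,
    "female child": lambda p: p["sex"] == "F" and p["age"] < CHILD_AGE_LIMIT,
    "male adult": lambda p: p["sex"] == "M" and p["age"] >= CHILD_AGE_LIMIT,
    "female adult": lambda p: p["sex"] == "F" and p["age"] >= CHILD_AGE_LIMIT,
}

sample = [{"sex": "M", "age": 8}, {"sex": "F", "age": 30}, {"sex": "F", "age": 5}]
print(check_category_set(mixed, sample))     # two cases are claimed twice
print(check_category_set(compound, sample))  # every case is placed exactly once
```

Note that such a check on collected data can only expose failures; a category-set that passes on one sample is not thereby proved exhaustive for every possible response.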
Establishment of categories for data characteristic of social sciences is not always an easy task. The classificatory principle may often be a compound one (as opposed to simple, unitary). The task of drawing out all the mutually exclusive categories which together would exhaust the total universe of responses, on the basis of a compound classificatory principle, is indeed an exacting one, demanding imagination.
It is a great help in such cases to reduce the attributes constituting the compound principle of classification to symbols or codes and draw out by means of the technique of Boolean expansion, the entire range of possible categories comprising the category-set.
Let us take a very simple example. Suppose the researcher considers three attributes, e.g., sex (male or female), age (below 21 years of age or above 21 years of age) and marital status (married or single) as constituents of his single (but compound) principle of classification and reduces these to symbols as under:
Male = S, female = S̅
Below 21 years of age = A, Above 21 years of age = A̅
Married = M, Single = M̅
The resulting category-set will be the exhaustive totality consisting of all possible combinations of these three attributes that comprise the compound classificatory principle. The possible combinations, i.e., categories, will be 2³ = 2 × 2 × 2 = 8 in number.
These are as under:
(1) S A M
(2) S̅ A M
(3) S A̅ M
(4) S A M̅
(5) S̅ A̅ M
(6) S̅ A M̅
(7) S A̅ M̅
(8) S̅ A̅ M̅
Decoding, i.e., substituting the real connotations for the symbols, we get eight mutually exclusive categories which read as under:
(1) Males below 21 years and married.
(2) Females below 21 years and married.
(3) Males above 21 years and married.
(4) Males below 21 years and unmarried.
(5) Females above 21 years and married.
(6) Females below 21 years and unmarried.
(7) Males above 21 years and unmarried.
(8) Females above 21 years and unmarried.
By the same token, if the compound classificatory principle is constituted of four attributes, we shall have 2⁴ = 2 × 2 × 2 × 2, i.e., 16 mutually exclusive categories. It should be clear now how this method of establishing categories, rather than intuition, makes the task of classification much easier and fool-proof.
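Boolean expansion amounts to taking every present/absent combination of the attributes, i.e., their Cartesian product. The following Python sketch is one way to generate the category-set; the function name `boolean_expansion` is hypothetical, and `itertools.product` lists the eight categories in a different order from the list above, though the categories themselves are the same. The fourth attribute used in the check (urban/rural) is likewise a hypothetical example.

```python
from itertools import product

# Each attribute of the compound principle: (symbol, meaning of symbol, meaning of barred symbol).
ATTRIBUTES = [
    ("S", "Males", "Females"),
    ("A", "below 21 years", "above 21 years"),
    ("M", "married", "unmarried"),
]

BAR = "\u0305"  # combining overline, so "S" + BAR displays as S̅


def boolean_expansion(attributes):
    """Return every present/absent combination of the attributes as (code, decoded label)."""
    categories = []
    for pattern in product([True, False], repeat=len(attributes)):
        code = " ".join(sym if present else sym + BAR
                        for (sym, _, _), present in zip(attributes, pattern))
        label = ", ".join(yes if present else no
                          for (_, yes, no), present in zip(attributes, pattern))
        categories.append((code, label))
    return categories


for code, label in boolean_expansion(ATTRIBUTES):
    print(code, "->", label)
```

Adding an attribute to `ATTRIBUTES` doubles the number of categories, which is the 2ⁿ rule stated above.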
It is obvious that the establishment of a set of categories is relatively easy if the responses obtained from the respondents during the study are fairly simple and clear-cut, so that the categories can be easily defined in an unambiguous way. Although this is the way categories should always be defined, the task is much more difficult with certain types of content.
Suppose in a study the researcher asked the male students, “How would you say the female students feel about studying in the same college with male students like you?” The answers are likely to range from indications of highly favourable attitudes (imputed to female students) to imputations of highly unfavorable attitudes. Suppose, these are some of the answers received from respondents.
(1) ‘They like the idea.’
(2) ‘I don’t think they mind.’
(3) ‘They think it lowers them.’
(4) ‘I don’t come in contact with them, so I wouldn’t know.’
(5) ‘They hate it.’
(6) ‘Some of them like it, some don’t.’
(7) ‘They want to study here so they can say they are no less than the males.’
(8) ‘In a purely ladies’ college they would miss much, so they seem to like it here.’
In regard to the above responses it would not be difficult to evolve a simple set of categories based on the classificatory principle of favourable versus unfavorable attitudes imputed to girl students. But we find that both favourable and unfavorable answers convey different shades of meanings.
The male student who says, “They (the girl students) want to study here so they can say they are no less than males” conveys something different from one who says, “They like the idea.” Similarly, the male student who says, “They think it lowers them” is again saying something different from the one who says, “They hate it.”
Thus, we see that two attributes, namely:
(1) Imputation of favourable or unfavorable attitudes to girls, and
(2) Explicit reference or absence of reference to benefits or harm supporting favourable or unfavorable attitudes are two significant constituents of a compounded principle of classification.
The categories in the category-set in accord with the ideal requirements of a category-set discussed earlier, may be put down as under:
(1) Favourable attitude imputed to girl students, explained in terms of benefits they derive from studying in the same college with male students (for example, 7th and 8th answers).
(2) Favourable attitude imputed to girls without explicit reference to benefits gained from studying in the same college with males (e.g., statement No. 1).
(3) Neutral or accommodative attitude imputed to girls (e.g., statement No. 2).
(4) Unfavourable attitude imputed to girls, explained in terms of disadvantages (negative benefits) they derive from studying in the same college with male students (e.g., statement No. 3).
(5) Unfavorable attitude imputed to girls without explicit reference to disadvantages or losses resulting from co-education (e.g., statement No. 5).
(6) Other answers, can’t say, no answer, don’t know (e.g., statement No. 4).
The above illustration gives an idea of how complex a classification in social science can get. Working with such complex categories requires considerable care and effort at classification. Even when categories have been worked out carefully, their use will present greater problems than the use of categories more narrowly and exactly defined.
If a male student in the above example says, “they like it alright here, they know why” it is a moot question whether or not this statement implies a benefit. Thus, additional rules would have to be established to deal with such answers.
It must be said even at the cost of some repetition that although in principle, it is possible to use many attributes of responses for formulation of category-sets, in practice, this is often unnecessary, uneconomical and unrewarding since not all of these classificatory principles bear upon the objective of the study.
Let us now turn to consider the problem of selecting a classificatory principle for categorizing unstructured material (i.e., information collected by unstructured tools).
In studies using structured instruments for gathering data relevant to clearly-formulated research questions or hypotheses, the appropriate principles for classification of responses are fairly clearly prescribed by the nature of the questions and the responses secured.
In working with unstructured material or data, however, the first problem is to arrive at decisions about which aspects of the material are to be categorized, i.e., what classificatory principles are to be used in establishing categories.
In exploratory studies, which by definition do not start with a well-formulated problem or explicit hypothesis, the decision about the classificatory principles is difficult to arrive at. At the time of data collection, the investigator does not know which aspects may turn out to be most important.
He must therefore, collect a large amount of data of unstructured type. In the course of analysis, the researcher is faced with the problem of dealing not only with unstructured materials but also with a big volume of them.
It is advisable, when analyzing data of an exploratory study, to develop working hypotheses that will yield workable, satisfactory classificatory principles. The researcher is required to read carefully through all his material, being all the time alert to the latent clues in the data. Such clues are often secured through studying materials on subjects or situations that contrast with the ones he is studying.
Such a study helps the investigator to see the important differences between the two situations. Another procedure to get at such clues is to put together one’s cases into groups that seem to have a close kinship or appear to belong together, and then to ask oneself what led one to feel that the cases placed in a single group are alike.
Yet another approach that may stimulate clues for the formulation of working hypotheses is to note matters that seem surprising in view of either certain theoretical expectations or common sense, and then to search for possible explanations of the surprising or unanticipated phenomena.
It should, however, be remembered that even with clear-cut hypotheses, the analysis of unstructured material presents special problems. Firstly, there is always the possibility that information on a given point may be missing from some of the documents.
There is also the likelihood of a great deal of material not having a direct bearing on the hypothesis. Besides, there is the problem of deciding on the size of units of the material to which the categories are to be applied.
For example, if a researcher were using case-records kept by welfare agencies, he would have to decide which unit (e.g., clients, statements, acts, social workers, sessions with the client or the entire record) is most appropriate in providing answers to his specific research questions.
Step # 2. Coding:
Coding consists in assigning symbols, usually numerals, to each answer which falls in a predetermined class. In other words, coding may be regarded as the classification process necessary for subsequent tabulation. Through coding, the raw data are transformed into symbols that may be tabulated and counted.
This transformation is not, however, automatic; it involves a great deal of judgement on the part of the coder. ‘Coder’ is the official title for a person who is assigned the responsibility of giving particular codes to responses after the recorded notes have been brought to the office.
It should be remembered, however, that often the judgement as to which response should be assigned a particular code, is made by a person other than the one who goes by the official designation of ‘coder.’
Coding may take place at three different points in a study at each of which, different kinds of persons may be responsible for assigning codes to the raw data. In many a study, the respondent himself may be asked to assign codes to his own reaction or situation.
This is true for many poll-type and multiple-choice questions. For example, when the respondent is asked to indicate which of the classes (say, income groups) he belongs to, e.g., (a) Rs. 3000/- p.m. and below, (b) Rs. 3001/- to Rs. 6000/- p.m., (c) Rs. 6001/- to Rs. 9000/- p.m., (d) Rs. 9001/- p.m. and above, the respondent codes his response simply by ticking off his position among the given alternatives.
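When the same income classes are applied to exact figures at the office instead of being ticked by the respondent, the coding rule has to settle the boundary cases explicitly. The Python sketch below assumes whole-rupee monthly incomes and that bracket (a) covers incomes up to and including Rs. 3000, so that every income falls in exactly one bracket; the function name `income_code` is hypothetical.

```python
from bisect import bisect_left

# Upper limits (Rs. per month) of brackets (a), (b) and (c); bracket (d) is open-ended.
UPPER_LIMITS = [3000, 6000, 9000]
CODES = ["a", "b", "c", "d"]


def income_code(monthly_income_rs: int) -> str:
    """Return the bracket a respondent with this whole-rupee monthly income would tick."""
    if monthly_income_rs < 0:
        raise ValueError("income cannot be negative")
    # bisect_left counts the upper limits strictly below the income,
    # so an income equal to a limit stays in the lower bracket.
    return CODES[bisect_left(UPPER_LIMITS, monthly_income_rs)]


for income in (2500, 3000, 3001, 9000, 12000):
    print(income, income_code(income))
```

Keeping the limits in a list makes the category boundaries visible in one place, which helps when checking that the brackets neither overlap nor leave gaps.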
The second point at which the coding can take place is when in the course of data collection, the interviewer or observer categorizes the subjects’ responses. This is what is being done when an interviewer or observer employs a rating scale to describe a person’s response or behaviour.
The final point at which coding can take place is, of course, when the raw uncategorized data (collected especially through unstructured instruments of data-collection) are deposited in the project office and the official coders here exercise their judgement to assign particular codes to particular responses or data.
Let us briefly compare and contrast the pros and cons of coding by the official coders in the office and coding by the interviewers or observers in the course of data-collection in the field.
The interviewers or observers are in a position to notice the situation as well as the individual’s behaviour. Thus, they have more information upon which to base their judgements in regard to the appropriate categorization of responses as compared to the coders working on the basis of written records which may not give a complete idea about the real meaning of the response.
Another advantage of coding by data-collectors themselves is that, both time and labour can be saved.
On the other hand, coding in the office by coders has certain signal advantages. Coding of complex data, which requires time for reflection, should preferably be done by the office-coders. On-the-spot coding judgements made by data-collectors may not be as discerning as judgements made with more time for deliberation.
Firstly, the judgement of data-collectors may be coloured by many factors, viz., the respondent’s appearance, accent, mannerisms and responses to previous questions. Secondly, there is a danger of the data-collectors lacking uniformity when coding responses.
Thus, comparability of data obtained from a large number of respondents is hampered. Thirdly, the interviewers or observers may develop their own personal frames of reference in respect of the material that they are coding. This would tend to make their categorizations unreliable after a time. A common frame of reference is easier to obtain and maintain in the office-coding operation than in the field.
Let us discuss some of the important problems related to reliability in coding. There are many things that may operate to make the judgement of coders unreliable. Some of the factors may arise from the data to be categorized, some from the nature of the categories that are to be applied and still others may emanate from the coders themselves.
We shall now consider briefly some of these factors and the ways they can be guarded against.
Many of the difficulties that occur in coding result from the inadequacies of data. Frequently, the data do not supply enough relevant information for reliable coding. This could be due to deficient and inadequate data-collection procedures. These difficulties, however, can generally be overcome by careful editing of data. The process which consists in scrutinizing the data to improve their quality for coding is known as editing.
When the data-collector hands in his material to the project office, the possibility of eliminating many potential coding difficulties still exists. A careful examination of the data as soon as they are collected and, if necessary, a systematic questioning of the interviewers or observers help avert many coding problems.
Not only does editing help to avoid later coding problems, it may also substantially improve the quality of data-collection by pointing out where the interviewers or observers might have misunderstood instructions or might not have recorded data in sufficient detail.
In fact, editing should be done in the course of pre-testing the interview or observation schedule, while training the interviewers or observers, and throughout the period of data-collection. Editing at the project office goes a long way in removing coding problems.
Thus, editing must be done while the interviewers or observers can be easily made available for questioning. Editing involves a careful scrutiny of the interview or observation schedules.
These should be checked for:
(1) Completeness: The editors need to see that all items are duly filled in. A blank space next to a question in an interview schedule, for example, may mean either ‘no response’ or ‘Don’t know’ or refusal to answer or inapplicability of question, or the question having been omitted by oversight, etc.
(2) The editor should examine the interview or observation schedules to find out whether the handwriting or the symbols or codes assigned by interviewer or observer can be easily understood by the coder.
It is always advisable to check for legibility when the material is handed in and, if necessary, to get the interviewer or observer to rewrite it. If this is not done, coding may get held up at a stage when the interviewers or observers cannot easily be recalled for questioning.
(3) Editing also involves examining the schedules for comprehensibility. It often happens that a recorded response is perfectly comprehensible to the interviewer or observer, but not intelligible to the coder because the context of behaviour or response is not known to the coder. Systematic questioning of the data-collectors will clear off confusion and ambiguities and improve considerably the quality of coding.
(4) The data should also be examined or checked to find out whether there are certain inconsistencies in regard to the responses recorded in the schedule.
For example, a respondent might have said in response to one of the earlier questions that he had never met people of a particular group, and yet in response to a later question he might have said something about visiting certain people of this group in the course of his rounds. If such is the case, there is an obvious need to enquire into this inconsistency and get it clarified through questioning the data-collectors.
(5) It is also necessary to check the degree of uniformity with which the interviewers have followed instructions in collecting and recording data. Coding may be hampered if a response is recorded in units other than those specified in the instructions.
(6) It should be noted that some responses may simply appear to be irrelevant for the purpose of the investigation. This is likely to happen if a question is not clearly worded or not intelligently asked. The data should thus be carefully examined with a view to segregating the inappropriate responses from the appropriate ones.
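Several of these editing checks can be written down as explicit rules and run on each schedule as it comes in, so that queries reach the interviewer while he can still be questioned. The Python sketch below covers completeness, uniformity of recording and one consistency rule modelled on the example about meeting and visiting a group; the item names, permitted values and the function `edit_schedule` are all hypothetical. Legibility, comprehensibility and relevance still need a human editor.

```python
from typing import Dict, List, Optional

Schedule = Dict[str, Optional[str]]

# Hypothetical schedule items and permitted codes, for illustration only.
REQUIRED_ITEMS = ["age", "ever_met_group", "visited_group"]
ALLOWED_VALUES = {"ever_met_group": {"yes", "no"}, "visited_group": {"yes", "no"}}


def is_blank(value: Optional[str]) -> bool:
    return value is None or value.strip() == ""


def edit_schedule(schedule: Schedule) -> List[str]:
    """Return a list of queries to raise with the interviewer; empty if none found."""
    problems = []
    # Completeness: a blank may mean no response, refusal, omission, etc.
    for item in REQUIRED_ITEMS:
        if is_blank(schedule.get(item)):
            problems.append(f"blank: {item}")
    # Uniformity: answers not recorded in the form the instructions specify.
    for item, allowed in ALLOWED_VALUES.items():
        value = schedule.get(item)
        if not is_blank(value) and value not in allowed:
            problems.append(f"non-standard value for {item}: {value!r}")
    # Consistency: says he never met the group, yet reports visiting its members.
    if schedule.get("ever_met_group") == "no" and schedule.get("visited_group") == "yes":
        problems.append("inconsistent: ever_met_group=no but visited_group=yes")
    return problems


print(edit_schedule({"age": "", "ever_met_group": "no", "visited_group": "yes"}))
```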
The value of the categorization of data depends naturally on the soundness of categories employed. It is necessary that the categories besides being relevant to the purpose of research are also defined from a conceptual point of view.
Coding will be unreliable if the categories are not defined clearly in terms of indicators that are applicable to data, here and now. In practice, the categories are defined by means of examples from the data in hand. It is very helpful if illustrations from data show not only what kind of responses typify the category but also help to distinguish the boundary-line between seemingly similar categories.
It is obvious that the quality of coding is affected by the competence of the coders. The training of coders is thus an important step in any study.
The training of coders may proceed by the following stages:
Firstly, the various codes are explained to the trainees (coders) and illustrated with examples from the data to be categorized.
Secondly, all the trainee-coders practise on a sample of the data, and the problems that arise are discussed by the coders as a group with the supervisor in order to develop common procedures and definitions.
Thirdly, clues resulting from practice-coding are used to effect revisions in the categories to make them better applicable to the material and to put in writing the procedures and definitions that have evolved during the preliminary coding.
Fourthly, at some point in the practice period when relatively few new problems arise, the coders work on an identical portion of data without consulting one another or the supervisor. The consistency or reliability of coding is then computed to determine whether it is feasible to begin coding in earnest.
Depending on the results of reliability or consistency checks, it may be decided to eliminate the categories that seem too unreliable or to spend more time in training coders or to eliminate coders who are most inconsistent and so on.
Lastly, periodical checks are made to ensure that coders do not become careless with more experience or develop personal, idiosyncratic methods of handling new problems in the material. To ensure uniformity, any decision that is made after coding has begun should be communicated to all coders without delay.
Obviously, the consistency and appropriateness with which a given type of answer is assigned to a given category will have an important bearing on the outcome of analysis, hence, it is important to check the reliability of coding and to increase the agreement among coders as much as possible.
It is, of course, difficult to set any given level of reliability as the standard to be attained. Different types of material present different degrees of difficulty in achieving reliability. As a rule, the more structured the material to be coded, and hence the simpler the categories used, the higher the reliability.
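The text does not prescribe a particular measure of coding reliability. Two common choices for a pair of coders who have coded the same responses independently are the simple percentage of agreement and Cohen’s kappa, which corrects that percentage for the agreement expected by chance; kappa is a substitution made here for illustration, not the author’s stated method. The codes below are hypothetical, using the six categories of the co-education example.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Share of responses to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must code the same, non-empty set of responses")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Agreement corrected for the agreement expected by chance (Cohen's kappa)."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability that both coders pick the same code independently.
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:  # both coders used one and the same code throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes (categories 1-6) given by two coders to the same ten responses.
coder_a = [1, 2, 2, 5, 6, 1, 3, 4, 2, 1]
coder_b = [1, 2, 3, 5, 6, 1, 3, 4, 2, 2]
print(percent_agreement(coder_a, coder_b))       # 0.8
print(round(cohens_kappa(coder_a, coder_b), 2))  # 0.75
```

Kappa falls below the raw agreement whenever some agreement would occur by chance alone; as the text notes, no single value can be laid down as the standard for every kind of material.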
It should be noted that the types of codes used in a study will differ according to whether the data are to be tabulated by machine or by hand. If the data are to be sorted manually, a word-description of the classes is satisfactory.
Also, abbreviations or letters of the alphabet, e.g., ‘Y’ for Yes, ‘N’ for No, etc., may be used. Machine tabulation, on the other hand, requires that classes be expressed in numerical symbols, since the machines can only be fed with numerical data.
Mechanical tabulation requires the use of punch cards. However, the number of different classes which can be shown on the punch card is limited. In any case, all codes used for machine tabulation can also be used for hand tabulation.
If codes are to be put on punch cards, of which two sizes are in general use, i.e., 80-column cards and 54-column cards, it is desirable to use ten or fewer classes/categories for most items of information or response.
The punch card contains 10 numbered spaces and an X and a Y in each column, making a total of 12 codes that can be used. It is a rather complicated procedure to get more than one type of item into a column. For example, nativity and age codes cannot be punched in a single column unless only six age-groupings are used for each nativity class (two nativity classes × six age-groups = 12 positions).
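Putting two items into one column works like a compound classification: each combination of the two items is given a single position. The Python sketch below assumes two nativity classes and six age-groups, all hypothetical labels, and an assumed order of positions (digits 0 to 9, then X and Y); it is not a description of any historical card-coding standard.

```python
# One punch-card column offers 12 positions: the digits 0-9 plus X and Y.
PUNCH_POSITIONS = [str(d) for d in range(10)] + ["X", "Y"]

# Hypothetical classes: two nativity classes and six age-groups (2 x 6 = 12).
NATIVITY = ["native-born", "migrant"]
AGE_GROUPS = ["under 20", "20-29", "30-39", "40-49", "50-59", "60 and over"]

assert len(NATIVITY) * len(AGE_GROUPS) <= len(PUNCH_POSITIONS), "too many combinations for one column"


def punch_position(nativity: str, age_group: str) -> str:
    """Map a (nativity, age-group) pair to a single position in one column."""
    index = NATIVITY.index(nativity) * len(AGE_GROUPS) + AGE_GROUPS.index(age_group)
    return PUNCH_POSITIONS[index]


def decode_position(position: str) -> tuple:
    """Recover the (nativity, age-group) pair from a punched position."""
    index = PUNCH_POSITIONS.index(position)
    return NATIVITY[index // len(AGE_GROUPS)], AGE_GROUPS[index % len(AGE_GROUPS)]


print(punch_position("migrant", "60 and over"))  # Y
```

A seventh age-group would need 14 positions, which is why the text limits the age classification to six groupings.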
Step # 3. Tabulation:
Tabulation is a part of the technical process in the statistical analysis of the data. The essential element in tabulation is the summarization of results in the form of statistical tables.
It is only when raw data are divided into groups and counts made of the number of cases falling in these various groups, that it is possible for the researcher to determine what his results mean and to convey his findings to the consumer in a form which can be readily understood.
Tabulation naturally depends on establishing categories for raw data, editing and coding of responses (punching and running the cards through machines for mechanical tabulation, and sorting and tallying for hand tabulation).
Experienced researchers generally develop tabulation plans at about the same time as they draft or construct the data-collection instruments and make sampling plans. The inexperienced researchers seldom concern themselves with tabulation plans until the data have been collected. Of course, it is impossible for the researcher to foresee the entire range of tabulation that will be subsequently desired.
He should be familiar enough with his research problem or the subject of investigation to be able to draw up tables that will provide answers to the questions which gave rise to the study. The researcher should be able to prepare adequate tabulation plans if he uses the findings from the earlier researches which have elements in common with the one for which the plans are being drawn.
In exploratory studies, a better and safer procedure is to pretest the data-collection instrument on a sample of population of the type that would be covered in the final study. This way, some clues in regard to what kind of tabulation would be meaningful can generally be obtained.
Tabulation may be done entirely by manual methods, this being known as hand tabulation. Alternatively, it may be done by mechanical methods utilizing fast automatic power machines for the bulk of data, the process being known as mechanical tabulation.
The researcher must decide before he draws detailed tabulation plans for his study, what method of tabulation he would use. This decision will be based on various considerations such as cost, time, personnel, etc.
Both hand tabulation and mechanical tabulation procedures have their respective merits and limitations. A researcher alert to these merits and demerits is in a better position to decide which method would suit his problem.
We shall briefly review the merits and demerits of these two methods of tabulation:
(1) Mechanical tabulation involves much clerical work and specialized operations. Of course, it facilitates speed but the speed may not always be an adequate compensation for extra clerical work.
(2) If the number and types of tables desired are not decided upon before tabulation work is begun, machine tabulation may be more expedient. But if hand tabulation is to be efficient, the order in which the various sorts and counts will be made must be determined in advance of tabulation.
(3) A major advantage of machine tabulation is that it facilitates cross-classification. In large-scale studies where many variables are to be correlated or cross-classified, machine tabulation is clearly preferable.
It is for this reason that mechanical tabulation is used in studies requiring many intercorrelations among variables. But if the total number of respondents is small, manually counting them in accordance with the cross-classificatory principle may be relatively economical.
(4) When there is a great deal of coded information and several punch cards required for each case, hand tabulation may be preferable.
(5) If it is desired to keep the data in a form ready for new tabulation at relatively short notice, punch cards are particularly useful. Mechanical tabulation is also useful for periodic studies or surveys in which the same type of information has to be collected at frequent intervals.
(6) The process of sorting and counting is less likely to produce errors if done by machine than if done by hand. Errors, of course, can and do arise in machine tabulation and when they do, they are often very difficult to identify and check.
Any errors discovered at the coding, editing or field-work stages of the survey may hold up machine-tabulation work. It is often desirable, therefore, to proceed with hand tabulation alongside the field-work.
(7) The cost of tabulation operations is an important concern of the researcher. Machine tabulation often involves much greater cost, since the cost of punch cards, charges for punching and verifying, machine charges for sorting and tabulating, and the expense of hiring the specialized services of particular types of machine operators often add up to much more than the expenses involved in hand tabulation.
(8) Another important consideration is time. In mechanical tabulation the work of tabulation as such is done in a very short time, but the preparatory stages as also the training, supervision and possible non-availability of certain types of machines on hire resulting in dislocation of work may all inevitably contribute to wastage of time.
(9) The considerations of convenience can hardly be ignored. If mechanical tabulation demands despatching of raw data to some office far away from the project office, inconveniences involved in packing, transportation, etc., are caused.
(10) Lastly, the amount of commentary material to be recorded and analyzed may also affect the choice of tabulation methods. In some opinion-surveys, the verbatim comments of informants are important. The hand-code card used in hand tabulation alone can provide space for such remarks or comments.
Machines which handle tabulation work are of many kinds. Developments in this field have been extremely rapid during the recent years. Some machines simply sort and count cards, others sort, count and print the results, still others are equipped to perform most complicated statistical operations or calculations.
These last-mentioned machines are extremely complex and must be programmed for a given operation by a specialist.
A table is an exhibit of numerical data systematically arranged in labelled columns (vertical) and rows (horizontal).
A simple or elementary table gives simple counts of the frequencies with which the various categories in each set occur in the data, for example, the number of people in the sample who attended high school but did not pass, the number who attended college but did not graduate, and so on. The table given below simply shows how often fifty respondents visit the cinema.
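A simple frequency table of this kind amounts to counting the cases in each category. Below is a minimal Python sketch, assuming the responses have already been coded into category labels. The ten responses, the category names and the `frequency_table` helper are hypothetical and do not reproduce the figures in the article's table. Listing the categories explicitly keeps an empty category (here `daily`) in the table with a count of zero, which a bare count of the responses would omit.

```python
from collections import Counter

# Hypothetical coded answers to "How often do you visit the cinema?"
# (illustrative only; these are not the figures in the article's table).
responses = ["weekly", "never", "monthly", "monthly", "weekly",
             "monthly", "never", "monthly", "weekly", "monthly"]

def frequency_table(responses, categories):
    """Count how often each category occurs, listing every category
    (including those with zero cases) in the given order."""
    counts = Counter(responses)
    return [(category, counts[category]) for category in categories]

table = frequency_table(responses, ["never", "monthly", "weekly", "daily"])
for category, count in table:
    print(f"{category:<8}{count:>4}")
print(f"{'Total':<8}{sum(count for _, count in table):>4}")
```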
In research, we are often interested in finding out the correlation between two or more variables, e.g., education, income and fertility. Simple tables (illustrated above), which show the frequency distribution of respondents on a single characteristic such as education, income or fertility, do not help us see the relationship among two or more variables.
The way to see such relationships is to prepare cross-tables or breakdown tables. Such tables group together the cases that fall jointly in two or more categories, for example, the number of cases that are high in education, low in income and have between 2 and 3 children, or the number that are low in education, low in income and have between 4 and 5 children, and so on. The most elementary form of cross-tabulation familiar to students is the college time-table.
Suppose a researcher wants to see the relationship among three variables, viz., occupation, income and fertility. He must employ a scheme of tabulation that will afford all possible combinations of the different categories of these three variables.
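The requirement that the scheme afford all possible combinations can be sketched in Python by enumerating every combination of categories with `itertools.product` and counting cases into it. This is a minimal sketch: the category labels follow the fertility, income and occupation categories described in the following paragraphs, while the `cross_tabulate` helper and the three respondent records are hypothetical. With 5 × 5 × 2 categories the skeleton has 50 cells, and combinations with no cases still appear with a count of zero.

```python
from collections import Counter
from itertools import product

# Category labels for the three variables in the example.
FERTILITY = ["no issue", "1-2", "3-4", "5-6", "7 and above"]
INCOME = ["below Rs.200", "Rs.201-400", "Rs.401-600", "Rs.601-800", "Rs.801-1000"]
OCCUPATION = ["white-collar", "blue-collar"]

def cross_tabulate(records):
    """Return a dict mapping every (fertility, income, occupation) combination
    to the number of records in it; empty combinations get a count of 0."""
    observed = Counter(records)
    return {cell: observed[cell]
            for cell in product(FERTILITY, INCOME, OCCUPATION)}

# Hypothetical respondent records, one (fertility, income, occupation) tuple each.
records = [
    ("3-4", "below Rs.200", "white-collar"),
    ("1-2", "Rs.601-800", "white-collar"),
    ("3-4", "below Rs.200", "white-collar"),
]

table = cross_tabulate(records)
print(len(table))                                      # 50
print(table[("3-4", "below Rs.200", "white-collar")])  # 2
print(sum(table.values()))                             # 3
```

A record with a misspelled label would match no cell and would simply be left out, so it is worth checking that the cell counts add up to the number of records, as the last line does.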
Cross-tabulation of the data on a hypothetical sample of 100 persons may be presented as under:
In the above table, the number of children is shown in rows. This fertility variable has been divided into five categories: no issue, 1 to 2 issues, 3 to 4, 5 to 6, and 7 and above. So the left-hand margin lists these 5 categories of fertility. The income of the 100 respondents is shown in columns.
The income variable has been sub-divided into five categories: below Rs.200, Rs.201-400, Rs.401-600, Rs.601-800 and Rs.801-1000. Thus, we have five columns corresponding to these categories.
Again, since there is one more variable, occupation, to accommodate, each income column has been sub-divided into two parts corresponding to the two categories into which occupations have been divided: white-collar and blue-collar.
This gives ten vertical columns for the combinations of income and occupation, and five horizontal rows for the categories of the fertility variable. The body of the table thus consists of ten columns crossed by five rows.
The intersection of the columns and rows yields 50 (fifty) cells or boxes. Each cell holds the cases with one particular combination of income, occupation and fertility, so the cases in any cell differ from those in every other cell in income, occupation or fertility, in any two of these, or in all three. Let us read the table to get some idea of what it represents.
Out of the total sample of 100 cases, 25 have between 3 and 4 issues. Of these 25, reading from the left-hand side, 5 persons have an income below Rs.200 and are employed in white-collar occupations.
Two persons (with between 3 and 4 children) have an income below Rs.200 and are employed in blue-collar occupations. Let us now take the second row. Of the total respondents, 38 have between 1 and 2 children. Of these 38, 11 (in the 7th cell) are from the income group Rs.601 to Rs.800 and are employed in white-collar occupations.
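The column positions used in reading the table follow from its layout: income categories run from left to right, and each is split into a white-collar and a blue-collar sub-column. The small Python sketch below, with hypothetical function and list names, computes a column's 1-based position from that layout and confirms that the Rs.601 to Rs.800 white-collar column is the 7th, as in the reading above.

```python
INCOME = ["below Rs.200", "Rs.201-400", "Rs.401-600", "Rs.601-800", "Rs.801-1000"]
OCCUPATION = ["white-collar", "blue-collar"]

def column_number(income, occupation):
    """1-based position of a column, counted from the left, when each income
    column is split into white-collar and blue-collar sub-columns."""
    return INCOME.index(income) * len(OCCUPATION) + OCCUPATION.index(occupation) + 1

print(column_number("below Rs.200", "white-collar"))  # 1
print(column_number("below Rs.200", "blue-collar"))   # 2
print(column_number("Rs.601-800", "white-collar"))    # 7
```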
This reading of the table should make it clear that cross-tabulation is an essential step in discovering or testing relationships among the variables contained in the data.
Tabulation presents data in summarized form in a way that facilitates the required statistical calculations. Data may, however, be presented in other ways: instead of a tabular form, the researcher may present them as diagrams or graphs. Such diagrammatic or graphic representations have the merit of being intelligible to less knowledgeable readers.
But they are less useful as a basis for statistical calculations. Let us now proceed to the next operation, the statistical analysis of data, for which tabulation is a prerequisite or first step.
Step # 4. Statistical Analysis of Data:
In research, we are not concerned merely with each individual respondent; the purpose of research is broader than this. We wish to know much more than simply that one respondent, for example, has an extremely favourable attitude toward disarmament and that another has a moderately unfavourable attitude toward the same issue. Such information about individuals is just not enough.
Social science research is generally directed toward providing information about a particular population of respondents, mostly via a sample drawn from it. The sample may be asked certain questions related to the problem under study, or be subjected to some form of observation.
Let us suppose that we have asked a sample of a thousand college students in ‘post-graduate’ classes a series of questions to obtain information about their study habits. Our research would thus be directed toward providing information about the ‘population’ of ‘post-graduate’ students of which the thousand cases are a sample.
As a necessary step toward characterizing this ‘population’, we would have to describe or summarize the information about study habits that we have obtained from the sample. Tabulation is just a part of this step. In addition, we must estimate the reliability of generalizations about the ‘population’ drawn from the obtained data. Statistical methods are useful in fulfilling both these ends.
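Both ends can be illustrated for a single yes/no item. The Python sketch below summarizes the sample by a proportion and gauges how reliably that figure generalizes to the ‘population’ through its standard error and an approximate 95% confidence interval. This is one common textbook technique chosen for illustration, not a method the article prescribes. It assumes a simple random sample and the normal approximation, and both the helper name `proportion_with_ci` and the count of 400 students are hypothetical.

```python
import math

def proportion_with_ci(successes, n, z=1.96):
    """Sample proportion, its standard error, and an approximate confidence
    interval (normal approximation; z=1.96 gives roughly 95% coverage).
    Assumes a simple random sample."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)

# Hypothetical: 400 of the 1,000 sampled students say they study daily.
p, se, (low, high) = proportion_with_ci(400, 1000)
print(f"p = {p:.3f}, SE = {se:.4f}, 95% CI = ({low:.3f}, {high:.3f})")
# p = 0.400, SE = 0.0155, 95% CI = (0.370, 0.430)
```

The proportion describes the sample; the interval indicates how far the ‘population’ figure might plausibly lie from it, and it narrows as the sample size grows.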