Some thoughts on Family Tree studies

Alice and Sidney Baker – my paternal grandparents

A salutary lesson

Over perhaps the last ten years, I have spent some of my spare time researching my family tree, in a fairly ad hoc and disorganised way, but I had nonetheless built up a considerable body of information, and one or two branches of my tree reached back to the 1650s. In recent weeks, having a little more time because of retirement and enforced isolation, I have been looking again at all the information in what I hope is a more systematic way. I have posted the results of my endeavours at the Baker Family Tree on my website. During this reappraisal, I have tried to make sure that everything I include is properly sourced in census, birth, baptism, marriage or death records. I did, however, find that in my enthusiasm to push my tree back as far as possible, I had previously used some unsourced family tree material found on, for example, Ancestry Public Member Trees. When I came to check where this material came from, in a number of cases I could not find any proper sources. I have therefore, regretfully, had to remove such material from my tree, and in doing so have lopped off a few quite long branches. Looking at the various public trees, I am clearly not the only one who has done this: some of the unsourced material I initially used is repeated in a number (sometimes a large number) of family trees. The temptation is obvious, as it offers a quick way of identifying ancestors, but in retrospect it was not the wisest thing to do.

The Baker Family Tree

That being said, the whole affair has made me think a little more deeply about the accuracy and quality of the information available for building family trees. As a result, I have carried out some simple analysis using my own tree to investigate these issues a little further. Before going any further, however, I need to make clear the characteristics of this tree, as to some degree these determine what follows. Very broadly, my ancestors were miners, ironworkers, or industrial or agricultural labourers of some description; in other words, they were at the bottom of the social scale. The tree is centred on a region at the western edge of the Black Country, and by and large my ancestors, as far back as I can trace them, come from within 30 to 40 miles of there. Some of the branches of the tree can be traced back to the seventeenth and eighteenth centuries in places where comprehensive baptismal and marriage registers survive. These include in particular the Anglican churches in Lichfield and the non-conformist churches in Shifnal in Shropshire. On my website, I present my tree as four separate sub-trees, each beginning with one of my grandparents: the paternal grandfather tree, the paternal grandmother tree, the maternal grandfather tree and the maternal grandmother tree. I label my grandparents' generation as generation 1, their parents as generation 2, and so on. The tree concentrates on my direct ancestors, although I do give details of the children of direct ancestors where I have found this information; more could be done in this regard. The other very specific oddity of the tree is that both my father's surname and my mother's maiden name were Baker, so there are two independent Baker sub-trees.

Generation length

Firstly, I looked at the average birth / baptism date for each generation, using whatever information was available for each member of the tree. Average birth dates are plotted against generation number in figure 1 below. Essentially, it shows that generations are spaced at intervals of around 25 to 30 years on average. This seems reasonable given that most of my ancestors married at around the age of 20, and those who bred successfully continued to do so into their forties.
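The averaging behind figure 1 can be sketched in a few lines of Python. This is only an illustration, not the method I actually used. It assumes each person is reduced to a (generation, birth_year) pair, with people of unknown birth year left out. The function names and the sample records are hypothetical and are not taken from my tree.

```python
from collections import defaultdict
from statistics import mean


def mean_birth_year_by_generation(records):
    """Average birth / baptism year for each generation.

    `records` is an iterable of (generation, birth_year) pairs. People with
    no known birth year are assumed to have been left out beforehand.
    """
    years = defaultdict(list)
    for generation, birth_year in records:
        years[generation].append(birth_year)
    return {g: mean(ys) for g, ys in sorted(years.items())}


def generation_intervals(means):
    """Years between each generation's average birth date and the average
    for generation g - 1 (one step nearer the present)."""
    return {g: means[g - 1] - means[g] for g in sorted(means) if g - 1 in means}


# Made-up records purely to show the call pattern, not data from my tree.
records = [(1, 1890), (1, 1894), (2, 1862), (2, 1866), (3, 1835), (3, 1839)]
averages = mean_birth_year_by_generation(records)
intervals = generation_intervals(averages)
```

With these made-up records the averages come out as 1892, 1864 and 1837, so the intervals between generations are 28 and 27 years.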

Figure 1 Generation birth dates


Secondly, I looked at the completeness of the tree. In each generation there is a theoretical maximum number of individuals, since the number doubles with each step back: 4 in generation 1, 8 in generation 2, and up to 2048 in generation 10. Figure 2 shows the percentage of this maximum that I have identified in each generation. For convenience I plot this percentage against the average generation birth date. The percentage remains high for birth dates back to around 1800, then drops rapidly, and stays at a low level as far back as the trees can be traced in the seventeenth century. The major fall-off in completeness around 1800 has two causes. Individuals born before that date rarely appear in the censuses from 1841 onwards, and the coverage of parish and non-conformist registers is somewhat patchy before then. Information finally runs out around 1630 to 1650, before which the parish records were all but non-existent. For trees such as mine, these two dates represent significant information horizons. For those with ancestors somewhat higher up the social scale, there may well be other sources of information that allow trees to be pushed further back in time. Taken to its extreme, the House of Windsor can trace its pedigree back to the early royal houses of Wessex, Mercia, Dal Riada and the Picts. Indeed, if one believes some genealogies, this pedigree can be traced back further still to Woden, although the source information for that is probably not readily available.
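The completeness measure can be written down as a small Python sketch. It uses the doubling rule implied by the figures of 4 in generation 1 and 8 in generation 2, so generation n can hold at most 2^(n+1) people. The function names are hypothetical, chosen only for this illustration.

```python
def max_ancestors(generation):
    """Theoretical number of direct ancestors in a generation.

    Generation 1 is the four grandparents and the count doubles with each
    step back, so generation n can hold at most 4 * 2**(n - 1) = 2**(n + 1).
    """
    if generation < 1:
        raise ValueError("generation numbering starts at 1 (grandparents)")
    return 2 ** (generation + 1)


def completeness(identified, generation):
    """Identified ancestors as a percentage of the theoretical maximum."""
    return 100.0 * identified / max_ancestors(generation)


for g in (1, 2, 5, 10):
    print(f"generation {g}: at most {max_ancestors(g)} ancestors")
```

This prints maxima of 4, 8, 64 and 2048. As a purely hypothetical example, identifying 32 people in generation 5 would give `completeness(32, 5)` of 50.0, meaning half of that generation has been found.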

Figure 2 Generation Completeness


I then looked at the quality of the information for each individual in the tree. I used a scoring system with one point allocated for each of the following pieces of information.

  • Birth / baptism year
  • Birth / baptism location
  • Marriage year
  • Marriage location
  • Death year
  • Father’s first name
  • Mother’s first name
  • Mother’s maiden name
  • Spouse’s first name
  • Spouse’s surname

Whilst the list is to some degree arbitrary, in my experience the more of these details that are known about an individual, the more confident one can be about their place in the tree. There is thus a maximum of ten points for each person. Applying this to my tree, one can draw histograms of quality scores for each generation, and these are shown in figure 3 below. Generation 1 has four entries, all of which have maximum scores. The number of entries increases through generations 2 to 4, and in general the quality of information remains high, although by generation 4 it is beginning to spread out, with some low scores. Generation 5, however, is distinctly different, with a large number of individuals with low quality scores. This matches the fall-off in figure 2 in the number of identifiable individuals for this generation, which has an average birth date of 1780. The low scores represent individuals whose lines can be traced back no further. Generations 6 to 9 then show a wide spread of scores, with high scores for those branches that are well documented and low scores for those branches that are coming to an end. The high scores become fewer and fewer as the generation date approaches the early limit of around 1630 to 1650.
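The scoring scheme above can be sketched in Python as follows. It is a sketch under stated assumptions, not a description of how my own figures were produced. The `Person` record layout is hypothetical, with one optional field per scored item and `None` meaning "not known". Each person's score is the count of known items, and the histograms are one `Counter` of scores per generation.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Person:
    # Hypothetical record layout: one optional field per scored item.
    generation: int
    birth_year: Optional[int] = None
    birth_place: Optional[str] = None
    marriage_year: Optional[int] = None
    marriage_place: Optional[str] = None
    death_year: Optional[int] = None
    father_first_name: Optional[str] = None
    mother_first_name: Optional[str] = None
    mother_maiden_name: Optional[str] = None
    spouse_first_name: Optional[str] = None
    spouse_surname: Optional[str] = None


# The ten scored items: every field except the generation number.
SCORED_FIELDS = [f.name for f in fields(Person) if f.name != "generation"]


def quality_score(person):
    """One point for each of the ten items that is known (0 to 10)."""
    return sum(getattr(person, name) is not None for name in SCORED_FIELDS)


def score_histograms(people):
    """Map each generation to a Counter of quality scores (as in figure 3)."""
    hist = defaultdict(Counter)
    for p in people:
        hist[p.generation][quality_score(p)] += 1
    return dict(hist)
```

For example, a made-up record `Person(generation=5, birth_year=1780, father_first_name="John")` scores 2. Storing unknown items as `None` rather than empty strings keeps the scoring a simple "is it known?" test.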

This raises the question of whether it would be sensible to impose a confidence limit when tracing family trees back. My own feeling is that if the quality score for an individual is 3 or below, then it would be wise to stop the tree at this point, rather than making speculative identifications from the record for earlier generations.
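One way to apply such a cut-off mechanically is sketched below in Python. It rests on my reading of the rule: a low-scoring person stays in the tree as the end of a branch, but no parents proposed for them are accepted. The `Node` structure and function name are hypothetical, and each node's quality score is assumed to have been worked out already.

```python
from dataclasses import dataclass
from typing import Optional

STOP_AT_OR_BELOW = 3  # the cut-off suggested above


@dataclass
class Node:
    # Hypothetical pedigree node: a name, a precomputed quality score (0-10)
    # and links to any parents that have been proposed.
    name: str
    score: int
    father: Optional["Node"] = None
    mother: Optional["Node"] = None


def trusted_ancestors(person, threshold=STOP_AT_OR_BELOW):
    """Walk back through the pedigree, ending a branch at anyone whose
    score is at or below the threshold.

    The low-scoring person is kept as the end of the branch, but nobody
    proposed as their parent is followed.
    """
    kept = []
    stack = [person]
    while stack:
        node = stack.pop()
        kept.append(node.name)
        if node.score <= threshold:
            continue  # the branch stops here
        for parent in (node.father, node.mother):
            if parent is not None:
                stack.append(parent)
    return kept
```

A depth-first walk with an explicit stack is used only because it is short. The order in which people are listed does not matter, only which branches are cut.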

Figure 3 Generation Quality


As I noted above, most of my ancestors were very unadventurous in terms of where they lived, and all came from a quite restricted area. The same tendency can be seen in the names they gave their children. There are 21 different male first names in my tree, with the top four being John, William, Joseph and Thomas (in that order), accounting for 65% of the total. There are slightly more female names, 27 in total, with the top four in order being Mary, Ann, Hannah and Sarah, which account for 47% of the total. This is very consistent with the analysis of the names in the Book of Reference to the 1822 Fowler Map of Kingswinford in Kingswinford Manor and Parish. There the top four male names were the same as in my tree, in the same order, and accounted for 55% of the total. The top four female names in my tree are all in the top five of those in the Fowler Book of Reference, although not in the same order, and together they make up 57% of the total. This concentration of names is matched throughout the area over the period from 1600 to 1900. Also, “Baker” is a common name: two Baker families are represented in my tree, with perhaps half a dozen other Baker families in the records in the immediate vicinity, and many of the other names in the tree are equally common. Putting these two facts together, it is easy to see that in a number of cases there is significant ambiguity in identifying individuals in the record. The name “John Baker” was particularly difficult in this regard; there were a lot of them about. It is this ambiguity that brings many branches of the tree to an end. I have taken the view that if there is more than one candidate for a particular place in the tree, and no strong supporting evidence for choosing one above the other (such as birth / baptism location), then it is best to make no identification until further evidence can be found.
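Both the name count and the ambiguity rule can be sketched in Python. This is an illustration under assumptions, not how the percentages above were actually calculated. `top_n_share` takes a list with one entry per individual, so the share is of people rather than of distinct names. `identify` uses a hypothetical dict-based record format. It accepts a candidate only when exactly one record matches all the supporting evidence, and otherwise returns `None`, meaning "make no identification yet".

```python
from collections import Counter


def top_n_share(names, n=4):
    """Return the n commonest names and the percentage of all entries
    they account for (ties keep the order first encountered)."""
    if not names:
        return [], 0.0
    top = Counter(names).most_common(n)
    share = 100.0 * sum(count for _, count in top) / len(names)
    return [name for name, _ in top], share


def identify(candidates, evidence):
    """Apply the 'no identification if ambiguous' rule.

    `candidates` are record dicts (a hypothetical format) for people who
    could fill a slot in the tree; `evidence` holds facts that must match,
    such as a baptism place. Exactly one match is accepted; zero or several
    matches give None.
    """
    matches = [c for c in candidates
               if all(c.get(key) == value for key, value in evidence.items())]
    return matches[0] if len(matches) == 1 else None
```

Take two hypothetical John Baker baptisms, one in "Parish A" and one in "Parish B". Calling `identify` with only the name as evidence returns `None`. Adding a baptism place that is independently known picks out a single record.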

Final thoughts

Finally, can my reflections above, based on my own very specific family tree, be generalised in any way? I think a number of fairly general points can be made.

  • The two information horizons I identified will generally hold for trees without a high-status component: such trees are likely to thin out where the generation birth dates are before 1800, and to run out completely around 1650.
  • A quality-of-information score might help those researching trees to identify an appropriate place to stop each branch.
  • Very great care should be taken with ambiguous situations – there is little point in pursuing a branch back in time if there is a significant probability that one link on it may be incorrect.

