Jump to content
TNG Community

Check for missing citations


Paul Barrett

Recommended Posts

Paul Barrett
@Robin Richmond asked me to repost the start of a facebook thread here, so that he could provide a more comprehensive answer.
 
If this is possible, what would be the query that could generate a report of all facts/events for a selected tree that do not have a citation?
I'd appreciate any help with this please?
 
Reason: I am about to launch a TNG installation for members of Berkshire Family History Society so that members can submit a copy of their tree for collaboration and archive purposes.
 
We want to provide tools for us and members to validate the data in their trees, and a classic error that pervades the first few trees we've had submitted is BMD and other event dates that have no citation.  To the purists in an FHS, citations are everything.  No citation means a dat or event is really only speculation.
 
Helping members improve the quality of their trees is a real value add service and I wondered whether this report might appeal to others?
 
@Robin it occurs to me that the ability to select all or a specific tree would be very helpful, if that's possible please?
 
Link to comment
Share on other sites

Paul Barrett

By the way Robin, if we have to split this into BMD+* events and Other Events, that's not an issue.

*By BMD+ I mean Birth baptism, Marriage, Death, burial or cremation, but BBMDBC is a bit of a mouthful

Link to comment
Share on other sites

Robin Richmond

Finding events without citations turns out to be very difficult, so this post, which includes several SQL statements, is going to be pretty long. :-)

First - yes, Paul, what you refer to as "BMD+" is my "Built-in events", and your "Other Events" is my "Custom Events". (See the TNG Events and Eventtypes wiki article if you need help going to sleep.)

A quick mention of some complications:

  • Because "built-in" events in the People and Families tables are structured quite differently from "custom" events in the Events table, it would be very, very hard to do this in one query. (But Paul said that it is ok to split them up).
  • To allow the query to display (for example) a person's name in addition to the personID,  we'll need to do separate queries for People and Families.
  • It turns out that it is very difficult to handle the built-in events, for reasons that I'll mention below.
  • Finally, all of the queries below ran a very long time (on each pageful) before returning any results. Maybe that's inherent, or maybe someone can find a better way. (My database has about 11,000 people.)

Modifications you'll need to make to these queries:

  1. To process one of your own trees, "rr" with "your tree abbreviation" in each WHERE clause.
  2. To do modify the queries to operate against all trees in the database,  just remove AND gedcom="rr" from each WHERE clause.  But you must retain the JOIN ON conditions that are in the form x.gedcom=y.gedcom
  3. And very importantly, you'll almost certainly need to add the prefix "tng_" to all of the table names. FWIW, I decided to drop the tng_ prefix several years ago because I got tired of typing it every time I wrote a SQL query.  

Ok, let's go. Keep your head down and watch out for syntax 

Query 1: Here's a query for custom events that just gives personID and familyID without any other record identifier such as person names. You'll need to specify your own treeID, of course, and you'll probably need to add the "tng_" prefix to each table name.

SELECT e.persfamID, ee.tag, e.eventdate, e.eventplace 
FROM events e
INNER JOIN eventtypes ee ON ee.eventtypeID=e.eventtypeID
LEFT JOIN citations c ON c.eventID=e.eventID
WHERE c.citationID IS NULL AND e.gedcom='rr'
ORDER BY e.persfamID, e.tag, e.eventdatetr

Let's break Query 1 into slightly more useful separate queries for People and for Families

 Query 2: Custom events for people and display the person's name:
SELECT p.personID, concat(p.lastname, ', ', p.firstname, ' ', p.suffix) AS fullname, ee.tag, e.eventdate, e.eventplace 
FROM people p
INNER JOIN events e ON e.persfamID=p.personID AND e.gedcom=p.gedcom
INNER JOIN eventtypes ee ON ee.eventtypeID=e.eventtypeID
LEFT JOIN citations c ON c.eventID=e.eventID
WHERE c.citationID IS NULL AND p.gedcom='rr'
ORDER BY p.lastname,p.firstname, ee.tag, e.eventdatetr

Query 3: Family custom events, showing last names of the husband and wife.
SELECT f.familyID, h.lastname Husband, w.lastname Wife, ee.tag, e.eventdate, e.eventplace
FROM families f
INNER JOIN events e ON e.persfamID=f.familyID AND e.gedcom=f.gedcom
INNER JOIN eventtypes ee ON ee.eventtypeID=e.eventtypeID
LEFT JOIN people h ON h.personID=f.husband AND h.gedcom=f.gedcom
LEFT JOIN people w ON w.personID=f.wife AND w.gedcom=f.gedcom
LEFT JOIN citations c ON c.eventID=e.eventID
WHERE c.citationID IS NULL AND f.gedcom='rr'
ORDER BY h.lastname, w.lastname, ee.tag, e.eventdatetr

Built-in Events

These are a tough because, among other things
- Several types of event are recorded in each Person or Family record
- The fields are always present, and it's not that easy to determine whether they have a value or not, at least in one big query,
- The citation table links to built-in events by specifying an event tag (BIRT, CHR, DEAT, BURI, etc.), not a recordID
- The event tags usually match the first part of the relevant data and place field (e.g. BIRT-> birthdate, birthplace), but not always (e.g. CHR->altbirthdate, altbirthplace)

Query 4: Here's a compound query that looks at the birth and christening events. You could extend it to cover all of the eventtypes in the People table by either
- Running separate queries for each type of event in the Person table, or
- Adding more subqueries, all connected with the UNION keyword

SELECT p.personID, CONCAT(p.lastname, ', ', p.firstname, ' ', p.suffix) as fullname, birthdate edate, birthplace place, 'BIRT' tag 
FROM people p LEFT JOIN citations c ON c.persfamID=p.personID and c.eventID='BIRT' 
WHERE c.citationID IS NULL AND (p.birthdate <> '' OR p.birthplace<>'') 
UNION 
(SELECT p.personID, CONCAT(p.lastname, ', ', p.firstname, ' ', p.suffix) as fullname, altbirthdate edate, altbirthplace place, 'CHR' tag
FROM people p
LEFT JOIN citations c ON c.persfamID=p.personID and c.eventID='CHR'
WHERE c.citationID IS NULL AND (p.altbirthdate <> '' OR p.altbirthplace<>'')
)
ORDER BY fullname, tag

Note that ORDER BY fullname, tag occurs once at the very end.  And I use the fieldname alias edate (for event date) because date is a SQL keyword.

Query 5:  MARR and DIV in the Families table, without an identifier such as the husband's and wife's last names. You can add those by adding h.lastname husband, w.lastname wife, to all of the SELECT clauses, and adding
LEFT JOIN people h ON h.personID=f.husband AND h.gedcom=f.gedcom
LEFT JOIN people w ON w.personID=f.wife AND w.gedcom=f.gedcom
above each WHERE clause.

SELECT familyID, marrdate mdate, marrplace place, 'MARR' tag 
FROM families f LEFT JOIN citations c ON c.persfamID=f.personID and c.eventID='MARR' 
WHERE c.citationID IS NULL AND (marrdate<>'' OR marrplace<>'')
UNION 
(SELECT familyID, divdate ddate, divplace place, 'DIV' tag 
FROM families f LEFT JOIN citations c ON c.persfamID=f.personID and c.eventID='DIV' 
WHERE c.citationID IS NULL AND (divdate<>'' OR divplace<>'')
)
ORDER BY familyID ,tag

-------------------------------------------------------------

Hmm... maybe that wasn't such a nightmare. But it did take me at least 3 hours to get these queries working.

- Robin

p.s. No, not "grammar sites". But if you know what a grammarcite (or may it's grammarsite) is, you get lots of bonus points, and you've had a lot of fun. I w 

Link to comment
Share on other sites

Paul Barrett

Hi Robin

That, my friend, is pretty damn impressive and explained in away that even this SQL numpty can work with! 😂  Thank you!

Just one question (for now)  As far as i can tell,  the the table names are those that immediately follow the FROM and JOIN statements, correct?

Link to comment
Share on other sites

Robin Richmond

Yep, table names follow the FROM and JOIN keywords.

FWIW, table names can be used throughout SQL statements.  In the queries I wrote in the posting above, I used "aliases" such as "p" for "people" and "e" for "events", but everywhere I used "p", "e", "ee", "f", and "c" (etc.), I could have used the full tablenames.

That said, I submitted to the compulsion to explain further, and further, and further, to the point where I've created a new TNG wiki article that starts with an introduction to relational database, and need to introduce SQL keywords, add a few simple examples, and include at least some of the queries I wrote above. The text that I started to write in this posting (and that I went crazy with) starts with heading More Details.

My original intent was to give you a few pointers that that might help you understand other queries that use different syntax.  Of course, you do not need to read it all, nor to read it at all. :-)  Still, I hope that portions of it will be useful to someone, someday.

https://tng.lythgoes.net/wiki/index.php/Aspects_of_SQL

- Robin

Link to comment
Share on other sites

Paul Barrett

This is all fabulous.  Thank you.

I currently have some issues to debug on our main sites, so I'll read all of this when I'm done.  I know diddly squat about SQL, so this will be a good intro

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...