TNG Community
fluffy82

Admin Branches Queue


fluffy82

Hi @Robin Richmond

To avoid having to rebuild my branches after every single upload (which takes about 20 hours in total, during which time my site is unusable...), I wanted to try your Admin Branches Queue mod to see if that makes life easier (if I understand correctly, it also rebuilds the branches to add new people, but it does so in the background instead of making me wait around for it to finish).

I'm having some trouble understanding how it works...

I started out with an empty branchlinks table.

I selected my 5 branches one by one, and clicked on "add". The action "add" was started and finished at 10:02 this morning, the action "queued" seems to have started at 10:06, but hasn't finished yet (?).

Here's the weird thing...

  • when I select "Count labels in each branch" on the mod's tab, it shows a number of persons that seems to be correct
  • when I'm on the usual branches tab, it says no people/families in the branches
  • when I look at my SQL tables, the branchlinks table is empty
  • when I'm on my website, the branches show as they should (not all show, but I suppose that's because it's still processing)

Could you explain how the mod works? I've read all the information on the Wiki but that doesn't really explain why the branchlinks table is empty and the numbers on the different tabs are not the same.

Regards,
Tom

database.PNG

branches.PNG

count labels.PNG

no label.PNG

with label.PNG

fluffy82

It's getting weirder...

In an attempt to get the numbers right, I followed the advice given on the pages, and did a "remove and add" for all branches. This resulted in 5.000 labels being added to the database, and the index (whatever that is) showed a couple of hundred. The number stayed the same for several hours (I went to pick up an order at Ikea in the meantime and was gone for 3 hours). Even though it's better than 0, it's still far off from the 41.000 it should be, and it wasn't going up at all. So I was afraid that processing Branch A didn't only remove the existing Branch A label before adding it again, but also removed all existing labels for Branch B, Branch C, etc., resulting in only the last of the 5 actually being applied. Many of my persons belong in more than one branch though. A quick check of the content of the table confirmed this: people who I know should have been part of two branches only had one assigned.

Trying to sort out the mess, I stopped the queue and told it to empty the branches, with the goal of having no labels at all and then just adding them (without replacing or deleting anything), but that didn't have any effect. The 5.000 labels just stayed put.

To get rid of them, I emptied the database table in MySQL, and I even removed the branches in TNG so I could start anew. But now labels are being added. An hour after I told it to stop adding labels (which it wasn't doing anyway, nothing had moved for hours), and after deleting the branches from within TNG, branch labels are suddenly being added at a rate of over 1.000 per couple of seconds.

In short: I'm now waiting patiently, emptying the table from time to time, until it remains empty. I will then recreate the 5 branches I had, and add the necessary labels. Hoping that this will work...

fluffy82

Still trying to make this work...

After the above, everything seemed to be stagnating somewhere, with numbers that didn't make sense and didn't add up. So I did yet another complete restart.

  • I deleted the 5 branches
  • I emptied the branchlink table
  • I emptied the persons and families tables
  • I double checked that nothing was running and no fields were being added automatically by anything, anywhere
  • I reimported my gedcom
  • I created 5 new branches
  • I queued one branch (not all five of them as I did before) with the "add" button in "Queue branch labeling operations: Immediate Action Buttons"
  • 10 hours later, not a single branchlink has been added... The process is showing that it started at 11.30 last night, but hasn't stopped yet

So while I wanted to improve on the 20 hours I needed before to add my branches, the mod seems to be making it a lot worse: first working at an extremely slow pace when queueing 5 branches at the same time, and now doing nothing at all when queueing only one branch.

@Robin Richmond if you are reading this: am I doing something wrong?

Rob Roy

I too have a very large database and have found that if I try to regenerate my two big branches, it hangs.  My solution is very, I mean VERY, ugly.  I do a Gedcom export into PAF, then produce a cascading pedigree chart for each branch.   Using the pedigree charts, I basically regenerate 12 to 20 generations at a time.  For two major branches, this takes about 2 to 3 days.  I said it was VERY ugly.  I suspect that there is some table or other restraint internal to either TNG or SQL Server that causes this issue.  Yes Virginia, you can trace back too far.

Ken Roy

Sounds like you both are using branches for a different purpose than its original intended use - allow logged in users to view information on Living and Private

I have 25,000+ entries and it takes very little time to relabel branches.   I only need to then add the branch label to new family members that I added in PAF since my last import to allow relatives to view their new family additions.   If I forget, no big deal since the new baby will not be viewable.

I only label branches from a registered user's grandparent or grandparent and only label down and not up.

fluffy82
Just now, Ken Roy said:

Sounds like you both are using branches for a different purpose than its original intended use - allow logged in users to view information on Living and Private

I am using them to show myself and my visitors whether or not a person is a direct ancestor, and of which family. I divide these into 4 family groups - one for each of my grandparents. A fifth branch is added with my own direct ancestors to have those names displayed in bold through a different mod. I use branches for easy reference, not for privatisation of my tree. I only use a required login to hide or show living people, everyone can see my complete tree, nothing to hide.

3 minutes ago, Ken Roy said:

I have 25,000+ entries and it takes very little time to relabel branches.

Re-attaching existing labels after a new gedcom import is relatively fast. The problem is that this does not add labels to new people in the tree. To do this, one needs to remove all existing labels, and relabel the branches (to avoid doubles, or to avoid leaving in labels that should be removed). That takes many many many hours. When done through the vanilla TNG, the whole site becomes inaccessible during the process. The current mod is supposed to trigger a background process which doesn't interfere with your work. It seems to take a lot of time too, but at least the site remains operative while it's labeling.

6 minutes ago, Ken Roy said:

I only need to then add the branch label to new family members that I added

It is impossible for me to keep track of all new family members I add between uploads, and whether or not they are ancestors or not and of which side of the family. I work on my tree very randomly. I could do 3 hours today, 10 minutes tomorrow morning, one hour tomorrow evening, not touch it for a week, and do another 2 hours. Uploads are done monthly - if there aren't any issues.

One of the things I do, for example, is to list all people of a certain name in a certain area. I add their spouses, children, etc and try to link them into one big family. Through the use of witnesses at baptisms or marriages, I can then find who of the younger generation was my ancestor. Linking them to my tree, gives me a dozen, up to sometimes 100 "new" ancestors. People who maybe were already in my tree, or maybe not, depending on how fast I found the connection to my tree.

Or you might discover a line where some great-grandmother in branch A is actually already in your tree as the sister of an ancestor of another branch B. Part of that branch B suddenly becomes also part of branch A. I need TNG to calculate those labels for me, without me noting down lists and changing things manually.

fluffy82
22 minutes ago, Rob Roy said:

I too have a very large database and have found that if I try to regenerate my two big branches, it hangs.  My solution is very, I mean VERY, ugly.  I do a Gedcom export into PAF, then produce a cascading pedigree chart for each branch.   Using the pedigree charts, I basically regenerate 12 to 20 generations at a time.  For two major branches, this takes about 2 to 3 days.  I said it was VERY ugly.  I suspect that there is some table or other restraint internal to either TNG or SQL Server that causes this issue.  Yes Virginia, you can trace back too far.

As I try to upload a new gedcom every month, that is not really an option for me. I'm not prepared to do all this work every time over and over again. I need some sort of automated process. Unfortunately, TNG does not keep track of existing and new people in the database when importing a new gedcom. So to have correct labels, they need to be recalculated after every import.

Rob Roy

Ken you are absolutely correct.  Branches was designed for limiting viewing of living persons.  That said, many programs have a way of identifying lines and Branches is the obvious choice.  In my personal database, I use Branches to identify ancestors of myself, my wife, and inlaws, and to differentiate them.  Living in West Virginia, it is very important that I can prove that my wife and I are kin. :)

As I said, I have a method that works.  Possibly this could be a feature for version 14.

Robin Richmond

I don't understand this notion of what branches were designed to do.  How one is using branches has very little, if anything, to do with how long it takes to label a branch. Maybe the implication is that branches were "designed" for a smaller range of generations than we are using.  But since we can specify a relatively large range of generations, well, obviously branches are designed to support these larger branches.

The problem is how long it is taking to label a branch (that has been defined with the native specification features), not how we are using branches.

That said, I'm doing my damnedest to make the code more efficient. It's taking me much longer than I had hoped, and life and other TNG issues are intervening, but there is still hope.  I have figured out an algorithm that should reduce the execution time significantly.  In fact, my implementation has improved the execution time, but at the cost of missing some records.  That doesn't help much, does it?  But hope springs, well, for a long time.

Rob Roy

Having 120 generations, I am one of those whose TNG hangs if I try to do an entire branch.  I have a very ugly workaround that I do twice a year.  Not looking forward to New Years Day :)

I do know where much of the problem comes from.  If your ancestry links to nobility (lots of Americans do) then you are faced with a lot of intermarriages.  It is in the range of 1400 to 800 that this tangle of people hangs branches.  My guess is that there is a table created during the procedure that fills up and crashes the procedure.  I could well be wrong, but that is my best guess. 

Ken Roy
30 minutes ago, Robin Richmond said:

The problem is how long it is taking to label a branch (that has been defined with the native specification features), not how we are using branches.

Robin,

You are absolutely correct.  TNG is distributed with Max Generations set to a much smaller number, nothing in the hundreds of generations.

Michel KIRSCH
1 hour ago, Rob Roy said:

My guess is that there is a table created during the procedure that fills up and crashes the procedure.  I could well be wrong, but that is my best guess. 

No Rob. No chance to fill a table. A table has no physical existence on your disk. A table is just a humanly understandable representation of data.

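The only limiting factor to "fill" a table is the ID. In TNG an ID is int(11); in MySQL that is a 4-byte integer (the 11 is only a display width), so the ceiling for a signed column is 2.147.483.647 records.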

On the other hand, during long processing, the limiting factor is the time your ISP allocates for each PHP script in progress.

Look no further: that's where the problem comes from...

Michel

Michel KIRSCH
55 minutes ago, Ken Roy said:

TNG is distributed with Max Generations set to a much smaller number

Ken, is it a parameter somewhere in TNG?

Michel

Robin Richmond

There has been a parallel discussion on the tngUsers2 email discussion list, and I plan to focus my responses here.  I've attached the entire tngUsers2 thread here for context. 

BranchProcessingFailuresDiscussion.docx

Ken Roy

TNG has more than one such parameter in the chart settings:

Pedigree Chart  Max Generations  8

Descendancy Chart  Max Generations  12

Relationship Chart Max Generations:  15

I don't see a limit on Branch labeling on the screen.  I did not check the code.

Rob Roy

Michel,  So much for my 1980s database technology know-how.  If it is the ISP (I've got one of the worst in the country, if not the worst), I wonder whether it would work if you ran TNG locally?

Robin Richmond

There's no limit on the number of ancestors nor descendants from the starting point, but going straight up or down the tree is not (much of) a problem. There is a hard-coded limit of five generations of descendants from each ancestor. That's where the big problem comes into play. If it were higher, things would be even worse.

And that's a good segue to my analysis of the two problems I've identified in branch processing, and my approach to mitigating them.

First - to Rob's question - There is no database log of records processed.  The program does track records that have been labeled (or unlabeled) through an array, but the array storage is not a problem at all; it could accommodate many millions of entries.

Descendants of Ancestors
The main problem is that, as the program processes the descendants of each ancestor, it revisits most of the descendants it already processed from the previous generations.  For example, let's start with me in generation 1, with however many ancestors, and 3 generations of descendants from each ancestor, and assume that I have grandchildren.

  • Go up one generation (to my parents), and back down three. You'll capture me and my siblings (descending generation 1), all of our children (descending generation 2), and grandchildren (generation 3).  Let's say that that's 30 descendants.
  • Go up a second generation (to my grandparents), and back down three. You'll capture my parents and all of my aunts and uncles (descending generation 1), me and my siblings and my first cousins (descending generation 2), and all grandchildren of me, my siblings, and my first cousins.  However many that is, we had already processed 30 of them.
  • Go up 10 generations, where you can have upwards of 1,000 ancestors.  Now come down 3 generations from each of those ancestors, and consider the number of descendants the program duplicates. The duplications increase exponentially as the number of generations increases.  I believe that THAT is the biggest problem.

I've defined a branch in my tree that goes 12 generations up and 5 generations down from each ancestor.  It has just under 10,000 people in it, and, as it is being processed, those 10,000 records are visited a total of 77,000 times.  That's not ideal.
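To make the duplication concrete, here is a minimal Python sketch of the walk-up, then walk-down-from-each-ancestor process described above. It is only an illustration under stated assumptions: TNG itself is PHP, the function names and the eight-person toy tree are made up, and each person has a single parent to keep the example small.

```python
from collections import Counter

def ancestors(person, parents, max_up):
    """Return the set of ancestors of `person`, up to `max_up` generations."""
    found, frontier = set(), {person}
    for _ in range(max_up):
        frontier = {p for child in frontier for p in parents.get(child, ())}
        found |= frontier
    return found

def naive_label(start, parents, children, max_up, max_down):
    """Walk down `max_down` generations from every ancestor, counting visits."""
    visits = Counter()

    def descend(person, remaining):
        visits[person] += 1          # every arrival counts, even a repeat
        if remaining == 0:
            return
        for child in children.get(person, ()):
            descend(child, remaining - 1)

    for ancestor in ancestors(start, parents, max_up):
        descend(ancestor, max_down)
    return visits

# Hypothetical toy tree: great-grandparent -> grandparent -> parent -> me -> kid,
# plus one sibling at each level.
PARENTS = {"me": ["pa"], "pa": ["gp"], "gp": ["gg"]}
CHILDREN = {"gg": ["gp", "gu"], "gp": ["pa", "un"], "pa": ["me", "sib"], "me": ["kid"]}

visits = naive_label("me", PARENTS, CHILDREN, max_up=3, max_down=3)
print(len(visits), "people,", sum(visits.values()), "visits")  # 8 people, 17 visits
```

Even in this tiny tree the people closest to the starting person are visited once per ancestor above them, which is the same effect as the 10,000 people and 77,000 visits in the real branch.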

The first rule in the solution is not just to mark records we have flagged, but also to stop going down the tree once we have encountered a person we have already flagged as we descend the tree. Note that we can't just stop when we see that we've labeled a person, because if we label people as we go up the tree (which the program does), we haven't processed their descendants yet. So the second rule is not to flag ancestors IF we are also going to come down the tree from each ancestor.

But the first rule isn't quite correct. When cousins marry (which they inevitably do, at some level) it throws things off.  If John and Sarah marry and are third cousins once removed, well, we'll encounter them at separate points in the process, and most particularly, we'll encounter them at different levels in the downward processing.  Let's say that we encounter John first, in level two of three descending generations.  That means that we will flag him and process his children, flagging them, and stop.  Fine. Now we encounter Sarah at level one of three descending generations.  We need to process her (and John's) children and grandchildren.  But we've already flagged John and Sarah's children, so according to rule 1, we would stop with those children.  But that doesn't work because we do need to process their grandchildren.

So we replace rule 1 with rule 3:
a. Flag each person as we go down the tree with a value that represents the number of generations yet to go as we process descendants.
b. Keep going if we need to process more generations than the flag indicates have been processed.
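
Rules 3a and 3b can be sketched in a few lines of Python. This is a toy illustration only, not the mod's PHP code: the names are hypothetical, it covers just the downward walk (rule 2 and the upward walk are left out), and the family below is invented to mirror the John and Sarah case, with John reached at level two and Sarah at level one.

```python
def label_descendants(roots, children, max_down):
    """Label descendants of each root, skipping subtrees already covered."""
    depth_done = {}   # person -> most "generations yet to go" already processed
    visit_count = 0

    def descend(person, remaining):
        nonlocal visit_count
        # Rule 3b: stop only if an earlier visit already went at least this deep.
        if depth_done.get(person, -1) >= remaining:
            return
        visit_count += 1
        depth_done[person] = remaining          # Rule 3a
        if remaining == 0:
            return
        for child in children.get(person, ()):
            descend(child, remaining - 1)

    for root in roots:
        descend(root, max_down)
    return set(depth_done), visit_count

# Hypothetical family: John sits two levels below ancestor A, Sarah one level
# below ancestor B; they share child "c", grandchild "g", great-grandchild "gg".
CHILDREN = {"A": ["x"], "x": ["John"], "B": ["Sarah"],
            "John": ["c"], "Sarah": ["c"], "c": ["g"], "g": ["gg"]}

labeled, visit_count = label_descendants(["A", "B"], CHILDREN, max_down=3)
print(sorted(labeled), visit_count)
```

On the first pass "c" is flagged with 0 generations to go; on the second pass it arrives with 1 to go, so the walk continues and picks up "g". A plain "stop at anyone already flagged" check would have stopped at "c" and missed "g", which is the failure described above. "gg" is correctly left out because it is four generations below B.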

That's a good bit more complicated than the original walk-up and walk-down-from-each-ancestor process, but it should be reasonably straightforward to implement.

Well, not for me.  I haven't gotten it to work correctly.  I've implemented what seems to me to be a correct implementation of rules 2 and 3, and my process is faster, but it just doesn't capture everyone.  I should be able to fine-tune it, but I have to scratch out the time to do so.

Ancestors

We essentially have the same problem as we process ancestors. That is, when ancestors of ancestors marry (which is really equivalent to saying cousins marry, since going up from each cousin, we'll encounter the same ancestors twice) we will find ourselves in a situation where we have already processed their ancestors when we encounter them again as ancestors of another ancestor.  This problem has the same complication as the descendant problem, and a similar solution, but doesn't have the same implications because it doesn't happen over and over again with each generation.
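
The upward case can be sketched the same way. Again this is only an assumed illustration in Python with made-up names: the father and mother below are first cousins, so both lines lead to the same couple (reduced to one person, "common", for brevity), and the second arrival is skipped because the first one already climbed at least as far.

```python
def collect_ancestors(start, parents, max_up):
    """Collect ancestors up to `max_up` generations, skipping repeat climbs."""
    climbed = {}      # person -> most generations still to climb when visited
    visit_count = 0

    def ascend(person, remaining):
        nonlocal visit_count
        if climbed.get(person, -1) >= remaining:
            return                      # an earlier visit already covered this
        visit_count += 1
        climbed[person] = remaining
        if remaining == 0:
            return
        for parent in parents.get(person, ()):
            ascend(parent, remaining - 1)

    ascend(start, max_up)
    return set(climbed) - {start}, visit_count

# Hypothetical pedigree with one shared ancestor line.
PARENTS = {"me": ["father", "mother"], "father": ["pgf"], "mother": ["mgm"],
           "pgf": ["common"], "mgm": ["common"], "common": ["root"]}

found, visit_count = collect_ancestors("me", PARENTS, max_up=4)
print(sorted(found), visit_count)
```

Without the check, "common" and "root" would each be processed twice (9 visits instead of 7); with deeper trees and more intermarriage the saving grows accordingly.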

Now, back to work on code

- Robin

Michel KIRSCH
56 minutes ago, Rob Roy said:

Michel,  So much for my 1980s database technology know-how.  If it is the ISP (I've got one of the worst in the country, if not the worst), I wonder whether it would work if you ran TNG locally?

Yes, since Paradox and dBase II (my first databases too), there have been some technological advances... :-)

If you try it locally, the only benefit is that you can tune how long a PHP script is allowed to run before it is stopped.

It's the max_execution_time directive ("Maximum execution time") in php.ini. See  http://php.net/max-execution-time
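
When the limit cannot be raised (on shared hosting, for instance), a common general workaround is to do the work in slices that each fit inside the limit and pick up where the last slice stopped. The Python sketch below only illustrates that idea; it is not how TNG or the Admin Branches Queue mod is implemented, and `process_in_slices`, the 3-second budget, and the fake clock are all assumptions made for the example.

```python
import time
from collections import deque

def process_in_slices(queue, handle, budget_seconds, clock=time.monotonic):
    """Handle queued items until the time budget is spent; return how many ran."""
    deadline = clock() + budget_seconds
    done = 0
    while queue and clock() < deadline:
        handle(queue.popleft())
        done += 1
    return done

# Demonstration with a fake clock that advances one "second" per reading,
# so the result does not depend on real elapsed time.
ticks = iter(range(1000))
fake_clock = lambda: next(ticks)

pending = deque(["p1", "p2", "p3", "p4", "p5"])   # hypothetical person IDs
labeled = []
first = process_in_slices(pending, labeled.append, budget_seconds=3, clock=fake_clock)
second = process_in_slices(pending, labeled.append, budget_seconds=3, clock=fake_clock)
print(first, second, list(pending))
```

Each call stops on its own before the budget runs out, and whatever is left in the queue is handled by the next call, so no single script run has to last for hours.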

Michel

Michel KIRSCH
1 hour ago, Ken Roy said:

TNG has more than one such parameter in the chart settings:

Pedigree Chart  Max Generations  8

Descendancy Chart  Max Generations  12

Relationship Chart Max Generations:  15

I don't see a limit on Branch labeling on the screen.  I did not check the code.

Thanks Ken. These limitations are modifiable via the TNG setup.

Because branch labeling is incremental 1 by 1, the limit is int(11) - 1

Michel

 

Michel KIRSCH

Another thing that could explain the length of a batch job: the maximum number of transactions per hour.

With some ISPs, the number of transactions per hour can be limited to 10.000 or 15.000...

Michel

Rob Roy
1 hour ago, Michel KIRSCH said:

Yes, since Paradox and dBase II (my first databases too), there have been some technological advances... :-)

Actually before those.   System2000 on IBM Mainframes.  It was a Hierarchical Database.  Loading and configuration were done via COBOL.  Now that I have truly dated myself, I shall bow out of the discussion.

Michel KIRSCH

:-)

Bye Rob. Have a good day/night

Michel
