TNG Community

Require Login for all but googlebot


dlech


Over the last year or so, the number of bots continually fetching data has become unmanageable for me. I have decided to require login for all users. I changed Loginlib.php to provide default username/password entries for a visitor (guest) account. This allows people who do not want to request an account to still gain entry to my data. But doing this also kills access for search engine indexing. So I want to require login for everyone except a select few search engine indexers, for example googlebot.

First let me say I have a technical background but retired decades ago, long before PHP etc. Basically I know enough to be dangerous. I have the RIP Prevention Mod already installed. Looking at the code, I think I see where it fetches and checks the network data for the user:

    // Client IP: try the environment first, then $_SERVER
    $remip = getenv("REMOTE_ADDR");
    if (!$remip) $remip = $_SERVER['REMOTE_ADDR'];
    // Client host name: environment, then $_SERVER, then a reverse DNS lookup
    $remhost = getenv("REMOTE_HOST");
    if (!$remhost) $remhost = isset($_SERVER['REMOTE_HOST']) ? $_SERVER['REMOTE_HOST'] : "";
    if (!$remhost) $remhost = @gethostbyaddr($remip);
    // Fall back to the IP itself if no host name could be found
    if (!$remhost) $remhost = $remip;
    if ($charset === "UTF-8")
        $remhost = utf8_encode($remhost);
    else
        $remhost = utf8_decode($remhost);

To check for googlebot...

    if (strpos($remhost, "google") !== false) {
        $remhost = "googlebot.com";
        $remip = "66.249.6x.x";
    } elseif (strpos($remip, "66.249.6") !== false) {
        $remhost = "googlebot.com";
        $remip = "66.249.6x.x";
    }

I do not know the order in which the RIP Prevention Mod and the Login/Loginlib code are called, so I can't tell which code gets executed first. But looking at it simplistically, I want to do something like

    if ($remhost !== "googlebot.com" && $remip !== "66.249.6x.x") {
        // require Login
    }
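A minimal sketch of how that test might look as a function, under some assumptions: `isAllowedCrawler` is a hypothetical helper name (it is not part of TNG or the RIP Prevention Mod), the list of allowed domains is illustrative, and PHP 7.1 or later is assumed. Instead of trusting any host name that merely contains "google", it does a reverse DNS lookup followed by a forward lookup, which is the check Google documents for verifying Googlebot. The two lookup functions are passed in as optional parameters only so the logic can be tried without real DNS.

```php
<?php
// Hypothetical helper, not part of TNG or the RIP Prevention Mod.
// $reverse and $forward default to PHP's own DNS functions; they are
// injectable so the logic can be exercised without network access.
function isAllowedCrawler(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Reverse lookup. gethostbyaddr() returns the IP unchanged when there
    // is no PTR record, and false on malformed input.
    $host = @$reverse($ip);
    if (!$host || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // Assumed allow-list: the host must END in one of these suffixes, so
    // "google.evil.example" does not pass. Other crawlers could be added.
    $allowedSuffixes = ['.googlebot.com', '.google.com'];
    $matched = false;
    foreach ($allowedSuffixes as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            $matched = true;
            break;
        }
    }
    if (!$matched) {
        return false;
    }

    // Forward lookup: the host name must resolve back to the same IP.
    // Note gethostbynamel() only returns IPv4 addresses.
    $ips = @$forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}

// Intended use, wherever the login requirement is enforced:
//   if (!isAllowedCrawler($remip)) { /* require Login */ }
```

Each call can cost two DNS lookups, so on a busy site the result would probably need to be cached per IP.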

Hopefully you can see what I'm driving at.  Any suggestions would be appreciated.

Thanks

 

 

 


  • 6 months later...

By the way, you also need to set this in Setup >> Configuration >> General Settings >> Privacy: Require Login: Yes

Also check the other settings there.


  • 2 months later...

Hi Rob,

Thank you for your reply. In April 2025, when I started working on this, I had the RIP Prevention Mod and the RIP Challenge Mod installed. I don't recall seeing the Bot Manager Mod back then.

What I ended up with is this. I modified the code so GoogleBot and MsnBot are allowed unrestricted access, but all other access requires the user to log in. I changed the Login page so it provides a preset username and password, so that all guest users log in using the same Guest account. I also modified and enhanced the Admin log messages so I could get more detail on who was attempting to access data and what data they were trying to access (i.e. PHP module name, IP address, domain name, etc.).
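As a rough illustration of that kind of log detail, here is a small sketch. The function name `formatAccessLogLine`, the field layout, and the sample values are all hypothetical; this is not TNG's own log format or the code actually running on the site. It takes the `$_SERVER` array and the resolved host name, and strips control characters so a crafted request cannot inject extra lines into the log.

```php
<?php
// Hypothetical helper; the field names and layout are illustrative only.
function formatAccessLogLine(array $server, string $remhost): string
{
    // Which PHP module was requested, e.g. "example.php"
    $script = isset($server['SCRIPT_NAME']) ? basename($server['SCRIPT_NAME']) : '-';
    $ip     = isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '-';
    $query  = isset($server['QUERY_STRING']) ? $server['QUERY_STRING'] : '';

    // Remove control characters (including newlines) from every field
    $clean = function ($value) {
        return preg_replace('/[\x00-\x1F\x7F]/', '', (string) $value);
    };

    return sprintf(
        '%s ip=%s host=%s query=%s',
        $clean($script),
        $clean($ip),
        $clean($remhost),
        $clean($query)
    );
}

// Intended use: $line = formatAccessLogLine($_SERVER, $remhost);
```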

At that time, I also started saving Google and Msn indexing packets per hour, indexed URL counts, and Guest access counts in a spreadsheet. The first thing I noticed was how severely my hosting company, and/or AWS which they used, was restricting the number of Google and Msn indexing packets. I am very aware of the problem hosting companies have with Google and Msn bombarding them with packets, but I am also aware that if my website never gets indexed, few will find it. My hosting company sincerely tried to remedy the packet restriction but never really succeeded.

So in September I changed hosting companies, picking a host not tied to AWS. In 3 months, the number of indexed URLs doubled and the number of new legit users per day tripled.

To help matters, I periodically scan the Admin log messages and address the worst offenders by blocking specific IP addresses or subnets. The new hosting company also provides the ability to block users by geographic location, so the worst offenders in the Far East can be blocked easily. That was a huge help.
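For anyone wanting to do the IP and subnet blocking in PHP rather than at the host, a minimal sketch follows. The names `ipInCidr` and `isBlocked` are hypothetical, the addresses are documentation-only example ranges, and it handles IPv4 only on a 64-bit PHP build; it is not the mechanism I or the hosting company actually use.

```php
<?php
// Hypothetical helpers for IPv4 only; assumes a 64-bit PHP build.
// Returns true when $ip falls inside $cidr ("a.b.c.d/nn" or a single IP).
function ipInCidr(string $ip, string $cidr): bool
{
    $parts  = explode('/', $cidr, 2);
    $subnet = ip2long($parts[0]);
    $addr   = ip2long($ip);
    if ($subnet === false || $addr === false) {
        return false; // not a valid IPv4 address
    }

    $bits = 32; // a bare IP means an exact match
    if (isset($parts[1])) {
        if (!preg_match('/^\d{1,2}$/', $parts[1])) {
            return false; // malformed prefix length
        }
        $bits = (int) $parts[1];
        if ($bits > 32) {
            return false;
        }
    }

    // Network mask with the top $bits bits set
    $mask = ($bits === 0) ? 0 : ((~0 << (32 - $bits)) & 0xFFFFFFFF);
    return ($addr & $mask) === ($subnet & $mask);
}

// Returns true when $ip matches any entry in the block list.
function isBlocked(string $ip, array $blockList): bool
{
    foreach ($blockList as $entry) {
        if (ipInCidr($ip, $entry)) {
            return true;
        }
    }
    return false;
}

// Intended use, with an illustrative block list:
//   $blockList = ['192.0.2.0/24', '198.51.100.7'];
//   if (isBlocked($remip, $blockList)) { /* deny access */ }
```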

I’ve been running the modified code since April with few glitches. I know that blocking offenders is an unwinnable game of Whack-a-Mole, but over time the number of offenders has decreased. If nothing else, the login requirement (the "Are you Human?" technique) seems to thwart the countless AI bots probing for data.

For now, I think I’ve won the battle. But I have absolutely no delusions about winning the war…

 
