
Counting numbers on big files that change once per day but are checked multiple times per hour?

Take the following as a story that really happened.

A monitoring tool parses a big file every x minutes to count the occurrences of a word inside it. If the count is less than a threshold, countermeasures are triggered. This big file is created once per day.

This works fine as long as the file is small. Now think about XML files with a size of gigabytes, stored on network storage, and you have many of them to monitor.

My solution for this problem is to create a caching layer. The only thing we need to solve is detecting whether the big file has changed.

Using sha256sum on big files takes time. Using md5sum on big files takes time. But there is one thing that is fast and unique enough for this purpose: the file's size and modification time as reported by ls.

ls -l jobs_with_channels_multiple_location_nodes.xml | awk '{print $5$6$7$8}' 
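
Side note: parsing ls output is a bit fragile, because the date columns depend on locale and on how old the file is. If you have GNU coreutils available, stat can build a similar key from size and modification time; a minimal sketch:

stat -c '%s%Y' jobs_with_channels_multiple_location_nodes.xml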

Where to store the cache? Just take the name of the big file and append ".count_cache" to it.

What should be in the cache file? Only two lines: the first line is the cache key, the second line is the cached count value.
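
With the ls based key from above, a cache file could look like this (the values are made up, and the exact key format depends on your ls output):

327948271Mar612:04
1337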

And the logic?

#!/bin/bash
####
# Counts amount of <foo> nodes in provided file.
# To speed up things, we create a cache file.
#
# This is just a logic example file. If you are using it in production, good luck!
####
# @since 2019-03-06
# @author stev leibelt <artodeto@bazzline.net>
####

SOURCE_FILE_PATH="${1}";
CACHE_FILE_PATH="${SOURCE_FILE_PATH}.count_cache"

if [[ ! -f "${SOURCE_FILE_PATH}" ]];
then
    echo ":: Invalid argument provided."
    echo "   Provided file path >>${SOURCE_FILE_PATH}<< does not exist."

    exit 1;
fi

SOURCE_CACHE_KEY=$(ls -l "${SOURCE_FILE_PATH}" | awk '{print $5$6$7$8}' )

if [[ -f "${CACHE_FILE_PATH}" ]];
then
    CACHE_KEY=$(head -n1 "${CACHE_FILE_PATH}")
else
    CACHE_KEY=""
fi

if [[ "${CACHE_KEY}" == "${SOURCE_CACHE_KEY}" ]];
then
    COUNT=$(tail -n1 "${CACHE_FILE_PATH}")
else
    COUNT=$(grep -c '<foo>' "${SOURCE_FILE_PATH}")
    cat >"${CACHE_FILE_PATH}"<<DELIM
${SOURCE_CACHE_KEY}
${COUNT}
DELIM
fi

echo "${COUNT}"
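
Assuming the script above is saved as count_foo_nodes.sh (the name is made up), a call looks like this:

bash count_foo_nodes.sh /path/to/jobs_with_channels_multiple_location_nodes.xml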

Hope this helps.

uberspace and let's encrypt

Nothing special, I just took the knowledge from the wiki and the blog and put it into a shell script.

The result is the following cronjob, which runs once every 60 days.


#!/bin/bash -l
####
# @see:
#   https://blog.uberspace.de/lets-encrypt-rollt-an/
#   https://wiki.uberspace.de/webserver:https?s[]=lets&s[]=encrypt
# @author: stev leibelt <artodeto@bazzline.net>
# @since: 2015-12-28
####

#begin of local parameters
LOCAL_ROOT_PATH='/home/<user name>'
LOCAL_LOG_PATH=$LOCAL_ROOT_PATH'/<path to your log files>'
LOCAL_ACCOUNT='<your.domain.tld>'
#end of local parameters

#begin of parameters for letsencrypt-renewer
LOCAL_CONFIGURATION_PATH=$LOCAL_ROOT_PATH'/.config/letsencrypt'
LOCAL_LOGGING_PATH=$LOCAL_ROOT_PATH'/.config/letsencrypt/logs'
LOCAL_WORKING_PATH=$LOCAL_ROOT_PATH'/tmp/'
#end of parameters for letsencrypt-renewer

#begin of parameters for uberspace-prepare-certificate
LOCAL_KEY_PATH=$LOCAL_ROOT_PATH'/.config/letsencrypt/live/'$LOCAL_ACCOUNT'/privkey.pem'
LOCAL_CERTIFICATE_PATH=$LOCAL_ROOT_PATH'/.config/letsencrypt/live/'$LOCAL_ACCOUNT'/cert.pem'
#end of parameters for uberspace-prepare-certificate

letsencrypt-renewer --config-dir "$LOCAL_CONFIGURATION_PATH" --logs-dir "$LOCAL_LOGGING_PATH" --work-dir "$LOCAL_WORKING_PATH" &>"$LOCAL_LOG_PATH"
uberspace-prepare-certificate -k "$LOCAL_KEY_PATH" -c "$LOCAL_CERTIFICATE_PATH" &>>"$LOCAL_LOG_PATH"

A nicer version of the script is also available here.
Big thanks to uberspace and let's encrypt.

For the script to work, you of course have to set up let's encrypt first:


uberspace-letsencrypt 
letsencrypt certonly

I am quite lazy. For this reason, I let the certificates be regenerated once per month. To not put too much load on the infrastructure, I picked a day other than the first of the month. The same goes for the time of day.
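
For the monthly variant, a matching crontab entry could look like this (day, time and script path are just examples):

0 4 17 * * /home/<user name>/bin/renew_certificate.sh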

determine if an apache process is still running via bash to prevent multiple instances running

Given is the fact that you have some processes (like cronjobs) that are executed via a webserver like apache. Furthermore, you have installed and enabled the apache server status module. To gain some reusability, we should divide and conquer the problems into separate shell scripts or shell functions. Side note: whenever I write about the shell, I mean the bash environment. What are the problems we want to tackle?

  • find the correct environment
  • check all available webservers if a process is not running
  • specify which process should not run and start it if possible

We can put the first two problems into shell functions like the following ones. I am referencing some self-written shell functions, indicated by the "net_bazzline_" prefix.

#!/bin/bash
#find the correct environment

if net_bazzline_string_contains "$HOSTNAME" 'production';
then
    NET_BAZZLINE_IS_PRODUCTION_ENVIRONMENT=1
else
    NET_BAZZLINE_IS_PRODUCTION_ENVIRONMENT=0
fi
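
net_bazzline_string_contains is not listed here; a minimal sketch of what it is assumed to do (return 0 if the first argument contains the second):

function net_bazzline_string_contains ()
{
    #returns 0 (success) if ${1} contains ${2}
    case "${1}" in
        *"${2}"*) return 0;;
        *) return 1;;
    esac
}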

And the mighty check.

#!/bin/bash
#check all available webservers if a process is not running
####
# @param string <process name>
# @return int (0 if at least one process was found)
####
function local_is_there_at_least_one_apache_process_running()
{
    if [[ $# -lt 1 ]]; then
       echo 'invalid number of arguments'
       echo '    local_is_there_at_least_one_apache_process_running <process name>'

       return 1
    fi

    if [[ $NET_BAZZLINE_IS_PRODUCTION_ENVIRONMENT -eq 1 ]]; then
        LOCAL_ENVIRONMENT='production'
    else
        LOCAL_ENVIRONMENT='staging'
    fi

    #variables are prefixed with LOCAL_ to prevent overwriting system variables
    LOCAL_PROCESS_NAME="$1"

    #declare the array with all available host names
    declare -a LOCAL_HOSTNAMES=("webserver01" "webserver02" "webserver03");

    for LOCAL_HOSTNAME in "${LOCAL_HOSTNAMES[@]}"; do
        APACHE_STATUS_URL="http://$LOCAL_HOSTNAME.my.domain/server-status"

        OUTPUT=$(curl -s "$APACHE_STATUS_URL" | grep -i "$LOCAL_PROCESS_NAME")
        EXIT_CODE_OF_LAST_PROCESS="$?"

        if [[ $EXIT_CODE_OF_LAST_PROCESS == "0" ]]; then
            echo "$LOCAL_PROCESS_NAME found on $LOCAL_HOSTNAME"
            return 0
        fi
    done;

    return 1
}

And here is an example of how to use it.

#!/bin/bash
#specify which process should not run and start it if possible

source /path/to/your/bash/functions

LOCAL_PROCESS_NAME="my_process"

local_is_there_at_least_one_apache_process_running "$LOCAL_PROCESS_NAME"

EXIT_CODE_OF_LAST_PROCESS="$?"

if [[ $EXIT_CODE_OF_LAST_PROCESS == "0" ]]; then
    echo "$LOCAL_PROCESS_NAME still running"
    exit 0;
else
    #execute your process
    echo 'started at: '$(date +'%Y-%m-%d %H:%M:%S');
    curl "my.domain/$LOCAL_PROCESS_NAME"
    echo 'finished at: '$(date +'%Y-%m-%d %H:%M:%S');
fi

You can run this periodically by calling it via a cronjob, or use watch if you only need it from time to time:

watch -n 60 'bash /path/to/your/shell/script'
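
Or, as a crontab entry that runs it every minute (same placeholder path):

* * * * * bash /path/to/your/shell/script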

Enjoy your day :-).

bash function to search and replace in all files with a specific extension recursively

Pretty much everything is in the code below. It is part of my function collection for the bash.

####
# Replaces a string in all files in given path and below
# taken from: http://www.cyberciti.biz/faq/unix-linux-replace-string-words-in-many-files/
# taken from: http://stackoverflow.com/questions/4437901/find-and-replace-string-in-a-file
# taken from: http://stackoverflow.com/questions/7450324/how-do-i-replace-a-string-with-another-string-in-all-files-below-my-current-dir
#
# @author stev leibelt 
# @since 2013-7-30
####
function net_bazzline_replace_string_in_files ()
{
    if [[ $# -lt 3 ]]; then
        echo 'invalid number of arguments provided'
        echo '    net_bazzline_replace_string_in_files <search> <replace> <file extension> [<path>]'
        return 1
    fi

    if [[ $# -eq 4 ]]; then
        find "$4" -name "*.$3" -type f -exec sed -i 's/'"$1"'/'"$2"'/g' {} \;
    else
        find . -name "*.$3" -type f -exec sed -i 's/'"$1"'/'"$2"'/g' {} \;
    fi
}
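
A hypothetical call that replaces old_name with new_name in all *.php files below /path/to/project:

net_bazzline_replace_string_in_files 'old_name' 'new_name' 'php' /path/to/project

Mind that the search and replace strings must not contain a slash, since / is used as the sed delimiter.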