A monitoring tool parses a big file every x minutes and counts the occurrences of a word inside it.
If the counted amount is less than a threshold, countermeasures are triggered.
This big file is created once per day.
This works fine as long as the file is small. Now think about XML files with a size of multiple gigabytes, stored on a network storage, and you have many of them to monitor.
My solution for this problem is to create a caching layer.
The only thing we need to solve is detecting whether the big file has changed.
Running sha256sum on big files takes time.
Running md5sum on big files takes time as well.
There is one thing that is fast and good enough to be unique: the file size combined with the modification date.
ls -l jobs_with_channels_multiple_location_nodes.xml | awk '{print $5$6$7$8}'
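The fields picked by awk are the file size ($5) and the modification date ($6 to $8) from "ls -l". A sketch of the same kind of key built with stat instead, which does not depend on the locale of ls — GNU coreutils stat is assumed, and a temporary file stands in for the big XML file:

```shell
#!/bin/bash
# build a cheap change-detection key from size and modification time
# %s is the size in bytes, %Y the modification time as a unix timestamp
SOURCE_FILE_PATH=$(mktemp)
printf '<foo>first</foo>\n' > "${SOURCE_FILE_PATH}"

SOURCE_CACHE_KEY=$(stat -c '%s%Y' "${SOURCE_FILE_PATH}")
echo "${SOURCE_CACHE_KEY}"
```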
Where to store the cache?
Just take the name of the big file and append ".count_cache" to it.
What should be in the cache file?
Only two lines: the first line is the cache key, the second line is the cached count value.
And the logic?
#!/bin/bash
####
# Counts amount of <foo> nodes in provided file.
# To speed up things, we create a cache file.
#
# This is just a logic example file. If you use it in production, good luck!
####
# @since 2019-03-06
# @author stev leibelt <artodeto@bazzline.net>
####
SOURCE_FILE_PATH="${1}"
CACHE_FILE_PATH="${SOURCE_FILE_PATH}.count_cache"

if [[ ! -f "${SOURCE_FILE_PATH}" ]];
then
    echo ":: Invalid argument provided."
    echo "   Provided file path >>${SOURCE_FILE_PATH}<< does not exist."
    exit 1
fi

SOURCE_CACHE_KEY=$(ls -l "${SOURCE_FILE_PATH}" | awk '{print $5$6$7$8}')

if [[ -f "${CACHE_FILE_PATH}" ]];
then
    CACHE_KEY=$(head -n1 "${CACHE_FILE_PATH}")
else
    CACHE_KEY=""
fi

if [[ "${CACHE_KEY}" == "${SOURCE_CACHE_KEY}" ]];
then
    COUNT=$(tail -n1 "${CACHE_FILE_PATH}")
else
    COUNT=$(grep -c '<foo>' "${SOURCE_FILE_PATH}")
    cat > "${CACHE_FILE_PATH}" <<DELIM
${SOURCE_CACHE_KEY}
${COUNT}
DELIM
fi

echo "${COUNT}"
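To see the logic in action, here is a self-contained sketch: a condensed version of the script above is written to a temporary file (file names are arbitrary), a small sample file is created, and the counter is called twice — the first call parses the file, the second one is served from the cache.

```shell
#!/bin/bash
# condensed version of the caching counter, written to a temporary file
SCRIPT_FILE_PATH=$(mktemp)
cat > "${SCRIPT_FILE_PATH}" <<'SCRIPT_DELIM'
#!/bin/bash
SOURCE_FILE_PATH="${1}"
CACHE_FILE_PATH="${SOURCE_FILE_PATH}.count_cache"
SOURCE_CACHE_KEY=$(ls -l "${SOURCE_FILE_PATH}" | awk '{print $5$6$7$8}')
if [[ -f "${CACHE_FILE_PATH}" ]] \
    && [[ "$(head -n1 "${CACHE_FILE_PATH}")" == "${SOURCE_CACHE_KEY}" ]]; then
    #cache hit, only the two line cache file is read
    tail -n1 "${CACHE_FILE_PATH}"
else
    #cache miss, the big file is parsed and the cache file is (re)written
    COUNT=$(grep -c '<foo>' "${SOURCE_FILE_PATH}")
    printf '%s\n%s\n' "${SOURCE_CACHE_KEY}" "${COUNT}" > "${CACHE_FILE_PATH}"
    echo "${COUNT}"
fi
SCRIPT_DELIM

# create a small sample file with two <foo> nodes
XML_FILE_PATH=$(mktemp)
printf '<foo>a</foo>\n<foo>b</foo>\n' > "${XML_FILE_PATH}"

bash "${SCRIPT_FILE_PATH}" "${XML_FILE_PATH}"   #parses the file
bash "${SCRIPT_FILE_PATH}" "${XML_FILE_PATH}"   #served from the cache
```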
I am quite lazy. For this reason, I let the certificates be regenerated once a month. To avoid putting too much load on the infrastructure, I picked a day other than the first of the month. The same goes for the time of day.
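As a crontab entry, that idea could look like the sketch below; the day, the time and the script path are arbitrary placeholders:

```
# m  h   dom mon dow  command
# regenerate the certificates once a month, on the 13th at 04:23,
# instead of midnight on the first
23   4   13  *   *    /path/to/regenerate_certificates.sh
```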
Given is the fact that you have some processes (like cronjobs) executed via a webserver like Apache. Furthermore, you have installed and enabled the Apache server status module.
To gain some reusability benefits, we should divide the problem into either shell scripts or shell functions. Side note: whenever I write about the shell, I mean the bash environment.
What are the problems we want to tackle?
find the correct environment
check on all available webservers whether the process is already running
specify which process should not run twice and start it if possible
We can put the first two problems into shell functions like the following ones. I am referencing some self-written shell functions, indicated by the "net_bazzline_" prefix.
#!/bin/bash
#find the correct environment
if net_bazzline_string_contains "${HOSTNAME}" 'production'; then
NET_BAZZLINE_IS_PRODUCTION_ENVIRONMENT=1
else
NET_BAZZLINE_IS_PRODUCTION_ENVIRONMENT=0
fi
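The helper used above comes from the referenced function collection. A minimal sketch of what it could look like, assuming it returns 0 when the first argument contains the second one:

```shell
#!/bin/bash
function net_bazzline_string_contains ()
{
    #$1 is the haystack, $2 is the needle
    case "$1" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

For example, `net_bazzline_string_contains "webserver01.production" 'production'` would return 0.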
And the mighty check.
#!/bin/bash
#check all available webservers if a process is not running
####
# @param string <process name>
# @return int (0 if at least one process was found)
####
function local_is_there_at_least_one_apache_process_running()
{
    if [[ $# -lt 1 ]]; then
        echo 'invalid number of arguments'
        echo '  local_is_there_at_least_one_apache_process_running <process name>'
        return 1
    fi

    if [[ ${NET_BAZZLINE_IS_PRODUCTION_ENVIRONMENT} -eq 1 ]]; then
        LOCAL_ENVIRONMENT='production'
    else
        LOCAL_ENVIRONMENT='staging'
    fi

    #variables are prefixed with LOCAL_ to prevent overwriting system variables
    LOCAL_PROCESS_NAME="$1"
    #declare the array with all available host names
    declare -a LOCAL_HOSTNAMES=("webserver01" "webserver02" "webserver03")

    for LOCAL_HOSTNAME in "${LOCAL_HOSTNAMES[@]}"; do
        APACHE_STATUS_URL="http://${LOCAL_HOSTNAME}.my.domain/server-status"
        OUTPUT=$(curl -s "${APACHE_STATUS_URL}" | grep -i "${LOCAL_PROCESS_NAME}")
        EXIT_CODE_OF_LAST_PROCESS="$?"

        if [[ ${EXIT_CODE_OF_LAST_PROCESS} -eq 0 ]]; then
            echo "${LOCAL_PROCESS_NAME} found on ${LOCAL_HOSTNAME}"
            return 0
        fi
    done

    return 1
}
And here is an example how to use it.
#!/bin/bash
#specify which process should not run and start it if possible
source /path/to/your/bash/functions
LOCAL_PROCESS_NAME="my_process"

local_is_there_at_least_one_apache_process_running "${LOCAL_PROCESS_NAME}"
EXIT_CODE_OF_LAST_PROCESS="$?"

if [[ ${EXIT_CODE_OF_LAST_PROCESS} -eq 0 ]]; then
    echo "${LOCAL_PROCESS_NAME} is still running"
    exit 0
else
    #execute your process
    echo 'started at: '$(date +'%Y-%m-%d %H:%M:%S')
    curl "my.domain/${LOCAL_PROCESS_NAME}"
    echo 'finished at: '$(date +'%Y-%m-%d %H:%M:%S')
fi
You can put this into a loop by calling it via the cronjob environment, or use watch if you only need it from time to time.
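Both variants could look like the sketch below; the interval and the script path are arbitrary placeholders:

```
# crontab entry, runs the check every five minutes
*/5 * * * * /path/to/check_and_start.sh

# or interactively, refreshing every 60 seconds
watch -n 60 /path/to/check_and_start.sh
```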
Pretty much all of the code is below. It is part of my function collection for the bash.
####
# Replaces a string in all files in a given path and below
# taken from: http://www.cyberciti.biz/faq/unix-linux-replace-string-words-in-many-files/
# taken from: http://stackoverflow.com/questions/4437901/find-and-replace-string-in-a-file
# taken from: http://stackoverflow.com/questions/7450324/how-do-i-replace-a-string-with-another-string-in-all-files-below-my-current-dir
#
# @author stev leibelt
# @since 2013-7-30
####
function net_bazzline_replace_string_in_files ()
{
    if [[ $# -lt 3 ]]; then
        echo 'invalid number of arguments provided'
        echo 'command search replace fileextension [path]'
        return 1
    fi

    if [[ $# -eq 4 ]]; then
        find "$4" -name "*.$3" -type f -exec sed -i 's/'"$1"'/'"$2"'/g' {} \;
    else
        find . -name "*.$3" -type f -exec sed -i 's/'"$1"'/'"$2"'/g' {} \;
    fi
}
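A usage sketch — the function from above is repeated here so the example is self-contained, and the playground directory is created on the fly. Note that sed uses "/" as delimiter here, so search and replace strings containing slashes would need escaping:

```shell
#!/bin/bash
function net_bazzline_replace_string_in_files ()
{
    if [[ $# -lt 3 ]]; then
        echo 'invalid number of arguments provided'
        echo 'command search replace fileextension [path]'
        return 1
    fi

    if [[ $# -eq 4 ]]; then
        find "$4" -name "*.$3" -type f -exec sed -i 's/'"$1"'/'"$2"'/g' {} \;
    else
        find . -name "*.$3" -type f -exec sed -i 's/'"$1"'/'"$2"'/g' {} \;
    fi
}

# create a playground and replace "foo" with "bar" in all *.txt files
WORKING_DIRECTORY=$(mktemp -d)
echo 'foo is here' > "${WORKING_DIRECTORY}/example.txt"
net_bazzline_replace_string_in_files 'foo' 'bar' 'txt' "${WORKING_DIRECTORY}"
cat "${WORKING_DIRECTORY}/example.txt"
```

Without the optional fourth argument, the current working directory is used as starting point.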