
Calendar week 27/2019 in the link review

Calendar week 26/2019 in the link review

Calendar week 25/2019 in the link review

Calendar week 24/2019 in the link review

Calendar week 23/2019 in the link review

Calendar week 22/2019 in the link review

Calendar week 21/2019 in the link review

Calendar week 20/2019 in the link review

Calendar week 19/2019 in the link review

Calendar week 18/2019 in the link review

Calendar week 17/2019 in the link review

Calendar week 16/2019 in the link review

Calendar week 15/2019 in the link review

Calendar week 14/2019 in the link review

Calendar week 13/2019 in the link review

Calendar week 12/2019 in the link review

Calendar week 11/2019 in the link review

Calendar week 10/2019 in the link review

How to count words in big files that change once per day but are checked multiple times per hour?

Consider the following scenario.

A monitoring tool parses a big file every x minutes and counts the occurrences of a word in it. If the count is below a threshold, countermeasures are triggered. The big file itself is created only once per day.
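A minimal sketch of that uncached check. The word `<foo>`, the threshold and the countermeasure (here only a message plus a non-zero return code) are hypothetical placeholders. It uses `grep -o ... | wc -l`, which counts every match, whereas `grep -c` would only count matching lines:

```shell
#!/bin/bash
# Naive check: reads the complete file on every run.
# The word, the threshold and the countermeasure are hypothetical placeholders.

check_file() {
    local file_path="${1}"
    local word="${2}"
    local threshold="${3}"
    local count

    # -F: treat the word as a fixed string, -o: print each match on its own line
    count=$(grep -oF -- "${word}" "${file_path}" | wc -l)

    if (( count < threshold )); then
        # Placeholder for the real countermeasure
        echo "below threshold: ${count} < ${threshold} in ${file_path}"
        return 1
    fi

    echo "ok: ${count} in ${file_path}"
}
```

For example, `check_file big.xml '<foo>' 100 || echo "trigger countermeasures"`. Every call reads the whole file, which is exactly what gets expensive with gigabyte-sized files on network storage.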

This works fine as long as the file is small. Now think of XML files that are gigabytes in size, stored on network storage, with many of them to monitor.

My solution to this problem is a caching layer: since the file changes only once per day, most checks can reuse the last count. The only thing left to solve is detecting whether the big file has changed.

Running sha256sum or md5sum on big files takes time, because both have to read the entire file. There is one thing that is fast and unique enough: the file size plus the modification date and time, as printed by ls -l.

ls -l jobs_with_channels_multiple_location_nodes.xml | awk '{print $5$6$7$8}' 
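An alternative sketch, assuming GNU coreutils (BSD and macOS `stat` use different flags): `stat` can print the size and the modification time in seconds since the epoch directly. This avoids parsing `ls` output, which only has minute resolution and switches to showing the year instead of the time for files older than about six months. For a file rewritten daily the `ls` key is good enough, so treat this as an option rather than a fix. The function name `cache_key` is hypothetical:

```shell
#!/bin/bash
# Builds a cache key from the file size in bytes (%s) and the last
# modification time in seconds since the epoch (%Y). Requires GNU stat.
cache_key() {
    stat -c '%s-%Y' -- "${1}"
}
```

Usage: `SOURCE_CACHE_KEY=$(cache_key jobs_with_channels_multiple_location_nodes.xml)`.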

Where to store the cache? Just take the name of the big file and append ".count_cache" to it.

What goes into the cache file? Only two lines: the first line is the cache key, the second line is the cached count.
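A sketch of helpers for this two-line format; the function names are hypothetical. The write goes to a temporary file in the same directory first and is then renamed with `mv`, so a check running at the same moment should never see a half-written cache file. A rename within one local file system is atomic on POSIX systems; whether your network storage gives the same guarantee is an assumption worth verifying:

```shell
#!/bin/bash
# Hypothetical helpers for the two-line cache format:
# line 1 = cache key, line 2 = cached count.

write_cache() {
    local cache_path="${1}" key="${2}" count="${3}"
    local tmp_path

    # Temporary file next to the cache file, so mv is a rename on the same file system
    tmp_path=$(mktemp "${cache_path}.XXXXXX") || return 1
    printf '%s\n%s\n' "${key}" "${count}" > "${tmp_path}"
    mv -f -- "${tmp_path}" "${cache_path}"
}

# Prints the cached count if the stored key matches, returns 1 otherwise.
read_cache() {
    local cache_path="${1}" expected_key="${2}"
    local key count

    [[ -f "${cache_path}" ]] || return 1
    { read -r key; read -r count; } < "${cache_path}"
    [[ "${key}" == "${expected_key}" ]] || return 1
    echo "${count}"
}
```

Note that `mktemp` creates the file with mode 0600, so the renamed cache file is only readable by its owner; adjust with `chmod` if other users run the check.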

And the logic?

#!/bin/bash
####
# Counts the number of <foo> nodes in the provided file.
# To speed things up, the count is cached in a file next to it.
#
# This is just an example of the logic. If you use it in production, good luck!
#####
# @since 2019-03-06
# @author stev leibelt <[email protected]>
####

SOURCE_FILE_PATH="${1}";
CACHE_FILE_PATH="${SOURCE_FILE_PATH}.count_cache"

if [[ ! -f "${SOURCE_FILE_PATH}" ]];
then
    echo ":: Invalid argument provided."
    echo "   Provided file path >>${SOURCE_FILE_PATH}<< does not exist."

    exit 1;
fi

SOURCE_CACHE_KEY=$(ls -l "${SOURCE_FILE_PATH}" | awk '{print $5$6$7$8}' )

if [[ -f "${CACHE_FILE_PATH}" ]];
then
    CACHE_KEY=$(head -n1 "${CACHE_FILE_PATH}")
else
    CACHE_KEY=""
fi

if [[ "${CACHE_KEY}" == "${SOURCE_CACHE_KEY}" ]];
then
    COUNT=$(tail -n1 "${CACHE_FILE_PATH}")
else
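    # grep -o prints each match on its own line, so several <foo> on one line are all counted
    # (grep -c would only count matching lines)
    COUNT=$(grep -o '<foo>' "${SOURCE_FILE_PATH}" | wc -l)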
    cat >"${CACHE_FILE_PATH}"<<DELIM
${SOURCE_CACHE_KEY}
${COUNT}
DELIM
fi

echo "${COUNT}"

Hope this helps.


Categories: coding

Calendar week 9/2019 in the link review

Calendar week 8/2019 in the link review

Migration from Windows Outlook 2007 to Linux Thunderbird

It is doable!

All you need to do is read the two migration guides from Mozilla.

Or just follow my step-by-step list.

  • Download Thunderbird version 31.8
  • Download Mail PassView
  • Run Mail PassView and note down all important information
  • Install Thunderbird
  • Open Thunderbird and import your data (Tools->Import). Important: import each data type separately (mails, contacts)
  • Add your mail accounts
  • Upgrade Thunderbird
  • Copy the Thunderbird profile (c:\Users\\AppData ...) to your Linux PC
  • Edit ~/.mozilla-thunderbird/profiles.ini or ~/.thunderbird/profiles.ini (depending on your distribution) so that it points to the copied profile
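A minimal sketch of the last step, assuming the copied profile folder already sits inside the Thunderbird directory and is called `abcd1234.default` (a hypothetical name, use your own folder name). It writes a minimal profiles.ini that marks this profile as the default; it backs up an existing profiles.ini first, but run it only while Thunderbird is closed. Newer Thunderbird releases may add further sections to the file on their next start:

```shell
#!/bin/bash
# Registers a copied Thunderbird profile in profiles.ini.
# The profile folder name is hypothetical: use the name of your copied profile.
# The Thunderbird directory is usually ~/.thunderbird; older Debian packages used ~/.mozilla-thunderbird.

register_profile() {
    local thunderbird_dir="${1}"
    local profile_name="${2}"

    if [[ ! -d "${thunderbird_dir}/${profile_name}" ]]; then
        echo ":: Profile directory >>${thunderbird_dir}/${profile_name}<< does not exist."
        return 1
    fi

    # Keep a backup of an existing profiles.ini before overwriting it
    if [[ -f "${thunderbird_dir}/profiles.ini" ]]; then
        cp -- "${thunderbird_dir}/profiles.ini" "${thunderbird_dir}/profiles.ini.bak"
    fi

    # IsRelative=1: Path is relative to the directory containing profiles.ini
    cat > "${thunderbird_dir}/profiles.ini" <<DELIM
[General]
StartWithLastProfile=1

[Profile0]
Name=default
IsRelative=1
Path=${profile_name}
Default=1
DELIM
}

# Example call (hypothetical profile folder name):
# register_profile "${HOME}/.thunderbird" "abcd1234.default"
```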

Calendar week 7/2019 in the link review