Counting occurrences in big files that change once per day, while the check runs multiple times per hour?

Consider the following scenario.

A monitoring tool parses a big file every x minutes to count how often a word occurs in it. If the count is below a threshold, countermeasures are triggered. This big file is created once per day.

This works fine as long as the file is small. Now think about XML files several gigabytes in size, stored on network storage, and many of them to monitor.

My solution for this problem is a caching layer. The only thing left to solve is detecting whether the big file has changed.

Running sha256sum or md5sum on big files takes time, because both have to read the whole file. There is one thing that is fast and unique enough: the file size plus the modification date and time, as printed by ls -l.

ls -l jobs_with_channels_multiple_location_nodes.xml | awk '{print $5$6$7$8}' 
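Parsing the ls -l output is quick, but its time column only has minute precision and the month name depends on the locale. The following is a minimal alternative sketch. It assumes GNU coreutils stat, because BSD and macOS stat use -f with different format letters. The helper name cache_key is hypothetical.

```shell
#!/bin/bash
# Sketch: build a cache key from file size and modification time.
# Assumption: GNU coreutils stat (BSD/macOS stat uses -f with other letters).
# cache_key is a hypothetical helper name.
#
# %s: total size in bytes
# %Y: time of last data modification in seconds since the epoch
cache_key() {
    stat -c '%s-%Y' -- "${1}"
}
```

In the script that follows, SOURCE_CACHE_KEY=$(cache_key "${SOURCE_FILE_PATH}") would replace the ls -l line. Like the original key, this one cannot detect a change that keeps both the size and the modification time identical.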

Where to store the cache? Just take the name of the big file and append ".count_cache" to it.

What should be in the cache file? Only two lines: the first line is the cache key, the second line is the cached count.

And the logic?

#!/bin/bash
####
# Counts the amount of <foo> nodes in the provided file.
# To speed things up, the count is cached in a separate file.
#
# This is just a logic example file. If you are using it in production, good luck!
#####
# @since 2019-03-06
# @author stev leibelt
####

SOURCE_FILE_PATH="${1}"
CACHE_FILE_PATH="${SOURCE_FILE_PATH}.count_cache"

if [[ ! -f "${SOURCE_FILE_PATH}" ]];
then
    echo ":: Invalid argument provided." >&2
    echo "   Provided file path >>${SOURCE_FILE_PATH}<< does not exist." >&2

    exit 1
fi

# cache key: size, month, day and time (or year) of the file as shown by "ls -l"
SOURCE_CACHE_KEY=$(ls -l "${SOURCE_FILE_PATH}" | awk '{print $5$6$7$8}')

if [[ -f "${CACHE_FILE_PATH}" ]];
then
    CACHE_KEY=$(head -n1 "${CACHE_FILE_PATH}")
else
    CACHE_KEY=""
fi

if [[ "${CACHE_KEY}" == "${SOURCE_CACHE_KEY}" ]];
then
    # cache hit: the second line holds the count
    COUNT=$(tail -n1 "${CACHE_FILE_PATH}")
else
    # cache miss: parse the big file once
    # note: grep -c counts matching lines, so several <foo> on one line count as one
    COUNT=$(grep -c '<foo>' "${SOURCE_FILE_PATH}")

    # write to a temporary file first and move it into place,
    # so a parallel run never reads a half written cache file
    printf '%s\n%s\n' "${SOURCE_CACHE_KEY}" "${COUNT}" >"${CACHE_FILE_PATH}.$$"
    mv -f "${CACHE_FILE_PATH}.$$" "${CACHE_FILE_PATH}"
fi

echo "${COUNT}"
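The script only prints the count. The threshold comparison from the story still has to happen somewhere. The following is a minimal sketch of that step. It assumes the common Nagios plugin exit codes: 0 for OK, 2 for CRITICAL and 3 for UNKNOWN. The names check_foo_count and is_number are hypothetical.

```shell
#!/bin/bash
# Sketch: turn the count into a monitoring result using the common Nagios
# plugin exit codes: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
# check_foo_count and is_number are hypothetical helper names.

# succeeds only for a non-empty string of digits
is_number() {
    case "${1}" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# $1: count printed by the caching script
# $2: threshold, fewer <foo> nodes than this is critical
check_foo_count() {
    local count="${1}"
    local threshold="${2}"

    # refuse to compare garbage, e.g. an empty count from a failed run
    if ! is_number "${count}" || ! is_number "${threshold}"; then
        echo "UNKNOWN - count >>${count}<< or threshold >>${threshold}<< is not a number"
        return 3
    fi

    if [ "${count}" -lt "${threshold}" ]; then
        echo "CRITICAL - ${count} <foo> nodes, expected at least ${threshold}"
        return 2
    fi

    echo "OK - ${count} <foo> nodes"
    return 0
}
```

A call could look like check_foo_count "$(./count_foo_nodes.sh jobs_with_channels_multiple_location_nodes.xml)" 1000. Here count_foo_nodes.sh is a made-up name for the script above and 1000 is a made-up threshold.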

Hope this helps.

Categories: coding

Link review for calendar week 9/2019

Link review for calendar week 8/2019

Migration from Windows Outlook 2007 to Linux Thunderbird

It is doable!

All you need to do is read Mozilla's migration guides.

Or you just follow my step-by-step list.

  • Download Thunderbird version 31.8
  • Download Mail PassView
  • Run Mail PassView and note down all important information (for example the stored account passwords)
  • Install Thunderbird
  • Open Thunderbird and import your data (Tools -> Import); important: import each item type separately (mails, contacts)
  • Add your mail accounts
  • Upgrade Thunderbird
  • Copy the Thunderbird profile (c:\Users\\AppData ...) to your Linux PC
  • Edit and update ~/.mozilla-thunderbird/profiles.ini or ~/.thunderbird/profiles.ini
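The last step of the list can be sketched in shell. This minimal sketch assumes that the copied profile folder already sits inside the Thunderbird base directory (~/.thunderbird, or ~/.mozilla-thunderbird on some distributions) and that Thunderbird is closed. It writes the minimal profiles.ini layout that marks one profile as the default. Newer Thunderbird releases may add further sections. The helper name write_profiles_ini and the folder name abcd1234.default are hypothetical.

```shell
#!/bin/bash
# Sketch: point Thunderbird's profiles.ini at a profile folder copied over
# from Windows. write_profiles_ini is a hypothetical helper name.
#
# $1: Thunderbird base dir, e.g. ~/.thunderbird or ~/.mozilla-thunderbird
# $2: name of the copied profile folder inside that dir (e.g. abcd1234.default)
write_profiles_ini() {
    local base_dir="${1}"
    local profile_dir="${2}"
    local ini_file="${base_dir}/profiles.ini"

    if [ ! -d "${base_dir}/${profile_dir}" ]; then
        echo ":: Profile folder >>${base_dir}/${profile_dir}<< does not exist." >&2
        return 1
    fi

    # keep a backup of an existing profiles.ini
    if [ -f "${ini_file}" ]; then
        cp -- "${ini_file}" "${ini_file}.bak"
    fi

    # IsRelative=1: Path is relative to the base dir
    cat >"${ini_file}" <<DELIM
[General]
StartWithLastProfile=1

[Profile0]
Name=default
IsRelative=1
Path=${profile_dir}
Default=1
DELIM
}
```

For example, write_profiles_ini ~/.thunderbird abcd1234.default. If Thunderbird ignores the profile afterwards, restore the .bak file.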

Link review for calendar week 7/2019

Link review for calendar week 6/2019

Link review for calendar week 5/2019

sql2graphite - TypeError: not enough arguments for format string

I just ran into the following issue.

Traceback (most recent call last):
  File "/usr/local/bin/sql-to-graphite", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/sql_to_graphite/__init__.py", line 55, in main
    get_executor(dsn),
  File "/usr/local/lib/python2.7/dist-packages/sql_to_graphite/__init__.py", line 25, in run
    data = map(executor, queries)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 942, in execute
    return self._execute_text(object, multiparams, params)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1104, in _execute_text
    statement, parameters
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1416, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 507, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 187, in execute
    query = query % tuple([db.literal(item) for item in args])
TypeError: not enough arguments for format string

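The cause: my SQL statement contained a >>%<<. MySQLdb runs the statement through Python's % string formatting (see the last frame of the traceback), so the lone >>%<< is read as a placeholder without an argument. Escaping it (turning >>%<< into >>%%<<) fixes the problem.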
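If the statements come from a script, the escaping can be automated. This minimal sketch assumes that every statement goes through the same % formatting step shown in the traceback. Escape a statement only once, otherwise %% turns into %%%%. The helper name escape_percent and the example table and column are hypothetical.

```shell
#!/bin/bash
# Sketch: double every "%" in a SQL statement, because MySQLdb applies
# Python's "%" formatting to it. escape_percent is a hypothetical helper name.
escape_percent() {
    # sed replaces each "%" with "%%" ("%" has no special meaning for sed)
    printf '%s\n' "${1}" | sed 's/%/%%/g'
}

# example with a made-up table and column
# prints: SELECT COUNT(*) FROM jobs WHERE title LIKE 'dev%%'
escape_percent "SELECT COUNT(*) FROM jobs WHERE title LIKE 'dev%'"
```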

Link review for calendar week 4/2019

Running Contao 3.x and upgrading from PHP 5.x to PHP 7 results in a 500 status?

I had the joy of debugging a broken Contao system line by line.

Why line by line? Because there was no entry in any log, even with the "log all motherfucker" php.ini values turned on.

After going through the lines, I ended up with this error message.

Fatal error: Cannot use 'String' as class name as it is reserved in ... system/modules/core/library/Contao/String.php on line 28

Well, nice to know the error, but where was it triggered? So another joyful round, and I ended up in an extension, installed "by hand" of course (which means it cannot be updated by the Contao updater), containing this lovely line.

$this->import('String');

After I changed that line to the following one, all the gizmos were working again.

$this->import('StringUtil');

So, what should you do when you are faced with a 500 Apache status code after a Contao installation moved from a PHP 5.x runtime environment to a PHP 7.x runtime environment?

cd <project root>
grep -ir "import('String');" *

Replace >>import('String');<< with >>import('StringUtil');<< and that is it.
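The search and the replacement can be combined into one step. This minimal sketch assumes GNU grep, GNU sed and GNU xargs: sed -i without a backup suffix and xargs --no-run-if-empty are GNU extensions. Unlike grep -ir, it matches case-sensitively. The helper name fix_string_import is hypothetical. Commit or back up the project first, because the files are edited in place.

```shell
#!/bin/bash
# Sketch: replace import('String'); with import('StringUtil'); below a
# project root. Assumes GNU grep, sed and xargs.
# fix_string_import is a hypothetical helper name.
fix_string_import() {
    local project_root="${1}"

    # -r recursive, -l file names only, -Z NUL-separated names, -F fixed string
    grep -rlZF "import('String');" "${project_root}" \
        | xargs -0 --no-run-if-empty sed -i "s/import('String');/import('StringUtil');/g"
}
```

Call it as fix_string_import <project root>, then run the grep again to confirm that nothing is left.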

But it would be even better not to install extensions by hand in the first place.

Link review for calendar week 3/2019

How to upgrade Nextcloud from the GUI to the next major version

So, Nextcloud version 15.0.x is already out, but whenever you log into your Nextcloud with your administrator account, you still get the "you are on the latest version" message.

Furthermore, you just got an update to version 14.0.x while waiting for version 15.0.x.

To fix this, all you have to do is switch the update channel from "stable" to "beta". After a page reload, version 15.0.x should show up as available. Do the upgrade and do not forget to switch back to the "stable" channel.
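The same channel switch can also be done from the command line. This minimal sketch makes three assumptions. First, it assumes the channel is stored in config.php under the system key updater.release.channel. Second, it assumes occ is called through php. Third, it assumes the function runs as the web server user (for example via sudo -u www-data). The helper name set_update_channel is hypothetical, and it deliberately accepts only "stable" and "beta".

```shell
#!/bin/bash
# Sketch: switch the Nextcloud update channel via occ instead of the admin GUI.
# Assumptions: the channel is the system config key updater.release.channel,
# and this runs as the web server user. set_update_channel is hypothetical.
#
# $1: path to the occ script, e.g. /var/www/nextcloud/occ
# $2: channel name, only "stable" or "beta" are accepted here
set_update_channel() {
    local occ_path="${1}"
    local channel="${2}"

    case "${channel}" in
        stable|beta) ;;
        *)
            echo ":: Unsupported channel >>${channel}<<." >&2
            return 1
            ;;
    esac

    php "${occ_path}" config:system:set updater.release.channel --value="${channel}"
}
```

Run set_update_channel /var/www/nextcloud/occ beta, do the upgrade, and then run set_update_channel /var/www/nextcloud/occ stable to switch back.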

Link review for calendar week 2/2019

Link review for calendar week 1/2019

Link review for calendar week 52/2018

Link review for calendar week 51/2018

Link review for calendar week 50/2018

Link review for calendar week 49/2018

Link review for calendar week 48/2018

Link review for calendar week 47/2018

Link review for calendar week 46/2018

Link review for calendar week 45/2018

Link review for calendar week 44/2018