Tuesday, 5 December 2017

My first time at FOSDEM 2017:

This was written earlier, in February; however, I never had the time to finish it. A lot has happened since then, and FOSDEM 2018 will soon be around. So here it is, raw and incomplete:

I have made it. I attended most of what I wanted to attend today, even though I arrived late and tired, and was worried that I would catch the cold/flu that my fellow passenger had. My first flight, to Dubai, was delayed due to adverse weather conditions (heavy rain), which caused me to miss my flight to Brussels. I insisted that I needed to be there, as it was a crucial event for me: a moment to cherish in my life as an open source fan.

So first I used the Citymapper app for Brussels, which recommends or uses ecab. ecab was useless: it could not find me a suitable taxi in time. So I switched to Uber and, voilà, my ride was ready in a couple of minutes. It dropped me near the info desk where I bought my FOSDEM 2017 shirt, not a hoodie; for that I would have had to go to another info desk, but there was no time, as the keynote was already starting on J

Lessons Learned:

Get familiar with the university campus rooms, especially those of the talks you want to attend, if they are not in the same room.

Be in the room as early as possible; if the talk is really important, attend the talk before it. I was so disappointed to miss the HPC talk: even though I was 30 minutes early, there was already a queue and the room was full. If I had come to the earlier talk, I would have made it.

The general guest Wi-Fi connection is not that reliable, and my roaming restricted me to one telecom network, so make sure you have a printed hard copy of your favourite talks and their rooms. If you are thinking of using a local data package, there was also something about not buying a Belgian SIM card unless you buy it one working day in advance, as its data package will not be active straight away.

Food, as in snacks and drinks, is available almost everywhere.

Saturday, 3 December 2016

Dhahran-Docker November 2016 meetup

On Wednesday, 16th November, the Dhahran Docker meetup participated in the global #learndocker event "Global Mentor Week". Two useful resources that were used to some extent were PWD (Play With Docker) and Katacoda.

There were lots of questions asked during and after the meetup. So here I am trying to capture, to the best of my knowledge, the Docker and container questions and some of the answers, as a taste of things to come for future meetup mentors. I will try to update this and answer more questions; however, I encourage joining the Docker community Slack channel and forum for any further questions.


Q: I am still confused about the difference between images and containers. Aren't a container and an image the same thing?

A: Think of the image as the golden template. When Docker runs it, it creates an instance of it in memory with the required customization (ports exposed, environment variables set to configure some aspect, volumes bound, networks connected, etc.). When the process/container is done, it still exists, but now on disk, in case you need to create a template from it; otherwise one needs to clean such containers up periodically. Although that question seems easy to answer, it was the question most asked by everyone at some point.

from Stackoverflow.com docker-image-vs-container

Docker Images vs. Containers

In Dockerland, there are images, and there are containers. The two are closely related, but distinct. For me, grasping this dichotomy has clarified Docker immensely.

What's an Image?

An image is an inert, immutable, file that's essentially a snapshot of a container. Images are created with the build command, and they'll produce a container when started with a run. Images are stored in a Docker registry such as registry.hub.docker.com. Because they can become quite large, images are designed to be composed of layers of other images, allowing a minimal amount of data to be sent when transferring images over the network.
Local images can be listed by running docker images
REPOSITORY                TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
ubuntu                    13.10               5e019ab7bf6d        2 months ago        180 MB
ubuntu                    14.04               99ec81b80c55        2 months ago        266 MB
ubuntu                    latest              99ec81b80c55        2 months ago        266 MB
ubuntu                    trusty              99ec81b80c55        2 months ago        266 MB
<none>                    <none>              4ab0d9120985        3 months ago        486.5 MB
Some things to note:
  1. IMAGE ID is the first 12 characters of the true identifier for an image. You can create many tags of a given image, but their IDs will all be the same (as above).
  2. VIRTUAL SIZE is virtual because it's adding up the sizes of all the distinct underlying layers. This means that the sum of all the values in that column is probably much larger than the disk space used by all of those images.
  3. The value in the REPOSITORY column comes from the -t flag of the docker build command, or from docker tag-ing an existing image. You're free to tag images using a nomenclature that makes sense to you, but know that Docker will use the tag as the registry location in a docker push or docker pull.
  4. The full form of a tag is [REGISTRYHOST/][USERNAME/]NAME[:TAG]. For ubuntu above, REGISTRYHOST is inferred to be registry.hub.docker.com. So if you plan on storing your image called my-application in a registry at docker.example.com, you should tag that image docker.example.com/my-application.
  5. The TAG column is just the [:TAG] part of the full tag. This is unfortunate terminology.
  6. The latest tag is not magical, it's simply the default tag when you don't specify a tag.
  7. You can have untagged images only identifiable by their IMAGE IDs. These will get the <none> TAG and REPOSITORY. It's easy to forget about them.
More info on images is available from the Docker docs and glossary.
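
To make note 4 concrete, here is a minimal Python sketch of how a full tag splits into its parts. The function name split_image_reference and the rule used to spot a registry host (the first path component contains a dot or a colon, or is localhost) are assumptions made for illustration; this is not Docker's own parser, and it ignores digests.

```python
def split_image_reference(ref):
    """Split [REGISTRYHOST/][USERNAME/]NAME[:TAG] into its parts (illustrative only)."""
    registry = None  # None means "not given", so the default registry applies
    remainder = ref
    first, slash, rest = ref.partition("/")
    # Assumption: a first component with "." or ":" (or "localhost") is a registry host.
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    # A tag, if present, follows the last ":" of the final path component.
    path, colon, tag = remainder.rpartition(":")
    if not colon or "/" in tag:
        path, tag = remainder, "latest"  # no tag given, so the default tag applies
    user, _, name = path.rpartition("/")
    return {"registry": registry, "user": user or None, "name": name, "tag": tag}


print(split_image_reference("docker.example.com/my-application"))
print(split_image_reference("samalba/docker-registry:latest"))
```

The sketch returns None for the registry when none is given, rather than filling in the default host, so the caller can see what was actually written in the tag.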

What's a container?

To use a programming metaphor, if an image is a class, then a container is an instance of a class—a runtime object. Containers are hopefully why you're using Docker; they're lightweight and portable encapsulations of an environment in which to run applications.
View local running containers with docker ps:
CONTAINER ID        IMAGE                               COMMAND                CREATED             STATUS              PORTS                    NAMES
f2ff1af05450        samalba/docker-registry:latest      /bin/sh -c 'exec doc   4 months ago        Up 12 weeks>5000/tcp   docker-registry
Here I'm running a dockerized version of the docker registry, so that I have a private place to store my images. Again, some things to note:
  1. Like IMAGE ID, CONTAINER ID is the true identifier for the container. It has the same form, but it identifies a different kind of object.
  2. docker ps only outputs running containers. You can view stopped containers with docker ps -a.
  3. NAMES can be used to identify a started container via the --name flag.

How to avoid image and container buildup?

One of my early frustrations with Docker was the seemingly constant buildup of untagged images and stopped containers. On a handful of occasions this buildup resulted in maxed out hard drives slowing down my laptop or halting my automated build pipeline. Talk about "containers everywhere"!
We can remove all untagged images by combining docker rmi with the recent dangling=true query:
docker images -q --filter "dangling=true" | xargs docker rmi
Docker won't be able to remove images that are behind existing containers, so you may have to remove stopped containers with docker rm first:
docker rm `docker ps --no-trunc -aq`
These are known pain points with Docker, and may be addressed in future releases. However, with a clear understanding of images and containers, these situations can be avoided with a couple of practices:
  1. Always remove a useless, stopped container with docker rm [CONTAINER_ID].
  2. Always remove the image behind a useless, stopped container with docker rmi [IMAGE_ID]
More from Stackoverflow, "whats-the-difference-between-a-container-and-an-image":

Images are frozen immutable snapshots of live containers. Containers are running (or stopped) instances of some image.
Start with the base image called 'ubuntu'. Let's run bash interactively within the ubuntu image and create a file. We'll use the -i and -t flags to give us an interactive bash shell.
$ docker run -i -t ubuntu  /bin/bash
root@48cff2e9be75:/# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
root@48cff2e9be75:/# cat > foo
This is a really important file!!!!
root@48cff2e9be75:/# exit
Don't expect that file to stick around when you exit and restart the image. You're restarting from exactly the same defined state as you started in before, not where you left off.
$ docker run -i -t ubuntu  /bin/bash
root@abf181be4379:/# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
root@abf181be4379:/# exit
But, the container, now no longer running, has state and can be saved (committed) to an image.
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                CREATED              STATUS                          PORTS                      NAMES
abf181be4379        ubuntu:14.04        /bin/bash              17 seconds ago       Exited (0) 12 seconds ago                                  elegant_ardinghelli    
48cff2e9be75        ubuntu:14.04        /bin/bash              About a minute ago   Exited (0) 50 seconds ago                                  determined_pare        
Let's create an image from container ID 48cff2e9be75 where we created our file:
$ docker commit 48cff2e9be75 ubuntu-foo
Now, we have a new image with our really important file:
$ docker run ubuntu-foo /bin/cat foo
This is a really important file!!!!
Try the command docker images. You should see your new image ubuntu-foo listed along with the ubuntu standard image we started with.

Q: The "FROM" directive in a Dockerfile: shouldn't the best practice be to pin the image with a tag?

A: Yes, I think it is good practice to pin the base image in the FROM line of the Dockerfile. The reasoning is that the default tag is latest, which means that someone who rebuilds the image later is not guaranteed the original functionality, as things might have changed in newer base images in ways that affect the other layers. Security is not guaranteed either if the image is pulled from a non-trusted repository/hub.
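
As a rough illustration of the practice, the following Python sketch flags FROM lines that carry no explicit tag or digest. The helper name unpinned_from_lines is hypothetical, and the check is deliberately simple: it treats any explicit tag as pinned (even latest) and does not resolve build arguments.

```python
def unpinned_from_lines(dockerfile_text):
    """Return the FROM lines whose base image has no explicit tag or digest."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        words = line.split()
        if len(words) < 2 or words[0].upper() != "FROM":
            continue
        # Skip option flags so that the first remaining word is the image name.
        args = [word for word in words[1:] if not word.startswith("--")]
        if not args or args[0].lower() == "scratch":
            continue  # the empty base image has no tag to pin
        last_component = args[0].rsplit("/", 1)[-1]
        # Pinned means "name:tag" or "name@sha256:..."; a bare name falls back to latest.
        if ":" not in last_component and "@" not in last_component:
            unpinned.append(line.strip())
    return unpinned


example = "FROM ubuntu\nRUN apt-get update\nFROM ubuntu:14.04\n"
print(unpinned_from_lines(example))
```

Only the last path component is inspected, so a registry host with a port (for example a host ending in :5000) is not mistaken for a tag.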

Q: What if my image was compiled with optimisation in mind; would it work on another platform that does not have the instruction set?

A: No, it would not work. If one is targeting portability, one builds the image for the lowest common denominator, with no platform-specific compiler flags. There could be internal enterprise use cases with different builds of the containers for particular platforms; tags could be used to distinguish such images.

Q: How would containers help us startups? Do you have use cases?

Docker's history has some good lessons to learn from; Docker itself is a good use case. Solomon Hykes, the Docker founder, started dotCloud at Y Combinator, and the dotCloud team supported other Y Combinator startups in building, deploying and shipping their apps on AWS; some 20 YC startups adopted Docker early. Check out the origin story of dotCloud and how it transformed into Docker.

Also consider the rich Docker Hub repository of application subsystems and ready components. If one has an idea of how to help a customer, it can be fast and easy to prototype it into a fully working application or to build an appliance. The entire Docker ecosystem can then be used to create a full operational and development workflow for the prototype and to help the application scale.

In most cases, containers are an alternative to virtual machines. So it is easier to talk to people who already have virtual machines in use, as they can relate. Containers could prove faster, more performant, and easier to manage.

One case I have seen recently is Zenoss: moving their architecture to Docker containers helped them ship a standalone version of a complex distributed application in a scalable, manageable way that customers can install behind their firewall. The Zenoss distributed architecture is very complicated and uses a mix of many solutions, but the user's whole installation, upgrade, and management experience has still been inspiring.

How does Docker differ from or relate to Vagrant? How about docker-machine vs. Vagrant?

The essence of Docker is the A-to-Z of the software application lifecycle: build, ship, and run. That could be in any environment (development, production, QA), or in a mix of several environments. Vagrant, on the other hand, is meant mainly for development. The Vagrantfile is more or less like the Docker Compose YAML file: it describes the environment setup but does not describe how the application instances or images are created. In Docker, the images can be described using a Dockerfile if they are not pulled from a registry/repository/Docker Hub.

How can swarm allocate a container that might later overuse memory resources of the host?

How does Docker hub ensure images pushed to are lawfully licensed?

Some questions related to the differences between images and containers.
Some questions on when to use virtualization vs. containers, or whether one can use both.
Will it be possible to run Windows containers on a Linux platform, or the opposite, without docker-machine?
Can one enforce network usage limits, such as bandwidth, on a container?
Not yet in the current Docker versions at the time of writing this blog. There are several closed/open GitHub issues related to throttling/limiting network bandwidth for a container. For example, the GitHub shivacherukuri Docker-Network-Bandwidth solution is one way to go about it.

One question was about building Docker images for a legacy app whose container size reached 10G; I have asked him to follow up with me with details.

What is the size limit or expectations of container size?

It depends on the storage driver, the Docker version used, and file system limits. For example, for Docker 1.10+ and devicemapper it can be increased over 10G: Docker 1.10 daemon option to increase the basedevice size

Questioning the numbers of 7B+ downloads and the percentage of Docker in production: having access to the application survey would have helped.

As for the download numbers, these were from Docker Hub, and most of them are for community images. A quick search today in the new Docker Store shows around 439,440 community images and, as one can see in the captured image below, some of these images have already been downloaded over 10M times.

Docker Store showing community images with 10M+ downloads

There are several surveys that report different statistics regarding the adoption of Docker in production and in the enterprise; however, almost all of them agree on the rate of growth of Docker use in production. One good resource discussing some of these statistics earlier this year is
Coscale Docker usage statistics increased adoption by enterprises and for production use

Some other example surveys:
  • https://clusterhq.com/assets/pdfs/state-of-container-usage-june-2016.pdf
  • https://redmonk.com/fryan/2016/12/01/containers-in-production-is-security-a-barrier-a-dataset-from-anchore/
  • https://www.datadoghq.com/docker-adoption/
  • https://www.cloudfoundry.org/wp-content/uploads/2016/06/Cloud-Foundry-2016-Container-Report.pdf
All these numbers are obsolete by now due to the rapid growth Docker and containers, in general, are seeing.

Non-technical  questions:

I did register for the Docker community, but I do not see the general/Global-Mentor-Week channel?

After you sign up for the Docker community group, you will get a Slack invite in your email. This is a manual process and could take from a few hours to a couple of days. Also double-check your spam folder for any Docker/Slack related emails.

Can I download the training instructions offline?

I have raised the question in the Docker community Slack. However, all training is now freely accessible online. You should note that the practice is based on Docker community GitHub repos: for the development course, check out Docker Github Labs; as for the operations course, you can find it at Docker Orchestration Workshop.

Some business-related questions about local support and resellers/partnerships in Saudi?

Check https://www.docker.com/docker-support-services#/faq and email sales@docker.com

Other questions:

Why does the voter app build break on the Linux platform but not on Windows? And why does using the proposed solution make the app not function right?

What is the percentage of serious critical/stateful business enterprise applications compared to web/cloud apps? Is Docker mostly for web/cloud apps?

Will there be professional certifications and exams on the Docker ecosystem?

As far as I know, Red Hat has some courses and exams:

Doing an ldd/pmap inside a container, how can the view from inside the container relate to the outside view from the host system, and what is static vs. dynamic in here?

Friday, 2 December 2016

Devops and traditional HPC

Last April, I co-presented at Saudi HPC 2016 a short talk titled "What HPC can learn from DevOps." It was meant to bring awareness of DevOps culture and mindset to HPC practitioners. The following day, the talk was complemented by an introductory tutorial on containers. This talk and tutorial were my second contribution to promoting DevOps locally. The first attempt was at Saudi HPC 2013 with the DevOps afternoon, in which we had speakers from Puppet and Ansible with good examples, back then, of how automation ("Infrastructure as code") frameworks encourage communication, visibility and feedback loops within the organisation.

Talk Abstract: 

Cloud, Web, Big Data operations and DevOps mindsets are changing the Internet, IT and Enterprise services and applications scene rapidly. What can the HPC community learn from these technologies, processes, and culture? From the IT unicorns "Google, Facebook, Twitter, Linkedin, and Etsy" that are in the lead? What could be applied to tackle HPC operations challenges? The problem of efficiency, better use of resources? A use case of automation and version control systems in an HPC enterprise data centre, as well as a proposal for utilising containers and new schedulers to drive better utilisation and diversify the data centre workloads: not just HPC but big data, interactive, batch, short and long-lived scientific jobs.

Here are some of my personal talk notes at that time. Apparently, they did not fit the 15 minutes window I was given.

Talk reflections and thought points:

Definitions: Presenting the different possible HPC workloads: HTC, HPC, HSC, and the recent trend in Data Centre convergence by considering BigData “Analytics” and more recently MLDM “Machine learning, Data mining.” Highlighting the diversity and variability of HPC workload, then moving to what DevOps means to HPC, Why it did not pick up as much? What HPC can learn from Enabling, cloud, and Big Data operations?

The disconnect: Traditional HPC software changes are infrequent; HPC does not need to be agile handling frequent continuous deployments. Each HPC cluster deployment is a snowflake unique in its way, making it hard for group users to port their work to other clusters, a process that takes days, weeks, often months.  The concept of application instrumentation and performance monitoring is not the norm, nor the plumbing and CI/CD pipelines.

The motivation: However, HPC infrastructures inevitably have to grow, and innovations in HPC hardware require a new look at HPC software deployment and development. HPC data centres will need their few highly skilled operational engineers to scale operations efficiently with fewer resources. The fragmented use of system resources needs to be optimised. The scientific and business applications might be rearranged, refactored, and reworked to consider better workflows, analysing application and data processing stages and dependencies and looking at them as a whole and as connected parts, while avoiding compartmentalization and infrastructure silos.

The scalability challenge: What could be the primary HPC driver to introduce DevOps culture and tooling? I can't stress scalability enough (the imminent growth due to initiatives like national grids and international Exascale computing, the workload, the number of nodes, and the number of personalities or roles an HPC node might take).

DevOps tools: Emphasise the richness of the tool set and the culture that has driven the tools' evolution, pointing out that it is less about the tools than about the concepts the tools enable. Not just automation, building, shipping and delivery workflows, but the ever-engaging feedback loops, the collaboration, and the ease that comes from integration. Highlight that communication and feedback are not just human face-to-face interaction but also meaningful dashboards and actionable metrics, the importance of code reviews, the rich APIs, and the natural UX. Such a comprehensive set of tools is unlike the current fragmented HPC alternatives or, in some cases, enterprise tools used wrongly for HPC.

Use case of differences:  The case of Provisioning; and how the terminology differs between the HPC and web/cloud communities. Taking this example further to pivot to the false assumptions of HPC can be just bare-metal provisioning.

Validation: Validating the hypothesis of serious HPC workloads in the cloud, and recent use cases for container deployments in HPC, using surveys and the production-ready vendor solutions that have been trending over the last couple of years; maybe present some of the related HPC cloud news.

2nd Generation Data Centre provisioning tools: Offer open source alternatives to traditional HPC provisioning tools and highlight their diversity in handling bare-metal, virtual image instances, and containers, as well as the possibilities for combining this with diskless and thin OS hosts.

The current state of the HPC Data Centre:  Highlight the problem of static partitioning (silos), and the various workload needed to either support or complement the bigger business/scientific application and discuss valid reasons of partitioning.

Resource Abstraction:  What if we abstract the data centre resources, and break down the silos? How should that be done?  What core components need to be addressed? Why? Present an example proposal of such tooling with the reasoning behind it.

Unit of change: Container technology is a useful enabling technology for such problems. It does not have the performance overhead issues that made HPC shy away from virtualisation-related solutions, and it will enable portability for the various HPC workload deployments. Not to mention the richness of its ecosystem, which can enhance the current status quo of scheduling, resource, and workload management to greater levels of efficiency and better utilisation of Data Centre resources.

The software-defined data Centre:  Everything so far can be either code or managed and monitored by code. How flexible is that? And what new opportunities it brings?  How can everything be broken down into components? How parts integrate and fit together? enabling a “Lego style” Compose-able infrastructure driven and managed by code, policies, and desired state models. How has code opened new possibilities to stakeholders?

Some Docker evaluation use cases:

Challenges ahead: The road ahead expectations? The unique differences and requirements?  Which underlying container technologies need to be in place and for what?  The right amount of namespace isolation vs. cgroups control, how about LXC, LXD, Singularity, Docker? What would we see coming next?

The importance of having the right mindset to evaluate, experiment new paradigms and technologies, eventually deploy and utilise them in production; introduce new workflows, enable better communication between the different teams (developers, users, security, operations, business stakeholders). The concept of indirection and abstraction to solve computer problems, in this case, the 2-level indirection scheduling for granular resource management. The container unit concept for the workload is not just for applications; it could also be for data.

to be continued ...



Saturday, 22 March 2014

What if Ansible Run Hangs?

I was running Ansible against 1000s of nodes without being fully aware of the status of some of the nodes before the run: some were heavily loaded and busy, some were down. Such highly loaded or OOM nodes, or even some play-book tasks that are prone to waiting and blocking, will all cause Ansible to hang. Below are some of the steps that I followed, or that were collected from the Ansible mailing list*, to help debug such a hang:

Is it the initial connection?

Use -vvvv to troubleshoot the connection.

What you call a hang could be normal behaviour, even if it is not what you intended:

from Ansible playbooks async :

By default tasks in play-books block, meaning the connections stay open until the task is done on each node. This may not always be desirable, or you may be running operations that take longer than the SSH timeout.
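
Because a blocking task can hold a run open indefinitely, one defensive option is to bound the whole run from the calling side. The sketch below is a minimal Python illustration, assuming the run is launched from a wrapper script; run_with_timeout is a hypothetical helper, not part of Ansible, and the sleeping child process merely stands in for a long-running command.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run a command and return its exit code, or None if it had to be killed."""
    try:
        completed = subprocess.run(
            command,
            stdin=subprocess.DEVNULL,  # never wait on input from the terminal
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child before raising
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a command that blocks; replace with the real invocation.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, timeout_seconds=1))
```

Closing stdin also covers the case where a command quietly waits for input, which is one of the causes of hangs listed further down.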

Is it the remote executed task ?

  •  Run ansible-playbook with ANSIBLE_KEEP_REMOTE_FILES=1
  • create a python tracefile

$ python -m trace --trace 
2>&1 | head 
  --- modulename: command, funcname: <module> 
command(21): import sys 
command(22): import datetime 
command(23): import traceback 
command(24): import re 
command(25): import shlex 
  --- modulename: shlex, funcname: <module> 
shlex.py(2): """A lexical analyzer class for simple shell-like syntaxes.""" 
shlex.py(10): import os.path 
shlex.py(11): import sys 

Possible causes of hangs :

  • stale shared file system in the remote targeted node
  • if it is a yum related task, and another yum process is running already in targeted node
  • Module dependency such as requirement to add the host in advance to known_hosts or forwarding SSH credentials.
  • some issues with sudo, where the ssh user and the sudo user are the same but sudo_user is not specified.
  • some command module tasks are expecting input from stdin
  • setup module could hang due to hardware or os related issue, updated firmware, drivers could help
  • network, or firewall related, or change of network/firewall/load balancing caused by Ansible run
  • could it be a lookup issue (e.g DNS,  or user look up)

* Thanks to Michael DeHaan and the Ansible developers for awesome code, and thanks to James Tanner for his help and pointers on the Ansible users mailing list and IRC.

* This was written at the time of Ansible 1.4.2 in a RHEL/CentOS based environment; SSH connections could be improved even further by enabling ControlPersist or pipelining mode.

Thursday, 30 January 2014

DFIR Dec. 2013 Memory Forensics Challenge notes :

This is my first memory forensics exercise outside of the SANS 508 SIFT workstation investigation of Timothy Dungan's workstation (the "Stark Research Labs intrusion case by Hydra"). So even though I believe that I have answered the questions that were asked in the SANS DFIR blog, there is still a lot to learn and more skills to sharpen. Using lots of curiosity, Volatility, Redline, and the SIFT workstation, it is easy to run a memory investigation, especially if one is equipped with the SANS 508 course material and the Volatility IRC channel. Below are my scattered notes from three separate sessions. The overall time it took was 7-8 hours; it could have been done in one session with more focus and less distraction from the kids.

[ note to oneself : collect reports and screenshots more next time, and write report as you go along ]

Using Mandiant Redline:

I used Redline whitelisting to filter out a large amount of data that is not likely to be interesting: data that corresponds to unaltered, known-good software components. However, I was not successful at finding red flags ("rogue processes") straight away. There were three suspicious processes I was targeting, but I could not find the obvious anomaly that malware introduces to a system. So I started looking for other low-hanging fruit/signals that could give me a good pivot point, also using the low-frequency-of-occurrence technique and concentrating on the DFIR challenge questions to keep me focused.

·        Suspicious untrusted  handle pork_bun associated with the explorer.exe process (pid:1672)

Possible global rootkit cloaking activity via System Service Descriptor Table hooks:
The hooking module name looks suspicious: irykmmww.sys hooks NtEnumerateKey, NtEnumerateValueKey, and NtQueryDirectoryFile in ntoskrnl.exe, which are used to hide things:
  • NtEnumerateValueKey: Allows an application to identify and interact with registry values. Malware uses this to insert itself into any registry value request and filter out the values it wants to hide.
  • NtEnumerateKey: Allows an application to identify and interact with registry keys. Malware uses this to insert itself into any registry key request and filter out any registry keys it may want to hide.
  • NtQueryDirectoryFile: Gives the application the ability to perform a directory listing. By hooking this function, malware can hide directories or files from normal file managers as well as anti-malware tools.
  • NtDeviceIoControlFile: The API Windows uses to do network-related stuff; it has been widely mentioned in malware behavior analysis papers. Malware can use it to replay network traffic, how cool is that?!

Not to mention my company campus ISP blocks me from doing some more research ;-)

Not that it cannot be overridden with any vpn connection.

I tried to acquire the driver for further analysis; however, Redline couldn't dump it. You will see later that I was able to dump it with Volatility, which shows why you need to know more than one tool: most likely no one tool will fit all situations, and tools will fail you most when you need them.

Using Volatility to cross check and dig deeper:

Treating it as a real case, I preserved the initial image by making it read-only (immutable) and recording its hash value:

$ sudo chattr +i dfir-challenge/APT.img
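
The hashing step itself is not shown above. A minimal Python sketch of it could look like the following; the choice of SHA-256 and the helper name sha256_of_file are assumptions for illustration, not necessarily what was used in this investigation.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so a large memory image never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recording the digest before analysis, and recomputing it afterwards, gives a simple way to show that the evidence file was not altered along the way.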

To start processing, we need to know more about the image file profile, so we run imageinfo:

sansforensics@SIFT-Workstation:/cases/dfir-challenge$ vol32.py -f ./APT.img imageinfo
Volatile Systems Volatility Framework 2.1_alpha
Determining profile based on KDBG search...

Suggested Profile(s) : WinXPSP3x86, WinXPSP2x86
AS Layer1 : JKIA32PagedMemoryPae (Kernel AS)
AS Layer2 : FileAddressSpace (/cases/dfir-challenge/APT.img)
PAE type : PAE
DTB : 0x319000
KDBG : 0x80545b60L
KPCR : 0xffdff000L
Image date and time : 2009-05-05 19:28:57
Image local date and time : 2009-05-05 19:28:57
Number of Processors : 1
Image Type : Service Pack 3


The normal process listing shows the processes that have not been hidden by unlinking them from the doubly linked list of process structures.

Cross-examining the processes seen normally via the doubly linked list against the ones scraped from memory structures:
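
The cross-view idea can be sketched in a few lines of Python: whatever the memory scan finds but the linked-list walk does not is a candidate for a hidden process. The function name hidden_pids and the sample entries below are hypothetical and are not taken from this image.

```python
def hidden_pids(list_walk, memory_scan):
    """Return processes found by the memory scan but missing from the linked-list walk."""
    return {pid: name for pid, name in memory_scan.items() if pid not in list_walk}


# Hypothetical sample data, keyed by PID.
list_walk = {4: "System", 600: "winlogon.exe"}
memory_scan = {4: "System", 600: "winlogon.exe", 999: "unlinked.exe"}
print(hidden_pids(list_walk, memory_scan))
```

A difference is a lead rather than proof: a memory scan can also pick up processes that have already exited, so each hit still needs to be checked by hand.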

Scanning for network artifacts: since this is assumed to be an APT ("advanced persistent threat") case, one good lead is that, if the box was infected at some time, the malware will have to connect to covert command-and-control (C2) channels; or, if this was not the originally infected machine, data exfiltration activity should leave some breadcrumbs for us to trace.

Interestingly enough, in the connection scan above we see port 443, which is usually a firewall-friendly port, appearing to be either inactive or stealthy. However, the connections are from the same process to the same IP; the process is explorer.exe (1672). Trying to find where that IP is using whois, as seen below, we find out that the IP belongs to our friends at a Chinese state-owned ISP in Beijing.

Usually malware will set a mutant so that it does not cause issues to the system or itself by trying to install or configure itself again; that is done by checking whether a certain mutant exists. One interesting mutant I have seen in both Redline and Volatility was the pork_bun mutant.

Now I am quite confident that explorer.exe (pid: 1672) is the rogue process. Finding which file holds the malware, in case it was injected or hollowed, is quite a tedious task; however, it helps to double-check the least frequent, strangely named, unsigned handles, starting with the executable DLLs, as well as the SSDT hooks.

Both the DLL search and the SSDT hooks via Volatility arrived at the same conclusion as Redline, and this time I was able to dump the driver irykmmww.sys and confirm it is rogue using VirusTotal.

Most of the VirusTotal findings point to a generic trojan/backdoor rootkit installed using an exploit, not spread like a virus but via social engineering, probably phishing, as is the norm with APT; however, I am not able to tell with the research done so far.

VirusTotal also confirmed that a variant of the notorious Poison Ivy trojan was used, which was famously used to attack RSA's SecurID infrastructure in 2011; it is still going strong after eight years and is being used in targeted attacks.

Other findings that the malware logs its findings or activity to :


So, doing a filescan and saving it to a file for further analysis, I can see one or two other suspicious explorer files, for example:

'\\WINDOWS\\system32\\exploder.exe' does not make sense to be running under system32?!

And with that I have the 5 DFIR questions almost answered: the process was explorer.exe (1672), irykmmww.sys is what is hiding the malware artifacts from the system, and persistence was most likely achieved with DLL injection via irykmmww.dll.

There is more for me to follow up on and research, and more notes that I should have collected in real time and posted. Hopefully the next investigation will prove more conclusive and complete, and I will then be more familiar with Windows internals.

Final note: SANS highly recommends that "Intrusion/Incident reports" do not state personal opinions and present facts only. However, for my learning process I have put in some of my opinions, and hopefully I will validate them soon if SANS DFIR publish their solution.

Saturday, 7 December 2013

Dynamic Test/Evaluation Environment 

Vagrant, Ganeti, and OpenStack are great tools for a dynamic, data-driven test environment. Couple them with a configuration management tool (CFEngine3, Chef, Puppet, Ansible, or SaltStack) and you will start having more time on your hands and appreciating life around you. The possibilities are endless. If you are looking for a highly available backend infrastructure, Ganeti is your solution; it is already used by "Open Source Labs", Google, Mozilla, and the Greek Research and Technology Network, among others, to manage clusters of virtual environments with resilience in mind. If you are looking for flexibility and for providing your users with a private cloud solution, OpenStack will do. For testing new administration tools, policies, cookbooks, manifests, playbooks and blueprints, Vagrant is the way to go. Add the combination of these three together and you have dynamic solutions that scale from a few virtual nodes on your own laptop or workstation to Amazon EC2 or your own company's private cluster environment.