Docker Cheat Sheet


List all containers (only IDs)

docker ps -aq

Stop all running containers

docker stop $(docker ps -aq)

Remove all containers

docker rm $(docker ps -aq)

Remove all images

docker rmi $(docker images -q)

Python Issue Fix


Fix a Python permission issue (macOS system Python):

sudo chown -R $USER /Library/Python/2.7

Install a specific version:

pip install sox==1.9

Upgrade while ignoring an already-installed dependency:

pip install -U docker-compose --ignore-installed six

Reinstall while ignoring the existing installation:

pip install --ignore-installed six

 

Distributed Tracing Course


Why do we need this?

  • makes debugging easier
  • helps diagnose latency issues
  • shows what’s going on in the system

History

  • Metrics: aggregate events per service to understand trends and drive alerts. They are host-level only; you can’t see relations between services.
  • Logs: flexible, log whatever we want, correlate by timestamp by eye. Problem: brittle and not reliable.
  • Manual tracing: get logs, get metrics, grab developers to eyeball the logs and identify the problem, check latency, etc.
  • Implement Distributed Tracing

Distributed Tracing

What is it

  • capture events/spans, each containing a begin time, an end time, an event name, and metadata
  • build the collection of spans into a graph. This is a trace.
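The two ideas above can be sketched in a few lines (the field names and helper here are illustrative, not from any specific tracing library): a span records begin/end/name/metadata, and a trace is just the spans linked by parent IDs into a graph.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    # One captured event: begin, end, name, and metadata.
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None
    begin: float = 0.0
    end: float = 0.0
    metadata: dict = field(default_factory=dict)

def build_trace(spans):
    """Group spans into a parent -> children graph: this is the trace."""
    children = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)
    return children

spans = [
    Span("api-request", "t1", "s1", None, begin=0.0, end=0.30),
    Span("db-query", "t1", "s2", "s1", begin=0.05, end=0.25),
]
trace = build_trace(spans)
print(trace[None][0].name)   # root span: api-request
print(trace["s1"][0].name)   # child of s1: db-query
```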

Open Tracing

  • Vendor-neutral API for tracing instrumentation
  • Designed for application/OSS developers first and foremost
  • Hosted by the Cloud Native Computing Foundation (part of the Linux Foundation)
  • Old name: “Distributed Context Propagation” => renamed OpenTracing for better marketing.

Road to Distributed Tracing

Implementation Steps

1. Build Instrumentation

  • It’s hard and tedious
  • Must cover every language, thread pool, and protocol (HTTP, messaging), plus combinations of them

2. Span report and Aggregation

  • Write spans to files
  • Poll the files and push the spans to Kafka
  • This acts like a collector
  • Saving via a direct API call works if there’s only a single service
  • That breaks down when there are many services

3. Deploy Instrumentation

  • It’s work for the teams: convince them to do it and that it’s useful for the company.
  • Sample at < 1%
  • If something breaks, they blame the framework developer.
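The “sampling < 1%” point is typically a head-based sampler: the decision is made once per trace, deterministically from the trace ID, so every service agrees on it. A minimal sketch (the hashing scheme is illustrative, not any vendor’s implementation):

```python
import hashlib

def should_sample(trace_id: str, rate: float = 0.01) -> bool:
    """Deterministic head-based sampling: same trace ID -> same decision everywhere."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Map the first 8 bytes of the hash to [0, 1) and compare against the rate.
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < rate

# Over many traces, roughly 1% are kept.
sampled = sum(should_sample(f"trace-{i}") for i in range(100_000))
print(sampled)
```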

4. Trace processing and Storage

  • The collector takes spans
  • A Spark job collects the spans, constructs traces, and sends the result to Elasticsearch
  • Elasticsearch stores the data.
  • Zipkin Spark Streaming
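The core of the trace-construction step above is a group-by on trace ID. A sketch of just that aggregation logic, with the Kafka/Spark/Elasticsearch wiring omitted (span format is illustrative):

```python
from collections import defaultdict

def assemble_traces(spans):
    """Bucket raw spans by trace ID, the core of the aggregation job."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span)
    # Sort each trace's spans by start time before indexing downstream.
    for trace in traces.values():
        trace.sort(key=lambda s: s["start"])
    return dict(traces)

spans = [
    {"trace_id": "t1", "name": "db", "start": 5},
    {"trace_id": "t2", "name": "api", "start": 0},
    {"trace_id": "t1", "name": "api", "start": 0},
]
traces = assemble_traces(spans)
print(len(traces))                        # 2 traces
print([s["name"] for s in traces["t1"]])  # ['api', 'db']
```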

5. Trace Visualization

  • Elastic Search Store
  • Zipkin UI to search
  • The Wall (Company Metrics) UI

Understand, debug, and tune a distributed system

  • Identify service interactions (duration, which services talk to each other, call depth)
  • Identify duplicate computation; removing it improves latency
  • Debug the distributed system: identify which cluster made the call. Example: if a query is slow, show the query and the server address
  • Custom application spans: identify which team owns a call.
  • Identify clock skew: when clocks differ between machines
  • Identify serial execution: sometimes we want requests to run in parallel but they end up running serially
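The serial-execution case can be checked mechanically from span timestamps: sibling spans that were meant to run in parallel should overlap in time, so non-overlapping siblings are a red flag. A sketch (the span format is illustrative):

```python
def runs_serially(spans):
    """True if no two spans overlap in time, i.e. they ran one after another."""
    spans = sorted(spans, key=lambda s: s["begin"])
    for prev, cur in zip(spans, spans[1:]):
        if cur["begin"] < prev["end"]:
            return False  # found an overlap: at least partly parallel
    return True

serial = [{"begin": 0, "end": 10}, {"begin": 10, "end": 20}]
parallel = [{"begin": 0, "end": 10}, {"begin": 2, "end": 8}]
print(runs_serially(serial))    # True
print(runs_serially(parallel))  # False
```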

Baggage:

  • add user data fields to the metadata and propagate them to other contexts.
  • if we put too much in, it hurts performance (network and CPU)
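Baggage can be pictured as a key-value dict copied from a parent context into every child context, so a value set once at the edge is visible deep in the call graph. A sketch with illustrative types, not any specific tracing API:

```python
from typing import Optional

class SpanContext:
    """Carries trace identity plus baggage: user-set key/values that propagate."""
    def __init__(self, trace_id: str, baggage: Optional[dict] = None):
        self.trace_id = trace_id
        self.baggage = dict(baggage or {})

    def child(self) -> "SpanContext":
        # Each child context inherits a copy of the parent's baggage.
        return SpanContext(self.trace_id, self.baggage)

root = SpanContext("t1")
root.baggage["user_id"] = "42"        # set once at the edge
downstream = root.child().child()
print(downstream.baggage["user_id"])  # "42", visible two hops down
```

Since every key/value rides along on every downstream request, oversized baggage is exactly where the network and CPU cost comes from.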

Problem

  • expensive: complexity, sampling, storage
  • we don’t really need all the data
  • how do we capture the data that we actually want?
  • do we need to trace every code path?

Lessons Learned

  • we need to understand why distributed tracing is valuable for the company
  • trace the most valuable paths of the application
  • trace quality is more important than quantity.

Resources:

  1. Distributed Tracing at Pinterest
  2. Distributed tracing and Capacity Planning
  3. Spring Microservice in Action: Distributed Tracing with Spring Cloud Sleuth and Zipkin
  4. Distributed Tracing use cases
  5. Application tracing tutorial 1
  6. Application tracing tutorial 2
  7. Application tracing tutorial 3

Linux Shell Script


Find

-- Find corrupted zip files
find . -name "*.zip" -exec unzip -t "{}" \; > rrrr.txt
find . -name "*.zip" -exec echo "{}" \;
{} is replaced with the current file name

-- Find files and limit the result
find 10587727/ -name "*.pdf" | head -2

-- Find empty directories
find . -empty | sort

-- Find files smaller than a certain size:
find . -type f -size -200k -iname "*.pdf" -printf '%s %p\n' | sort -n | head -100

Text Searching

grep pdf-archive.zip zip_test.txt | sort | uniq