Archive for the ‘Linux Command Line’ Category

Linux Command Line - Copy a File from Remote to Local

Tuesday, August 3rd, 2010

Copy a file from a remote location to a local directory (for example to burn a backup on DVD):

scp backups@server1:backup20100105 local/path

PHP function to get the most recurrent words in a file - it uses Linux command line

Tuesday, August 3rd, 2010

/**
 * Extracts the most recurrent one-word and two-word terms in a file.
 * Filters out common stop words; extra ones can be passed in.
 *
 * @param string $filepath
 * @param int $minWordLength - the minimal word length for the terms to extract
 * @param int $numberOfTerms - the number of terms to retrieve
 * @param array $extraStopWords - additional stop words to filter out
 * @return array of string - the most recurrent terms
 */
function getMostRecurrentTermsInFile($filepath, $minWordLength, $numberOfTerms, array $extraStopWords = array())
{
    $stopwords = array('a', 'about', 'above', 'across', 'after', 'afterwards', 'again', 'against', 'all',
        'almost', 'alone', 'along', 'already', 'also', 'although', 'always', 'am', 'among', 'amongst', 'amount',
        'an', 'and', 'another', 'any', 'anyhow', 'anyone', 'anything', 'anyway', 'anywhere', 'are', 'around', 'as', 'at',
        'back', 'be', 'became', 'because', 'become', 'becomes', 'becoming', 'been', 'before', 'beforehand', 'behind', 'being',
        'below', 'beside', 'besides', 'between', 'beyond', 'bill', 'both', 'bottom', 'but', 'by', 'call', 'can', 'cannot',
        'cant', 'co', 'con', 'could', 'couldnt', 'cry', 'de', 'describe', 'detail', 'do', 'done', 'down', 'due', 'during',
        'each', 'eg', 'eight', 'either', 'eleven', 'else', 'elsewhere', 'empty', 'enough', 'etc', 'even', 'ever', 'every',
        'everyone', 'everything', 'everywhere', 'except', 'few', 'fifteen', 'fifty', 'fill', 'find', 'fire', 'first', 'five',
        'for', 'former', 'formerly', 'forty', 'found', 'four', 'from', 'front', 'full', 'further', 'get', 'give', 'go', 'had',
        'has', 'hasnt', 'have', 'he', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'hereupon', 'hers', 'herself',
        'him', 'himself', 'his', 'how', 'however', 'hundred', 'ie', 'if', 'in', 'inc', 'indeed', 'interest', 'into', 'is', 'it',
        'its', 'itself', 'keep', 'last', 'latter', 'latterly', 'least', 'less', 'ltd', 'made', 'many', 'may', 'me', 'meanwhile',
        'might', 'mill', 'mine', 'more', 'moreover', 'most', 'mostly', 'move', 'much', 'must', 'my', 'myself', 'name', 'namely',
        'neither', 'never', 'nevertheless', 'next', 'nine', 'no', 'nobody', 'none', 'noone', 'nor', 'not', 'nothing', 'now',
        'nowhere', 'of', 'off', 'often', 'on', 'once', 'one', 'only', 'onto', 'or', 'other', 'others', 'otherwise', 'our', 'ours',
        'ourselves', 'out', 'over', 'own', 'part', 'per', 'perhaps', 'please', 'put', 'rather', 're', 'same', 'see', 'seem', 'seemed',
        'seeming', 'seems', 'serious', 'several', 'she', 'should', 'show', 'side', 'since', 'sincere', 'six', 'sixty', 'so', 'some',
        'somehow', 'someone', 'something', 'sometime', 'sometimes', 'somewhere', 'still', 'such', 'system', 'take', 'ten', 'than',
        'that', 'the', 'their', 'them', 'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'therefore', 'therein',
        'thereupon', 'these', 'they', 'thick', 'thin', 'third', 'this', 'those', 'though', 'three', 'through', 'throughout', 'thru',
        'thus', 'to', 'together', 'too', 'top', 'toward', 'towards', 'twelve', 'twenty', 'two', 'un', 'under', 'until', 'up', 'upon',
        'us', 'very', 'via', 'was', 'we', 'well', 'were', 'what', 'whatever', 'when', 'whence', 'whenever', 'where', 'whereafter',
        'whereas', 'whereby', 'wherein', 'whereupon', 'wherever', 'whether', 'which', 'while', 'whither', 'who', 'whoever', 'whole',
        'whom', 'whose', 'why', 'will', 'with', 'within', 'without', 'would', 'yet', 'you', 'your', 'yours', 'yourself', 'yourselves'
    );

    $stopwords = array_merge($stopwords, $extraStopWords);

    // placing each word on a separate line (the file path is escaped for the shell)
    $command = 'sed -e "s/[^a-zA-Z]/\n/g" ' . escapeshellarg($filepath);
    $command .= '|';
    // stripping out the empty lines
    $command .= 'grep -v "^$"';
    $command .= '|';
    // adding lines combining all adjacent two words
    // N.B.: the single quotes inside the command are escaped
    $command .= 'awk \'(PREV!="") {printf "%s\n%s %s\n", PREV, PREV, $1} {PREV=$1}\'';

    $command .= '|';
    // stripping out common stopwords (the actual command is something like this: grep -Ev '(\bis\b|\bsuch\b)')
    $command .= 'grep -Evi "(\b';
    $command .= implode('\b|\b', $stopwords);
    $command .= '\b)"';

    $command .= '|';
    // removing all the words shorter than $minWordLength characters
    $limit = (int) $minWordLength - 1;
    $command .= "grep -Ev '^[a-zA-Z]{1,$limit}$'";
    $command .= '|';
    // counting the occurrences of each term and sorting by descending count
    $command .= 'sort | uniq -c | sort -nr';
    $command .= '|';
    // stripping out the numbers we use for sorting
    $command .= "sed -e 's/^[^0-9]*[0-9]* //g'";

    $command .= '|';
    $command .= ' head -n ' . (int) $numberOfTerms;

    $commandOutput = shell_exec($command);

    $commandOutputLines = explode("\n", (string) $commandOutput);

    // sanitising the return
    $ret = array();
    foreach ($commandOutputLines as $commandOutputLine)
    {
        $commandOutputLine = trim($commandOutputLine);
        if (strlen($commandOutputLine) > 0)
        {
            $ret[] = $commandOutputLine;
        }
    }

    return $ret;
}
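
The core of the pipeline can also be tried directly in the shell, without PHP. The following is only a minimal sketch: it handles one-word terms, does no stop-word filtering, and the function name top_terms and its three arguments are made up for illustration.

```shell
# top_terms FILE MIN_LEN COUNT
# Prints the COUNT most frequent words of FILE that have at least MIN_LEN letters.
top_terms() {
    local file=$1 min_len=$2 count=$3
    # one word per line: every run of non-letters becomes a single newline
    tr -cs 'A-Za-z' '\n' < "$file" |
        # fold to lower case so "The" and "the" are counted together
        tr 'A-Z' 'a-z' |
        # drop empty lines and words that are too short
        awk -v min="$min_len" 'length($0) >= min' |
        # count occurrences, highest count first, ties in alphabetical order
        sort | uniq -c | sort -k1,1nr -k2 |
        # keep only the word, not the count
        awk '{print $2}' |
        head -n "$count"
}
```

For example, top_terms article.txt 4 10 would print the ten most frequent words of at least four letters in a (hypothetical) article.txt.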

Linux Bash: Append and Prepend Strings to Files

Saturday, October 17th, 2009

I want to append the string “BYE” to the file test.txt

echo "BYE" >> test.txt

Now, I want to prepend the string “HI” to the file test.txt

echo "HI" | cat - test.txt > /tmp/out && mv /tmp/out test.txt
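
A fixed name such as /tmp/out can collide if two scripts run at the same time. A possible variation, shown here only as a sketch, uses mktemp to get a unique temporary file next to the target; the function name prepend_line is made up for illustration.

```shell
# prepend_line TEXT FILE
# Writes TEXT as the new first line of FILE, going through a unique temp file.
prepend_line() {
    local line=$1 file=$2 tmp
    # temp file in the same directory as FILE, so that mv is a simple rename
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if { printf '%s\n' "$line"; cat -- "$file"; } > "$tmp"; then
        # note: the result keeps the permissions of the temp file (owner-only)
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

Usage: prepend_line "HI" test.txt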

PHP - Execute Linux Commands

Wednesday, July 1st, 2009

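There are 4 functions:

  • exec - returns the last line of the command output (the full output can be collected through an optional array argument)

  • system - prints the command output and returns its last line

  • passthru - prints the raw command output and returns nothing useful

  • shell_exec - returns the whole command output as a string

They differ mainly in what they return. The last one seems to be the best choice if you need the whole command output in your PHP script.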

SSH Without Password

Wednesday, February 4th, 2009

Use Public/Private Keys for Authentication

Using encrypted keys for authentication offers two main benefits. Firstly, it is convenient, as you no longer need to enter a password (unless you protect your keys with a passphrase). Secondly, once public/private key pair authentication has been set up on the server, you can disable password authentication completely, meaning that without an authorized key you can’t gain access, so no more password cracking attempts.

It’s a relatively simple process to create a public/private key pair and install them for use on your ssh server.

First, create a public/private key pair on the client that you will use to connect to the server (you will need to do this from each client machine from which you connect):

$ ssh-keygen -t rsa
This will create two files in your (hidden) ~/.ssh directory called id_rsa and id_rsa.pub. id_rsa is your private key and id_rsa.pub is your public key.

If you don’t want to be asked for a password each time you connect, just press Enter when asked for a passphrase while creating the key pair. It is up to you to decide whether or not to protect your key with a passphrase when you create it. If you don’t, anyone gaining access to your local machine will automatically have ssh access to the remote server. Also, root on the local machine has access to your keys, although one assumes that if you can’t trust root (or root is compromised) then you’re in real trouble. Encrypting the key adds security, but the password prompt of the ssh server is then simply replaced by a passphrase prompt for the key.

Now set permissions on your private key:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/id_rsa
Copy the public key (id_rsa.pub) to the server and append it to the authorized_keys list:

$ cat id_rsa.pub >> ~/.ssh/authorized_keys
Note: once you’ve imported the public key, you can delete the copied id_rsa.pub file from the server.

Finally, set file permissions on the server:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys
The above permissions are required if StrictModes is set to yes in /etc/ssh/sshd_config (the default).
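
The server-side steps above can be wrapped in a small function. This is only a sketch under assumptions: the name install_pubkey and its arguments are made up, and it expects the public key file to have been copied to the server already. It appends the key only if it is not already listed, so it is safe to run twice.

```shell
# install_pubkey PUBKEY_FILE [SSH_DIR]
# Adds the key in PUBKEY_FILE to SSH_DIR/authorized_keys (default: ~/.ssh)
# and applies the permissions expected when StrictModes is on.
install_pubkey() {
    local pubkey_file=$1 ssh_dir=${2:-$HOME/.ssh}
    local auth_file=$ssh_dir/authorized_keys
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$auth_file" && chmod 600 "$auth_file" || return 1
    # append the key only if this exact line is not already present
    grep -qxF -f "$pubkey_file" "$auth_file" || cat "$pubkey_file" >> "$auth_file"
}
```

Usage on the server: install_pubkey id_rsa.pub. Where it is installed, the ssh-copy-id tool shipped with OpenSSH does a similar job from the client side.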

Now when you log in to the server you won’t be prompted for a password (unless you entered a passphrase when you created your key pair). By default, ssh will first try to authenticate using keys. If no keys are found or authentication fails, then ssh will fall back to conventional password authentication.

Once you’ve checked you can successfully log in to the server using your public/private key pair, you can disable password authentication completely by adding the following setting to your /etc/ssh/sshd_config file and reloading the ssh daemon:

# Disable password authentication forcing use of keys
PasswordAuthentication no

List Which Ports Are Listening - Open Ports - Open Connections

Saturday, January 31st, 2009

This is the most reliable method because it actually scans the ports:

nmap -sT -O localhost

Other methods rely on internal checks and also tell you which program is using each port:

netstat -anp
lsof -i
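
To get a feel for what such an internal check looks at: on Linux the kernel exposes its IPv4 TCP socket table as text in /proc/net/tcp, with addresses and ports in hexadecimal and the state 0A meaning LISTEN. The sketch below only decodes that table (IPv4 only, no program names); the function name list_listening_ports is made up for illustration.

```shell
# list_listening_ports
# Reads /proc/net/tcp-style text on stdin and prints the decimal port of
# every socket in LISTEN state (st column = 0A), sorted and without repeats.
list_listening_ports() {
    local sl local_addr rem_addr state rest port_hex
    while read -r sl local_addr rem_addr state rest; do
        # skip the header line and every socket that is not listening
        [ "$state" = "0A" ] || continue
        # local_addr looks like 0100007F:0050, the port is the part after ":"
        port_hex=${local_addr##*:}
        printf '%d\n' "0x$port_hex"
    done | sort -n -u
}
```

Usage: list_listening_ports < /proc/net/tcp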

Building Scalable Web Sites - Scalability

Tuesday, January 20th, 2009

A scalable system has 3 simple characteristics:

  • The system can accommodate increased usage
  • The system can accommodate an increased dataset
  • The system is maintainable

It’s not about speed or complexity; those are other topics.
For example, this is a very scalable program:
sleep(1);
echo "CIAO";

Scaling the Hardware Platform

  • Vertical Scaling
    • You have just one machine and you upgrade it or replace it with a more powerful one (more RAM and CPU)
    • Problem: the cost per processor doesn’t grow linearly (high-end machines cost disproportionately more)
    • Problem: at a certain point, even the best machine in the world might not be enough for our application
  • Horizontal Scaling
    • You add more machines of the same type, quite cheap
    • The cost per processor is linear
    • Problem: more maintenance required
    • Problem: underutilization, e.g. if we scale for more HD space, we will not use all the CPU power available.

Redundancy

  • Cold spare - needs setup and configuration (software and/or physical) before it can start working
  • Warm spare - only needs to be switched on because all the configuration is already done. A good example is a MySQL slave server that will be used as master when the master dies.
  • Hot spare - takes over automatically because the system can detect when a component fails. Problem: flapping

Scaling PHP

PHP is stateless and therefore scalable: every request is served by just one process that doesn’t need to talk to other processes. So the requests can be served randomly by many servers, and the requests from the same user can be spread across many servers.
That is true as long as you don’t use sessions that write data on a specific server, because then the following requests need to be served by the same server. You can work around this by:

  • storing session data in a centralized database
  • using cookies (encrypted if necessary)
  • using msession on a centralized server

Otherwise, as we said, you can make sure the requests from the same user are served by the same server (sticky sessions, see below).

Load balancing

  • The cheap method for load balancing is creating more than one “A” record in the DNS zone, but

    • it’s not very quick to add/remove a server because of the DNS cache. Somebody could still hit the faulty server
    • we can’t balance according to the load
  • Then we need an appliance to load balance: it could be either software or hardware.
    If our site supports sessions, our load balancer should support sticky sessions (or we can use any of the alternative methods mentioned above)
  • LB with HW
    • expensive
    • could be hard to set up (because of the old-fashioned interfaces)
    • good if you plan to leave it there for ages
  • LB with SW
    • cheaper
    • some solutions: Perlbal, Pound, LVS
  • LB Layer 4
    • very commonly used
    • at TCP layer: it can support sticky sessions
    • round robin algorithm
  • LB Layer 7
    • I can use HTTP headers to load balance, in particular the request URI
    • Then I could load balance using a hash of the URL so the same files will be served by the same server (or pool of servers), which optimizes the use of the cache
  • Load Balancing for MySQL
    It would be difficult to use a regular load balancer because those are built around HTTP, and MySQL does not use an HTTP-based protocol.
    What I can do is use a random shuffle in the method that connects to the databases. If a server is more powerful than the others, I can add it multiple times to my list of available servers to shuffle.
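
Two of the ideas from the list above, sketched in the shell for illustration only. The function names and the host names in the usage lines are made up; a real setup would do this inside the load balancer or the database connection code. The first function picks a server at random from a list where a stronger server can appear more than once; the second maps a URL to always the same backend by hashing it (here with cksum, chosen just because it is available everywhere).

```shell
# pick_db_server HOST...
# Prints one host chosen at random; repeat a host to give it more weight.
pick_db_server() {
    shuf -n 1 -e "$@"
}

# pick_backend_for_uri URI BACKEND...
# Prints the backend selected by a checksum of the URI, so the same URI
# always lands on the same backend as long as the list does not change.
pick_backend_for_uri() {
    local uri=$1 hash
    shift
    hash=$(printf '%s' "$uri" | cksum | cut -d ' ' -f 1)
    # drop (hash mod number-of-backends) entries, the next one is the choice
    shift $(( hash % $# ))
    printf '%s\n' "$1"
}
```

Usage: pick_db_server db1 db1 db2 gives db1 about two times out of three, and pick_backend_for_uri /images/42.jpg web1 web2 web3 returns the same backend every time it is called with that URI.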

Scaling MySQL

There are mainly two storage engines:

  • MyISAM
    • supports the full-text index type
    • table-level locking
    • therefore not the best for concurrency
  • InnoDB
    • supports transactions & triggers
    • takes as much as three times the HD space compared to MyISAM
    • row-level locking
    • therefore better for concurrency

MySQL Replication

Good when there are many more reads than writes (the typical scenario in web apps).
The data is replicated between multiple machines.
See notes

Database Partitioning

See notes

Scaling Storage

  • As our database grows, we need more space.
  • It’s easy to get big hard disks; what we really need is some sort of failover at storage level to avoid needing expensive clones. So we need to implement RAID.
  • For scaling, we can use federation
  • For sharing storage among different machines, the easiest way on Linux is NFS. NFS is designed to let you access a remote filesystem as if it were local. This is very powerful and flexible but is overkill for the file warehousing that is usually associated with web applications (e.g. images related to a product on an e-commerce site). We are talking about plain file storage, so we only need simple operations such as writing, reading and deleting files. We don’t need to append, change ownership, touch…
    So NFS offers more than we need, which means overhead. NFS also keeps a socket open between every client and server, adding network overhead even when no work is taking place.
    For simply putting and deleting files we can use FTP or SCP. For reading, we can use HTTP on our storage server. This way, we can use standard load balancers as well. For this purpose, we can even use HTTP servers specialized in delivering images or any lightweight server (lighttpd).
    We can also use HTTP for our writes, using the PUT and DELETE methods. So we can do all our storage over HTTP.
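
Storage over HTTP could look like this from the command line. This is a sketch under assumptions: the storage server must be configured to accept PUT and DELETE (for example through WebDAV), the URL storage.example.com is a placeholder, and the three function names are made up for illustration.

```shell
# Both settings can be overridden from the environment.
CURL=${CURL:-curl}
STORAGE_URL=${STORAGE_URL:-http://storage.example.com/files}

# store_put LOCAL_FILE REMOTE_PATH: upload a file with an HTTP PUT
store_put() {
    "$CURL" --fail --silent --show-error --upload-file "$1" "$STORAGE_URL/$2"
}

# store_get REMOTE_PATH LOCAL_FILE: read a file with a plain HTTP GET
store_get() {
    "$CURL" --fail --silent --show-error --output "$2" "$STORAGE_URL/$1"
}

# store_delete REMOTE_PATH: remove a file with an HTTP DELETE
store_delete() {
    "$CURL" --fail --silent --show-error --request DELETE "$STORAGE_URL/$1"
}
```

Usage: store_put shoe.jpg products/shoe.jpg uploads the image, store_delete products/shoe.jpg removes it again.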

Scaling Logs

When you have a cluster of web servers, the problem is that you have many log files instead of just one. Then you should use one of these methods:

  • Google Analytics or a similar service
  • beacons - you can log just regular web pages - you can use a server just for this purpose
  • software like Spread
  • centralise the database for logs
  • use the load balancer (that is where all the traffic goes through)

Linux: How To Count The Number Of Files Under a Directory

Thursday, January 15th, 2009

find . -type f | wc -l
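
One catch: wc -l counts lines, so a file whose name contains a newline is counted twice. A possible variant, given as a sketch (the function name count_files is made up), asks find to end each name with a NUL byte and counts those bytes instead.

```shell
# count_files [DIR]
# Counts regular files under DIR (default: current directory), including
# files with unusual names, by counting the NUL terminators find prints.
count_files() {
    find "${1:-.}" -type f -print0 | tr -dc '\0' | wc -c
}
```

Usage: count_files /var/www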

Linux - Cut Lines

Saturday, January 3rd, 2009

Very useful when you pipe into it, for example to easily read a file containing very long queries:
cat queries.sql | cut -b -10

MySQL - Importing Many SQL Dump Files in a Database

Tuesday, December 16th, 2008

ls *.sql | awk '{printf("mysql -u [username] -p[password] [databasename] < %s\n", $1) | "/bin/sh" }'
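
The same job can be written as a plain loop, which also copes with file names containing spaces and stops at the first failing dump. This is a sketch under assumptions: the function name import_dumps is made up, and the credentials are expected to come from an option file such as ~/.my.cnf rather than from the command line.

```shell
# The client command can be overridden from the environment.
MYSQL=${MYSQL:-mysql}

# import_dumps DATABASE [DIR]
# Feeds every *.sql file in DIR (default: current directory) to the client,
# in alphabetical order, and stops at the first error.
import_dumps() {
    local database=$1 dir=${2:-.} dump
    for dump in "$dir"/*.sql; do
        # skip the literal pattern when the directory has no .sql files
        [ -e "$dump" ] || continue
        echo "Importing $dump" >&2
        "$MYSQL" "$database" < "$dump" || return 1
    done
}
```

Usage: import_dumps mydatabase /path/to/dumps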