Category Archives: Solved

Handling Ceph near full OSDs

Running Ceph near full is a bad idea. The real fix is to add more OSDs. However, during testing it will inevitably happen. It can also happen when you have plenty of disk space but the weights are wrong. UPDATE: even better, calculate ahead of time how much space you really need to run Ceph safely. If you have to resort to handling near full OSDs, your assumptions about safe utilization are probably wrong.

Usually when OSDs are near full, you’ll notice that some are more full than others.   Here are the ways to fix it:

Decrease the weight of the OSD that’s too full.  That will cause data to be moved from it to OSDs that are less full.

ceph osd crush reweight osd.[x] [y]  

x is the OSD id and y is the new weight. Be careful making big changes; usually even a small incremental change is sufficient.

Temporarily decrease the weight of the OSD.  This is the same as above except that the change is not permanent.

ceph osd reweight [id] [weight]

id is the OSD number and weight is a value from 0 to 1.0 (1.0 means no change, 0.5 is a 50% reduction in weight)

For example:
ceph osd reweight 14 0.9

 

Let Ceph reweight automatically

ceph osd reweight-by-utilization [percentage]

This reduces the weight of OSDs that are heavily overused. By default it adjusts the weights downward on OSDs that are at 120% of the average utilization or more, but if you pass a percentage it will use that threshold instead.
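To make the threshold rule concrete, here is a minimal Python sketch of the selection step only. It is not Ceph's implementation; the function name and the utilization figures are made up for illustration, and it assumes you already have a percent-full figure per OSD.

```python
def overloaded_osds(utilization, threshold_pct=120):
    """Return {osd_id: percent_full} for OSDs above threshold_pct of the average.

    `utilization` maps an OSD id to its percent-full figure.
    Illustrative only: this mirrors the rule described above, not Ceph's code.
    """
    if not utilization:
        return {}
    average = sum(utilization.values()) / float(len(utilization))
    cutoff = average * threshold_pct / 100.0
    return {osd: used for osd, used in utilization.items() if used > cutoff}


# Hypothetical utilization figures (percent full), keyed by OSD id.
sample = {0: 50.0, 1: 60.0, 2: 78.0, 3: 92.0}

print(overloaded_osds(sample))       # average is 70, cutoff is 84: only osd 3
print(overloaded_osds(sample, 110))  # cutoff is 77: osd 2 and osd 3
```

A lower threshold selects more OSDs, which means more data movement, so it is probably worth starting with the default.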

Slow VMware performance with iSCSI tgt and Ceph [Solved]

After a lot of head scratching and googling I finally discovered why my Ceph performance was so slow compared to NFS when using iSCSI tgt on my gateway.  I was getting only 0.1 MB/s compared to the 90 MB/s that I was getting through NFS.  It turns out that ESXi had hardware acceleration (VAAI) turned on for its iSCSI initiator – apparently it’s something that isn’t compatible with tgt.  To turn it off I followed these steps:

Turning off VAAI

I didn’t even have to reboot or reload any configuration.  The effect was an immediate jump in performance back to normal.

Adventures with TeslaCrypt (AKA “VVV Virus”)

I received an urgent call from a client.  The client’s computer got hit by the VVV virus AKA TeslaCrypt. All of their files were renamed to .vvv and when trying to open them, a message said that the only way to get them back was to pay $500 USD before a specific date, or $1000 USD after.  The payment had to be in bitcoins.  First question I asked: “Do you have a backup?” Client said “No”.  Having read up about ransomware like CryptoLocker, I knew this was  bad news.  Normally once the files are encrypted, it’s game over.   Only way out is backup.  Nevertheless, I dove into researching a possible solution – just in case.

There were 4 possible methods that I tried:

  • Restore from windows shadow copy. This didn’t work in this version of TeslaCrypt.  The criminals learned to wipe the shadow copies before encrypting the files.
  • Undelete files (someone claimed that when the files are encrypted a copy is made and the original deleted). This didn’t work; I could not find a single deleted file that matched its encrypted counterpart.  All I found were files that were legitimately deleted by the user long before the infection.  This kind of makes sense, because surely the encryption process will overwrite the data rather than creating a copy.  But anything is worth a try when you are desperate.
  • Decrypt using key.dat (where the decryption key is stored). This didn’t work either because this was a newer version of TeslaCrypt.  In the old version the criminals were stupid enough to leave the decryption key right on the infected system.  In the new version, they hold the key on a server they control.
  • Decrypt using a tool called teslacrack.py. This eventually ended up working – but more on that later.

Running out of time, and faced with a slim chance of recovery, my client felt he had no choice other than to pay the ransom.  The criminals are running an organized business, and this gives the impression that they may actually deliver the decryption key after payment in order to preserve their reputation.  After all, if people have a bad experience after sending a payment, the word will spread and no one will ever send them any money no matter how desperate they are.  On the other hand, if the payment and decryption are very quick and easy, people will see it as an easy way out. Combine this with how easy it is for the criminals to give out the decryption key (just a single click of a button), and it is in the criminals’ best interest to deliver the decryption key promptly.  All signs pointed to a well organized setup.  The criminals even had a working web support and billing system where you can get answers to your questions on how to submit your Bitcoin payment.  The response time to a general question ended up being about 4 hours – which is not bad.  Actually better than support at many large enterprises.  Still, there are so many things that could go wrong.  Maybe the criminals are just plain evil and see this as a onetime money grab.  Maybe they could not care less about their reputation.  Maybe the criminals get shut down just before having the chance to send you the key.  Maybe the criminals screw up and lose the key, so even though they might have wanted to give it to you, they simply don’t have it.  Maybe the criminals are stupid and don’t use business sense for deciding how to behave.  You are doomed if you send payment, and you are doomed if you don’t.  I was glad that it wasn’t me in this situation.  I would have no idea what to do next.

My client on the other hand had to make that decision – and quickly.  He decided to pay.  Trouble was that the only accepted method of payment was Bitcoin (BTC).  Because he did not have $500 USD in BTC, he had to buy Bitcoins first.  Even though it’s becoming easier and easier to buy Bitcoin, it’s still quite a process.  You either have to go through a 2-5 day verification process or you have to fully trust the Bitcoin vendor.  Neither of those options was good because we were running out of time and we were not in the mood to blindly “trust people”.  In the end we were lucky because now there is a Bitcoin ATM at Café Blanca in downtown Calgary.  There you can buy and sell Bitcoins with cash instantly.  You insert cash and the bitcoins appear in your Bitcoin wallet 2 seconds later. My client ended up gradually putting in about $700 CAD in cash to purchase the 1.1 Bitcoins which the criminals demanded in order to cover the $500 USD fee.  With 1.1 bitcoins ready, we went back to the criminals’ web payment portal.  Just as we were reading the payment instructions for the 3rd time to be extra sure there was no mistake, we noticed that the criminals had raised the cost to 1.25 bitcoins.  We had to top up the bitcoin wallet before we could proceed.  We sent the payment and included the transaction ID which proved receipt by the criminals.  Then we waited.  10 minutes – nothing.  1 hour – nothing.  5 hours – nothing.  1 day – nothing.  That was it.  The criminals got the money and we would never see the key.  The files were still encrypted and useless.

During all the waiting, I resumed trying my hand at decryption.  To my amazement the teslacrack.py method actually worked.  Not only did it work, but once I figured out the right tools and the right steps, the actual computation time to crack the key was only 30 seconds.  This was amazing because normally these methods take weeks if you are extremely lucky.  It turns out that in addition to a mountain of luck on my side, the criminals made a mistake in the way they encrypted the files.  If the criminals hadn’t made that mistake, no amount of luck would have helped.  Cracking the key would take millions of years.  It is also interesting to note that the criminals encrypted different sets of files using 2 different keys, meaning that even if they let you decrypt one set of files, the other set of files would still be encrypted (presumably to force you to pay an additional fee).  Furthermore, while the key for one set of files was cracked in 30 seconds, the attempt on the second key ran for more than 5 days with no success.  Luckily, the first set contained the critical files, and the client was not too concerned about the second set, so we stopped there.  By the way, eventually the link that took us to the criminals’ billing/support site went blank.  We’re never going to see that key or the $700+ CAD.

 Lessons reinforced:

  • Backup, backup, backup, backup.  There is not going to be a second chance like this again.  A mistake like this is not going to show up again in their next version of ransomware.  In fact there are other variants of ransomware that are already impossible to crack.  Did I mention backup?
  • When dealing with criminals never expect a favorable outcome. If you’re sending any money for ransom, consider it lost the moment you hit send.  What’s more, be prepared for escalation.  Once the criminals know you are willing to send money they may come back at you asking for more.  They will either just ask for more even though you already sent exactly what they asked for, or you may find out that after part of your files are decrypted, the other part is still encrypted.

How to recover:

Step 1) Take backup and work from the backup

  • This is important because even though your files are encrypted and apparently useless, your day will get considerably worse if you lose them somehow. What if the ransomware detects that you are trying to decrypt and deletes everything right then and there?

Step 2) Install programs:

  • Python 2.7 (32 bit because it must match library below)
  • Pycrypto-2.6.win32-py2.7 library (I ended up using this because I had trouble compiling the library from source)
  • Msieve150_win32 (I also tried optimized versions for Intel processors and CUDA, but reading about possible bugs didn’t give me confidence, so try them, and if they work then fine, but don’t forget to have version 150 as your fail safe)

Step 3) Identify the AES key that you will try to crack

  • Place some vvv files in the same folder as teslacrack.py and run teslacrack.py
  • You will get something like this:

 

Cannot decrypt ./IMG_1111.JPG.vvv, unknown key

Software has encountered the following unknown AES keys, please crack them first using msieve: 

346FA15D6F7106A05553587E67AD068EBF0CE65C9ECBA74BAE144661AB502CEFFEBCFA9FBB3CDFD9E4043B3402F970051E55063D96C94AB66B443A0F9D088A23 found in ./IMG_1111.JPG.vvv 

Alternatively, you can crack the following Bitcoin key(s) using msieve, and use them with TeslaDecoder: 

E82E090D9A73DC4E93201BC56394544493EFD0DD2631F588C4083F006C1CD419F096F1D6E646AA0DE8D0230CB18D009B231DEA6EF7CAFED03C6C53830E51074A found in ./IMG_1111.JPG.vvv

 

Step 4) Crack the AES key

Run msieve with the AES key found in step 3 (notice 0x in front):

msieve -v -e 0x346FA15D6F7106A05553587E67AD068EBF0CE65C9ECBA74BAE144661AB502CEFFEBCFA9FBB3CDFD9E4043B3402F970051E55063D96C94AB66B443A0F9D088A23
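The 0x prefix marks the key as hexadecimal; msieve itself reports the number in decimal. As a rough cross-check, this small Python sketch (the helper name is my own, not part of teslacrack) converts the hex key from step 3 to decimal so you can compare the digit count and leading digits with the "factoring ..." line that msieve prints.

```python
def hex_key_to_decimal(hex_key):
    """Convert a hex key string (without the 0x prefix) to its decimal digits."""
    return str(int(hex_key, 16))


# The AES key reported by teslacrack.py in step 3, split in two for readability.
AES_KEY_HEX = (
    "346FA15D6F7106A05553587E67AD068EBF0CE65C9ECBA74BAE144661AB502CEF"
    "FEBCFA9FBB3CDFD9E4043B3402F970051E55063D96C94AB66B443A0F9D088A23"
)

decimal = hex_key_to_decimal(AES_KEY_HEX)
print("%d digits, starting with %s" % (len(decimal), decimal[:10]))
```

If the digit count does not match what msieve reports, the key was most likely mistyped or the 0x prefix was left off.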

You should see something like this

random seeds: 4e305a00 6fb1837c
factoring 2746299090781689070444389534863512001481868057180589068197106350690661987493631496862079884858156082950420861543886773314987647176310033735471323628997155 (154 digits)
searching for 15-digit factors
P-1 stage 1 factor found
searching for 20-digit factors
P-1 stage 2 factor found
searching for 25-digit factors
P-1 stage 2 factor found
commencing quadratic sieve (33-digit input)
using multiplier of 3
using VC8 32kb sieve core
sieve interval: 4 blocks of size 32768
processing polynomials in batches of 51
using a sieve bound of 4909 (341 primes)
using large prime bound of 196360 (17 bits)
polynomial 'A' values have 4 factors

sieving in progress (press Ctrl-C to pause)
696 relations (302 full + 394 combined from 2196 partial), need 437
696 relations (302 full + 394 combined from 2196 partial), need 437
sieving complete, commencing postprocessing
begin with 2498 relations
reduce to 1007 relations in 2 passes
attempting to read 1007 relations
recovered 1007 relations
recovered 24 polynomials
attempting to build 696 cycles
found 696 cycles in 1 passes
distribution of cycle lengths:
   length 1 : 302
   length 2 : 394
largest cycle: 2 relations
matrix is 341 x 696 (0.1 MB) with weight 11065 (15.90/col)
sparse part has weight 11065 (15.90/col)
filtering completed in 1 passes
matrix is 341 x 405 (0.0 MB) with weight 5168 (12.76/col)
sparse part has weight 5168 (12.76/col)
commencing Lanczos iteration
memory use: 0.0 MB
lanczos halted after 7 iterations (dim = 330)
recovered 63 nontrivial dependencies
commencing quadratic sieve (69-digit input)
using multiplier of 23
using VC8 32kb sieve core
sieve interval: 12 blocks of size 32768
processing polynomials in batches of 17
using a sieve bound of 209771 (9278 primes)
using large prime bound of 19508703 (24 bits)
using trial factoring cutoff of 24 bits
polynomial 'A' values have 9 factors

sieving in progress (press Ctrl-C to pause)
9427 relations (4473 full + 4954 combined from 52519 partial), need 9374
9427 relations (4473 full + 4954 combined from 52519 partial), need 9374
sieving complete, commencing postprocessing
begin with 56992 relations
reduce to 13757 relations in 2 passes
attempting to read 13757 relations
recovered 13757 relations
recovered 11859 polynomials
attempting to build 9427 cycles
found 9427 cycles in 1 passes
distribution of cycle lengths:
   length 1 : 4473
   length 2 : 4954
largest cycle: 2 relations
matrix is 9278 x 9427 (1.3 MB) with weight 274633 (29.13/col)
sparse part has weight 274633 (29.13/col)
filtering completed in 3 passes
matrix is 8448 x 8512 (1.2 MB) with weight 244661 (28.74/col)
sparse part has weight 244661 (28.74/col)
commencing Lanczos iteration
memory use: 1.2 MB
lanczos halted after 135 iterations (dim = 8443)
recovered 61 nontrivial dependencies
p1 factor: 3
p1 factor: 5
p6 factor: 418819
p8 factor: 10304417
prp13 factor: 8162073202471
prp14 factor: 84794311049579
prp19 factor: 3135407003350317697
prp25 factor: 2560807722929541167424011
prp26 factor: 19683723106610479028057093
prp45 factor: 387847886921773814156469727175786645600806381
elapsed time 00:00:37

 

Depending on how lucky you are, this process will run for anywhere from minutes to weeks.
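Copying ten long numbers by hand is error prone, so here is a small Python sketch that pulls the factor lines out of saved msieve output. It only assumes the "pN factor:" / "prpN factor:" line format shown above; the function name and the shortened sample text are mine.

```python
import re

# Matches lines such as "p6 factor: 418819" or "prp13 factor: 8162073202471".
FACTOR_LINE = re.compile(r"^\s*(?:p|prp)\d+ factor: (\d+)\s*$")


def extract_factors(msieve_output):
    """Return the factors in the order msieve printed them, repeats included."""
    factors = []
    for line in msieve_output.splitlines():
        match = FACTOR_LINE.match(line)
        if match:
            factors.append(int(match.group(1)))
    return factors


# Shortened sample in the same format as the output above.
sample_output = """\
recovered 61 nontrivial dependencies
p1 factor: 3
p1 factor: 5
p6 factor: 418819
prp13 factor: 8162073202471
elapsed time 00:00:37
"""

# Space separated, ready to paste onto a command line.
print(" ".join(str(f) for f in extract_factors(sample_output)))
```

Repeated factors are kept on purpose, because each occurrence has to be passed along.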

Step 5)  Unfactor based on prime factors from step 4

 

  • Take the factors listed at the very end of the msieve output (10 in this example) and feed them into unfactor-ecdsa.py.  If any factors are listed multiple times, repeat them also.
unfactor-ecdsa.py ./IMG_1111.JPG.vvv 3 5 418819 10304417 8162073202471 84794311049579 3135407003350317697 2560807722929541167424011 19683723106610479028057093 387847886921773814156469727175786645600806381

Found AES private key: b'\x6d\xb8\x64\x76\x72\x31\xc4\xff\xfc\x22\x48\x20\xa5\xbc\xcd\x6c\x4c\x30\x2f\xc3\x4d\xd6\xfa\x23\x4b\x4b\x9e\x0c\x1d\xaf\xec\x07' (6DB864767231C4FFFC224820A5BCCD6C4C302FC34DD6FA234B4B9E0C1DAFEC07)
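If unfactor-ecdsa.py does not find a key, a mistyped or missing factor is a likely cause. One quick sanity check is that the factors, repeats included, multiply back to the number that was factored. The sketch below is my own helper, not part of teslacrack, and uses a small number purely for illustration.

```python
def product(factors):
    """Multiply all factors together (Python integers do not overflow)."""
    result = 1
    for factor in factors:
        result *= factor
    return result


def factors_match(number, factors):
    """True if the factors, repeats included, multiply back to number."""
    return product(factors) == number


# Small illustration: 360 = 2 * 2 * 2 * 3 * 3 * 5
print(factors_match(360, [2, 2, 2, 3, 3, 5]))  # True
print(factors_match(360, [2, 3, 5]))           # False: repeated factors dropped
```

For the real check, pass int(key_hex, 16) as the number, where key_hex is the AES key from step 3 without the 0x prefix, together with the factors from step 4.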

 

Step 6) Use AES private key to decrypt all your files

  • Edit teslacrack.py and add AES private key to the list of keys at the beginning of the file
  • Run teslacrack.py C:\
  • It will decrypt every vvv file that was encrypted by this specific key

 

Step 7) Backup, backup, backup.  Next time this will NOT work.

MS SQL 1807 Error – Solved

While trying to create a new database in MS SQL, I kept getting a 1807 error complaining that MS SQL can’t obtain an exclusive lock on database ‘model’:

Create failed for Database ‘x’. (Microsoft.SqlServer.Smo)

ADDITIONAL INFORMATION:

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

Could not obtain exclusive lock on database ‘model’. Retry the operation later.

CREATE DATABASE failed. Some file names listed could not be created. Check related errors. (Microsoft SQL Server, Error: 1807)

 

To solve this:

  • run sp_who2 query
  • find out which process holds a lock on ‘model’ database
  • kill that process using :

KILL $SPID
GO
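When sp_who2 returns a long list, it helps to narrow it down to sessions that are sitting in the ‘model’ database. The Python sketch below filters rows that have already been fetched; it assumes each row is a dict keyed by sp_who2 column names (SPID, DBName, ProgramName), and the sample rows are invented. Being connected to ‘model’ is a strong hint rather than proof that a session holds the lock.

```python
def sessions_using_database(rows, database="model"):
    """Return the SPIDs of sessions whose DBName column matches `database`."""
    wanted = database.lower()
    return [
        int(row["SPID"])
        for row in rows
        if (row.get("DBName") or "").strip().lower() == wanted
    ]


# Hypothetical rows shaped like sp_who2 output (most columns omitted).
rows = [
    {"SPID": "51", "DBName": "master", "ProgramName": "backup agent"},
    {"SPID": "53", "DBName": "model ", "ProgramName": "management console"},
    {"SPID": "12", "DBName": None, "ProgramName": ""},
]

for spid in sessions_using_database(rows):
    print("candidate session: %d" % spid)
```

Review each candidate before killing it; the notes on system processes still apply.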

Notes regarding killing processes:

  • Be careful about killing system processes such as AWAITING COMMAND, CHECKPOINT SLEEP, LAZY WRITER, LOCK MONITOR and SIGNAL HANDLER. Killing them forcefully is probably a bad idea.
  • Instead of killing processes, it sometimes helps to restart the following services / applications:
    • SQL Management Studio
    • AppAssure Agent service
    • SQL Server VSS Writer service

Asterisk – Seamless dialing of remote extension through DTMF

Problem Description:

There are two offices.  Office A runs Asterisk / FreePBX while office B runs a closed system with auto attendant.

The guys at office A would like to be able to dial office B extensions as if they were local.

 

Solution Overview:

Program the office A extensions in this way:

  1. Local extension picks up
  2. Remote office number is called
  3. When the remote office picks up
  4. DTMF key presses are sent to select the right extension
  5. Call is connected

Solution Details:

First I attempted to program this into FreePBX through the GUI, but I wasn’t having any luck because the default macros would not let me craft the dial command in such a way that it sends the key presses after the call is placed.  Although it would have been nice to have everything in the GUI, the FreePBX GUI method seems to be a dead end.  I ended up relying on the good old /etc/asterisk/extensions_custom.conf configuration file, and I just created my own extensions there.

[ext-local]
exten => 102,1,Dial(SIP/v-outbound/4031112222,30,rD(ww11))
exten => 103,1,Dial(SIP/v-outbound/4032223333,30,rD(ww12))

[ext-local] sets the right context so that these extensions are picked up as if they were local.  You could also put these into other contexts like [ivr-1] etc.

D tells the Dial command to send DTMF button presses after the remote end picks up

w tells the Dial command to wait 0.5 seconds, so ww11 pauses for one second and then sends the digits 1 and 1
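With more than a handful of remote extensions, typing these lines by hand gets tedious. This Python sketch generates them in the same shape as the two lines above. The function and parameter names are my own, and the trunk name and timeout simply default to the values used in this post.

```python
def dial_line(extension, remote_number, remote_extension,
              trunk="SIP/v-outbound", timeout=30, half_second_pauses=2):
    """Build one dialplan line that calls remote_number and then sends DTMF.

    half_second_pauses is the number of 'w' characters (0.5 s each) to wait
    after the remote end answers, before the digits are sent.
    """
    dtmf = "w" * half_second_pauses + str(remote_extension)
    return "exten => {0},1,Dial({1}/{2},{3},rD({4}))".format(
        extension, trunk, remote_number, timeout, dtmf)


# Local extension -> (remote office number, extension at the remote office)
mapping = {
    102: ("4031112222", 11),
    103: ("4032223333", 12),
}

print("[ext-local]")
for local_ext in sorted(mapping):
    number, remote_ext = mapping[local_ext]
    print(dial_line(local_ext, number, remote_ext))
```

If the remote auto attendant is slow to start listening, raising half_second_pauses is the first thing to try.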

 

Quick and easy iptables based proxy

Today was a busy day dealing with a power outage that affected 2100 businesses in downtown Calgary. Of course, a couple of my clients were in the zone that went dark. I offered to run their key infrastructure from my place for a couple of days. Everything went great, except I have only 1 IP address on my connection. That’s not good when both clients want to come in on port 443. What to do?

Call up my ISP and order another IP? Nope: Takes too long, too expensive, I just need this temporarily. Also, ISP might mess it up and take me offline for a while.

Get VM with IPv4 IP and proxy the traffic over? Yes, but why go with something heavy handed like nginx?

I prefer this elegant solution brought to you by iptables:


# echo 1 >| /proc/sys/net/ipv4/ip_forward
# iptables -t nat -A PREROUTING -p tcp -d $IP_OF_VM --dport 443 -j DNAT --to $IP_WHERE_IM_FORWARDING_TO:8443
# iptables -t nat -A POSTROUTING -j MASQUERADE
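If you need the same setup for more than one client, a small helper can print the commands with the addresses filled in. This is just a Python sketch that formats the three commands above; the function name is mine and the IP addresses are documentation placeholders, not real hosts. It prints the commands and does not run them.

```python
def proxy_rules(vm_ip, target_ip, listen_port=443, target_port=8443):
    """Return the shell commands that forward vm_ip:listen_port to target_ip:target_port."""
    return [
        "echo 1 >| /proc/sys/net/ipv4/ip_forward",
        "iptables -t nat -A PREROUTING -p tcp -d {0} --dport {1} "
        "-j DNAT --to {2}:{3}".format(vm_ip, listen_port, target_ip, target_port),
        "iptables -t nat -A POSTROUTING -j MASQUERADE",
    ]


# Placeholder addresses from the documentation ranges.
for command in proxy_rules("203.0.113.10", "198.51.100.20"):
    print(command)
```

Run the printed commands as root on the VM, and pick a different target_port for each client that shares the destination address.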

[Solved] Linux PPTP client NATed behind pfsense firewall

When migrating my PPTP client configuration from an older Linux server to a new one, I could not get a PPTP tunnel up and running on the new server.   I kept getting this error flow:


using channel 15
Using interface ppp0
Connect: ppp0 <--> /dev/pts/1
sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0xxxxx6a93> <pcomp> <accomp>]
sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0xxxxx6a93> <pcomp> <accomp>]
sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0xxxxx6a93> <pcomp> <accomp>]
sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0xxxxx6a93> <pcomp> <accomp>]
sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0xxxxx6a93> <pcomp> <accomp>]
sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0xxxxx6a93> <pcomp> <accomp>]
sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0xxxxx6a93> <pcomp> <accomp>]
Script pptp vpn.xxxxxxxx.com --nolaunchpppd finished (pid 23704), status = 0x0
Modem hangup

So I was sending, but getting nothing back.

I triple checked my configuration and tweaked a few settings.  No luck.  Then I stumbled on an article that talked about the challenges of PPTP behind NAT devices.  I already knew about the common issue of not being able to dial out with more than one client session to a remote PPTP server.  For that reason I was careful not to have more than one open at the same time, but I thought I’d dig a bit deeper to see if NAT was the culprit.

Long story short, I noticed that pfsense -> diagnostics -> pftop was showing a GRE state from the old server to the destination VPN server.  It showed an age of 3+ hours (forgot the exact number) even though I was sure that the PPTP session on the old server was shut down.  I reset the firewall state on pfsense, and it started to work immediately.

The moral of the story is that pfsense likes to keep the GRE state open for hours after it’s been disconnected.   That is a problem.   Packets go out, but they are NATed to the wrong server when they come back.
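To picture why a stale state breaks the new server, here is a toy Python model. It is a deliberate simplification and not how pf is implemented: it assumes the firewall keeps one GRE state per remote address, since GRE has no port numbers to tell two internal hosts apart. All class, method and host names are invented.

```python
class GreNatTable(object):
    """Toy NAT table: one GRE state per remote IP, pointing at one internal host."""

    def __init__(self):
        self.states = {}

    def outbound(self, internal_host, remote_ip):
        # An existing state is reused; a new one is created only if none exists.
        self.states.setdefault(remote_ip, internal_host)

    def inbound(self, remote_ip):
        # Replies are delivered to whichever host owns the state.
        return self.states.get(remote_ip)

    def reset(self):
        self.states.clear()


nat = GreNatTable()
nat.outbound("old-server", "vpn.example")   # session from hours ago, never expired
nat.outbound("new-server", "vpn.example")   # new tunnel attempt
print(nat.inbound("vpn.example"))           # old-server: replies go to the wrong box

nat.reset()                                 # same idea as resetting states on the firewall
nat.outbound("new-server", "vpn.example")
print(nat.inbound("vpn.example"))           # new-server
```

In this model the new server keeps sending LCP requests and never hears back, which matches the log above.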

Version details:

Pfsense: 2.1.4-RELEASE (i386)
PPTP: 1.7.2
Linux: Ubuntu 14.04.1 LTS