System design

  • Tiny URL: how to hash the key.

MD5: https://mkyong.com/java/java-md5-hashing-example/ Random String: Easy Hash:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class URLHashing {

    private static final Charset UTF_8 = StandardCharsets.UTF_8;
    private static final String OUTPUT_FORMAT = "%-20s:%s";
    public static void main(String[] args)
    {

        String pText = "Hello MD5";
        System.out.println(String.format(OUTPUT_FORMAT, "Input (string)", pText));
        System.out.println(String.format(OUTPUT_FORMAT, "Input (length)", pText.length()));

        byte[] md5InBytes = getMD5Hash(pText);
        System.out.println(String.format(OUTPUT_FORMAT, "MD5 (hex) ", convertToHex(md5InBytes)));
        // fixed length, 16 bytes, 128 bits.
        System.out.println(String.format(OUTPUT_FORMAT, "MD5 (hex length)", convertToHex(md5InBytes).length()));

        System.out.println(String.format("MD5 (base64) " + convertToBase64(md5InBytes)));
        System.out.println(String.format("MD5 (base 64 length)" +  convertToBase64(md5InBytes).length()));

    }

    public static byte[] getMD5Hash(String path)
    {
        MessageDigest md ;
        try{
            md = MessageDigest.getInstance("MD5");
        }
        catch(final NoSuchAlgorithmException ex)
        {   
            throw new IllegalArgumentException(ex);
        }
        byte[] bytes = md.digest(path.getBytes(UTF_8));
        return bytes;
    }

    public static String convertToHex(byte[] bytes)
    {
        // convert to hex
        StringBuilder sb = new StringBuilder();
        for(byte b: bytes)
        {
            //%x is a format specifier that format and output the hex value. 
            //If you are providing int or long value, it will convert it to hex value. 
            //%02x means if your provided value is less than two digits then 0 will be prepended. 
            //You provided value 16843009 and it has been converted to 1010101 which a hex value.
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static String convertToBase64(byte[] bytes)
    {
        String base64 = Base64.getEncoder().encodeToString(bytes);
        return base64;
    }
}

不过题目实际上简化了条件,新建URL的条数是100条/天,但是是heavily read,1m/s。所以这一轮重点就放在database,cache策略上.

Estimation:

Traffic:

Write: 100 URLs/day --> QPS: 100/24/3600

Read: 1m/s -> QPS: 1m/s

Decide how long is the shortURL length (base64)

1 year can generate 100*365 = 36,500

6 chars -- > 64^6 = 68.7 billion --> exhausted after 68.7billon/36500 years (enough)

Storage:

Record Size: 6 chars + 2083 chars+ 4chars ~ 2000 Bytes

5 year : 36,500*5* 2000bytes = 365,000,000 bytes = 365MB

BandWidth:

2000bytes* 1m/s = 2000m bytes = 2GB/s

Cache:

8:2 per day

1m/s * 3600*24h* 2000 bytes* 20% = 34560 GB

URL shortener. Wayfair 有一个media team 要发twitter, 所以wayfair 要一个URL shortener service 把long URL 变成 short URL 放在tweet 里。要求 1)create short URL 2) assess short URL。 有一些比较常规的url shortener 的相关问题,比如short key length, 需要多少server, 多少个DB 也问了一些很细节/我也不知道怎么答的问题,说给大家想想吧。比如

  • http status code的range 和每个range 的定义,

  • NoSQL 可不可以在不是key 的feild 上create index,:

To speed up queries on non-key attributes, you can create a global secondary index in DynamoDB. A global secondary index contains a selection of attributes from the base table, but they are organized by a primary key that is different from that of the table. The index key does not need to have any of the key attributes from the table. It doesn't even need to have the same key schema as a table.

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html

  • API gateway 里可不可以cache API的JSON response

Caching in REST APIs: https://restfulapi.net/caching/

Being cacheable is one of the architectural constraints of REST.

  • GET requests should be cachable by default – until a special condition arises. Usually, browsers treat all GET requests as cacheable.

  • POST requests are not cacheable by default but can be made cacheable if either an Expires header or a Cache-Control header with a directive, to explicitly allows caching, is added to the response.

  • Responses to PUT and DELETE requests are not cacheable at all.

设计 shortUrl。同上,该面试官也是非常熟悉题目的,你走一步,他问一个,不会让你把high level 的设计解释完,逮着一个知识点,就会深挖。这里给大家一些经验,面对喜欢十万个为什么的,直接给最简洁的答案,千万不要去尝试深度分析,除非他 问,不然时间是不够的。我光解释怎么把longUrl hash成 shortUrl就被他整了10分钟,三种方法的利弊都给他介绍了。最后是说完了,但是没有时间给他 深入其他的扩展问题了,所以估计这轮也是减分的.

tiny url 还问了analytics怎么实现:

i.e.

  • Total click how many

  • Unique click how many

  • How long your URL click history is kept

  • Map where is your user click from

  • popularity by time of day, day of week

  • OS , Broswer and Device Type

-----------------------------------------------------------------------------------------

题目是做一个类似shazam的系统,用户可以上传音乐片段10-30秒来判断是什么音乐。后台早就预存了很多音乐的signature。和面试官探讨了哪些需要用db,是否需要shard,是否需要load balancing,是否需要考虑region,是否需要cache等。我在计‍‍‌‌‍‌‍‌‍‌‍‌‍‌‌‍‌‌‌‍算存储和带宽上花了不少时间。中间也有一个点我始终get不到,可能我沟通还是缺点火候吧

--------------------------------------------------------------------------------------

设计地点‍‍‌‌‌‌‌‍‌‌‍‌‌‌‍‍‌‍‌‌打卡 参考YELP打卡

Last updated