一亩三分地论坛

 找回密码
 获取更多干货,去instant注册!

扫码关注一亩三分地公众号
查看: 1122|回复: 2
收起左侧

[字符串] Google Challenge重复删除字符串

[复制链接] |试试Instant~ |关注本帖
qetu133 发表于 2016-5-17 03:16:30 | 显示全部楼层 |阅读模式

注册一亩三分地论坛,查看更多干货!

您需要 登录 才可以下载或查看,没有帐号?获取更多干货,去instant注册!

x
本帖最后由 qetu133 于 2016-5-17 09:14 编辑

我遇到个题目,想了很久都不知道怎么做好点,这个题目就是直接匹配是绝对会超时的,不知道用什么样的方法才能解出来。Your spy, Beta Rabbit, has managed to infiltrate a lab of mad scientists who are turning rabbits into zombies. He sends a text transmission to you, but it is intercepted by a pirate, who jumbles the message by repeatedly inserting the same word into the text some number of times. At each step, he might have inserted the word anywhere, including at the beginning or end, or even into a copy of the word he inserted in a previous step. By offering the pirate a dubloon, you get him to tell you what that word was. A few bottles of rum later, he also tells you that the original text was the shortest possible string formed by repeated removals of that word, and that the text was actually the lexicographically earliest string from all the possible shortest candidates. Using this information, can you work out what message your spy originally sent?
If the final chunk of text was "lolol," and the inserted word was "lol," the shortest possible strings are "ol" (remove "lol" from the beginning) and "lo" (remove "lol" from the end). The original text therefore must have been "lo," the lexicographically earliest string.

Write a function called answer(chunk, word) that returns the shortest, lexicographically earliest string that can be formed by removing occurrences of word from chunk. Keep in mind that the occurrences may be nested, and that removing one occurrence might result in another. For example, removing "ab" from "aabb" results in another "ab" that was not originally present. Also keep in mind that your spy's original message might have been an empty string.

chunk and word will only consist of lowercase letters [a-z].
chunk will have no more than 20 characters.
word will have at least one character, and no more than the number of characters in chunk.

Test cases
==========

Inputs:
   (string) chunk = "lololololo"
   (string) word = "lol"
Output:
   (string) "looo"

Inputs:
   (string) chunk = "goodgooogoogfogoood"
   (string) word = "goo"
Output:
   (string) “dogfood"

我用python写了个 answer 函数,这个函数可以应对这样的问题,但是效率太低了,我实在不知道怎么样才能提高它的效率了。至于为什么超时就是我注解的那两行,一行是O(n)的复杂度,n代表加密后的字符串长度,另一行一行是递归调用,换句话说复杂度=O(插入次数*n)。
def answer(chunk, word):
   if len(chunk) < len(word):
       return chunk
   indexs = []
   #找到符合word的index保存(可以改成KMP算法,依然超时)
   for i in range(len(chunk)-len(word)+1):
       if chunk[i:i+len(word)] == word:
           indexs.append(i)
   ans = chunk
   for i in indexs:
        #逐个index查找最短的原始信息
       tmp = answer(chunk[:i]+chunk[i+len(word):],word)
       if len(tmp) < len(ans) or len(tmp) == len(ans) and tmp < ans:
           ans = tmp
   return ans

print(answer('6bb6b67b7b67b67bbc' ,'b67b'))
print(answer("lololololo","lol"))
print(answer('goodgooogoogfogoood','goo'))
print(answer('0d960750d960d9607500750d960750002a','0d960750'))
=================自动验证脚本==================================
  1. import uuid
  2. import random

  3. def answer(chunk, word):
  4.     # your code here
  5.     if len(chunk) < len(word):
  6.         return chunk
  7.     indexs = []
  8.     for i in range(len(chunk)-len(word)+1):
  9.         if chunk[i:i+len(word)] == word:
  10.             indexs.append(i)
  11.     ans = chunk
  12.     for i in indexs:
  13.         tmp = answer(chunk[:i]+chunk[i+len(word):],word)
  14.         if len(tmp) < len(ans) or len(tmp) == len(ans) and tmp < ans:
  15.             ans = tmp
  16.     return ans

  17. def my_random_string(string_length=10):
  18.     """Returns a random string of length string_length."""
  19.     rand = str(uuid.uuid4()) # Convert UUID format to a Python string.
  20.     rand = rand.replace("-","") # Remove the UUID '-'.
  21.     return rand[0:string_length] # Return the random string.

  22. while 1:
  23.     orig_str = my_random_string(random.randrange(3,5))
  24.     noise_str = my_random_string(random.randrange(2,21))
  25.     if orig_str in noise_str or noise_str in orig_str:
  26.         continue
  27.     insert_num = random.randrange(2,5)
  28.     output_str = orig_str
  29.     for i in xrange(insert_num):
  30.         rand_pos = random.randrange(0,len(output_str))
  31.         output_str = output_str[:rand_pos]+noise_str+output_str[rand_pos:]
  32.     if orig_str != answer(output_str, noise_str):
  33.         print(output_str+" "+orig_str+","+noise_str)
  34.         raw_input()
复制代码

stellari 发表于 2016-5-17 06:21:44 | 显示全部楼层
这题最重要的描述部分你没有贴全吧?
回复 支持 反对

使用道具 举报

 楼主| qetu133 发表于 2016-5-17 08:13:35 来自手机 | 显示全部楼层
stellari 发表于 2016-5-17 06:21
这题最重要的描述部分你没有贴全吧?

这就是最重要的部分啦,就是递归删除子串呢。前面讲的是什么兔博士要解密什么的,然后原文信息被插入很多的杂信息,现在已知加入的pattern,然后寻找原来的最短的信息。
回复 支持 反对

使用道具 举报

本版积分规则

请点这里访问我们的新网站:一亩三分地Instant.

Instant搜索更强大,不扣积分,内容组织的更好更整洁!目前仍在beta版本,努力完善中!反馈请点这里

关闭

一亩三分地推荐上一条 /5 下一条

手机版|小黑屋|一亩三分地论坛声明 ( 沪ICP备11015994号 )

custom counter

GMT+8, 2016-12-11 16:32

Powered by Discuz! X3

© 2001-2013 Comsenz Inc. Design By HUXTeam

快速回复 返回顶部 返回列表