一亩三分地


[Learning Python/Perl] A Coursera problem: could someone point out my mistake?

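Admins, please disregard my previous post. My apologies; I have re-edited it and am resubmitting it here.

This is the final project for UM's Python Functions, Files, and Dictionaries course. I only came here to ask for help after getting completely stuck on the last step. If this post is inappropriate, I will delete it.
I racked my brain over what could be wrong and revised the code many times to get to the current version, but it still has a problem 😢
Thank you all 🙏

The assignment reads: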
We have provided some synthetic (fake, semi-randomly generated) twitter data in a csv file named project_twitter_data.csv which has the text of a tweet, the number of retweets of that tweet, and the number of replies to that tweet. We have also words that express positive sentiment and negative sentiment, in the files positive_words.txt and negative_words.txt.

Your task is to build a sentiment classifier, which will detect how positive or negative each tweet is. You will create a csv file, which contains columns for the Number of Retweets, Number of Replies, Positive Score (which is how many happy words are in the tweet), Negative Score (which is how many angry words are in the tweet), and the Net Score for each tweet. At the end, you upload the csv file to Excel or Google Sheets, and produce a graph of the Net Score vs Number of Retweets.

To start, define a function called strip_punctuation which takes one parameter, a string which represents a word, and removes characters considered punctuation from everywhere in the word. (Hint: remember the .replace() method for strings.)

Next, copy in your strip_punctuation function and define a function called get_pos which takes one parameter, a string which represents one or more sentences, and calculates how many words in the string are considered positive words. Use the list, positive_words to determine what words will count as positive. The function should return a positive integer - how many occurrences there are of positive words in the text. Note that all of the words in positive_words are lower cased, so you’ll need to convert all the words in the input string to lower case as well.

Next, copy in your strip_punctuation function and define a function called get_neg which takes one parameter, a string which represents one or more sentences, and calculates how many words in the string are considered negative words. Use the list, negative_words to determine what words will count as negative. The function should return a positive integer - how many occurrences there are of negative words in the text. Note that all of the words in negative_words are lower cased, so you’ll need to convert all the words in the input string to lower case as well.

I am stuck on the last step:
Finally, copy in your previous functions and write code that opens the file project_twitter_data.csv which has the fake generated twitter data (the text of a tweet, the number of retweets of that tweet, and the number of replies to that tweet). Your task is to build a sentiment classifier, which will detect how positive or negative each tweet is. Copy the code from the code windows above, and put that in the top of this code window. Now, you will write code to create a csv file called resulting_data.csv, which contains the Number of Retweets, Number of Replies, Positive Score (which is how many happy words are in the tweet), Negative Score (which is how many angry words are in the tweet), and the Net Score (how positive or negative the text is overall) for each tweet. The file should have those headers in that order. Remember that there is another component to this project. You will upload the csv file to Excel or Google Sheets and produce a graph of the Net Score vs Number of Retweets. Check coursera for that portion of the assignment, if you’re accessing this textbook from Coursera.

The following code was provided with the assignment:

punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", '#', '@']
# lists of words to use
positive_words = []
with open("positive_words.txt") as pos_f:
    for lin in pos_f:
        if lin[0] != ';' and lin[0] != '\n':
            positive_words.append(lin.strip())


negative_words = []
with open("negative_words.txt") as pos_f:
    for lin in pos_f:
        if lin[0] != ';' and lin[0] != '\n':
            negative_words.append(lin.strip())

My code is as follows:
def strip_punctuation(s):
    for i in s:
        if i in punctuation_chars:
            s = s.replace(i, " ")
    return s            

def get_pos(s):
    s_lower = s.lower()
    s_lower_new = strip_punctuation(s_lower)
    s_list = s_lower_new.split(" ")
    num = 0
    for i in s_list:
        if i in positive_words:
            num = num + 1
        else:
            num = num
    return num

def get_neg(s):
    s_lower = s.lower()
    s_lower_new = strip_punctuation(s_lower)
    s_list = s_lower_new.split(" ")
    num = 0
    for i in s_list:
        if i in negative_words:
            num = num + 1
        else:
            num = num
    return num

------------------------Everything above is correct; I can't tell what is wrong below----------------------------------------

resultfile = open("resulting_data.csv", "w")
resultfile.write('Number of Retweets, Number of Replies, Positive Score, Negative Score, Net Score')
resultfile.write('\n')


csv_file = open('project_twitter_data.csv', "r")
lines = csv_file.readlines()
for line in lines:   
    line_list = line.strip().split(",")
    resultfile.write('{},{},{},{},{}'.format(line_list[1], line_list[2], get_pos(line_list[0]), get_neg(line_list[0]), get_pos(line_list[0]) - get_neg(line_list[0])))
    resultfile.write("\n")

csv_file.close()
resultfile.close()


The output I get:
result
Number of Retweets, Number of Replies, Positive Score, Negative Score, Net Score
retweet_count,reply_count,0,0,0 (I don't understand where this row comes from)
3,0,0,0,0
1,0,2,2,0
1,2,1,0,1
3,1,1,0,1
6,0,2,0,2
9,5,2,0,2
19,0,2,0,2
0,0,0,3,-3
0,0,0,2,-2
82,2,4,0,4
0,0,0,1,-1
0,0,1,0,1
47,0,2,0,2
2,1,1,0,1
0,2,1,0,1
0,0,2,1,1
4,6,3,0,3
19,0,3,1,2
0,0,1,1,0

Result  Actual Value      Expected Value    Notes
Pass    'Numbe...core\n'  'Numbe...core\n'  checking that the headers are set correctly.
Fail    '9'               '19'              checking that the value for a particular cell matches.
Fail    '2'               '-3'              checking that the value of the net score is correct for a particular cell.
Fail    21                20                checking that the file has the correct number of rows.
Pass    5                 5                 checking that the file has the correct number of columns.
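
The three failures look consistent with a single cause: the input file's own header line is processed as if it were a tweet. That would explain the extra `retweet_count,reply_count,0,0,0` row, the row count of 21 instead of 20, and the two cell mismatches, since '9' and '2' are the values one row above '19' and '-3' in the output shown. A minimal sketch of the effect follows. It uses a made-up in-memory sample in place of project_twitter_data.csv; the sample tweets and the `tweet_text` column name are assumptions, since only `retweet_count` and `reply_count` appear in the post.

```python
import io

# Hypothetical stand-in for project_twitter_data.csv (real contents are not shown in the post).
sample = io.StringIO(
    "tweet_text,retweet_count,reply_count\n"
    "What a great day,3,0\n"
    "Such a sad story,1,2\n"
)

lines = sample.readlines()

# readlines() returns every line of the file, including the header line.
first_fields = lines[0].strip().split(",")
print(first_fields[1], first_fields[2])  # retweet_count reply_count

# Slicing off the first element leaves only the data rows.
data_lines = lines[1:]
print(len(lines), len(data_lines))  # 3 2
```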


Original poster | flywei0228LRZG 2020-8-8 04:26:19
Bumping my own thread 😂
Original poster | flywei0228LRZG 2020-8-8 04:27:09
If I can't finish this problem, I won't get the certificate 💔 Could someone please lend a hand 🙏
Original poster | flywei0228LRZG 2020-8-8 11:17:22

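I didn't know that answers could be found on GitHub. The problem is solved now; sorry to have bothered everyone.

There were two mistakes. First, in (get_pos(line_list[0]) - get_neg(line_list[0]) I was missing parentheses. Second, I had not used headerdotuse = lines.pop(0), which removes the input file's header line from lines before the loop runs.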
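
For readers who land here with the same failures, here is a minimal sketch of the final step with the header line skipped. It is not the course's reference solution. The word lists and sample lines are tiny made-up stand-ins so the block runs on its own, `count_words` and `build_rows` are hypothetical helper names, and `strip_punctuation` here deletes punctuation rather than replacing it with a space, which is a deviation from the code posted above. Like the original loop, it assumes the tweet text itself contains no commas.

```python
# Hypothetical stand-ins so the sketch runs on its own.
# In the assignment these come from positive_words.txt, negative_words.txt and the CSV.
punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", "#", "@"]
positive_words = ["great", "happy"]
negative_words = ["sad", "awful"]


def strip_punctuation(s):
    # Delete every punctuation character from the string.
    for ch in punctuation_chars:
        s = s.replace(ch, "")
    return s


def count_words(s, word_list):
    # Lowercase, strip punctuation, split on whitespace, count matches.
    words = strip_punctuation(s.lower()).split()
    return sum(1 for w in words if w in word_list)


def build_rows(lines):
    rows = ["Number of Retweets, Number of Replies, Positive Score, Negative Score, Net Score"]
    for line in lines[1:]:  # skip the input file's own header line
        text, retweets, replies = line.strip().split(",")
        pos = count_words(text, positive_words)
        neg = count_words(text, negative_words)
        rows.append("{},{},{},{},{}".format(retweets, replies, pos, neg, pos - neg))
    return rows


sample_lines = [
    "tweet_text,retweet_count,reply_count\n",
    "What a GREAT happy day!,3,0\n",
    "Such a sad awful story,1,2\n",
]
rows = build_rows(sample_lines)
print("\n".join(rows))
```

With real data, `lines` would come from `readlines()` on project_twitter_data.csv, and the rows would be written to resulting_data.csv one per line. `lines[1:]` and `lines.pop(0)` both drop the header; the slice leaves the original list untouched.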