[Python] A Script for Processing and Analysing Bilibili Video Comments and Bullet Chats

Barbara Streisand

Jan 05, 2025 pm 07:54 PM

Disclaimer: for personal study and research only. Any other use is strictly prohibited.

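Introduction

This script was developed for academic work in the humanities, specifically research on discourse analysis of online platforms. It supports a comprehensive study of Bilibili bullet chats (danmaku) and comments. The focus is the large body of content touching on subcultures and social issues (according to the materials consulted), which calls for in-depth investigation, analysis, supplementation, and summary.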

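Because the material is extensive, the results are presented at the link below:

Comment and bullet-chat research from a subculture perspective: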
https://nbviewer.org/github/Excalibra/scripts/blob/main/d-ipynb/Subculture Perspective Review and Bullet Screen Research.ipynb

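The plan was to publish only after the "subculture" and "social issues" parts of the research were finished. However, considering the needs of researchers and students in this field, the work is being shared now.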

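Features and Principles

Script features:

Collects video metadata: title, author, publish date, view count, favorites, shares, total bullet chats, comment count, description, category, video link, and cover image link.
Fetches 100 bullet chats, with sentiment scores, part-of-speech analysis, timestamps, and user IDs.
Retrieves 20 top comments, with like counts, sentiment scores, thread replies, member IDs, names, and comment timestamps.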
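
As a rough illustration of the bullet-chat step, the sketch below parses a small hand-written XML sample shaped like the file served at comment.bilibili.com/{cid}.xml. The sample data, the function name parse_danmaku, and the assumed layout of the p attribute (position 0 is the time within the video, position 4 the send timestamp, position 6 the sender hash) are illustrative assumptions, not code taken from the script.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical sample; real files are fetched per video via the 'cid' field.
SAMPLE_XML = """<i>
  <d p="12.5,1,25,16777215,1687766400,0,a1b2c3d4,1001">first danmaku</d>
  <d p="48.0,5,25,16711680,1687770000,0,0f9e8d7c,1002">second danmaku</d>
</i>"""


def parse_danmaku(xml_text, limit=100):
    """Return up to `limit` bullet-chat records as dictionaries."""
    records = []
    for node in ET.fromstring(xml_text).iter("d"):
        fields = node.get("p", "").split(",")
        if len(fields) < 7:
            continue  # skip entries that do not match the assumed layout
        records.append({
            "video_time_s": float(fields[0]),   # assumed: offset into the video
            "sent_at": datetime.fromtimestamp(int(fields[4]), tz=timezone.utc),
            "sender_hash": fields[6],           # assumed: hash of the sender's UID
            "text": node.text or "",
        })
        if len(records) >= limit:
            break
    return records


for record in parse_danmaku(SAMPLE_XML):
    print(record["video_time_s"], record["sender_hash"], record["text"])
```

The limit parameter mirrors the 100-item cap mentioned in the feature list; each record could then be passed to a sentiment scorer and written to a worksheet row.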

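Enhanced features:

Bullet chats: username, birthday, registration date, follower count, and following count (requires a cookie).
Comments: shows the commenter's IP location (through the web interface).
Exports the data to an Excel file, including median sentiment, word-frequency statistics, word clouds, and bar charts.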
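
Looking up a bullet-chat sender's profile first requires a numeric UID, while the bullet-chat data carries only a hash of it. The script's CRC32Checker class handles that reverse lookup with a table-based method. The sketch below shows the same idea in its simplest form, a plain brute-force search with the standard zlib module. It assumes the hash is the standard CRC32 of the decimal UID string written in hexadecimal; the helper names uid_hash and find_uid are hypothetical, and the search is far slower than the table-based approach for large UIDs.

```python
import zlib


def uid_hash(uid):
    """CRC32 of the decimal UID string, as lowercase hexadecimal."""
    return format(zlib.crc32(str(uid).encode("ascii")), "x")


def find_uid(sender_hash, limit=100_000):
    """Return the first UID below `limit` whose CRC32 matches, else None."""
    target = int(sender_hash, 16)
    for uid in range(1, limit):
        if zlib.crc32(str(uid).encode("ascii")) == target:
            return uid
    return None


# Round trip on a made-up UID: hash it, then recover it by brute force.
example_hash = uid_hash(12345)
print(example_hash, find_uid(example_hash))
```

Because CRC32 is only 32 bits wide, different UIDs can share a hash, so any recovered UID is best treated as a candidate rather than a confirmed identity.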

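How it works:

The script fetches JSON from the API and processes it into an Excel file. SnowNLP, ThuNLP (THULAC), and Jieba handle word segmentation, stop-word filtering, part-of-speech analysis, and word-frequency statistics; Matplotlib generates the charts.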
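
The stop-word filtering, word-frequency, and median-sentiment steps can be sketched with the standard library alone. In the sketch below the tokens stand in for the output of a segmenter such as Jieba and the scores stand in for per-comment sentiment values; both are hand-written examples, and the stop-word list and the name word_frequencies are assumptions rather than the script's own.

```python
from collections import Counter
from statistics import median

# Hypothetical stop-word list; the script's actual list is not shown here.
STOP_WORDS = {"的", "了", "是", "啊"}


def word_frequencies(tokens, stop_words=STOP_WORDS, top_n=10):
    """Count tokens after dropping stop words and blank entries."""
    kept = [t for t in tokens if t.strip() and t not in stop_words]
    return Counter(kept).most_common(top_n)


# Hand-written tokens, as a segmenter might return them.
tokens = ["这个", "视频", "的", "弹幕", "了", "弹幕", "视频", "弹幕"]
# Made-up sentiment scores in the range 0 to 1.
scores = [0.91, 0.15, 0.62, 0.48]

print(word_frequencies(tokens))
print(median(scores))
```

The (word, count) pairs map directly onto the two-column worksheet that get_word_cloud and get_word_chart read back, and the median gives a single summary value that is less sensitive to a few extreme scores than the mean.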

Quick Start

(Windows users can use pip and python. Mac users use pip3 and python3 by default.)

Script source code: GitHub repository.

Prerequisites:
Install the required libraries:

pip3 install --no-cache-dir -r https://ghproxy.com/https://github.com/Excalibra/scripts/blob/main/d-txt/requirements.txt

Then run the script (online):

python3 -c "$(curl -fsSL https://ghproxy.com/https://github.com/Excalibra/scripts/blob/main/d-python/get_bv_baseinfo.py)"

import json
import time
import requests
import os
from datetime import datetime
import re
from bs4 import BeautifulSoup
from openpyxl import Workbook
from openpyxl.styles import Alignment, Font
from snownlp import SnowNLP
import statistics
import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import platform
import thulac
import matplotlib.font_manager as fm
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By


'''

# Reference Links

## General

Regex: https://regex101.com/
Zhihu - Two ways to obtain Bilibili video bullet comments using Python: https://zhuanlan.zhihu.com/p/609154366
Juejin - Parsing Bilibili video bullet comments: https://juejin.cn/post/7137928570080329741
CSDN - Bilibili historical bullet comment crawler: https://blog.csdn.net/sinat_18665801/article/details/104519838
CSDN - How to write a Bilibili bullet comment crawler: https://blog.csdn.net/bigbigsman/article/details/78639053?utm_source=app
Bilibili - Bilibili bullet comment notes: https://www.bilibili.com/read/cv5187469/
Bilibili third-party API: https://www.bookstack.cn/read/BilibiliAPIDocs/README.md

## Reverse Lookup by UID

https://github.com/esterTion/BiliBili_crc2mid
https://github.com/cwuom/GetDanmuSender/blob/main/main.py
https://github.com/Aruelius/crc32-crack

## User Basic Information

https://api.bilibili.com/x/space/acc/info?mid=298220126
https://github.com/ria-klee/bilibili-uid
https://github.com/SocialSisterYi/bilibili-API-collect/blob/master/docs/user/space.md

## Comments

https://www.bilibili.com/read/cv10120255/
https://github.com/SocialSisterYi/bilibili-API-collect/blob/master/docs/comment/readme.md

## JSON

https://json-schema.apifox.cn
https://bbs.huaweicloud.com/blogs/279515
https://www.cnblogs.com/mashukui/p/16972826.html

## Cookie

https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Cookies

## Unpacking

https://www.cnblogs.com/will-wu/p/13251545.html
https://www.w3schools.com/python/python_tuples.asp

'''

class BilibiliAPI:
    @staticmethod
    # Parse video link basic information JSON and return it in JSON format
    def get_bv_json(video_url):
        video_id = re.findall(r'BV\w+', video_url)[0]
        api_url = f'https://api.bilibili.com/x/web-interface/view?bvid={video_id}'
        bv_json = requests.get(api_url).json()
        return bv_json

    @staticmethod
    # Build the URL of the bullet-comment XML from the 'cid' field in the JSON (the XML itself is not downloaded here)
    def get_danmu_xml(bv_json):
        cid = bv_json['data']["cid"]
        api_url = f'https://comment.bilibili.com/{cid}.xml'
        danmu_xml = api_url
        return danmu_xml

    @staticmethod
    # Parse video link comments JSON using the 'aid' field in JSON
    def get_comment_json(bv_json):
        aid = bv_json['data']["aid"]
        api_url = f'https://api.bilibili.com/x/v2/reply/main?next=1&type=1&oid={aid}'
        comment_json = requests.get(api_url).json()
        return comment_json

    @staticmethod
    # Enhanced parsing of video link comments JSON using the 'aid' field in JSON
    def get_comment_json_to_webui(bv_json):
        aid = bv_json['data']["aid"]
        api_url = f'https://api.bilibili.com/x/v2/reply/main?next=1&type=1&oid={aid}'

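        # Determine the current operating system type
        if platform.system() == "Windows":
            # Windows platform: let Selenium locate the Chrome driver itself
            driver = webdriver.Chrome()
        else:
            # Other platforms: Selenium 4 takes the driver path through a Service object
            from selenium.webdriver.chrome.service import Service
            driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))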

        # Provide login time
        print("Provide 45 seconds for Bilibili login")
        time.sleep(45)

        # Open the link
        driver.get(api_url)

        # Provide view effect time
        print("Provide 15 seconds to check the effects")
        time.sleep(15)

        # Find the <pre> element
        pre_element = driver.find_element(By.TAG_NAME, 'pre')

        # Get the text content of the element
        text_content = pre_element.text

        # Close WebDriver
        driver.quit()

        return text_content

    @staticmethod
    # Traverse user information and return basic parameters, preparing for XLSX write-in
    def get_user_card(mid, cookies):
        api_url = f'https://account.bilibili.com/api/member/getCardByMid?mid={mid}'
        try:
            response = requests.get(api_url, cookies=cookies)
            user_card_json = response.json()
        except json.JSONDecodeError:
            return {"error": "Failed to parse JSON. Ensure a good network environment. Too many API calls might trigger restrictions; try again later."}

        if 'message' in user_card_json:
            message = user_card_json['message']
            if 'request blocked' in message or 'frequent requests' in message:
                return {"warning": "Ensure a good network environment. Too many API calls might trigger restrictions; try again later."}

        return user_card_json

class CRC32Checker:
    '''
    CRC32 cracking
    Source: https://github.com/Aruelius/crc32-crack
    Author: Aruelius
    Note: This section has been slightly adjusted and encapsulated as a class for easier use.
    '''

    CRCPOLYNOMIAL = 0xEDB88320
    crctable = [0 for x in range(256)]

    def __init__(self):
        self.create_table()

    def create_table(self):
        # Create a CRC table for quick CRC value computation
        for i in range(256):
            crcreg = i
            for _ in range(8):
                if (crcreg & 1) != 0:
                    crcreg = self.CRCPOLYNOMIAL ^ (crcreg >> 1)
                else:
                    crcreg = crcreg >> 1
            self.crctable[i] = crcreg

    def crc32(self, string):
        # Compute the CRC32 value for the given string
        crcstart = 0xFFFFFFFF
        for i in range(len(str(string))):
            index = (crcstart ^ ord(str(string)[i])) & 255
            crcstart = (crcstart >> 8) ^ self.crctable[index]
        return crcstart

    def crc32_last_index(self, string):
        # Compute the last character CRC table index for a given string
        crcstart = 0xFFFFFFFF
        for i in range(len(str(string))):
            index = (crcstart ^ ord(str(string)[i])) & 255
            crcstart = (crcstart >> 8) ^ self.crctable[index]
        return index

    def get_crc_index(self, t):
        # Find the index in the CRC table corresponding to the highest byte value
        for i in range(256):
            if self.crctable[i] >> 24 == t:
                return i
        return -1

    def deep_check(self, i, index):
        # Deep check based on index and previous CRC32 values to verify the assumption
        string = ""
        tc = 0x00
        hashcode = self.crc32(i)
        tc = hashcode & 0xff ^ index[2]
        if not (tc <= 57 and tc >= 48):
            return [0]
        string += str(tc - 48)
        hashcode = self.crctable[index[2]] ^ (hashcode >> 8)
        tc = hashcode & 0xff ^ index[1]
        if not (tc <= 57 and tc >= 48):
            return [0]
        string += str(tc - 48)
        hashcode = self.crctable[index[1]] ^ (hashcode >> 8)
        tc = hashcode & 0xff ^ index[0]
        if not (tc <= 57 and tc >= 48):
            return [0]
        string += str(tc - 48)
        hashcode = self.crctable[index[0]] ^ (hashcode >> 8)
        return [1, string]

    def main(self, string):
        # Main function to compute and validate CRC32 for the given string
        index = [0 for x in range(4)]
        i = 0
        ht = int(f"0x{string}", 16) ^ 0xffffffff
        for i in range(3, -1, -1):
            index[3-i] = self.get_crc_index(ht >> (i*8))
            snum = self.crctable[index[3-i]]
            ht ^= snum >> ((3-i)*8)
        for i in range(100000000):
            lastindex = self.crc32_last_index(i)
            if lastindex == index[3]:
                deepCheckData = self.deep_check(i, index)
                if deepCheckData[0]:
                    break
        else:
            # The loop finished without finding a match
            return -1
        return f"{i}{deepCheckData[1]}"


class Tools:
    @staticmethod
    # Get save path and format
    def get_save():
        return os.path.join(os.path.join(os.path.expanduser("~"), "Desktop"),
                            "Bilibili_Video_Analysis_{}.xlsx".format(datetime.now().strftime('%Y-%m-%d')))

    @staticmethod
    # Format timestamp
    def format_timestamp(timestamp):
        dt_object = datetime.fromtimestamp(timestamp)
        formatted_time = dt_object.strftime("%Y-%m-%d %H:%M:%S")
        return formatted_time

    @staticmethod
    # Calculate sentiment score
    def calculate_sentiment_score(text):
        s = SnowNLP(text)
        sentiment_score = s.sentiments
        return sentiment_score

    @staticmethod
    # Generate a word cloud
    def get_word_cloud(sheet_name: str, workbook: Workbook):
        sheet = workbook[sheet_name]

        # Read frequency data
        words = []
        frequencies = []
        for row in sheet.iter_rows(min_row=2, values_only=True):
            words.append(row[0])
            frequencies.append(row[1])

        system = platform.system()

        if system == 'Darwin':  # macOS
            font_path = '/System/Library/Fonts/STHeiti Light.ttc'
        elif system == 'Windows':
            font_path = 'C:/Windows/Fonts/simhei.ttf'
        else:  # Other OS
            font_path = 'simhei.ttf'

        wordcloud = WordCloud(background_color='white', max_words=100, font_path=font_path)
        word_frequency = dict(zip(words, frequencies))
        wordcloud.generate_from_frequencies(word_frequency)

        plt.imshow(wordcloud, interpolation='bilinear')
        plt.axis('off')
        plt.show()

    @staticmethod
    # Generate horizontal statistics chart
    def get_word_chart(sheet_name: str, workbook):
        sheet = workbook[sheet_name]

        words = []
        frequencies = []
        for row in sheet.iter_rows(min_row=2, values_only=True):
            words.append(row[0])
            frequencies.append(row[1])

        system = platform.system()

        if system == 'Darwin':  
            font_path = '/System/Library/Fonts/STHeiti Light.ttc'
        elif system == 'Windows':
            font_path = 'C:/Windows/Fonts/simhei.ttf'
        else:  
            font_path = 'simhei.ttf'

        custom_font = fm.FontProperties(fname=font_path)

        fig, ax = plt.subplots()
        ax.barh(words, frequencies)
        ax.set_xlabel("Frequency", fontproperties=custom_font)
        ax.set_ylabel("Words", fontproperties=custom_font)

        plt.yticks(fontproperties=custom_font)

        plt.show()

    @staticmethod
    def get_user_info_by_card(user_card_json):
        info = {
            'name': "N/A", 'birthday': "N/A", 'regtime': "N/A",
            'fans': "N/A", 'friend': "N/A"
        }

        try:
            info['name'] = user_card_json['card']['name']
            info['birthday'] = user_card_json['card']['birthday']
            info['regtime'] = Tools.format_timestamp(int(user_card_json['card']['regtime']))
            info['fans'] = user_card_json['card']['fans']
            info['friend'] = user_card_json['card']['friend']
        except KeyError:
            pass

        return tuple(info.values())

class BilibiliExcel:
    @staticmethod
    # Write video basic information
    def write_base_info(workbook, bv_json):
        sheet = workbook.create_sheet(title="Video Info")
        headers = ["Video Title", "Author", "Publish Time", "Views", "Favorites", "Shares", "Total Bullet Comments",
                   "Comments Count", "Video Description", "Category", "Video Link", "Thumbnail Link"]
        sheet.append(headers)

        data = [bv_json["data"]["title"],
                bv_json["data"]["owner"]["name"],
                Tools.format_timestamp(bv_json["data"]["pubdate"]),
                bv_json["data"]["stat"]["view"],
                bv_json["data"]["stat"]["favorite"],
                bv_json["data"]["stat"]["share"],
                bv_json["data"]["stat"]["danmaku"],
                bv_json["data"]["stat"]["reply"],
                bv_json["data"]["desc"],
                bv_json["data"]["tname"],
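                # Rebuild the link from the response instead of relying on a global variable
                f'https://www.bilibili.com/video/{bv_json["data"]["bvid"]}',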
                bv_json["data"]["pic"]]

        sheet.append(data)

    @staticmethod
    def save_workbook(workbook):
        workbook.save(Tools.get_save())

class PrintInfo:
    # Print basic information
    @staticmethod
    def base_message():
        if 'Windows' == platform.system():
            os.system('cls')
        else:
            os.system('clear')

        text = '''
        ************************************

        Bilibili Video Analysis v2023.6.26
        Author: Github.com/hoochanlon
        Project URL: https://github.com/hoochanlon/scripts

        Features:
        1. Analyze and visualize Bilibili video data.

        Disclaimer: For research and learning purposes only.

        ************************************
        '''
        print(text.center(50, ' '))

if __name__ == '__main__':
    PrintInfo.base_message()

    while True:
        video_url = input("Paste the Bilibili video link: ")
        if re.match(r'.*BV\w+', video_url):
            break
        else:
            print("Invalid link format. Please re-enter.")

    bv_json = BilibiliAPI.get_bv_json(video_url)
    workbook = Workbook()
    workbook.remove(workbook.active)
    BilibiliExcel.write_base_info(workbook, bv_json)
    BilibiliExcel.save_workbook(workbook)

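Usage notes:

To simplify cookie input, you can enter a key=value; pair, for example "a=a;", to skip steps you do not need.
Viewing IP locations requires logging in to your Bilibili account through the web driver.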
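
A cookie string in the key=value; format can be turned into the dictionary that requests.get accepts through its cookies parameter, which is how get_user_card receives it. The following is a minimal sketch of that conversion; the function name parse_cookie_string and the sample values are hypothetical, and the script's own input handling may differ.

```python
def parse_cookie_string(raw):
    """Turn 'k1=v1; k2=v2;' into {'k1': 'v1', 'k2': 'v2'}."""
    cookies = {}
    for part in raw.split(";"):
        if "=" not in part:
            continue  # ignore empty pieces, such as the one after a trailing ';'
        key, value = part.split("=", 1)  # split once so values may contain '='
        if key.strip():
            cookies[key.strip()] = value.strip()
    return cookies


print(parse_cookie_string("a=a;"))
print(parse_cookie_string("SESSDATA=example; bili_jct=example2"))
```

With the placeholder input "a=a;" the result is a one-entry dictionary, which lets the cookie-dependent lookups run without real credentials (they would then simply return no user details).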
