{
  "issues": [
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/200",
      "id": 3948654796,
      "node_id": "I_kwDOO3Bfkc7rW7DM",
      "number": 200,
      "title": "[Feature Request]:",
      "user": {
        "login": "ayushmittalde",
        "id": 183465650,
        "node_id": "U_kgDOCu92sg",
        "avatar_url": "https://avatars.githubusercontent.com/u/183465650?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/ayushmittalde",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2026-02-16T16:53:06Z",
      "updated_at": "2026-02-16T16:53:11Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [x] I have searched the existing feature request and this feature request is not already filed.\n- [x] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\nHi team \n\nOver the past week, I’ve been working with the repository and successfully configured it to use Azure OpenAI endpoints instead of the default OpenAI endpoint.\n\nThe setup works end-to-end and supports both chat models and embeddings. Since Azure OpenAI is widely used in enterprise and production environments, I thought it might be helpful to add an additional example in the examples folder demonstrating how to run the RAG pipeline with Azure configuration.\n\nMy plan would be to contribute:\n\n- A working Azure-based RAG example\n- A sample .env file showing the required environment variables\n\nClear setup instructions for Azure endpoints\n\nThis does not require any changes to the core library — it’s purely an example configuration to make Azure deployments easier for the community.\n\nIf this sounds useful, I would be happy to open a PR. Let me know your thoughts!\n\nThanks for the great project\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/200/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/184",
      "id": 3774642077,
      "node_id": "I_kwDOO3Bfkc7g_Hed",
      "number": 184,
      "title": "[Bug]: Getting 'ascii' codec can't encode character '\\u2018' in position 7: ordinal not in range(128) error with lightrag_openai_demo.py example",
      "user": {
        "login": "ahmedwaqar",
        "id": 21281602,
        "node_id": "MDQ6VXNlcjIxMjgxNjAy",
        "avatar_url": "https://avatars.githubusercontent.com/u/21281602?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/ahmedwaqar",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2026-01-01T13:32:21Z",
      "updated_at": "2026-01-01T13:32:21Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [ ] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nGetting the following error when running `lightrag_openai_demo.py` example on M1 Mac:\n\n> NFO: [] Created new empty graph file: ./dickens/graph_chunk_entity_relation.graphml\n> INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': './dickens/vdb_entities.json'} 0 data\n> INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': './dickens/vdb_relationships.json'} 0 data\n> INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': './dickens/vdb_chunks.json'} 0 data\n> INFO: [] Process 72129 KV load full_docs with 0 records\n> INFO: [] Process 72129 KV load text_chunks with 0 records\n> INFO: [] Process 72129 KV load full_entities with 0 records\n> INFO: [] Process 72129 KV load full_relations with 0 records\n> INFO: [] Process 72129 KV load entity_chunks with 0 records\n> INFO: [] Process 72129 KV load relation_chunks with 0 records\n> INFO: [] Process 72129 KV load llm_response_cache with 0 records\n> INFO: [] Process 72129 doc status load doc_status with 0 records\n> INFO: Embedding func: 8 new workers initialized (Timeouts: Func: 30s, Worker: 60s, Health Check: 75s)\n> ERROR: Embedding func: Error in decorated function for task 4575884016_223046.773646959: 'ascii' codec can't encode character '\\u2018' in position 7: ordinal not in range(128)\n> An error occurred: 'ascii' codec can't encode character '\\u2018' in position 7: ordinal not in range(128)\n> INFO: Successfully finalized 12 storages\n> \n> Done!\n\n\n### Steps to reproduce\n\nMachine: Macbook pro Apple M1\nMemory: 16GB\nMacos Tahoe\n\n`python3 ./example/lightrag_openai_demo.py`\n\n### Expected Behavior\n\nPrint query results on ./book.txt on the terminal\n\n### LightRAG Config 
Used\n\n# Paste your config here\n* Successful installation of lightrag \n* Created python3 virtual environment with uv\n\n\n### Logs and screenshots\n\nINFO: [] Created new empty graph file: ./dickens/graph_chunk_entity_relation.graphml\nINFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': './dickens/vdb_entities.json'} 0 data\nINFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': './dickens/vdb_relationships.json'} 0 data\nINFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': './dickens/vdb_chunks.json'} 0 data\nINFO: [] Process 72697 KV load full_docs with 0 records\nINFO: [] Process 72697 KV load text_chunks with 0 records\nINFO: [] Process 72697 KV load full_entities with 0 records\nINFO: [] Process 72697 KV load full_relations with 0 records\nINFO: [] Process 72697 KV load entity_chunks with 0 records\nINFO: [] Process 72697 KV load relation_chunks with 0 records\nINFO: [] Process 72697 KV load llm_response_cache with 0 records\nINFO: [] Process 72697 doc status load doc_status with 0 records\nINFO: Embedding func: 8 new workers initialized (Timeouts: Func: 30s, Worker: 60s, Health Check: 75s)\nERROR: Embedding func: Error in decorated function for task 4608291568_223550.498207375: 'ascii' codec can't encode character '\\u2018' in position 7: ordinal not in range(128)\nAn error occurred: 'ascii' codec can't encode character '\\u2018' in position 7: ordinal not in range(128)\nINFO: Successfully finalized 12 storages\n\nDone!\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System: macOS Tahoe\n- Python Version: 3.14.2\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/184/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/183",
      "id": 3772764900,
      "node_id": "I_kwDOO3Bfkc7g39Lk",
      "number": 183,
      "title": "[Question]: Does Parsers (Docling) support processing files from URL",
      "user": {
        "login": "AnSaradar",
        "id": 114799056,
        "node_id": "U_kgDOBtex0A",
        "avatar_url": "https://avatars.githubusercontent.com/u/114799056?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/AnSaradar",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-12-31T13:00:28Z",
      "updated_at": "2026-02-10T00:34:34Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nDocling itself has a feature to parse  the documents from a download / public URL?\nBut after checking the parser codebase, I didnt find any functions to handle url, all the functions handle only paths (local docs)\n\nDoes RAGAnything actually supports that?\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/183/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": [
        195
      ]
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/181",
      "id": 3763852239,
      "node_id": "I_kwDOO3Bfkc7gV9PP",
      "number": 181,
      "title": "[Question]:实验结果是否进行了投票",
      "user": {
        "login": "zcc1146874411",
        "id": 213924223,
        "node_id": "U_kgDODMA5fw",
        "avatar_url": "https://avatars.githubusercontent.com/u/213924223?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/zcc1146874411",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-12-26T18:37:52Z",
      "updated_at": "2025-12-26T18:37:52Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [ ] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\n_No response_\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/181/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/178",
      "id": 3750709193,
      "node_id": "I_kwDOO3Bfkc7fj0fJ",
      "number": 178,
      "title": "[Feature Request]: PaddleOCR available for parser",
      "user": {
        "login": "shkim4u",
        "id": 2314727,
        "node_id": "MDQ6VXNlcjIzMTQ3Mjc=",
        "avatar_url": "https://avatars.githubusercontent.com/u/2314727?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/shkim4u",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-12-21T04:07:31Z",
      "updated_at": "2026-02-16T13:21:28Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [x] I have searched the existing feature request and this feature request is not already filed.\n- [x] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\nAs another SOTA-level parser PaddleOCR out there, it would be great if RAG-Anything support it as one of available parser options in addition to \"minerU\" and \"docling\"\n\n### Additional Context\n\nHere is the repository URL of PaddleOCR:\nhttps://github.com/PaddlePaddle/PaddleOCR",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/178/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/176",
      "id": 3725824478,
      "node_id": "I_kwDOO3Bfkc7eE5He",
      "number": 176,
      "title": "[Question]:",
      "user": {
        "login": "hexmSeeU",
        "id": 157289563,
        "node_id": "U_kgDOCWAMWw",
        "avatar_url": "https://avatars.githubusercontent.com/u/157289563?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/hexmSeeU",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-12-13T11:09:33Z",
      "updated_at": "2025-12-13T11:09:33Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nHi there, I met some problems when conducting following file processing code:\n\n```python\nfor data_file in data_files:\n    index = data_file.split(\"/\")[-1].split(\".\")[0]\n    working_dir = os.path.join(working_base_dir, f\"rag_storage_{index}\")\n    os.makedirs(working_dir, exist_ok=True)\n\n    rag_config = RAGAnythingConfig(\n        working_dir=working_dir,\n        parser=\"mineru\",  # Parser selection: mineru or docling\n        parse_method=\"auto\",  # Parse method: auto, ocr, or txt\n        enable_image_processing=True,\n        enable_table_processing=True,\n        enable_equation_processing=True,\n    )\n\n\n    # Initialize RAGAnything\n    rag = RAGAnything(\n        config=rag_config,\n        llm_model_func=llm_model_func,\n        vision_model_func=vision_model_func,\n        embedding_func=embedding_func,\n    )\n\n\n    # Process a document\n    TEST_DATA_PATHs = [data_file]\n    for file in TEST_DATA_PATHs:\n        try:\n            await rag.process_document_complete(\n                file_path=file,\n                output_dir=output_dir,\n                parse_method=\"auto\",\n                backend=\"vlm-vllm-engine\",\n                source=\"local\",\n            )\n        except:\n            continue\n```\n\nI found that the cache information from the first processed file is automatically carried over into the cache of the second processed file, and so on. In addition, the following log is printed when processing the first file, but it is not printed when processing the subsequent files. 
\n\n```bash\nlightrag - INFO - [_] Process 2051 KV load full_docs with 0 records\nlightrag - INFO - [_] Process 2051 KV load text_chunks with 0 records\nlightrag - INFO - [_] Process 2051 KV load full_entities with 0 records\nlightrag - INFO - [_] Process 2051 KV load full_relations with 0 records\nlightrag - INFO - [_] Process 2051 KV load entity_chunks with 0 records\nlightrag - INFO - [_] Process 2051 KV load relation_chunks with 0 records\nlightrag - INFO - [_] Process 2051 KV load llm_response_cache with 0 records\nlightrag - INFO - [_] Process 2051 doc status load doc_status with 0 records\nlightrag - INFO - [_] Process 2051 KV load parse_cache with 16 records\n```\n\n\n\nHow should I modify my code so that these files do not interfere with each other during processing?\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/176/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/173",
      "id": 3699178529,
      "node_id": "I_kwDOO3Bfkc7cfPwh",
      "number": 173,
      "title": "[Question]: How to retrieval multimodal content path",
      "user": {
        "login": "hexmSeeU",
        "id": 157289563,
        "node_id": "U_kgDOCWAMWw",
        "avatar_url": "https://avatars.githubusercontent.com/u/157289563?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/hexmSeeU",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-12-05T14:23:39Z",
      "updated_at": "2025-12-06T22:11:44Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nHi there!\nI'd like to ask whether I can get the multimodal content path (e.g. image_path, table_image_path) based on the constructed knowledge graph? Although I found that a node in the KG contains the following keys:\nentity_id, entity_type, description, etc, I didn't find a key that stores the image path of the image if the content type is image.\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/173/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/172",
      "id": 3697799274,
      "node_id": "I_kwDOO3Bfkc7cZ_Bq",
      "number": 172,
      "title": "[Bug]:The process got stuck at rag.process_document_complete due to network issues.",
      "user": {
        "login": "xtanitfy",
        "id": 13915015,
        "node_id": "MDQ6VXNlcjEzOTE1MDE1",
        "avatar_url": "https://avatars.githubusercontent.com/u/13915015?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/xtanitfy",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 2,
      "created_at": "2025-12-05T07:24:04Z",
      "updated_at": "2025-12-05T07:41:15Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [ ] I have searched the existing issues and this bug is not already filed.\n- [ ] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nSince it's deployed in a local area network (LAN), can it be switched to offline mode?\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/172/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/168",
      "id": 3668073382,
      "node_id": "I_kwDOO3Bfkc7aolum",
      "number": 168,
      "title": "[Question]:[MinerU] OSError: Consistency check failed: file should be of size 713217212 but has size 776131772 (model.safetensors).",
      "user": {
        "login": "zkailinzhang",
        "id": 10251153,
        "node_id": "MDQ6VXNlcjEwMjUxMTUz",
        "avatar_url": "https://avatars.githubusercontent.com/u/10251153?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/zkailinzhang",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-11-26T15:47:05Z",
      "updated_at": "2025-11-26T15:47:05Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nINFO: Starting document parsing: /appdata/rag/LightRAG/other_ok/Bad_prints.pdf\nINFO: Using mineru parser with method: auto\nINFO: Detected PDF file, using parser for PDF...\nERROR:root:[MinerU] 2025-11-26 23:34:20.355 | ERROR    | mineru.cli.client:parse_doc:211 - Consistency check failed: file should be of size 713217212 but has size 776131772 (model.safetensors).\nERROR:root:[MinerU] raise EnvironmentError(\nERROR:root:[MinerU] OSError: Consistency check failed: file should be of size 713217212 but has size 776131772 (model.safetensors).\nERROR: Mineru command failed: Mineru command failed with return code 0: ['2025-11-26 23:34:20.355 | ERROR    | mineru.cli.client:parse_doc:211 - Consistency check failed: file should be of size 713217212 but has size 776131772 (model.safetensors).', 'raise EnvironmentError(', 'OSError: Consistency check failed: file should be of size 713217212 but has size 776131772 (model.safetensors).']\nAn error occurred: Mineru command failed with return code 0: ['2025-11-26 23:34:20.355 | ERROR    | mineru.cli.client:parse_doc:211 - Consistency check failed: file should be of size 713217212 but has size 776131772 (model.safetensors).', 'raise EnvironmentError(', 'OSError: Consistency check failed: file should be of size 713217212 but has size 776131772 (model.safetensors).']\n\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/168/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/167",
      "id": 3663162953,
      "node_id": "I_kwDOO3Bfkc7aV25J",
      "number": 167,
      "title": "[Bug]:run LightRAG/examples/raganything_example.py",
      "user": {
        "login": "zkailinzhang",
        "id": 10251153,
        "node_id": "MDQ6VXNlcjEwMjUxMTUz",
        "avatar_url": "https://avatars.githubusercontent.com/u/10251153?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/zkailinzhang",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-11-25T13:51:07Z",
      "updated_at": "2025-11-25T13:53:26Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nINFO: Reset 1 documents from PROCESSING/FAILED to PENDING status\nINFO: Processing 1 document(s)\nINFO: Extracting stage 1/1: Bad_prints.pdf\nINFO: Processing d-id: doc-b0c8009df6258966f19eb454deb422be\nINFO: Embedding func: 8 new workers initialized (Timeouts: Func: 30s, Worker: 60s, Health Check: 75s)\nERROR: Traceback (most recent call last):\nFile \"/home/zhangkailin/.conda/envs/rag/lib/python3.12/site-packages/lightrag/lightrag.py\", line 1830, in process_document\nawait asyncio.gather(*first_stage_tasks)\nFile \"/home/zhangkailin/.conda/envs/rag/lib/python3.12/site-packages/lightrag/kg/nano_vector_db_impl.py\", line 131, in upsert\nresults = client.upsert(datas=list_data)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nFile \"/home/zhangkailin/.conda/envs/rag/lib/python3.12/site-packages/nano_vectordb/dbs.py\", line 116, in upsert\nself.__storage[\"matrix\"] = np.vstack([self.__storage[\"matrix\"], new_matrix])\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nFile \"/home/zhangkailin/.conda/envs/rag/lib/python3.12/site-packages/numpy/core/shape_base.py\", line 289, in vstack\nreturn _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3072 and the array at index 1 has size 1024\n\nERROR: Failed to extract document 1/1: Bad_prints.pdf\nINFO: Enqueued document processing pipeline stopped\nINFO: Text content insertion complete\nINFO: Starting multimodal content processing...\nINFO: Starting to process 82 multimodal content items\nERROR: Error generating image description: 'hashing_kv'\nERROR: Error generating discarded description: 
'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating image description: 'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nINFO: Multimodal chunk generation progress: 8/82 (9.8%)\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating image description: 'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating image description: 'hashing_kv'\nERROR: Error generating image description: 'hashing_kv'\nINFO: Multimodal chunk generation progress: 16/82 (19.5%)\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error generating image description: 'hashing_kv'\nERROR: Error generating discarded description: 'hashing_kv'\nINFO: Multimodal chunk generation progress: 24/82 (29.3%)\nERROR: Error generating discarded description: 'hashing_kv'\nERROR: Error ge\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/167/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/164",
      "id": 3651493163,
      "node_id": "I_kwDOO3Bfkc7ZpV0r",
      "number": 164,
      "title": "[Question]:",
      "user": {
        "login": "a-rookie-create",
        "id": 121660490,
        "node_id": "U_kgDOB0BkSg",
        "avatar_url": "https://avatars.githubusercontent.com/u/121660490?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/a-rookie-create",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-11-21T12:27:39Z",
      "updated_at": "2025-11-21T12:27:39Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nI have noticed that every time I run the end-to-end example, the VLM reprocesses the documents and generates a new KG. This consumes a lot of time and tokens. Is it possible to directly import an existing KG instead?\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/164/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/159",
      "id": 3594237864,
      "node_id": "I_kwDOO3Bfkc7WO7eo",
      "number": 159,
      "title": "[Bug]: Analyzed data of tables not complete",
      "user": {
        "login": "AbdelFatah22899",
        "id": 78560577,
        "node_id": "MDQ6VXNlcjc4NTYwNTc3",
        "avatar_url": "https://avatars.githubusercontent.com/u/78560577?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/AbdelFatah22899",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-11-06T05:59:01Z",
      "updated_at": "2025-11-06T05:59:01Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [ ] I have searched the existing issues and this bug is not already filed.\n- [ ] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nWhen running table or chunk analysis using a reasoning-enabled model (such as qwen2.5-think, deepseek-r1, or similar), the system sometimes stores only the model’s internal “thinking” text (<think> ... </think>) instead of the real analysis result.\n\n```\n{\n  \"content\": \"<think>\\nOkay, let's see. The user wants me to analyze this table and provide a JSON response with sp...\"\n}\n```\n\nThe final structured output or description (which should follow the reasoning) is missing completely.\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/159/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/156",
      "id": 3578179138,
      "node_id": "I_kwDOO3Bfkc7VRq5C",
      "number": 156,
      "title": "[Feature Request]: Support incremental scanning of a folder, updating changed files based on date and MD5",
      "user": {
        "login": "ghost",
        "id": 10137,
        "node_id": "MDQ6VXNlcjEwMTM3",
        "avatar_url": "https://avatars.githubusercontent.com/u/10137?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/ghost",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-11-01T16:27:07Z",
      "updated_at": "2025-11-01T16:27:07Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [ ] I have searched the existing feature request and this feature request is not already filed.\n- [ ] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\nSupport incremental scanning of a folder, updating changed files based on modification date and MD5 checksum.\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/156/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/154",
      "id": 3573716385,
      "node_id": "I_kwDOO3Bfkc7VApWh",
      "number": 154,
      "title": "[Feature Request]: Enable Cache Data Insertion for New Graph Database(or other database)",
      "user": {
        "login": "Jarod-Leo",
        "id": 178803497,
        "node_id": "U_kgDOCqhTKQ",
        "avatar_url": "https://avatars.githubusercontent.com/u/178803497?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/Jarod-Leo",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-10-31T07:36:44Z",
      "updated_at": "2025-10-31T11:29:37Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [x] I have searched the existing feature request and this feature request is not already filed.\n- [x] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\nCurrently, when switching to a new graph database (e.g., from Neo4j to another provider), the system fails to insert cached data into the new database. This occurs because:\n\nThe `_process_multimodal_content` method in `Processors.py` skips the insertion step if it detects cached data (the doc_id exists in the cache).\nThere is no mechanism to re-insert cached data into the new database.\n\n### Additional Context\n\n```python\n   async def _process_multimodal_content(...):\n       ...\n   \n       try:\n            existing_doc_status = await self.lightrag.doc_status.get_by_id(doc_id)\n            if existing_doc_status: \n                # Check if multimodal content is already processed\n                multimodal_processed = existing_doc_status.get(\n                    \"multimodal_processed\", False\n                )\n\n                if multimodal_processed:\n                    self.logger.info(\n                        f\"Document {doc_id} multimodal content is already processed\"\n                    )\n                    return\n``` ",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/154/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/151",
      "id": 3572701998,
      "node_id": "I_kwDOO3Bfkc7U8xsu",
      "number": 151,
      "title": "[Feature Request]: [Question]: Bring-Your-Own-Parser beyond MinerU and Docling",
      "user": {
        "login": "semmyk-research",
        "id": 113531105,
        "node_id": "U_kgDOBsRY4Q",
        "avatar_url": "https://avatars.githubusercontent.com/u/113531105?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/semmyk-research",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 2,
      "created_at": "2025-10-30T22:32:44Z",
      "updated_at": "2026-01-14T04:02:35Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [x] I have searched the existing feature request and this feature request is not already filed.\n- [ ] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\nAs we understand it, RAG-Anything defaults to `MinerU` and allows `Docling`, apparently through the `MineruParser` and `DoclingParser` classes. How could we, or would it be possible to, plug in a third-party PDF/document parser such as [Marker-pdf](https://github.com/datalab-to/marker) or others?\n\n\n\n### Additional Context\n\n[background]\nWe are exploring moving to RAG-Anything from our current LightRAG implementation. The intent is to keep our current Marker pipeline.   \nI have worked with [Marker-pdf](https://github.com/datalab-to/marker) and am in the process of embedding our Marker-based [GitHub: ParserPDF](https://github.com/semmyk-research/parserPDF) into our LightRAG implementation [GitHub: SemmyKG](https://github.com/semmyk-research/semmyKG).   ",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/151/reactions",
        "total_count": 2,
        "+1": 2,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/150",
      "id": 3561449220,
      "node_id": "I_kwDOO3Bfkc7UR2cE",
      "number": 150,
      "title": "[Feature Request]: Provide a container image for testing",
      "user": {
        "login": "Bodanel",
        "id": 7878846,
        "node_id": "MDQ6VXNlcjc4Nzg4NDY=",
        "avatar_url": "https://avatars.githubusercontent.com/u/7878846?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/Bodanel",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-10-28T12:53:31Z",
      "updated_at": "2025-12-16T05:50:56Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [x] I have searched the existing feature request and this feature request is not already filed.\n- [x] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\nContainer images would allow faster testing without intervention on the machine where this is deployed. They would also provide isolation and prevent conflicts with other frameworks and/or libraries installed on the host system.\n\n### Additional Context\n\nYou could also provide a Containerfile so that people who want to go this route can build their own containers.",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/150/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/149",
      "id": 3559395609,
      "node_id": "I_kwDOO3Bfkc7UKBEZ",
      "number": 149,
      "title": "[Question]: Can retrieval return images?",
      "user": {
        "login": "UncleFB",
        "id": 38581531,
        "node_id": "MDQ6VXNlcjM4NTgxNTMx",
        "avatar_url": "https://avatars.githubusercontent.com/u/38581531?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/UncleFB",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 2,
      "created_at": "2025-10-28T02:02:45Z",
      "updated_at": "2025-12-06T22:13:31Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nWhen querying the RAG, I would like the answer to include images in addition to text. Is this feature currently implemented?\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/149/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/146",
      "id": 3548036225,
      "node_id": "I_kwDOO3Bfkc7TeryB",
      "number": 146,
      "title": "[Question]: Are local models supported yet?",
      "user": {
        "login": "wangquanyue1994",
        "id": 208732419,
        "node_id": "U_kgDODHEBAw",
        "avatar_url": "https://avatars.githubusercontent.com/u/208732419?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/wangquanyue1994",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 6,
      "created_at": "2025-10-24T07:29:31Z",
      "updated_at": "2026-01-09T05:22:28Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nAre locally deployed models supported yet, e.g. Ollama or Hugging Face?\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/146/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/143",
      "id": 3542998678,
      "node_id": "I_kwDOO3Bfkc7TLd6W",
      "number": 143,
      "title": "[Question]: Why does the provided raganything_example.py have no reranker model configuration?",
      "user": {
        "login": "HZWHH",
        "id": 31908659,
        "node_id": "MDQ6VXNlcjMxOTA4NjU5",
        "avatar_url": "https://avatars.githubusercontent.com/u/31908659?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/HZWHH",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-10-23T03:29:30Z",
      "updated_at": "2025-10-23T03:29:30Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [ ] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\n_No response_\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/143/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/136",
      "id": 3494716052,
      "node_id": "I_kwDOO3Bfkc7QTSKU",
      "number": 136,
      "title": "[Question]: I built a RAG with Chinese data, but why is the retrieved content in English?",
      "user": {
        "login": "Typhoona",
        "id": 90813810,
        "node_id": "MDQ6VXNlcjkwODEzODEw",
        "avatar_url": "https://avatars.githubusercontent.com/u/90813810?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/Typhoona",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 2,
      "created_at": "2025-10-08T09:51:07Z",
      "updated_at": "2025-10-23T03:35:39Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nI built a RAG with Chinese data, and my queries are also in Chinese, but the retrieved content comes back in English. I would like the retrieved content to be presented in Chinese. How can I do that?\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/136/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/135",
      "id": 3493251636,
      "node_id": "I_kwDOO3Bfkc7QNso0",
      "number": 135,
      "title": "[Bug]: Warning: Failed to finalize RAGAnything storages: There is no current event loop in thread 'MainThread'.",
      "user": {
        "login": "cktang88",
        "id": 10319942,
        "node_id": "MDQ6VXNlcjEwMzE5OTQy",
        "avatar_url": "https://avatars.githubusercontent.com/u/10319942?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/cktang88",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 7,
      "created_at": "2025-10-07T22:40:37Z",
      "updated_at": "2026-01-06T07:42:19Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nI just ran the demo script from the README and got the warning in the issue title.\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/135/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/133",
      "id": 3485277251,
      "node_id": "I_kwDOO3Bfkc7PvRxD",
      "number": 133,
      "title": "[Feature Request]: Support remote MinerU instance",
      "user": {
        "login": "voycey",
        "id": 1065098,
        "node_id": "MDQ6VXNlcjEwNjUwOTg=",
        "avatar_url": "https://avatars.githubusercontent.com/u/1065098?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/voycey",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-10-05T23:32:24Z",
      "updated_at": "2025-10-05T23:32:24Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [x] I have searched the existing feature request and this feature request is not already filed.\n- [x] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\nMinerU (and docling) have the ability to be run on a remote server, this is particularly useful when you have GPU resources external to your main instance (for example Google Cloud Run GPU) - There are implementations for a MinerU API and Docling has a remote server now - supporting these should be a priority for the project as local inference for the OCR part of this is extremely slow.\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/133/reactions",
        "total_count": 8,
        "+1": 8,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/131",
      "id": 3471527570,
      "node_id": "I_kwDOO3Bfkc7O606S",
      "number": 131,
      "title": "[Question]: How does this have an MIT license when Mineru has an APGL license?",
      "user": {
        "login": "paperworksllc",
        "id": 225684812,
        "node_id": "U_kgDODXOtTA",
        "avatar_url": "https://avatars.githubusercontent.com/u/225684812?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/paperworksllc",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 2,
      "created_at": "2025-10-01T00:31:15Z",
      "updated_at": "2025-10-30T22:23:24Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nI would like to use this, but I am worried that because Mineru has an APGL license, I will have to use an APGL license or pay Mineru a licensing fee. \n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/131/reactions",
        "total_count": 1,
        "+1": 1,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/130",
      "id": 3470385457,
      "node_id": "I_kwDOO3Bfkc7O2eEx",
      "number": 130,
      "title": "[Feature Request]:multi-modal embedding model support",
      "user": {
        "login": "BukeLy",
        "id": 19304666,
        "node_id": "MDQ6VXNlcjE5MzA0NjY2",
        "avatar_url": "https://avatars.githubusercontent.com/u/19304666?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/BukeLy",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-09-30T17:24:13Z",
      "updated_at": "2025-09-30T17:24:13Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [x] I have searched the existing feature request and this feature request is not already filed.\n- [x] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\nRequested Feature:\nWe propose adding direct support for native multi-modal embedding models. These models (e.g., OpenAI's CLIP, Google's SigLIP, or other excellent open-source alternatives) are capable of encoding both images and text directly into a shared semantic vector space.\n\nBenefits of this feature:\n\nRicher Semantic Representation: The vector representation of an image would be extracted directly from its pixels, potentially capturing richer and more nuanced visual details and semantics that are difficult to express in words.\n\nHigher Retrieval Accuracy: For tasks like image-text matching or cross-modal retrieval (searching for text with an image or vice versa), a shared vector space generally provides more accurate and reliable similarity calculations.\n\nArchitectural Simplification & Potential Efficiency Gains: This would bypass the intermediate \"VLM-generates-description\" step, simplifying the data ingestion pipeline and potentially improving overall efficiency.\n\nGreater Flexibility and Future-Proofing: It would allow the community and users to integrate various state-of-the-art open-source or proprietary multi-modal embedding models, keeping the RAG-Anything framework at the cutting edge.\n\n### Additional Context\n\nImplementing this feature would likely require some adjustments to the core architecture. Here are some preliminary ideas for the development team's consideration:\n\nExtend the embedding_func Interface:\nThe current embedding_func interface primarily accepts a list of strings (List[str]). 
A new, more flexible interface would be needed to handle a mixed list of types, such as List[Union[str, PIL.Image.Image, Path]], allowing the function to process text and images differently.\n\nModify the Data Ingestion Pipeline (Processor & ModalProcessors):\nA new or parallel processing workflow would be necessary. In _process_multimodal_content, instead of calling vision_model_func to generate a description for an image, the system would pass the image data (or its path) directly to the new multi-modal embedding_func for vectorization. The logic within ImageModalProcessor would need to be adapted accordingly.\n\nAdjust Query and Retrieval Logic:\nDuring querying, the user's text query must also be vectorized by the same multi-modal model to ensure vector space consistency. When a retrieved vector represents an image, the system needs a mechanism to retrieve the original image path or data from its metadata. This would allow it to be used in the final answer generation step, for instance, by passing it to a VLM for analysis.\n\nAn ideal user workflow might look like this (pseudo-code):\n\nPython\n```\n# 1. User provides a multi-modal capable embedding_func\ndef my_multimodal_embed_func(inputs: List[Union[str, Image.Image]]):\n    # ... logic to call the native multi-modal model ...\n    return vectors\n\n# 2. A new parameter might be introduced during initialization\nrag = RAGAnything(\n    config=config,\n    embedding_func=EmbeddingFunc(func=my_multimodal_embed_func, ...),\n    embedding_mode='multimodal' # Perhaps a new config option to switch modes\n)\n\n# 3. 
RAG-Anything internally passes image objects to the embedding function\nawait rag.process_document_complete(\"my_document_with_images.pdf\")\n```\nImplementing this feature might have dependencies on or require modifications to the underlying LightRAG framework, but it would significantly enhance the capabilities of RAG-Anything, making it an even more powerful and forward-looking \"RAG for Anything\" solution.\n\nThank you for your hard work on this excellent project!",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/130/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/129",
      "id": 3467198522,
      "node_id": "I_kwDOO3Bfkc7OqUA6",
      "number": 129,
      "title": "[Question]:",
      "user": {
        "login": "mipmap4k",
        "id": 150869050,
        "node_id": "U_kgDOCP4UOg",
        "avatar_url": "https://avatars.githubusercontent.com/u/150869050?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/mipmap4k",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-09-30T01:35:32Z",
      "updated_at": "2025-09-30T01:35:32Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [ ] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\ntext parsing works well in English, but not so well in Russian. How can I fix this?\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/129/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/127",
      "id": 3461916384,
      "node_id": "I_kwDOO3Bfkc7OWKbg",
      "number": 127,
      "title": "[Question]: Can we leverage LightRAG features to delete content from the knowledge base?",
      "user": {
        "login": "klehmann",
        "id": 4291861,
        "node_id": "MDQ6VXNlcjQyOTE4NjE=",
        "avatar_url": "https://avatars.githubusercontent.com/u/4291861?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/klehmann",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-09-28T13:53:46Z",
      "updated_at": "2025-09-28T13:53:46Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nI just found RAG-Anything, deployed it locally and am very impressed about the results!\n\nLightRAG provides APIs to delete content from the knowledge base by name and ID. Can we leverage this functionality in RAG-Anything as well (by calling the underlying LightRAG API) or does this require additional effort to make it work with RAG-Anything?\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/127/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/126",
      "id": 3461576866,
      "node_id": "I_kwDOO3Bfkc7OU3ii",
      "number": 126,
      "title": "[Bug]: mineru.cli.client:parse_doc:201 - numpy.core.multiarray failed to import ERROR:root:[MinerU] ImportError: numpy.core.multiarray failed to import",
      "user": {
        "login": "Thinking80s",
        "id": 338079,
        "node_id": "MDQ6VXNlcjMzODA3OQ==",
        "avatar_url": "https://avatars.githubusercontent.com/u/338079?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/Thinking80s",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-09-28T10:43:09Z",
      "updated_at": "2025-09-28T10:43:09Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [ ] I have searched the existing issues and this bug is not already filed.\n- [ ] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nINFO: Using mineru parser with method: auto\nINFO: Detected PDF file, using parser for PDF...\nERROR:root:[MinerU] raise ImportError(msg)\nERROR:root:[MinerU] ImportError:\nERROR:root:[MinerU] raise ImportError(msg)\nERROR:root:[MinerU] ImportError:\nERROR:root:[MinerU] 2025-09-28 18:42:18.508 | ERROR    | mineru.cli.client:parse_doc:201 - numpy.core.multiarray failed to import\nERROR:root:[MinerU] ImportError: numpy.core.multiarray failed to import\nERROR: Mineru command failed: Mineru command failed with return code 0: ['raise ImportError(msg)', 'ImportError:', 'raise ImportError(msg)', 'ImportError:', '2025-09-28 18:42:18.508 | ERROR    | mineru.cli.client:parse_doc:201 - numpy.core.multiarray failed to import', 'ImportError: numpy.core.multiarray failed to import']\nERROR: Error processing with RAG: Mineru command failed with return code 0: ['raise ImportError(msg)', 'ImportError:', 'raise ImportError(msg)', 'ImportError:', '2025-09-28 18:42:18.508 | ERROR    | mineru.cli.client:parse_doc:201 - numpy.core.multiarray failed to import', 'ImportError: numpy.core.multiarray failed to import']\nERROR: Traceback (most recent call last):\n  File \"/Users/dengpeng/Documents/ai_project/RAG-Anything/examples/raganything_example.py\", line 206, in process_with_rag\n    await rag.process_document_complete(\n  File \"/Users/dengpeng/Documents/ai_project/RAG-Anything/raganything/processor.py\", line 1454, in process_document_complete\n    content_list, content_based_doc_id = await self.parse_document(\n                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/dengpeng/Documents/ai_project/RAG-Anything/raganything/processor.py\", line 331, in parse_document\n    content_list = await asyncio.to_thread(\n             
      ^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/opt/anaconda3/lib/python3.12/asyncio/threads.py\", line 25, in to_thread\n    return await loop.run_in_executor(None, func_call)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/opt/anaconda3/lib/python3.12/concurrent/futures/thread.py\", line 58, in run\n    result = self.fn(*self.args, **self.kwargs)\n             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/dengpeng/Documents/ai_project/RAG-Anything/raganything/parser.py\", line 894, in parse_pdf\n    self._run_mineru_command(\n  File \"/Users/dengpeng/Documents/ai_project/RAG-Anything/raganything/parser.py\", line 768, in _run_mineru_command\n    raise MineruExecutionError(return_code, error_lines)\nraganything.parser.MineruExecutionError: Mineru command failed with return code 0: ['raise ImportError(msg)', 'ImportError:', 'raise ImportError(msg)', 'ImportError:', '2025-09-28 18:42:18.508 | ERROR    | mineru.cli.client:parse_doc:201 - numpy.core.multiarray failed to import', 'ImportError: numpy.core.multiarray failed to import']\n\nWarning: Failed to finalize RAGAnything storages: There is no current event loop in thread 'MainThread'.\n➜  RAG-Anything git:(main) ✗ python examples/raganything_example.py /Users/dengpeng/Documents/ai-document/提供素材要求标准清单.pdf --api-key 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 --parser mineru\n\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/126/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/124",
      "id": 3451040353,
      "node_id": "I_kwDOO3Bfkc7NsrJh",
      "number": 124,
      "title": "[Feature Request]:",
      "user": {
        "login": "SamiJohnson745",
        "id": 220575397,
        "node_id": "U_kgDODSW2pQ",
        "avatar_url": "https://avatars.githubusercontent.com/u/220575397?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/SamiJohnson745",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-09-24T21:22:46Z",
      "updated_at": "2025-09-24T21:22:57Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [x] I have searched the existing feature request and this feature request is not already filed.\n- [ ] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\nBot\n\n### Additional Context\n\nLe developement d’un bot de destruction ****",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/124/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/119",
      "id": 3439166019,
      "node_id": "I_kwDOO3Bfkc7M_YJD",
      "number": 119,
      "title": "[Bug]:TypeError: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed'",
      "user": {
        "login": "gptbert",
        "id": 125624295,
        "node_id": "U_kgDOB3zf5w",
        "avatar_url": "https://avatars.githubusercontent.com/u/125624295?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/gptbert",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 3,
      "created_at": "2025-09-22T04:45:07Z",
      "updated_at": "2025-10-16T11:01:50Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [ ] I have searched the existing issues and this bug is not already filed.\n- [ ] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nWhen I upload PDF then:\n\n500 Internal Server Error {\"detail\":\"DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed'\"} /documents/paginated\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/119/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/118",
      "id": 3431124449,
      "node_id": "I_kwDOO3Bfkc7Mgs3h",
      "number": 118,
      "title": "[Bug]: Ollama is now not compatible with RAGAnything example",
      "user": {
        "login": "LaansDole",
        "id": 85084360,
        "node_id": "MDQ6VXNlcjg1MDg0MzYw",
        "avatar_url": "https://avatars.githubusercontent.com/u/85084360?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/LaansDole",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 3,
      "created_at": "2025-09-18T16:30:37Z",
      "updated_at": "2025-09-30T07:30:27Z",
      "closed_at": null,
      "author_association": "CONTRIBUTOR",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nIn the example, the embedding model is defined as:\n```python\n    # Define embedding function\n    embedding_func = EmbeddingFunc(\n        embedding_dim=3072,\n        max_token_size=8192,\n        func=lambda texts: openai_embed(\n            texts,\n            model=\"text-embedding-3-large\",\n            api_key=api_key,\n            base_url=base_url,\n        ),\n    )\n```\nWhich will similar to `{base_url}/v1/embeddings`. However, the issue is that Ollama does not use this API to request embeddings model, instead:\n```bash\ncurl http://localhost:11434/api/embed -d '{\n  \"model\": \"mxbai-embed-large\",\n  \"input\": \"Llamas are members of the camelid family\"\n}'\n```\n```python\nollama.embed(\n  model='mxbai-embed-large',\n  input='Llamas are members of the camelid family',\n)\n```\nAs a result, if we want to use Ollama embedding models, we should use their library\n\n\n### Steps to reproduce\n\nIn `.env`, set Ollama environment variables\n\n### Expected Behavior\n\nCompatible\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System: MacOS 26\n- Python Version:\n- Related Issues:\n- Ollama Docs: https://ollama.com/blog/embedding-models\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/118/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/117",
      "id": 3430118925,
      "node_id": "I_kwDOO3Bfkc7Mc3YN",
      "number": 117,
      "title": "[Question]: Rag-Anything supports to AWS Bedrock ?",
      "user": {
        "login": "Ckakkireni",
        "id": 24586850,
        "node_id": "MDQ6VXNlcjI0NTg2ODUw",
        "avatar_url": "https://avatars.githubusercontent.com/u/24586850?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/Ckakkireni",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 3,
      "created_at": "2025-09-18T12:20:07Z",
      "updated_at": "2025-12-22T10:13:44Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nHello, Can Anyone confirm is Rag-Anything supports to AWS Bedrock ?\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/117/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/112",
      "id": 3405175038,
      "node_id": "I_kwDOO3Bfkc7K9tj-",
      "number": 112,
      "title": "[Bug]: Error when verifying that MinerU is configured correctly",
      "user": {
        "login": "c200312",
        "id": 147580005,
        "node_id": "U_kgDOCMvkZQ",
        "avatar_url": "https://avatars.githubusercontent.com/u/147580005?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/c200312",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-09-11T07:20:01Z",
      "updated_at": "2025-09-12T07:35:45Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\n1. **Running the code from the Quick Start guide raises an error**\n\n(venv) PS D:\\project\\RAG-Anything> python -c \"from raganything import RAGAnything; rag = RAGAnything(); print('✅ MinerU installed properly' if rag.check_mineru_installation() else '❌ MinerU installation issue')\"\nINFO: Created working directory: ./rag_storage\nINFO: RAGAnything initialized with config:\nINFO:   Working directory: ./rag_storage\nINFO:   Parser: mineru\nINFO:   Parse method: auto\nINFO:   Multimodal processing - Image: True, Table: True, Equation: True\nINFO:   Max concurrent files: 1\nTraceback (most recent call last):\n  File \"<string>\", line 1, in <module>\nAttributeError: 'RAGAnything' object has no attribute 'check_mineru_installation'. Did you mean: 'check_parser_installation'?\nWarning: Failed to finalize RAGAnything storages: import of asyncio halted; None in sys.modules\n\n2. **Changing rag.check_mineru_installation() to rag.check_parser_installation() makes the check pass**\n\n(venv) PS D:\\project\\RAG-Anything> python -c \"from raganything import RAGAnything; rag = RAGAnything(); print('✅ MinerU installed properly' if rag.check_parser_installation() else '❌ MinerU installation issue')\"\n\nINFO: RAGAnything initialized with config:\nINFO:   Working directory: ./rag_storage\nINFO:   Parser: mineru\nINFO:   Parse method: auto\nINFO:   Multimodal processing - Image: True, Table: True, Equation: True\nINFO:   Max concurrent files: 1\n✅ MinerU installed properly\nWarning: Failed to finalize RAGAnything storages: import of asyncio halted; None in sys.modules\n\n### Steps to reproduce\n\n(venv) PS D:\\project\\RAG-Anything> python -c \"from raganything import RAGAnything; rag = RAGAnything(); print('✅ MinerU installed properly' if rag.check_mineru_installation() else '❌ MinerU installation issue')\"\nINFO: Created working directory: ./rag_storage\nINFO: RAGAnything initialized with config:\nINFO:   Working directory: ./rag_storage\nINFO:   Parser: mineru\nINFO:   Parse method: auto\nINFO:   Multimodal processing - Image: True, Table: True, Equation: True\nINFO:   Max concurrent files: 1\nTraceback (most recent call last):\n  File \"<string>\", line 1, in <module>\nAttributeError: 'RAGAnything' object has no attribute 'check_mineru_installation'. Did you mean: 'check_parser_installation'?\nWarning: Failed to finalize RAGAnything storages: import of asyncio halted; None in sys.modules\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/112/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/111",
      "id": 3404518369,
      "node_id": "I_kwDOO3Bfkc7K7NPh",
      "number": 111,
      "title": "[Question]: How does RAG-Anything support rerank?",
      "user": {
        "login": "Barry-Zhou",
        "id": 24220199,
        "node_id": "MDQ6VXNlcjI0MjIwMTk5",
        "avatar_url": "https://avatars.githubusercontent.com/u/24220199?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/Barry-Zhou",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 3,
      "created_at": "2025-09-11T02:42:16Z",
      "updated_at": "2025-10-07T17:35:47Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\n_No response_\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/111/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/110",
      "id": 3402548866,
      "node_id": "I_kwDOO3Bfkc7KzsaC",
      "number": 110,
      "title": "[Question]: Gemini Embedding Compatibility when `embedding_dim` != 3072",
      "user": {
        "login": "MillerQuintero2001",
        "id": 124924404,
        "node_id": "U_kgDOB3Ix9A",
        "avatar_url": "https://avatars.githubusercontent.com/u/124924404?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/MillerQuintero2001",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-09-10T13:42:04Z",
      "updated_at": "2025-09-27T13:47:09Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nHi, I'm using a Gemini API key with the `End-to-End Document Processing` script, since the OpenAI API is a de facto standard and the Gemini API is compatible with it via the `base_url` parameter. RAGAnything works well this way, but there is a problem when I change the embedding output dimension from 3072 to anything else (for example 768 or 1536). The `embedding_dim` parameter is the only thing I change when I want a smaller output embedding dimension (I don't know if anything more is necessary; I also tried using an environment variable, but that didn't change anything):\n```python\n    embedding_func = EmbeddingFunc(\n        embedding_dim=1536, # This is what I change, it only works with 3072 \n        max_token_size=2048,\n        func=lambda texts: openai_embed(\n            texts,\n            model=\"gemini-embedding-001\",\n            api_key=api_key,\n            base_url=base_url,\n        ),\n    )\n```\nHere is the last part of the traceback output I got (the key error is at the end, related to the `np.dot` call, which reports that the dimensions don't match):\n\n```bash\nTraceback (most recent call last):\n...\n\n  File \"/home/test-rag-anything/.venv/lib/python3.12/site-packages/nano_vectordb/dbs.py\", line 155, in query\n    return self.usable_metrics[self.metric](\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/home/test-rag-anything/.venv/lib/python3.12/site-packages/nano_vectordb/dbs.py\", line 179, in _cosine_query\n    scores = np.dot(use_matrix, query)\n             ^^^^^^^^^^^^^^^^^^^^^^^^^\nValueError: shapes (0,1536) and (3072,) not aligned: 1536 (dim 1) != 3072 (dim 0)\nWarning: Failed to finalize RAGAnything storages: sys.meta_path is None, Python is likely shutting down\n```\nMy question: is there something I'm forgetting or missing that would make this work with Gemini embeddings, or is it something deeper inside the `LightRAG` library?\n\n### Additional Context\n\nHere is the page with Gemini embedding information: [Embedding Size](https://ai.google.dev/gemini-api/docs/embeddings#control-embedding-size).\nIt says something like `By default, it outputs a 3072-dimensional embedding, but you can truncate it to a smaller size without losing quality to save storage space`. The word \"truncate\" makes me think that Gemini Embedding (*gemini-embedding-001*) always has a 3072-dimensional output, and that no real variable-dimension embedding exists.\nMy hypothesis is that the `lightrag` library expects the embedding to be returned with the specified `embedding_dim` directly from the API call, but the Gemini API always returns a 3072-dimensional vector that needs to be truncated by the calling code.\n\nraganything -> Version: 1.2.7\nlightrag -> Version: 1.4.7\nPython3 -> Version: 3.12.3",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/110/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/109",
      "id": 3402457308,
      "node_id": "I_kwDOO3Bfkc7KzWDc",
      "number": 109,
      "title": "[Bug]: LightRAG Server + RAG-Anything: Multimodal Processing Error 500",
      "user": {
        "login": "sunshineinsandiego",
        "id": 56661595,
        "node_id": "MDQ6VXNlcjU2NjYxNTk1",
        "avatar_url": "https://avatars.githubusercontent.com/u/56661595?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/sunshineinsandiego",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 4,
      "created_at": "2025-09-10T13:16:28Z",
      "updated_at": "2026-02-06T10:25:55Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nHi - Trying to run LightRag Server as per directions from [here](https://github.com/HKUDS/LightRAG/blob/main/lightrag/api/README.md#lightrag-server-and-webui) on top of a working RAG-Anything database. When I start up the server, I receive an error about an unexpected keyword \"multimodal processed\".\n\nIt seems like the LightRag Server API has not been updated to reflect multimodal processing? Is there a recommended way to include multimodal processing with the Server API?\n\nThanks\n\n```\nLightRAG Server v1.4.7/0209\n\nERROR: Error getting paginated documents: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed' ERROR: Traceback (most recent call last): File \"/home/user/miniconda3/envs/raganything/lib/python3.11/site-packages/lightrag/api/routers/document_routes.py\", line 2364, in get_documents_paginated (documents_with_ids, total_count), status_counts = await asyncio.gather( ^^^^^^^^^^^^^^^^^^^^^ File \"/home/user/miniconda3/envs/raganything/lib/python3.11/site-packages/lightrag/kg/json_doc_status_impl.py\", line 249, in get_docs_paginated doc_status = DocProcessingStatus(**data) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ TypeError: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed' INFO: 127.0.0.1:51270 - \"POST /documents/paginated HTTP/1.1\" 500 INFO: [_] Subgraph query successful | Node count: 361 | Edge count: 173 ERROR: Error getting paginated documents: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed' ERROR: Traceback (most recent call last): File \"/home/user/miniconda3/envs/raganything/lib/python3.11/site-packages/lightrag/api/routers/document_routes.py\", line 2364, in get_documents_paginated (documents_with_ids, total_count), status_counts = await asyncio.gather( ^^^^^^^^^^^^^^^^^^^^^ File \"/home/user/miniconda3/envs/raganything/lib/python3.11/site-packages/lightrag/kg/json_doc_status_impl.py\", line 249, in get_docs_paginated doc_status = DocProcessingStatus(**data) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ TypeError: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed' INFO: 127.0.0.1:42354 - \"POST /documents/paginated HTTP/1.1\" 500 ERROR: Error getting paginated documents: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed' ERROR: Traceback (most recent call last): File \"/home/user/miniconda3/envs/raganything/lib/python3.11/site-packages/lightrag/api/routers/document_routes.py\", line 2364, in get_documents_paginated (documents_with_ids, total_count), status_counts = await asyncio.gather( ^^^^^^^^^^^^^^^^^^^^^ File \"/home/user/miniconda3/envs/raganything/lib/python3.11/site-packages/lightrag/kg/json_doc_status_impl.py\", line 249, in get_docs_paginated doc_status = DocProcessingStatus(**data) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ TypeError: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed' INFO: 127.0.0.1:45094 - \"POST /documents/paginated HTTP/1.1\" 500 ERROR: Error getting paginated documents: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed' ERROR: Traceback (most recent call last): File \"/home/user/miniconda3/envs/raganything/lib/python3.11/site-packages/lightrag/api/routers/document_routes.py\", line 2364, in get_documents_paginated (documents_with_ids, total_count), status_counts = await asyncio.gather( ^^^^^^^^^^^^^^^^^^^^^ File \"/home/user/miniconda3/envs/raganything/lib/python3.11/site-packages/lightrag/kg/json_doc_status_impl.py\", line 249, in get_docs_paginated doc_status = DocProcessingStatus(**data) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ TypeError: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed' INFO: 127.0.0.1:59446 - \"POST /documents/paginated HTTP/1.1\" 500 ERROR: Error getting paginated documents: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed' ERROR: Traceback (most recent call last): File \"/home/user/miniconda3/envs/raganything/lib/python3.11/site-packages/lightrag/api/routers/document_routes.py\", line 2364, in get_documents_paginated (documents_with_ids, total_count), status_counts = await asyncio.gather( ^^^^^^^^^^^^^^^^^^^^^ File \"/home/user/miniconda3/envs/raganything/lib/python3.11/site-packages/lightrag/kg/json_doc_status_impl.py\", line 249, in get_docs_paginated doc_status = DocProcessingStatus(**data) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ TypeError: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed' INFO: 127.0.0.1:35108 - \"POST /documents/paginated HTTP/1.1\" 500\n``` \n\n### Steps to reproduce\n\n$ python raganything_example.py\n$ lightrag-server\n\n### Expected Behavior\n\nNo error processing\n\n### LightRAG Config Used\n\nDefault .env\n\n### Logs and screenshots\n\nSee above\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/109/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/108",
      "id": 3402345229,
      "node_id": "I_kwDOO3Bfkc7Ky6sN",
      "number": 108,
      "title": "[Question]: Local Ollama models always time out and sometimes return null via LiteLLM-Proxy, while direct API calls work",
      "user": {
        "login": "mikumiiku",
        "id": 91369930,
        "node_id": "MDQ6VXNlcjkxMzY5OTMw",
        "avatar_url": "https://avatars.githubusercontent.com/u/91369930?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/mikumiiku",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-09-10T12:46:27Z",
      "updated_at": "2025-09-10T12:46:27Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nI'm having persistent timeout issues when trying to use local Ollama models through LiteLLM-Proxy in my RAG-Anything setup. The API endpoints work fine, but local model calls always time out despite various attempts to fix it.\n\n### Additional Context\n\n### Environment Setup:\n- Ollama Service: Running locally on port 11434\n- LiteLLM-Proxy: Running on port 9000, configured with base_url: http://ollama:11434/v1\n- RAG-Anything: Running on port 8801, using API base: http://litellm-proxy:9000/v1\n- Model: Testing with Qwen3-8B-Q6_K and smaller 4B models\n- Hardware: Ubuntu 22.04, RTX 4090 (48GB modified)\n- litellm config:\n```yaml\n- model_name: Qwen3-8B-Q6_K\n    litellm_params:\n      model: openai/Qwen3-8B-Q6_K\n      api_base: http://ollama:11434/v1\n      api_key: dummy\n      enable_thinking: false\n      timeout: 600\n```\n### What I've Already Tried:\n- Increased timeout settings significantly\n- Limited max_workers to reduce load\n- Switched to smaller 4B parameter models\n- Verified API endpoints work independently\n- Confirmed Ollama service responds to direct calls\n### Some of the console logs:\n- app              |Received empty content from OpenAI API\n- app              |WARNING: limit_async: Worker timeout for task 140201160394224_2571.427 after 210s\n- app              |TimeoutError: [LLM func] limit_async: Worker execution timeout after 210s\n- ollama         | [GIN] 2025/09/10 - 12:32:42 | 500 |         10m0s |      172.18.0.2 | POST     \"/v1/chat/completions\"\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/108/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/107",
      "id": 3401703662,
      "node_id": "I_kwDOO3Bfkc7KweDu",
      "number": 107,
      "title": "[Bug]: mineru extras require conflicting versions of PyTorch",
      "user": {
        "login": "AbdelkarimAZZAZ",
        "id": 72475587,
        "node_id": "MDQ6VXNlcjcyNDc1NTg3",
        "avatar_url": "https://avatars.githubusercontent.com/u/72475587?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/AbdelkarimAZZAZ",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-09-10T09:32:37Z",
      "updated_at": "2025-09-29T13:13:19Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nERROR: Cannot install mineru, mineru[pipeline]==2.0.0 and mineru[vlm]==2.0.0 because these package versions have conflicting dependencies. The conflict is caused by: mineru[pipeline] 2.0.0 depends on torch!=2.5.0, !=2.5.1, <3 and >=2.2.2; extra == \"pipeline\" doclayout-yolo 0.0.4 depends on torch>=2.0.1 mineru[vlm] 2.0.0 depends on torch>=2.6.0; extra == \"vlm\" To fix this you could try to: \n\n1. loosen the range of package versions you've specified\n\n 2. remove package versions to allow pip to attempt to solve the dependency conflict\n\n### Steps to reproduce\n\npip install raganything\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System: MacOs\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/107/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/105",
      "id": 3389136275,
      "node_id": "I_kwDOO3Bfkc7KAh2T",
      "number": 105,
      "title": "[Question]: GUI/UI for interacting with RAG-Anything",
      "user": {
        "login": "sunshineinsandiego",
        "id": 56661595,
        "node_id": "MDQ6VXNlcjU2NjYxNTk1",
        "avatar_url": "https://avatars.githubusercontent.com/u/56661595?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/sunshineinsandiego",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 3,
      "created_at": "2025-09-06T01:07:28Z",
      "updated_at": "2025-10-30T22:41:15Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nHi - I saw [this post ](https://github.com/HKUDS/RAG-Anything/issues/19) re: LightRAG's compatibility with WebUI. However, if I look at the LightRAG WebUI info [here](https://github.com/HKUDS/LightRAG/blob/main/lightrag/api/README.md#llm-and-embedding-backend-supported), this doesn't support running LightRAG+WebUI with local LLMs, which I am doing with my RAG-Anything setup. Is there any UI integration with RAG-Anything that supports local, open-source LLMs, such as HuggingFace models? Thanks!\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/105/reactions",
        "total_count": 1,
        "+1": 1,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/104",
      "id": 3388721810,
      "node_id": "I_kwDOO3Bfkc7J-8qS",
      "number": 104,
      "title": "[Question]: Enforcing / Including Hierarchical Database Knowledge in RAG-Anything",
      "user": {
        "login": "sunshineinsandiego",
        "id": 56661595,
        "node_id": "MDQ6VXNlcjU2NjYxNTk1",
        "avatar_url": "https://avatars.githubusercontent.com/u/56661595?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/sunshineinsandiego",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-09-05T20:39:11Z",
      "updated_at": "2025-09-05T20:39:11Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nHi - I have a collection of complex PDFs (images, tables, text, graphics) that I'd like to index with LightRAG. I have these PDFs organized into domain groups, and each domain has a number of subdomains. What is the best approach to including this type of hierarchical knowledge (not document structure) in my LightRAG database? If I include this information in metadata, for example, it won't be embedded in the vector store, so that doesn't seem very helpful. Ideally, a query should pull information from the PDFs in the relevant domain/subdomain. Any best practices for enforcing this type of structure with LightRAG? Thanks!\n\nP.S. I asked this question on the LightRAG [issue board](https://github.com/HKUDS/LightRAG/issues/2071) as well, and wanted to see if RAG-Anything might be a more appropriate solution.\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/104/reactions",
        "total_count": 2,
        "+1": 2,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/102",
      "id": 3382962916,
      "node_id": "I_kwDOO3Bfkc7Jo-rk",
      "number": 102,
      "title": "[Question]:",
      "user": {
        "login": "ArdenZediker",
        "id": 141619404,
        "node_id": "U_kgDOCHDwzA",
        "avatar_url": "https://avatars.githubusercontent.com/u/141619404?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/ArdenZediker",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-09-04T09:45:34Z",
      "updated_at": "2025-09-04T09:45:34Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [ ] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nWhen RAG processing, if the MinerU model is specified as local, how should the model path be specified or is there a default path\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/102/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/100",
      "id": 3381793229,
      "node_id": "I_kwDOO3Bfkc7JkhHN",
      "number": 100,
      "title": "[Question]: Is importing long tables currently supported?",
      "user": {
        "login": "longsuyu",
        "id": 82940906,
        "node_id": "MDQ6VXNlcjgyOTQwOTA2",
        "avatar_url": "https://avatars.githubusercontent.com/u/82940906?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/longsuyu",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-09-04T01:21:50Z",
      "updated_at": "2025-09-04T01:21:50Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nPreviously I imported each table as a single text chunk, but this does not work well when the table is very long. For example, if a table spans seven or eight pages of a document, later queries against it perform poorly. Is there another suitable approach at the moment?\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/100/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/98",
      "id": 3378670702,
      "node_id": "I_kwDOO3Bfkc7JYmxu",
      "number": 98,
      "title": "[Bug]: TypeError: openai.resources.chat.completions.completions.AsyncCompletions.create() got multiple values for keyword argument 'messages'",
      "user": {
        "login": "Nanye2362",
        "id": 25169656,
        "node_id": "MDQ6VXNlcjI1MTY5NjU2",
        "avatar_url": "https://avatars.githubusercontent.com/u/25169656?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/Nanye2362",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-09-03T07:28:21Z",
      "updated_at": "2025-10-15T18:52:50Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nWhen I adapted “/example/raganything_example.py” for Azure OpenAI, I discovered that the corresponding azure_openai_complete_if_cache method in openai_complete_if_cache might be missing the following line:\n\nmessages = kwargs.pop(\"messages\", messages)\n\nWhen running the line\n\nif \"response_format\" in kwargs:\n    response = await openai_async_client.beta.chat.completions.parse(\n        model=model, messages=messages, **kwargs\n    )\nelse:\n    response = await openai_async_client.chat.completions.create(\n        model=model, messages=messages, **kwargs\n    )\n\n, an exception was thrown:\nTypeError: openai.resources.chat.completions.completions.AsyncCompletions.create() got multiple values ​​for keyword argument 'messages'.\n\n\nI tried inserting the following statement above this code block:\n\nmessages = kwargs.pop(\"messages\", messages)\n\nAfter that, it ran successfully.\n\n### Steps to reproduce\n\nJust run the command:\n\npython ./examples/raganything_example_with_azure_openai.py ./docs/input_test/test.pdf\n\n[raganything_example_with_azure_openai.py](https://github.com/user-attachments/files/22111866/raganything_example_with_azure_openai.py)\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n2025-09-03 15:02:06,897 - lightrag - ERROR - VLM call failed: openai.resources.chat.completions.completions.AsyncCompletions.create() got multiple values for keyword argument 'messages'\n2025-09-03 15:02:06,898 - lightrag - ERROR - Error processing with RAG: openai.resources.chat.completions.completions.AsyncCompletions.create() got multiple values for keyword argument 'messages'\n2025-09-03 15:02:06,900 - lightrag - ERROR - Traceback (most 
recent call last):\n  File \"/Users/xxx/Desktop/RagAnything/./examples/raganything_example_with_azure_openai.py\", line 237, in process_with_rag\n    result = await rag.aquery(query, mode=\"hybrid\")\n             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/xxx/Desktop/RagAnything/venvRagAnything/lib/python3.11/site-packages/raganything/query.py\", line 136, in aquery\n    return await self.aquery_vlm_enhanced(query, mode=mode, **kwargs)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/xxx/Desktop/RagAnything/venvRagAnything/lib/python3.11/site-packages/raganything/query.py\", line 347, in aquery_vlm_enhanced\n    result = await self._call_vlm_with_multimodal_content(messages)\n             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/xxx/Desktop/RagAnything/venvRagAnything/lib/python3.11/site-packages/raganything/query.py\", line 692, in _call_vlm_with_multimodal_content\n    result = await self.vision_model_func(\n             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/xxx/Desktop/RagAnything/venvRagAnything/lib/python3.11/site-packages/tenacity/asyncio/__init__.py\", line 189, in async_wrapped\n    return await copy(fn, *args, **kwargs)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/xxx/Desktop/RagAnything/venvRagAnything/lib/python3.11/site-packages/tenacity/asyncio/__init__.py\", line 111, in __call__\n    do = await self.iter(retry_state=retry_state)\n         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/xxx/Desktop/RagAnything/venvRagAnything/lib/python3.11/site-packages/tenacity/asyncio/__init__.py\", line 153, in iter\n    result = await action(retry_state)\n             ^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/xxx/Desktop/RagAnything/venvRagAnything/lib/python3.11/site-packages/tenacity/_utils.py\", line 99, in inner\n    return call(*args, **kwargs)\n           ^^^^^^^^^^^^^^^^^^^^^\n  File 
\"/Users/xxx/Desktop/RagAnything/venvRagAnything/lib/python3.11/site-packages/tenacity/__init__.py\", line 400, in <lambda>\n    self._add_action_func(lambda rs: rs.outcome.result())\n                                     ^^^^^^^^^^^^^^^^^^^\n  File \"/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py\", line 449, in result\n    return self.__get_result()\n           ^^^^^^^^^^^^^^^^^^^\n  File \"/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py\", line 401, in __get_result\n    raise self._exception\n  File \"/Users/xxx/Desktop/RagAnything/venvRagAnything/lib/python3.11/site-packages/tenacity/asyncio/__init__.py\", line 114, in __call__\n    result = await fn(*args, **kwargs)\n             ^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/xxx/Desktop/RagAnything/venvRagAnything/lib/python3.11/site-packages/lightrag/llm/azure_openai.py\", line 86, in azure_openai_complete_if_cache\n    response = await openai_async_client.chat.completions.create(\n                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nTypeError: openai.resources.chat.completions.completions.AsyncCompletions.create() got multiple values for keyword argument 'messages'\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/98/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/95",
      "id": 3358855930,
      "node_id": "I_kwDOO3Bfkc7INBL6",
      "number": 95,
      "title": "[Bug]: UnboundLocalError: cannot access local variable 'first_stage_tasks'",
      "user": {
        "login": "Kumneger49",
        "id": 132818323,
        "node_id": "U_kgDOB-qlkw",
        "avatar_url": "https://avatars.githubusercontent.com/u/132818323?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/Kumneger49",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-08-27T10:28:57Z",
      "updated_at": "2025-08-27T10:29:39Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [ ] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nIs anyone else getting \"UnboundLocalError: cannot access local variable 'first_stage_tasks' where it is not associated with a value\" when trying to switch from OpenAI's models to local ones using Ollama?\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/95/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/94",
      "id": 3358729959,
      "node_id": "I_kwDOO3Bfkc7IMibn",
      "number": 94,
      "title": "[Question]:How to use Rag anything using Azure Open AI keys",
      "user": {
        "login": "Chahatt01",
        "id": 95703502,
        "node_id": "U_kgDOBbRRzg",
        "avatar_url": "https://avatars.githubusercontent.com/u/95703502?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/Chahatt01",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-08-27T09:43:19Z",
      "updated_at": "2025-08-27T16:44:41Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nIn the usage examples I found the code, but OpenAI keys are used there. I want to use an Azure OpenAI key. Need help with this.\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/94/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/93",
      "id": 3357463995,
      "node_id": "I_kwDOO3Bfkc7IHtW7",
      "number": 93,
      "title": "[Question]:How to access and persist the knowledge graph generated by RAG-Anything",
      "user": {
        "login": "Sumit68",
        "id": 65176247,
        "node_id": "MDQ6VXNlcjY1MTc2MjQ3",
        "avatar_url": "https://avatars.githubusercontent.com/u/65176247?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/Sumit68",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-08-26T23:46:04Z",
      "updated_at": "2025-08-26T23:46:55Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nI am using RAG-Anything with custom embedding and LLM functions. I can successfully insert document chunks using insert_content_list, and I can see the cache file kv_store_llm_response_cache.json being created. However, I am unsure how to access the underlying knowledge graph built by RAG-Anything so I can store it in my own database.\n\n**Problem / Question:**\n\n1. Queries in hybrid mode sometimes return [no-context] even after inserting content.\n2. I want to access the actual knowledge graph that RAG-Anything generates internally (document chunks, embeddings, relationships between chunks, etc.) to persist it in a database.\n3. It is not clear which attributes or methods expose the knowledge graph or how to traverse stored nodes and edges.\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/93/reactions",
        "total_count": 3,
        "+1": 3,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/91",
      "id": 3351999214,
      "node_id": "I_kwDOO3Bfkc7Hy3Lu",
      "number": 91,
      "title": "[Bug]:DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed'",
      "user": {
        "login": "frngo001",
        "id": 136967951,
        "node_id": "U_kgDOCCn3Dw",
        "avatar_url": "https://avatars.githubusercontent.com/u/136967951?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/frngo001",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 7,
      "created_at": "2025-08-25T14:04:36Z",
      "updated_at": "2025-10-07T02:45:18Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nI’m encountering the following error when trying to process a document:\n\n\n`DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed'\n`\n\nThe document is a Markdown file containing links but no images. \nHas anyone run into this issue before, or found a fix/workaround?\n\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/91/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/90",
      "id": 3350636612,
      "node_id": "I_kwDOO3Bfkc7HtqhE",
      "number": 90,
      "title": "How are images parsed by MinerU associated with entities?",
      "user": {
        "login": "linxianwang",
        "id": 29862266,
        "node_id": "MDQ6VXNlcjI5ODYyMjY2",
        "avatar_url": "https://avatars.githubusercontent.com/u/29862266?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/linxianwang",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-08-25T06:38:10Z",
      "updated_at": "2025-08-25T06:38:28Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nHow are images parsed by MinerU associated with entities, and how is this implemented in the code?\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/90/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/89",
      "id": 3348251059,
      "node_id": "I_kwDOO3Bfkc7HkkGz",
      "number": 89,
      "title": "[Bug]:mineru2.0: img_caption -> image_caption",
      "user": {
        "login": "Justin-12138",
        "id": 138972905,
        "node_id": "U_kgDOCEiO6Q",
        "avatar_url": "https://avatars.githubusercontent.com/u/138972905?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/Justin-12138",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-08-23T13:27:47Z",
      "updated_at": "2025-08-23T13:27:47Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nBug Description\nWhen using mineru2.0, the output field for image captions is \"image_caption\", but in raganything the code still expects \"img_caption\". This mismatch causes issues when integrating mineru’s results into raganything.\n\nSteps to Reproduce\n\nRun mineru2.0 to extract image captions.\n\nCheck the JSON output → field name is \"image_caption\".\n\nPass this result into raganything.\n\nRaganything tries to read \"img_caption\" instead, leading to missing captions.\n\nExpected Behavior\nRaganything should align with mineru2.0 and use \"image_caption\" as the field name.\n\nActual Behavior\nRaganything still uses \"img_caption\", so captions are not recognized.\n\nSuggested Fix\nUpdate raganything to use \"image_caption\" (or provide backward compatibility for both \"image_caption\" and \"img_caption\").\n\n<img width=\"2880\" height=\"475\" alt=\"Image\" src=\"https://github.com/user-attachments/assets/f58200f3-7fae-4b89-8379-5a08bdc8270d\" />\n\n<img width=\"1588\" height=\"382\" alt=\"Image\" src=\"https://github.com/user-attachments/assets/9d725714-4f68-442d-8cfd-c6c0e4c05709\" />\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/89/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/87",
      "id": 3329705415,
      "node_id": "I_kwDOO3Bfkc7Gd0XH",
      "number": 87,
      "title": "[Feature Request]: Provide API to interact with RAG-Anything and docker deployment",
      "user": {
        "login": "AlvaroRojas",
        "id": 3367963,
        "node_id": "MDQ6VXNlcjMzNjc5NjM=",
        "avatar_url": "https://avatars.githubusercontent.com/u/3367963?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/AlvaroRojas",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-08-18T08:22:58Z",
      "updated_at": "2025-08-18T08:22:58Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [x] I have searched the existing feature request and this feature request is not already filed.\n- [x] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\nHello,\n\nAs the title says, provide api endpoints to perform operations in RAG-Anything and also docker deployment.\n\nThanks.\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/87/reactions",
        "total_count": 6,
        "+1": 6,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/85",
      "id": 3324847690,
      "node_id": "I_kwDOO3Bfkc7GLSZK",
      "number": 85,
      "title": "[Feature Request]: promt \"Use {language} as output language\" support",
      "user": {
        "login": "Sergey-Baranenkov",
        "id": 50075840,
        "node_id": "MDQ6VXNlcjUwMDc1ODQw",
        "avatar_url": "https://avatars.githubusercontent.com/u/50075840?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/Sergey-Baranenkov",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-08-15T09:40:49Z",
      "updated_at": "2025-08-15T09:40:49Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [x] I have searched the existing feature request and this feature request is not already filed.\n- [x] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\nWhile we can change the default text processing language, we should also be able to set the language for processing images, tables, and equations via an environment variable.\n\n\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/85/reactions",
        "total_count": 1,
        "+1": 1,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/84",
      "id": 3324076149,
      "node_id": "I_kwDOO3Bfkc7GIWB1",
      "number": 84,
      "title": "[Question]: neo4j usage",
      "user": {
        "login": "debraj135",
        "id": 16231057,
        "node_id": "MDQ6VXNlcjE2MjMxMDU3",
        "avatar_url": "https://avatars.githubusercontent.com/u/16231057?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/debraj135",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-08-15T01:44:30Z",
      "updated_at": "2025-08-26T19:46:07Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nCan you please point me to or share an example showing how to use this with neo4j graph db?\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/84/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/83",
      "id": 3319886422,
      "node_id": "I_kwDOO3Bfkc7F4XJW",
      "number": 83,
      "title": "[Question]: Getting 8192 token limit error",
      "user": {
        "login": "deonblaauw",
        "id": 6392285,
        "node_id": "MDQ6VXNlcjYzOTIyODU=",
        "avatar_url": "https://avatars.githubusercontent.com/u/6392285?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/deonblaauw",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-08-13T21:00:32Z",
      "updated_at": "2025-08-13T21:02:03Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nWhen processing the attached file, I'm getting the errors below. I've tried many different things to resolve, but struggling to get to the bottom of it. This is not the call to the embeddings model, this is a call to openai's gpt-4o model (I also tried gpt-4o-mini, same result). I used to get the below error message for many more tables in my documents, then I stripped away the superflous table data coming in from docling like bounding boxes etc, just feeding the table in markdown format in the prompt helped quite a bit since it drastically reduced token count, but unfortunately doesn't address the root cause.\n\n```\nFocus on extracting meaningful insights and relationships from the tabular data in the context of the surrounding content.\nINFO: Generated descriptions for 11/11 multimodal items using correct processors\nERROR: limit_async: Error in decorated function: Error code: 400 - {'error': {'message': \"This model's maximum context length is 8192 tokens, however you requested 35759 tokens (35759 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.\", 'type': 'invalid_request_error', 'param': None, 'code': None}}\nERROR: Error storing chunks to storage: Error code: 400 - {'error': {'message': \"This model's maximum context length is 8192 tokens, however you requested 35759 tokens (35759 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.\", 'type': 'invalid_request_error', 'param': None, 'code': None}}\nERROR: Error in multimodal processing: Error code: 400 - {'error': {'message': \"This model's maximum context length is 8192 tokens, however you requested 35759 tokens (35759 in your prompt; 0 for the completion). 
Please reduce your prompt; or completion length.\", 'type': 'invalid_request_error', 'param': None, 'code': None}}\n```\nThis is what I've done locally to try and reduce token count, but I'm afraid it's only a patch since I've clearly not found the root cause yet\n\n1. The Problem: Excessive Metadata in Table Data\n\nThe root cause of the token limit errors was identified in the output from the docling parser. For each table, the parser was generating a highly detailed JSON structure that included precise bounding box (bbox) coordinates for every single cell. This metadata, while useful for rendering, is unnecessary for LLM analysis and was inflating the size of our prompts by over 95%, causing them to exceed API token limits.\n\n2. The Solution: Pre-processing and Data Cleaning\n\nInstead of trying to increase token limits, we implemented a data cleaning step to strip out the unnecessary metadata before the content is sent to the language model.\n\nA new private method, _clean_table_data, was added to the TableModalProcessor class located in [raganything/modalprocessors.py](vscode-webview://08fivft3m77f0qu3g1l3ts9n759jo9mvjnvjte3ufrklibjulc71/raganything/modalprocessors.py).\n\n3. Implementation Details\n\nThe _clean_table_data method performs the following actions:\n\n- It takes the raw JSON-like table_body as input.\n- It iterates through the list of table_cells.\n- For each cell, it extracts only the text content, completely ignoring the bbox coordinates and other metadata.\n- It then reconstructs the table into a clean, lightweight, and LLM-friendly markdown format.\n- This cleaning function is now called at the beginning of both the generate_description_only and process_multimodal_content methods within the TableModalProcessor, ensuring that any table data is sanitized before being used in a prompt.\n\n4. 
The Impact: 95-99% Token Reduction\n\nThis change has had a dramatic and positive impact:\n\n- Token Efficiency: It reduced the character count for table data by 95-99% (e.g., from 93,647 to 1,620 characters for one table).\n- Error Resolution: It completely resolved the token limit errors for individual table processing.\n- Cost Savings: By sending significantly less data to the API, this will lead to substantial cost savings.\n- Robustness: The system is now more robust and can handle documents with very large and complex tables without issue.\n\nAnd yet, even though that all sounds grand, I'm still getting the 8192 token limit and can't figure out where to fix it! I've tried setting max_context_tokens directly in the constructor, but to no avail\n\n[Mosier et al 2019.pdf](https://github.com/user-attachments/files/21762113/Mosier.et.al.2019.pdf)\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/83/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/82",
      "id": 3317433627,
      "node_id": "I_kwDOO3Bfkc7FvAUb",
      "number": 82,
      "title": "[Question]:API_KEY",
      "user": {
        "login": "YuhengRR",
        "id": 131944984,
        "node_id": "U_kgDOB91SGA",
        "avatar_url": "https://avatars.githubusercontent.com/u/131944984?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/YuhengRR",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-08-13T08:29:46Z",
      "updated_at": "2025-08-13T08:29:46Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nI would like to know if it is inconvenient to use this framework without an API_KEY.\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/82/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/80",
      "id": 3312216855,
      "node_id": "I_kwDOO3Bfkc7FbGsX",
      "number": 80,
      "title": "[Question]: RAG-Anything和LightRAG集成",
      "user": {
        "login": "ghost",
        "id": 10137,
        "node_id": "MDQ6VXNlcjEwMTM3",
        "avatar_url": "https://avatars.githubusercontent.com/u/10137?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/ghost",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-08-12T01:52:26Z",
      "updated_at": "2025-08-12T02:24:57Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nRAG-Anything和LightRAG如何集成呢，当前部署了一个LightRAG实例，如果用RAG-Anything去解析文档，如何把解析的结果放入LightRAG的数据库中，使用LightRAG的接口去检索，需要做哪些工作\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/80/reactions",
        "total_count": 1,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 1
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/79",
      "id": 3310795767,
      "node_id": "I_kwDOO3Bfkc7FVrv3",
      "number": 79,
      "title": "[Feature Request]:希望增加的功能",
      "user": {
        "login": "GarfieldHuang",
        "id": 761271,
        "node_id": "MDQ6VXNlcjc2MTI3MQ==",
        "avatar_url": "https://avatars.githubusercontent.com/u/761271?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/GarfieldHuang",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-08-11T16:32:56Z",
      "updated_at": "2026-01-12T07:13:38Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [ ] I have searched the existing feature request and this feature request is not already filed.\n- [ ] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\n1.輸出時請增加繁體中文的選項\n2.似乎還不支援pydantic結構化輸出\n3.minerU出來json有page_idx，但是使用的時候沒有將這個參數丟進LLM，導致回覆的頁碼都不正確，且想要前面的pydantic輸出準確回應頁碼\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/79/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/77",
      "id": 3306035703,
      "node_id": "I_kwDOO3Bfkc7FDhn3",
      "number": 77,
      "title": "[Feature Request]:dockerized version",
      "user": {
        "login": "slotfi909",
        "id": 82094903,
        "node_id": "MDQ6VXNlcjgyMDk0OTAz",
        "avatar_url": "https://avatars.githubusercontent.com/u/82094903?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/slotfi909",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-08-09T07:19:09Z",
      "updated_at": "2025-09-04T09:27:09Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [x] I have searched the existing feature request and this feature request is not already filed.\n- [x] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\nit would be much appreciated if there were a dockerized version of this Project just like LightRAG project. thank you for your amazing work.\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/77/reactions",
        "total_count": 3,
        "+1": 3,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/75",
      "id": 3300743930,
      "node_id": "I_kwDOO3Bfkc7EvVr6",
      "number": 75,
      "title": "[Feature Request]: save layout information to text chunk storage by customizing lightrag chunker",
      "user": {
        "login": "tongda",
        "id": 653425,
        "node_id": "MDQ6VXNlcjY1MzQyNQ==",
        "avatar_url": "https://avatars.githubusercontent.com/u/653425?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/tongda",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-08-07T14:26:06Z",
      "updated_at": "2025-08-07T14:26:31Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [x] I have searched the existing feature request and this feature request is not already filed.\n- [x] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\n## 当前实现\n\n当文档解析成content_list后，把文本合并成text_content，通过lightrag.ainsert插入到lightrag库中。\n\n这样实现的问题：原本parser有每个block的信息，比如页码，位置框等，但这样合并之后就无法对对应回去了。\n\n## 期望实现\n\n将结构化的parser结果传给lightrag.ainsert，通过定制chunker实现对parser结果的结构化分块，这样做的好处：\n\n* 可以在chunker中获取到解析的结构化block信息，保存在chunk中可以方便对搜索结果进行溯源\n* 通过block信息可以在chunker中实现复杂策略，比如层级化或者语义化chunking\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/75/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/74",
      "id": 3298588406,
      "node_id": "I_kwDOO3Bfkc7EnHb2",
      "number": 74,
      "title": "[Question]:运行起来太慢了是什么原因？",
      "user": {
        "login": "KiMomota",
        "id": 211089561,
        "node_id": "U_kgDODJT4mQ",
        "avatar_url": "https://avatars.githubusercontent.com/u/211089561?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/KiMomota",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 3,
      "created_at": "2025-08-07T01:38:21Z",
      "updated_at": "2025-12-03T02:15:18Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\n输入一段Prompt和上传PDF文档，首先解析mineru，随后系统调用LLM和VL来解析，一段流程跑下来已经10分钟过去了\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/74/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/73",
      "id": 3285781428,
      "node_id": "I_kwDOO3Bfkc7D2Qu0",
      "number": 73,
      "title": "[Question]: TypeError: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed'",
      "user": {
        "login": "alameen1999",
        "id": 140149297,
        "node_id": "U_kgDOCFqCMQ",
        "avatar_url": "https://avatars.githubusercontent.com/u/140149297?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/alameen1999",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 8,
      "created_at": "2025-08-02T09:32:04Z",
      "updated_at": "2025-08-12T09:24:15Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nI'm encountering a TypeError when attempting to insert documents using RAGAnything.insert_content_list(). The error originates deep within the lightrag library, specifically when the system tries to load the document processing status. It seems the DocProcessingStatus data model is out of sync with the data being written to the status store, as it doesn't recognize the multimodal_processed field.\n\nThis issue occurs when I try to upload multiple file path.\n\n### Additional Context\n\nTo Reproduce\nSteps to reproduce the behavior:\n\nSet up a RAGAnything instance, connecting it to a LightRAG backend.\n\nUse a custom document parser (like the PyMuPDF example below) to create a content_list.\n\nCall the rag.insert_content_list() method to ingest the parsed content.\n\nThe pipeline fails with the traceback shown below.\n\n```\nTraceback (most recent call last):\n  File \"/home/alameenn/RAG-Anything/main.py\", line 76, in process_file\n    await rag.insert_content_list(\n  File \"/home/alameenn/RAG-Anything/raganything/processor.py\", line 1439, in insert_content_list\n    await insert_text_content(\n  File \"/home/alameenn/RAG-Anything/raganything/utils.py\", line 81, in insert_text_content\n    await lightrag.ainsert(\n  File \"/home/alameenn/RAG-Anything/.venv/lib/python3.11/site-packages/lightrag/lightrag.py\", line 730, in ainsert\n    await self.apipeline_process_enqueue_documents(\n  File \"/home/alameenn/RAG-Anything/.venv/lib/python3.11/site-packages/lightrag/lightrag.py\", line 1370, in apipeline_process_enqueue_documents\n    processing_docs, failed_docs, pending_docs = await asyncio.gather(\n                                                 ^^^^^^^^^^^^^^^^^^^^^\n  File 
\"/home/alameenn/RAG-Anything/.venv/lib/python3.11/site-packages/lightrag/kg/json_doc_status_impl.py\", line 108, in get_docs_by_status\n    result[k] = DocProcessingStatus(**data)\n                ^^^^^^^^^^^^^^^^^^^^^^^^^^^\nTypeError: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed'\n```\n\nsample code:\n\n```\nimport os\nimport asyncio\nimport fitz  # PyMuPDF\nfrom lightrag import LightRAG, RAGAnything, RAGAnythingConfig\nfrom lightrag.kg.shared_storage import initialize_pipeline_status\nfrom lightrag.llm.openai import openai_complete_if_cache, openai_embed\nfrom lightrag.utils import EmbeddingFunc, logger\nfrom dotenv import load_dotenv\n\nload_dotenv()\n\ndef parse_document_with_pymupdf(file_path: str) -> list:\n    \"\"\"A sample parser that extracts text from a PDF.\"\"\"\n    content_list = []\n    try:\n        doc = fitz.open(file_path)\n        for page_num, page in enumerate(doc):\n            text = page.get_text()\n            if text.strip():\n                content_list.append({\n                    \"type\": \"text\",\n                    \"text\": text,\n                    \"page_idx\": page_num,\n                })\n        doc.close()\n        logger.info(f\"PyMuPDF parsing complete. Found {len(content_list)} content elements.\")\n        return content_list\n    except Exception as e:\n        logger.error(f\"Failed to parse PDF with PyMuPDF: {e}\")\n        return []\n\nasync def main():\n    \"\"\"Main function to run the example\"\"\"\n    api_key = os.getenv(\"OPENAI_API_KEY\") # Ensure this is set in your .env\n    if not api_key:\n        raise ValueError(\"OPENAI_API_KEY must be set in the environment.\")\n\n    # 1. 
Initialize LightRAG\n    lightrag_instance = LightRAG(\n        working_dir='./rag_storage',\n        llm_model_func=lambda prompt, **kwargs: openai_complete_if_cache(\"gpt-4o-mini\", prompt, api_key=api_key, **kwargs),\n        embedding_func=EmbeddingFunc(\n            embedding_dim=3072,\n            func=lambda texts: openai_embed(texts, model=\"text-embedding-3-large\", api_key=api_key),\n        )\n    )\n    await lightrag_instance.initialize_storages()\n    await initialize_pipeline_status()\n\n    # 2. Define a vision model function (required by RAGAnything)\n    def vision_model_func(prompt, image_data, **kwargs):\n        # Dummy function for this example\n        return \"Vision model not implemented for this test.\"\n\n    # 3. Configure and initialize RAGAnything\n    config = RAGAnythingConfig(enable_image_processing=True)\n    rag = RAGAnything(\n        config=config,\n        lightrag=lightrag_instance,\n        vision_model_func=vision_model_func,\n    )\n\n    # 4. Parse a document and attempt to insert it\n    file_path = \"/path/to/your/document.pdf\"  # <--- CHANGE THIS TO A VALID PDF PATH\n    if not os.path.exists(file_path):\n        logger.error(f\"File not found: {file_path}. Please create a dummy PDF or use a real one.\")\n        return\n        \n    content_list = parse_document_with_pymupdf(file_path)\n\n    logger.info(\"Inserting content list into RAGAnything...\")\n    await rag.insert_content_list(\n        content_list=content_list,\n        file_path=os.path.basename(file_path),\n        doc_id=\"demo-doc-001\",\n    )\n    logger.info(\"Content list insertion completed!\")\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/73/reactions",
        "total_count": 1,
        "+1": 1,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/72",
      "id": 3285658458,
      "node_id": "I_kwDOO3Bfkc7D1yta",
      "number": 72,
      "title": "[Question]:案例代码中使用openai_complete_if_cache，但是输入了一个message，这个变量也完全不在输入参数表中",
      "user": {
        "login": "wycoal",
        "id": 44468372,
        "node_id": "MDQ6VXNlcjQ0NDY4Mzcy",
        "avatar_url": "https://avatars.githubusercontent.com/u/44468372?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/wycoal",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-08-02T06:26:41Z",
      "updated_at": "2025-08-02T06:26:41Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [ ] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\n我不理解vision_func的设置，案例代码中使用openai_complete_if_cache，但是输入了一个message，这个变量也完全不在输入参数表中，如下，奇怪\n\nasync def openai_complete_if_cache(\nmodel: str,\nprompt: str,\nsystem_prompt: str | None = None,\nhistory_messages: list[dict[str, Any]] | None = None,\nbase_url: str | None = None,\napi_key: str | None = None,\ntoken_tracker: Any | None = None,\n**kwargs: Any,\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/72/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/71",
      "id": 3284553044,
      "node_id": "I_kwDOO3Bfkc7Dxk1U",
      "number": 71,
      "title": "[Question]:is there a example for ollama, thanks alot.",
      "user": {
        "login": "wycoal",
        "id": 44468372,
        "node_id": "MDQ6VXNlcjQ0NDY4Mzcy",
        "avatar_url": "https://avatars.githubusercontent.com/u/44468372?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/wycoal",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-08-01T16:55:36Z",
      "updated_at": "2025-08-12T16:14:22Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [ ] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nis there a example for ollama, thanks alot. \nespecally, how to define vision_func, \n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/71/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/70",
      "id": 3274051715,
      "node_id": "I_kwDOO3Bfkc7DJhCD",
      "number": 70,
      "title": "[Question]:Image is not analyzed properly",
      "user": {
        "login": "andytan0051",
        "id": 102530483,
        "node_id": "U_kgDOBhx9sw",
        "avatar_url": "https://avatars.githubusercontent.com/u/102530483?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/andytan0051",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 10,
      "created_at": "2025-07-29T15:20:59Z",
      "updated_at": "2025-08-25T17:04:53Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nHello,\n\nI just started using RAGAnything recently because I wanted to try out the visual image analysis feature. I tried processing a pdf document consisting of images and everything worked fine without errors. However, when I check the image text chunks in the json file, the images were not analyzed at all. As an example, I get the following output for the following picture:\n\n<img width=\"633\" height=\"431\" alt=\"Image\" src=\"https://github.com/user-attachments/assets/4d61c592-7db2-4f57-b3db-7475a280857d\" />\n\n\n  \"chunk-02b180aca3b7968fd536fe38050684bb\": {\n    \"content\": \"\\nImage Content Analysis:\\nImage Path: C:\\\\Users\\\\andy.tan\\\\LightRAGDemo\\\\output\\\\Bacherlorarbeit_Michal\\\\auto\\\\images\\\\b1a8494efeadac1f9c53930010d0b164395f6b8e8b252f0cb6b960a89fbfcb43.jpg\\nCaptions: None\\nFootnotes: None\\n\\n**Visual Analysis: {'type': 'image', 'img_path': 'C:\\\\\\\\Users\\\\\\\\andy.tan\\\\\\\\LightRAGDemo\\\\\\\\output\\\\\\\\Bacherlorarbeit_Michal\\\\\\\\auto\\\\\\\\images\\\\\\\\b1a8494efeadac1f9c53930010d0b164395f6b8e8b252f0cb6b960a89fbfcb43.jpg'**, 'image_caption': [], 'image_footnote': [], 'page_idx': 28}\",\n    \"tokens\": 192,\n    \"full_doc_id\": \"doc-f4f2c215ee23f8785bcf3a7096e21a85\",\n    \"chunk_order_index\": 33,\n    \"file_path\": \"Bacherlorarbeit_Michal.pdf\",\n    \"llm_cache_list\": [\n      \"default:extract:0c575849f26c75ded34feb8cf0aa467a\",\n      \"default:extract:b006d5bf2d8a5702c57156f3a57f87b2\"\n    ],\n    \"is_multimodal\": true,\n    \"modal_entity_name\": \"image_31f0cd5e8c826500c55cda6a159a7f1c\",\n    \"original_type\": \"image\",\n    \"page_idx\": 28,\n    \"create_time\": 1753728986,\n    \"update_time\": 1753729432,\n    \"_id\": 
\"chunk-02b180aca3b7968fd536fe38050684bb\"\n  }\n\n\nThis is the case for all the images text chunks. The generated visual analysis is somehow the path of the image. This makes me question if the problem is that the images were not even successfully sent over to Azure OpenAI API, but also I did not receive any errors running the code. The visual model function I use is down below in additional context. \n\nI would appreciate any support and guidance. Thank you!\n\n### Additional Context\n\n```\ndef vision_model_func(\n        prompt, system_prompt=None, history_messages=[], image_data=None, **kwargs\n    ):\n        if image_data:\n            return azure_openai_complete_if_cache(\n                \"gpt-4o\",\n                \"\",\n                system_prompt=None,\n                history_messages=[],\n                messages=[\n                    {\"role\": \"system\", \"content\": system_prompt}\n                    if system_prompt\n                    else None,\n                    {\n                        \"role\": \"user\",\n                        \"content\": [\n                            {\"type\": \"text\", \"text\": prompt},\n                            {\n                                \"type\": \"image_url\",\n                                \"image_url\": {\n                                    \"url\": f\"data:image/jpeg;base64,{image_data}\"\n                                },\n                            },\n                        ],\n                    }\n                    if image_data\n                    else {\"role\": \"user\", \"content\": prompt},\n                ],\n                api_key=AZURE_OPENAI_API_KEY,\n                base_url=AZURE_OPENAI_ENDPOINT,\n                **kwargs,\n            )\n        else:\n            return llm_model_func(prompt, system_prompt, history_messages, **kwargs)\n```\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/70/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/69",
      "id": 3272616503,
      "node_id": "I_kwDOO3Bfkc7DECo3",
      "number": 69,
      "title": "[Question]: raganything_example.py 改写为ollama下的模型时出现的问题",
      "user": {
        "login": "zjhwtonywang",
        "id": 25818626,
        "node_id": "MDQ6VXNlcjI1ODE4NjI2",
        "avatar_url": "https://avatars.githubusercontent.com/u/25818626?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/zjhwtonywang",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 7,
      "created_at": "2025-07-29T08:15:34Z",
      "updated_at": "2025-10-23T05:22:23Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nlightrag-hk1.4.5 版本，raganything  1.2.5 ，我仿照LightARG里的example 下的raganything_example.py，但模型都使用本地OLLAMA 下的模型，改写如下：\n\nOLLAMA_HOST = \"http://localhost:11434\"\nOLLAMA_LLM_MODEL = \"qwen3:4b\"\nOLLAMA_EMBEDDING_MODEL = \"bge-m3:latest \"\nOLLAMA_VISION_MODEL = \"qwen2.5vl:latest\"\n\nDOC_FILE_PATH = r\"d:\\kswj\\21.pdf\"\nWORK_DIR = './wq_storage'\nOUT_FILE_PATH = \"./output\"\n\nasync def process_with_rag(\n        file_path: str,\n        output_dir: str,\n        working_dir: str = None,\n):\n    \"\"\"\n    Process document with RAGAnything\n\n    Args:\n        file_path: Path to the document\n        output_dir: Output directory for RAG results\n        working_dir: Working directory for RAG storage\n    \"\"\"\n    try:\n        # Create RAGAnything configuration\n        config = RAGAnythingConfig(\n            working_dir=working_dir,\n            #parser=\"docling\",\n            parse_method=\"auto\",\n            enable_image_processing=True,\n            enable_table_processing=True,\n            enable_equation_processing=True,\n        )\n\n        # Define LLM model function\n        def llm_model_func(prompt, system_prompt=None, history_messages=[], **kwargs):\n\n            return ollama_model_complete(\n                prompt=prompt,\n                model=OLLAMA_LLM_MODEL,   \n                system_prompt=system_prompt,\n                history_messages=history_messages,\n                base_url=OLLAMA_HOST,\n                **kwargs,\n            )\n\n        # Define vision model function for image processing\n        def vision_model_func(\n                prompt, system_prompt=None, history_messages=[], image_data=None, **kwargs\n        ):\n            if image_data:\n                
return ollama_model_complete(\n                    model=OLLAMA_VISION_MODEL,\n                    system_prompt=None,\n                    history_messages=[],\n                    messages=[\n                        {\"role\": \"system\", \"content\": system_prompt}\n                        if system_prompt\n                        else None,\n                        {\n                            \"role\": \"user\",\n                            \"content\": [\n                                {\"type\": \"text\", \"text\": prompt},\n                                {\n                                    \"type\": \"image_url\",\n                                    \"image_url\": {\n                                        \"url\": f\"data:image/jpeg;base64,{image_data}\"\n                                    },\n                                },\n                            ],\n                        }\n                        if image_data\n                        else {\"role\": \"user\", \"content\": prompt},\n                    ],\n                    base_url=OLLAMA_HOST,\n                    **kwargs,\n                )\n            else:\n                return llm_model_func(prompt, system_prompt, history_messages, **kwargs)\n\n        # Define embedding function\n        embedding_func = EmbeddingFunc(\n            embedding_dim=1024,\n            max_token_size=8192,\n            func=lambda texts: ollama_embed(\n                texts,\n                model=OLLAMA_EMBEDDING_MODEL,\n                base_url=OLLAMA_HOST,\n            ),\n        )\n\n\n        # Initialize RAGAnything with new dataclass structure\n        rag = RAGAnything(\n            config=config,\n            llm_model_func=llm_model_func,\n            vision_model_func=vision_model_func,\n            embedding_func=embedding_func,\n        )\n\n        # Process document\n        await rag.process_document_complete(\n            file_path=file_path, output_dir=output_dir, 
parse_method=\"auto\"\n        )\n\ndef main():\n    # Process with RAG\n    asyncio.run(\n        process_with_rag(\n            DOC_FILE_PATH, OUT_FILE_PATH, WORK_DIR\n        )\n    )\n\n\nif __name__ == \"__main__\":\n    main()\n\n运行后发现：wq_storeage 下面只会出现kv_store_parse_cache.json ，没有其它文件了，最终输出：\nContent Information:\nINFO: * Total blocks in content_list: 0\nINFO: * Content block types:\nINFO: Content separation complete:\nINFO:   - Text content length: 0 characters\nINFO:   - Multimodal items count: 0\nINFO: Document d:\\kswj\\21.pdf processing complete!\nINFO: Storage Initialization completed!\nINFO: Storage Finalization completed!  \n我的PDF文件都正常，问题出在哪里？，搞了好长时间了，先谢谢大家！\n\n### Additional Context\n\n[21.pdf](https://github.com/user-attachments/files/21483846/21.pdf)",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/69/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/67",
      "id": 3267200315,
      "node_id": "I_kwDOO3Bfkc7CvYU7",
      "number": 67,
      "title": "[Question]: มวยล้มต้มคนดูไหมนะ",
      "user": {
        "login": "kenjiroe",
        "id": 2861122,
        "node_id": "MDQ6VXNlcjI4NjExMjI=",
        "avatar_url": "https://avatars.githubusercontent.com/u/2861122?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/kenjiroe",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-07-27T15:28:39Z",
      "updated_at": "2025-07-27T15:28:52Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nมันใช้งานได้จริงใช่ไหม ทำไมลองเอาไปเทสแล้ว issue เพียบเลย ระบบถามตอบก็เป็นแค่เพียงค้นหาธรรมดา ไม่ได้มี ai เลย \n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/67/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/63",
      "id": 3257066614,
      "node_id": "I_kwDOO3Bfkc7CIuR2",
      "number": 63,
      "title": "[Question]: \"Error processing image content: Image file not found\"",
      "user": {
        "login": "JJMocke",
        "id": 83647229,
        "node_id": "MDQ6VXNlcjgzNjQ3MjI5",
        "avatar_url": "https://avatars.githubusercontent.com/u/83647229?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/JJMocke",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 4,
      "created_at": "2025-07-23T16:49:46Z",
      "updated_at": "2025-07-24T15:19:25Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\n\nI am working in Google Colab and using Python version 3.11\n\nWhy do I get this error when uploading the Common_print.pdf to the RAG?\n\n```\nError processing image content: Image file not found: \nError processing multimodal content: not enough values to unpack (expected 3, got 2)\n```\n\n```\nimport os\nimport asyncio\nfrom pathlib import Path\nfrom lightrag.llm.openai import openai_complete_if_cache, openai_embed\nfrom lightrag.utils import EmbeddingFunc\nfrom raganything import RAGAnything, RAGAnythingConfig\n\nimport os\nfrom google.colab import userdata\n\n# Set your API key directly or use environment variable\nOPENAI_API_KEY = userdata.get('OpenAI_API')\nWORKING_DIR = \"./rag_storage\"\nOUTPUT_DIR = \"./output\"\n\n# Build config\nconfig = RAGAnythingConfig(\n    working_dir=WORKING_DIR,\n    mineru_parse_method=\"auto\",\n    enable_image_processing=True,\n    enable_table_processing=True,\n    enable_equation_processing=True,\n)\n\n# LLM and vision model functions\ndef llm_model_func(prompt, system_prompt=None, history_messages=[], **kwargs):\n    return openai_complete_if_cache(\n        \"gpt-4o-mini\", prompt, system_prompt=system_prompt,\n        history_messages=history_messages, api_key=OPENAI_API_KEY, **kwargs\n    )\n\ndef vision_model_func(prompt, system_prompt=None, history_messages=[], image_data=None, **kwargs):\n    return openai_complete_if_cache(\n        \"gpt-4o\", \"\", system_prompt=system_prompt,\n        history_messages=history_messages,\n        messages=[\n            {\"role\": \"system\", \"content\": system_prompt} if system_prompt else None,\n            {\n                \"role\": \"user\",\n                \"content\": [\n                    {\"type\": \"text\", 
\"text\": prompt},\n                    {\"type\": \"image_url\", \"image_url\": {\"url\": f\"data:image/jpeg;base64,{image_data}\"}},\n                ],\n            },\n        ],\n        api_key=OPENAI_API_KEY,\n        **kwargs\n    )\n\nembedding_func = EmbeddingFunc(\n    embedding_dim=3072,\n    max_token_size=8192,\n    func=lambda texts: openai_embed(texts, model=\"text-embedding-3-large\", api_key=OPENAI_API_KEY),\n)\n\n# Initialize RAGAnything\nrag = RAGAnything(\n    config=config,\n    llm_model_func=llm_model_func,\n    vision_model_func=vision_model_func,\n    embedding_func=embedding_func,\n)\n\npdf_paths = [\n    \"/content/Bad_prints.pdf\",\n    \"/content/Common_print.pdf\"\n]\n\nfor pdf_path in pdf_paths:\n    print(f\"📄 Processing: {pdf_path}\")\n    await rag.process_document_complete(file_path=pdf_path, output_dir=OUTPUT_DIR)\n```\n\nHere is the full output from running the script above: \n\n```\n📄 Processing: /content/Bad_prints.pdf\nRerank is enabled but no rerank_model_func provided. Reranking will be skipped.\n📄 Processing: /content/Common_print.pdf\nError processing image content: Image file not found: \nError processing multimodal content: not enough values to unpack (expected 3, got 2)\nError processing image content: Image file not found: \nError processing multimodal content: not enough values to unpack (expected 3, got 2)\n```\n\nThanks in advance any help would really be appreciated\n\n### Additional Context\n\n\nHere are the docs:\n[Common_print.pdf](https://github.com/user-attachments/files/21391890/Common_print.pdf)\n[Bad_prints.pdf](https://github.com/user-attachments/files/21391891/Bad_prints.pdf)",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/63/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/60",
      "id": 3254794480,
      "node_id": "I_kwDOO3Bfkc7CADjw",
      "number": 60,
      "title": "[Question]:How to build a vector database using preprocessed data already prepared in Mineru.",
      "user": {
        "login": "sbpark-0104",
        "id": 222352548,
        "node_id": "U_kgDODUDUpA",
        "avatar_url": "https://avatars.githubusercontent.com/u/222352548?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/sbpark-0104",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 3,
      "created_at": "2025-07-23T04:18:39Z",
      "updated_at": "2025-07-24T11:04:01Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nWe are actively testing various functionalities, as the solution you provided is proving to be very useful.\n\nFrom what we’ve observed, when building a vector database (VDB) using multiple documents, it seems that each document needs to be processed one by one through minerU before being individually added to the VDB.\n\nHowever, we already have a set of documents that have been preprocessed by minerU and organized in the following folder structure:\n\npdf_ppc/\n├── doc1/\n├── doc2/\n├── doc3/\n├── ...\nEach docX/ folder contains data that has already been processed by minerU.\nIs there a way to batch-import all of these preprocessed folders into the VDB at once, instead of adding them individually?\n\nNote: Each folder (doc1/, doc2/, doc3/, ...) corresponds to an original PDF file named doc1.pdf, doc2.pdf, doc3.pdf, ...\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/60/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/57",
      "id": 3244782430,
      "node_id": "I_kwDOO3Bfkc7BZ3Ne",
      "number": 57,
      "title": "[Feature Request]:能否提供基于ollama的运行示例？包括视觉和语言方面的",
      "user": {
        "login": "long123524",
        "id": 73996375,
        "node_id": "MDQ6VXNlcjczOTk2Mzc1",
        "avatar_url": "https://avatars.githubusercontent.com/u/73996375?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/long123524",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-07-19T02:28:18Z",
      "updated_at": "2025-07-19T02:28:18Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [ ] I have searched the existing feature request and this feature request is not already filed.\n- [ ] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\n作者大大，能否像lightrag一样提供一个基于ollama的运行示例，ollama现在也能支持视觉—语言模型\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/57/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/56",
      "id": 3232035472,
      "node_id": "I_kwDOO3Bfkc7ApPKQ",
      "number": 56,
      "title": "[Bug]:examples/text_format_test.py fails within MinerU",
      "user": {
        "login": "sibbi77",
        "id": 1971630,
        "node_id": "MDQ6VXNlcjE5NzE2MzA=",
        "avatar_url": "https://avatars.githubusercontent.com/u/1971630?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/sibbi77",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 2,
      "created_at": "2025-07-15T12:16:48Z",
      "updated_at": "2025-07-22T03:24:04Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nI check the install with:\n\n```\nroot@bca78e10cda2:/# mineru --version                                                                                                      \n2025-07-15 11:55:27.855 | WARNING  | mineru.backend.vlm.predictor:<module>:35 - sglang is not installed. If you are not using sglang, you can ignore this warning.\nCreating new Ultralytics Settings v0.0.6 file ✅ \nView Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'\nUpdate Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.\nmineru, version 2.1.0\nroot@bca78e10cda2:/# python -c \"from raganything import RAGAnything; rag = RAGAnything(); print('✅ MinerU installed properly' if rag.check_mineru_installation() else '❌ MinerU installation issue')\"\n✅ MinerU installed properly\nroot@bca78e10cda2:/tmp/RAG-Anything# python examples/office_document_test.py --check-libreoffice --file dummy\n🔧 Checking LibreOffice installation...\n✅ LibreOffice found: LibreOffice 7.4.7.2 40(Build:2)\n✅ LibreOffice installation check passed!\nroot@bca78e10cda2:/tmp/RAG-Anything# python examples/image_format_test.py --check-pillow --file dummy\n🔧 Checking PIL/Pillow installation...\n✅ PIL/Pillow found: PIL version 11.3.0\n✅ PIL/Pillow installation check passed!\nroot@bca78e10cda2:/tmp/RAG-Anything# python examples/text_format_test.py --check-reportlab --file dummy\n🔧 Checking ReportLab installation...\n✅ ReportLab found: version 4.4.2\n✅ ReportLab installation check passed!\n```\n\nRunning `examples/text_format_test.py` fails:\n\n```\nroot@bca78e10cda2:/tmp/RAG-Anything# python examples/text_format_test.py --file README.md \n🔧 Checking 
ReportLab installation...\n✅ ReportLab found: version 4.4.2\n🧪 Testing text format parsing: README.md\n📄 File format: .MD\n📏 File size: 40.0 KB\n📝 Text length: 40676 characters\n📋 Line count: 976\n\n🔄 Testing text parsing with MinerU...\nERROR:root:Error in parse_text_file: Failed to convert text file README.md to PDF: format not resolved, probably missing URL scheme or undefined destination target for '-configuration'\nError during parsing with specific parser: Failed to convert text file README.md to PDF: format not resolved, probably missing URL scheme or undefined destination target for '-configuration'\nFalling back to generic parser...\nERROR:root:Error in parse_text_file: Failed to convert text file README.md to PDF: format not resolved, probably missing URL scheme or undefined destination target for '-configuration'\n\n❌ Text format parsing failed: Failed to convert text file README.md to PDF: format not resolved, probably missing URL scheme or undefined destination target for '-configuration'\n   Full error: Traceback (most recent call last):\n  File \"/usr/local/lib/python3.12/site-packages/raganything/mineru_parser.py\", line 1133, in parse_text_file\n    doc.build(story)\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/platypus/doctemplate.py\", line 1322, in build\n    BaseDocTemplate.build(self,flowables, canvasmaker=canvasmaker)\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/platypus/doctemplate.py\", line 1109, in build\n    self._endBuild()\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/platypus/doctemplate.py\", line 1044, in _endBuild\n    if getattr(self,'_doSave',1): self.canv.save()\n                                  ^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfgen/canvas.py\", line 1301, in save\n    self._doc.SaveToFile(self._filename, self)\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 212, in SaveToFile\n    data = 
self.GetPDFData(canvas)\n           ^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 238, in GetPDFData\n    return self.format()\n           ^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 420, in format\n    IOf = IO.format(self)\n          ^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 867, in format\n    fcontent = format(self.content, document, toplevel=1)   # yes this is at top level\n               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 67, in format\n    f = element.format(document)\n        ^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 1633, in format\n    return D.format(document)\n           ^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 680, in format\n    L = [(format(PDFName(k),document)+b\" \"+format(dict[k],document)) for k in keys]\n                                           ^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 67, in format\n    f = element.format(document)\n        ^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 1797, in format\n    if f is None: raise ValueError(\"format not resolved, probably missing URL scheme or undefined destination target for '%s'\" % self.name)\n                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nValueError: format not resolved, probably missing URL scheme or undefined destination target for '-configuration'\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File 
\"/usr/local/lib/python3.12/site-packages/raganything/processor.py\", line 94, in parse_document\n    content_list, md_content = MineruParser.parse_document(\n                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/raganything/mineru_parser.py\", line 1204, in parse_document\n    return MineruParser.parse_text_file(file_path, output_dir, lang, **kwargs)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/raganything/mineru_parser.py\", line 1144, in parse_text_file\n    raise RuntimeError(\nRuntimeError: Failed to convert text file README.md to PDF: format not resolved, probably missing URL scheme or undefined destination target for '-configuration'\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.12/site-packages/raganything/mineru_parser.py\", line 1133, in parse_text_file\n    doc.build(story)\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/platypus/doctemplate.py\", line 1322, in build\n    BaseDocTemplate.build(self,flowables, canvasmaker=canvasmaker)\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/platypus/doctemplate.py\", line 1109, in build\n    self._endBuild()\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/platypus/doctemplate.py\", line 1044, in _endBuild\n    if getattr(self,'_doSave',1): self.canv.save()\n                                  ^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfgen/canvas.py\", line 1301, in save\n    self._doc.SaveToFile(self._filename, self)\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 212, in SaveToFile\n    data = self.GetPDFData(canvas)\n           ^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 238, in GetPDFData\n    return 
self.format()\n           ^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 420, in format\n    IOf = IO.format(self)\n          ^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 867, in format\n    fcontent = format(self.content, document, toplevel=1)   # yes this is at top level\n               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 67, in format\n    f = element.format(document)\n        ^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 1633, in format\n    return D.format(document)\n           ^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 680, in format\n    L = [(format(PDFName(k),document)+b\" \"+format(dict[k],document)) for k in keys]\n                                           ^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 67, in format\n    f = element.format(document)\n        ^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/reportlab/pdfbase/pdfdoc.py\", line 1797, in format\n    if f is None: raise ValueError(\"format not resolved, probably missing URL scheme or undefined destination target for '%s'\" % self.name)\n                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nValueError: format not resolved, probably missing URL scheme or undefined destination target for '-configuration'\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/tmp/RAG-Anything/examples/text_format_test.py\", line 74, in test_text_format_parsing\n    content_list, md_content = rag.parse_document(\n                               
^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/raganything/processor.py\", line 105, in parse_document\n    content_list, md_content = MineruParser.parse_document(\n                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/raganything/mineru_parser.py\", line 1204, in parse_document\n    return MineruParser.parse_text_file(file_path, output_dir, lang, **kwargs)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/raganything/mineru_parser.py\", line 1144, in parse_text_file\n    raise RuntimeError(\nRuntimeError: Failed to convert text file README.md to PDF: format not resolved, probably missing URL scheme or undefined destination target for '-configuration'\n```\n\n\n### Steps to reproduce\n\nsee above\n\n### Expected Behavior\n\nThe result of the examples is not documented, but I don't expect an exception.\n\n### LightRAG Config Used\n\njust followed install instructions - no additional config used\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version: 1.3.9\n- Operating System: Debian GNU/Linux 12 (bookworm); python:3.12 docker image\n- Python Version: 3.12.7\n\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/56/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/55",
      "id": 3228443971,
      "node_id": "I_kwDOO3Bfkc7AbiVD",
      "number": 55,
      "title": "[Question]: 生态支持问题",
      "user": {
        "login": "Skyorca",
        "id": 27476428,
        "node_id": "MDQ6VXNlcjI3NDc2NDI4",
        "avatar_url": "https://avatars.githubusercontent.com/u/27476428?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/Skyorca",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-07-14T11:34:05Z",
      "updated_at": "2025-07-14T11:34:05Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\n1. 请问如何把lightrag/raganything整合到langchain框架中？\n2. 请问支持华为昇腾生态吗？\n谢谢\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/55/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/51",
      "id": 3220046408,
      "node_id": "I_kwDOO3Bfkc6_7gJI",
      "number": 51,
      "title": "[Bug]: Files with the same name are stored in the same output location",
      "user": {
        "login": "jesse-merhi",
        "id": 79823012,
        "node_id": "MDQ6VXNlcjc5ODIzMDEy",
        "avatar_url": "https://avatars.githubusercontent.com/u/79823012?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/jesse-merhi",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 3,
      "created_at": "2025-07-10T16:52:26Z",
      "updated_at": "2025-12-03T13:07:21Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nI have noticed that if you process two files with the same name, for example `file.pdf`, even if they are in different directories, RAG anything will overwrite the output from the processing of the first file with the second file's output. \n\nI believe this can be fixed if the output saved the entire filepath of a file (or even its hash), but neither of these seem to be the case.\n\n### Steps to reproduce\n\nCreate two files with the same name, but in different subdirectories and index the directory containing both.\n\n```\n❯ ls ~/files\nsubdir paper.pdf\n❯ ls ~/files/subdir\npaper.pdf\n```\n\n### Expected Behavior\n\nWe should separate the indexes for these files, which avoids overwriting. \n\n### LightRAG Config Used\n\n# Paste your config here\n```\nlr = LightRAG(\n        working_dir=SHARED_WORKDIR,\n        llm_model_func=_llm(_api_key),\n        embedding_func=_embed(_api_key),\n    )\n    await lr.initialize_storages()\n    await initialize_pipeline_status()\n    logging.info(f\"Creating new rag on {SHARED_WORKDIR}\")\n    _global_rag = RAGAnything(\n        lightrag=lr,\n        llm_model_func=lr.llm_model_func,\n        vision_model_func=_vision(_api_key),\n        embedding_func=lr.embedding_func,\n        config=RAGAnythingConfig(\n            working_dir=SHARED_WORKDIR,\n            mineru_parse_method=\"auto\",\n            enable_image_processing=True,\n            enable_table_processing=True,\n            enable_equation_processing=True,\n        ),\n    )\n```\n\n\n\n### Logs and screenshots\n\nThe output directory looks quite strange. 
Something interesting is the `images` directory contains images from both pdfs but the rest of the data is just from one.\n<img width=\"534\" height=\"358\" alt=\"Image\" src=\"https://github.com/user-attachments/assets/381e85df-8745-42ab-bb94-b8acf2a9e493\" />\n\n### Additional Information\n\n- LightRAG Version: latest\n- Operating System: Ubuntu 24.04.2 LTS\n- Python Version: >=3.12\n- Related Issues: None.\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/51/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/50",
      "id": 3218085208,
      "node_id": "I_kwDOO3Bfkc6_0BVY",
      "number": 50,
      "title": "[Question]:ImportError: cannot import name 'LightRAG' from 'lightrag' (unknown location)",
      "user": {
        "login": "lhx20011226",
        "id": 55992900,
        "node_id": "MDQ6VXNlcjU1OTkyOTAw",
        "avatar_url": "https://avatars.githubusercontent.com/u/55992900?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/lhx20011226",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 5,
      "created_at": "2025-07-10T06:31:06Z",
      "updated_at": "2025-08-28T07:38:43Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [ ] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\n<img width=\"1637\" height=\"367\" alt=\"Image\" src=\"https://github.com/user-attachments/assets/dc56872c-c4dd-4000-a5c5-a80f3f9121fa\" />\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/50/reactions",
        "total_count": 1,
        "+1": 1,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/49",
      "id": 3215047771,
      "node_id": "I_kwDOO3Bfkc6_obxb",
      "number": 49,
      "title": "[Question]: Not able to Process documents, Document Processing stuck",
      "user": {
        "login": "pankaj-2k01",
        "id": 55687044,
        "node_id": "MDQ6VXNlcjU1Njg3MDQ0",
        "avatar_url": "https://avatars.githubusercontent.com/u/55687044?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/pankaj-2k01",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 9,
      "created_at": "2025-07-09T08:40:37Z",
      "updated_at": "2025-10-07T22:09:07Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nI have configured everything in the configuration, But pipeline is getting stuck here every-time. What could be the possible reason for that.It is getting stuck here only.\n\nModel used : gpt-4o-mini\nembedding model - text-3-large\n\nIs there something else that I need to configure ?\n\nINFO: Process 185328 Shared-Data created for Single Process\n2025-07-09 08:37:18 - pipmaster.async_package_manager - INFO - [Async] Initialized for Python: /home/jovyan/work/Pankaj/.rag-anything/bin/python\nINFO:nano-vectordb:Init {'embedding_dim': 3072, 'metric': 'cosine', 'storage_file': './rag_storage/vdb_entities.json'} 0 data\nINFO:nano-vectordb:Init {'embedding_dim': 3072, 'metric': 'cosine', 'storage_file': './rag_storage/vdb_relationships.json'} 0 data\nINFO:nano-vectordb:Init {'embedding_dim': 3072, 'metric': 'cosine', 'storage_file': './rag_storage/vdb_chunks.json'} 0 data\nINFO: Process 185328 initialized updated flags for namespace: [full_docs]\nINFO: Process 185328 ready to initialize storage namespace: [full_docs]\nINFO: Process 185328 initialized updated flags for namespace: [text_chunks]\nINFO: Process 185328 ready to initialize storage namespace: [text_chunks]\nINFO: Process 185328 initialized updated flags for namespace: [entities]\nINFO: Process 185328 initialized updated flags for namespace: [relationships]\nINFO: Process 185328 initialized updated flags for namespace: [chunks]\nINFO: Process 185328 initialized updated flags for namespace: [chunk_entity_relation]\nINFO: Process 185328 initialized updated flags for namespace: [llm_response_cache]\nINFO: Process 185328 ready to initialize storage namespace: [llm_response_cache]\nINFO: Process 185328 initialized updated flags for namespace: 
[doc_status]\nINFO: Process 185328 ready to initialize storage namespace: [doc_status]\nINFO: Process 185328 storage namespace already initialized: [full_docs]\nINFO: Process 185328 storage namespace already initialized: [text_chunks]\nINFO: Process 185328 storage namespace already initialized: [llm_response_cache]\nINFO: Process 185328 storage namespace already initialized: [doc_status]\nINFO: Process 185328 Pipeline namespace initialized\n\nCODE\n```\nimport asyncio, os, httpx\nfrom raganything import RAGAnything, RAGAnythingConfig\nfrom lightrag.utils import EmbeddingFunc\nimport base64\n\n# === Your self-hosted API ===\nAPI_BASE = \"http://*****************\"\nAPI_KEY = \"*********************\"\nHEADERS = {\"Authorization\": f\"Bearer {API_KEY}\"}\n\n# === Optional proxy ===\nos.environ[\"http_proxy\"] = \"\"\nos.environ[\"https_proxy\"] = \"\"\n\n# === Embedding function ===\nasync def ada_embed(texts: list[str]) -> dict:\n    url = f\"{API_BASE}/v1/embeddings\"\n    payload = {\"model\": \"azure-embedding-model\", \"input\": texts}\n    async with httpx.AsyncClient(timeout=60.0) as client:\n        response = await client.post(url, headers=HEADERS, json=payload)\n        response.raise_for_status()\n        return response.json()  # OpenAI-compatible: { \"data\": [...] 
}\n\nada_embed.embedding_dim = 3072\n\n# === LLM function ===\nasync def gpt_4o_generate(prompt: str, system_prompt=None, history_messages=[], **kwargs) -> str:\n    url = f\"{API_BASE}/v1/chat/completions\"\n    messages = []\n    if system_prompt:\n        messages.append({\"role\": \"system\", \"content\": system_prompt})\n    messages.extend(history_messages)\n    messages.append({\"role\": \"user\", \"content\": prompt})\n\n    payload = {\n        \"model\": \"gpt-4o-mini\",\n        \"messages\": messages,\n        \"temperature\": 0.0\n    }\n\n    async with httpx.AsyncClient(timeout=60.0) as client:\n        response = await client.post(url, headers=HEADERS, json=payload)\n        response.raise_for_status()\n        data = response.json()\n\n    return data['choices'][0]['message']['content']\n\n# === Vision function ===\nasync def vision_model_func(prompt: str, system_prompt=None, history_messages=[], image_data=None, **kwargs) -> str:\n    url = f\"{API_BASE}/v1/chat/completions\"\n    messages = []\n    if system_prompt:\n        messages.append({\"role\": \"system\", \"content\": system_prompt})\n    if history_messages:\n        messages.extend(history_messages)\n\n    if image_data:\n        messages.append({\n            \"role\": \"user\",\n            \"content\": [\n                {\"type\": \"text\", \"text\": prompt},\n                {\n                    \"type\": \"image_url\",\n                    \"image_url\": {\n                        \"url\": f\"data:image/jpeg;base64,{image_data}\"\n                    }\n                }\n            ]\n        })\n    else:\n        messages.append({\"role\": \"user\", \"content\": prompt})\n\n    payload = {\n        \"model\": \"gpt-4o-mini\",\n        \"messages\": messages,\n        \"temperature\": 0.0\n    }\n\n    async with httpx.AsyncClient(timeout=60.0) as client:\n        response = await client.post(url, headers=HEADERS, json=payload)\n        response.raise_for_status()\n        
data = response.json()\n\n    return data['choices'][0]['message']['content']\n\n# === LightRAG config ===\nconfig = RAGAnythingConfig(\n    working_dir=\"./rag_storage\",\n    mineru_parse_method=\"auto\",\n    enable_image_processing=True,\n    enable_table_processing=True,\n    enable_equation_processing=True,\n)\n\n# === Init embedding wrapper ===\nembedding_func = EmbeddingFunc(\n    embedding_dim=ada_embed.embedding_dim,\n    max_token_size=8192,\n    func=ada_embed,\n)\n\n# === Main ===\nasync def main():\n    \n    rag =RAGAnything(\n        config=config,\n        llm_model_func=gpt_4o_generate,\n        vision_model_func=vision_model_func,\n        embedding_func=embedding_func,\n    )\n    print(\"-----------------PROCESSING DOCUMENT-----------------\")\n    # Process document\n    await rag.process_document_complete(\n        file_path=\"documents/Lorem_ipsum.pdf\",\n        output_dir=\"output/\",\n        parse_method=\"auto\",\n        display_stats=True\n    )\n\n    # Query\n    text_result = await rag.aquery(\n        \"What are the main findings shown in the figures and tables?\",\n        mode=\"hybrid\"\n    )\n    print(\"Text query result:\", text_result)\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n\n```\n\nPlease help, I am stuck here\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/49/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/48",
      "id": 3212923109,
      "node_id": "I_kwDOO3Bfkc6_gVDl",
      "number": 48,
      "title": "[Question]: Is it normal for documents to take a while to process?",
      "user": {
        "login": "jesse-merhi",
        "id": 79823012,
        "node_id": "MDQ6VXNlcjc5ODIzMDEy",
        "avatar_url": "https://avatars.githubusercontent.com/u/79823012?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/jesse-merhi",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 4,
      "created_at": "2025-07-08T15:14:04Z",
      "updated_at": "2025-12-10T08:07:46Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nHey there, I have some pretty extreme hardware,\n\nCPU: 9800x3d\nGPU: 5090\n\nYet, doing something as simple as processing a couple of ~12-page PDFs with a normal amount of images takes 15 minutes... Do you know if this is normal? \n\nMy GPU is not being utilised at all, and my CPU has very minimal utilisation... \n\nIt seems the `MinerU command executed successfully` happens quickly, but after this, the processing is super slow.\n\n```\n[07/09/25 00:52:04] INFO     Creating llm function with model: gpt-4o-mini                                                                                                                                                                                                              main.py:55\n                    INFO     Creating embedding function with model: text-embedding-3-large                                                                                                                                                                                             main.py:93\nINFO: Process 28784 Shared-Data created for Single Process\n                    INFO     Init {'embedding_dim': 3072, 'metric': 'cosine', 'storage_file': '/shared_workspace/vdb_entities.json'} 0 data                                                                                                                     dbs.py:81\n                    INFO     Init {'embedding_dim': 3072, 'metric': 'cosine', 'storage_file': '/shared_workspace/vdb_relationships.json'} 0 data                                                                                                                dbs.py:81\n                    INFO     Init {'embedding_dim': 3072, 'metric': 'cosine', 'storage_file': 
'/shared_workspace/vdb_chunks.json'} 0 data                                                                                                                       dbs.py:81\nINFO: Process 28784 initialized updated flags for namespace: [full_docs]\nINFO: Process 28784 ready to initialize storage namespace: [full_docs]\nINFO: Process 28784 initialized updated flags for namespace: [text_chunks]\nINFO: Process 28784 ready to initialize storage namespace: [text_chunks]\nINFO: Process 28784 initialized updated flags for namespace: [entities]\nINFO: Process 28784 initialized updated flags for namespace: [relationships]\nINFO: Process 28784 initialized updated flags for namespace: [chunks]\nINFO: Process 28784 initialized updated flags for namespace: [chunk_entity_relation]\nINFO: Process 28784 initialized updated flags for namespace: [llm_response_cache]\nINFO: Process 28784 ready to initialize storage namespace: [llm_response_cache]\nINFO: Process 28784 initialized updated flags for namespace: [doc_status]\nINFO: Process 28784 ready to initialize storage namespace: [doc_status]\nINFO: Process 28784 storage namespace already initialized: [full_docs]\nINFO: Process 28784 storage namespace already initialized: [text_chunks]\nINFO: Process 28784 storage namespace already initialized: [llm_response_cache]\nINFO: Process 28784 storage namespace already initialized: [doc_status]\nINFO: Process 28784 Pipeline namespace initialized\nMinerU command executed successfully\nMinerU command executed successfully\nError parsing equation analysis response: Invalid \\escape: line 2 column 689 (char 690)\nError parsing equation analysis response: Invalid \\escape: line 2 column 62 (char 63)\nError parsing equation analysis response: Invalid \\escape: line 2 column 262 (char 263)\n[07/09/25 01:04:39] INFO     Creating llm function with model: gpt-4o-mini \nIndexed 2 new or updated file(s) into shared workspace '/home/jmerh/.rag_anything/shared_workspace'.\n```\n\nIs there some sort of config 
I need to change?\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/48/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/45",
      "id": 3206301408,
      "node_id": "I_kwDOO3Bfkc6_HEbg",
      "number": 45,
      "title": "[Feature Request]: Support for Dynamic request routing and other model providers other than OpenAI",
      "user": {
        "login": "m23ayou2",
        "id": 178153976,
        "node_id": "U_kgDOCp5p-A",
        "avatar_url": "https://avatars.githubusercontent.com/u/178153976?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/m23ayou2",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-07-06T08:36:38Z",
      "updated_at": "2025-07-28T03:36:02Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [x] I have searched the existing feature request and this feature request is not already filed.\n- [x] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\nHello,\n\nI have noticed the model providers only support OpenAI and wanted to ask if there are any efforts to support  other model providers and inference providers for open-source models like LLaMA, other than that it would be very interesting to support dynamic user request routing like https://github.com/NVIDIA-AI-Blueprints/llm-router for cost optimization, I have already worked on a similar projects and would be happy to integrate it into this.\n\nThanks a lot.\n\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/45/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/44",
      "id": 3203623227,
      "node_id": "I_kwDOO3Bfkc6-82k7",
      "number": 44,
      "title": "[Question]:Ollama fail processing multimodal content",
      "user": {
        "login": "yurvon-screamo",
        "id": 109030262,
        "node_id": "U_kgDOBn-rdg",
        "avatar_url": "https://avatars.githubusercontent.com/u/109030262?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/yurvon-screamo",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 2,
      "created_at": "2025-07-04T20:09:05Z",
      "updated_at": "2025-07-24T11:28:15Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nWhen creating a document, errors appear, but the document itself is created, possibly without pictures. What am I doing wrong?\n\n```bash\nError processing image content: 'hashing_kv'\nError processing multimodal content: not enough values to unpack (expected 3, got 2)\nError processing image content: 'hashing_kv'\nError processing multimodal content: not enough values to unpack (expected 3, got 2)\nError processing image content: 'hashing_kv'\nError processing multimodal content: not enough values to unpack (expected 3, got 2)\nError processing image content: 'hashing_kv'\n```\n\n### Additional Context\n\nOS: try on Ubuntu or Docker.\n\nModels: try on `qwen2.5vl:7b/qwen3:30b` or `mistral-small3.2:24b` + `nomic-embed-text`\n\nConfig:\n\n```python\n\nasync def vision_model_func(\n    prompt, system_prompt=None, history_messages=[], image_data=None, **kwargs\n):\n    \"\"\"\n    Handles vision model requests. 
If image_data is provided, sends it to the vision model; otherwise, uses the text model.\n    \"\"\"\n    if image_data:\n        if isinstance(image_data, bytes):\n            img_b64 = base64.b64encode(image_data).decode(\"utf-8\")\n        elif isinstance(image_data, str):\n            img_b64 = image_data\n        else:\n            raise ValueError(\"image_data must be bytes or base64 string\")\n        payload = {\n            \"model\": OLLAMA_VISION_MODEL,\n            \"prompt\": prompt,\n            \"images\": [img_b64],\n            \"stream\": False\n        }\n        try:\n            async with aiohttp.ClientSession() as session:\n                async with session.post(f\"{OLLAMA_HOST}/api/generate\", json=payload) as resp:\n                    resp.raise_for_status()\n                    data = await resp.json()\n                    return data.get(\"response\", \"[No description]\")\n        except Exception as e:\n            return f\"[Error describing image: {e} ]\"\n    else:\n        return await ollama_model_complete(prompt, system_prompt, history_messages, **kwargs)\n\n\n# ....\n\nlightrag_instance = LightRAG(\n    working_dir=WORKING_DIR,\n    llm_model_func=ollama_model_complete,\n    llm_model_name=OLLAMA_MODEL,\n    llm_model_kwargs={\n        \"host\": OLLAMA_HOST,\n        \"options\": {\"num_ctx\": OLLAMA_NUM_CTX},\n    },\n    embedding_func=EmbeddingFunc(\n        embedding_dim=EMBEDDING_DIM,\n        max_token_size=MAX_EMBED_TOKENS,\n        func=lambda texts: ollama_embed(\n            texts,\n            embed_model=EMBEDDING_MODEL,\n            host=EMBEDDING_HOST,\n        ),\n    ),\n)\nawait lightrag_instance.initialize_storages()\nawait initialize_pipeline_status()\n\nrag = RAGAnything(\n    lightrag=lightrag_instance,\n    vision_model_func=vision_model_func,\n)\n\n# ....\n\nawait rag.process_document_complete(\n    file_path=file_path,\n    output_dir=OUTPUT_DIR,\n    parse_method=\"auto\",\n)\n\n```\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/44/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/40",
      "id": 3201836610,
      "node_id": "I_kwDOO3Bfkc6-2CZC",
      "number": 40,
      "title": "[Bug]: uv add raganything==1.2.0 error （python_full_version >= '3.12' and sys_platform == 'win32')",
      "user": {
        "login": "nice-kai",
        "id": 18050781,
        "node_id": "MDQ6VXNlcjE4MDUwNzgx",
        "avatar_url": "https://avatars.githubusercontent.com/u/18050781?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/nice-kai",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-07-04T08:42:40Z",
      "updated_at": "2025-08-11T09:25:18Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\n```\nrag-anythinglimk@limk-MS-7D36:~/CursorProject/rag_anything$ uv add raganything==1.2.0\n  × No solution found when resolving dependencies for split (python_full_version >= '3.12' and sys_platform == 'win32'):\n  ╰─▶ Because there is no version of raganything==1.2.0 and your project depends on raganything==1.2.0, we can conclude that your\n      project's requirements are unsatisfiable.\n      And because your project requires rag-anything-demo[all], we can conclude that your project's requirements are unsatisfiable.\n\n      hint: The resolution failed for an environment that is not the current one, consider limiting the environments with\n      `tool.uv.environments`.\n  help: If you want to add the package regardless of the failed resolution, provide the `--frozen` flag to skip locking and syncing.\n```\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/40/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/38",
      "id": 3201627317,
      "node_id": "I_kwDOO3Bfkc6-1PS1",
      "number": 38,
      "title": "[Bug]:auto_manage_storages_states=True 失效了",
      "user": {
        "login": "nice-kai",
        "id": 18050781,
        "node_id": "MDQ6VXNlcjE4MDUwNzgx",
        "avatar_url": "https://avatars.githubusercontent.com/u/18050781?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/nice-kai",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-07-04T07:17:05Z",
      "updated_at": "2025-07-04T07:17:05Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\n环境：python = 3.11\npython管理工具：uv\n场景：pdf生成图谱\n核心问题：auto_manage_storages_states=True，生成图谱过程中，自动初始化机制的时机错误\n问题不在于参数设置，而在于 LightRAG的自动存储管理机制的初始化时机：\n设计预期：当 auto_manage_storages_states=True 时，存储应该在需要时自动初始化\n实际情况：存储锁 _storage_lock 在实例创建时仍然是 None\n失效原因：当调用 ainsert() 方法时，代码尝试使用 async with self._storage_lock:，但此时 _storage_lock 仍然是 None\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/38/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/32",
      "id": 3198792116,
      "node_id": "I_kwDOO3Bfkc6-qbG0",
      "number": 32,
      "title": "[Question]:如何控制嵌入行数与控制LLM调用频率",
      "user": {
        "login": "kaichen1007",
        "id": 77338775,
        "node_id": "MDQ6VXNlcjc3MzM4Nzc1",
        "avatar_url": "https://avatars.githubusercontent.com/u/77338775?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/kaichen1007",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-07-03T09:55:43Z",
      "updated_at": "2025-07-03T09:55:43Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nllm_model:qwen-plus-2025-04-28\nvision_model:qwen-omni-turbo\nembedding_model:text-embedding-v4\n在使用过程中，会出现两个问题导致无法进行。\n1. 限流问题\n```python\nexcept RateLimitError as e:\n        logger.error(f\"触发了限流,model: {model}, base_url: {base_url},messages: {messages}\")\n        logger.error(f\"OpenAI API Rate Limit Error: {e}\")\n        await openai_async_client.close()  # Ensure client is closed\n        raise\n```\n触发了限流,model: qwen-plus-2025-04-28, base_url: https://dashscope.aliyuncs.com/compatible-mode/v1,messages:\n添加了await asyncio.sleep(1)也无济于事\n```python\ntry:\n        logger.info(f\"openai_complete_if_cache messages: {messages}\")\n        # Don't use async with context manager, use client directly\n        if \"response_format\" in kwargs:\n            response = await openai_async_client.beta.chat.completions.parse(\n                model=model, messages=messages, **kwargs\n            )\n        else:\n            response = await openai_async_client.chat.completions.create(\n                model=model, messages=messages, **kwargs\n            )\n\n        await asyncio.sleep(1)\n```\n2. 
嵌入模型行数限制问题\n当前使用的是Qwen的text-embedding-v4其中一个行数限制10，在执行中会出现超过行数的现象\n```python\nopenai_async_client = create_openai_async_client(\n        api_key=api_key, base_url=base_url, client_configs=client_configs\n    )\n    logger.error(f\"嵌入Embedding {len(texts)} texts using model {model}\")\n    await asyncio.sleep(1)\n    async with openai_async_client:    \n        response = await openai_async_client.embeddings.create(\n            model=model, input=texts, encoding_format=\"float\"\n        )\n        return np.array([dp.embedding for dp in response.data])\n```\n嵌入Embedding 32 texts using model text-embedding-v4\n嵌入Embedding 32 texts using model text-embedding-v4\n嵌入Embedding 32 texts using model text-embedding-v4\n嵌入Embedding 32 texts using model text-embedding-v4\n嵌入Embedding 32 texts using model text-embedding-v4\n嵌入Embedding 32 texts using model text-embedding-v4\n嵌入Embedding 32 texts using model text-embedding-v4\n嵌入Embedding 32 texts using model text-embedding-v4\n嵌入Embedding 8 texts using model text-embedding-v4\n\n以上这两个问题，不知道是否有办法可控制。\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/32/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/31",
      "id": 3198370445,
      "node_id": "I_kwDOO3Bfkc6-o0KN",
      "number": 31,
      "title": "[Question]:在进行图片的视觉分析时，使用的是相对路径加载图片的base64？",
      "user": {
        "login": "LiZiTi",
        "id": 203536820,
        "node_id": "U_kgDODCG5tA",
        "avatar_url": "https://avatars.githubusercontent.com/u/203536820?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/LiZiTi",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-07-03T07:31:21Z",
      "updated_at": "2025-07-15T10:01:14Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [ ] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\n代码文件：modelprocessors.py 247 行的函数process_multimodal_content：\n`\n            image_path = content_data.get(\"img_path\")\n            captions = content_data.get(\"img_caption\", [])\n            footnotes = content_data.get(\"img_footnote\", [])\n\n            # Build detailed visual analysis prompt\n            vision_prompt = PROMPTS[\"vision_prompt\"].format(\n                entity_name=entity_name\n                if entity_name\n                else \"unique descriptive name for this image\",\n                image_path=image_path,\n                captions=captions if captions else \"None\",\n                footnotes=footnotes if footnotes else \"None\",\n            )\n\n            # If image path exists, try to encode image\n            image_base64 = \"\"\n            if image_path and Path(image_path).exists():\n                image_base64 = self._encode_image_to_base64(image_path)\n`\n\n我发现传递进来的content_data中图片的路径是相对路径，而且函数process_multimodal_content也没有地方去获取存储图片的根路径，导致无法成功加载图片的base64。\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/31/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/29",
      "id": 3197840584,
      "node_id": "I_kwDOO3Bfkc6-myzI",
      "number": 29,
      "title": "[Question]:如何切换默认的openai llm到任意openai compatible的api？",
      "user": {
        "login": "garyxj",
        "id": 85046041,
        "node_id": "MDQ6VXNlcjg1MDQ2MDQx",
        "avatar_url": "https://avatars.githubusercontent.com/u/85046041?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/garyxj",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-07-03T03:16:27Z",
      "updated_at": "2025-07-03T06:49:02Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\n如何切换默认的openai llm到任意openai compatible的api？\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/29/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/28",
      "id": 3197568571,
      "node_id": "I_kwDOO3Bfkc6-lwY7",
      "number": 28,
      "title": "[Bug]:Calling openai_complete_if_cache request may result in message conflicts.",
      "user": {
        "login": "hyfrom-Zhku",
        "id": 201232043,
        "node_id": "U_kgDOC_6Oqw",
        "avatar_url": "https://avatars.githubusercontent.com/u/201232043?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/hyfrom-Zhku",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-07-03T00:43:59Z",
      "updated_at": "2025-07-03T00:43:59Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [ ] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nIt seems that openai_complete_if_cache does not support receiving a message, and keyword conflicts will occur.\nChanging it to call the OpenAI library directly works without any issues.\n\n<img width=\"440\" alt=\"Image\" src=\"https://github.com/user-attachments/assets/064ddf1c-d560-4e56-8a29-d882092e4cb8\" />\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/28/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/27",
      "id": 3197334603,
      "node_id": "I_kwDOO3Bfkc6-k3RL",
      "number": 27,
      "title": "[Question]: Image Descriptions in kv_store_text_chunks.json Seem Incorrect — Where to Adjust Vision Prompt?",
      "user": {
        "login": "JJMocke",
        "id": 83647229,
        "node_id": "MDQ6VXNlcjgzNjQ3MjI5",
        "avatar_url": "https://avatars.githubusercontent.com/u/83647229?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/JJMocke",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-07-02T22:23:04Z",
      "updated_at": "2025-07-04T18:49:35Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nHi, I have a question regarding the image analysis and description output generated by the RAGAnything system.\n\n\nI converted the following [page](https://wiki.bambulab.com/en/knowledge-sharing/troubleshooting-printing-issues-backup) into a PDF and processed it with RAGAnything. When querying the RAG or inspecting the generated knowledge graph, I noticed unusual or unrelated nodes.\n\n\nWhile inspecting `kv_store_text_chunks.json`, I noticed that the visual analysis descriptions for some image chunks appear to be completely unrelated to the content. Here are two examples:\n\n![Image](https://github.com/user-attachments/assets/97beaef1-7699-4e2b-a029-6993f1d95997)\n\n```\n  \"chunk-6dc8a852f240d02a355e3d4eed18f9f7\": {\n    \"tokens\": 216,\n    \"content\": \"\\nImage Content Analysis:\\nImage Path: images/5f13ca0e5e0a9510b405014cf10bbbedcce9af27308bef168fd099c66349b5e9.jpg\\nCaptions: None\\nFootnotes: None\\n\\nVisual Analysis: The image showcases a scenic landscape featuring an expansive mountainous background with a clear blue sky. Dominating the foreground is a lush green valley dotted with vibrant wildflowers, indicating a flourishing ecosystem. In the center, a winding river can be seen flowing gently through the valley, reflecting the azure sky. There are no visible human figures; however, the serenity of the scene suggests a location suitable for outdoor activities such as hiking or picnicking. The colors in the image are rich and saturated, with lush greens contrasting against the deep blues of the water and sky, creating a tranquil atmosphere. The lighting appears to be natural and bright, enhancing the colors and details of the landscape. Overall, the composition is balanced, emphasizing the beauty of nature and the peaceful coexistence of its elements.\",\n    \"chunk_order_index\": 0,\n    \"full_doc_id\": \"chunk-6dc8a852f240d02a355e3d4eed18f9f7\",\n    \"file_path\": \"Bad_prints.pdf\"\n  },\n\n```\n\n![Image](https://github.com/user-attachments/assets/4afaabbd-4b56-4eea-9a9c-1c051bc72248)\n\n```\n  \"chunk-f7411eb32c33220d0f46aa042283a776\": {\n    \"tokens\": 254,\n    \"content\": \"\\nImage Content Analysis:\\nImage Path: images/aa1ebcd5c274f7dd1b35382c3b031b714be28d1a8a7dc3d25909163de4919ac3.jpg\\nCaptions: None\\nFootnotes: None\\n\\nVisual Analysis: The image encompasses a vibrant urban street scene during the day, featuring a bustling marketplace. In the foreground, a diverse group of people is seen engaging with various market stalls. There's a woman in a red dress to the left, examining fresh produce, while a man in a blue jacket interacts with a vendor behind a table laden with colorful fruits. Behind them, a child in a yellow shirt is holding a balloon, looking excited. The background displays several stalls, each adorned with bright awnings and colorful signage, contributing to an energetic atmosphere. The sky is a clear blue, illuminating the scene with natural light, creating strong contrasts between the vivid colors of the merchandise and the warm hues of the skin tones of the individuals present. The overall mood is lively and engaging, indicative of a community-centered marketplace experience. There are no texts or graphs in the image; instead, it focuses on human interactions and the vibrant environment, emphasizing the social aspect of street life.\",\n    \"chunk_order_index\": 0,\n    \"full_doc_id\": \"chunk-f7411eb32c33220d0f46aa042283a776\",\n    \"file_path\": \"Bad_prints.pdf\"\n  },\n\n```\n\nWould I need to modify the following section in [prompt.py](https://github.com/HKUDS/RAG-Anything/blob/main/raganything/prompt.py) to improve the quality of these visual descriptions?\n\n```\n`PROMPTS[\n    \"vision_prompt\"\n] = \"\"\"Please analyze this image in detail and provide a JSON response with the following structure:\n\n{{\n    \"detailed_description\": \"A comprehensive and detailed visual description of the image following these guidelines:\n    - Describe the overall composition and layout\n    - Identify all objects, people, text, and visual elements\n    - Explain relationships between elements\n    - Note colors, lighting, and visual style\n    - Describe any actions or activities shown\n    - Include technical details if relevant (charts, diagrams, etc.)\n    - Always use specific names instead of pronouns\",\n    \"entity_info\": {{\n        \"entity_name\": \"{entity_name}\",\n        \"entity_type\": \"image\",\n        \"summary\": \"concise summary of the image content and its significance (max 100 words)\"\n    }}\n}}\n\nAdditional context:\n- Image Path: {image_path}\n- Captions: {captions}\n- Footnotes: {footnotes}\n\nFocus on providing accurate, detailed visual analysis that would be useful for knowledge retrieval.\"\"\"`\n```\n\nCan anybody direct me to where I can change this to fit my case better? Below I have attached additional files. Please let me know if you need any additional info.\n\nThank you in advance.\n\n### Additional Context\n\n**Here is the pdf I used:** [Bad_prints.pdf](https://github.com/user-attachments/files/21025644/Bad_prints.pdf)\n\n**Other files in working directory:**\n[vdb_relationships.json](https://github.com/user-attachments/files/21025888/vdb_relationships.json)\n[vdb_chunks.json](https://github.com/user-attachments/files/21025891/vdb_chunks.json)\n[kv_store_llm_response_cache.json](https://github.com/user-attachments/files/21025889/kv_store_llm_response_cache.json)\n[kv_store_text_chunks.json](https://github.com/user-attachments/files/21025892/kv_store_text_chunks.json)\n[kv_store_doc_status.json](https://github.com/user-attachments/files/21025894/kv_store_doc_status.json)\n[kv_store_full_docs.json](https://github.com/user-attachments/files/21025890/kv_store_full_docs.json)\n[vdb_entities.json](https://github.com/user-attachments/files/21025893/vdb_entities.json)\n\n\n**Images extracted:**\n[images.zip](https://github.com/user-attachments/files/21025955/images.zip)\n\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/27/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/26",
      "id": 3197036600,
      "node_id": "I_kwDOO3Bfkc6-jug4",
      "number": 26,
      "title": "[Question]: How to Customize the Prompt for Knowledge Graph Generation in RAGAnything?",
      "user": {
        "login": "JJMocke",
        "id": 83647229,
        "node_id": "MDQ6VXNlcjgzNjQ3MjI5",
        "avatar_url": "https://avatars.githubusercontent.com/u/83647229?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/JJMocke",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 5,
      "created_at": "2025-07-02T20:00:25Z",
      "updated_at": "2025-08-16T19:50:30Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nHi, I would like to know how I can edit the prompt which generates the knowledge-graph for the RAG. For LightRAG I would edit the following code in the [prompt.py](https://github.com/HKUDS/LightRAG/blob/main/lightrag/prompt.py) file.\n\n`PROMPTS[\"DEFAULT_ENTITY_TYPES\"] = [\"organization\", \"person\", \"geo\", \"event\", \"category\"]`\n`PROMPTS[\"DEFAULT_USER_PROMPT\"] = \"n/a\"`\n`PROMPTS[\"entity_extraction\"]`\n`PROMPTS[\"entity_extraction_examples\"]`\n\nFor my use case—building a root cause analysis (RCA) system for manufacturing—I would like to define the following entity types:\n\n`[\"defects\", \"machines\", \"parameters\"]`\n\nIs there a similar way to adjust or override these prompt settings in RAGAnything?\n\nPlease let me know if you need more context or examples.\nThanks in advance!\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/26/reactions",
        "total_count": 1,
        "+1": 1,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/25",
      "id": 3194909267,
      "node_id": "I_kwDOO3Bfkc6-bnJT",
      "number": 25,
      "title": "[Question] Are locally downloaded models supported?",
      "user": {
        "login": "ZSXPROMAX",
        "id": 140056178,
        "node_id": "U_kgDOCFkWcg",
        "avatar_url": "https://avatars.githubusercontent.com/u/140056178?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/ZSXPROMAX",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-07-02T07:34:59Z",
      "updated_at": "2025-07-02T09:15:30Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [ ] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nI'd like to ask whether RAG-Anything supports locally downloaded models, not via Ollama, but models I downloaded myself from ModelScope.\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/25/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/24",
      "id": 3191259552,
      "node_id": "I_kwDOO3Bfkc6-NsGg",
      "number": 24,
      "title": "[Bug]: Chinese font setup in `MineruParser.parse_text_file` never works",
      "user": {
        "login": "Glinte",
        "id": 96855131,
        "node_id": "U_kgDOBcXkWw",
        "avatar_url": "https://avatars.githubusercontent.com/u/96855131?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/Glinte",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 3,
      "created_at": "2025-07-01T08:06:19Z",
      "updated_at": "2025-08-12T09:01:01Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\n...because you are trying to initialize `UnicodeCIDFont`s with font names that are not supported, but no one ever noticed this failed because you suppressed all the errors. You used `[\"SimSun\", \"SimHei\", \"Microsoft YaHei\"]` on Windows and `[\"STSong-Light\", \"STHeiti\"]` on macOS, but the only fonts supported by `UnicodeCIDFont` are the 6 hardcoded values. From a quick read of the `UnicodeCIDFont` class docstring, I believe `CIDFont` should be used instead.\n\nhttps://github.com/HKUDS/RAG-Anything/blob/9c725265857da7bba7d94b3eacfe74d089b8de98/raganything/mineru_parser.py#L647-L681\n\nhttps://github.com/eduardocereto/reportlab/blob/98758940eeae30db80bbc9c555e42b8c89b86be8/src/reportlab/pdfbase/cidfonts.py#L390-L395\n\n```python\nclass UnicodeCIDFont(CIDFont):\n    def __init__(self, face, isVertical=False, isHalfWidth=False):\n        #pass\n        try:\n            lang, defaultEncoding = defaultUnicodeEncodings[face]\n        except KeyError:\n            raise KeyError(\"don't know anything about CID font %s\" % face)\n```\n\nhttps://github.com/eduardocereto/reportlab/blob/98758940eeae30db80bbc9c555e42b8c89b86be8/src/reportlab/pdfbase/_cidfontdata.py#L130-L141\n\n```python\ndefaultUnicodeEncodings = {\n    #we ddefine a default Unicode encoding for each face name;\n    #this should be the most commonly used horizontal unicode encoding;\n    #also define a 3-letter language code.\n    'HeiseiMin-W3': ('jpn','UniJIS-UCS2-H'),\n    'HeiseiKakuGo-W5': ('jpn','UniJIS-UCS2-H'),\n    'STSong-Light': ('chs', 'UniGB-UCS2-H'),\n    'MSung-Light': ('cht', 'UniGB-UCS2-H'),\n    #'MHei-Medium': ('cht', 'UniGB-UCS2-H'),\n    'HYSMyeongJo-Medium': ('kor', 'UniKS-UCS2-H'),\n    'HYGothic-Medium': ('kor','UniKS-UCS2-H'),\n    }\n```\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/24/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/21",
      "id": 3187951028,
      "node_id": "I_kwDOO3Bfkc6-BEW0",
      "number": 21,
      "title": "[Bug]:Query: What is the main content of the document? Answer: Sorry, I'm not able to provide an answer to that question.[no-context]",
      "user": {
        "login": "lucheng07082221",
        "id": 3146209,
        "node_id": "MDQ6VXNlcjMxNDYyMDk=",
        "avatar_url": "https://avatars.githubusercontent.com/u/3146209?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/lucheng07082221",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 5,
      "created_at": "2025-06-30T10:22:48Z",
      "updated_at": "2025-08-20T12:44:39Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nContent Information:\nINFO: * Total blocks in content_list: 22\nINFO: * Markdown content length: 73100 characters\nINFO: * Content block types:\nINFO:   - unknown: 22\nINFO: Content separation complete:\nINFO:   - Text content length: 0 characters\nINFO:   - Multimodal items count: 0\nINFO: Document /home/quiana/Documents/空间特殊轨道理论与设计方法.pdf processing complete!\n\nQuerying processed document:\n\nQuery: What is the main content of the document?\nAnswer: Sorry, I'm not able to provide an answer to that question.[no-context]\n\nQuery: Describe the images and figures in the document\nAnswer: Sorry, I'm not able to provide an answer to that question.[no-context]\n\nQuery: Tell me about the experimental results and data tables\nAnswer: Sorry, I'm not able to provide an answer to that question.[no-context]\n\n\nWhy are there no results?\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n_No response_\n\n### Additional Information\n\n- LightRAG Version:\n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/21/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/20",
      "id": 3185190184,
      "node_id": "I_kwDOO3Bfkc692iUo",
      "number": 20,
      "title": "[Question]: Can we replace minerU with Docling?",
      "user": {
        "login": "ayowu1981",
        "id": 197264617,
        "node_id": "U_kgDOC8IE6Q",
        "avatar_url": "https://avatars.githubusercontent.com/u/197264617?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/ayowu1981",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-06-28T16:20:38Z",
      "updated_at": "2025-06-29T14:29:12Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\n_No response_\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/20/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/19",
      "id": 3181956636,
      "node_id": "I_kwDOO3Bfkc69qM4c",
      "number": 19,
      "title": "[Question]: how to visualize knowledge-graph using neo4j?",
      "user": {
        "login": "long123524",
        "id": 73996375,
        "node_id": "MDQ6VXNlcjczOTk2Mzc1",
        "avatar_url": "https://avatars.githubusercontent.com/u/73996375?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/long123524",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-06-27T08:45:51Z",
      "updated_at": "2025-06-29T14:28:07Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [ ] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nHow to visualize knowledge-graph using neo4j?\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/19/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/18",
      "id": 3181876195,
      "node_id": "I_kwDOO3Bfkc69p5Pj",
      "number": 18,
      "title": "[Feature Request]: How to add Neo4j logic code? I want to extract entities and their relationships from PDFs and store them in Neo4j for later RAG use",
      "user": {
        "login": "yukaijun2001",
        "id": 112839705,
        "node_id": "U_kgDOBrnMGQ",
        "avatar_url": "https://avatars.githubusercontent.com/u/112839705?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/yukaijun2001",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-06-27T08:20:22Z",
      "updated_at": "2025-06-29T14:26:27Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [ ] I have searched the existing feature request and this feature request is not already filed.\n- [ ] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\n_No response_\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/18/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/14",
      "id": 3177095077,
      "node_id": "I_kwDOO3Bfkc69Xp-l",
      "number": 14,
      "title": "[Question]:how this project is related to minirag",
      "user": {
        "login": "ianhe8x",
        "id": 39037239,
        "node_id": "MDQ6VXNlcjM5MDM3MjM5",
        "avatar_url": "https://avatars.githubusercontent.com/u/39037239?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/ianhe8x",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 2,
      "created_at": "2025-06-25T22:05:09Z",
      "updated_at": "2025-07-24T10:41:30Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [ ] I have searched the existing question and discussions and this question is not already answered.\n- [ ] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nHi team, great credit to you, awesome work.\ni want to ask, how this project is related to minirag?\ni have a project currently building on minirag. I'm concerned if i can upgrade it to rag anything afterwards, especially when i already have data indexed by minirag.\n\n### Additional Context\n\n_No response_",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/14/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/11",
      "id": 3167425550,
      "node_id": "I_kwDOO3Bfkc68yxQO",
      "number": 11,
      "title": "[Bug]: magic_pdf module missing",
      "user": {
        "login": "theauAg",
        "id": 210550372,
        "node_id": "U_kgDODIy-ZA",
        "avatar_url": "https://avatars.githubusercontent.com/u/210550372?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/theauAg",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-06-23T09:18:10Z",
      "updated_at": "2025-06-23T10:27:50Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nmagic_pdf module is not installed\n\n### Steps to reproduce\n\nInstall raganything and import RAGAnything from raganything \n\n### Expected Behavior\n\nModule\n\n### LightRAG Config Used\n\nModuleNotFoundError: No module named 'magic_pdf\n\n### Logs and screenshots\n\nModuleNotFoundError                       Traceback (most recent call last)\nCell In[1], line 2\n     \n----> 2 from raganything import RAGAnything\n  \n\nFile .venv\\Lib\\site-packages\\raganything\\__init__.py:1\n----> 1 from .raganything import RAGAnything as RAGAnything\n      3 __version__ = \"0.0.1\"\n      4 __author__ = \"Zirui Guo\"\n\nFile .venv\\Lib\\site-packages\\raganything\\raganything.py:24\n     21 from lightrag.utils import EmbeddingFunc, setup_logger\n     23 # Import parser and multimodal processors\n---> 24 from lightrag.mineru_parser import MineruParser\n     26 # Import specialized processors\n     27 from lightrag.modalprocessors import (\n     28     ImageModalProcessor,\n     29     TableModalProcessor,\n     30     EquationModalProcessor,\n     31     GenericModalProcessor,\n     32 )\n\nFile .venv\\Lib\\site-packages\\lightrag\\mineru_parser.py:52\n     49     from magic_pdf.data.read_api import read_local_office, read_local_images\n     50 else:\n     51     # MinerU imports\n---> 52     from magic_pdf.data.data_reader_writer import (\n     53         FileBasedDataWriter,\n     54         FileBasedDataReader,\n     55     )\n     56     from magic_pdf.data.dataset import PymuDocDataset\n     57     from magic_pdf.model.doc_analyze_by_custom_model import doc_analyze\n\nModuleNotFoundError: No module named 'magic_pdf\n\n### Additional Information\n\n- LightRAG Version: \n- Operating System:\n- Python Version:\n- Related Issues:\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/11/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/10",
      "id": 3165837102,
      "node_id": "I_kwDOO3Bfkc68stcu",
      "number": 10,
      "title": "[Feature Request]:",
      "user": {
        "login": "zhongc12",
        "id": 186529921,
        "node_id": "U_kgDOCx44gQ",
        "avatar_url": "https://avatars.githubusercontent.com/u/186529921?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/zhongc12",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742545,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e0Q",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/enhancement",
          "name": "enhancement",
          "color": "a2eeef",
          "default": true,
          "description": "New feature or request"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 1,
      "created_at": "2025-06-22T10:19:14Z",
      "updated_at": "2025-06-23T10:26:01Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file a feature request?\n\n- [ ] I have searched the existing feature request and this feature request is not already filed.\n- [x] I believe this is a legitimate feature request, not just a question or bug.\n\n### Feature Request Description\n\n我想把导入和问答分开，之前lightRAG都可以但是现在一直不行；\n但目前 RAGAnything 并没有暴露一个 load_db() 之类的接口能自动从已有索引中加载 LightRAG 实例。\n\n### Additional Context\n\nimport os\nimport json\nimport asyncio\nimport traceback\nimport aiohttp\nimport requests\nimport concurrent.futures\nfrom dotenv import load_dotenv\nfrom raganything import RAGAnything\n\n# 加载环境变量\nload_dotenv()\nQWEN_API_KEY = os.getenv(\"QWEN_API_KEY\")\nEMBEDDING_API_KEY = os.getenv(\"EMBEDDING_API_KEY\")\n\nWORKING_DIR = \"./rag_storage\"\nOUTPUT_DIR = \"./output\"\n\nEMBEDDING_API_URL = \"https://api.siliconflow.cn/v1/embeddings\"\nQWEN_TEXT_URL = \"https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation\"\nQWEN_VISION_URL = \"https://dashscope.aliyuncs.com/api/v1/services/multimodal-generation/generation\"\n\n# --- 各种适配器 ---\ndef safe_serialize(obj):\n    if obj is None:\n        return None\n    if hasattr(obj, 'to_dict'):\n        return obj.to_dict()\n    if hasattr(obj, '__dict__'):\n        return {k: safe_serialize(v) for k, v in obj.__dict__.items()}\n    if isinstance(obj, (list, tuple)):\n        return [safe_serialize(i) for i in obj]\n    if isinstance(obj, dict):\n        return {k: safe_serialize(v) for k, v in obj.items()}\n    return str(obj)\n\nasync def qwen_complete_async(prompt, system_prompt=None, history_messages=None, **kwargs):\n    headers = {\"Content-Type\": \"application/json\", \"Authorization\": f\"Bearer {QWEN_API_KEY}\"}\n    messages = []\n    if system_prompt:\n        messages.append({\"role\": \"system\", \"content\": system_prompt})\n    if history_messages:\n        messages.extend(history_messages)\n    messages.append({\"role\": \"user\", \"content\": prompt})\n    data = {\n        \"model\": 
\"qwen-plus\",\n        \"input\": {\"messages\": messages},\n        \"parameters\": kwargs.get(\"parameters\", {\"result_format\": \"text\"})\n    }\n    async with aiohttp.ClientSession() as session:\n        async with session.post(QWEN_TEXT_URL, headers=headers, json=data, timeout=120) as resp:\n            res_json = await resp.json()\n            return res_json.get(\"output\", {}).get(\"text\", \"通义千问返回异常\")\n\nasync def qwen_vision_complete_async(prompt, image_data=None, system_prompt=None, **kwargs):\n    headers = {\"Content-Type\": \"application/json\", \"Authorization\": f\"Bearer {QWEN_API_KEY}\"}\n    content = [{\"text\": prompt}]\n    if image_data:\n        content.insert(0, {\"image\": image_data})\n    messages = [{\"role\": \"user\", \"content\": content}]\n    if system_prompt:\n        messages.insert(0, {\"role\": \"system\", \"content\": system_prompt})\n    data = {\n        \"model\": \"qwen-vl-plus\",\n        \"input\": {\"messages\": messages},\n        \"parameters\": {\"result_format\": \"text\"}\n    }\n    async with aiohttp.ClientSession() as session:\n        async with session.post(QWEN_VISION_URL, headers=headers, json=data, timeout=120) as resp:\n            res_json = await resp.json()\n            return res_json.get(\"output\", {}).get(\"text\", \"通义视觉API返回异常\")\n\ndef embedding_sync(texts):\n    headers = {\"Content-Type\": \"application/json\", \"Authorization\": f\"Bearer {EMBEDDING_API_KEY}\"}\n    data = {\"input\": texts, \"model\": \"BAAI/bge-m3\"}\n    resp = requests.post(EMBEDDING_API_URL, headers=headers, json=data, timeout=60)\n    return [item[\"embedding\"] for item in resp.json().get(\"data\", [])]\n\nasync def embedding_adapter(texts):\n    loop = asyncio.get_running_loop()\n    with concurrent.futures.ThreadPoolExecutor() as pool:\n        return await loop.run_in_executor(pool, embedding_sync, texts)\n\nasync def llm_adapter(prompt, system_prompt=None, history_messages=None, **kwargs):\n    return await 
qwen_complete_async(prompt, system_prompt, history_messages, **kwargs)\n\nasync def vision_adapter(prompt, image_data=None, system_prompt=None, **kwargs):\n    if image_data:\n        return await qwen_vision_complete_async(prompt, image_data, system_prompt, **kwargs)\n    return await llm_adapter(prompt, system_prompt, **kwargs)\n\n# --- 主函数 ---\nasync def main():\n    rag = RAGAnything(\n        working_dir=WORKING_DIR,\n        llm_model_func=llm_adapter,\n        vision_model_func=vision_adapter,\n        embedding_func=embedding_adapter,\n        embedding_dim=1024,\n        max_token_size=8192,\n    )\n\n    # ⚠️ 使用已有向量数据库，跳过文档处理\n    print(\"📄 加载现有索引并开始问答...\")\n\n    queries = [\n        \"提取图中关键内容\",\n        \"请简述文档的主要观点\",\n        \"这个图片说明了什么？\"\n    ]\n\n    for i, query in enumerate(queries, 1):\n        try:\n            print(f\"\\n❓ 问题 #{i}: {query}\")\n            answer = await rag.query_with_multimodal(query, mode=\"hybrid\")\n            print(\"✅ 答案:\", answer)\n        except Exception as e:\n            print(f\"❌ 查询出错: {e}\")\n            traceback.print_exc()\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n\nD:\\anaconda\\envs\\raganything\\python.exe D:\\RAGanything\\RAG-Anything-main\\examples\\问答.py \n📄 加载现有索引并开始问答...\n\n❓ 问题 #1: 提取图中关键内容\n❌ 查询出错: No LightRAG instance available. Please either:\n1. Provide a pre-initialized LightRAG instance when creating RAGAnything, or\n2. Process documents first using process_document_complete() or process_folder_complete() to create and populate the LightRAG instance.\n\n❓ 问题 #2: 请简述文档的主要观点\n❌ 查询出错: No LightRAG instance available. Please either:\n1. Provide a pre-initialized LightRAG instance when creating RAGAnything, or\n2. Process documents first using process_document_complete() or process_folder_complete() to create and populate the LightRAG instance.\n\n❓ 问题 #3: 这个图片说明了什么？\n❌ 查询出错: No LightRAG instance available. Please either:\n1. 
Provide a pre-initialized LightRAG instance when creating RAGAnything, or\n2. Process documents first using process_document_complete() or process_folder_complete() to create and populate the LightRAG instance.",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/10/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/9",
      "id": 3161461460,
      "node_id": "I_kwDOO3Bfkc68cBLU",
      "number": 9,
      "title": "[Bug]:Error processing image content: 'NoneType' object is not callable",
      "user": {
        "login": "NikhielRahulSingh",
        "id": 113216325,
        "node_id": "U_kgDOBr-LRQ",
        "avatar_url": "https://avatars.githubusercontent.com/u/113216325?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/NikhielRahulSingh",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 2,
      "created_at": "2025-06-19T22:36:16Z",
      "updated_at": "2025-08-25T17:27:41Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nfailed to process image contained in pdf file\n\n### Steps to reproduce\n\ncreate a pdf file with words and 1 image\n\n### Expected Behavior\n\nthe image content should not have been none\n\n### LightRAG Config Used\n\n# Paste your config here\nlatest\n\n### Logs and screenshots\n\nINFO: Starting complete document processing: test/Titanic.pdf\nINFO: Starting document parsing: test/Titanic.pdf\nINFO: Detected PDF file, using PDF parser (method=auto)...\nINFO: Parsing complete! Extracted 5 content blocks\nINFO: Markdown text length: 1040 characters\nINFO: \nContent Information:\nINFO: * Total blocks in content_list: 5\nINFO: * Markdown content length: 1040 characters\nINFO: * Content block types:\nINFO:   - text: 4\nINFO:   - image: 1\nINFO: Content separation complete:\nINFO:   - Text content length: 960 characters\nINFO:   - Multimodal items count: 1\nINFO:   - Multimodal type distribution: {'image': 1}\nINFO: Starting text content insertion into LightRAG...\nINFO: Text content insertion complete\nINFO: Starting multimodal content processing...\nINFO: Processing item 1/1: image content\nError processing image content: 'NoneType' object is not callable\nINFO: image processing complete: image_6359b2d39a77de43682988592cc69472\nINFO: Multimodal content processing complete\nINFO: Document test/Titanic.pdf processing complete!\nMinerU command executed successfully\n\n### Additional Information\n\n- LightRAG Version:Latest\n- Operating System: Windows\n- Python Version:12.8\n- Related Issues: None\n\nI am using Ollama with Gemma3:latest\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/9/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/6",
      "id": 3158176392,
      "node_id": "I_kwDOO3Bfkc68PfKI",
      "number": 6,
      "title": "[Bug]:CategoryType returns int, breaks _separate_content content type detection",
      "user": {
        "login": "micos-tfyxz",
        "id": 188126094,
        "node_id": "U_kgDOCzaTjg",
        "avatar_url": "https://avatars.githubusercontent.com/u/188126094?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/micos-tfyxz",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742529,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9ewQ",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/bug",
          "name": "bug",
          "color": "d73a4a",
          "default": true,
          "description": "Something isn't working"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 5,
      "created_at": "2025-06-18T21:12:01Z",
      "updated_at": "2025-07-01T16:08:56Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to file an issue?\n\n- [x] I have searched the existing issues and this bug is not already filed.\n- [x] I believe this is a legitimate bug, not just a question or feature request.\n\n### Describe the bug\n\nIn MinerU, the output results use CategoryType to define content types, but they are returned as numeric values instead of readable type names. This causes no issue when inspecting the generated JSON file, but in the raganything.py function _separate_content, the code：\n{\nfor item in content_list:  \n    content_type = item.get(\"type\", \"text\")  \n}\ncannot find any valid type field. As a result, the content is not properly separated, and no content gets processed in the later stages.\n\n### Steps to reproduce\n\n_No response_\n\n### Expected Behavior\n\n_No response_\n\n### LightRAG Config Used\n\n# Paste your config here\n\n\n### Logs and screenshots\n\n![Image](https://github.com/user-attachments/assets/81096e2a-2db5-4ef9-9d5b-fe216a604ac5)\n\n### Additional Information\n\ncontent_list:\n![Image](https://github.com/user-attachments/assets/46e9b65f-a983-4902-b08f-011c68633293)\nMinerU:\n![Image](https://github.com/user-attachments/assets/950b0f80-b241-477c-821d-4710f05ca263)\njson:\n![Image](https://github.com/user-attachments/assets/089a5265-a116-422c-8737-bc8383017f85)\n",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/6/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/3",
      "id": 3155404148,
      "node_id": "I_kwDOO3Bfkc68E6V0",
      "number": 3,
      "title": "[Question]: is this still good for excel file with embedded images?",
      "user": {
        "login": "imrankh46",
        "id": 103720343,
        "node_id": "U_kgDOBi6llw",
        "avatar_url": "https://avatars.githubusercontent.com/u/103720343?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/imrankh46",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 2,
      "created_at": "2025-06-18T04:13:16Z",
      "updated_at": "2025-06-19T09:37:21Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nDo you think, is this best solution, if we have excel file with embedded images?\n\nIs this current framework good for such type of document!\n\n### Additional Context\n\n---",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/3/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/2",
      "id": 3155403193,
      "node_id": "I_kwDOO3Bfkc68E6G5",
      "number": 2,
      "title": "We have excel files with embedded images,",
      "user": {
        "login": "imrankh46",
        "id": 103720343,
        "node_id": "U_kgDOBi6llw",
        "avatar_url": "https://avatars.githubusercontent.com/u/103720343?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/imrankh46",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "labels": {
        "0": {
          "id": 8737742562,
          "node_id": "LA_kwDOO3Bfkc8AAAACCM9e4g",
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything/labels/question",
          "name": "question",
          "color": "d876e3",
          "default": true,
          "description": "Further information is requested"
        }
      },
      "state": "open",
      "locked": false,
      "assignee": null,
      "assignees": {},
      "milestone": null,
      "comments": 0,
      "created_at": "2025-06-18T04:12:33Z",
      "updated_at": "2025-06-18T04:12:33Z",
      "closed_at": null,
      "author_association": "NONE",
      "type": null,
      "active_lock_reason": null,
      "sub_issues_summary": {
        "total": 0,
        "completed": 0,
        "percent_completed": 0
      },
      "issue_dependencies_summary": {
        "blocked_by": 0,
        "total_blocked_by": 0,
        "blocking": 0,
        "total_blocking": 0
      },
      "body": "### Do you need to ask a question?\n\n- [x] I have searched the existing question and discussions and this question is not already answered.\n- [x] I believe this is a legitimate question, not just a bug or feature request.\n\n### Your Question\n\nDo you think, is this best solution, if we have excel file with embedded images?\n\nIs this current framework good for such type of document!\n\n### Additional Context\n\n---",
      "closed_by": null,
      "reactions": {
        "url": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/2/reactions",
        "total_count": 0,
        "+1": 0,
        "-1": 0,
        "laugh": 0,
        "hooray": 0,
        "confused": 0,
        "heart": 0,
        "rocket": 0,
        "eyes": 0
      },
      "performed_via_github_app": null,
      "state_reason": null,
      "pinned_comment": null,
      "linked_prs": []
    }
  ],
  "pulls": [
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/199",
      "id": 3289974109,
      "node_id": "PR_kwDOO3Bfkc7EGQVd",
      "number": 199,
      "state": "open",
      "locked": false,
      "title": "feat(parser): add optional PaddleOCR backend",
      "user": {
        "login": "SaqlainXoas",
        "id": 104307095,
        "node_id": "U_kgDOBjeZlw",
        "avatar_url": "https://avatars.githubusercontent.com/u/104307095?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/SaqlainXoas",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "body": "## Description\nThis PR adds an optional PaddleOCR parser backend to RAG-Anything while keeping default MinerU/Docling behavior unchanged.\n\nThe change is intentionally minimal and focused on:\n- adding `parser=\"paddleocr\"` support,\n- preserving optional dependency behavior (lazy imports),\n- keeping output compatible with existing `content_list` processing,\n- updating docs/config/examples,\n- adding CI-safe tests.\n\n## Related Issues\nRefs #178\n\n## Changes Made\n- Added `PaddleOCRParser` in `raganything/parser.py` with:\n  - lazy `paddleocr` import (no import-time hard dependency),\n  - support for both `ocr(...)` and `predict(...)` call styles,\n  - PDF OCR path using `pypdfium2` page rendering,\n  - normalized text-block `content_list` output including `page_idx`.\n- Added centralized parser registry/factory:\n  - `SUPPORTED_PARSERS = (\"mineru\", \"docling\", \"paddleocr\")`\n  - `get_parser(parser_type)`\n- Switched parser selection wiring to the shared factory in:\n  - `raganything/raganything.py`\n  - `raganything/processor.py`\n  - `raganything/batch_parser.py`\n  - parser CLI (`raganything/parser.py`)\n- Updated optional dependencies:\n  - `pyproject.toml`: `paddleocr` extra (`paddleocr`, `pypdfium2`)\n  - `setup.py`: matching `extras_require` updates\n- Updated docs/config/examples:\n  - `README.md`\n  - `docs/batch_processing.md`\n  - `env.example`\n  - `raganything/config.py`\n  - `examples/raganything_example.py`\n  - `examples/batch_dry_run_example.py`\n- Added tests:\n  - `tests/testpaddleocr_parser.py`\n  - `tests/testparser_wiring.py`\n\n## Checklist\n- [x] Changes tested locally\n- [x] Code reviewed\n- [x] Documentation updated (if necessary)\n- [x] Unit tests added (if applicable)\n\n## Additional Notes\nLocal validation commands:\n- `.venv/bin/python -m pytest -q tests/testpaddleocr_parser.py tests/testparser_wiring.py` -> 11 passed\n- `.venv/bin/ruff check raganything/parser.py raganything/batch_parser.py 
raganything/processor.py raganything/config.py tests/testpaddleocr_parser.py tests/testparser_wiring.py` -> passed\n\nNote: `pytest -q` over the whole repository still discovers existing `examples/*_test.py` scripts that expect a `file_path` fixture. This is pre-existing and unrelated to this PR.\n",
      "created_at": "2026-02-16T13:17:36Z",
      "updated_at": "2026-02-16T13:17:36Z",
      "closed_at": null,
      "merged_at": null,
      "merge_commit_sha": "a5b159613a73a947c9f7c81f740271169f0cb0d6",
      "assignee": null,
      "assignees": {},
      "requested_reviewers": {},
      "requested_teams": {},
      "labels": {},
      "milestone": null,
      "draft": false,
      "head": {
        "label": "SaqlainXoas:feat/paddleocr-parser",
        "ref": "feat/paddleocr-parser",
        "sha": "1419979230b9e82499c0c3a5d6e82b1afafd0dd4",
        "user": {
          "login": "SaqlainXoas",
          "id": 104307095,
          "node_id": "U_kgDOBjeZlw",
          "avatar_url": "https://avatars.githubusercontent.com/u/104307095?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/SaqlainXoas",
          "type": "User",
          "user_view_type": "public",
          "site_admin": false
        },
        "repo": {
          "id": 1159217278,
          "node_id": "R_kgDORRhAfg",
          "name": "RAG-Anything",
          "full_name": "SaqlainXoas/RAG-Anything",
          "private": false,
          "owner": {
            "login": "SaqlainXoas",
            "id": 104307095,
            "node_id": "U_kgDOBjeZlw",
            "avatar_url": "https://avatars.githubusercontent.com/u/104307095?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/SaqlainXoas",
            "type": "User",
            "user_view_type": "public",
            "site_admin": false
          },
          "description": "\"RAG-Anything: All-in-One RAG Framework\"",
          "fork": true,
          "url": "https://api.github.com/repos/SaqlainXoas/RAG-Anything",
          "created_at": "2026-02-16T13:13:23Z",
          "updated_at": "2026-02-16T13:13:23Z",
          "pushed_at": "2026-02-16T13:13:59Z",
          "homepage": "http://arxiv.org/abs/2510.12323",
          "size": 3180,
          "stargazers_count": 0,
          "watchers_count": 0,
          "language": null,
          "has_issues": false,
          "has_projects": true,
          "has_downloads": true,
          "has_wiki": false,
          "has_pages": false,
          "has_discussions": false,
          "forks_count": 0,
          "archived": false,
          "disabled": false,
          "open_issues_count": 0,
          "license": {
            "key": "mit",
            "name": "MIT License",
            "spdx_id": "MIT",
            "url": "https://api.github.com/licenses/mit",
            "node_id": "MDc6TGljZW5zZTEz"
          },
          "allow_forking": true,
          "is_template": false,
          "web_commit_signoff_required": false,
          "has_pull_requests": true,
          "pull_request_creation_policy": "all",
          "topics": {},
          "visibility": "public",
          "forks": 0,
          "open_issues": 0,
          "watchers": 0,
          "default_branch": "main"
        }
      },
      "base": {
        "label": "HKUDS:main",
        "ref": "main",
        "sha": "4bec8f56869755181391ae028d2b6d4293951b07",
        "user": {
          "login": "HKUDS",
          "id": 118165258,
          "node_id": "O_kgDOBwsPCg",
          "avatar_url": "https://avatars.githubusercontent.com/u/118165258?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/HKUDS",
          "type": "Organization",
          "user_view_type": "public",
          "site_admin": false
        },
        "repo": {
          "id": 997220241,
          "node_id": "R_kgDOO3BfkQ",
          "name": "RAG-Anything",
          "full_name": "HKUDS/RAG-Anything",
          "private": false,
          "owner": {
            "login": "HKUDS",
            "id": 118165258,
            "node_id": "O_kgDOBwsPCg",
            "avatar_url": "https://avatars.githubusercontent.com/u/118165258?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/HKUDS",
            "type": "Organization",
            "user_view_type": "public",
            "site_admin": false
          },
          "description": "\"RAG-Anything: All-in-One RAG Framework\"",
          "fork": false,
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything",
          "created_at": "2025-06-06T06:47:29Z",
          "updated_at": "2026-02-17T02:38:43Z",
          "pushed_at": "2026-01-26T09:09:21Z",
          "homepage": "http://arxiv.org/abs/2510.12323",
          "size": 3273,
          "stargazers_count": 13485,
          "watchers_count": 13485,
          "language": "Python",
          "has_issues": true,
          "has_projects": true,
          "has_downloads": true,
          "has_wiki": false,
          "has_pages": false,
          "has_discussions": true,
          "forks_count": 1612,
          "archived": false,
          "disabled": false,
          "open_issues_count": 102,
          "license": {
            "key": "mit",
            "name": "MIT License",
            "spdx_id": "MIT",
            "url": "https://api.github.com/licenses/mit",
            "node_id": "MDc6TGljZW5zZTEz"
          },
          "allow_forking": true,
          "is_template": false,
          "web_commit_signoff_required": false,
          "has_pull_requests": true,
          "pull_request_creation_policy": "all",
          "topics": {
            "0": "multi-modal-rag",
            "1": "retrieval-augmented-generation"
          },
          "visibility": "public",
          "forks": 1612,
          "open_issues": 102,
          "watchers": 13485,
          "default_branch": "main"
        }
      },
      "_links": {
        "self": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/199"
        },
        "html": {
          "href": "https://github.com/HKUDS/RAG-Anything/pull/199"
        },
        "issue": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/199"
        },
        "comments": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/199/comments"
        },
        "review_comments": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/199/comments"
        },
        "review_comment": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/comments{/number}"
        },
        "commits": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/199/commits"
        },
        "statuses": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/statuses/1419979230b9e82499c0c3a5d6e82b1afafd0dd4"
        }
      },
      "author_association": "NONE",
      "auto_merge": null,
      "active_lock_reason": null,
      "linked_issues": [
        1
      ]
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/198",
      "id": 3284844824,
      "node_id": "PR_kwDOO3Bfkc7DysEY",
      "number": 198,
      "state": "open",
      "locked": false,
      "title": "fix: use a single docling command for json and md formats",
      "user": {
        "login": "wkpark",
        "id": 232347,
        "node_id": "MDQ6VXNlcjIzMjM0Nw==",
        "avatar_url": "https://avatars.githubusercontent.com/u/232347?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/wkpark",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "body": "## Description\r\nCall docling once for both JSON and Markdown\r\n\r\nChanged DoclingParser to get both JSON and Markdown formats in one command.\r\n\r\n## Checklist\r\n\r\n- [x] Changes tested locally\r\n- [x] Code reviewed\r\n",
      "created_at": "2026-02-14T16:15:17Z",
      "updated_at": "2026-02-14T16:36:57Z",
      "closed_at": null,
      "merged_at": null,
      "merge_commit_sha": "f28c67b27e76dd27355e94fa1e7e8f472febf87b",
      "assignee": null,
      "assignees": {},
      "requested_reviewers": {},
      "requested_teams": {},
      "labels": {},
      "milestone": null,
      "draft": false,
      "head": {
        "label": "wkpark:fix-docling-cmd",
        "ref": "fix-docling-cmd",
        "sha": "98b07e7151f8b993892b722931bad3a970b70b7c",
        "user": {
          "login": "wkpark",
          "id": 232347,
          "node_id": "MDQ6VXNlcjIzMjM0Nw==",
          "avatar_url": "https://avatars.githubusercontent.com/u/232347?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/wkpark",
          "type": "User",
          "user_view_type": "public",
          "site_admin": false
        },
        "repo": {
          "id": 1157986949,
          "node_id": "R_kgDORQV6hQ",
          "name": "RAG-Anything",
          "full_name": "wkpark/RAG-Anything",
          "private": false,
          "owner": {
            "login": "wkpark",
            "id": 232347,
            "node_id": "MDQ6VXNlcjIzMjM0Nw==",
            "avatar_url": "https://avatars.githubusercontent.com/u/232347?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/wkpark",
            "type": "User",
            "user_view_type": "public",
            "site_admin": false
          },
          "description": "\"RAG-Anything: All-in-One RAG Framework\"",
          "fork": true,
          "url": "https://api.github.com/repos/wkpark/RAG-Anything",
          "created_at": "2026-02-14T16:09:30Z",
          "updated_at": "2026-02-14T16:09:30Z",
          "pushed_at": "2026-02-15T17:12:30Z",
          "homepage": "http://arxiv.org/abs/2510.12323",
          "size": 3193,
          "stargazers_count": 0,
          "watchers_count": 0,
          "language": null,
          "has_issues": false,
          "has_projects": true,
          "has_downloads": true,
          "has_wiki": false,
          "has_pages": false,
          "has_discussions": false,
          "forks_count": 0,
          "archived": false,
          "disabled": false,
          "open_issues_count": 2,
          "license": {
            "key": "mit",
            "name": "MIT License",
            "spdx_id": "MIT",
            "url": "https://api.github.com/licenses/mit",
            "node_id": "MDc6TGljZW5zZTEz"
          },
          "allow_forking": true,
          "is_template": false,
          "web_commit_signoff_required": false,
          "has_pull_requests": true,
          "pull_request_creation_policy": "all",
          "topics": {},
          "visibility": "public",
          "forks": 0,
          "open_issues": 2,
          "watchers": 0,
          "default_branch": "main"
        }
      },
      "base": {
        "label": "HKUDS:main",
        "ref": "main",
        "sha": "4bec8f56869755181391ae028d2b6d4293951b07",
        "user": {
          "login": "HKUDS",
          "id": 118165258,
          "node_id": "O_kgDOBwsPCg",
          "avatar_url": "https://avatars.githubusercontent.com/u/118165258?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/HKUDS",
          "type": "Organization",
          "user_view_type": "public",
          "site_admin": false
        },
        "repo": {
          "id": 997220241,
          "node_id": "R_kgDOO3BfkQ",
          "name": "RAG-Anything",
          "full_name": "HKUDS/RAG-Anything",
          "private": false,
          "owner": {
            "login": "HKUDS",
            "id": 118165258,
            "node_id": "O_kgDOBwsPCg",
            "avatar_url": "https://avatars.githubusercontent.com/u/118165258?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/HKUDS",
            "type": "Organization",
            "user_view_type": "public",
            "site_admin": false
          },
          "description": "\"RAG-Anything: All-in-One RAG Framework\"",
          "fork": false,
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything",
          "created_at": "2025-06-06T06:47:29Z",
          "updated_at": "2026-02-17T02:38:43Z",
          "pushed_at": "2026-01-26T09:09:21Z",
          "homepage": "http://arxiv.org/abs/2510.12323",
          "size": 3273,
          "stargazers_count": 13485,
          "watchers_count": 13485,
          "language": "Python",
          "has_issues": true,
          "has_projects": true,
          "has_downloads": true,
          "has_wiki": false,
          "has_pages": false,
          "has_discussions": true,
          "forks_count": 1612,
          "archived": false,
          "disabled": false,
          "open_issues_count": 102,
          "license": {
            "key": "mit",
            "name": "MIT License",
            "spdx_id": "MIT",
            "url": "https://api.github.com/licenses/mit",
            "node_id": "MDc6TGljZW5zZTEz"
          },
          "allow_forking": true,
          "is_template": false,
          "web_commit_signoff_required": false,
          "has_pull_requests": true,
          "pull_request_creation_policy": "all",
          "topics": {
            "0": "multi-modal-rag",
            "1": "retrieval-augmented-generation"
          },
          "visibility": "public",
          "forks": 1612,
          "open_issues": 102,
          "watchers": 13485,
          "default_branch": "main"
        }
      },
      "_links": {
        "self": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/198"
        },
        "html": {
          "href": "https://github.com/HKUDS/RAG-Anything/pull/198"
        },
        "issue": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/198"
        },
        "comments": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/198/comments"
        },
        "review_comments": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/198/comments"
        },
        "review_comment": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/comments{/number}"
        },
        "commits": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/198/commits"
        },
        "statuses": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/statuses/98b07e7151f8b993892b722931bad3a970b70b7c"
        }
      },
      "author_association": "NONE",
      "auto_merge": null,
      "active_lock_reason": null,
      "linked_issues": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/197",
      "id": 3284081470,
      "node_id": "PR_kwDOO3Bfkc7Dvxs-",
      "number": 197,
      "state": "open",
      "locked": false,
      "title": "Fix potential path traversal and local file read vulnerabilities",
      "user": {
        "login": "RinZ27",
        "id": 222222878,
        "node_id": "U_kgDODT7aHg",
        "avatar_url": "https://avatars.githubusercontent.com/u/222222878?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/RinZ27",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "body": "## Description\n\nFixed multiple security vulnerabilities related to insecure path handling in both the document parsing phase and the multimodal query phase. These changes ensure that the system does not accidentally read or leak sensitive local files when processing untrusted document content or retrieval context.\n\n## Related Issues\n\nNone.\n\n## Changes Made\n\n- **raganything/parser.py**: Added a boundary check in `MinerUParser._read_output_files` to ensure that resolved image paths from MinerU's output JSON are strictly within the intended output directory.\n- **raganything/query.py**: Enhanced `_process_image_paths_for_vlm` to validate that any image paths matched in the retrieval context (via \"Image Path:\" markers) reside within safe, predefined directories (CWD, working directory, or output directory).\n- **raganything/utils.py**: Hardened `validate_image_file` to explicitly block symbolic links, preventing symlink-based path traversal attacks.\n\n## Checklist\n\n- [x] Changes reviewed for security impact\n- [x] Code follows project conventions\n- [x] Path sanitization logic tested against boundary cases\n\n## Additional Notes\n\nI noticed the VLM enhanced query mode uses regex to find image paths in the retrieved context. While powerful, this mechanism is susceptible to indirect prompt injection if a malicious document contains text like `Image Path: /etc/passwd`. These fixes add several layers of defense-in-depth to mitigate this and similar risks.",
      "created_at": "2026-02-14T08:33:51Z",
      "updated_at": "2026-02-14T08:33:51Z",
      "closed_at": null,
      "merged_at": null,
      "merge_commit_sha": "c91358770def9ff65cc2a5c1b239f9104dc5cbbd",
      "assignee": null,
      "assignees": {},
      "requested_reviewers": {},
      "requested_teams": {},
      "labels": {},
      "milestone": null,
      "draft": false,
      "head": {
        "label": "RinZ27:fix/path-traversal-vulnerabilities",
        "ref": "fix/path-traversal-vulnerabilities",
        "sha": "af0ecbe5e4c27a960986457bab930ead937ea43c",
        "user": {
          "login": "RinZ27",
          "id": 222222878,
          "node_id": "U_kgDODT7aHg",
          "avatar_url": "https://avatars.githubusercontent.com/u/222222878?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/RinZ27",
          "type": "User",
          "user_view_type": "public",
          "site_admin": false
        },
        "repo": {
          "id": 1157752022,
          "node_id": "R_kgDORQHk1g",
          "name": "RAG-Anything",
          "full_name": "RinZ27/RAG-Anything",
          "private": false,
          "owner": {
            "login": "RinZ27",
            "id": 222222878,
            "node_id": "U_kgDODT7aHg",
            "avatar_url": "https://avatars.githubusercontent.com/u/222222878?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/RinZ27",
            "type": "User",
            "user_view_type": "public",
            "site_admin": false
          },
          "description": "\"RAG-Anything: All-in-One RAG Framework\"",
          "fork": true,
          "url": "https://api.github.com/repos/RinZ27/RAG-Anything",
          "created_at": "2026-02-14T08:33:03Z",
          "updated_at": "2026-02-14T08:33:03Z",
          "pushed_at": "2026-02-14T08:33:23Z",
          "homepage": "http://arxiv.org/abs/2510.12323",
          "size": 3197,
          "stargazers_count": 0,
          "watchers_count": 0,
          "language": null,
          "has_issues": false,
          "has_projects": true,
          "has_downloads": true,
          "has_wiki": false,
          "has_pages": false,
          "has_discussions": false,
          "forks_count": 0,
          "archived": false,
          "disabled": false,
          "open_issues_count": 0,
          "license": {
            "key": "mit",
            "name": "MIT License",
            "spdx_id": "MIT",
            "url": "https://api.github.com/licenses/mit",
            "node_id": "MDc6TGljZW5zZTEz"
          },
          "allow_forking": true,
          "is_template": false,
          "web_commit_signoff_required": false,
          "has_pull_requests": true,
          "pull_request_creation_policy": "all",
          "topics": {},
          "visibility": "public",
          "forks": 0,
          "open_issues": 0,
          "watchers": 0,
          "default_branch": "main"
        }
      },
      "base": {
        "label": "HKUDS:main",
        "ref": "main",
        "sha": "4bec8f56869755181391ae028d2b6d4293951b07",
        "user": {
          "login": "HKUDS",
          "id": 118165258,
          "node_id": "O_kgDOBwsPCg",
          "avatar_url": "https://avatars.githubusercontent.com/u/118165258?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/HKUDS",
          "type": "Organization",
          "user_view_type": "public",
          "site_admin": false
        },
        "repo": {
          "id": 997220241,
          "node_id": "R_kgDOO3BfkQ",
          "name": "RAG-Anything",
          "full_name": "HKUDS/RAG-Anything",
          "private": false,
          "owner": {
            "login": "HKUDS",
            "id": 118165258,
            "node_id": "O_kgDOBwsPCg",
            "avatar_url": "https://avatars.githubusercontent.com/u/118165258?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/HKUDS",
            "type": "Organization",
            "user_view_type": "public",
            "site_admin": false
          },
          "description": "\"RAG-Anything: All-in-One RAG Framework\"",
          "fork": false,
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything",
          "created_at": "2025-06-06T06:47:29Z",
          "updated_at": "2026-02-17T02:38:43Z",
          "pushed_at": "2026-01-26T09:09:21Z",
          "homepage": "http://arxiv.org/abs/2510.12323",
          "size": 3273,
          "stargazers_count": 13485,
          "watchers_count": 13485,
          "language": "Python",
          "has_issues": true,
          "has_projects": true,
          "has_downloads": true,
          "has_wiki": false,
          "has_pages": false,
          "has_discussions": true,
          "forks_count": 1612,
          "archived": false,
          "disabled": false,
          "open_issues_count": 102,
          "license": {
            "key": "mit",
            "name": "MIT License",
            "spdx_id": "MIT",
            "url": "https://api.github.com/licenses/mit",
            "node_id": "MDc6TGljZW5zZTEz"
          },
          "allow_forking": true,
          "is_template": false,
          "web_commit_signoff_required": false,
          "has_pull_requests": true,
          "pull_request_creation_policy": "all",
          "topics": {
            "0": "multi-modal-rag",
            "1": "retrieval-augmented-generation"
          },
          "visibility": "public",
          "forks": 1612,
          "open_issues": 102,
          "watchers": 13485,
          "default_branch": "main"
        }
      },
      "_links": {
        "self": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/197"
        },
        "html": {
          "href": "https://github.com/HKUDS/RAG-Anything/pull/197"
        },
        "issue": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/197"
        },
        "comments": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/197/comments"
        },
        "review_comments": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/197/comments"
        },
        "review_comment": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/comments{/number}"
        },
        "commits": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/197/commits"
        },
        "statuses": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/statuses/af0ecbe5e4c27a960986457bab930ead937ea43c"
        }
      },
      "author_association": "NONE",
      "auto_merge": null,
      "active_lock_reason": null,
      "linked_issues": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/195",
      "id": 3265006858,
      "node_id": "PR_kwDOO3Bfkc7CnA0K",
      "number": 195,
      "state": "open",
      "locked": false,
      "title": "feat(parser): add remote URL support for DoclingParser",
      "user": {
        "login": "bueno12223",
        "id": 47345579,
        "node_id": "MDQ6VXNlcjQ3MzQ1NTc5",
        "avatar_url": "https://avatars.githubusercontent.com/u/47345579?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/bueno12223",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "body": "- Implemented URL detection and secure downloading in Parser base class.\r\n- Added temporary file handling with automatic cleanup in DoclingParser.\r\n- Added user-agent headers to prevent 403 errors during document retrieval.\r\n- Included verification script for automated URL parsing tests.\r\n- add docling v2.72.0 to requirements\r\n- Closes #183\r\n\r\n## Description\r\nThis PR implements the ability to parse documents directly from a URL using the `DoclingParser`. It allows the RAG pipeline to ingest remote resources seamlessly by handling the download and cleanup process automatically.\r\n\r\n## Related Issues\r\nCloses #183\r\n\r\n## Changes Made\r\n* **`Parser` base class**: Added `_is_url()` for detection and `_download_file()` to handle retrieval with custom User-Agent headers.\r\n* **`DoclingParser` class**: Integrated the URL workflow into the `parse_document` method, using `try...finally` to ensure disk cleanup of temporary files.\r\n* **Verification script**: A verification script is included at scripts/test_url_parsing.py. I'm happy to remove it if you prefer to keep the scripts folder strictly for core utilities.\r\n\r\n## Checklist\r\n- [x] Changes tested locally\r\n- [x] Code reviewed\r\n- [x] Documentation updated (if necessary)\r\n- [x] Unit tests added (if applicable)\r\n\r\n## Additional Notes\r\nThe implementation mimics a browser User-Agent to avoid `403 Forbidden` errors from common document hosts (like S3 or GitHub).",
      "created_at": "2026-02-10T00:33:50Z",
      "updated_at": "2026-02-11T21:51:22Z",
      "closed_at": null,
      "merged_at": null,
      "merge_commit_sha": "2031cb187e994462ab73779bde90f7bf9ee3d792",
      "assignee": null,
      "assignees": {},
      "requested_reviewers": {},
      "requested_teams": {},
      "labels": {},
      "milestone": null,
      "draft": false,
      "head": {
        "label": "bueno12223:feat/parser-url-support",
        "ref": "feat/parser-url-support",
        "sha": "758ea1131a3ee1bb2597ec1a37cd1f2b9e889d19",
        "user": {
          "login": "bueno12223",
          "id": 47345579,
          "node_id": "MDQ6VXNlcjQ3MzQ1NTc5",
          "avatar_url": "https://avatars.githubusercontent.com/u/47345579?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/bueno12223",
          "type": "User",
          "user_view_type": "public",
          "site_admin": false
        },
        "repo": {
          "id": 1154039627,
          "node_id": "R_kgDORMk_Sw",
          "name": "RAG-Anything",
          "full_name": "bueno12223/RAG-Anything",
          "private": false,
          "owner": {
            "login": "bueno12223",
            "id": 47345579,
            "node_id": "MDQ6VXNlcjQ3MzQ1NTc5",
            "avatar_url": "https://avatars.githubusercontent.com/u/47345579?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/bueno12223",
            "type": "User",
            "user_view_type": "public",
            "site_admin": false
          },
          "description": "\"RAG-Anything: All-in-One RAG Framework\"",
          "fork": true,
          "url": "https://api.github.com/repos/bueno12223/RAG-Anything",
          "created_at": "2026-02-10T00:20:10Z",
          "updated_at": "2026-02-10T00:20:10Z",
          "pushed_at": "2026-02-11T21:49:05Z",
          "homepage": "http://arxiv.org/abs/2510.12323",
          "size": 3208,
          "stargazers_count": 0,
          "watchers_count": 0,
          "language": null,
          "has_issues": false,
          "has_projects": true,
          "has_downloads": true,
          "has_wiki": false,
          "has_pages": false,
          "has_discussions": false,
          "forks_count": 0,
          "archived": false,
          "disabled": false,
          "open_issues_count": 0,
          "license": {
            "key": "mit",
            "name": "MIT License",
            "spdx_id": "MIT",
            "url": "https://api.github.com/licenses/mit",
            "node_id": "MDc6TGljZW5zZTEz"
          },
          "allow_forking": true,
          "is_template": false,
          "web_commit_signoff_required": false,
          "has_pull_requests": true,
          "pull_request_creation_policy": "all",
          "topics": {},
          "visibility": "public",
          "forks": 0,
          "open_issues": 0,
          "watchers": 0,
          "default_branch": "main"
        }
      },
      "base": {
        "label": "HKUDS:main",
        "ref": "main",
        "sha": "4bec8f56869755181391ae028d2b6d4293951b07",
        "user": {
          "login": "HKUDS",
          "id": 118165258,
          "node_id": "O_kgDOBwsPCg",
          "avatar_url": "https://avatars.githubusercontent.com/u/118165258?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/HKUDS",
          "type": "Organization",
          "user_view_type": "public",
          "site_admin": false
        },
        "repo": {
          "id": 997220241,
          "node_id": "R_kgDOO3BfkQ",
          "name": "RAG-Anything",
          "full_name": "HKUDS/RAG-Anything",
          "private": false,
          "owner": {
            "login": "HKUDS",
            "id": 118165258,
            "node_id": "O_kgDOBwsPCg",
            "avatar_url": "https://avatars.githubusercontent.com/u/118165258?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/HKUDS",
            "type": "Organization",
            "user_view_type": "public",
            "site_admin": false
          },
          "description": "\"RAG-Anything: All-in-One RAG Framework\"",
          "fork": false,
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything",
          "created_at": "2025-06-06T06:47:29Z",
          "updated_at": "2026-02-17T02:38:43Z",
          "pushed_at": "2026-01-26T09:09:21Z",
          "homepage": "http://arxiv.org/abs/2510.12323",
          "size": 3273,
          "stargazers_count": 13485,
          "watchers_count": 13485,
          "language": "Python",
          "has_issues": true,
          "has_projects": true,
          "has_downloads": true,
          "has_wiki": false,
          "has_pages": false,
          "has_discussions": true,
          "forks_count": 1612,
          "archived": false,
          "disabled": false,
          "open_issues_count": 102,
          "license": {
            "key": "mit",
            "name": "MIT License",
            "spdx_id": "MIT",
            "url": "https://api.github.com/licenses/mit",
            "node_id": "MDc6TGljZW5zZTEz"
          },
          "allow_forking": true,
          "is_template": false,
          "web_commit_signoff_required": false,
          "has_pull_requests": true,
          "pull_request_creation_policy": "all",
          "topics": {
            "0": "multi-modal-rag",
            "1": "retrieval-augmented-generation"
          },
          "visibility": "public",
          "forks": 1612,
          "open_issues": 102,
          "watchers": 13485,
          "default_branch": "main"
        }
      },
      "_links": {
        "self": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/195"
        },
        "html": {
          "href": "https://github.com/HKUDS/RAG-Anything/pull/195"
        },
        "issue": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/195"
        },
        "comments": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/195/comments"
        },
        "review_comments": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/195/comments"
        },
        "review_comment": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/comments{/number}"
        },
        "commits": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/195/commits"
        },
        "statuses": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/statuses/758ea1131a3ee1bb2597ec1a37cd1f2b9e889d19"
        }
      },
      "author_association": "NONE",
      "auto_merge": null,
      "active_lock_reason": null,
      "linked_issues": [
        1,
        183
      ]
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/190",
      "id": 3185201515,
      "node_id": "PR_kwDOO3Bfkc692lFr",
      "number": 190,
      "state": "open",
      "locked": false,
      "title": "Feat: Released FastAPI Service for RAG-Anything",
      "user": {
        "login": "LaansDole",
        "id": 85084360,
        "node_id": "MDQ6VXNlcjg1MDg0MzYw",
        "avatar_url": "https://avatars.githubusercontent.com/u/85084360?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/LaansDole",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "body": "<!--\r\nThanks for contributing to RAGAnything!\r\n\r\nPlease ensure your pull request is ready for review before submitting.\r\n\r\nAbout this template\r\n\r\nThis template helps contributors provide a clear and concise description of their changes. Feel free to adjust it as needed.\r\n-->\r\n\r\n## Description\r\n\r\nSpin up a minimal API server to query and process documents using RAG-Anything with any OpenAI-compatible backend (LM Studio, Ollama, vLLM, DeepSeek, etc.).\r\n\r\nQuick start (using uv):\r\n\r\n```bash\r\n# Install FastAPI and Uvicorn into the existing uv environment\r\nuv sync\r\n\r\n# Run the server (reload for dev)\r\nuv run uvicorn api.app:app --reload\r\n# or using make\r\nmake server\r\n```\r\n\r\n**Command Reference:**\r\n\r\n| Action | Make Command | Full UV Command |\r\n|--------|--------------|-----------------|\r\n| **Start Server** | `make server` | `uv run uvicorn api.app:app --reload` |\r\n| **Run Integration Test** | `make integration-test` | `uv run python api/core_endpoint_test.py api/datasets/patient_records_small.xlsx` |\r\n| **Run Mock Test** | `make mock-test` | `uv run python api/core_endpoint_test.py api/datasets/medical_symptoms_small.xlsx` |\r\n| **Dev Mode** | `make dev` | `uv run uvicorn api.app:app &` |\r\n| **Stop Server** | `make stop` | `pkill -f \"uvicorn api.app:app\"` |\r\n\r\n\r\n## Related Issues\r\n\r\n[Reference any related issues or tasks addressed by this pull request.]\r\n\r\n## Changes Made\r\n\r\n- FastAPI services based on RAG-Anything features\r\n- Enhancement on Excel processing supports\r\n\r\n## Checklist\r\n\r\n- [x] Changes tested locally\r\n- [x] Code reviewed\r\n- [x] Documentation updated (if necessary)\r\n- [x] Unit tests added (if applicable)\r\n\r\n## Additional Notes\r\n\r\n[Add any additional notes or context for the reviewer(s).]\r\n",
      "created_at": "2026-01-18T10:38:53Z",
      "updated_at": "2026-02-08T06:14:20Z",
      "closed_at": null,
      "merged_at": null,
      "merge_commit_sha": "f3fd4028b502e8f9d77da0144241bba2f6c98388",
      "assignee": null,
      "assignees": {},
      "requested_reviewers": {},
      "requested_teams": {},
      "labels": {},
      "milestone": null,
      "draft": false,
      "head": {
        "label": "LaansDole:feat/service",
        "ref": "feat/service",
        "sha": "90077f55e98afd3156cd32def5d8a8c163a4c233",
        "user": {
          "login": "LaansDole",
          "id": 85084360,
          "node_id": "MDQ6VXNlcjg1MDg0MzYw",
          "avatar_url": "https://avatars.githubusercontent.com/u/85084360?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/LaansDole",
          "type": "User",
          "user_view_type": "public",
          "site_admin": false
        },
        "repo": {
          "id": 1047820422,
          "node_id": "R_kgDOPnR4hg",
          "name": "RAG-Anything",
          "full_name": "LaansDole/RAG-Anything",
          "private": false,
          "owner": {
            "login": "LaansDole",
            "id": 85084360,
            "node_id": "MDQ6VXNlcjg1MDg0MzYw",
            "avatar_url": "https://avatars.githubusercontent.com/u/85084360?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/LaansDole",
            "type": "User",
            "user_view_type": "public",
            "site_admin": false
          },
          "description": "Contributor of RAG-Anything: All-in-One RAG System + LMStudio",
          "fork": true,
          "url": "https://api.github.com/repos/LaansDole/RAG-Anything",
          "created_at": "2025-08-31T10:03:17Z",
          "updated_at": "2026-01-28T14:17:54Z",
          "pushed_at": "2026-01-28T14:17:48Z",
          "homepage": "",
          "size": 4911,
          "stargazers_count": 0,
          "watchers_count": 0,
          "language": "Python",
          "has_issues": false,
          "has_projects": true,
          "has_downloads": true,
          "has_wiki": false,
          "has_pages": false,
          "has_discussions": false,
          "forks_count": 0,
          "archived": false,
          "disabled": false,
          "open_issues_count": 0,
          "license": {
            "key": "mit",
            "name": "MIT License",
            "spdx_id": "MIT",
            "url": "https://api.github.com/licenses/mit",
            "node_id": "MDc6TGljZW5zZTEz"
          },
          "allow_forking": true,
          "is_template": false,
          "web_commit_signoff_required": false,
          "has_pull_requests": true,
          "pull_request_creation_policy": "all",
          "topics": {
            "0": "fastapi",
            "1": "lmstudio",
            "2": "rag",
            "3": "streamlit"
          },
          "visibility": "public",
          "forks": 0,
          "open_issues": 0,
          "watchers": 0,
          "default_branch": "feat/service"
        }
      },
      "base": {
        "label": "HKUDS:main",
        "ref": "main",
        "sha": "4bec8f56869755181391ae028d2b6d4293951b07",
        "user": {
          "login": "HKUDS",
          "id": 118165258,
          "node_id": "O_kgDOBwsPCg",
          "avatar_url": "https://avatars.githubusercontent.com/u/118165258?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/HKUDS",
          "type": "Organization",
          "user_view_type": "public",
          "site_admin": false
        },
        "repo": {
          "id": 997220241,
          "node_id": "R_kgDOO3BfkQ",
          "name": "RAG-Anything",
          "full_name": "HKUDS/RAG-Anything",
          "private": false,
          "owner": {
            "login": "HKUDS",
            "id": 118165258,
            "node_id": "O_kgDOBwsPCg",
            "avatar_url": "https://avatars.githubusercontent.com/u/118165258?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/HKUDS",
            "type": "Organization",
            "user_view_type": "public",
            "site_admin": false
          },
          "description": "\"RAG-Anything: All-in-One RAG Framework\"",
          "fork": false,
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything",
          "created_at": "2025-06-06T06:47:29Z",
          "updated_at": "2026-02-17T02:38:43Z",
          "pushed_at": "2026-01-26T09:09:21Z",
          "homepage": "http://arxiv.org/abs/2510.12323",
          "size": 3273,
          "stargazers_count": 13485,
          "watchers_count": 13485,
          "language": "Python",
          "has_issues": true,
          "has_projects": true,
          "has_downloads": true,
          "has_wiki": false,
          "has_pages": false,
          "has_discussions": true,
          "forks_count": 1612,
          "archived": false,
          "disabled": false,
          "open_issues_count": 102,
          "license": {
            "key": "mit",
            "name": "MIT License",
            "spdx_id": "MIT",
            "url": "https://api.github.com/licenses/mit",
            "node_id": "MDc6TGljZW5zZTEz"
          },
          "allow_forking": true,
          "is_template": false,
          "web_commit_signoff_required": false,
          "has_pull_requests": true,
          "pull_request_creation_policy": "all",
          "topics": {
            "0": "multi-modal-rag",
            "1": "retrieval-augmented-generation"
          },
          "visibility": "public",
          "forks": 1612,
          "open_issues": 102,
          "watchers": 13485,
          "default_branch": "main"
        }
      },
      "_links": {
        "self": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/190"
        },
        "html": {
          "href": "https://github.com/HKUDS/RAG-Anything/pull/190"
        },
        "issue": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/190"
        },
        "comments": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/190/comments"
        },
        "review_comments": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/190/comments"
        },
        "review_comment": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/comments{/number}"
        },
        "commits": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/190/commits"
        },
        "statuses": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/statuses/90077f55e98afd3156cd32def5d8a8c163a4c233"
        }
      },
      "author_association": "CONTRIBUTOR",
      "auto_merge": null,
      "active_lock_reason": null,
      "linked_issues": []
    },
    {
      "url": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/189",
      "id": 3161170351,
      "node_id": "PR_kwDOO3Bfkc68a6Gv",
      "number": 189,
      "state": "open",
      "locked": false,
      "title": "fix: discover MinerU output folder instead of constructing from method",
      "user": {
        "login": "bryanthrasher",
        "id": 64610,
        "node_id": "MDQ6VXNlcjY0NjEw",
        "avatar_url": "https://avatars.githubusercontent.com/u/64610?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/bryanthrasher",
        "type": "User",
        "user_view_type": "public",
        "site_admin": false
      },
      "body": "MinerU creates output folders with different naming conventions depending on the backend used:\r\n- pipeline backend: \"{method}/\" (e.g., \"auto/\")\r\n- vlm-* backends: \"vlm/\"\r\n- hybrid-* backends: \"hybrid_{method}/\" (e.g., \"hybrid_auto/\")\r\n\r\nThe _read_output_files() method was constructing the path using only the method parameter, which caused failures when using the default hybrid-* backend (the most common case).\r\n\r\nThis fix discovers the actual output folder by searching for the *_content_list.json file within subdirectories, with a fallback to the original method-based path for backwards compatibility.\r\n\r\nFixes #186\r\n\r\n🤖 Generated with [Claude Code](https://claude.com/claude-code)\r\n\r\n<!--\r\nThanks for contributing to RAGAnything!\r\n\r\nPlease ensure your pull request is ready for review before submitting.\r\n\r\nAbout this template\r\n\r\nThis template helps contributors provide a clear and concise description of their changes. Feel free to adjust it as needed.\r\n-->\r\n\r\n## Description\r\n\r\n[Briefly describe the changes made in this pull request.]\r\n\r\n## Related Issues\r\n\r\n[Reference any related issues or tasks addressed by this pull request.]\r\n\r\n## Changes Made\r\n\r\n[List the specific changes made in this pull request.]\r\n\r\n## Checklist\r\n\r\n- [ ] Changes tested locally\r\n- [ ] Code reviewed\r\n- [ ] Documentation updated (if necessary)\r\n- [ ] Unit tests added (if applicable)\r\n\r\n## Additional Notes\r\n\r\n[Add any additional notes or context for the reviewer(s).]\r\n",
      "created_at": "2026-01-09T23:28:25Z",
      "updated_at": "2026-01-13T11:53:53Z",
      "closed_at": null,
      "merged_at": null,
      "merge_commit_sha": null,
      "assignee": null,
      "assignees": {},
      "requested_reviewers": {},
      "requested_teams": {},
      "labels": {},
      "milestone": null,
      "draft": false,
      "head": {
        "label": "bryanthrasher:fix/mineru-output-folder-discovery",
        "ref": "fix/mineru-output-folder-discovery",
        "sha": "a82d31800f216f4af1dcd96928bd50da19057720",
        "user": {
          "login": "bryanthrasher",
          "id": 64610,
          "node_id": "MDQ6VXNlcjY0NjEw",
          "avatar_url": "https://avatars.githubusercontent.com/u/64610?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/bryanthrasher",
          "type": "User",
          "user_view_type": "public",
          "site_admin": false
        },
        "repo": {
          "id": 1130384234,
          "node_id": "R_kgDOQ2BLag",
          "name": "RAG-Anything",
          "full_name": "bryanthrasher/RAG-Anything",
          "private": false,
          "owner": {
            "login": "bryanthrasher",
            "id": 64610,
            "node_id": "MDQ6VXNlcjY0NjEw",
            "avatar_url": "https://avatars.githubusercontent.com/u/64610?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/bryanthrasher",
            "type": "User",
            "user_view_type": "public",
            "site_admin": false
          },
          "description": "\"RAG-Anything: All-in-One RAG Framework\"",
          "fork": true,
          "url": "https://api.github.com/repos/bryanthrasher/RAG-Anything",
          "created_at": "2026-01-08T12:34:27Z",
          "updated_at": "2026-01-08T12:34:27Z",
          "pushed_at": "2026-01-08T12:38:41Z",
          "homepage": "http://arxiv.org/abs/2510.12323",
          "size": 2728,
          "stargazers_count": 0,
          "watchers_count": 0,
          "language": null,
          "has_issues": false,
          "has_projects": true,
          "has_downloads": true,
          "has_wiki": false,
          "has_pages": false,
          "has_discussions": false,
          "forks_count": 0,
          "archived": false,
          "disabled": false,
          "open_issues_count": 0,
          "license": {
            "key": "mit",
            "name": "MIT License",
            "spdx_id": "MIT",
            "url": "https://api.github.com/licenses/mit",
            "node_id": "MDc6TGljZW5zZTEz"
          },
          "allow_forking": true,
          "is_template": false,
          "web_commit_signoff_required": false,
          "has_pull_requests": true,
          "pull_request_creation_policy": "all",
          "topics": {},
          "visibility": "public",
          "forks": 0,
          "open_issues": 0,
          "watchers": 0,
          "default_branch": "main"
        }
      },
      "base": {
        "label": "HKUDS:main",
        "ref": "main",
        "sha": "9c15ba5de6195733ead45d1cfd28c6f1c0fbc683",
        "user": {
          "login": "HKUDS",
          "id": 118165258,
          "node_id": "O_kgDOBwsPCg",
          "avatar_url": "https://avatars.githubusercontent.com/u/118165258?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/HKUDS",
          "type": "Organization",
          "user_view_type": "public",
          "site_admin": false
        },
        "repo": {
          "id": 997220241,
          "node_id": "R_kgDOO3BfkQ",
          "name": "RAG-Anything",
          "full_name": "HKUDS/RAG-Anything",
          "private": false,
          "owner": {
            "login": "HKUDS",
            "id": 118165258,
            "node_id": "O_kgDOBwsPCg",
            "avatar_url": "https://avatars.githubusercontent.com/u/118165258?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/HKUDS",
            "type": "Organization",
            "user_view_type": "public",
            "site_admin": false
          },
          "description": "\"RAG-Anything: All-in-One RAG Framework\"",
          "fork": false,
          "url": "https://api.github.com/repos/HKUDS/RAG-Anything",
          "created_at": "2025-06-06T06:47:29Z",
          "updated_at": "2026-02-17T02:38:43Z",
          "pushed_at": "2026-01-26T09:09:21Z",
          "homepage": "http://arxiv.org/abs/2510.12323",
          "size": 3273,
          "stargazers_count": 13485,
          "watchers_count": 13485,
          "language": "Python",
          "has_issues": true,
          "has_projects": true,
          "has_downloads": true,
          "has_wiki": false,
          "has_pages": false,
          "has_discussions": true,
          "forks_count": 1612,
          "archived": false,
          "disabled": false,
          "open_issues_count": 102,
          "license": {
            "key": "mit",
            "name": "MIT License",
            "spdx_id": "MIT",
            "url": "https://api.github.com/licenses/mit",
            "node_id": "MDc6TGljZW5zZTEz"
          },
          "allow_forking": true,
          "is_template": false,
          "web_commit_signoff_required": false,
          "has_pull_requests": true,
          "pull_request_creation_policy": "all",
          "topics": {
            "0": "multi-modal-rag",
            "1": "retrieval-augmented-generation"
          },
          "visibility": "public",
          "forks": 1612,
          "open_issues": 102,
          "watchers": 13485,
          "default_branch": "main"
        }
      },
      "_links": {
        "self": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/189"
        },
        "html": {
          "href": "https://github.com/HKUDS/RAG-Anything/pull/189"
        },
        "issue": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/189"
        },
        "comments": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/issues/189/comments"
        },
        "review_comments": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/189/comments"
        },
        "review_comment": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/comments{/number}"
        },
        "commits": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/pulls/189/commits"
        },
        "statuses": {
          "href": "https://api.github.com/repos/HKUDS/RAG-Anything/statuses/a82d31800f216f4af1dcd96928bd50da19057720"
        }
      },
      "author_association": "NONE",
      "auto_merge": null,
      "active_lock_reason": null,
      "linked_issues": [
        1,
        186
      ]
    }
  ],
  "discussions": [
    {
      "id": "D_kwDOO3Bfkc4AkCNf",
      "number": 194,
      "title": "Are there any plans to integrate RAG-Anything into LightRAG server?",
      "body": ".",
      "created_at": "2026-02-08T16:45:29Z",
      "updated_at": "2026-02-10T08:29:38Z",
      "category": {
        "name": "General",
        "emoji": ":speech_balloon:"
      },
      "answer": null,
      "user": {
        "login": "mrdantutunaru",
        "avatar_url": "https://avatars.githubusercontent.com/u/25580398?u=0c49d1af475de3c482d04efc8ca2f3688bc1f3bc&v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4Aju9d",
      "number": 192,
      "title": "Feature Discussion: Audio Transcription Support",
      "body": "  ## Use Case\r\n  I'm building a therapy practice management system and needed to process\r\n  voice notes alongside documents. Audio transcription → RAG enables\r\n  searching across mixed media (PDFs + voice notes).\r\n\r\n  ## Implementation\r\n  I've implemented audio support following RAG-Anything's patterns:\r\n  - Added parse_audio() to Parser (like parse_pdf, parse_image)\r\n  - Optional dependencies (pip install raganything[audio])\r\n  - Config-driven (AUDIO_LANGUAGE, AUDIO_WHISPER_MODEL, etc.)\r\n  - Zero breaking changes\r\n\r\n  ## Questions\r\n  1. Is audio transcription in scope for RAG-Anything?\r\n  2. Would you accept a PR for this feature?\r\n  3. Any concerns about dependency weight (~500MB Whisper models)?\r\n\r\n  Code is ready if you're interested. Happy to adjust based on feedback.",
      "created_at": "2026-01-20T14:55:06Z",
      "updated_at": "2026-01-26T08:21:26Z",
      "category": {
        "name": "Ideas",
        "emoji": ":bulb:"
      },
      "answer": null,
      "user": {
        "login": "skogsbaeck",
        "avatar_url": "https://avatars.githubusercontent.com/u/12762671?v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4Ajx1R",
      "number": 193,
      "title": "Need a help!!!!  Why I always get this connection problem",
      "body": "when  I run \"mineru -p /xx.pdf -o output -m auto\", no matter I choose huggingface or modelscope, I always get : 2026-01-23 18:06:50.755 | ERROR    | mineru.cli.client:parse_doc:211 - 'NoneType' object has no attribute 'get'\r\nTraceback (most recent call last):\r\n\r\n  File \"/Users/hy/PycharmProjects/RAG-Anything/venv/bin/mineru\", line 7, in <module>\r\n    sys.exit(client.main())\r\n    │   │    │      └ <Command main>\r\n    │   │    └ <module 'mineru.cli.client' from '/Users/hy/PycharmProjects/RAG-Anything/venv/lib/python3.11/site-packages/mineru/cli/client....\r\n    │   └ <built-in function exit>\r\n    └ <module 'sys' (built-in)>\r\n  File \"/Users/hy/PycharmProjects/RAG-Anything/venv/lib/python3.11/site-packages/click/core.py\", line 1485, in __call__\r\n    return self.main(*args, **kwargs)\r\n           │    │     │       └ {}\r\n           │    │     └ ()\r\n           │    └ <function Command.main at 0x10534c9a0>\r\n           └ <Command main>\r\n  File \"/Users/hy/PycharmProjects/RAG-Anything/venv/lib/python3.11/site-packages/click/core.py\", line 1406, in main\r\n    rv = self.invoke(ctx)\r\n         │    │      └ <click.core.Context object at 0x104ec4110>\r\n         │    └ <function Command.invoke at 0x10534c680>\r\n         └ <Command main>\r\n  File \"/Users/hy/PycharmProjects/RAG-Anything/venv/lib/python3.11/site-packages/click/core.py\", line 1269, in invoke\r\n    return ctx.invoke(self.callback, **ctx.params)\r\n           │   │      │    │           │   └ {'input_path': '/Users/hy/PycharmProjects/RAG-Anything/documents/消费信贷小结.pdf', 'output_dir': 'output', 'method': 'auto', 'back...\r\n           │   │      │    │           └ <click.core.Context object at 0x104ec4110>\r\n           │   │      │    └ <function main at 0x13732e980>\r\n           │   │      └ <Command main>\r\n           │   └ <function Context.invoke at 0x105347880>\r\n           └ <click.core.Context object at 0x104ec4110>\r\n  File 
\"/Users/hy/PycharmProjects/RAG-Anything/venv/lib/python3.11/site-packages/click/core.py\", line 824, in invoke\r\n    return callback(*args, **kwargs)\r\n           │         │       └ {'input_path': '/Users/hy/PycharmProjects/RAG-Anything/documents/消费信贷小结.pdf', 'output_dir': 'output', 'method': 'auto', 'back...\r\n           │         └ ()\r\n           └ <function main at 0x13732e980>\r\n  File \"/Users/hy/PycharmProjects/RAG-Anything/venv/lib/python3.11/site-packages/click/decorators.py\", line 34, in new_func\r\n    return f(get_current_context(), *args, **kwargs)\r\n           │ │                       │       └ {'input_path': '/Users/hy/PycharmProjects/RAG-Anything/documents/消费信贷小结.pdf', 'output_dir': 'output', 'method': 'auto', 'back...\r\n           │ │                       └ ()\r\n           │ └ <function get_current_context at 0x10532a020>\r\n           └ <function main at 0x13732f240>\r\n  File \"/Users/hy/PycharmProjects/RAG-Anything/venv/lib/python3.11/site-packages/mineru/cli/client.py\", line 220, in main\r\n    parse_doc([Path(input_path)])\r\n    │          │    └ '/Users/hy/PycharmProjects/RAG-Anything/documents/消费信贷小结.pdf'\r\n    │          └ <class 'pathlib.Path'>\r\n    └ <function main.<locals>.parse_doc at 0x13732eb60>\r\n> File \"/Users/hy/PycharmProjects/RAG-Anything/venv/lib/python3.11/site-packages/mineru/cli/client.py\", line 196, in parse_doc\r\n    do_parse(\r\n    └ <function do_parse at 0x13732e7a0>\r\n  File \"/Users/hy/PycharmProjects/RAG-Anything/venv/lib/python3.11/site-packages/mineru/cli/common.py\", line 478, in do_parse\r\n    _process_hybrid(\r\n    └ <function _process_hybrid at 0x13732e5c0>\r\n  File \"/Users/hy/PycharmProjects/RAG-Anything/venv/lib/python3.11/site-packages/mineru/cli/common.py\", line 337, in _process_hybrid\r\n    middle_json, infer_result, _vlm_ocr_enable = hybrid_doc_analyze(\r\n                                                 └ <function doc_analyze at 0x16a1ca2a0>\r\n  File 
\"/Users/hy/PycharmProjects/RAG-Anything/venv/lib/python3.11/site-packages/mineru/backend/hybrid/hybrid_analyze.py\", line 398, in doc_analyze\r\n    predictor = ModelSingleton().get_model(backend, model_path, server_url, **kwargs)\r\n                │                          │        │           │             └ {}\r\n                │                          │        │           └ None\r\n                │                          │        └ None\r\n                │                          └ 'transformers'\r\n                └ <class 'mineru.backend.vlm.vlm_analyze.ModelSingleton'>\r\n  File \"/Users/hy/PycharmProjects/RAG-Anything/venv/lib/python3.11/site-packages/mineru/backend/vlm/vlm_analyze.py\", line 57, in get_model\r\n    model_path = auto_download_and_get_model_root_path(\"/\",\"vlm\")\r\n                 └ <function auto_download_and_get_model_root_path at 0x1364fd9e0>\r\n  File \"/Users/hy/PycharmProjects/RAG-Anything/venv/lib/python3.11/site-packages/mineru/utils/models_download_utils.py\", line 21, in auto_download_and_get_model_root_path\r\n    root_path = local_models_config.get(repo_mode, None)\r\n                │                       └ 'vlm'\r\n                └ None\r\n\r\nAttributeError: 'NoneType' object has no attribute 'get'\r\n",
      "created_at": "2026-01-23T10:08:44Z",
      "updated_at": "2026-01-23T10:09:09Z",
      "category": {
        "name": "Q&A",
        "emoji": ":pray:"
      },
      "answer": null,
      "user": {
        "login": "HYSMN",
        "avatar_url": "https://avatars.githubusercontent.com/u/32283625?v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AgaBp",
      "number": 13,
      "title": "Integration with existing LightRAG server that uses Ollama",
      "body": "I have an existing LightRAG with a lot of content. I have been looking at whether something like Docling could improve the embedding as many of my PDFs are complicated and messy brochures...but RAGAnything seems to provide quite a nice preprocessing pipeline that one assumes will do what I'm looking for anyway.\n\nRAGAnything talks about use of existing LightRAG but I'm not clear how that works with LightRAG server and also as per another discussion quite how RAGAnything supports Ollama rather than OpenAI.\n\nI don't want to start again with a new LightRAG store as I'm not sure I can find all the documents I have uploaded nor do I really want to reprocess them just to see how RAGAnything improves ingest\n\nCan anyone help?",
      "created_at": "2025-06-25T20:45:45Z",
      "updated_at": "2025-12-23T21:42:28Z",
      "category": {
        "name": "Q&A",
        "emoji": ":pray:"
      },
      "answer": null,
      "user": {
        "login": "helicalchris",
        "avatar_url": "https://avatars.githubusercontent.com/u/35872772?u=f8246e7b084741f5dd66a1995fdab016bfb297bd&v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AjYPe",
      "number": 179,
      "title": "加载已有的RAG库时报错。",
      "body": "# Same working_dir as processing step!\r\n    config = RAGAnythingConfig(working_dir=\"./rag_storage\")\r\n\r\n    rag = RAGAnything(config=config, llm_model_func=llm_model_func, embedding_func=embedding_func)\r\n\r\n    # Query existing processed data\r\n    result = await rag.aquery(\r\n        \"RAG-anything是什么\",\r\n        mode=\"hybrid\"\r\n    )\r\n    print(result)                  \r\n运行出错: No LightRAG instance available. Please process documents first or provide a pre-initialized LightRAG instance.",
      "created_at": "2025-12-23T01:57:52Z",
      "updated_at": "2025-12-23T01:57:52Z",
      "category": {
        "name": "General",
        "emoji": ":speech_balloon:"
      },
      "answer": null,
      "user": {
        "login": "Zhaofan2021",
        "avatar_url": "https://avatars.githubusercontent.com/u/85424946?v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AiOhm",
      "number": 132,
      "title": "Any better way to use RAG-Anything at Enterprises?",
      "body": "Any better way to use RAG-Anything at Enterprises?\r\ni.e in AWS or Azure with Azure OpenAI or Claude",
      "created_at": "2025-10-02T06:07:10Z",
      "updated_at": "2025-12-22T10:14:16Z",
      "category": {
        "name": "General",
        "emoji": ":speech_balloon:"
      },
      "answer": null,
      "user": {
        "login": "mahanteshimath",
        "avatar_url": "https://avatars.githubusercontent.com/u/13198314?u=e92f6c8c54fe81ccb29a95578eb01e85b54dda2b&v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4Ag8nQ",
      "number": 66,
      "title": "How RAGAnything query work",
      "body": "\r\n<img width=\"570\" height=\"441\" alt=\"git\" src=\"https://github.com/user-attachments/assets/bbefd75d-8d8a-439e-bc7a-728ed0e03542\" />\r\n\r\n\r\nI want to know — does the output_dir play any role in answering the query?\r\n\r\nI processed the document once and was able to query it successfully. But when I commented out the process() function and kept only the query() function, it gave an error. Why doesn’t it pick up the already processed document? The path is already defined in the raganything config.\r\n\r\nAlso, is there a way to query previously processed documents?\r\nWith LightRAG, I just provided the path and it worked.\r\nSo then my question is: what kind of data resides in the output_dir, and Does is it role in querying?\r\n\r\nwhy i am asking the role of output_dir is  because that on first time process document function and query function is in the same file.\r\n",
      "created_at": "2025-07-27T06:13:09Z",
      "updated_at": "2025-12-15T16:59:33Z",
      "category": {
        "name": "Q&A",
        "emoji": ":pray:"
      },
      "answer": null,
      "user": {
        "login": "vipan786",
        "avatar_url": "https://avatars.githubusercontent.com/u/44468694?v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AjJwg",
      "number": 174,
      "title": "Struggling to query table (check pdf at the bottom of the post)",
      "body": "Hey guys,\r\n\r\nI am trying to query this table for about 3 weeks now. However, it was in vain. I tried a lot of configurations but i wasn't able to get 100% correct answers. Some of the rows turn out to be fully correct but others are not. It's because either 1) the LLM used the columns from another row or 2) mixed up values from different rows. Early on the LLM was even hallucinating values which are not part of the table. I managed to restrict that behaviour by:\r\n\r\n1. **Ultra-strict prompt**\r\n\r\n```\r\nmerged_prompt = f\"\"\"\r\nYou are a document-grounded assistant working on technical PDFs for mechanical components\r\n(e.g. spur gears, shafts, bearings).\r\n\r\nYou are given CONTEXT extracted from the PDF. CONTEXT may include:\r\n- OCR'd tables with dimensions and properties\r\n- Text paragraphs (materials, fits, tolerances, notes)\r\n- Images of 2D technical drawings portraying geometry\r\n- Descriptions of formulas and equations\r\n\r\nGLOBAL RULES (MUST OBEY)\r\n- Use ONLY information that appears in the CONTEXT.\r\n- You are NOT allowed to invent, modify, or \"correct\" any numeric value.\r\n- Do NOT use external standards, formulas, or domain knowledge.\r\n- Do NOT interpolate, smooth, or approximate values.\r\n- If you output a numeric value, it MUST literally appear somewhere in the CONTEXT\r\n(same digits, commas/dots, units, spacing).\r\n- If the information needed to answer the question is missing or incomplete, answer EXACTLY:\r\nCANNOT_FIND_ANSWER_IN_DOCUMENT\r\nand do not guess.\r\n\r\nSPECIAL CASE: STRICT TABLE ROW COPY (MUST FOLLOW EXACTLY)\r\n- If the user asks for a specific table row to be \"copied\", \"returned\", or \"extracted\"\r\n(e.g. 
“full row for Z2=25 teeth”, “give me the row where Z2 = 25”, etc.):\r\n1) Locate the `table_body` string inside CONTEXT (the HTML fragment starting with <table> and ending with </table>).\r\n2) Inside this `table_body` HTML, search ONLY within the <tr>...</tr> rows.\r\n3) Find all <tr>...</tr> rows where the FIRST <td> cell exactly matches the requested key\r\n    (e.g. 25 for “Z2=25”; do NOT approximate, do NOT use closest value).\r\n4) If you find one or more matching rows:\r\n    - OUTPUT ONLY the exact characters of each matching \"<tr> ... </tr>\" substring from the input.\r\n    - Preserve all characters exactly: tags, spaces, commas, dots, decimal separators, units, and line breaks.\r\n    - Do NOT wrap the output in Markdown, tables, code fences, or any extra text.\r\n    - Do NOT reorder columns. Do NOT explain the result. Just output the raw <tr>...</tr> line(s).\r\n5) If you do NOT find any matching row:\r\n    - Output EXACTLY: NOT_IN_CONTEXT\r\n    - Do NOT construct a plausible row, do NOT guess, and do NOT reuse another row as a template.\r\n\r\nGENERAL TASK (NON-STRICT CASES)\r\n- For all other questions:\r\n- Read the user query.\r\n- Search the CONTEXT (tables, drawings, text, equations) for information relevant to the query.\r\n- You may combine information from tables, drawings, equations, and text,\r\n    but NEVER create new numerical values.\r\n- Prefer precise table entries and dimension annotations from drawings over vague text.\r\n\r\nOUTPUT FORMAT FOR GENERAL QUESTIONS\r\n- When listing dimensions, use bullets:\r\n- <symbol or name> = <value> <unit> — <short description>\r\n\r\nCONTEXT START\r\n{system_prompt}\r\nCONTEXT END\r\n\r\nUser query:\r\n{prompt or \"\"}\r\n\"\"\"\r\n\r\nsystem_prompt = (\r\n    \"You are a deterministic, document-grounded assistant. \"\r\n    \"You must treat the CONTEXT as the only source of truth and you must never invent, \"\r\n    \"modify, or reformat numeric values. 
When the STRICT TABLE ROW COPY rules apply, \"\r\n    \"you MUST follow them exactly.\"\r\n)\r\n```\r\n2. **Use this configuration:**\r\n```\r\nconfig = RAGAnythingConfig(\r\n    working_dir=\"./rag_storage_multimodal_ollama\",\r\n    parser=\"mineru\",\r\n    parse_method=\"auto\",\r\n    enable_image_processing=True,\r\n    enable_table_processing=True,\r\n    enable_equation_processing=True,\r\n    context_mode=\"chunk\",\r\n    max_context_tokens=4096,           # enough for the full table chunk\r\n    context_window=0,                       # just that page\r\n    include_headers = True,                # Include headers for technical content\r\n    include_captions = True,               # Include captions for images/tables\r\n    context_filter_content_types = [\"text\", \"table\"]           \r\n)\r\nrag = RAGAnything(\r\n    config=config,\r\n    lightrag_kwargs = { \"enable_llm_cache\": False,\r\n                       \"llm_model_kwargs\": \r\n                        {\"options\": {\r\n                                \"num_ctx\": 32768,\r\n                                \"temperature\": 0.05,   # almost deterministic\r\n                                \"top_p\": 0.8,          # reduce tail sampling\r\n                                \"top_k\": 40,           # limit to top 40 probable tokens\r\n                                \"repeat_penalty\": 1.05 # mild, to avoid weird loops\r\n                            }\r\n                        }\r\n                       },\r\n    llm_model_func=ollama_llm_model,\r\n    vision_model_func=vision_model_func,\r\n    embedding_func=EmbeddingFunc(\r\n        embedding_dim=768,\r\n        func=lambda texts: ollama_embed(texts, embed_model=EMBEDDING_MODEL_NAME)\r\n    )\r\n)\r\ntext_result = await rag.aquery(\r\n      \"From the table with dimensions for polyacetal spur gears, please extract the full row for ZZ=60 teeth\",\r\n      vlm_enhanced=False,\r\n      enable_rerank=False,\r\n      chunk_top_k=6,        # 
because my pdf contains 2 imgs, 1 table, 1 txt paragraph and 2 discarded elements\r\n      mode=\"naive\",\r\n  )\r\n```\r\n---\r\n\r\nWhat am I doing wrong? I read that all current LLMs are not 100% reliable when given a long table in the context and asked to “give me the entire row where Z2=12”. Any workaround?\r\n\r\nP.S: \r\n> 1) The model sees all context at query time\r\n>  2) I am using ollama local model llama3.2-vision:11b-instruct-q4_K_M for both \"llm_model_func\" and \"vision_model_func\" and \"nomic-embed-text\" for embeddings.\r\n[gear_span.pdf](https://github.com/user-attachments/files/23988497/gear_span.pdf)\r\n",
      "created_at": "2025-12-06T14:58:15Z",
      "updated_at": "2025-12-06T14:58:15Z",
      "category": {
        "name": "Q&A",
        "emoji": ":pray:"
      },
      "answer": null,
      "user": {
        "login": "Raouf71",
        "avatar_url": "https://avatars.githubusercontent.com/u/72918044?u=29585180ae7c880806949d3f4f9a7d352963567f&v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AhmlN",
      "number": 96,
      "title": "Does RAG Anything support multiple documents?",
      "body": "Hi!\r\n\r\nDoes this library support multiple documents?\r\n\r\nAs per the code below, it only supports a single document:\r\n\r\n # Process a document\r\n    await rag.process_document_complete(\r\n        file_path=\"path/to/your/document.pdf\",\r\n        output_dir=\"./output\",\r\n        parse_method=\"auto\"\r\n    )\r\n\r\n\r\nThank you\r\n",
      "created_at": "2025-08-28T01:11:21Z",
      "updated_at": "2025-11-28T04:58:22Z",
      "category": {
        "name": "Q&A",
        "emoji": ":pray:"
      },
      "answer": null,
      "user": {
        "login": "naveenoxygen",
        "avatar_url": "https://avatars.githubusercontent.com/u/165016822?v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AgTz_",
      "number": 5,
      "title": "Does this have ollama support?",
      "body": "Does this have ollama support?",
      "created_at": "2025-06-18T19:21:53Z",
      "updated_at": "2025-11-27T20:37:11Z",
      "category": {
        "name": "Q&A",
        "emoji": ":pray:"
      },
      "answer": null,
      "user": {
        "login": "nasaisrllycool",
        "avatar_url": "https://avatars.githubusercontent.com/u/215014816?v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AjB2L",
      "number": 169,
      "title": "Scalable RAG",
      "body": "Is the project planning to extend RAG-Anything's document ingestion to be more scalable? For example, with Ray or Anyscale.",
      "created_at": "2025-11-26T20:18:14Z",
      "updated_at": "2025-11-26T20:18:15Z",
      "category": {
        "name": "Ideas",
        "emoji": ":bulb:"
      },
      "answer": null,
      "user": {
        "login": "r3006",
        "avatar_url": "https://avatars.githubusercontent.com/u/49816401?v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AizZG",
      "number": 160,
      "title": "Does RAG-Anything support querying between different rag_storage paths?",
      "body": "I mean, if I build two rag_storage paths with different files, how could I query against a specified path?\r\n```python\r\nlightrag_instance = LightRAG(working_dir='/user/namespace_1',\r\n                                            llm_model_func=llm_model_func,\r\n                                            embedding_func=embedding_func)\r\n```\r\nIn this example, LightRAG only receives a single path.\r\n",
      "created_at": "2025-11-10T04:41:42Z",
      "updated_at": "2025-11-10T04:41:42Z",
      "category": {
        "name": "Q&A",
        "emoji": ":pray:"
      },
      "answer": null,
      "user": {
        "login": "Chen1303005809",
        "avatar_url": "https://avatars.githubusercontent.com/u/59914914?v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AiraM",
      "number": 152,
      "title": "Support for Chunk Metadata when dealing with text",
      "body": "Hey everyone —\r\nI’ve been using RAG-Anything, and while it’s great for multimodal stuff (text, tables, images, etc.), I’m hitting a wall with chunk metadata.\r\n\r\nYou can extract metadata for non-text elements, but there doesn’t seem to be any way to know which doc or page a text chunk actually came from. Once it’s chunked, all that context seems gone.\r\n\r\nAm I missing something? Or is this just not supported right now?",
      "created_at": "2025-10-30T23:31:43Z",
      "updated_at": "2025-10-30T23:31:43Z",
      "category": {
        "name": "Q&A",
        "emoji": ":pray:"
      },
      "answer": null,
      "user": {
        "login": "DLOVRIC2",
        "avatar_url": "https://avatars.githubusercontent.com/u/66421606?u=20b7137c45b4589868ae3d45cb84cdfc67268074&v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AinSz",
      "number": 148,
      "title": "Support for Supabase Vector Database",
      "body": "Is there plan to integrate this framework to work with Supabase Vector Stores?",
      "created_at": "2025-10-26T19:23:01Z",
      "updated_at": "2025-10-26T19:23:02Z",
      "category": {
        "name": "General",
        "emoji": ":speech_balloon:"
      },
      "answer": null,
      "user": {
        "login": "JDBem",
        "avatar_url": "https://avatars.githubusercontent.com/u/74302008?v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AidDs",
      "number": 141,
      "title": "How to fix this on gg colab",
      "body": "I followed the path to install and run the first usage explanation to try it out. I replaced the api key and document path but there is more problems\r\nINFO: RAGAnything initialized with config:\r\nINFO:   Working directory: ./rag_storage\r\nINFO:   Parser: mineru\r\nINFO:   Parse method: auto\r\nINFO:   Multimodal processing - Image: True, Table: True, Equation: True\r\nINFO:   Max concurrent files: 1\r\nINFO: Parser 'mineru' installation verified\r\nINFO: Initializing LightRAG with parameters: {'working_dir': './rag_storage'}\r\nINFO: [_] Created new empty graph file: ./rag_storage/graph_chunk_entity_relation.graphml\r\nINFO: Multimodal processors initialized with context support\r\nINFO: Available processors: ['image', 'table', 'equation', 'generic']\r\nINFO: Context configuration: ContextConfig(context_window=1, context_mode='page', max_context_tokens=2000, include_headers=True, include_captions=True, filter_content_types=['text'])\r\nINFO: LightRAG, parse cache, and multimodal processors initialized\r\nINFO: Starting complete document processing: /1307045.pdf\r\nINFO: Starting document parsing: /1307045.pdf\r\nINFO: Using cached parsing result for: /1307045.pdf\r\nINFO: * Total blocks in cached content_list: 56\r\nINFO: Content separation complete:\r\nINFO:   - Text content length: 23326 characters\r\nINFO:   - Multimodal items count: 4\r\nINFO:   - Multimodal type distribution: {'image': 3, 'table': 1}\r\nINFO: Setting content source for context-aware multimodal processing...\r\nINFO: Content source set with format: minerU\r\nINFO: Content source set with format: minerU\r\nINFO: Content source set with format: minerU\r\nINFO: Content source set with format: minerU\r\nINFO: Content source set for context extraction (format: minerU)\r\nINFO: Starting text content insertion into LightRAG...\r\nWARNING: Ignoring document ID (already exists): doc-50d7ecd2cdb2e73645fbbdd75c97184d (1307045.pdf)\r\nWARNING: No new unique documents were 
found.\r\n---------------------------------------------------------------------------\r\nTypeError                                 Traceback (most recent call last)\r\n[/tmp/ipython-input-4020778013.py](https://localhost:8080/#) in <cell line: 1>()\r\n    128     print(\"Multimodal query result:\", multimodal_result)\r\n    129 \r\n--> 130 await main()\r\n\r\n5 frames\r\n[/tmp/ipython-input-4020778013.py](https://localhost:8080/#) in main()\r\n    102 \r\n    103     # Process a document\r\n--> 104     await rag.process_document_complete(\r\n    105         file_path=\"/1307045.pdf\",\r\n    106         output_dir=\"./output\",\r\n\r\n[/content/RAG-Anything/raganything/processor.py](https://localhost:8080/#) in process_document_complete(self, file_path, output_dir, parse_method, display_stats, split_by_character, split_by_character_only, doc_id, **kwargs)\r\n   1475         if text_content.strip():\r\n   1476             file_name = os.path.basename(file_path)\r\n-> 1477             await insert_text_content(\r\n   1478                 self.lightrag,\r\n   1479                 input=text_content,\r\n\r\n[/content/RAG-Anything/raganything/utils.py](https://localhost:8080/#) in insert_text_content(lightrag, input, split_by_character, split_by_character_only, ids, file_paths)\r\n    164 \r\n    165     # Use LightRAG's insert method with all parameters\r\n--> 166     await lightrag.ainsert(\r\n    167         input=input,\r\n    168         file_paths=file_paths,\r\n\r\n[/usr/local/lib/python3.12/dist-packages/lightrag/lightrag.py](https://localhost:8080/#) in ainsert(self, input, split_by_character, split_by_character_only, ids, file_paths, track_id)\r\n    928 \r\n    929         await self.apipeline_enqueue_documents(input, ids, file_paths, track_id)\r\n--> 930         await self.apipeline_process_enqueue_documents(\r\n    931             split_by_character, split_by_character_only\r\n    932         
)\r\n\r\n[/usr/local/lib/python3.12/dist-packages/lightrag/lightrag.py](https://localhost:8080/#) in apipeline_process_enqueue_documents(self, split_by_character, split_by_character_only)\r\n   1379             # Ensure only one worker is processing documents\r\n   1380             if not pipeline_status.get(\"busy\", False):\r\n-> 1381                 processing_docs, failed_docs, pending_docs = await asyncio.gather(\r\n   1382                     self.doc_status.get_docs_by_status(DocStatus.PROCESSING),\r\n   1383                     self.doc_status.get_docs_by_status(DocStatus.FAILED),\r\n\r\n[/usr/local/lib/python3.12/dist-packages/lightrag/kg/json_doc_status_impl.py](https://localhost:8080/#) in get_docs_by_status(self, status)\r\n    116                         if \"error_msg\" not in data:\r\n    117                             data[\"error_msg\"] = None\r\n--> 118                         result[k] = DocProcessingStatus(**data)\r\n    119                     except KeyError as e:\r\n    120                         logger.error(\r\n\r\nTypeError: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed' ",
      "created_at": "2025-10-16T02:25:49Z",
      "updated_at": "2025-10-16T02:25:50Z",
      "category": {
        "name": "Q&A",
        "emoji": ":pray:"
      },
      "answer": null,
      "user": {
        "login": "alexdan1816",
        "avatar_url": "https://avatars.githubusercontent.com/u/170811406?u=e2b5bd6be4e459013904dd16e49b37f966c2f81f&v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AiaFe",
      "number": 137,
      "title": "How can `RAG-Anything` connect with a `Milvus` database?",
      "body": "  Load .env first\r\nload_dotenv()\r\nAPI_KEY = os.environ.get(\"OPENAI_API_KEY\")\r\n\r\nasync def build_lightrag():\r\n    light = LightRAG(\r\n        working_dir=\"./rag_storage\",\r\n        vector_storage=\"MilvusVectorDBStorage\",\r\n        workspace=os.getenv(\"MILVUS_WORKSPACE\"),\r\n        llm_model_func=lambda prompt, system_prompt=None,history_messages=[],**kwargs: openai_complete_if_cache(\r\n            \"gpt-4o\",\r\n            prompt,\r\n            system_prompt=system_prompt,\r\n            history_messages=history_messages,\r\n            api_key=API_KEY,\r\n        ),\r\n        embedding_func=EmbeddingFunc(\r\n            embedding_dim=3072,\r\n            max_token_size=8192,\r\n            func=lambda texts: openai_embed(\r\n                texts,\r\n                model=\"text-embedding-3-large\",\r\n                api_key=API_KEY,\r\n            )\r\n        ),\r\n    )\r\n    # Required LightRAG initialization\r\n    await light.lightrag.initialize_storages()\r\n    await initialize_pipeline_status()\r\n    return light\r\n\r\n\r\ndef build_rag() -> RAGAnything:\r\n    lightrag_instance = build_lightrag()\r\n    config = RAGAnythingConfig(\r\n        working_dir=\"./rag_storage\",\r\n        parser=\"generic\",\r\n        parse_method=\"auto\",\r\n        enable_image_processing=False,\r\n        enable_table_processing=False,\r\n        enable_equation_processing=False,\r\n    )\r\n    return RAGAnything(lightrag=lightrag_instance, config=config)  # RAG-Anything on LightRAG \r\n\r\nasync def main():\r\n    ap = argparse.ArgumentParser()\r\n    ap.add_argument(\"--file\", default=\"./rag_storage\")\r\n    ap.add_argument(\"--question\", default=\"what is a vector database in this milvus?\")\r\n    args = ap.parse_args()\r\n\r\n    rag = build_rag()\r\n\r\n    # # Required LightRAG initialization\r\n    # await rag.lightrag.initialize_storages()\r\n    # await initialize_pipeline_status()\r\n\r\n    # Ingest and query 
via Milvus-backed LightRAG \r\n    await rag.process_document_complete(\r\n        file_path=args.file,\r\n        output_dir=\"./output\",\r\n        parse_method=\"auto\",\r\n    )\r\n    result = await rag.aquery_with_multimodal(args.question, mode=\"hybrid\")\r\n    print(result)\r\n\r\nif __name__ == \"__main__\":\r\n    asyncio.run(main())",
      "created_at": "2025-10-13T05:34:07Z",
      "updated_at": "2025-10-13T05:43:48Z",
      "category": {
        "name": "Q&A",
        "emoji": ":pray:"
      },
      "answer": null,
      "user": {
        "login": "AMRENDRASINGH-COM",
        "avatar_url": "https://avatars.githubusercontent.com/u/172993640?u=6134116841acd0d1219383ccccfbeeae5a30ce49&v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AiHJ1",
      "number": 123,
      "title": "LightRAG Instance",
      "body": "I'm using RAG-Anything and was able to embed a few documents. \r\n\r\nHow can I \"fire up\" the LightRAG interface to handle the data I have processed using RAG-Anything?",
      "created_at": "2025-09-24T17:45:21Z",
      "updated_at": "2025-09-28T17:35:35Z",
      "category": {
        "name": "Q&A",
        "emoji": ":pray:"
      },
      "answer": {
        "id": "DC_kwDOO3Bfkc4A3bN9",
        "body": "🚀 **Great question!** RAG-Anything and LightRAG can work together seamlessly. Here's how to access your processed data through the LightRAG interface:\r\n\r\n## 🔧 **Method 1: Direct Instance Access**\r\n\r\nAfter processing documents with RAG-Anything, you can access the underlying LightRAG instance:\r\n\r\n```python\r\nfrom raganything import RAGAnything, RAGAnythingConfig\r\n\r\n# Initialize RAG-Anything\r\nconfig = RAGAnythingConfig(working_dir=\"./rag_storage\")\r\nrag = RAGAnything(config=config, llm_model_func=your_llm, embedding_func=your_embedding)\r\n\r\n# Process your documents\r\nawait rag.process_document_complete(\r\n    file_path=\"document.pdf\",\r\n    output_dir=\"./output\"\r\n)\r\n\r\n# Access the LightRAG instance directly\r\nlightrag_instance = rag.lightrag\r\n\r\n# Now you can use LightRAG methods directly\r\nresult = await lightrag_instance.aquery(\"Your query here\")\r\nprint(result)\r\n```\r\n\r\n## 📊 **Method 2: Using Existing Storage**\r\n\r\nIf you've already processed documents, you can initialize LightRAG pointing to the same storage:\r\n\r\n```python\r\nfrom lightrag import LightRAG, QueryParam\r\n\r\n# Point to the same working directory\r\nlightrag = LightRAG(\r\n    working_dir=\"./rag_storage\",  # Same as RAG-Anything working_dir\r\n    llm_model_func=your_llm_function,\r\n    embedding_func=your_embedding_function\r\n)\r\n\r\n# Query your processed data\r\nresult = await lightrag.aquery(\r\n    \"What are the key insights from the processed documents?\",\r\n    param=QueryParam(mode=\"hybrid\")\r\n)\r\n```\r\n\r\n## 🔍 **Method 3: Graph Visualization Interface**\r\n\r\nFor visual exploration of your knowledge graph:\r\n\r\n```python\r\n# After processing with RAG-Anything\r\nfrom lightrag.utils import EmbeddingFunc\r\nimport networkx as nx\r\n\r\n# Access the graph data\r\ngraph_data = rag.lightrag.chunk_entity_relation_graph\r\n\r\n# Create visualization\r\ndef visualize_knowledge_graph():\r\n    G = 
nx.Graph()\r\n    \r\n    # Add nodes and edges from your processed data\r\n    for entity in graph_data.entities:\r\n        G.add_node(entity.name, **entity.properties)\r\n    \r\n    for relation in graph_data.relations:\r\n        G.add_edge(relation.source, relation.target, \r\n                  weight=relation.strength)\r\n    \r\n    return G\r\n\r\n# Generate and display graph\r\ngraph = visualize_knowledge_graph()\r\n```\r\n\r\n## ⚙️ **Method 4: Custom Query Interface**\r\n\r\nBuild a simple interface to interact with your processed data:\r\n\r\n```python\r\nimport asyncio\r\n\r\nclass RAGInterface:\r\n    def __init__(self, rag_anything_instance):\r\n        self.rag = rag_anything_instance\r\n        self.lightrag = rag_anything_instance.lightrag\r\n    \r\n    async def interactive_query(self):\r\n        print(\"🤖 RAG-Anything Interactive Interface\")\r\n        print(\"Type 'exit' to quit\")\r\n        \r\n        while True:\r\n            query = input(\"\\n📝 Enter your query: \")\r\n            if query.lower() == 'exit':\r\n                break\r\n                \r\n            try:\r\n                # Use different query modes\r\n                hybrid_result = await self.lightrag.aquery(query, mode=\"hybrid\")\r\n                print(f\"\\n🔍 Hybrid Search Result:\\n{hybrid_result}\")\r\n                \r\n            except Exception as e:\r\n                print(f\"❌ Error: {e}\")\r\n\r\n# Usage\r\ninterface = RAGInterface(rag)\r\nawait interface.interactive_query()\r\n```\r\n\r\n## 🛠️ **Configuration Alignment**\r\n\r\nMake sure your LightRAG configuration matches RAG-Anything settings:\r\n\r\n```python\r\n# Check RAG-Anything config\r\nprint(f\"Working directory: {rag.config.working_dir}\")\r\nprint(f\"Enable image processing: {rag.config.enable_image_processing}\")\r\n\r\n# Configure LightRAG with same settings\r\nlightrag_config = {\r\n    \"working_dir\": rag.config.working_dir,\r\n    \"enable_image\": 
rag.config.enable_image_processing,\r\n    \"enable_table\": rag.config.enable_table_processing,\r\n    \"chunk_token_size\": rag.config.chunk_token_size,\r\n    \"chunk_overlap_token_size\": rag.config.chunk_overlap_token_size\r\n}\r\n```\r\n\r\n## 🚨 **Troubleshooting Common Issues**\r\n\r\n1. **Storage Path Mismatch**:\r\n```python\r\n# Ensure paths match exactly\r\nassert rag.config.working_dir == lightrag.working_dir\r\n```\r\n\r\n2. **Missing Dependencies**:\r\n```bash\r\npip install lightrag networkx matplotlib plotly\r\n```\r\n\r\n3. **Memory Issues with Large Datasets**:\r\n```python\r\n# Use streaming queries for large datasets\r\nasync def stream_query(query):\r\n    for chunk in await lightrag.aquery_stream(query):\r\n        yield chunk\r\n```\r\n\r\n## 📈 **Advanced Integration Example**\r\n\r\nHere's a complete example combining both systems:\r\n\r\n```python\r\nimport asyncio\r\nfrom raganything import RAGAnything, RAGAnythingConfig\r\nfrom lightrag import LightRAG\r\n\r\nasync def full_rag_workflow():\r\n    # Step 1: Process documents with RAG-Anything\r\n    config = RAGAnythingConfig(\r\n        working_dir=\"./unified_rag_storage\",\r\n        enable_image_processing=True,\r\n        enable_table_processing=True\r\n    )\r\n    \r\n    rag_anything = RAGAnything(config=config)\r\n    \r\n    # Process your documents\r\n    await rag_anything.process_folder_complete(\"./documents\")\r\n    \r\n    # Step 2: Access via LightRAG interface\r\n    lightrag = rag_anything.lightrag\r\n    \r\n    # Step 3: Advanced querying\r\n    queries = [\r\n        \"Summarize the main topics across all documents\",\r\n        \"What are the key relationships between entities?\",\r\n        \"Find contradictions or inconsistencies in the data\"\r\n    ]\r\n    \r\n    results = {}\r\n    for query in queries:\r\n        result = await lightrag.aquery(\r\n            query, \r\n            mode=\"hybrid\",\r\n            only_need_context=False\r\n        )\r\n      
  results[query] = result\r\n    \r\n    return results\r\n\r\n# Run the workflow\r\nresults = await full_rag_workflow()\r\nfor query, result in results.items():\r\n    print(f\"Query: {query}\")\r\n    print(f\"Result: {result}\\n{'-'*50}\\n\")\r\n```\r\n\r\n## 💡 **Pro Tips**\r\n\r\n- **Use `hybrid` mode** for best results when querying processed RAG-Anything data\r\n- **Check working_dir consistency** between RAG-Anything and LightRAG instances\r\n- **Monitor memory usage** with large document sets\r\n- **Use async/await** for better performance with multiple queries\r\n\r\nThe key is that RAG-Anything builds on top of LightRAG, so you can access the underlying LightRAG instance directly or create a new one pointing to the same storage directory! 🎯\r\n\r\nLet me know if you need help with any specific integration scenario!"
      },
      "user": {
        "login": "baartho",
        "avatar_url": "https://avatars.githubusercontent.com/u/19720384?v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AiF3i",
      "number": 121,
      "title": "ChromaDB Deprecated ?",
      "body": "I want to use ChromaDB as the vector DB. Is it supported or deprecated?",
      "created_at": "2025-09-23T12:33:34Z",
      "updated_at": "2025-09-26T22:19:38Z",
      "category": {
        "name": "Q&A",
        "emoji": ":pray:"
      },
      "answer": null,
      "user": {
        "login": "gstazure",
        "avatar_url": "https://avatars.githubusercontent.com/u/27543824?v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AhQ9A",
      "number": 81,
      "title": "Rag anything working with LightRAG Server and WebUI",
      "body": "I would like to ask if the raganything extension can work with the LightRAG Server and WebUI properly with the full capabilities of the raganything module. ",
      "created_at": "2025-08-12T14:01:54Z",
      "updated_at": "2025-09-26T15:16:30Z",
      "category": {
        "name": "General",
        "emoji": ":speech_balloon:"
      },
      "answer": null,
      "user": {
        "login": "thanbskt",
        "avatar_url": "https://avatars.githubusercontent.com/u/51063695?u=4a4b8bb79f03f9baf18840ea7c439aacc870f87b&v=4"
      }
    },
    {
      "id": "D_kwDOO3Bfkc4AgaNr",
      "number": 16,
      "title": "CUDA accelerate MinerU in process_folder_complete",
      "body": "It is explained that the device=cuda flag in process_document_complete can enable CUDA acceleration for MinerU, but this does not seem to work in the process_folder_complete function. What's the best way to approach this?",
      "created_at": "2025-06-26T03:10:52Z",
      "updated_at": "2025-06-26T03:10:53Z",
      "category": {
        "name": "General",
        "emoji": ":speech_balloon:"
      },
      "answer": null,
      "user": {
        "login": "arjun-vdc",
        "avatar_url": "https://avatars.githubusercontent.com/u/187897619?v=4"
      }
    }
  ],
  "details": {
    "id": 997220241,
    "node_id": "R_kgDOO3BfkQ",
    "name": "RAG-Anything",
    "full_name": "HKUDS/RAG-Anything",
    "private": false,
    "owner": {
      "login": "HKUDS",
      "id": 118165258,
      "node_id": "O_kgDOBwsPCg",
      "avatar_url": "https://avatars.githubusercontent.com/u/118165258?v=4",
      "gravatar_id": "",
      "url": "https://api.github.com/users/HKUDS",
      "type": "Organization",
      "user_view_type": "public",
      "site_admin": false
    },
    "description": "\"RAG-Anything: All-in-One RAG Framework\"",
    "fork": false,
    "url": "https://api.github.com/repos/HKUDS/RAG-Anything",
    "created_at": "2025-06-06T06:47:29Z",
    "updated_at": "2026-02-17T02:38:43Z",
    "pushed_at": "2026-01-26T09:09:21Z",
    "homepage": "http://arxiv.org/abs/2510.12323",
    "size": 3273,
    "stargazers_count": 13485,
    "watchers_count": 13485,
    "language": "Python",
    "has_issues": true,
    "has_projects": true,
    "has_downloads": true,
    "has_wiki": false,
    "has_pages": false,
    "has_discussions": true,
    "forks_count": 1612,
    "archived": false,
    "disabled": false,
    "open_issues_count": 102,
    "license": {
      "key": "mit",
      "name": "MIT License",
      "spdx_id": "MIT",
      "url": "https://api.github.com/licenses/mit",
      "node_id": "MDc6TGljZW5zZTEz"
    },
    "allow_forking": true,
    "is_template": false,
    "web_commit_signoff_required": false,
    "has_pull_requests": true,
    "pull_request_creation_policy": "all",
    "topics": {
      "0": "multi-modal-rag",
      "1": "retrieval-augmented-generation"
    },
    "visibility": "public",
    "forks": 1612,
    "open_issues": 102,
    "watchers": 13485,
    "default_branch": "main",
    "permissions": {
      "admin": false,
      "maintain": false,
      "push": false,
      "triage": false,
      "pull": true
    },
    "temp_clone_token": "",
    "custom_properties": {},
    "organization": {
      "login": "HKUDS",
      "id": 118165258,
      "node_id": "O_kgDOBwsPCg",
      "avatar_url": "https://avatars.githubusercontent.com/u/118165258?v=4",
      "gravatar_id": "",
      "url": "https://api.github.com/users/HKUDS",
      "type": "Organization",
      "user_view_type": "public",
      "site_admin": false
    },
    "network_count": 1612,
    "subscribers_count": 86
  },
  "lastFetched": 1771297611507
}