Channel: phpBB.com

Extensions in Development • [3.3][DEV] phpBB Vectorial search

I asked Manus AI to build a vector search extension for phpBB, and it coded the extension described below.

phpBB properly recognizes it as a valid extension. I'm not sure what is missing or what needs fixing, so feedback is very welcome.

The idea is to add vector search backed by Pinecone, giving phpBB embedding-based search capabilities.

This entire readme.md was written by Manus:
# Vector Search Extension for phpBB (Integration with Pinecone)

## 1. Introduction

This extension integrates semantic vector search capabilities into phpBB forums using [Pinecone](https://www.pinecone.io/) as the vector database. It allows users to find relevant content based on the meaning of their query, rather than just keyword matches.

The extension is designed to be configurable via the phpBB Administration Control Panel (ACP), enabling administrators to manage the Pinecone connection, select which content to index, and initiate the indexing process.

**Current Version:** 1.0.0

## 2. Key Features

* Integration with the Pinecone vector database.
* Indexing of posts from selected forums.
* Configuration of Pinecone connection and indexing parameters through the ACP.
* ACP panel to manually trigger batch indexing.
* Display of basic Pinecone index statistics in the ACP.
* Vector search capability (integration with phpBB’s search interface is handled via events).
* Text processing to clean content before vectorization.
* Modular design for maintainability and future enhancements.

## 3. Requirements

* **phpBB:** Version 3.3.0 or higher.
* **PHP:** Version 7.1 or higher (7.4+ recommended).
* **Composer:** Required for installing PHP dependencies.
* **Pinecone Account:**
  * An active account on [Pinecone](https://www.pinecone.io/).
  * A **Pinecone API Key**.
  * The **Environment** of your Pinecone project (e.g., `us-west1-gcp`).
  * A **preexisting Pinecone index**. The extension does not create the index automatically; you must create it manually in the Pinecone console with the correct dimension matching your embedding model.
* **Embedding Service (Vector Generation):**
  * The extension needs a mechanism to convert text into embeddings. Currently, the `generate_embedding` logic in `core/vector_search_service.php` is a placeholder returning dummy vectors.
  * **You must implement real embedding generation.** Common options:
    1. **External API:** Use services like OpenAI (e.g., `text-embedding-ada-002`), Cohere, etc. This requires an additional API key and may incur costs.
    2. **Self-hosted Microservice:** Deploy a small service (e.g., in Python with Flask/FastAPI using `sentence-transformers`) that the PHP extension can call.
    3. **PHP Libraries (Limited):** Investigate PHP libraries capable of running embeddings locally (e.g., via ONNX Runtime if PHP support exists). This approach is often more complex and limited.
  * The Pinecone index dimension must match the dimension of the vectors produced by your chosen embedding model (e.g., 1536 for OpenAI’s `text-embedding-ada-002`).
* **PHP Extensions:**
  * `curl` (usually enabled by default, for API communication).
  * `json` (usually enabled by default).
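
As a rough illustration of the external-API option, the sketch below builds a request for OpenAI's `/v1/embeddings` endpoint and checks the returned vector against the index dimension. It is written in Python for brevity rather than in the extension's PHP. The function names `build_embedding_request` and `parse_embedding` are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embedding_request(text, api_key, model="text-embedding-ada-002"):
    """Build (but do not send) a POST request asking for one text's embedding."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embedding(response_json, expected_dim):
    """Extract the first embedding and verify it matches the index dimension."""
    vector = response_json["data"][0]["embedding"]
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {expected_dim}"
        )
    return vector
```

Failing early on a dimension mismatch tends to be easier to diagnose than a rejected upsert later on; a PHP version inside `generate_embedding()` could follow the same two steps using `curl` and `json_decode`.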

## 4. Installation

1. **Download/Clone the Extension:**

* Obtain the extension files. If it’s a Git repository, clone it.

2. **Upload Files to the Server:**

* Create the directory structure `ext/acme/vectorsearch/` in your phpBB installation if it doesn’t exist.
* Copy all extension files into `ext/acme/vectorsearch/`.

3. **Install Dependencies with Composer:**

* In your phpBB root via CLI, run:

```bash
composer require probots-io/pinecone-php:"^0.1.0"
```
* Alternatively, if the extension includes a `vendor` directory, this step may be unnecessary. The recommended practice is to manage dependencies via the root Composer or within the extension itself. If phpBB isn’t set up for centralized extension dependencies, you might need:

```bash
cd ext/acme/vectorsearch/
composer install --no-dev
```

4. **Enable the Extension in the ACP:**

* In the phpBB Administration Control Panel, go to **Customize**.
* Under **Manage Extensions**, locate “Vector Search Extension” among disabled extensions.
* Click **Enable**.

## 5. Configuration

After enabling, configure the extension:

1. **Access the Extension Settings:**

* In the ACP, locate the “Vector Search Configuration” section (as defined in `acme_vectorsearch_info.php`), which may be under `ACP_CAT_DOT_MODS` or a similar category.

2. **Settings Mode:**

* **Pinecone API Key:** Enter your Pinecone API key.
* **Pinecone Environment:** Enter your Pinecone environment (e.g., `us-east-1-aws`, `europe-west4-gcp`).
* **Pinecone Index Name:** Enter the exact index name you created in your Pinecone dashboard.
* **Pinecone Index Dimension:** Enter the numeric dimension of your embedding vectors (e.g., `1536` for OpenAI’s `text-embedding-ada-002`). This must match your Pinecone index configuration.
* **Embedding Service API Key (Optional):** If using an external embedding service, enter its API key.
* **Embedding Model Name (Optional):** Specify the external model name (e.g., `text-embedding-ada-002`).
* **Forums to Index:** Select which forums’ posts you want to index (multiple selection allowed).
* Save your settings.

3. **Manual Pinecone Index Creation:**

* Remember, the extension **does not** auto-create the Pinecone index—you must create it manually in the Pinecone dashboard.
* Ensure the index’s dimension matches your embedding model output.
* Choose an appropriate metric (e.g., `cosine`) for text embeddings.
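
To make the `cosine` metric concrete, here is a minimal Python sketch of the similarity score for two embedding vectors. Pinecone computes this on its side; the function below is only an illustration of what the metric measures, and its name is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score 1.0 regardless of length (for example `[1, 0]` and `[2, 0]`), while orthogonal vectors such as `[1, 0]` and `[0, 1]` score 0.0. This insensitivity to vector length is one reason cosine is a common choice for text embeddings.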

## 6. Usage

### 6.1. Content Indexing

1. **Go to Indexing Management:**

* In the ACP under the “Vector Search” settings, switch to the “Indexing Management” view.

2. **Start Indexing:**

* You’ll see your Pinecone connection status and a **Start Full Re-index** button.
* Click it to begin. The extension will iterate through selected forum posts, generate embeddings (via your implementation), and upsert them to Pinecone.
* Depending on content volume and embedding/API speed, this process may take time.
* The ACP will display completion status and any batch-level errors.

3. **Monitor and Logs:**

* Check phpBB’s admin logs for detailed messages on indexing, including errors or skipped posts.
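
The indexing loop described above can be sketched roughly as follows, in Python rather than the extension's PHP. The post fields (`post_id`, `forum_id`, `text`), the `post-<id>` vector ID scheme, and the batch size are assumptions for illustration; the payload shape (`vectors` with `id`, `values`, `metadata`) follows Pinecone's upsert format.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_upsert_batches(posts, embed, batch_size=100):
    """Turn posts into Pinecone-style upsert payloads, one per batch.

    `posts` is a list of dicts with hypothetical keys post_id, forum_id, text.
    `embed` is your embedding function (text -> list of floats).
    """
    batches = []
    for group in chunked(posts, batch_size):
        vectors = []
        for post in group:
            if not post["text"].strip():
                continue  # skip posts that are empty after cleaning
            vectors.append({
                "id": f"post-{post['post_id']}",
                "values": embed(post["text"]),
                "metadata": {"forum_id": post["forum_id"]},
            })
        if vectors:
            batches.append({"vectors": vectors})
    return batches
```

Each returned dict would be sent as one upsert call, so a failure affects only that batch and can be reported as a batch-level error.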

### 6.2. Vector Search (End User)

* The vector search option appears in phpBB’s search interface only if `main_listener.php` correctly handles the relevant search events (this README names `core.search_register_search_types`; confirm that the event exists in your phpBB version) and executes the queries.
* Users selecting “Vector” or “Semantic” search will have their query embedded and Pinecone queried for the most similar posts.
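
A minimal Python sketch of the query side, under stated assumptions: the request body uses the `vector`, `topK`, and `includeMetadata` fields of Pinecone's query API, while the helper names and the `post-<id>` ID convention for stored vectors are hypothetical.

```python
def build_query(vector, top_k=10):
    """Request body for a Pinecone similarity query."""
    return {"vector": vector, "topK": top_k, "includeMetadata": True}


def matches_to_post_ids(response, min_score=0.0):
    """Map a query response to phpBB post IDs, best match first."""
    ranked = sorted(
        response.get("matches", []),
        key=lambda match: match["score"],
        reverse=True,
    )
    post_ids = []
    for match in ranked:
        # Ignore weak matches and IDs that do not follow the assumed scheme.
        if match["score"] >= min_score and match["id"].startswith("post-"):
            post_ids.append(int(match["id"][len("post-"):]))
    return post_ids
```

The resulting post IDs would then be handed to phpBB's normal result rendering, which also applies forum permissions before anything is shown to the user.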

## 7. Troubleshooting

* **Missing Configuration Error:** If logs report “Pinecone configuration missing,” verify all fields (API Key, Environment, Index Name, Dimension) are correctly saved in the ACP.
* **Pinecone Errors:** For connectivity/upsert/query errors, check:

* Your API key and environment.
* That the index name exists in Pinecone.
* That index dimension matches your embeddings.
* Pinecone service status.
* **Embedding Failures:**

* Ensure your `generate_embedding()` in `core/vector_search_service.php` is correctly implemented and can reach the embedding service or local model.
* Verify any external API keys if used.
* **No Indexed Posts or No Search Results:**

* Confirm forums are selected for indexing.
* Check that posts contain text after being cleaned.
* Ensure indexing completed without excessive errors (review logs).
* Verify your search listener and `vector_search_service.php` implementations are correct.
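
The first check in this list, whether all required settings are saved, can be sketched as below. This is a Python illustration only; the config keys are hypothetical names for the four ACP fields (API Key, Environment, Index Name, Dimension), not the extension's actual config names.

```python
REQUIRED_FIELDS = ("api_key", "environment", "index_name", "dimension")


def missing_config_fields(config):
    """Return the names of required settings that are absent or blank."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = config.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def dimension_is_valid(config):
    """True when the configured dimension parses as a positive integer."""
    try:
        return int(config.get("dimension")) > 0
    except (TypeError, ValueError):
        return False
```

Reporting which field is missing, rather than a generic “Pinecone configuration missing” message, would make this class of error quicker to resolve.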

## 8. Uninstallation

1. **Disable the Extension:**

* In the ACP under **Customize → Manage Extensions**, find “Vector Search Extension” and click **Disable**.

2. **Purge Extension Data:**

* After disabling, choose **Purge Data**. This runs the extension’s `purge_step` (if implemented) to remove settings and stored data from the phpBB database.
* **Note:** This does not delete your Pinecone index or data; manage that in your Pinecone account separately.

3. **Remove Files:**

* Delete the `ext/acme/vectorsearch/` directory from your server.

## 9. Development Notes (Current Implementation)

* **Embedding Generation:** The `generate_embedding()` in `core/vector_search_service.php` is a placeholder that returns dummy vectors. Until you replace it with real embedding logic, the indexed vectors carry no meaning and search results will not be relevant.
* **Pinecone Client:** Uses `probots-io/pinecone-php`. Index creation must be done in the Pinecone dashboard.
* **Text Processing:** `text_processor.php` provides basic BBCode/HTML cleaning. You may need more robust handling.
* **phpBB Search Integration:** Ensure your event listener (`main_listener.php`) correctly registers and handles the new search type.
* **Error Handling & Logging:** Basic logging is in place—consider enhancing error granularity and admin feedback.
* **Indexing Scalability:** Current batch indexing in the ACP is synchronous. For very large forums, consider offloading to background jobs or a queue to avoid ACP timeouts.
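
For the text-processing note above, a rough Python sketch of BBCode/HTML cleaning is shown here. It is not the code in `text_processor.php`; the function name and the regular expressions are assumptions, and real forum content (quotes, code blocks, attachments) may need more careful handling.

```python
import html
import re

HTML_TAG = re.compile(r"<[^>]+>")
# Matches tags like [b], [/b], [url=...], and phpBB-style [b:uid].
BBCODE_TAG = re.compile(r"\[/?[a-z*]+(?:[=:][^\]]*)?\]", re.IGNORECASE)
WHITESPACE = re.compile(r"\s+")


def clean_post_text(raw):
    """Strip HTML and BBCode markup, decode entities, collapse whitespace."""
    text = HTML_TAG.sub(" ", raw)      # remove HTML tags first
    text = html.unescape(text)         # then decode entities such as &amp;
    text = BBCODE_TAG.sub("", text)    # drop BBCode tags, keep their content
    return WHITESPACE.sub(" ", text).strip()
```

For example, `clean_post_text("[b]Hello[/b] <br/>world &amp; [url=http://x.y]link[/url]")` returns `"Hello world & link"`. Posts that come back empty from this step are worth skipping before embedding.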

## 10. License

This extension is distributed under the **GPL-2.0-only** license, as specified in `composer.json`.

Extension download: https://github.com/FiveTechSoft/FWH_too ... ension.zip


Statistics: Posted by Antonio Linares — Tue May 06, 2025 2:44 pm


